MAT 211 CourseGuide_Lecture Notes_Summer 2015 (2)


MAT 211 Course Guide

Probability and Statistics

Summer 2015

Offered by
Department of Physical Sciences
School of Engineering and Computer Science
Independent University, Bangladesh

Course Coordinator: Dr. Shipra Banik, Associate Professor

Instructors: Dr. Shipra Banik and others



2

Dr. Shipra Banik
Associate Professor
Department of Physical Sciences
School of Engineering and Computer Science
e-mail: [email protected]
Office: Rm. 6004-A, SECS
Office hours: ST: 11:30 a.m. to 1:00 p.m.; MW: 11:30 a.m. to 1:00 p.m.; or by appointment

Pre-requisite: MAT 101 or equivalent.
Instructional format per week: 2 × 1½-hour lectures

Course objectives

An understanding of statistics is required to carry out uncertainty calculations in many fields. Information is everywhere today, and everyone is bombarded with numerical information, so skills are needed to deal with it. First, one needs to be a critical consumer of information presented by others; second, one needs to be able to reduce large amounts of data into a meaningful form in order to make effective interpretations, judgments and decisions.

The course 'MAT 211 Probability and Statistics' is an important foundation course offered by IUB, suited to all undergraduate students who wish to major in a non-SECS programme at IUB. It covers the usual topics in statistics and explains how the theory can be applied to solve real-world problems. Topics include: elementary descriptive techniques, probability theory with important probability distributions, sampling theory, statistical inference, linear correlation and regression, and others. By the end of the course, students should have acquired sufficient skills to be able to: follow statistical arguments in reports and presentations; apply statistical tools to make effective decisions; use many of the topics and methods in other courses in their future education; and express statistical findings in non-technical language.

Textbook: All students should collect:
Anderson, D.R., Sweeney, D.J. and Williams, T.A. (2011), Statistics for Business and Economics (11th Edition), South-Western, A Division of Thomson Learning.

Recommended Reference
Murray R. Spiegel and Larry J. Stephens (2008), Schaum's Outline of Theory and Problems of Statistics (Fourth edition), Schaum's Outline Series, McGraw-Hill.

Evaluation criteria
Homework will be assigned weekly. Students are not required to hand it in for grading, but completing the homework is essential for understanding the material and performing satisfactorily on examinations.

The weighting scheme is as follows:
Class attendance: 5%, two class tests (CT): 35% (20% + 15%), mid-term test (MT): 20% and final test (FT): 40%.



3

Rules and regulations

Students are required to attend classes on time and to take well-organized notes.

If a student misses a class, it is his/her sole responsibility to obtain the missed information (for example, changes of exam dates, omitted or added topics, lecture notes, new homework, etc.).

For a test, no extensions or alternative times are possible, and there are no make-up tests.

In unavoidable circumstances, the test will be held strictly in the next lecture.

No extra work will be given to improve the grade.

Students are required to show mature behaviour in class. For example, cellular phones must be switched off during lectures and examinations. Eating, drinking, chewing gum, reading newspapers, socializing and sleeping are not permitted in class.

Cheating of any kind in class is strictly prohibited and may result in a failing grade for the course.

Students are advised to obtain a scientific calculator for use in class. Note that a two-variable calculator is needed for all types of calculations.

Grading scales
Above 85%: A, 81%-85%: A-, 76%-80%: B+, 71%-75%: B, 66%-70%: B-, 61%-65%: C+, 56%-60%: C, 51%-55%: C-, 46%-50%: D+, 40%-45%: D, below 40%: F

Incomplete (I-Grade)
An I-grade will be given only to a student who has completed the bulk of the course work and is unable to complete the course due to a serious disruption not caused by the student's own negligence.

Mid-term and Final Test: All sections will have a common examination. Materials and date will be announced later.

Course Plan

Each entry below gives the lecture number, its topics, and the text/reference.

Lecture 1
Topics: Introduction. Definitions: variable, scales of measurement, raw data, qualitative data, quantitative data, cross-sectional data, time series data, census survey, sample survey, target population, random sample, computer and statistical packages
Text/Reference: Course Guide, pp.7-8. HW: Text, Ex: 2,4,6,9-13, pp.21-23

Lecture 2
Topics: Summarizing qualitative data: frequency distribution, relative frequency distribution, bar chart, pie chart. Applications from real data. Summarizing quantitative data: frequency distribution, relative frequency distribution, cumulative frequency distribution. Applications from real data
Text/Reference: Course Guide, pp.9-11. HW: Text, Ex: 4-10, pp.36-39; Ex: 15-21, pp.46-48; Ex: 39, 41 and 42, pp.65-67

Lecture 3
Topics: Histogram, ogive, line chart, stem and leaf display. Applications from real data. Summarizing bi-variate data: cross-tabulation, scatter diagram. Applications from real data
Text/Reference: HW: Text, Ex: 15-21, pp.46-48; Ex: 25-28, pp.52-53; Ex: 31, 33-36, pp.60-61

Lecture 4
Topics: Measures of average: simple mean, percentiles (median, quartiles), mode. Applications from real data
Text/Reference: HW: Text, Ex: 5-10, pp.92-94

Lecture 5
Topics: Measures of variability: variance, standard deviation, coefficient of variation, detecting outliers (five number summary). Applications from real data
Text/Reference: HW: Text, Ex: 16-24, pp.100-102; Ex: 40-41, pp.112-113

Lecture 6
Topics: Review Lecture 1 - Lecture 5

Lecture 7
Topics: Class Test 1 (20%). Topic: Lecture 1 - Lecture 5

Lecture 8
Topics: Working with grouped data, weighted mean, skewness, kurtosis, case study
Text/Reference: HW: Text, Ex: 54-57, pp.128-129. Text: Case problems 1, 2, 3, 4, pp.137-141

Lecture 9
Topics: Probability Theory: random experiment, random variable, sample space, events, counting rules, tree diagram, probability defined on events
Text/Reference: HW: Text, Ex: 1-9, pp.158-159; Ex: 14-21, pp.162-164

Lecture 10
Topics: Basic relationships of probability: addition law, complement law, conditional law, multiplication law
Text/Reference: Course Guide, pp.34-36. HW: Text, Ex: 22-27, pp.169-170; Ex: 32-35, pp.176-177

Lecture 11
Topics: Review Lecture 8 - Lecture 10

Lecture 12
Topics: Mid-term test (20%). Topic: Lecture 8 - Lecture 10

Lecture 13
Topics: Normal Distribution
Text/Reference: Course Guide, pp.38-40. Ex: 10-25, pp.248-250

Lecture 14
Topics: Lecture 13 continued
Text/Reference: HW: Ex: 10-25, pp.248-250



Lecture 15
Topics: Class test 2 (15%). Topic: Lecture 13 - Lecture 14

Lecture 16
Topics: Target population, random sample, table of random numbers, simple random sampling, point estimates (sample mean and sample SD)
Text/Reference: Course Guide, pp.42-44. HW: Text, Ex: 3-8, pp.272-273

Lecture 17
Topics: Interval estimation: parameter, statistic, margin of error (ME), statistical tables (z-table, t-table, chi-square table, F-table), confidence interval of population mean, confidence interval of population SD. Applications from real data

Lecture 18
Topics: Interval estimation for two population means and standard deviations. Applications from real data
Text/Reference: Text, Chapter 10

Lecture 19
Topics: Test of hypothesis: concept of hypothesis, null hypothesis, alternative hypothesis, one-tail tests, two-tail tests, tests of population mean (large-sample test, small-sample test), test of population SD

Lecture 20
Topics: Lecture 19 continued
Text/Reference: Course Guide, pp.55-67

Lecture 21
Topics: Test of hypothesis: tests of two population means and two standard deviations. Applications from real data
Text/Reference: Course Guide, pp.68-69. Text, Chapter 11

Lecture 22
Topics: Correlation analysis: concepts of covariance and correlation (numerical measures of bi-variate data). Regression analysis: linear and multiple regression model, prediction, coefficient of determination
Text/Reference: Ex: 47-51, pp.122-124

Lecture 23
Topics: Lecture 22 continued
Text/Reference: HW: Text, Ex: 4-14, 18-21, pp.570-582

Lecture 24
Topics: Review for Final Test (40%). Topics and date will be announced later



6

MAT211 Lecture Notes

Summer 2015



7

Lecture-1
Chapter-1: Introduction

Important Definitions:

Data, elements, variable, observations, raw data, qualitative data, quantitative data, scales of measurement, population, random sample, census, sample survey, cross-sectional data, time series data, computer and statistical analysis, glossary.

Textbook: Anderson, D.R., Sweeney, D.J. and Williams, T.A. (2011), Statistics for Business and Economics (11th Edition), South-Western, A Division of Thomson Learning.

Data (or Variable) - A characteristic that changes from one element to another.
Examples: Gender, Grade, Family size, Score, Age, and many others.

Gender, Grade - Qualitative data (letters)

Family size, Score, Age - Quantitative data (numeric values)

Family size - Whole number - Discrete data

Score, Age - Continuous data

Note: ID # and cell # are qualitative data, even though they are written with digits.

Observations - Data size

A variable is denoted by X, Y, Z or by its first letter (e.g. Score - S, Age - A).

Elements - For a variable X, the elements are x1, x2, ..., xn.

Raw data - Data collected by survey, census, etc. Also known as ungrouped data.

Note: We always start with raw data. We have to process or summarize the data using various statistical techniques (covered in Chapters 2-3).

Scales of measurement

Before analysis, the scale of each selected variable has to be defined, especially when the analysis is done with a statistical package (e.g. SPSS, Minitab, Stata or even Excel). We have to assign a scale to each variable involved in the analysis.

There are four kinds of scale: nominal, ordinal, interval and ratio.

Nominal, ordinal - Qualitative data

Nominal scale - Variables like Name, ID, Address and Cell # are declared on this scale. Arithmetic on these values is not possible.

Ordinal scale - Qualitative data that can be ordered, like test performance (excellent, good, poor, etc.) or quality of food (good or bad). Some analysis is possible.



8

Interval and ratio - Quantitative data

Interval scale - Has the properties of ordinal data, and the interval between values is meaningful. Example: scores for 5 students. The ordinal concept applies, and the difference between any two students' scores is meaningful.

Ratio scale - Has the properties of interval data. In addition, the ratio of two data values is meaningful. Example: scores for 5 students. The interval concept applies, and the ratio of any two students' scores is meaningful.

For details, see Textbook, p.6.

Target population - The set of all elements in a particular study.

Random sample - A subset of the target population. The subset varies from draw to draw.

Census - Method to collect data about the whole target population.

Sample survey - Method to collect data about a random sample.

For the purpose of statistical analysis, it is meaningful to distinguish time series data from cross-sectional data.

Time series data - Data collected over several time periods, for example exchange rate, interest rate, gross national product (GNP), gross domestic product (GDP) and many others. Such data are meaningful with respect to time.

Cross-sectional data - Data collected at the same point in time, for example companies' profits or students' profiles collected at the same time.

Note that in this course most of the data will be treated as cross-sectional data.

Computer and statistical packages

Because statistical analysis generally involves large amounts of data, analysts frequently use computer software for this work. Several very useful packages are available, such as SPSS, Minitab, Matlab, Excel, Stata and many others.

HW: Text, Ex: 2,4,6,9-13, pp.21-23



9

Lecture 2
Chapter-2: Summary of raw data

You will get an idea about the following:

Aim of presentation of raw data; tabular form of raw data (e.g. summarizing qualitative and quantitative data).

The aim of presenting raw data is to turn a large and complicated set of raw data into a more compact and meaningful form. Usually, one can summarize raw data in

(a) tabular form,

(b) graphical form, and

(c) numerical form, such as measures of central tendency, measures of dispersion and others.

Under the tabular and graphical forms, we will learn frequency distributions (grouping data), bar graphs, histograms, the stem-and-leaf display and others.

Presentations of data can be found in annual reports, newspaper articles and research studies. Everyone is exposed to these types of presentations. Hence, it is important to understand how they are prepared and how they should be interpreted.

As indicated in Lecture 1, data can be classified as either qualitative or quantitative.

The plan of this lecture is to introduce the tabular methods commonly used to summarize both qualitative and quantitative data.

Summarizing qualitative data

Recall raw data and consider the following data:

Table 1: Test Performances of MAT 211

Good       Good       Excellent
Excellent  Poor       Excellent
Poor       Excellent  Good
Excellent  Excellent  Poor
Poor       Good       Good

Make a tabular and graphical summary of the above data.

Solution: Define T - Test performances, with n = 15. These are qualitative data.

Tabular summary

T          Tally marks  Frequency (# of students) f_i, i = 1,2,3  Relative (percent) frequency rf_i (pf_i)
Excellent  |||| |       6                                         0.40 (40%)
Good       ||||         5                                         0.33 (33%)
Poor       ||||         4                                         0.27 (27%)
Total                   n = 15

where relative frequency (rf_i) = f_i/n and percent frequency (pf_i) = rf_i × 100.

Graphical summary: Bar or Pie chart

[Figure: Bar chart of test performance (Excellent, Good, Poor; frequency axis 0 to 8) and pie chart (Excellent 40%, Good 33%, Poor 27%)]

Data summary: Our analysis shows that the observed test performances were excellent for 40% of students, good for 33% and poor for 27%.

Summarizing quantitative data

Now observe the following data:

Table 2: Test score of MAT 101

90  88  78
87  69  93
56  78  57
67  85  46
95  59  89

We know very well that these data are quantitative data. Processing this kind of data differs a little from processing qualitative data. Proceed as follows:

Solution: Define T - Test score, with n = 15.

We need to find the lowest and highest values of the given raw data set. Here L = 46 and H = 95. Assume the number of classes K = 5. Thus, the size of each class is c = (H-L)/K = 9.8 ≈ 10.

11

Tabular Summary

T      Tally marks  Frequency (# of students) f_i, i = 1, ..., 5  Relative (percent) frequency rf_i (pf_i)  Cumulative frequency (F_i)
46-56  ||           2                                             0.13 (13%)                                2
56-66  ||           2                                             0.13 (13%)                                4
66-76  ||           2                                             0.13 (13%)                                6
76-86  |||          3                                             0.20 (20%)                                9
86-96  |||| |       6                                             0.40 (40%)                                15
Total               n = 15
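The two tabular summaries of this lecture can be reproduced with a short script. The following is a minimal sketch, not part of the course material: the function names `qualitative_table` and `grouped_table` are hypothetical, and the boundary rule (a score equal to a class limit, such as 56, is counted in the lower class) is an assumption inferred from the frequencies in the table above.

```python
from collections import Counter

# Table 1 (test performances) and Table 2 (test scores) from this lecture.
PERFORMANCES = [
    "Good", "Good", "Excellent",
    "Excellent", "Poor", "Excellent",
    "Poor", "Excellent", "Good",
    "Excellent", "Excellent", "Poor",
    "Poor", "Good", "Good",
]
SCORES = [90, 88, 78, 87, 69, 93, 56, 78, 57, 67, 85, 46, 95, 59, 89]


def qualitative_table(values):
    """Return {category: (frequency, relative frequency)}."""
    n = len(values)
    return {cat: (f, f / n) for cat, f in Counter(values).items()}


def grouped_table(values, low, width, k):
    """Return rows (lower, upper, f, rf, F) for k classes of equal width.

    Assumed convention: a value on a class boundary goes to the lower
    class; the overall minimum stays in the first class.
    """
    n = len(values)
    freqs = [0] * k
    for v in values:
        # ceil((v - low) / width) - 1, written with integer arithmetic
        idx = 0 if v == low else -(-(v - low) // width) - 1
        freqs[idx] += 1
    rows, cumulative = [], 0
    for j, f in enumerate(freqs):
        cumulative += f
        lower = low + j * width
        rows.append((lower, lower + width, f, f / n, cumulative))
    return rows


for cat, (f, rf) in qualitative_table(PERFORMANCES).items():
    print(f"{cat:<10} f={f}  rf={rf:.2f}")

for lower, upper, f, rf, cum in grouped_table(SCORES, low=46, width=10, k=5):
    print(f"{lower}-{upper}  f={f}  rf={rf:.2f}  F={cum}")
```

A different boundary rule (for example, counting 56 in the class 56-66) would give different class frequencies, so the rule should be stated whenever a frequency table is reported.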

HW: Text

Ex:4-10, pp.36-39

Ex:15-21, pp.46-48

Ex: 39, 41,42, pp.65-67



12

Lecture 3
Summarizing Raw Data Continued

Graphical summary: Histogram, Ogive

Recall the Table 2 data of Lecture 2. Both graphs are drawn from its frequency table.

[Figure: Histogram of test scores (classes 46-56, 56-66, 66-76, 76-86, 86-96 on the horizontal axis; frequency 0 to 7 on the vertical axis) and ogive (cumulative frequency 0 to 16 plotted against score 0 to 120)]

Data Summary

The histogram shows that 6 students scored between 86 and 96, only 2 students scored between 46 and 56, and so on.

The ogive shows that 9 students scored less than 86, 6 students scored less than 76, and so on.

Other graphical summaries: stem and leaf display, line chart

Line chart - Time plots of, for example, stock indices (time series data are needed).



13

Stem and leaf display

Stem  Leaf (Unit = 1.0)
4     6
5     6 7 9
6     7 9
7     8 8
8     5 7 8 9
9     0 3 5
Total n = 15

Summary: There are 4 students whose scores range from 85 to 89, and so on.
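The display above can be generated mechanically: the stem is the tens digit of a score and the leaf is its units digit. A minimal sketch follows, assuming two-digit whole-number scores; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Table 2 test scores from Lecture 2.
SCORES = [90, 88, 78, 87, 69, 93, 56, 78, 57, 67, 85, 46, 95, 59, 89]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    display = defaultdict(list)
    for v in sorted(values):  # sorting first keeps stems and leaves in order
        display[v // 10].append(v % 10)
    return dict(display)


for stem, leaves in stem_and_leaf(SCORES).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Unlike a frequency table, the display keeps every original value, so the raw data can be read back from it.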

HW: Text

Ex:15-21, pp.46-48

Ex:25-28, pp.52-53

Chapter-2: Summarizing bi-variate data: Cross-tabulation, scatter diagram.

Summarizing bi-variate data

So far we have focused on tabular and graphical methods for one variable at a time. Often we need tabular and graphical summaries for two variables at a time.

Cross-tabulation (a tabular method) and the scatter diagram (a graphical method) are two such methods for making decisions from two qualitative and/or quantitative variables.

Tabular Method - Cross-tabulation:

Problem-1:
Consider the following two variables: quality rating and meal price ($) for 10 restaurants. The data are as follows:
Quality rating: good, very good, good, excellent, very good, good, very good, very good, very good, good
Meal price ($): 18, 22, 28, 38, 33, 28, 19, 11, 23, 13.
Make a tabular summary (a cross-table) and a data summary.



14

Solution: Define X - Quality rating and Y - Meal price. Here n = 10.

Table: Cross-tabulation of X and Y for 10 restaurants

X \ Y      10-20    20-30    30-40    Total
Good       || (2)   || (2)   (0)      4
Very good  || (2)   || (2)   | (1)    5
Excellent  (0)      (0)      | (1)    1
Total      4        4        2        n = 10

Data summary:
We see that there are 2 restaurants whose quality of food is very good and whose meal prices range from $20 to $30, 1 restaurant whose quality of food is excellent, 4 restaurants whose meal prices range from $10 to $20, and so on.
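The cross-tabulation in Problem-1 amounts to counting (rating, price class) pairs. The sketch below is illustrative only; `price_class` and `crosstab` are hypothetical helper names, and the rule that a class includes its lower limit is an assumption (no price in this data set falls on a class boundary, so the choice does not affect the counts).

```python
from collections import Counter

# Problem-1 data: quality rating and meal price ($) for 10 restaurants.
RATINGS = ["good", "very good", "good", "excellent", "very good",
           "good", "very good", "very good", "very good", "good"]
PRICES = [18, 22, 28, 38, 33, 28, 19, 11, 23, 13]


def price_class(price, low=10, width=10):
    """Return the (lower, upper) limits of the class containing price."""
    lower = low + ((price - low) // width) * width
    return (lower, lower + width)


def crosstab(ratings, prices):
    """Count how many restaurants fall in each (rating, price class) cell."""
    return Counter((r, price_class(p)) for r, p in zip(ratings, prices))


table = crosstab(RATINGS, PRICES)
classes = [(10, 20), (20, 30), (30, 40)]
for rating in ["good", "very good", "excellent"]:
    row = [table[(rating, c)] for c in classes]
    print(f"{rating:<10}", row, "total", sum(row))
```

Row and column totals follow by summing the cells, which gives a quick check that every restaurant has been counted exactly once.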

Graphical method - scatter diagram
A scatter diagram provides the following information about the relationship between two variables:

• Strength
• Shape - linear, curved, etc.
• Direction - positive or negative
• Presence of outliers

Problem-2: Now consider the following two variables: # of commercials and total sales for 5 sound equipment stores. The data are as follows:

# of commercials: 2, 5, 1, 3, 4 and total sales: 50, 57, 41, 54, 54

Figure: Scatter diagram of Sales and # of commercials for 5 sound equipment stores (vertical axis: Sales, 0 to 60; horizontal axis: # of commercials, 0 to 6)

Data summary: There is a positive relationship between # of commercials and total sales for the 5 sound equipment stores.

HW: Text, Ex: 31, 33-36, pp.60-61



15

Lecture 4
Chapter 3: Summarizing Raw Data (Numerical measures)

We will learn several numerical measures that provide a data summary using numeric formulas.

Now we will learn the following:

(1) Measures of average: simple mean, weighted mean, median, mode, quartiles, percentiles

(2) Measures of variation: range, inter-quartile range, variance, standard deviation

(3) Measures of skewness: symmetry, positive skewness, negative skewness

(4) Measures of kurtosis: leptokurtic, platykurtic and mesokurtic

Measures of average: simple mean, weighted mean, median, mode, quartiles, percentiles

Definition of average: A single central value that represents the whole set of data. Different measures of average are: simple mean, weighted mean, median, mode, quartiles, percentiles.

We will learn the above measures for raw data and for grouped data.

Mean: Denoted by $\bar{x}$ and calculated by $\bar{x} = \sum x_i / n$.

For example, consider the monthly starting salaries of 5 graduates: 3450, 3550, 3550, 3480, 3355.

Define X - monthly starting salaries of 5 graduates. Here $\bar{x} = \sum x_i / n$ = 17385/5 = 3477.

Median, Percentiles, Quartiles

A percentile is denoted by p_i, i = 1, 2, ..., 99, which means there are 99 percentiles.
The 50th percentile is known as the median and is denoted by p50.

The 25th percentile is known as the first quartile and is denoted by p25.

The 75th percentile is known as the 3rd quartile and is denoted by p75. p50 is also known as the 2nd quartile (Q2). Thus, there are 3 quartiles: p25 (Q1), p50 (Q2) and p75 (Q3).

Calculation of percentiles: the data must first be sorted.

3355 3450 3480 3550 3550

For Q2: i = (pn)/100 = (50*5)/100 = 2.50. The next integer is 3. Thus, Q2 is the 3rd sorted value, 3480.

For Q1: i = (pn)/100 = (25*5)/100 = 1.25. The next integer is 2. Thus, Q1 is the 2nd sorted value, 3450.

For Q3: i = (pn)/100 = (75*5)/100 = 3.75. The next integer is 4. Thus, Q3 is the 4th sorted value, 3550.




16

Now consider the following data: 3450, 3550, 3550, 3480, 3355, 3490

Here $\bar{x} = \sum x_i / n$ = 20875/6 ≈ 3479.2

Sort the data to calculate percentiles: 3355 3450 3480 3490 3550 3550

For Q2: i = (pn)/100 = (50*6)/100 = 3, an integer, so Q2 is the average of the 3rd and 4th observations of the sorted data. Thus, Q2 = (3480+3490)/2 = 3485.
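The mean and the percentile rule used in the two examples above can be sketched in a few lines. This is a minimal illustration of the index rule from these notes (round i up when it is not an integer; average positions i and i+1 when it is), not a general-purpose routine: statistical packages use several different percentile definitions and may return slightly different values. The function names and the restriction to whole-number p between 1 and 99 are assumptions of the sketch.

```python
import math

FIVE = [3450, 3550, 3550, 3480, 3355]  # salaries of 5 graduates
SIX = FIVE + [3490]                     # the same data with a 6th graduate


def mean(values):
    """Simple mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def percentile(values, p):
    """p-th percentile (p a whole number, 1..99) by the rule in these notes."""
    data = sorted(values)
    n = len(data)
    if (p * n) % 100 == 0:
        # i = pn/100 is an integer: average the values at positions i and i+1
        i = p * n // 100
        return (data[i - 1] + data[i]) / 2
    # i is not an integer: round up to the next position
    i = math.ceil(p * n / 100)
    return data[i - 1]


print(mean(FIVE), percentile(FIVE, 25), percentile(FIVE, 50), percentile(FIVE, 75))
print(round(mean(SIX), 1), percentile(SIX, 50))
```

Positions in the notes are counted from 1, while Python lists are indexed from 0, which is why the code reads `data[i - 1]`.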



Mode: The value that occurs with the greatest frequency. Denoted by M0.

Consider the following observations:

(1) 3450, 3550, 3550, 3480, 3355 - M0 is 3550.

(2) 3450, 3550, 3550, 3480, 3450 - M0 are 3450 and 3550.

(3) 3450, 3550, 3550, 3450, 3450 - M0 is 3450.

(4) 3450, 3650, 3550, 3480, 3355 - no mode.
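The four cases above (one mode, two modes, one mode, no mode) can be checked with a small counting routine. A minimal sketch follows; the function name `modes` is hypothetical, and treating "every value occurs once" as "no mode" follows the convention of example (4).

```python
from collections import Counter


def modes(values):
    """Return all values with the greatest frequency, or [] if none repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3450, 3550, 3550, 3480, 3355]))
print(modes([3450, 3550, 3550, 3480, 3450]))
print(modes([3450, 3550, 3550, 3450, 3450]))
print(modes([3450, 3650, 3550, 3480, 3355]))
```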

Data Summary:

Mean = 3477 means that the average monthly starting salary of the graduates is $3477.

Median = 3485 means that 50% of the graduates' monthly starting salaries are below $3485 and the remaining 50% are above $3485.

First quartile = 3450 means that 25% of the graduates' monthly starting salaries are below $3450 and the remaining 75% are above $3450.

Third quartile = 3550 means that 75% of the graduates' monthly starting salaries are below $3550 and the remaining 25% are above $3550.

Mode = 3450 means that the most common monthly starting salary is $3450.

HW: Text

Ex: 5-10, pp.92-94



17

Lecture 5
Chapter 3: Numerical measures continued

We will learn measures of variation.

Recall the concept of average (Ref. Lecture 4). For example, suppose we have the following 2 sets of raw data:

1) 15, 15, 15, 15, 15 - Average 15 and variation 0.
2) 15, 16, 19, 13, 12 - Average 15 and variation 2.74.

Statistical meaning of variation

Ask the question: is there any difference between each observation and the average value? Suppose X is the score of CT1 (class test 1) and the calculated average score is 15.

The next step is to look at the difference between each student's mark and the average mark.

If the difference is 0, the student's score and the average score are the same.

If the difference is positive (negative), the student's score is greater (lower) than the average score.

How can we measure the variation of a data set? Various measures (or formulas) are available:

1. Range, R = H - L, where H is the highest value of the data set and L is the lowest value.

2. Inter-quartile range, IR = p75 - p25, where p75 is the 75th percentile and p25 is the 25th percentile.

3. Variance, denoted by $\sigma^2$ and calculated by $\sigma^2 = \sum (x_i - \bar{x})^2 / (n-1)$.

4. Standard deviation, denoted by $\sigma$ and calculated by $\sigma = \sqrt{\sum (x_i - \bar{x})^2 / (n-1)}$. That means SD = sqrt(variance).

Note: Measures of variation cannot be negative. The smallest possible value is 0, which indicates that all students got the same score.

Calculation of variance and SD
Recall the monthly starting salaries of 5 graduates: 3450, 3550, 3550, 3480, 3355, where we found $\bar{x} = \sum x_i / n$ = 3477 (see Lecture 4).

Calculation table for variance and SD

X      $(x_i - \bar{x})^2$
3450   729
3550   5329
3550   5329
3480   9
3355   14884
Total  26280

Here variance $\sigma^2 = \sum (x_i - \bar{x})^2 / (n-1)$ = 26280/4 = 6570 and SD = sqrt(variance) = $81.06.
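The calculation table above can be reproduced directly from the definition. The sketch below uses the n-1 divisor, as in these notes; the function names are hypothetical.

```python
import math

SALARIES = [3450, 3550, 3550, 3480, 3355]  # monthly starting salaries ($)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


print(sample_variance(SALARIES))            # in squared dollars
print(round(sample_sd(SALARIES), 2))        # in dollars
print(sample_sd([15, 15, 15, 15, 15]))      # identical values: no variation
print(round(sample_sd([15, 16, 19, 13, 12]), 2))
```

The last two lines revisit the two small data sets from the start of this lecture: both have average 15, but only the second shows any variation.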



18

Data summary: SD = 81.06 indicates that graduates' salaries typically deviate from the average salary of $3477 by about $81.

Note: Variance is hard to interpret because its unit is squared. For example, if mean = $3477 then variance = 6570 $². Taking the square root of the variance removes this problem by returning to the original unit of the data; the result is the standard deviation (SD).

So we do not interpret the variance and always work with the SD.

Coefficient of variation

See Text, p.99

HW: Text, Ex: 16-24, pp.100-102

Detecting outliers (Five number summary)

See, Text, pp.109-111

HW: Text, Ex: 40-41, pp. 112-113



19

Lecture 7
Class Test 1 (20%)

Exam Time: 90 minutes

Requirements:

1) A two-variable scientific calculator is required (no alternatives).

2) Mobile phones must be switched off during the exam.

Format of questions

1) Lecture 1 - Lecture 5 solved and HW problems
2) Related textbook questions

Good luck with your first test!



20

Lecture 8
Working with grouped data

So far we have calculated all measures of average and variation for ungrouped (raw) data. Sometimes only grouped data (a frequency table) are available. In this situation, the formulas for ungrouped (raw) data cannot be applied directly. Proceed as follows:

Recall the tabular summary, where X - Test score and n = 15 (Lecture 2).

X      Frequency (# of students) f_i, i = 1, ..., 5
46-56  2
56-66  2
66-76  2
76-86  3
86-96  6
Total  n = 15

X      Frequency f_i  Midpoint (m_i)  f_i m_i  $f_i (m_i - \bar{x})^2$
46-56  2              51              102      1352
56-66  2              61              122      512
66-76  2              71              142      72
76-86  3              81              243      48
86-96  6              91              546      1176
Total  n = 15                         1155     3160

Grouped mean (weighted mean)

$\bar{x} = \sum f_i m_i / n$ = 1155/15 = 77

Here variance $\sigma^2 = \sum f_i (m_i - \bar{x})^2 / (n-1)$ = 3160/14 = 225.71 and SD = sqrt(variance) = sqrt(225.71) = 15.02.

Data Summary: SD = 15.02 indicates that students' scores typically deviate from the average score of 77 by about 15 marks.
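The grouped-data calculation above replaces every score in a class by the class midpoint and weights it by the class frequency. A minimal sketch follows; the tuple layout `(lower, upper, frequency)` and the function names are assumptions made for illustration.

```python
import math

# Frequency table of test scores from Lecture 2: (lower, upper, frequency).
CLASSES = [(46, 56, 2), (56, 66, 2), (66, 76, 2), (76, 86, 3), (86, 96, 6)]


def grouped_mean(classes):
    """Weighted mean of the class midpoints, weighted by frequency."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_variance(classes):
    """Frequency-weighted squared deviations of midpoints, divided by n - 1."""
    n = sum(f for _, _, f in classes)
    x_bar = grouped_mean(classes)
    return sum(f * ((lo + hi) / 2 - x_bar) ** 2 for lo, hi, f in classes) / (n - 1)


print(grouped_mean(CLASSES))
print(round(grouped_variance(CLASSES), 2))
print(round(math.sqrt(grouped_variance(CLASSES)), 2))
```

Because midpoints stand in for the actual scores, grouped results are approximations and will generally differ a little from the values computed on the raw data.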

HW: Text, Ex: 54-55, pp.128-129




21

Measures of Skewness

We can get a general impression of skewness by drawing a histogram. To understand the concept of skewness, consider the following 3 histograms:

[Figure-1, Figure-2, Figure-3: three histograms, each with five bars on a frequency axis from 0 to 8. The tallest bar is towards the left in Figure-1, in the middle in Figure-2 and on the right in Figure-3.]

Figure-1 is known as positively skewed or skewed to the right.

Figure-2 is known as a normal/symmetric frequency curve.

Figure-3 is known as negatively skewed or skewed to the left.

There are two types of skewness: (1) positively skewed or skewed to the right, and (2) negatively skewed or skewed to the left.

Note that the normal/symmetric frequency curve is known as a non-skewed curve (skewness is absent).

Definition: Skewness gives an idea of the direction of variation in a raw data set.

Figure-1 - most of the frequencies are observed on the left.
Figure-2 - most of the frequencies are observed in the middle.
Figure-3 - most of the frequencies are observed on the right.

Recall X - test score.

Figure-1 tells us that most students performed poorly: most students scored below the average value.
Figure-2 tells us that most students had an average performance: most students scored near (slightly above or below) the average value.
Figure-3 tells us that most students performed well: most students scored above the average value.

Measure of skewness

To detect whether skewness is present in a set of raw data, we will use the most commonly used formula, Karl Pearson's coefficient of skewness (Pearson is known as the Father of Statistics). It is defined as

SK = 3(mean-median)/SD

Note that this formula works for both ungrouped and grouped data.



22

Problem

Suppose X - test score. Let mean = 15, median (50th percentile or 2nd quartile) = 17 and SD = 3. Here SK = 3(15-17)/3 = -2.00.
Data summary: SK = -2.00 means that the test scores are negatively skewed, i.e. most students scored over 15.

Let mean = 18, median = 14 and SD = 5. Here SK = 3(18-14)/5 = 2.40.
Data summary: SK = 2.40 means that the test scores are positively skewed, i.e. most students scored below 18.

Let mean = 16, median = 16 and SD = 5. Here SK = 0.
Data summary: SK = 0 means that the test scores are symmetric, i.e. scores are spread evenly below and above 16.
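The three cases in the problem above can be verified with a direct translation of Pearson's formula. A minimal sketch; the function names `pearson_skewness` and `describe` are hypothetical.

```python
def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient of skewness: SK = 3(mean - median)/SD."""
    return 3 * (mean - median) / sd


def describe(sk):
    """Classify the sign of SK using the wording of these notes."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"


for mean, median, sd in [(15, 17, 3), (18, 14, 5), (16, 16, 5)]:
    sk = pearson_skewness(mean, median, sd)
    print(f"mean={mean}, median={median}, SD={sd}: SK={sk:.2f} ({describe(sk)})")
```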

Kurtosis
If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or short and broad?

Pearson (1905) described kurtosis in comparison with the normal distribution and used the terms leptokurtic, platykurtic and mesokurtic to describe different distributions.

If the distribution has more values in the tails and a sharp peak, it is leptokurtic. A leptokurtic curve, like two leaping kangaroos, has long tails and is peaked in the center.

If there are fewer values in the tails, more in the shoulders and fewer in the peak, it is platykurtic. A platykurtic curve, like a platypus, has a short tail and is flat-topped.

A mesokurtic distribution has the same kurtosis as the normal distribution.

HW: Textbook, Ex: 5 and 6, pp.92-93 (Calculate skewness and interpret)



23

Solved Case Study

Review - Lecture 1 to Lecture 8

The following data are obtained on a variable X, the cpu time in seconds required to run a program using a statistical package:

6.2 5.8 4.6 4.9 7.1 5.2
8.1 0.2 3.4 4.5 8.0 7.9
6.1 5.6 5.5 3.1 6.8 4.6
3.8 2.6 4.5 4.6 7.7 3.8
4.1 6.1 4.1 4.4 5.2 1.5

a) Construct a stem-leaf diagram for these data. Interpret this table.

b) Break these data into 6 classes and construct a frequency, relative frequency and cumulative frequency table, and interpret the tables using non-technical language.

c) Using the frequency table, calculate the sample mean and sample standard deviation and interpret these two measures.

d) Construct a histogram. Also construct a cumulative frequency ogive and use this ogive to approximate the 50th percentile, the first quartile and the third quartile.

e) Calculate the sample skewness and interpret.

Solution: Denote X - cpu time in seconds required to run a program using a statistical package.

Please note that answers to the above questions can vary; please check your work very carefully.

a) Table 1: Stem-and-Leaf Display: X, n = 30

Leaf Unit = 0.10

Stem  Leaf
0     2
1     5
2     6
3     1488
4     114556669
5     22568
6     1128
7     179
8     01

Interpretation: Table 1 shows that 9 programs need 4.1 to 4.9 seconds to run, 5 programs need 5.2 to 5.8 seconds, and so on.

b) Table 2: Frequency Distribution of X

X (Classes)  Frequency (f_i)
0.2-1.5      1
1.5-2.8      1
2.8-4.1      6
4.1-5.4      10
5.4-6.7      6
6.7-8.1      6
Total        n = 30

Table 3: Relative Frequency Distribution of X

X (Classes)  Relative Frequency (rf_i)
0.2-1.5      0.03
1.5-2.8      0.03
2.8-4.1      0.20
4.1-5.4      0.33
5.4-6.7      0.20
6.7-8.1      0.20
Total        $\sum_{i=1}^{6} rf_i = 1$

Table 4: Cumulative Frequency Distribution of X

X (Classes)  Cumulative Frequency (cf_i or F_i)
0.2-1.5      1
1.5-2.8      2
2.8-4.1      8
4.1-5.4      18
5.4-6.7      24
6.7-8.1      30

Interpretation: Table 2 shows that 10 programs need 4.1 to 5.4 seconds, Table 3 shows that 33 percent of programs need 4.1 to 5.4 seconds, Table 4 shows that 18 programs need at most 5.4 seconds, and so on.

c) Descriptive Statistics: X

Variable  n   Minimum  Maximum
X         30  0.20     8.1

Variable  Mean  Median (Q2)  StDev  Q1    Q3
X         5.0   4.75         1.859  4.02  6.12

Interpretation:

Mean = 5.0 seconds means that, on average, a program needs about 5 seconds to run.

Median = 4.75 seconds means that 50% of the programs need less than 4.75 seconds to run and the other 50% need more than 4.75 seconds.

Standard deviation = 1.859 seconds means that run times typically deviate from the 5-second mean by about 1.86 seconds; not every program takes 5 seconds to run.

Q1 = 4.02 seconds means that the first 25% of the programs need less than 4.02 seconds to run and the other 75% need more than 4.02 seconds.

Q3 = 6.12 seconds means that the first 75% of the programs need less than 6.12 seconds to run and the other 25% need more than 6.12 seconds.
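The mean, median and standard deviation reported in part (c) can be cross-checked against the 30 raw cpu times using Python's standard `statistics` module. This is a sketch for checking only; the quartiles are left out because packages use different quartile definitions, so Q1 and Q3 may not match the reported 4.02 and 6.12 exactly.

```python
import statistics

# cpu times (seconds) from the case study, row by row.
CPU_TIMES = [
    6.2, 5.8, 4.6, 4.9, 7.1, 5.2,
    8.1, 0.2, 3.4, 4.5, 8.0, 7.9,
    6.1, 5.6, 5.5, 3.1, 6.8, 4.6,
    3.8, 2.6, 4.5, 4.6, 7.7, 3.8,
    4.1, 6.1, 4.1, 4.4, 5.2, 1.5,
]

mean = statistics.mean(CPU_TIMES)
median = statistics.median(CPU_TIMES)
sd = statistics.stdev(CPU_TIMES)       # sample SD, n - 1 divisor
sk = 3 * (mean - median) / sd          # Pearson's coefficient of skewness

print(f"n={len(CPU_TIMES)} mean={mean:.2f} median={median:.2f} "
      f"sd={sd:.3f} SK={sk:.4f}")
```

The same skewness value is used again in part (e) of the case study.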

FormulaeMean - Ungrouped data:

Formula: x = ∑=

n

1i

ixn

1, where ∑ is the summation sign.

Mean - Grouped data (WEIGHTED MEAN):

Formula: x = ∑=k

1iiimf n

1

, where f i are the frequency of the ith class and m i are the midpoints of the ith

class, midpoint = (LCL+UCL) of the ith class/2 and k are the total no. of classes.

Median for Ungrouped Data:

To obtain the median of an ungrouped data set, arrange data in ascending order (smallest value to largest

value). n is odd, median is at position (n+1)2 in the ordered list. n is even, median is the mean at positions

(n/2) and (n/2)+1 in the ordered list.

Median for Grouped Data:

Formula: cf

FlM

e

e

e

M

1M2n

Me

⎥⎥⎦

⎤

⎢⎢⎣

⎡ −+= −

, whereeMl is the LCL of the median class, 1Me

F − is the cf below the

median class,eMf is the frequency of the median class and c is the size of the median class.

Quartiles for Ungrouped Data:

To calculate percentile for a small set of data, arrange the data in ascending order.

Compute an index i, where i = (p/100)n, p is the percentile of interest and n is the total no. of

observations.

If i is not an integer, round up. The next integer greater than i denote the position of the pth percentile.



27

If i is an integer, the pth percentile is the mean of the value of the positions i and i + 1.

Quartiles for Grouped Data:

Formula for ith percentile: $p_i = l_{p_i} + \left[\frac{\frac{i\,n}{100} - F_{p_i - 1}}{f_{p_i}}\right] c$, i = 1, 2, ..., 99, where $l_{p_i}$ is the LCL of the ith percentile class, $F_{p_i - 1}$ is the cf below the ith percentile class, $f_{p_i}$ is the frequency of the ith percentile class and c is the size of the ith percentile class. For application, refer to the calculation of the median for grouped data.

Standard Deviation for Ungrouped Data:

Formula: $\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $x_i$ are the raw data and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Standard Deviation for Grouped Data:

Formula: $\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}$, where $f_i$ is the frequency of the ith class, $m_i$ is the mid-point of the ith class interval, k is the total no. of classes and $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i m_i$.
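As a rough illustration of the grouped-data formulas for the mean and standard deviation, the sketch below computes both from a frequency table. The function name `grouped_mean_sd` and the three-class table are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

def grouped_mean_sd(classes):
    """classes: list of (LCL, UCL, frequency) tuples for each class."""
    midpoints = [(lcl + ucl) / 2 for lcl, ucl, _ in classes]   # m_i = (LCL + UCL)/2
    freqs = [f for _, _, f in classes]                          # f_i
    n = sum(freqs)                                              # total no. of observations
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
    return mean, sqrt(variance)

# Hypothetical frequency table (not from the notes)
table = [(0, 10, 2), (10, 20, 3), (20, 30, 5)]
mean, sd = grouped_mean_sd(table)
print(mean, round(sd, 4))
```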

d) Histogram of X

Interpretation: See Table 2 message.

Ogive of X:

Do by yourself

[Figure: histogram of X. Horizontal axis: cpu time, with classes 0.2-1.5, 1.5-2.8, 2.8-4.1, 4.1-5.4, 5.4-6.7, 6.7-8.1. Vertical axis: # of programs.]

Interpretation: See Table 4 message.

e) Sample skewness

skewness(X) = 0.4034

Formula - $S_k = \frac{3(\bar{x} - M_e)}{\sigma}$, where $-3 \le S_k \le +3$, i.e. skewness can range from -3 to +3.

Interpretation of $S_k$

• A value near -3, such as -2.78, indicates considerable negative skewness.

• A value such as 1.57 indicates moderate positive skewness.

• A value of 0 indicates the distribution is symmetrical and there is no skewness present.

Interpretation: $S_k$ = 0.4034 indicates slight positive skewness, i.e. a few programs need considerably more than 5 seconds to run.
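The skewness value can be checked from the summary table (mean 5.0, median 4.75, SD 1.859). The short sketch below does this; the function name `pearson_skewness` is an assumption made for the example.

```python
def pearson_skewness(mean, median, sd):
    """Skewness coefficient S_k = 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd

# Summary statistics of X reported in the notes
sk = pearson_skewness(5.0, 4.75, 1.859)
print(round(sk, 4))
```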

More case studies:





Lecture 9

Chapter 4: Introduction to probability

Some Important Definitions:

Random experiment, random variable, sample space, events (simple event, compound event), counting

rules, combinations, permutations, tree diagram, probability defined on events

Introduction

We have finished the first important part of the course (known as data summary) and have even sat for CT1. Now we are moving to the second very important part of the course, namely “Chance Theory”, also known as “Probability Theory”. We use the word “chance” or “probability” frequently in real life. For example:

(i) What is the chance of getting grade A for the course MAT 211?

(ii) What is the chance that sales will decrease if we increase the price of a commodity?

(iii) What is the chance that a new investment will be profitable?

and so many other situations that it is not possible to record them all.

To understand the idea, consider a situation. Suppose we ask the following question to 3 students:

What is the chance of getting grade A for the course MAT 211?

Suppose they answer as follows:

Student 1: Chance is 95%

Student 2: Chance is 100%

Student 3: Chance is 10%

Let’s explain their predicted values under chance theory. What we can observe:

Student 1 is 95% confident of getting grade A. That means past experience tells us that out of 100 students, 95 students got grade A.

Student 2 is 100% confident of getting grade A. All the students got grade A.

Student 3 is only 10% confident (less confident) of getting grade A. Only 10 students out of 100 students got grade A.

How is this calculated?

Recall the relative frequency method, where relative frequency = frequency/n, and apply it. Suppose n = 100 and the number of students who got grade A is 95 (frequency); then the probability is 95/100 = 0.95.

We will use this formula to calculate probability.

To calculate the probability of an event (in the previous example, one possible event is grade A), we have to be very familiar with the following terms:




Random experiment, random variable, sample space, events

Random experiment - It is the process that generates all the possible events. Events are also known as outcomes.

Random variable - It is denoted by r.v. It describes the quantity we are interested in among all the possible outcomes. In the previous example, the grade obtained is the random variable and grade A is the value of interest.

Note that the r.v. will vary from experiment to experiment.

Sample space - It is very important. Without it, it is not possible to calculate the chance of an event.

It is denoted by S. It is the set of all possible outcomes of a random experiment (recall: a set is a collection of objects). Sometimes it is not easy to list (note that to get a good idea about S, we have to practice a lot!).

Several methods will be used to find S. These are:

Our knowledge, tree diagram (a wonderful method) and counting rules (permutation, combination). We will use the combination approach most of the time; however, the permutation approach will also be used.

Events - An event is denoted by E. It is a possible outcome of our random experiment.

Formula to compute the probability of an event: It is denoted by P(E) and calculated as

P(E) = (number of outcomes in E)/(number of outcomes in S), 0 ≤ P(E) ≤ 1

If

P(E) = 0, there is no chance that E will occur (impossible event).

P(E) = 0.5, there is a 50% chance that E will occur.

P(E) = 1.0, there is a 100% chance that E will occur (sure event).

Recall the grade example

P(grade A) = (number of outcomes in E)/(number of outcomes in S) = 95/100 = 0.95, where S is the set of 100 students and E is the set of students who got grade A.

Summary: The chance that a randomly selected student will get grade A is 95%.

Some random experiments and S (Text, p.143)

Random experiment 1: Toss a fair coin. S = {H, T}, H - head and T - tail. If E = head, then P(H) = ½ = 0.5, and similarly P(T) = ½ = 0.5.

Random experiment 2: Select a part for inspection, S = {defective, non-defective}.

Random experiment 3: Conduct a sales call, S = {purchase, no purchase}.

Random experiment 4: Roll a fair die, S = {1, 2, 3, 4, 5, 6}.

Random experiment 5: Play a football game, S = {win, lose, tie}.

Note that in the sample space S, the possible events are read as “or”, not “and”. It is impossible to get both H and T in one toss. Win and lose in one game is also impossible (realize it!).

Important concepts

mutually exclusive events, equally likely events, tree diagram, combination

Mutually exclusive events - Two events are mutually exclusive if they cannot occur simultaneously. Toss a coin: H and T cannot both occur in a single toss. It is written as P(H∩T) = 0.

If P(H∩T) ≠ 0, the events are not mutually exclusive. Toss two coins (or one coin two times): H and T can both occur in this random experiment. For example, P(H∩T) = 2/4 = 0.50, where S = {HH, HT, TH, TT}.

Equally likely events - Two events are equally likely if they have an equal chance of occurring. Toss a coin: P(H) = P(T) = 0.5.

Tree diagram - It is a technique to summarize all possible events of a random experiment graphically.

Combination - It is a formula to count all possible events of a random experiment.

Counting rules - Two rules: Combination and Permutation

Combination - It allows one to count the number of experimental outcomes when the experiment involves selecting n objects from a set of N objects.

For example, if we want to select 5 students from a group of 10 students, then

$C^{10}_{5} = \frac{10!}{5!\,(10-5)!} = 252$ possible ways students can be selected; here the number of outcomes in S is 252.

Permutation - It allows one to count the number of experimental outcomes when n objects are to be selected from a set of N objects, where the order of selection is important.

For example, if we want to select 5 students from a group of 10 students (where order is important), then

$P^{10}_{5} = \frac{10!}{(10-5)!} = 30240$ possible ways students can be selected; here the number of outcomes in S is 30240.

Ex: 1. In how many ways can three items be selected from a group of six items? Use the letters A, B, C, D, E and F to identify the items and list each of the different combinations of three items.

Solution: $C^{6}_{3} = \frac{6!}{3!\,(6-3)!} = 20$ possible ways letters can be selected. Some examples: ABC, ABD, ABE, ABF, ……. DEF.

Ex: 2. How many permutations of three items can be selected from a group of six items? Use the letters A, B, C, D, E and F to identify the items and list each of the different permutations of items B, D and F.

Solution: $P^{6}_{3} = \frac{6!}{(6-3)!} = 120$ possible ways letters can be selected.

Different permutations of items B, D and F: BDF, BFD, DBF, DFB, FDB, FBD, i.e. 6 outcomes.
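The counts in the combination and permutation examples can be verified with Python's standard library (`math.comb`, `math.perm` and `itertools`, available from Python 3.8). This is only a checking sketch; the variable names are chosen for the example.

```python
import math
from itertools import combinations, permutations

# Selecting 5 students from a group of 10
n_comb_10_5 = math.comb(10, 5)   # order not important
n_perm_10_5 = math.perm(10, 5)   # order important

# Ex 1: all combinations of three letters from A..F
combos = ["".join(c) for c in combinations("ABCDEF", 3)]

# Ex 2: number of permutations of three letters from six,
# and the permutations of the particular items B, D and F
n_perm_6_3 = math.perm(6, 3)
perms_bdf = ["".join(p) for p in permutations("BDF")]

print(n_comb_10_5, n_perm_10_5, len(combos), n_perm_6_3, perms_bdf)
```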

Ex: 3. An experiment with three outcomes has been repeated 50 times and it was learned that E1 occurred 20 times, E2 occurred 13 times and E3 occurred 17 times. Assign probabilities to the outcomes.

Solution: S = {E1, E2, E3}. Here P(E1) = 20/50 = 0.40, P(E2) = 13/50 = 0.26, P(E3) = 17/50 = 0.34.

P(S) = P(E1) + P(E2) + P(E3) = 0.40 + 0.26 + 0.34 = 1.0

Ex: 4. A decision maker subjectively assigned the following probabilities to the four outcomes of an experiment: P(E1) = 0.10, P(E2) = 0.15, P(E3) = 0.40 and P(E4) = 0.20. Are these probability assignments valid? Explain.

Solution: S = {E1, E2, E3, E4}. Here P(E1) = 0.10, P(E2) = 0.15, P(E3) = 0.40, P(E4) = 0.20.

P(S) = P(E1) + P(E2) + P(E3) + P(E4) = 0.10 + 0.15 + 0.40 + 0.20 = 0.85 < 1.0. Thus, the probability assignments are invalid because P(S) ≠ 1.

The above two problems tell us that for any random experiment, P(S) = 1.

Ex: 5. Suppose that a manager of a large apartment complex provides the following probability estimates about the number of vacancies that will exist next month.

Vacancies:   0    1    2    3    4    5

Probability: 0.05 0.15 0.35 0.25 0.10 0.10

Provide the probability of each of the following events:

a. No vacancies

b. At least four vacancies

c. Two or fewer vacancies

Solution: S = {0, 1, 2, 3, 4, 5}

a. P(0) = 0.05

b. P(At least four vacancies) = P(4) + P(5) = 0.10 + 0.10 = 0.20.

c. P(Two or fewer vacancies) = P(0) + P(1) + P(2) = 0.05 + 0.15 + 0.35 = 0.55.
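The vacancy probabilities in Ex 5 can be added up programmatically. The sketch below stores the table as a dictionary; the helper name `prob` is hypothetical and used only for this illustration.

```python
# Probability estimates for the number of vacancies (from Ex 5)
vacancy_probs = {0: 0.05, 1: 0.15, 2: 0.35, 3: 0.25, 4: 0.10, 5: 0.10}

def prob(event):
    """Sum the probabilities of all outcomes x for which event(x) is True."""
    return sum(p for x, p in vacancy_probs.items() if event(x))

total = prob(lambda x: True)            # P(S), should be 1 for a valid assignment
p_none = prob(lambda x: x == 0)         # a. no vacancies
p_at_least_4 = prob(lambda x: x >= 4)   # b. at least four vacancies
p_at_most_2 = prob(lambda x: x <= 2)    # c. two or fewer vacancies

print(round(total, 2), p_none, round(p_at_least_4, 2), round(p_at_most_2, 2))
```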

Ex: 6. The National Sporting Goods Association conducted a survey of persons 7 years of age or older about participation in sports activities. The total population in this age group was reported at 248.5 million, with 120.9 million male and 127.6 million female. The number of participants (in millions) for the top five sports activities appears here.

Participants (millions)

Activity                     Male   Female

Bicycle riding               22.2   21.0

Camping                      25.6   24.3

Exercise walking             28.7   57.7

Exercising with equipment    20.4   24.4

Swimming                     26.4   34.4

a. For a randomly selected female, estimate the probability of participation in each of the sports activities.

b. For a randomly selected male, estimate the probability of participation in each of the sports activities.

c. For a randomly selected person, what is the probability the person participates in exercise walking?

d. Suppose you just happen to see an exercise walker going by. What is the probability the walker is a woman? What is the probability the walker is a man?

Solution: Let Br - Bicycle riding, C - Camping, EW - Exercise walking, EE - Exercising with equipment and Sw - Swimming.

a. The selected person is known to be female, so each activity is compared with the 127.6 million females: P(Br/F) = 21.0/127.6 = 0.16, P(C/F) = 24.3/127.6 = 0.19, P(EW/F) = 57.7/127.6 = 0.45, P(EE/F) = 24.4/127.6 = 0.19, P(Sw/F) = 34.4/127.6 = 0.27.

b. The selected person is known to be male, so each activity is compared with the 120.9 million males: P(Br/M) = 22.2/120.9 = 0.18, P(C/M) = 25.6/120.9 = 0.21, P(EW/M) = 28.7/120.9 = 0.24, P(EE/M) = 20.4/120.9 = 0.17, P(Sw/M) = 26.4/120.9 = 0.22.

c. The person can be male or female. Thus, P(EW) = P(Male EW) + P(Female EW) = (28.7/248.5) + (57.7/248.5) = 86.4/248.5 = 0.35 = 35%.

d. We have to consider the exercise walker population. Thus, P(woman/EW) = 57.7/(28.7+57.7) = 57.7/86.4 = 0.67 = 67%. P(man/EW) = 28.7/(28.7+57.7) = 28.7/86.4 = 0.33 = 33%.

HW: Textbook

Ex: 1-9, pp.158-159

Ex: 14-21, pp.162-164




Lecture 10

Basic relationships of probability (addition law, complement law, conditional law, multiplication law)

Addition Law

Suppose we have two events A and B (A, B ⊂ S). The chance that A or B occurs is written as

• P(A∪B) = P(A) + P(B) - P(AB), if the two events are not mutually exclusive, where P(AB) = P(A∩B) is the chance that both occur.

• P(A∪B) = P(A) + P(B), if the two events are mutually exclusive.

Keywords: Or, at least

Problem 1

Consider the case of a small assembly plant with 50 employees. Suppose that, on occasion, some of the workers fail to meet the performance standards by completing work late or assembling a defective product. At the end of a performance evaluation period, the production manager found that 5 of the 50 workers completed work late, 6 of the 50 workers assembled a defective product and 2 of the 50 workers both completed work late and assembled a defective product. If one employee is selected randomly, what is the probability that the worker completed work late or assembled a defective product?

Solution: Let L - work is completed late, D - assembled product is defective. Total employees in S = 50.

We have to find P(L∪D). We know P(L∪D) = P(L) + P(D) - P(LD) = (5/50) + (6/50) - (2/50) = 0.10 + 0.12 - 0.04 = 0.18 = 18%.

The chance is 18% that the worker completed work late or assembled a defective product.
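The addition-law calculation for Problem 1 can be reproduced with exact fractions. This is a small sketch; the variable names are assumptions made for the example.

```python
from fractions import Fraction

n_workers = 50   # total employees
late = 5         # completed work late (L)
defective = 6    # assembled a defective product (D)
both = 2         # both late and defective (L and D)

p_l = Fraction(late, n_workers)
p_d = Fraction(defective, n_workers)
p_ld = Fraction(both, n_workers)

# Addition law for events that are not mutually exclusive
p_l_or_d = p_l + p_d - p_ld
print(p_l_or_d, float(p_l_or_d))
```

Subtracting P(LD) avoids counting the 2 workers who are in both groups twice.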

Problem 2

A telephone survey to determine viewer response to a new television show obtained the following data.

Rating:    Poor  Below average  Average  Above average  Excellent

Frequency: 4     8              11       14             13

Suppose a viewer is selected randomly.

(i) What is the chance that the viewer will rate the new show as average or better?

(ii) What is the chance that the viewer will rate the new show as average or worse?

Solution: Total possible viewers in S = 50.

(i) P(average or better) = (11/50) + (14/50) + (13/50) = 38/50 = 0.76 = 76%

The chance that the viewer will rate the new show as average or better is 76%.

(ii) P(average or worse) = (11/50) + (8/50) + (4/50) = 23/50 = 0.46 = 46%

The chance that the viewer will rate the new show as average or worse is 46%.




Complement law (a very useful law in many cases!)

Suppose we have one event A; then the chance of not getting event A is defined as

P(Ac) = 1 - P(A), A ⊂ S,

Keyword: not

Recall Problem 1

(i) What is the chance that the randomly selected worker did not complete work late?

(ii) If one employee is selected randomly, what is the probability that the worker neither completed work late nor assembled a defective product?

Solution: (i) P(Lc) = 1 - P(L) = 1 - 0.10 = 0.90 = 90%

The chance is 90% that the randomly selected worker did not complete work late.

(ii) P((L∪D)c) = 1 - P(L∪D) = 1 - 0.18 = 0.82 = 82%

The chance is 82% that the randomly selected worker neither completed work late nor assembled a defective product.

Conditional law - Keywords: If, given, known, conditional

Suppose we have two events A and B (A, B ⊂ S). The chance of getting A when B is known (or B when A is known) is defined as

• P(A/B) = P(AB)/P(B), P(B) ≠ 0

• P(B/A) = P(AB)/P(A), P(A) ≠ 0

To understand the concept, consider the following situation:

Roll a die. What is the chance that the die will show

(i) 2

(ii) Even number

(iii) 2 or even number

(iv) Not 2

(v) 2 given that the die shows an even number

(vi) 2 given that the die shows an odd number

Solution: S = {1, 2, 3, 4, 5, 6}, so S has 6 outcomes. (i) P(2) = 1/6 (ii) P(Even number) = 3/6 (iii) P(2∪even number) = (1/6) + (3/6) - (1/6) = 3/6

(iv) P(2c) = 1 - P(2) = 5/6 (v) P(2/even number) = 1/3 (vi) P(2/odd number) = 0

Observe carefully that (i) to (iv) are unconditional probabilities, but (v) and (vi) are conditional probabilities. To calculate (i) to (iv) we used the unconditional sample space, whereas to calculate (v) and (vi) we used the conditional sample space, restricted by the given condition to the even numbers {2, 4, 6} or the odd numbers {1, 3, 5}.
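The die example can be worked through with sets, which makes the difference between the unconditional and the conditional sample space explicit. The helper names `prob` and `cond` are hypothetical and introduced only for this sketch.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die
two, even, odd = {2}, {2, 4, 6}, {1, 3, 5}

def prob(event):
    """P(E) = (number of outcomes in E) / (number of outcomes in S)."""
    return Fraction(len(event & S), len(S))

def cond(a, b):
    """Conditional law: P(A/B) = P(AB) / P(B), with P(B) != 0."""
    return prob(a & b) / prob(b)

p_two = prob(two)                  # (i)
p_even = prob(even)                # (ii)
p_two_or_even = prob(two | even)   # (iii) addition law, via the union of the sets
p_not_two = 1 - prob(two)          # (iv) complement law
p_two_given_even = cond(two, even) # (v)
p_two_given_odd = cond(two, odd)   # (vi)

print(p_two, p_even, p_two_or_even, p_not_two, p_two_given_even, p_two_given_odd)
```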

Multiplication law

Suppose we have two events A and B (A, B ⊂ S). The chance of getting A and B is defined as

• P(AB) = P(A/B)P(B) if events A and B are dependent

• P(AB) = P(A)P(B) if events A and B are independent

Keywords: both, joint, altogether, and

Problem

Consider the situation of the promotion status of male and female officers of a major metropolitan police force in the eastern United States. The force consists of 1200 officers, 960 men and 240 women. Over the past two years 324 officers on the police force received promotions. The specific breakdown of promotions for male and female officers is shown in the following Table.

Table: Promotion status of police officers over the past two years

               Men   Women   Total

Promoted       288   36      324

Not Promoted   672   204     876

Total          960   240     1200

a) Find the joint probability table.

b) Find the marginal probabilities.

c) Suppose a male officer is selected randomly; what is the chance that the officer will be promoted?

d) Suppose a female officer is selected randomly; what is the chance that the officer will not be promoted?

e) Suppose an officer who got a promotion is selected randomly; what is the chance that the officer will be male?

f) Suppose an officer who did not get a promotion is selected randomly; what is the chance that the officer will be female?

Solution: Here S consists of 1200 officers.

a) Joint probability table for promotion status

               Men    Women   Total

Promoted       0.24   0.03    0.27

Not Promoted   0.56   0.17    0.73

Total          0.80   0.20    1.00

b) P(Men) = 0.80, P(Women) = 0.20, P(Promoted) = 0.27, P(Not Promoted) = 0.73; these are known as marginal probabilities.

c) P(Promoted/Men) = 288/960 = 0.30.

d) P(Not Promoted/Female) = 204/240 = 0.85.

e) P(Male/Promoted) = 288/324 = 0.89.

f) P(Female/Not Promoted) = 204/876 = 0.23.
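The joint, marginal and conditional probabilities for the police promotion table can be computed from the raw counts. The sketch below is one possible layout; the dictionary structure and the helper name `marginal` are assumptions made for the example.

```python
# Counts from the promotion table: (status, gender) -> number of officers
counts = {
    ("Promoted", "Men"): 288, ("Promoted", "Women"): 36,
    ("Not Promoted", "Men"): 672, ("Not Promoted", "Women"): 204,
}
total = sum(counts.values())                      # 1200 officers

# a) joint probabilities: each cell divided by the grand total
joint = {key: n / total for key, n in counts.items()}

def marginal(label, axis):
    """b) marginal probability: axis 0 = promotion status, axis 1 = gender."""
    return sum(p for key, p in joint.items() if key[axis] == label)

# c) to f) conditional law: P(A/B) = P(AB) / P(B)
p_promoted_given_men = joint[("Promoted", "Men")] / marginal("Men", 1)
p_not_promoted_given_women = joint[("Not Promoted", "Women")] / marginal("Women", 1)
p_men_given_promoted = joint[("Promoted", "Men")] / marginal("Promoted", 0)
p_women_given_not_promoted = joint[("Not Promoted", "Women")] / marginal("Not Promoted", 0)

print(round(p_promoted_given_men, 2), round(p_not_promoted_given_women, 2),
      round(p_men_given_promoted, 2), round(p_women_given_not_promoted, 2))
```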

HW: Text, Ex: 22-27, pp.169-170 and Ex: 32-35, pp.176-177




Lecture 12

Mid-term test - 20%

Requirements:

1) Must have a two-variable scientific calculator (no alternatives).

2) Mobile phones must be switched off during exam time.

Format of questions

1) Lecture 8-Lecture 10 solved and HW problems

2) Related Text book questions

Good Luck



Lecture 13

Chapter 6

Normal Distribution

Discovered by Abraham de Moivre, a French mathematician, in 1733.

The form or shape can be given by the following mathematical equation. The shape depends upon the two parameters, mean (μ) and standard deviation (σ), as follows:

$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}, \quad -\infty < X < \infty$

where μ = mean of the normal variable, σ = SD of the normal variable (μ and σ determine the location and shape of the normal probability distribution) and π and e are mathematical constants, whose values are approximately 3.14 and 2.718 respectively.

By notation, X ~ N(μ, σ) is read as X is normally distributed with mean μ and standard deviation σ.

It is true that once μ and σ are specified, the normal curve is completely determined.

Standard Normal Probability Distribution

A random variable that has a normal distribution with a mean of zero and a standard deviation of one is said to have a standard normal probability distribution.

The letter Z is commonly used to designate this particular normal random variable.

For the standard normal probability distribution, areas under the normal curve have been computed and are available in tables that can be used in computing probabilities.

The final page table is an example of such a table.


Computing Probabilities for Any Normal Probability Distribution

The reason for discussing the standard normal distribution so extensively is that probabilities for all normal distributions are computed by using the standard normal distribution. That is, when we have a normal distribution with any mean and standard deviation, we answer probability questions about the distribution by first converting to the standard normal distribution using Z = (X - μ)/σ. Then we can use the Table and the appropriate Z values to find the desired probabilities.

Problem

According to a survey, subscribers to The Wall Street Journal Interactive Edition spend an average of 27 hours per week using the computer at work. Assume the normal distribution applies and that the standard deviation is 8 hours.

a) What is the probability a randomly selected subscriber spends less than 11 hours using the computer at work?

b) What percentage of the subscribers spends more than 40 hours per week using the computer at work?

c) A person is classified as a heavy user if he or she is in the upper 20% in terms of hours of usage. How many hours must a subscriber use the computer in order to be classified as a heavy user?

Solution

Denote X = No. of hours per week using the computer at work, X ~ N(27, SD = 8)

a) Need to find p(X<11) = p(Z < (11-27)/8) = p(Z<-2) = 0.0228

b) p(X>40) = p(Z > (40-27)/8) = p(Z>1.62) = 0.0526, i.e. 5.26%.

c) Need to find X when the upper-tail probability is p = 0.20. Thus, X = μ + Zσ = 27 + 8Z. When p = 0.20, the Table gives Z = 0.84, so X = 27 + 8(0.84) = 33.72 hours. A subscriber must use the computer about 33.7 hours or more per week to be classified as a heavy user.
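As a cross-check on the table lookups, the same three answers can be obtained with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch, not the method used in the course, and the results differ slightly from the table-based answers because no rounding of Z is involved.

```python
from statistics import NormalDist

# X = hours per week using the computer at work, X ~ N(mean 27, SD 8)
X = NormalDist(mu=27, sigma=8)

p_less_11 = X.cdf(11)            # a) P(X < 11), i.e. P(Z < -2)
p_more_40 = 1 - X.cdf(40)        # b) P(X > 40), here Z = 1.625 is not rounded to 1.62
heavy_cutoff = X.inv_cdf(0.80)   # c) value with 20% of the area above it

print(round(p_less_11, 4), round(p_more_40, 4), round(heavy_cutoff, 2))
```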

HW: Ex: 10-25, pp.248-250




Lecture 15

Class Test 2 - 15%

Requirements:

1) Must have a two-variable scientific calculator (no alternatives).

2) Mobile phones must be switched off during exam time.

Format of questions

1) Lecture 13-Lecture 14 solved and HW problems

2) Related Text book questions



Lecture 16

Chapter 8

Random Sampling

Our step here is how to collect random samples from the target population and how to summarize the collected raw data in effective ways so that general people can understand it clearly.

Generally, there are two ways the required information may be obtained:

a) Census survey and

b) Sample survey.

The total count of all units of the population for a certain characteristic is known as complete enumeration, also termed census survey.

The money, manpower and time required for carrying out complete enumeration will generally be large, and there are many situations where complete enumeration is not possible. Thus, sample enumeration or sample survey is used to select a random part of the population using the table of random numbers (e.g. see Text, p.269), which has been constructed from the digits 0, 1, …, 9.

The method of drawing a random sample consists of the following steps:

a) Identify the N units in the population with the numbers 1 to N.

b) Select at random any page of the random number table and pick up the numbers in any row, column or diagonal at random.

c) The population units corresponding to the numbers in step b) constitute the random sample.


To illustrate how to select a sample using the table of random numbers, consider the following problem:

Suppose the monthly pocket money (TK/-) given to each of 50 School of Business students at IUB is as follows:

Pocket Money (TK/-)

1100 1500 8900 4500 2700 3800 3000 6700 2600 3600

7500 7900 4600 2000 2400 1300 8500 6500 6200 5800

6000 6800 9200 3800 1200 8000 7100 8600 8700 6300

7600 7700 2600 7800 2000 9000 7300 8400 1700 2500

5700 5300 5500 1700 3700 5400 2400 4000 1200 7300

To draw a random sample of size 10 from a population of size 50, first of all, we need to identify the 50 units of the population with the numbers 1 to 50 (shown in parentheses).

Pocket Money (TK/-)

1100 (1)  1500 (2)  8900 (3)  4500 (4)  2700 (5)  3800 (6)  3000 (7)  6700 (8)  2600 (9)  3600 (10)

7500 (11) 7900 (12) 4600 (13) 2000 (14) 2400 (15) 1300 (16) 8500 (17) 6500 (18) 6200 (19) 5800 (20)

6000 (21) 6800 (22) 9200 (23) 3800 (24) 1200 (25) 8000 (26) 7100 (27) 8600 (28) 8700 (29) 6300 (30)

7600 (31) 7700 (32) 2600 (33) 7800 (34) 2000 (35) 9000 (36) 7300 (37) 8400 (38) 1700 (39) 2500 (40)

5700 (41) 5300 (42) 5500 (43) 1700 (44) 3700 (45) 5400 (46) 2400 (47) 4000 (48) 1200 (49) 7300 (50)

Then, in the given random number table, start with the first number and move row-wise (or column-wise or diagonally), picking out the numbers in pairs, one by one, and ignoring those numbers which are greater than 50, until a selection of 10 numbers is made.

# Selected row-wise sample numbers: 27, 15, 45, 11, 02, 14, 18, 07, 39, 31

# Selected row-wise monthly pocket money (TK/-) of 10 students out of 50: 7100, 2400, 3700, 7500, 1500, 2000, 6500, 3000, 1700, 7600
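The selection can also be sketched in Python. The first part looks up the sample numbers chosen from the random number table; the second part draws a fresh sample with a pseudo-random generator, which is a substitute for the printed table and not the method described in the text. The seed value is arbitrary and is an assumption made only so that the run is repeatable.

```python
import random

# Monthly pocket money (TK/-) of the 50 students, in serial order 1..50
population = [
    1100, 1500, 8900, 4500, 2700, 3800, 3000, 6700, 2600, 3600,
    7500, 7900, 4600, 2000, 2400, 1300, 8500, 6500, 6200, 5800,
    6000, 6800, 9200, 3800, 1200, 8000, 7100, 8600, 8700, 6300,
    7600, 7700, 2600, 7800, 2000, 9000, 7300, 8400, 1700, 2500,
    5700, 5300, 5500, 1700, 3700, 5400, 2400, 4000, 1200, 7300,
]

# Sample numbers picked row-wise from the random number table in the notes
notes_ids = [27, 15, 45, 11, 2, 14, 18, 7, 39, 31]
notes_sample = [population[i - 1] for i in notes_ids]   # serial numbers are 1-based

# Alternative: let a pseudo-random generator choose 10 distinct serial numbers
rng = random.Random(2015)                 # arbitrary seed (assumption)
random_ids = rng.sample(range(1, len(population) + 1), 10)
random_sample = [population[i - 1] for i in random_ids]

print(notes_sample)
print(random_ids, random_sample)
```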

HW:

Calculate mean and standard deviation of 10 students monthly pocket money (Use formula and

Scientific calculator)

Text, Ex: 3-8, pp.272-273




Lecture 17

Chapter 8: Interval estimation (Estimation of Parameters)

Aim

Be familiar with how to construct a confidence interval for a population parameter.

The sample statistic is calculated from the sample data and the population parameter is inferred (or estimated) from this sample statistic. In other words, statistics are calculated; parameters are estimated.

We find two types of estimates: point estimate and interval estimate.

Point Estimate - It is the single best value. For example, the mean and SD of total marks for a course of IUB students are point estimates because each is a single value.

Interval Estimate - Confidence Interval

The point estimate varies from sample to sample and is going to be different from the population parameter due to sampling error. There is no way to know how close it is to the actual parameter. For this reason, statisticians like to give an interval estimate (confidence interval), which is a range of values used to estimate the parameter.

A confidence interval is an interval estimate with a specific level of confidence. A level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is 1 - α, i.e. an area of 1 - α lies within the confidence interval.

Confidence interval for μ based on large samples

Problem

Suppose the total marks for a course of 35 randomly selected IUB students are normally distributed with mean 78 and SD 9. Find 90%, 95% and 99% confidence intervals for the population mean μ. Make a summary based on the findings.

Solution:

We are given X ~ N(78, 9), where X - total marks for a course of 35 randomly selected IUB students and n = 35.

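90% confidence interval for μ:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Here $\bar{x}$ = 78, σ = 9, n = 35, α = 1 - 0.90 = 0.10, α/2 = 0.05 and $z_{\alpha/2} = z_{0.05}$ = 1.65

Thus, $78 - 1.65\frac{9}{\sqrt{35}} \le \mu \le 78 + 1.65\frac{9}{\sqrt{35}}$

$78 - 2.5101 \le \mu \le 78 + 2.5101$

$75.4899 \le \mu \le 80.5101$

Summary: Based on our findings, we are 90% confident that the population mean ranges from 75.5 to 80.5.

95% confidence interval for μ:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Here $\bar{x}$ = 78, σ = 9, n = 35, α = 1 - 0.95 = 0.05, α/2 = 0.025 and $z_{\alpha/2} = z_{0.025}$ = 1.96

Thus, $78 - 1.96\frac{9}{\sqrt{35}} \le \mu \le 78 + 1.96\frac{9}{\sqrt{35}}$

$78 - 2.9817 \le \mu \le 78 + 2.9817$

$75.0183 \le \mu \le 80.9817$

Summary: Based on our findings, we are 95% confident that the population mean ranges from 75.02 to 80.98.

99% confidence interval for μ:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Here $\bar{x}$ = 78, σ = 9, n = 35, α = 1 - 0.99 = 0.01, α/2 = 0.005 and $z_{\alpha/2} = z_{0.005}$ = 2.58

Thus, $78 - 2.58\frac{9}{\sqrt{35}} \le \mu \le 78 + 2.58\frac{9}{\sqrt{35}}$

$78 - 3.9249 \le \mu \le 78 + 3.9249$

$74.0751 \le \mu \le 81.9249$

Summary: Based on our findings, we are 99% confident that the population mean ranges from 74.08 to 81.92.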
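The three large-sample intervals can be reproduced with a short function. In this sketch the critical value is taken from `statistics.NormalDist` rather than from the printed table, so the 90% and 99% limits differ slightly from the hand calculation (which uses the rounded values 1.65 and 2.58). The function name `z_interval` is an assumption made for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for mu: mean -/+ z_(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    margin = z * sigma / sqrt(n)              # margin of error
    return mean - margin, mean + margin

# Total marks example: sample mean 78, SD 9, n = 35
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(78, 9, 35, level)
    print(f"{level:.0%}: {low:.4f} to {high:.4f}")
```

The interval widens as the confidence level increases, which matches the pattern in the worked solution.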

Practice problems

1. In an effort to estimate the mean amount spent per customer for dinner at a major Atlanta restaurant, data were collected for a sample of 49 customers over a three-week period. Assume a population standard deviation of $2.50.

a. At a 95% confidence level, what is the margin of error?

b. If the sample mean is $22.6, what is the 95% confidence interval for the population mean?

Guideline:

X - Amount spent per customer for dinner at a major Atlanta restaurant. Here n = 49, SD = $2.50

a) Find the margin of error (ME) = $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, here α = 1 - 0.95 = 0.05, α/2 = 0.025 and $z_{\alpha/2} = z_{0.025}$ = 1.96

b) 95% confidence interval for the population mean: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

(Solve it)

2. Have a machine filling bags of popcorn; the weight of bags is known to be normally distributed with mean weight 14.1 oz and SD 0.3 oz. Take a sample of 40 bags; what’s a 95% confidence interval for the population mean μ?

Guideline:

a) X - weight of bags. Here n = 40, $\bar{x}$ = 14.1, σ = 0.3, α = 1 - 0.95 = 0.05, α/2 = 0.025 and $z_{\alpha/2} = z_{0.025}$ = 1.96

95% confidence interval for population mean μ: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

(Solve it)

3. The National Quality Research Center at the University of Michigan provides a quarterly measure of consumer opinions about products and services (The Wall Street Journal, February 18, 2013). A survey of 40 restaurants in the Fast Food/Pizza group showed a sample mean customer satisfaction index of 71. Past data indicate that the population standard deviation of the index has been relatively stable with σ = 5.

a. Using 95% confidence, determine the margin of error.

b. Determine the margin of error if 99% confidence is desired.

Guideline:

Follow the guidelines for questions 1 and 2.

4. The undergraduate GPA for students admitted to the top graduate business schools is 3.37. Assume this estimate is based on a sample of 120 students admitted to the top schools. Using past years' data, the population standard deviation can be assumed known with σ = 0.28. What is the 95% confidence interval estimate of the mean undergraduate GPA for students admitted to the top graduate business schools?

Guideline:

Follow the guidelines for questions 1 and 2.

HW: Text,




Confidence interval for μ based on small samples

When the sample size is less than 30, i.e. n < 30, and the population standard deviation is estimated by the sample standard deviation s, the standardized sample mean has a Student's t distribution with n-1 degrees of freedom. The Student's t distribution was created by William S. Gosset, who worked for the Guinness brewery in Dublin, Ireland. His employer wouldn't allow him to publish his work under his own name, so he used the pseudonym "Student".

The Student's t distribution is very similar to the standard normal distribution.

• It is symmetric about its mean.

• As the sample size increases, the t distribution approaches the normal distribution.

• It is bell shaped.

• The t-scores can be negative or positive, but the probabilities are always positive.

(1-α)100% confidence interval for μ:

$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$

Problem

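Suppose we are given the sample heights of 20 IUB students, where $\bar{x}$ = 67.3", SD = 3.6" and the distribution is symmetric. Develop a 95% confidence interval for μ and make a summary based on your findings.

Solution:

We are given $\bar{x}$ = 67.3 and s = 3.6, where X - heights of 20 randomly selected IUB students and n = 20.

95% confidence interval for μ:

$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$

Here $\bar{x}$ = 67.3, s = 3.6, n = 20, α = 1 - 0.95 = 0.05, α/2 = 0.025 and $t_{\alpha/2,\,n-1} = t_{0.025,\,19}$ = 2.093

Thus, $67.3 - 2.093\frac{3.6}{\sqrt{20}} \le \mu \le 67.3 + 2.093\frac{3.6}{\sqrt{20}}$

$67.3 - 1.6848 \le \mu \le 67.3 + 1.6848$

Summary: Based on our findings, we are 95% confident that the population mean ranges from 65.62 to 68.98.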
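The arithmetic of the small-sample interval can be checked as follows. The Python standard library has no t-distribution quantile function, so this sketch simply takes the critical value 2.093 from the t table as an input; the function name `t_interval` is an assumption made for the example.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """Confidence interval for mu: mean -/+ t_(alpha/2, n-1) * s / sqrt(n).

    t_crit must be looked up in a t table for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

# Heights example: mean 67.3, SD 3.6, n = 20, t(0.025, 19) = 2.093 from the table
low, high = t_interval(67.3, 3.6, 20, 2.093)
print(round(low, 2), round(high, 2))
```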



Practice problems

1. The International Air Transport Association surveys business travelers to develop quality ratings for transatlantic gateway airports. The maximum possible rating is 10. Suppose a simple random sample of 25 business travelers is selected and each traveler is asked to provide a rating for the Miami International Airport. The ratings obtained from the sample of 25 business travelers follow.

6, 4, 6, 8, 7, 7, 6, 3, 3, 8, 10, 4, 8, 7, 8, 7, 5, 9, 5, 8, 4, 3, 8, 5, 5

Develop a 95% confidence interval estimate of the population mean rating for Miami.

2. Text book, Ex.15-17, p.324

3. Have a machine filling bags of popcorn; the weight of bags is known to be normally distributed with mean weight 10.5 oz and SD 0.8 oz. Take a sample of 10 bags; what’s a 90% confidence interval for the population mean μ?

Confidence interval for variance and standard deviation

We have learned that estimates of population means can be made from sample means, and confidence intervals can be constructed to better describe those estimates. Similarly, we can estimate a population standard deviation from a sample standard deviation, and when the original population is normally distributed, we can construct confidence intervals for the standard deviation as well.

Variances and standard deviations are a very different type of measure than an average, so we can expect some major differences in the way estimates are made.

We know that the population variance formula, when used on a sample, does not give an unbiased estimate of the population variance. In fact, it tends to underestimate the actual population variance. For that reason, there are two formulas for variance, one for a population and one for a sample. The sample variance formula is an unbiased estimator of the population variance.

Also, both variance and standard deviation are nonnegative numbers. Since neither can take on a negative value, the normal distribution cannot be the distribution of a variance or a standard deviation. It can be shown that if the original population of data is normally distributed, then the expression $\frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution with n−1 degrees of freedom.

The chi-square distribution of the quantity $\frac{(n-1)s^2}{\sigma^2}$ allows us to construct confidence intervals for the variance and the standard deviation (when the original population of data is normally distributed).

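(1-α)100% confidence interval for σ²:

$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

where the χ² values are based on a chi-square distribution with n-1 degrees of freedom and 1-α is the confidence coefficient (for details see Text, p.440).

(1-α)100% confidence interval for σ:

$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$

where the χ² values are based on a chi-square distribution with n-1 degrees of freedom and 1-α is the confidence coefficient (for details see Text, p.440).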




Problem-1

A statistician chooses 27 randomly selected dates and, when examining the occupancy records of a particular motel for those dates, finds a standard deviation of 5.86 rooms rented. If the number of rooms rented is normally distributed, find the 95% confidence interval for the population standard deviation of the number of rooms rented.

Solution:

Here X - Number of rooms rented, S = 5.86 and n = 27

95% confidence interval for the population standard deviation (σ):

$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$

Here $\chi^2_{0.025,\,26}$ = 41.923 and $\chi^2_{0.975,\,26}$ = 13.844, so

$\sqrt{\frac{(27-1)(5.86)^2}{41.923}} \le \sigma \le \sqrt{\frac{(27-1)(5.86)^2}{13.844}}$

$4.615 \le \sigma \le 8.031$

Summary: Based on our findings, we are 95% confident that the population standard deviation ranges from 4.615 to 8.031.

Problem-2

A statistician chooses 27 randomly selected dates and, when examining the occupancy records of a particular motel for those dates, finds a standard deviation of 5.86 rooms rented. If the number of rooms rented is normally distributed, find the 95% confidence interval for the population variance of the number of rooms rented.

Solution:

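Here X = number of rooms rented, s = 5.86 and n = 27.

95% confidence interval for the population variance (σ²):

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

Here α = 0.05 and, with 26 degrees of freedom, χ²0.025 = 41.923 and χ²0.975 = 13.844. Thus

$$\frac{26(5.86)^2}{41.923} \le \sigma^2 \le \frac{26(5.86)^2}{13.844}, \quad \text{i.e.} \quad 21.297 \le \sigma^2 \le 64.492$$

Summary: Based on our findings, we are 95% confident that the population variance lies between 21.297 and 64.492.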
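The two intervals in Problem-1 and Problem-2 can be checked with a short Python sketch. The function names are illustrative only, and the chi-square values are not computed here: they are assumed to be read from the chi-square table for 26 degrees of freedom and passed in by hand.

```python
import math

def variance_ci(n, s, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    n          : sample size
    s          : sample standard deviation
    chi2_upper : table value with area alpha/2 to its right (df = n - 1)
    chi2_lower : table value with area 1 - alpha/2 to its right (df = n - 1)
    """
    ss = (n - 1) * s ** 2          # (n - 1) * s^2
    return ss / chi2_upper, ss / chi2_lower

def sd_ci(n, s, chi2_upper, chi2_lower):
    """Confidence interval for the population SD: square roots of the variance limits."""
    low, high = variance_ci(n, s, chi2_upper, chi2_lower)
    return math.sqrt(low), math.sqrt(high)

# Motel data: n = 27, s = 5.86; table values for 26 df at the 95% level
var_low, var_high = variance_ci(27, 5.86, 41.923, 13.844)
sd_low, sd_high = sd_ci(27, 5.86, 41.923, 13.844)

print("variance:", round(var_low, 3), "to", round(var_high, 3))
print("SD      :", round(sd_low, 3), "to", round(sd_high, 3))
```

Note that the larger table value goes with the lower limit, because it appears in the denominator.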

Practice problems

1. The variance in drug weights is critical in the pharmaceutical industry. For a specific drug, with weights measured in grams, a sample of 18 units provided a sample variance of s² = 0.36.

a. Construct a 90% confidence interval estimate of the population variance for the weight of this drug.

b. Construct a 90% confidence interval estimate of the population standard deviation.

2. The daily car rental rates for a sample of eight cities follow.

City             Daily Car Rental Rate ($)
Atlanta          69
Chicago          72
Dallas           75
New Orleans      67
Phoenix          62
Pittsburgh       65
San Francisco    61
Seattle          59

a. Compute the sample variance and the sample standard deviation for these data.

b. What is the 95% confidence interval estimate of the variance of car rental rates for the population?

c. What is the 90% confidence interval estimate of the standard deviation for the population?



Lecture 18
Chapter 10

Interval estimations about two population means and two standard deviations: see Text, Chapter 10.

Lecture 19
Tests of hypothesis

In general, we do not know the true values of population parameters (mean, proportion, variance, SD and others). They must be estimated from random samples. However, we do have hypotheses about what the true values are.

The major purpose of hypothesis testing is to choose between two competing hypotheses about the value of a population parameter.

In hypothesis testing we begin by making a tentative assumption about a population parameter. This tentative assumption is called the null hypothesis and is denoted by H0.

We then define another hypothesis, called the alternative hypothesis, which is the opposite of H0. It is denoted by Ha or H1.

Both the null and alternative hypotheses should be stated before any statistical test of significance is conducted.

In general, it is most convenient to always have the null hypothesis contain an equal sign, e.g.

(1) H0: μ = 100 vs. H1: μ ≠ 100

(2) H0: μ ≥ 100 vs. H1: μ < 100

(3) H0: μ ≤ 100 vs. H1: μ > 100

Thus, note that

under H0, the signs are =, ≥ and ≤

under H1, the corresponding signs are ≠, < and >

In general, a hypothesis test about the value of the population mean μ takes one of the following three forms, where μ0 is the hypothesized value:

(1) H0: μ = μ0 vs. H1: μ ≠ μ0

(2) H0: μ ≥ μ0 vs. H1: μ < μ0

(3) H0: μ ≤ μ0 vs. H1: μ > μ0




For example, consider the following problems in choosing the proper form for a hypothesis test:

Problem 1

The manager of an automobile dealership is considering a new bonus plan designed to increase sales volume. Currently, the mean sales volume is 14 automobiles per month. The manager wants to conduct a research study to see whether the new bonus plan increases sales volume. To collect data on the plan, a sample of sales personnel will be allowed to sell under the new bonus plan for a 1-month period. Define the null and the alternative hypotheses.

Solution: Here H0: μ ≤ 14 and H1: μ > 14.

Problem 2


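The manager of an automobile dealership is considering a new bonus plan designed to increase sales volume. Currently, the mean sales volume is 14 automobiles per month. The manager wants to conduct a research study to see whether the new bonus plan decreases sales volume. To collect data on the plan, a sample of sales personnel will be allowed to sell under the new bonus plan for a 1-month period. Define the null and the alternative hypotheses.

Solution: Here H0: μ ≥ 14 and H1: μ < 14.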

Problem 3


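The manager of an automobile dealership is considering a new bonus plan designed to increase sales volume. Currently, the mean sales volume is 14 automobiles per month. The manager wants to conduct a research study to see whether the new bonus plan changes sales volume. To collect data on the plan, a sample of sales personnel will be allowed to sell under the new bonus plan for a 1-month period. Define the null and the alternative hypotheses.

Solution: Here H0: μ = 14 and H1: μ ≠ 14.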

Steps for conducting a hypothesis test

1. Develop H0 and H1.

2. Specify the level of significance, α, which defines the unlikely values of the sample statistic if the null hypothesis is true. It is selected by the researcher at the start. The common values of α are 0.01, 0.05 and 0.10; the most common is 0.05.

3. Select the test statistic (a quantity calculated from the sample values that is used to perform the hypothesis test).

Guidelines to select test statistic:

Tests on population mean (μ)

a) Use Z-statistic when n>30 and SD known

b) Use t-statistic when n≤30 and SD unknown

Tests on population variance and SD (σ² and σ)

c) Use χ²-statistic.

4. Use α to determine the critical value for the test statistic (the boundary value that separates the critical region from the non-critical, or acceptance, region, based on the given risk level α) and state the rejection rule for H0.

The critical region (CR) or rejection region (RR) is the set of values of the test statistic for which H0 is rejected.

The non-critical region or acceptance region (AR) is the set of values of the test statistic for which H0 is not rejected.

5. Collect the sample data and compute the value of the test statistic.

6. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

Using the p-value to make a decision:

The p-value is the probability, when H0 is true, of obtaining a sample result at least as unlikely as the one observed. In other words, the p-value measures how likely the sample results are when H0 is assumed to be true. The smaller the p-value, the less likely it is that the sample results came from a situation where H0 is true. It is often called the observed level of significance. The user can compare the p-value to α and draw a hypothesis test conclusion without referring to a statistical table:

Use the value of the test statistic to compute the p-value.

Reject H0 if p-value < α.
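The p-value rule can be sketched in Python for a z-statistic. This is only an illustration: the function names and the value z = -2.10 are made up for the example, and the standard normal distribution is taken from the standard library's `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z-statistic for a 'lower', 'upper' or 'two' tailed test."""
    cdf = NormalDist().cdf          # standard normal cumulative probability
    if tail == "lower":             # H1: mu < mu0
        return cdf(z)
    if tail == "upper":             # H1: mu > mu0
        return 1 - cdf(z)
    if tail == "two":               # H1: mu != mu0
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

def reject_h0(p, alpha=0.05):
    """Decision rule: reject H0 if the p-value is below alpha."""
    return p < alpha

z = -2.10  # illustrative test statistic, not taken from the notes
for tail in ("lower", "upper", "two"):
    p = p_value(z, tail)
    print(tail, round(p, 4), "reject H0" if reject_h0(p) else "do not reject H0")
```

The same z value can lead to different decisions depending on the form of H1, which is why the hypotheses must be stated before the test is carried out.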

Problem-4

Individuals filing federal income tax returns prior to March 31 had an average refund of $1056. Consider the population of last-minute filers who mail their returns during the last 5 days of the income tax period (typically April 10 to April 15). A researcher suggests that one of the reasons individuals wait until the last 5 days to file their returns is that on average those individuals have a lower refund than early filers.

a. Develop appropriate hypotheses such that rejection of the null hypothesis will support the researcher's argument.

b. Using the 5% level of significance, what is the critical value for the test statistic and what is the rejection rule?



c. For a sample of 400 individuals who filed a return between April 10 and April 15, the sample mean refund was $910 and the sample standard deviation was $1600. Compute the value of the test statistic.

d. What is your conclusion?

e. What is the p-value for the test?

Solution

Denote X = federal income tax refund of an individual filing between April 10 and April 15. Here n = 400, x̄ = $910 and σ = $1600.

(a) Set up the following hypotheses:

H0: μ ≥ $1056 vs. H1: μ < $1056

(b) Since n > 30, we choose the z-statistic. The critical value of the z-statistic at the 5% level of significance, found from the z table, is -1.645.

Rejection rule: Reject H0 if zcal ≤ -1.645

(c) Test statistic: zcal = (sqrt(400)(910 - 1056))/1600 = -1.8250

(d) Conclusion

Decision: Reject the null hypothesis, since -1.8250 ≤ -1.645.

Thus, at the 5% level of significance we reject the null hypothesis and accept the alternative hypothesis. More clearly, based on the sample evidence, it may be concluded that the researcher's claim is true: individuals filing federal income tax returns between April 10 and April 15 had an average refund lower than $1056.
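Part (e) of Problem-4 asks for the p-value. A minimal Python sketch of the whole calculation is given below. It assumes the lower-tail form H1: μ < $1056 and uses `statistics.NormalDist` from the standard library in place of the z table, so the critical value is obtained numerically rather than looked up.

```python
import math
from statistics import NormalDist

# Data of Problem-4
n, x_bar, sd, mu0, alpha = 400, 910.0, 1600.0, 1056.0, 0.05

z_cal = math.sqrt(n) * (x_bar - mu0) / sd    # test statistic
z_crit = NormalDist().inv_cdf(alpha)         # lower-tail critical value, about -1.645
p_val = NormalDist().cdf(z_cal)              # lower-tail p-value

print("z_cal  =", round(z_cal, 4))
print("z_crit =", round(z_crit, 3))
print("p-value =", round(p_val, 3))
print("reject H0" if z_cal <= z_crit else "do not reject H0")
```

The p-value comes out at roughly 0.03, which is below α = 0.05, so the p-value rule gives the same decision as the critical-value rule.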




Problem-5

Individuals filing federal income tax returns prior to March 31 had an average refund of $1056. Consider the population of last-minute filers who mail their returns during the last 5 days of the income tax period (typically April 10 to April 15). A researcher suggests that one of the reasons individuals wait until the last 5 days to file their returns is that on average those individuals have a greater refund than early filers.

a. Develop appropriate hypotheses such that rejection of the null hypothesis will support the researcher's argument.

b. Using the 5% level of significance, what is the critical value for the test statistic and what is the rejection rule?

c. For a sample of 400 individuals who filed a return between April 10 and April 15, the sample mean refund was $910 and the sample standard deviation was $1600. Compute the value of the test statistic.

d. What is your conclusion?

e. What is the p-value for the test?

Solution

Denote X = federal income tax refund of an individual filing between April 10 and April 15. Here n = 400, x̄ = $910 and σ = $1600.

(a) Set up the following hypotheses:

H0: μ ≤ $1056 vs. H1: μ > $1056

(b) Since n > 30, we choose the z-statistic. The critical value of the z-statistic at the 5% level of significance, found from the z table, is 1.645.

Rejection rule: Reject H0 if zcal ≥ 1.645

(c) Test statistic: zcal = (sqrt(400)(910 - 1056))/1600 = -1.8250

(d) Conclusion

Decision: Accept the null hypothesis, since -1.8250 < 1.645.

Thus, at the 5% level of significance we accept the null hypothesis and reject the alternative hypothesis. More clearly, based on the sample evidence, the researcher's claim is not supported: there is no evidence that individuals filing federal income tax returns between April 10 and April 15 had an average refund greater than $1056.

Problem-6

Individuals filing federal income tax returns prior to March 31 had an average refund of $1056. Consider the population of last-minute filers who mail their returns during the last 5 days of the income tax period (typically April 10 to April 15). A researcher suggests that one of the reasons individuals wait until the last 5 days to file their returns is that on average those individuals have a changed refund compared with early filers.

a. Develop appropriate hypotheses such that rejection of the null hypothesis will support the researcher's argument.

b. Using the 5% level of significance, what is the critical value for the test statistic and what is the rejection rule?

c. For a sample of 400 individuals who filed a return between April 10 and April 15, the sample mean refund was $910 and the sample standard deviation was $1600. Compute the value of the test statistic.

d. What is your conclusion?

e. What is the p-value for the test?

Solution

Denote X = federal income tax refund of an individual filing between April 10 and April 15. Here n = 400, x̄ = $910 and σ = $1600.

(a) Set up the following hypotheses:

H0: μ = $1056 vs. H1: μ ≠ $1056

(b) Since n > 30, we choose the z-statistic. The critical value of the z-statistic at the 5% level of significance, found from the z table, is 1.960.

Rejection rule: Reject H0 if zcal ≥ 1.960 or zcal ≤ -1.960

(c) Test statistic: zcal = (sqrt(400)(910 - 1056))/1600 = -1.8250

(d) Conclusion

Decision: Accept the null hypothesis, since -1.960 < -1.8250 < 1.960.

Thus, at the 5% level of significance we accept the null hypothesis and reject the alternative hypothesis. More clearly, based on the sample evidence, the researcher's claim is not supported: there is no evidence that the average refund of individuals filing federal income tax returns between April 10 and April 15 differs from $1056.

Practice problem

The Edison Electric Institute has published figures on the annual number of kilowatt-hours expended by various home appliances. It is claimed that a vacuum cleaner expends an average of 46 kilowatt-hours per year. If a random sample of 42 homes included in a planned study indicates that vacuum cleaners expend an average of 42 kilowatt-hours per year with a SD of 11.9 kilowatt-hours, does this suggest at the 0.10 level of significance that vacuum cleaners expend, on the average, less than 46 kilowatt-hours annually? Assume the population of kilowatt-hours to be normal.

Guideline

X = number of kilowatt-hours expended per year by vacuum cleaners in homes. Here n = 42, x̄ = 42 and SD = 11.9.

H0: μ ≥ 46 vs. H1: μ < 46

Since n > 30, we choose the z-statistic. For this lower-tail test the critical value of the z-statistic at the 0.10 level of significance stated in the problem, found from the z table, is -1.282.

(Solve it, following Problem-4.)

Test for population mean for small samples and SD unknown

Problem-7

Individuals filing federal income tax returns prior to March 31 had an average refund of $1056. Consider the population of last-minute filers who mail their returns during the last 5 days of the income tax period (typically April 10 to April 15). A researcher suggests that one of the reasons individuals wait until the last 5 days to file their returns is that on average those individuals have a lower refund than early filers.

a. Develop appropriate hypotheses such that rejection of the null hypothesis will support the researcher's argument.

b. Using the 5% level of significance, what is the critical value for the test statistic and what is the rejection rule?

c. For a sample of 10 individuals who filed a return between April 10 and April 15, the sample mean refund was $910 and the sample standard deviation was $1600. Compute the value of the test statistic.

d. What is your conclusion?

e. What is the p-value for the test?

Solution

Denote X = federal income tax refund of an individual filing between April 10 and April 15. Here n = 10, x̄ = $910 and s = $1600.

(a) Set up the following hypotheses:

H0: μ ≥ $1056 vs. H1: μ < $1056

(b) Since n ≤ 30, we choose the t-statistic. The critical value of the t-statistic at the 5% level of significance with 9 df, found from the t table, is -1.833.

Rejection rule: Reject H0 if tcal ≤ -1.833

(c) Test statistic: tcal = (sqrt(10)(910 - 1056))/1600 = -0.2886

(d) Conclusion

Decision: Accept the null hypothesis, since -0.2886 > -1.833.

Thus, at the 5% level of significance we accept the null hypothesis and reject the alternative hypothesis. More clearly, based on the sample evidence, the researcher's claim is not supported: there is no evidence that individuals filing federal income tax returns between April 10 and April 15 had an average refund lower than $1056.

Practice problem

Joan's Nursery specializes in custom-designed landscaping for residential areas. The estimated labor cost associated with a particular landscaping proposal is based on the number of plantings of trees, shrubs and so on to be used for the project. For cost-estimating purposes, managers use 2 hours of labor time for the planting of a medium-sized tree. Actual times from a sample of 10 plantings during the past month follow (time in hours): 1.9, 1.7, 2.8, 2.4, 2.6, 2.5, 2.8, 3.2, 1.6 and 2.5. Using the .05 level of significance, test to see whether the mean tree planting time exceeds 2 hours.

Guideline:

X = tree planting time. Here n = 10, mean = 2.4 and SD = 0.52 (use a calculator to find them).

H0: μ ≤ 2 vs. H1: μ > 2

Since n ≤ 30, we choose the t-statistic. The critical value of the t-statistic at the 5% level of significance with 9 df, found from the t table, is 1.833.

(Solve it, following Problem-7.)


Tests for standard deviation

(1) H0: σ² ≤ σ₀² vs. H1: σ² > σ₀²

(2) H0: σ² ≥ σ₀² vs. H1: σ² < σ₀²

(3) H0: σ² = σ₀² vs. H1: σ² ≠ σ₀²

Test Statistic:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

where σ₀² is the hypothesized value for the population variance.

Problem 8

A Fortune study found that the variance in the number of vehicles owned or leased by subscribers to Fortune magazine is 0.94. Assume a sample of 12 subscribers to another magazine provided the following data on the number of vehicles owned or leased: 2, 1, 2, 0, 3, 2, 2, 1, 2, 1, 0 and 1.

a. Compute the sample variance in the number of vehicles owned or leased by the 12 subscribers.

b. Test the hypothesis H0: σ² = 0.94 to determine whether the variance in the number of vehicles owned or leased by subscribers of the other magazine differs from σ² = 0.94 for Fortune. Using a 0.05 level of significance, what is your conclusion?

Solution

Denote X = the number of vehicles owned or leased by subscribers of the other magazine. Here n = 12 and the sample variance is s² = 0.81.

Set up the following hypotheses

H0: σ² = 0.94 vs. H1: σ² ≠ 0.94.

Note that the alternative is two-sided, so there are two rejection regions, one in the lower tail and one in the upper tail of the sampling distribution.

Test statistic: χ²-statistic. With H0: σ² = 0.94, the value of the χ² statistic is computed as (n-1)s²/σ₀² = (11×0.81)/0.94 = 9.479.

The critical values of the χ² statistic at the 5% level of significance are χ²0.975 and χ²0.025. Using 11 degrees of freedom, the critical values found from the χ² table are χ²0.975 = 3.815 and χ²0.025 = 21.920.

The rejection rule: Reject H0 if χ² ≤ 3.815 or χ² ≥ 21.920
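The calculation in Problem 8 can be checked with a short Python sketch. The variable names are illustrative, and the two critical values are assumed to be read from the chi-square table (11 degrees of freedom) rather than computed. Because the code uses the unrounded sample variance, its χ² value is about 9.49, slightly different from the value obtained with s² rounded to 0.81.

```python
import statistics

# Number of vehicles owned or leased by the 12 sampled subscribers
vehicles = [2, 1, 2, 0, 3, 2, 2, 1, 2, 1, 0, 1]
sigma0_sq = 0.94                      # hypothesized variance (Fortune)

n = len(vehicles)
s_sq = statistics.variance(vehicles)  # sample variance, divisor n - 1
chi2_cal = (n - 1) * s_sq / sigma0_sq

# Critical values from the chi-square table, 11 df, 5% level, two-sided
chi2_lower, chi2_upper = 3.815, 21.920
reject = chi2_cal <= chi2_lower or chi2_cal >= chi2_upper

print("s^2  =", round(s_sq, 2))
print("chi2 =", round(chi2_cal, 3))
print("reject H0" if reject else "do not reject H0")
```

The computed statistic falls between the two critical values, so it lies in the acceptance region.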



Decision:

Accept the null hypothesis, since the computed χ² value lies between the two critical values.

Thus, at the 5% level of significance we accept the null hypothesis. More clearly, based on the sample evidence, it may be concluded that the variance in the number of vehicles owned or leased by subscribers of the other magazine does not differ from the claim for Fortune.

Practice problem

Home mortgage interest rates for 30-year fixed rate loans vary throughout the country. During the summer of 2000, data available from various parts of the country suggested that the SD of the interest rates was 0.096. The corresponding variance in interest rates would be 0.0092. Consider a follow-up study in the summer of 2003. The interest rates for 30-year fixed rate loans at a sample of 20 lending institutions had a sample SD of 0.114. Conduct a hypothesis test using H0: σ² ≥ 0.0092 to see whether the sample data indicate that the variability in interest rates decreased. Using the 0.01 level of significance, what is your conclusion?

Guideline

X = home mortgage interest rate for 30-year fixed rate loans. Here n = 20, sample SD s = 0.114, population SD σ = 0.096, population variance σ² = 0.0092 and α = 0.01.

Set up the following hypotheses

H0: σ² ≥ 0.0092 vs. H1: σ² < 0.0092.

Test statistic: χ²-statistic. With H0: σ² ≥ 0.0092, the value of the χ² statistic is computed as (n-1)s²/σ₀² = (19×0.114²)/0.0092 = 26.84.

The critical value of the χ² statistic at the 1% level of significance is χ²0.990. Using 19 degrees of freedom, the critical value found from the χ² table is χ²0.990 = 7.633.

The rejection rule: Reject H0 if χ² ≤ 7.633

Decision:

{Insert decision curve}

Accept the null hypothesis, since 26.84 > 7.633.

Thus, at the 1% level of significance we accept the null hypothesis. More clearly, based on the sample evidence, it may be concluded that the variability in interest rates has not decreased.

HW: Text, Chapter 11

Summary on Tests of Hypothesis (One Sample)

One Sample Test

Population Mean (μ) Test

i) H0: μ = 5 vs. H1: μ ≠ 5
ii) H0: μ ≥ 5 vs. H1: μ < 5
iii) H0: μ ≤ 5 vs. H1: μ > 5

Population Proportion (P) Test

i) H0: P = 0.6 vs. H1: P ≠ 0.6
ii) H0: P ≥ 0.6 vs. H1: P < 0.6
iii) H0: P ≤ 0.6 vs. H1: P > 0.6

Population SD (σ) Test

i) H0: σ = 1.5 vs. H1: σ ≠ 1.5
ii) H0: σ ≥ 1.5 vs. H1: σ < 1.5
iii) H0: σ ≤ 1.5 vs. H1: σ > 1.5

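Note: the tests in i) are two-sided (two-tailed) tests; those in ii) and iii) are one-sided (one-tailed) tests, lower-tail and upper-tail respectively.

Population Mean (μ) Test

Statistic:

$$Z_{cal} = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}} \quad \text{(large sample test, } n > 30\text{)}$$

or

$$t_{cal} = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}} \quad \text{(small sample test, } n \le 30\text{)}$$

where σ_x̄ = σ/√n and s_x̄ = s/√n.

Distribution: Standard Normal Z (or t); use the Z-table (or t-table) to find Ztab or ttab. Ztab = Zα for a one-sided test and Ztab = Zα/2 for a two-sided test; ttab = t(n-1),α for a one-sided test and ttab = t(n-1),α/2 for a two-sided test.

Population Proportion (P) Test

Statistic:

$$Z_{cal} = \frac{\hat{p} - P_{H_0}}{\sigma_{\hat{p}}}, \quad \text{where } \sigma_{\hat{p}} = \sqrt{\frac{P_{H_0}(1 - P_{H_0})}{n}}$$

Distribution: Standard Normal Z; use the Z-table to find Ztab. Ztab = Zα for a one-sided test and Ztab = Zα/2 for a two-sided test.

Population SD (σ) Test

Statistic:

$$\chi^2 = \frac{(n-1)S_x^2}{\sigma_{H_0}^2}$$

where S_x² is the sample variance.

Distribution: Chi-square; use the chi-square table to find χ²tab. The chi-square table is read in much the same way as the t-table. For example, for a one-sided test χ²tab = χ²(n-1),α (upper tail) or χ²(n-1),1-α (lower tail), and for a two-sided test χ²tab = χ²(n-1),α/2 (upper tail) together with χ²(n-1),1-α/2 (lower tail).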



Lecture 21

Tests of two population means, two standard deviations, Applications from real data

See Text, Chapter 11

Summary on Tests of Hypothesis (Two Samples)

Two Samples Tests

Population Means Test

i) H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
ii) H0: μ1 ≥ μ2 vs. H1: μ1 < μ2
iii) H0: μ1 ≤ μ2 vs. H1: μ1 > μ2

Population Proportions Test

i) H0: P1 = P2 vs. H1: P1 ≠ P2
ii) H0: P1 ≥ P2 vs. H1: P1 < P2
iii) H0: P1 ≤ P2 vs. H1: P1 > P2

Population SDs Test

i) H0: σ1 = σ2 vs. H1: σ1 ≠ σ2
ii) H0: σ1 ≥ σ2 vs. H1: σ1 < σ2
iii) H0: σ1 ≤ σ2 vs. H1: σ1 > σ2

Note: the tests in i) are two-sided (two-tailed) tests; those in ii) and iii) are one-sided (one-tailed) tests, lower-tail and upper-tail respectively.

Population Means Test

Statistic:

$$Z_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} \quad \text{(large sample test, at least one sample} > 30\text{)}$$

or

$$t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \quad \text{(small samples} \le 30\text{)}$$

Distribution: Standard Normal Z (or t); use the Z-table (or t-table) to find Ztab or ttab. Ztab = Zα for a one-sided test and Ztab = Zα/2 for a two-sided test; ttab = tn,α for a one-sided test and ttab = tn,α/2 for a two-sided test, where n = n1 + n2 - 2.

Population Proportions Test

Statistic:

$$Z_{cal} = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}}, \quad \text{where } \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$

Distribution: Standard Normal Z; use the Z-table to find Ztab. Ztab = Zα for a one-sided test and Ztab = Zα/2 for a two-sided test.

Population SDs Test

Statistic:

$$F = \frac{S_1^2}{S_2^2}$$

where S1² and S2² are the sample variances.

Distribution: F; use the F-table to find Ftab. The F-table needs two degrees of freedom, n1 - 1 for the numerator and n2 - 1 for the denominator. For example, Ftab = F(n1-1, n2-1),α for a one-sided test and Ftab = F(n1-1, n2-1),α/2 for a two-sided test.



Lectures 22-23
Chapter 14: Correlation and Regression Analysis

Application from real data

Correlation Analysis

1) Scatter diagram: used to judge visually what relationship exists between two variables.

2) Correlation coefficient (r xy): measures the direction and strength of the linear relationship between two variables.

Let's consider the following problem to understand it clearly.

Problem

Consider two variables

x (No. of TV commercials): 2,5,1,3,4,1,5,3,4,2

y(Total sales): 50,57,41,54,54,38,63,48,59,46

Find the relationship between two variables and make a summary based on your findings.

Solution:

Denote x = No. of TV commercials and y = Total sales, because it is reasonable to believe that sales depend on the number of commercials.

Draw a scatter diagram to see what sort of relation exists between x and y.

[Scatter diagram: Total Sales (vertical axis, 0 to 70) against No. of TV Commercials (horizontal axis, 0 to 6)]

Summary: We see that there is a positive relation between the no. of TV commercials and total sales.

To measure how strong the linear relation between x and y is, we apply the following formula (known as the correlation coefficient):

$$r_{xy} = \frac{s_{xy}}{s_x s_y}, \qquad s_x > 0,\ s_y > 0$$

where

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}, \qquad s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}, \qquad s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$$

Make the following calculation table (for details see Textbook, pp.115-116) to find r xy. Here x̄ = 30/10 = 3 and ȳ = 510/10 = 51.

No. of TV          Total
Commercials (x)    Sales (y)    (x - 3)²    (y - 51)²    (x - 3)(y - 51)
2                  50           1           1            1
5                  57           4           36           12
1                  41           4           100          20
3                  54           0           9            0
4                  54           1           9            3
1                  38           4           169          26
5                  63           4           144          24
3                  48           0           9            0
4                  59           1           64           8
2                  46           1           25           5
Total 30           510          20          566          99

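Thus, from the table we get (with n = 10),

s_x = sqrt(20/9) = 1.49, s_y = sqrt(566/9) = 7.93, s_xy = 99/9 = 11

r xy = 11/(1.49×7.93) = 0.9310

Summary

We see that r xy = 0.93, which is close to +1. This indicates a strong positive linear relationship: total sales tend to increase as the no. of TV commercials increases. Note that r xy is a measure of association, not a probability, so it should not be read as a 93% chance.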
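The calculation table and the correlation coefficient can be reproduced with a few lines of Python. This is a sketch using only the data of the problem; the variable names are illustrative. Since the code does not round s_x and s_y to two decimals, it gives about 0.9305 rather than 0.9310, and both round to 0.93.

```python
import math

x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]             # No. of TV commercials
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]   # Total sales

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Column totals of the calculation table
sum_xx = sum((xi - x_bar) ** 2 for xi in x)
sum_yy = sum((yi - y_bar) ** 2 for yi in y)
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_x = math.sqrt(sum_xx / (n - 1))   # sample SD of x
s_y = math.sqrt(sum_yy / (n - 1))   # sample SD of y
s_xy = sum_xy / (n - 1)             # sample covariance

r_xy = s_xy / (s_x * s_y)

print("totals:", sum_xx, sum_yy, sum_xy)
print("s_x =", round(s_x, 2), " s_y =", round(s_y, 2), " s_xy =", round(s_xy, 2))
print("r_xy =", round(r_xy, 4))
```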




Application from real data

Regression Analysis

The aims here, based on random sample data, are to

(1) Fit a model

(2) Predict y values for given x values

Fitting a model:

Consider the following two-variable regression model

Yi = α + βXi + ei, i = 1,2,….,n

where Y = dependent variable (e.g. total sales)

α = constant (intercept)

β = regression coefficient or slope

X = independent variable (e.g. No. of commercials)

e = random error

Here there are two parameters, α and β. These two will be estimated from the random sample data.

Using the Ordinary Least Squares method, the estimated values of α and β are

$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

Estimated model y on x:

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \quad i = 1,2,\ldots,n$$

Prediction or Forecasting

The predicted model is defined by

$$y_p = \hat{\alpha} + \hat{\beta} x_p$$

Let’s consider the following problem to understand it very clearly!

Problem:

Recall the following two variables

x (No. of TV commercials): 2,5,1,3,4,1,5,3,4,2




y(Total sales): 50,57,41,54,54,38,63,48,59,46

(i) Fit a model y on x.

(ii) Predict (or forecast) total sales when x=5.

Solution:

Consider the following two-variable regression model

Yi = α + βXi + ei, i = 1,2,….,n

where Y = Total sales

α = constant

β = regression coefficient y on x

X = No. of commercials

e = random error

The two parameters α and β will be estimated from the random sample data on y and x.

Calculation table (x̄ = 30/10 = 3 and ȳ = 510/10 = 51)

No. of TV          Total
Commercials (x)    Sales (y)    (x - 3)²    (y - 51)²    (x - 3)(y - 51)
2                  50           1           1            1
5                  57           4           36           12
1                  41           4           100          20
3                  54           0           9            0
4                  54           1           9            3
1                  38           4           169          26
5                  63           4           144          24
3                  48           0           9            0
4                  59           1           64           8
2                  46           1           25           5
Total 30           510          20          566          99

(i) We know that the estimated model y on x is ŷi = α̂ + β̂xi, where

$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

From the calculation table we find β̂ = 99/20 = 4.95 and α̂ = 51 - (4.95×3) = 36.15.

Thus, the estimated model y on x becomes: ŷi = 36.15 + 4.95xi

Summary

α̂ = 36.15 means that if there are no commercials (i.e. x = 0), the expected total sales are $36.15.

β̂ = 4.95 means that each additional TV commercial is associated with an expected increase of $4.95 in total sales.

(ii) We know that the predicted model is: yp = α̂ + β̂xp

According to the question, we have to predict total sales when x = 5.

Thus yp = 36.15 + (4.95×5) = $60.9.

So, when there are 5 commercials in a week, the company can expect total sales of $60.9.
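The least squares fit and the prediction can be checked with the following Python sketch. The function name `fit_line` is illustrative only; it applies the slope and intercept formulas of the Ordinary Least Squares method directly to the data of the problem.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum_xy / sum_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]             # No. of TV commercials
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]   # Total sales

alpha_hat, beta_hat = fit_line(x, y)
y_pred = alpha_hat + beta_hat * 5              # predicted sales for 5 commercials

print("intercept =", round(alpha_hat, 2))
print("slope     =", round(beta_hat, 2))
print("predicted sales at x = 5:", round(y_pred, 2))
```

Predictions are most trustworthy for x values inside the observed range (here 1 to 5 commercials).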

HW: Text

Ex: 47-51, pp.122-124

Ex: 4-14, 18-21, pp.570-582

/End of Lecture notes/
