Page 1: DATA COLLECTION IN RESEARCH

1. Overview of Statistics & Collection of Data

1.1 Introduction to statistics – Definition, types, basic terms, level of data measurement.

1.2 Methods of Collection of Data – Census & Sampling Methods

Page 2: DATA COLLECTION IN RESEARCH
Page 3: DATA COLLECTION IN RESEARCH

Shaya’a Othman Definition of Statistics

“Statistics is a scientific method of collecting, organizing, presenting, analyzing and interpreting of numerical information, developed from mathematical theory of probability, to assist in making effective and efficient decision.”

Definition by Shaya'a Othman,

Page 4: DATA COLLECTION IN RESEARCH

OVERVIEW OF STATISTICS

DEFINITION

Common: Collecting & publishing numerical data.

Science: Scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.

TYPES

DESCRIPTIVE STATISTICS: Methods of organizing and presenting data in an informative way.

INFERENTIAL STATISTICS: Methods of determining something about a population based on a sample.

DATA (Variables)

Qualitative or attribute (type of car owned)

Quantitative or numerical: discrete (number of children) or continuous (time taken for an exam)

LEVELS OF MEASUREMENT

Nominal, Ordinal, Interval, Ratio

ETHICS

Misleading data: use of average, use of graphic, use of association

COMPUTER

Computer applications: Microsoft Excel, SPSS, NVivo (CAQDAS)

Page 5: DATA COLLECTION IN RESEARCH

COLLECTING DATA

SOURCES

Primary Data

Secondary Data:

Government publications. Malaysian Government publications: Statistics Dept., PM Dept.; Economic Planning Unit, PM Dept.; research institutions (RRI, PORIM, MARDI).

International organizations: United Nations, OIC, ASEAN, World Bank, Islamic Development Bank.

Private publications/data: private survey/research companies.

Internet: websites, CIA data.

TECHNIQUES

Census [total count of population]

Sample [selected count of population]

Sampling techniques: systematic sampling, stratified sampling, multi-stage sampling, cluster sampling, quota sampling.

METHODS OF COLLECTING

Interviews and forms (direct/phone)

Mailing questionnaires

Computer (email, e-fax, etc.)

Mobile phone (SMS)

Internet

Page 6: DATA COLLECTION IN RESEARCH

RESEARCH METHODOLOGY

Page 7: DATA COLLECTION IN RESEARCH

WHAT IS A HYPOTHESIS?

Page 8: DATA COLLECTION IN RESEARCH

5-STEP PROCEDURE FOR TESTING A HYPOTHESIS

STEP 1: State the null and alternative hypotheses

Null hypothesis: $H_0: \mu = 0$. Alternative hypothesis: $H_1: \mu \neq 0$.
Note: 1. Two-tailed test if the alternative hypothesis does not state a direction [greater or less]. 2. One-tailed test if the alternative hypothesis states a direction.

STEP 2: Select the level of significance

1. .01 level [1% level] – for consumer research
2. .05 level [5% level] – for quality assurance
3. .10 level [10% level] – for political polling

STEP 3: Identify the test statistic

z and t as test statistics; others include F and, for non-parametric tests, the chi-square ($\chi^2$) statistic.

STEP 4: Formulate the decision rule

Find the critical value of z from the normal distribution table, or the value of t from the t distribution table, where appropriate.

STEP 5: Take a sample, arrive at a decision

Only ONE DECISION is possible in hypothesis testing: do not reject the null hypothesis, or reject the null hypothesis and accept the alternative hypothesis.

Page 9: DATA COLLECTION IN RESEARCH

1. Two-tailed test if the alternative hypothesis does not state a direction [greater or less].

2. One-tailed test if the alternative hypothesis states a direction.

Page 10: DATA COLLECTION IN RESEARCH
Page 11: DATA COLLECTION IN RESEARCH

Two Possible Types of Error [Type I and Type II]

Page 12: DATA COLLECTION IN RESEARCH
Page 13: DATA COLLECTION IN RESEARCH

STATISTICAL TEST OF HYPOTHESIS

One-Sample Tests of Hypothesis

Large sample [n more than 30]: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, using the normal distribution table.

Small sample [n less than 30]: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with $df = n - 1$, using the t distribution table.

Two-Sample Tests of Hypothesis

Large sample [n more than 30]: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Small sample [n less than 30]: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, with $df = n_1 + n_2 - 2$, using the t distribution table.

Two-Tail Test [no direction]

One-Tail Test [with direction: greater or less than]

[Figure: one-tailed test with critical value 1.65; "do not reject" region (probability = .95) and region of rejection (probability = .05)]
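The large-sample, one-tailed case can be sketched in a few lines of Python. This is only an illustration: the sample mean, hypothesized mean, standard deviation and sample size below are hypothetical values, not figures from the slides, and the function name `one_sample_z` is made up for the example. Only the critical value 1.65 comes from the slide's figure.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """Large-sample test statistic: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical inputs (assumptions for illustration only).
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)

# One-tailed decision rule at the .05 level, critical value from the slide.
critical_value = 1.65
decision = "reject H0" if z > critical_value else "do not reject H0"
print(round(z, 2), decision)
```

With these made-up inputs the statistic falls in the region of rejection; a smaller sample mean would leave it in the "do not reject" region.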

Page 14: DATA COLLECTION IN RESEARCH

Hypothesis – “A supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.” Oxford Dictionary

Hypothesis – “A statement or conjecture which is neither true nor false, subject to verification.” Shaya'a Othman, KUIS

Hypothesis – “A statement about a population parameter developed for the purpose of testing.” Douglas A. Lind, Statistical Techniques in Business and Economics

Hypothesis Testing – “A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.” Douglas A. Lind, Statistical Techniques in Business and Economics

Null Hypothesis – “A statement about the value of a population parameter.” Douglas A. Lind, Statistical Techniques in Business and Economics

Alternative Hypothesis – “A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false.” Douglas A. Lind, Statistical Techniques in Business and Economics

Page 15: DATA COLLECTION IN RESEARCH

Describing Data – Measures of Location

Population Mean = (Sum of all the values in the population) / (Number of values in the population)

Sample Mean = (Sum of the values in the sample) / (Number of values in the sample) = $\frac{\sum x}{n}$

Weighted Mean = $\frac{\sum (wx)}{\sum w}$

Parameter = A characteristic of a population

Median = The midpoint of the values after they have been ordered from the smallest to the largest

Mode = The value of the observation that appears most frequently

Page 16: DATA COLLECTION IN RESEARCH

Describing Data – Measures of Dispersion

Range = Largest Value – Smallest Value

Mean Deviation = The arithmetic mean of the absolute values of the deviations from the arithmetic mean

$MD = \frac{\sum |X - \bar{X}|}{n}$

where $\sum$ is sigma [sum of]; $X$ = value of each observation; $\bar{X}$ = arithmetic mean of the values; $n$ is the number of observations; $|\ |$ indicates absolute values

Variance = The arithmetic mean of the squared deviations from the mean

Standard Deviation = The square root of the variance

Location of Percentiles: $L_p = (n+1)\frac{P}{100}$

Page 17: DATA COLLECTION IN RESEARCH

Characteristics of the Mean

The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.

The major characteristics of the mean are:

It is calculated by summing the values and dividing by the number of values.

It requires the interval scale.

All values are used.

It is unique.

The sum of the deviations from the mean is 0.

Page 18: DATA COLLECTION IN RESEARCH

Population Mean

For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:

$\mu = \frac{\sum X}{N}$

where $\mu$ is the population mean, $N$ is the total number of observations, $X$ is a particular value, and $\sum$ indicates the operation of adding.

Page 19: DATA COLLECTION IN RESEARCH

Example 1

A Parameter is a measurable characteristic of a population.

AHMAD’s family owns four cars. The following is the current mileage on each of the four cars:

56,000; 23,000; 42,000; 73,000

Find the mean mileage for the cars.

$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \cdots + 73{,}000}{4} = 48{,}500$
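As a quick check of Example 1, the population mean can be computed directly. This is a minimal sketch in Python; the variable names are chosen here for illustration.

```python
# Mileage of the four cars in Example 1 (the whole population).
mileages = [56_000, 23_000, 42_000, 73_000]

# Population mean: sum of all values divided by the number of values.
mu = sum(mileages) / len(mileages)
print(mu)
```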

Page 20: DATA COLLECTION IN RESEARCH

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$\bar{X} = \frac{\sum X}{n}$

where $n$ is the total number of values in the sample.

Page 21: DATA COLLECTION IN RESEARCH

Example 2

A statistic is a measurable characteristic of a sample.

A sample of five executives received the following bonus last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \cdots + 15.0}{5} = \frac{77}{5} = 15.4$

Page 22: DATA COLLECTION IN RESEARCH

Example 4

During a one-hour period on a hot Saturday afternoon in Langkawi, Ahmad sold fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.

$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$
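The weighted mean in Example 4 can be reproduced with a short Python sketch. It uses the prices that appear in the worked formula ($0.50, $0.75, $0.90, $1.15) with the quantities sold as weights; the variable names are illustrative.

```python
# Prices from the worked formula and the number of drinks sold at each price.
prices = [0.50, 0.75, 0.90, 1.15]
weights = [5, 15, 15, 15]

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, prices)) / sum(weights)
print(round(weighted_mean, 2))
```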

Page 23: DATA COLLECTION IN RESEARCH

The Median

The Median is the midpoint of the values after they have been ordered from the smallest to the largest.

There are as many values above the median as below it in the data array.

The median is found at the (n+1)/2 ranked observation. For an even number of values, that position falls between the two middle numbers, and the median is their arithmetic average.

Page 24: DATA COLLECTION IN RESEARCH

The median (continued)

The ages for a sample of seven INSANIAH students visiting the Islamic Artifact Exhibition:

21, 25, 19, 20, 22, 18, 27.

Arranging the data in ascending order gives:

18, 19, 20, 21, 22, 25, 27

Thus median = 21.

Page 25: DATA COLLECTION IN RESEARCH

Example 5

The heights of 3 INSANIAH lecturers, in inches, are: 76, 73, 80.

Arranging the data in ascending order gives:

73, 76, 80

The median is found at the (n+1)/2 = (3+1)/2 = 2nd data point.

Thus the median is 76.
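Both median examples can be checked with Python's standard `statistics` module. The sketch below also includes a small even-sized data set, which is hypothetical and added only to show the averaging of the two middle values.

```python
import statistics

ages = [21, 25, 19, 20, 22, 18, 27]   # student ages from the previous slide
heights = [76, 73, 80]                 # lecturer heights from Example 5
even_case = [18, 19, 20, 21]           # hypothetical even-sized data set

# statistics.median sorts the data and picks the middle value,
# averaging the two middle values when the count is even.
median_ages = statistics.median(ages)
median_heights = statistics.median(heights)
median_even = statistics.median(even_case)
print(median_ages, median_heights, median_even)
```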

Page 26: DATA COLLECTION IN RESEARCH

The Mode: Example 6

The Mode is another measure of location and represents the value of the observation that appears most frequently.

Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and the like.
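A minimal Python sketch of Example 6, using `statistics.multimode` (available from Python 3.8), which returns every most-frequent value. The second data set is hypothetical and only illustrates the bimodal case.

```python
import statistics

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]  # exam scores from Example 6

# multimode returns a list, so it also handles data with several modes.
modes = statistics.multimode(scores)

# Hypothetical bimodal data set: 1 and 2 both appear twice.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
print(modes, two_modes)
```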

Page 27: DATA COLLECTION IN RESEARCH

The Relative Positions of the Mean, Median, and Mode

Symmetric distribution: A distribution having the same shape on either side of the center.

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.

Can be positively or negatively skewed, or bimodal.

Page 28: DATA COLLECTION IN RESEARCH

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Zero skewness: Mean = Median = Mode

[Figure: symmetric distribution with the mean, median, and mode marked]

Page 29: DATA COLLECTION IN RESEARCH

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution

Positively skewed: Mean and median are to the right of the mode.

Mean>Median>Mode

[Figure: right-skewed distribution with the mode, median, and mean marked]

Page 30: DATA COLLECTION IN RESEARCH

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution

Negatively skewed: Mean and median are to the left of the mode.

Mean<Median<Mode

[Figure: left-skewed distribution with the mean, median, and mode marked]

Page 31: DATA COLLECTION IN RESEARCH

Geometric Mean

The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:

$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$

The geometric mean is used to average percents, indexes, and relatives.

Page 32: DATA COLLECTION IN RESEARCH

Example 7

The interest rates on three bonds were 5, 21, and 4 percent.

The arithmetic mean is (5+21+4)/3 = 10.0.

The geometric mean is

$GM = \sqrt[3]{(5)(21)(4)} = 7.49$

The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
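Example 7 can be verified with `statistics.geometric_mean` (Python 3.8 or later). This is a small sketch; the variable names are chosen for illustration.

```python
import statistics

rates = [5, 21, 4]  # bond interest rates in percent, from Example 7

arithmetic = statistics.mean(rates)
# nth root of the product of the n values.
geometric = statistics.geometric_mean(rates)
print(arithmetic, round(geometric, 2))
```

The geometric mean comes out below the arithmetic mean because the single large rate of 21 percent pulls the product up less than it pulls the sum.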

Page 33: DATA COLLECTION IN RESEARCH

Geometric Mean continued

Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.

$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$

[Figure: Growth in Sales 1999-2004; x-axis: Year; y-axis: Sales in Millions ($)]

Page 34: DATA COLLECTION IN RESEARCH

Example 8

The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.

$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127$

That is, the geometric mean rate of increase is 1.27%.
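The rate of increase in Example 8 can be reproduced as follows. The sketch assumes n = 8 periods (1992 to 2000), which is what the worked figure of 1.27% implies; the variable names are illustrative.

```python
start_value = 755_000   # enrolment in 1992
end_value = 835_000     # enrolment in 2000
periods = 2000 - 1992   # n = 8 yearly periods

# Geometric mean rate of increase: nth root of (end / start), minus 1.
growth_rate = (end_value / start_value) ** (1 / periods) - 1
print(round(growth_rate, 4))
```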

Page 35: DATA COLLECTION IN RESEARCH


Page 36: DATA COLLECTION IN RESEARCH

Measures of Dispersion

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value – Smallest value

Page 37: DATA COLLECTION IN RESEARCH

Example 9

The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.

-8.1    3.2    5.9    8.1   12.3
-5.1    4.1    6.3    9.2   13.3
-3.1    4.6    7.9    9.5   14.0
-1.4    4.8    7.9    9.7   15.0
 1.2    5.7    8.0   10.3   22.1

Highest value: 22.1. Lowest value: -8.1.

Range = Highest value – Lowest value = 22.1 - (-8.1) = 30.2

Page 38: DATA COLLECTION IN RESEARCH

Mean Deviation

Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

$MD = \frac{\sum |X - \bar{X}|}{n}$

The main features of the mean deviation are:

All values are used in the calculation.

It is not unduly influenced by large or small values.

The absolute values are difficult to manipulate.

Page 39: DATA COLLECTION IN RESEARCH

Example 10

The weights of a sample of crates containing books for the INSANIAH Library (in pounds) are:

103, 97, 101, 106, 103

Find the mean deviation.

$\bar{X} = \frac{510}{5} = 102$

The mean deviation is:

$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \cdots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$
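The mean deviation in Example 10 can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
weights = [103, 97, 101, 106, 103]  # crate weights in pounds, from Example 10

mean = sum(weights) / len(weights)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in weights) / len(weights)
print(mean, mean_deviation)
```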

Page 40: DATA COLLECTION IN RESEARCH

Variance and Standard Deviation

Variance: The arithmetic mean of the squared deviations from the mean.

Standard deviation: The square root of the variance.

Page 41: DATA COLLECTION IN RESEARCH

Population Variance

The major characteristics of the Population Variance are:

All values are used in the calculation.

Unlike the range, it does not depend only on the two extreme values, although squaring the deviations makes it sensitive to them.

The units are awkward: the square of the original units.

Page 42: DATA COLLECTION IN RESEARCH

Variance and standard deviation

Population Variance formula:

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

$X$ is the value of an observation in the population

$\mu$ is the arithmetic mean of the population

$N$ is the number of observations in the population

Population Standard Deviation formula:

$\sigma = \sqrt{\sigma^2}$

Page 43: DATA COLLECTION IN RESEARCH

Example 9 continued

In Example 9, the variance and standard deviation are:

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \cdots + (22.1 - 6.62)^2}{25} = 42.227$

$\sigma = \sqrt{42.227} = 6.498$
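The Example 9 figures can be reproduced with the population functions in Python's `statistics` module (`pvariance` and `pstdev` divide by N). The sketch below re-enters the 25 Return on Equity values from the Example 9 table.

```python
import statistics

# Return on Equity of the 25 companies in Example 9.
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

mu = statistics.mean(roe)
variance = statistics.pvariance(roe)   # divides by N (population formula)
std_dev = statistics.pstdev(roe)       # square root of the population variance
print(round(mu, 2), round(variance, 3), round(std_dev, 3))
```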

Page 44: DATA COLLECTION IN RESEARCH

Sample variance and standard deviation

Sample variance ($s^2$):

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Sample standard deviation ($s$):

$s = \sqrt{s^2}$

Page 45: DATA COLLECTION IN RESEARCH

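Example 11

The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.

Find the sample variance and standard deviation.

$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \cdots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{4} = 5.30$

$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$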
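Example 11 can be checked with `statistics.variance` and `statistics.stdev`, which use the sample formula with n - 1 in the denominator. A minimal sketch:

```python
import statistics

wages = [7, 5, 11, 8, 6]  # hourly wages in dollars, from Example 11

sample_mean = statistics.mean(wages)
sample_variance = statistics.variance(wages)  # divides by n - 1
sample_std = statistics.stdev(wages)
print(sample_mean, round(sample_variance, 2), round(sample_std, 2))
```

Using `pvariance` here instead would divide by n and give a smaller value, which is the difference between the population and sample formulas.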

Page 46: DATA COLLECTION IN RESEARCH
Page 47: DATA COLLECTION IN RESEARCH
Page 48: DATA COLLECTION IN RESEARCH

Cumulative Frequency Polygon

Histogram & Frequency Polygon

Page 49: DATA COLLECTION IN RESEARCH


Page 50: DATA COLLECTION IN RESEARCH


Page 51: DATA COLLECTION IN RESEARCH

Describing Data – Measures of Location

[For Grouped Data]

MEAN

MEDIAN

MODE

Page 52: DATA COLLECTION IN RESEARCH

The Mean of Grouped Data

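The Mean of a sample of data organized in a frequency distribution is computed by the following formula:

$\bar{X} = \frac{\sum fX}{n}$

where $X$ is the class midpoint, $f$ is the class frequency, and $n$ is the total number of observations.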

Page 53: DATA COLLECTION IN RESEARCH

Example 12

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing    Frequency f    Class midpoint X    (f)(X)
1 up to 3         1              2                   2
3 up to 5         2              4                   8
5 up to 7         3              6                   18
7 up to 9         1              8                   8
9 up to 11        3              10                  30
Total             10                                 66

$\bar{X} = \frac{\sum fX}{n} = \frac{66}{10} = 6.6$
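The grouped mean in Example 12 can be sketched in Python by representing each class as a (lower limit, upper limit, frequency) tuple. That representation and the variable names are assumptions made for this illustration.

```python
# Each class from Example 12: (lower limit, upper limit, frequency).
movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

n = sum(f for _, _, f in movies)

# Sum of frequency times class midpoint, where midpoint = (lower + upper) / 2.
total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in movies)

grouped_mean = total_fx / n
print(n, total_fx, grouped_mean)
```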

Page 54: DATA COLLECTION IN RESEARCH

The Median of Grouped Data

The Median of a sample of data organized in a frequency distribution is computed by:

$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$

where $L$ is the lower limit of the median class, $CF$ is the cumulative frequency preceding the median class, $f$ is the frequency of the median class, and $i$ is the median class interval.

Page 55: DATA COLLECTION IN RESEARCH

Finding the Median Class

To determine the median class for grouped data:

1. Construct a cumulative frequency distribution.

2. Divide the total number of data values by 2.

3. Determine which class will contain this value. For example, if n=50, 50/2 = 25, then determine which class will contain the 25th value.

Page 56: DATA COLLECTION IN RESEARCH

Example 12 continued

Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10

Page 57: DATA COLLECTION IN RESEARCH

Example 12 continued

From the table, L=5, n=10, f=3, i=2, CF=3

$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$
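The median-class search and the grouped-median formula can be combined in one short Python function. This is a sketch under stated assumptions: the helper name `grouped_median` and the (lower limit, upper limit, frequency) representation are hypothetical, and the classes are assumed to be listed in ascending order.

```python
def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i for classes given as
    (lower_limit, upper_limit, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # CF: cumulative frequency before the current class
    for lower, upper, f in classes:
        # The median class is the first class whose cumulative
        # frequency reaches n / 2.
        if f > 0 and cumulative + f >= n / 2:
            interval = upper - lower  # i: width of the median class
            return lower + (n / 2 - cumulative) / f * interval
        cumulative += f
    raise ValueError("no observations in the frequency distribution")

# Frequency distribution from Example 12.
movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]
median = grouped_median(movies)
print(round(median, 2))
```

For the Example 12 data the loop stops at the class "5 up to 7", so L=5, CF=3, f=3 and i=2, the same values read from the table.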

Page 58: DATA COLLECTION IN RESEARCH

BUSINESS STATISTICS: LECTURE NOTES [Shaya'a Othman]

Page 59: DATA COLLECTION IN RESEARCH