DATA COLLECTION IN RESEARCH

1. Overview of Statistics & Collection of Data

1.1 Introduction to statistics – Definition, types, basic terms, levels of data measurement.

1.2 Methods of Collection of Data – Census & Sampling Methods

Definition of Statistics (Shaya'a Othman)

“Statistics is a scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.”

OVERVIEW OF STATISTICS

DEFINITION

Collecting and publishing numerical data.

A scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.

DESCRIPTIVE STATISTICS: Methods of organizing and presenting data in an informative way.

INFERENTIAL STATISTICS: Methods of determining something about a population based on a sample.

TYPES OF DATA

Qualitative or attribute (type of car owned)

Quantitative or numerical:

Discrete (number of children)

Continuous (time taken for an exam)

Levels of Measurement

Nominal

Ordinal

Interval

Ratio

ETHICS

Misleading data can arise from the use of averages, the use of graphics, and the use of association.

COMPUTER

Computer applications: Microsoft Excel, SPSS, NVivo (CAQDAS)

Collection of Data

SOURCES

Primary data

Secondary data

Government publications. Malaysian government publications: Statistics Dept., PM Dept.; Economic Planning Unit, PM Dept.; research institutions (RRI, PORIM, MARDI).

International organizations: United Nations, OIC, ASEAN, World Bank, Islamic Development Bank.

Private publications/data: private survey/research companies.

Internet, websites (CIA data).

TECHNIQUES

Census [total count of population]

Sample [selected count of population]

Sampling techniques: systematic sampling, stratified sampling, multi-stage sampling, cluster sampling, quota sampling.

METHODS OF COLLECTING

Interview forms: direct/phone

Mailing questionnaires

Computer: e-mail, e-fax, etc.

Mobile phone: SMS

Internet

RESEARCH METHODOLOGY

WHAT IS A HYPOTHESIS?

5-STEP PROCEDURE FOR TESTING A HYPOTHESIS

STEP 1: State the null and alternative hypotheses

Null hypothesis: H₀: μ = 0. Alternative hypothesis: H₁: μ ≠ 0.

Note: 1. Use a two-tailed test if the alternative hypothesis does not state a direction [greater or less]. 2. Use a one-tailed test if the alternative hypothesis states a direction.

STEP 2: Select the level of significance

1. .05 level [5% level] – for consumer research. 2. .01 level [1% level] – for quality assurance. 3. .10 level [10% level] – for political polling.

STEP 3: Identify the test statistic

z and t are the usual test statistics; others include F and χ² (chi-square), the latter used in non-parametric tests.

STEP 4: Formulate the decision rule

Find the critical value of z from the normal distribution table, or the value of t from the t distribution table, where appropriate.

STEP 5: Take a sample and arrive at a decision

Only ONE DECISION is possible in hypothesis testing: do not reject the null hypothesis, or reject the null hypothesis and accept the alternative hypothesis.
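The five steps can be traced in a few lines of Python. This is only a sketch for a one-sample z test: the function name `one_sample_z_test` and the numbers passed to it are hypothetical, and it assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Walk through the five steps for a one-sample z test (sigma assumed known)."""
    # Step 1: H0: mu = mu0; H1: mu != mu0 (two-tailed) or mu > mu0 (one-tailed).
    # Step 2: alpha is the chosen level of significance.
    # Step 3: the test statistic is z.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: the decision rule comes from the standard normal distribution.
    if two_tailed:
        critical = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        critical = NormalDist().inv_cdf(1 - alpha)
        reject = z > critical
    # Step 5: exactly one decision is possible.
    decision = "reject H0" if reject else "do not reject H0"
    return z, critical, decision


# Hypothetical numbers, for illustration only.
z, critical, decision = one_sample_z_test(
    sample_mean=203.5, mu0=200, sigma=16, n=50, alpha=0.01
)
print(round(z, 2), round(critical, 2), decision)
```

With these made-up inputs the computed z is smaller than the two-tailed critical value, so the null hypothesis is not rejected.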

Possibility of two types of error [Type I and Type II]

Two-Tail Test [no direction]

One-Tail Test [with direction: greater or less than]

One-Sample Tests of Hypothesis

Large sample [n more than 30], using the normal distribution table:

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Small sample [n less than 30], using the t distribution table with df = n - 1:

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

Two-Sample Tests of Hypothesis

Large sample [n more than 30], using the normal distribution table:

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]

Small sample [n less than 30], using the t distribution table with df = n₁ + n₂ - 2:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

[Figure: one-tailed test on the standard normal curve. Critical value z = 1.65; "do not reject" region to the left of it (probability = .95); region of rejection to the right (probability = .05).]
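As a rough illustration of the two-sample z statistic and of the one-tailed critical value quoted for the .05 level, here is a short Python sketch. The function names and the sample figures are assumptions made for the example, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic for the difference between two sample means."""
    return (mean1 - mean2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def one_tailed_critical_z(alpha):
    """Critical value that leaves probability alpha in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha)


# About 1.645; printed tables usually round this to 1.65.
critical_05 = one_tailed_critical_z(0.05)

# Hypothetical sample figures, for illustration only.
z_two = two_sample_z(mean1=5.5, mean2=5.3, sigma1=0.4, sigma2=0.3, n1=50, n2=100)
print(round(critical_05, 3), round(z_two, 2))
```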

STATISTICAL TEST OF HYPOTHESIS

Hypothesis – “A supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.” Oxford Dictionary

Hypothesis – “A statement or conjecture which is neither true nor false, subject to verification.” Shayaa Othman, KUIS

Hypothesis – “A statement about a population parameter developed for the purpose of testing.” Douglas A. Lind, Statistical Techniques in Business and Economics

Hypothesis Testing – “A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.” Douglas A. Lind, Statistical Techniques in Business and Economics

Null Hypothesis – “A statement about the value of a population parameter.” Douglas A. Lind, Statistical Techniques in Business and Economics

Alternative Hypothesis – “A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false.” Douglas A. Lind, Statistical Techniques in Business and Economics

Describing Data – Measures of Location

Population Mean = Sum of all the values in the population / Number of values in the population

Sample Mean = Sum of values in the sample / Number of values in the sample = Σx / n

Weighted Mean = Σ(wx) / Σw

Parameter = A characteristic of a population

Median = The midpoint of the values after they have been ordered from the smallest to the largest

Mode = The value of the observation that appears most frequently

Describing Data – Measures of Dispersion

Range = Largest value – Smallest value

Mean Deviation = The arithmetic mean of the absolute values of the deviations from the arithmetic mean

\[ MD = \frac{\sum |X - \bar{X}|}{n} \]

where Σ is sigma [sum of]; X = value of each observation; X̄ = arithmetic mean of the values; n is the number of observations; | | indicates absolute value

Variance = The arithmetic mean of the squared deviations from the mean

Standard Deviation = The square root of the variance

Location of a Percentile:

\[ L_p = (n + 1)\frac{P}{100} \]

Characteristics of the Mean

The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.

The major characteristics of the mean are:

It requires the interval scale.

All values are used.

It is unique.

The sum of the deviations from the mean is 0.

It is calculated by summing the values and dividing by the number of values.

Population Mean

For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:

\[ \mu = \frac{\sum X}{N} \]

where µ is the population mean, N is the total number of observations, X is a particular value, and Σ indicates the operation of adding.

Example 1

A Parameter is a measurable characteristic of a population.

AHMAD’s family owns four cars. The following is the current mileage on each of the four cars:

56,000, 23,000, 42,000, 73,000

Find the mean mileage for the cars.

\[ \mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500 \]
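The mean mileage in Example 1 can be checked with a short Python sketch. The helper name `population_mean` is an assumption for illustration; the sketch also confirms the property that the deviations from the mean sum to 0.

```python
def population_mean(values):
    """Sum of all population values divided by the number of values."""
    return sum(values) / len(values)


mileage = [56_000, 23_000, 42_000, 73_000]  # the four cars in Example 1
mu = population_mean(mileage)

# The deviations from the mean always sum to zero.
deviation_sum = sum(x - mu for x in mileage)
print(mu, deviation_sum)
```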

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

\[ \bar{X} = \frac{\sum X}{n} \]

where n is the total number of values in the sample.

Example 2

A statistic is a measurable characteristic of a sample.

A sample of five executives received the following bonuses last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

\[ \bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4 \]

Example 4

During a one-hour period on a hot Saturday afternoon in Langkawi, Ahmad sold fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.

\[ \bar{X}_w = \frac{\sum wX}{\sum w} = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89 \]
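A minimal Python sketch of the weighted mean in Example 4 follows. The function name `weighted_mean` is hypothetical, and the top price is taken as $1.15, the figure that appears in the worked calculation.

```python
def weighted_mean(values, weights):
    """Sum of weight times value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.15]   # price per drink, in dollars
drinks_sold = [5, 15, 15, 15]       # number of drinks sold at each price
avg_price = weighted_mean(prices, drinks_sold)
print(round(avg_price, 2))
```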

The Median

The Median is the midpoint of the values after they have been ordered from the smallest to the largest.

There are as many values above the median as below it in the data array.

For an even set of values, the median will be the arithmetic average of the two middle numbers and is found at the (n+1)/2 ranked observation.

The median (continued)

The ages for a sample of seven INSANIAH students visiting the Islamic Artifact Exhibition:

21, 25, 19, 20, 22, 18, 27.

Arranging the data in ascending order gives:

18, 19, 20, 21, 22, 25, 27

Thus the median = 21.

Example 5

The heights of 3 INSANIAH lecturers, in inches, are: 76, 73, 80.

Arranging the data in ascending order gives:

73, 76, 80

The median is found at the (n+1)/2 = (3+1)/2 = 2nd data point.

Thus the median is 76.
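The ordering-then-midpoint rule for the median can be sketched in Python as below. The function name `median` is an illustrative choice; the even-length list in the last line is hypothetical data added to show the averaging of the two middle values.

```python
def median(values):
    """Midpoint of the values after ordering them from smallest to largest."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values


ages = [21, 25, 19, 20, 22, 18, 27]   # student ages from the slides
heights = [76, 73, 80]                # lecturer heights from Example 5
print(median(ages), median(heights), median([1, 2, 3, 4]))
```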

The Mode

The Mode is another measure of location and represents the value of the observation that appears most frequently.

Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and the like.
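Since data can have more than one mode, a sketch that returns every most-frequent value may be useful. The function name `modes` and the second, bimodal list are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]  # exam scores from Example 6
score_modes = modes(scores)                         # 81 occurs three times
bimodal_example = modes([1, 1, 2, 2, 3])            # hypothetical data with two modes
print(score_modes, bimodal_example)
```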

The Relative Positions of the Mean, Median, and Mode

Symmetric distribution: A distribution having the same shape on either side of the center.

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Zero skewness: Mean = Median = Mode

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution

Positively skewed: Mean and Median are to the right of the Mode.

Mean > Median > Mode

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution

Negatively skewed: Mean and Median are to the left of the Mode.

Mean < Median < Mode

Geometric Mean

The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:

\[ GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)} \]

The geometric mean is used to average percents, indexes, and relatives.

Example 7

The interest rates on three bonds were 5, 21, and 4 percent.

The arithmetic mean is (5+21+4)/3 = 10.0.

The geometric mean is

\[ GM = \sqrt[3]{(5)(21)(4)} = 7.49 \]

The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
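The comparison in Example 7 can be reproduced with a small Python sketch. The function name `geometric_mean` is chosen here for illustration, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def geometric_mean(values):
    """nth root of the product of the n values."""
    return prod(values) ** (1 / len(values))


rates = [5, 21, 4]                      # bond interest rates, in percent
arithmetic = sum(rates) / len(rates)    # 10.0
gm = geometric_mean(rates)              # smaller than the arithmetic mean
print(arithmetic, round(gm, 2))
```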

Geometric Mean continued

Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.

\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \]

[Figure: Growth in Sales 1999-2004. Horizontal axis: Year; vertical axis: Sales in Millions ($).]

Example 8

The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.

\[ GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127 \]

That is, the geometric mean rate of increase is 1.27%.
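The average rate of increase in Example 8 can be sketched as follows. The function name `average_growth_rate` is an assumption; the number of periods is the eight years from 1992 to 2000.

```python
def average_growth_rate(start_value, end_value, periods):
    """Geometric mean rate of increase per period."""
    return (end_value / start_value) ** (1 / periods) - 1


rate = average_growth_rate(start_value=755_000, end_value=835_000, periods=2000 - 1992)
print(f"{rate:.2%}")
```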

Measures of Dispersion

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value – Smallest value

Example 9

The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.

-8.1   3.2   5.9   8.1  12.3
-5.1   4.1   6.3   9.2  13.3
-3.1   4.6   7.9   9.5  14.0
-1.4   4.8   7.9   9.7  15.0
 1.2   5.7   8.0  10.3  22.1

Highest value: 22.1. Lowest value: -8.1.

Range = Highest value – Lowest value = 22.1 – (-8.1) = 30.2

Mean Deviation

Mean Deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean.

\[ MD = \frac{\sum |X - \bar{X}|}{n} \]

The main features of the mean deviation are:

All values are used in the calculation.

It is not unduly influenced by large or small values.

The absolute values are difficult to manipulate.

Example 10

The weights of a sample of crates containing books for the INSANIAH Library (in pounds) are:

103, 97, 101, 106, 103

Find the mean deviation.

\( \bar{X} = 102 \)

The mean deviation is:

\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4 \]
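The mean deviation of the crate weights can be checked with the sketch below. The function name `mean_deviation` is an illustrative assumption.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


weights = [103, 97, 101, 106, 103]   # crate weights in pounds, from Example 10
md = mean_deviation(weights)
print(md)
```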

Variance and Standard Deviation

Variance: the arithmetic mean of the squared deviations from the mean.

Standard deviation: the square root of the variance.

Population Variance

The major characteristics of the Population Variance are:

All values are used in the calculation.

It does not depend only on the two extreme values, as the range does.

The units are awkward: the square of the original units.

Population Variance formula:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

X is the value of an observation in the population

μ is the arithmetic mean of the population

N is the number of observations in the population

Population Standard Deviation formula:

\[ \sigma = \sqrt{\sigma^2} \]

Example 9 continued

In Example 9, the variance and standard deviation are:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227 \]

\[ \sigma = \sqrt{42.227} = 6.498 \]
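The figures for Example 9 can be reproduced from the 25 Return on Equity values with the Python sketch below. The function name `population_variance` is an assumption made for illustration; the data are the values listed in Example 9.

```python
from math import sqrt


def population_variance(values):
    """Arithmetic mean of the squared deviations from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

value_range = max(roe) - min(roe)     # highest value minus lowest value
variance = population_variance(roe)
std_dev = sqrt(variance)
print(round(value_range, 1), round(variance, 3), round(std_dev, 3))
```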

Sample variance and standard deviation

Sample variance (s²):

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

Sample standard deviation (s):

\[ s = \sqrt{s^2} \]

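Example 11

The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.

Find the sample variance and standard deviation.

\[ \bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40 \]

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30 \]

\[ s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \]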
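A short Python sketch of Example 11 follows. The function name `sample_variance` is an illustrative choice; note the divisor n - 1 rather than n, which is what distinguishes the sample variance from the population variance.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


wages = [7, 5, 11, 8, 6]   # hourly wages in dollars, from Example 11
s2 = sample_variance(wages)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```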

[Figures: Histogram & Frequency Polygon; Cumulative Frequency Polygon]


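Describing Data – Measures of Location [For Grouped Data]: Mean, Median, Mode

The Mean of Grouped Data

The Mean of a sample of data organized in a frequency distribution is computed by the following formula:

\[ \bar{X} = \frac{\sum fX}{n} \]

where X is the class midpoint, f is the class frequency, and n is the total number of values.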

Example 12

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing    Frequency f    Class midpoint X    (f)(X)
1 up to 3         1              2                   2
3 up to 5         2              4                   8
5 up to 7         3              6                   18
7 up to 9         1              8                   8
9 up to 11        3              10                  30
Total             10                                 66

\[ \bar{X} = \frac{\sum fX}{n} = \frac{66}{10} = 6.6 \]
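The grouped-data mean in Example 12 can be sketched in Python as below. The function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for the example; each class midpoint is taken as the average of its two limits.

```python
def grouped_mean(classes):
    """Mean of a frequency distribution, using class midpoints.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# "Movies showing" classes and frequencies from Example 12.
movie_classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]
mean_movies = grouped_mean(movie_classes)
print(mean_movies)
```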

The Median of Grouped Data

The Median of a sample of data organized in a frequency distribution is computed by:

\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) \]

where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.

Finding the Median Class

To determine the median class for grouped data

Construct a cumulative frequency distribution.

Divide the total number of data values by 2.

Determine which class will contain this value. For example, if n=50, 50/2 = 25, then determine which class will contain the 25th value.


Example 12 continued

Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10

From the table, L=5, n=10, f=3, i=2, CF=3

\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33 \]
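Finding the median class and applying the grouped-median formula can be combined in one Python sketch. The function name `grouped_median` and the tuple layout (lower limit, upper limit, frequency) are assumptions for illustration; the classes are those of Example 12.

```python
def grouped_median(classes):
    """Median of a frequency distribution: L + ((n/2 - CF) / f) * i.

    classes: list of (lower_limit, upper_limit, frequency), in ascending order.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # CF: cumulative frequency preceding the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:   # this class contains the (n/2)th value
            width = upper - lower    # i: the class interval
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no classes given")


movie_classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]
median_movies = grouped_median(movie_classes)
print(round(median_movies, 2))
```

The loop stops at the class "5 up to 7", where the cumulative frequency first reaches n/2 = 5, which matches L=5, CF=3, f=3 and i=2 in the table.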

BUSINESS STATISTICS: LECTURE NOTE [Shayaa Othman]
