BAHADAR SHAH, CHAIRMAN, DEPARTMENT OF MANAGEMENT SCIENCES
8/14/2019 Lecture SamplingJJJJJKK
http://slidepdf.com/reader/full/lecture-samplingjjjjjkk 1/134
SAMPLING: A Scientific
Method of Data Collection
SAMPLE
• A sample is a subset of units selected from the population
• It represents the population
• Purpose: to draw inferences about the population
It is very difficult to study each and every unit of the population when the population units are heterogeneous
Time Constraints
Finance
When the population has significant variation (is heterogeneous), observations of multiple individuals are needed to find all the possible characteristics that may exist
Population
The entire group of people of interest from whom the
researcher needs to obtain information
Element (sampling unit)
One unit from a population
Sampling
The selection of a subset of the population through various sampling techniques
Sampling Frame
A listing of the population from which a sample is chosen. The sampling frame for any probability sample is a complete list of all the cases in the population from which your sample will be drawn
Population Vs. Sample
[Figure: a sample drawn from the population of interest]
Population → Parameter
Sample → Statistic
We measure the sample using statistics in order to draw inferences about the
population and its parameters.
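A minimal Python sketch of the parameter/statistic distinction, assuming a hypothetical population of 100 scores (1 to 100): the population mean is a parameter, and the mean of a random sample drawn from it is a statistic used to estimate that parameter.

```python
import random
import statistics

# Hypothetical population: scores 1..100 (an assumption for illustration)
population = list(range(1, 101))
mu = statistics.mean(population)          # parameter: describes the whole population

rng = random.Random(7)                    # fixed seed so the draw is repeatable
sample = rng.sample(population, 10)       # simple random sample of 10 units
x_bar = statistics.mean(sample)           # statistic: computed from the sample only

print(f"parameter mu = {mu}, statistic x_bar = {x_bar}")
```

A different seed gives a different x_bar, while mu stays fixed; that sample-to-sample variation is what inference has to account for.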
Representative
Accessible
Low cost
SAMPLING
[Figure: the sampling process. Population (what you want to talk about) → Sampling Frame → Sample (what you actually observe in the data); Inference runs from the sample back to the population.]
Define the population
Identify the sampling frame
Select a sampling design or procedure
Determine the sample size
Draw the sample
Classification of Sampling Methods
Sampling Methods
Probability Samples: Simple Random, Systematic, Stratified, Cluster, Multistage
Non-probability Samples: Quota, Judgment, Convenience, Snowball
Every unit of the population has an equal chance of being selected as a sampling unit
Also called formal sampling or random sampling
Probability samples are more accurate (less prone to selection bias)
Probability samples allow us to estimate the accuracy of the sample
Simple Random Sampling
Stratified Sampling
Cluster Sampling
Systematic Sampling
Multistage Sampling
Simple Random Sampling
The purest form of probability sampling
Assures each element in the population has an
equal chance of being included in the sample
Random number generators
Simple random sampling
Types of Simple Random Sampling
With replacement
Without replacement
With replacement
A unit, once selected, is returned to the population and has a chance of being selected again
Without replacement
A unit, once selected, cannot be selected again
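Python's standard `random` module offers both variants directly, which makes the difference easy to see. A minimal sketch, assuming a hypothetical frame of 50 numbered units: `choices` draws with replacement (repeats are possible), `sample` draws without replacement (all selected units are distinct).

```python
import random

rng = random.Random(42)                  # seeded for a repeatable illustration
population = list(range(1, 51))          # hypothetical frame: units numbered 1..50

# With replacement: a unit can be drawn more than once
with_replacement = rng.choices(population, k=5)

# Without replacement: every selected unit is distinct
without_replacement = rng.sample(population, k=5)

print(with_replacement, without_replacement)
```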
Tippett method
Lottery Method
Random Table
6 8 4 2 5 7 9 5 4 1 2 5 6 3 2 1 4 0
5 8 2 0 3 2 1 5 4 7 8 5 9 6 2 0 2 4
3 6 2 3 3 3 2 5 4 7 8 9 1 2 0 3 2 5
9 8 5 2 6 3 0 1 7 4 2 4 5 0 3 6 8 6
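The random table method can be sketched as follows, assuming a hypothetical population of 50 units numbered 01 to 50. We read the digits of the table above row by row in consecutive pairs (treating the first printed line as two rows of 18 digits, and choosing this reading order, are our assumptions), keep numbers from 1 to 50, and skip repeats, which gives sampling without replacement.

```python
# Digits of the random number table above, read row by row (reading order is an assumption)
TABLE = (
    "684257954125632140"
    "582032154785962024"
    "362333254789120325"
    "985263017424503686"
)

def select_from_table(digits, population_size, n):
    """Read consecutive two-digit numbers; keep those in 1..population_size, skipping repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen

print(select_from_table(TABLE, population_size=50, n=5))  # → [42, 41, 25, 21, 40]
```

Two-digit reading only works for populations of up to 99 units; a larger frame would need three-digit groups.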
Disadvantage
High cost; low frequency of use
Requires sampling frame
Does not use researchers’ expertise
Larger risk of random error than stratified
The population is divided into two or more groups, called strata, according to some criterion, such as geographic location, grade level, age, or income, and subsamples are randomly selected from each stratum.
Elements within each stratum are homogeneous, but are heterogeneous across strata
Stratified Random Sampling
Types of Stratified Random Sampling
Proportionate Stratified Random Sampling
An equal proportion (the same sampling fraction) of units is selected from each stratum
Disproportionate Stratified Random Sampling
Also called the equal allocation technique; the number of units drawn from each stratum is decided according to analytical considerations rather than stratum size
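Proportionate allocation can be sketched in a few lines. This is a minimal illustration, assuming hypothetical strata (600 urban and 400 rural households) and a total sample of 50: each stratum receives n × N_h / N units, and a simple random sample is drawn inside each stratum.

```python
import random

def proportionate_stratified_sample(strata, n, rng=random):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)        # same sampling fraction n/N in every stratum
        sample[name] = rng.sample(units, n_h)  # SRS without replacement inside the stratum
    return sample

# Hypothetical frame: 600 urban and 400 rural households, total sample of 50
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
result = proportionate_stratified_sample(strata, 50, random.Random(1))
print({k: len(v) for k, v in result.items()})  # → {'urban': 30, 'rural': 20}
```

Because each stratum size is rounded, the allocations may not add up exactly to n for awkward stratum sizes; a real design would adjust the largest stratum to absorb the difference.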
Advantage
Assures that all groups needed are represented in the sample
Characteristics of each stratum can be estimated and comparisons made
Reduces variability compared with systematic sampling
Disadvantage
Requires accurate information on proportions of each stratum
Stratified lists costly to prepare
Cluster Sampling
[Figure: population divided into clusters, Section 1 to Section 5]
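In cluster sampling the random draw is made over whole clusters (here, sections) rather than individuals, and every unit in a chosen cluster is included. A minimal sketch, assuming five hypothetical sections of 20 members each:

```python
import random

def cluster_sample(clusters, m, rng=random):
    """Randomly choose m whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: five sections, each listing its members
clusters = {f"Section {i}": [f"S{i}-{j}" for j in range(1, 21)] for i in range(1, 6)}
picked = cluster_sample(clusters, 2, random.Random(3))
print(sorted(picked))
```

Only the list of sections is needed to make the draw; member lists matter only for the sections actually chosen.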
Advantage
Low cost/high frequency of use
Requires list of all clusters, but only of individuals within chosen
clusters
Can estimate characteristics of both cluster and population
For multistage, has strengths of used methods
Useful when researchers lack a good sampling frame for a dispersed population
Disadvantage
The cost to reach an element to sample is very high
Usually less expensive than SRS but not as accurate
Each stage in cluster sampling introduces sampling error—the more stages there are, the more error there tends to be
Systematic Random Sampling
Order all units in the sampling frame based on some variable, then select every kth unit on the list
Gaps between selected elements are equal and constant, so there is periodicity.
k = N / n = sampling interval (N = population size, n = sample size)
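A minimal sketch of systematic selection, assuming a hypothetical ordered frame of N = 100 units and n = 10: the interval is k = N // n, a random start is chosen within the first interval, and every kth unit after it is taken.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select every k-th unit after a random start, where k = N // n."""
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random start within the first interval
    return frame[start::k][:n]       # equal gaps of k between selected units

frame = list(range(1, 101))          # hypothetical ordered frame of N = 100 units
chosen = systematic_sample(frame, 10, random.Random(5))
print(chosen)
```

Only the start is random; once it is fixed, the rest of the sample is determined, which is why a periodic pattern in the list can bias the result.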
Advantage
Moderate cost; moderate usage
External validity high; internal validity high;
statistical estimation of error
Simple to draw sample; easy to verify
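Disadvantage
Periodic ordering: if the list repeats in a cycle that matches the sampling interval, the sample can be biased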
Requires sampling frame
Multistage Random Sampling
[Figure: Primary clusters (1 to 10) → Secondary clusters (1 to 15) → Simple random sampling within secondary clusters]
Select all schools; then sample within schools
Sample schools; then measure all students
Sample schools; then sample students
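The last design in the list (sample schools, then sample students) is a two-stage sample. A minimal sketch, assuming a hypothetical frame of 10 schools with 30 students each: stage 1 draws schools at random, stage 2 draws students at random within each chosen school.

```python
import random

def two_stage_sample(schools, n_schools, n_students, rng=random):
    """Stage 1: SRS of schools. Stage 2: SRS of students within each chosen school."""
    chosen = rng.sample(sorted(schools), n_schools)
    return {s: rng.sample(schools[s], min(n_students, len(schools[s]))) for s in chosen}

# Hypothetical frame: 10 schools with 30 students each
schools = {f"School {i}": [f"{i}-{j}" for j in range(30)] for i in range(1, 11)}
result = two_stage_sample(schools, n_schools=3, n_students=5, rng=random.Random(2))
print(result)
```

Student lists are only needed for the three selected schools, which is the practical appeal of multistage designs.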
Non Probability Sampling
Involves non-random methods in selection of the sample
Units do not all have an equal chance of being selected
Selection depends on the situation
Considerably less expensive
Convenient
Sample can be chosen in many ways
Purposive Sampling
Quota sampling (larger populations)
Snowball sampling
Self-selection sampling
Convenience sampling
Purposive Sampling
Also called judgment Sampling
The sampling procedure in which an experienced researcher selects the sample based on some appropriate characteristic of sample members… to serve a purpose
When taking the sample, reject people who do not fit a particular profile
Start with a purpose in mind
Demerit
Biased selection of the sample may occur
Time consuming process
Quota Sampling
The population is divided into cells on the basis of
relevant control characteristics.
A quota of sample units is established for each cell
A convenience sample is drawn for each cell until the
quota is met
It is entirely non random and it is normally used for
interview surveys
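The quota procedure above can be sketched as a loop over respondents in the order they happen to turn up (the convenience element). This is a minimal illustration with hypothetical arrivals, a single control characteristic (gender) as the cell, and hypothetical quotas; a respondent is accepted only while their cell is still open.

```python
from collections import Counter

def quota_sample(stream, cell_of, quotas):
    """Accept respondents in arrival order until every cell's quota is filled."""
    counts = Counter()
    sample = []
    for person in stream:
        cell = cell_of(person)
        if counts[cell] < quotas.get(cell, 0):   # cell still open: accept
            sample.append(person)
            counts[cell] += 1
            if all(counts[c] >= q for c, q in quotas.items()):
                break                            # all quotas met: stop
    return sample

# Hypothetical arrivals (id, gender) and quotas for each cell
arrivals = [(1, "M"), (2, "M"), (3, "F"), (4, "M"), (5, "M"), (6, "F"), (7, "F")]
chosen = quota_sample(arrivals, cell_of=lambda p: p[1], quotas={"F": 2, "M": 3})
print([pid for pid, _ in chosen])   # → [1, 2, 3, 4, 6]
```

Nothing in the loop is random, so which individuals end up in each cell depends entirely on who arrives first.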
Merit
Used when research budget is limited
Very extensively used/understood
No need for a list of population elements
Introduces some elements of stratification
Demerit
Variability and bias cannot be measured or controlled
Time Consuming
Projecting data beyond sample not justified
The researcher starts with a key person, who introduces the next one, forming a chain
Make contact with one or two cases in the population
Ask these cases to identify further cases.
Stop when either no new cases are given or the sample is
as large as manageable
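The snowball steps above amount to following a referral chain. A minimal sketch, assuming a hypothetical map of who each contact names as further cases: start from the seed cases, follow each new referral once, and stop when no new cases are given or the sample reaches a manageable size.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from seed cases and follow referrals until none are new or max_size is reached."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:          # only follow new cases
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral chains: who each contact names as further cases
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, seeds=["A"], max_size=10))  # → ['A', 'B', 'C', 'D', 'E', 'F']
```

Everyone after the seed enters only because someone else named them, which is why the sampling units are not independent.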
Merit
Low cost
Useful in specific circumstances
Useful for locating rare populations
Demerit
Bias because sampling units are not independent
Projecting data beyond sample not justified
It occurs when you allow each case, usually an individual, to identify their desire to take part in the research. You therefore:
Publicize your need for cases, either by advertising through appropriate media or by asking them to take part
Collect data from those who respond
Merit
More accurate
Useful in specific circumstances to serve the purpose
Demerit
More costly due to advertising
The masses (those who do not volunteer) are left out
Convenience Sampling
Also called accidental / incidental sampling
Selecting haphazardly those cases that are easiest to obtain
The most available units are chosen
It is done at the “convenience” of the researcher
Merit
Very low cost
Extensively used/understood
No need for list of population elements
Demerit
Variability and bias cannot be measured or controlled
Projecting data beyond sample not justified
Restriction of Generalization
The ever-increasing demand for research has created a need for an efficient method of determining the sample size needed to be representative of a given population. In the article “Small Sample Techniques,” the research division of the National Education Association has published a formula for determining sample size. Regrettably, a table has not been available for ready, easy reference, which could have been constructed using the following formula:
$$ s = \frac{X^2 N P (1 - P)}{d^2 (N - 1) + X^2 P (1 - P)} $$
where s = required sample size; X² = the table value of chi-square for 1 degree of freedom at the desired confidence level (3.841); N = the population size; P = the population proportion (assumed to be .50 since this would provide the maximum sample size); and d = the degree of accuracy expressed as a proportion (.05).
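The formula can be evaluated directly. A minimal Python sketch with the stated defaults (X² = 3.841, P = .50, d = .05); rounding to the nearest whole unit is our assumption, chosen because it reproduces the table entries we checked (for example N = 100 gives 80, N = 1,000 gives 278, N = 1,000,000 gives 384).

```python
def required_sample_size(N, chi2=3.841, P=0.5, d=0.05):
    """s = chi2*N*P*(1-P) / (d^2*(N-1) + chi2*P*(1-P)), rounded to the nearest unit (assumption)."""
    numerator = chi2 * N * P * (1 - P)
    denominator = d ** 2 * (N - 1) + chi2 * P * (1 - P)
    return round(numerator / denominator)

for N in (100, 1000, 10000, 1000000):
    print(N, required_sample_size(N))   # 80, 278, 370, 384
```

Note how slowly s grows: once N is in the tens of thousands, the required sample barely changes.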
N = population size; S = required sample size
N       S       N       S       N         S
10      10      220     140     1200      291
15      14      230     144     1300      297
20      19      240     148     1400      302
25      24      250     152     1500      306
30      28      260     155     1600      310
35      32      270     159     1700      313
40      36      280     162     1800      317
45      40      290     165     1900      320
50      44      300     169     2000      322
55      48      320     175     2200      327
60      52      340     181     2400      331
65      56      360     186     2600      335
70      59      380     191     2800      338
75      63      400     196     3000      341
80      66      420     201     3500      346
85      70      440     205     4000      351
90      73      460     210     4500      354
95      76      480     214     5000      357
100     80      500     217     6000      361
110     86      550     226     7000      364
120     92      600     234     8000      367
130     97      650     242     9000      368
140     103     700     248     10000     370
150     108     750     254     15000     375
160     113     800     260     20000     377
170     118     850     265     30000     379
180     123     900     269     40000     380
190     127     950     274     50000     381
200     132     1000    278     75000     382
210     136     1100    285     1000000   384
In addition to the purpose of the study and population size, three criteria usually will need to be specified to determine the appropriate sample size: the level of precision, the level of confidence or risk, and the degree of variability in the attributes being measured (Miaoulis and Michener, 1976). Each of these is reviewed below.
The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be. This range is often expressed in percentage points (e.g., ±5 percent) in the same way that results for political campaign polls are reported by the media. Thus, if a researcher finds that 60% of farmers in the sample have adopted a recommended practice with a precision rate of ±5%, then he or she can conclude that between 55% and 65% of farmers in the population have adopted the practice.
The confidence or risk level is based on ideas encompassed under the Central Limit Theorem. The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value. Furthermore, the values obtained by these samples are distributed normally about the true value, with some samples having a higher value and some obtaining a lower score than the true population value. In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value (e.g., mean). In other words, if a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified earlier (Figure 1). There is always a chance that the sample you obtain does not represent the true population value. Such samples with extreme values are represented by the shaded areas in Figure 1. This risk is reduced for 99% confidence levels and increased for 90% (or lower) confidence levels.
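A small simulation makes the repeated-sampling idea concrete. This sketch uses assumed values (a true proportion of 0.6, echoing the 60% farmer example, samples of 400, 1,000 repetitions) and counts how often a sample proportion lands within ±1.96 standard errors of the true value; the share should come out close to 0.95.

```python
import math
import random

def coverage(p=0.6, n=400, reps=1000, z=1.96, seed=11):
    """Share of repeated samples whose proportion falls within ±z*SE of the true p."""
    rng = random.Random(seed)
    se = math.sqrt(p * (1 - p) / n)          # standard error of a sample proportion
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if abs(p_hat - p) <= z * se:
            hits += 1
    return hits / reps

print(coverage())   # close to 0.95
```

Changing `z` to 2.576 (99%) or 1.645 (90%) shows the risk shrinking or growing as the text describes.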
The third criterion, the degree of variability in the attributes being measured, refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The less variable (more homogeneous) a population, the smaller the sample size. Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%. This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest. Because a proportion of .5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size, that is, the sample size may be larger than if the true variability of the population attribute were used.
There are several approaches to determining the sample size. These include using a census for small populations, imitating a sample size of similar studies, using published tables, and applying formulas to calculate a sample size.
Each strategy is discussed below.
1. Using a Census for Small Populations
One approach is to use the entire population as the sample. Although cost considerations make this impossible for large populations, a census is attractive for small populations (e.g., 200 or less). A census eliminates sampling error and provides data on all the individuals in the population. In addition, some costs such as questionnaire design and developing the sampling frame are “fixed,” that is, they will be the same for samples of 50 or 200. Finally, virtually the entire population would have to be sampled in small populations to achieve a desirable level of precision.
2. Using a Sample Size of a Similar Study
Another approach is to use the same sample size as those of studies similar to the one you plan. Without reviewing the procedures employed in these studies you may run the risk of repeating errors that were made in determining the sample size for another study. However, a review of the literature in your discipline can provide guidance about “typical” sample sizes that are used.
3. Using Published Tables
A third way to determine sample size is to rely on published tables, which provide the sample size for a given set of criteria. Table 1 and Table 2 present sample sizes that would be necessary for given combinations of precision, confidence levels, and variability. Please note two things. First, these sample sizes reflect the number of obtained responses and not necessarily the number of surveys mailed or interviews planned (this number is often increased to compensate for nonresponse). Second, the sample sizes in Table 2 presume that the attributes being measured are distributed normally or nearly so. If this assumption cannot be met, then the entire population may need to be surveyed.
Size of Population   Sample Size (n) for Precision (e) of:
                     ±5%     ±7%     ±10%
100                  81      67      51
125                  96      78      56
150                  110     86      61
175                  122     94      64
200                  134     101     67
225                  144     107     70
250                  154     112     72
275                  163     117     74
300                  172     121     76
325                  180     125     77
350                  187     129     78
375                  194     132     80
400                  201     135     81
425                  207     138     82
450                  212     140     82
4. Using Formulas to Calculate a Sample Size
Although tables can provide a useful guide for determining the sample size, you may need to calculate the necessary sample size for a different combination of levels of precision, confidence, and variability. The fourth approach to determining sample size is the application of one of several formulas (Equation 5 was used to calculate the sample sizes in Table 1 and Table 2).
A. Formula for Calculating a Sample for Proportions
For populations that are large, Cochran (1963:75) developed Equation 1 to yield a representative sample for proportions:
$$ n_0 = \frac{Z^2 p q}{e^2} $$
where n₀ is the sample size, Z is the abscissa of the normal curve that cuts off an area α at the tails (1 − α equals the desired confidence level, e.g., 95%), e is the desired level of precision, p is the estimated proportion of an attribute that is present in the population, and q is 1 − p. The value for Z is found in statistical tables which contain the area under the normal curve.
To illustrate, suppose we wish to evaluate a state-wide Extension program in which farmers were encouraged to adopt a new practice. Assume there is a large population but that we do not know the variability in the proportion that will adopt the practice; therefore, assume p = .5 (maximum variability). Furthermore, suppose we desire a 95% confidence level and ±5% precision. The resulting sample size is demonstrated in Equation 2.
$$ n_0 = \frac{Z^2 p q}{e^2} = \frac{(1.96)^2 (.5)(.5)}{(.05)^2} = 385 \text{ farmers} $$
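Cochran's formula is easy to check numerically. A minimal sketch; rounding up to the next whole respondent is our assumption, consistent with Equation 2 (the raw value is 384.16, reported as 385). The second call, with a hypothetical p = .2, shows why p = .5 is the conservative choice.

```python
import math

def cochran_n0(z=1.96, p=0.5, e=0.05):
    """n0 = z^2 * p * q / e^2, rounded up to the next whole respondent."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

print(cochran_n0())          # 385 (p = .5, maximum variability)
print(cochran_n0(p=0.2))     # 246 (a less variable attribute needs fewer responses)
```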
B. Finite Population Correction for Proportions
If the population is small, then the sample size can be reduced slightly. This is because a given sample size provides proportionately more information for a small population than for a large population. The sample size (n₀) can be adjusted using Equation 3:
$$ n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} $$
where n is the sample size and N is the population size.
Suppose our evaluation of farmers' adoption of the new practice only affected 2,000 farmers. The sample size that would now be necessary is shown in Equation 4:
$$ n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} = \frac{385}{1 + \frac{385 - 1}{2000}} = 323 \text{ farmers} $$
As you can see, this adjustment (called the finite population correction) can substantially reduce the necessary sample size for small populations.
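The correction in Equation 3 as a minimal Python sketch; rounding up is our assumption and reproduces the 323 farmers of Equation 4. For a very large population the correction barely changes n₀.

```python
import math

def finite_population_correction(n0, N):
    """n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(finite_population_correction(385, 2000))       # 323 farmers
print(finite_population_correction(385, 1_000_000))  # 385: negligible change for a huge population
```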
A Simplified Formula for Proportions
Yamane (1967:886) provides a simplified formula to calculate sample sizes. This formula was used to calculate the sample sizes in Tables 2 and 3 and is shown below. A 95% confidence level and P = .5 are assumed for Equation 5.
$$ n = \frac{N}{1 + N(e)^2} $$
where n is the sample size, N is the population size, and e is the level of precision. When this formula is applied to the above sample, we get Equation 6.
$$ n = \frac{N}{1 + N(e)^2} = \frac{2000}{1 + 2000(.05)^2} = 333 $$
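Yamane's formula, n = N / (1 + N e²), as a minimal sketch; rounding to the nearest whole unit is our assumption and reproduces the 333 of Equation 6.

```python
def yamane(N, e=0.05):
    """n = N / (1 + N * e^2), rounded to the nearest whole unit."""
    return round(N / (1 + N * e ** 2))

print(yamane(2000))   # 333
```

Unlike Cochran's formula, Yamane's bakes in the 95% confidence level and P = .5, so only N and e can be varied.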
Formula For Sample Size For The Mean
The use of tables and formulas to determine sample size in the above discussion employed proportions that assume a dichotomous response for the attributes being measured. There are two methods to determine sample size for variables that are polytomous or continuous. One method is to combine responses into two categories and then use a sample size based on proportion (Smith, 1983). The second method is to use the formula for the sample size for the mean. The formula of the sample size for the mean is similar to that of the proportion, except for the measure of variability. The formula for the mean employs σ² instead of (p × q), as shown in Equation 7.
$$ n_0 = \frac{Z^2 \sigma^2}{e^2} $$
where n₀ is the sample size, Z is the abscissa of the normal curve that cuts off an area α at the tails, e is the desired level of precision (in the same unit of measure as the attribute, so that e² is in the same unit as the variance), and σ² is the variance of an attribute in the population. The disadvantage of the sample size based on the mean is that a “good” estimate of the population variance is necessary.
Often, an estimate is not available. Furthermore, the sample size can vary widely from one attribute to another because each is likely to have a different variance. Because of these problems, the sample size for the proportion is frequently preferred.
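Equation 7 as a minimal sketch. The standard deviation of 10 points and precision of ±2 points are hypothetical values chosen for illustration; rounding up is our assumption.

```python
import math

def sample_size_for_mean(sigma, e, z=1.96):
    """n0 = z^2 * sigma^2 / e^2; e is in the same units as sigma."""
    return math.ceil(z ** 2 * sigma ** 2 / e ** 2)

# Hypothetical: standard deviation 10 points, desired precision ±2 points
print(sample_size_for_mean(sigma=10, e=2))   # 97
```

Doubling the assumed σ quadruples n₀, which is why a poor variance estimate is the main weakness of this approach.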
In addition, an adjustment in the sample size may be needed to accommodate a comparative analysis of subgroups (e.g., an evaluation of program participants with nonparticipants). Sudman (1976) suggests that a minimum of 100 elements is needed for each major group or subgroup in the sample and for each minor subgroup, a sample of 20 to 50 elements is necessary. Similarly, Kish (1965) says that 30 to 200 elements are sufficient when the attribute is present 20 to 80 percent of the time (i.e., the distribution approaches normality).
On the other hand, skewed distributions can result in serious departures from normality even for moderate size samples (Kish, 1965:17). Then a larger sample or a census is required.
Finally, the sample size formulas provide the number of responses that need to be obtained. Many researchers commonly add 10% to the sample size to compensate for persons that the researcher is unable to contact. The sample size also is often increased by 30% to compensate for nonresponse. Thus, the number of mailed surveys or planned interviews can be substantially larger than the number required for a desired level of confidence and precision.
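The two rules of thumb above can be applied to the 385 responses from Equation 2. A minimal sketch; the helper name `inflate` is hypothetical, and it uses integer arithmetic so that rounding up is exact.

```python
def inflate(n, percent):
    """Increase n by a whole-number percentage, rounding up (integer arithmetic avoids float error)."""
    return -(-n * (100 + percent) // 100)

n = 385                       # responses needed (e.g. from Cochran's formula)
print(inflate(n, 10))         # 424: +10% for people who cannot be contacted
print(inflate(n, 30))         # 501: +30% for nonresponse
```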
A hypothesis is a kind of truth claim about some aspect of the world: for instance, the attitudes of patients or the prevalence of a disease in a population. Research sets out to try to prove this truth claim (or, more properly, to reject the null hypothesis, a truth claim phrased as a negative). For example, let us think about the following hypothesis:
Levels of efficiency are affected by satisfaction
and the related null hypothesis:
Levels of efficiency are not affected by satisfaction
Let us imagine that we have this as our research hypothesis and we are planning research to test it. We will undertake a trial comparing groups of employees who work in different organizations, to assess the extent of efficiency in these different groupings. Obviously the findings of a study, while interesting in themselves, only have value if they can be generalised, to discover something about the topic which can be applied in other organizations. If we find an association, then we will want to do something to increase efficiency (by increasing satisfaction). So our study has to have external validity, that is, the capacity to be generalised beyond the subjects actually in the study.
Sampling Errors
Non-Sampling Errors
Error caused by the act of taking a sample
They cause sample results to differ from the results of a census
Differences between the sample and the population that exist only because of the observations that happened to be selected for the sample
Statistical errors are sampling errors
We have no control over them
Non-Response Error
Response Error
Not controlled by sample size
respondent gives an incorrect answer, e.g. due to prestige or competence
implications, or due to sensitivity or social undesirability of question
respondent misunderstands the requirements
lack of motivation to give an accurate answer
“lazy” respondent gives an “average” answer
question requires memory/recall
proxy respondents are used, i.e. taking answers from someone other than
the respondent
The question is unclear, ambiguous or difficult to answer
The list of possible answers suggested in the recording instrument is incomplete
Requested information assumes a framework unfamiliar to the respondent
The definitions used by the survey are different from those used by the respondent (e.g. how many part-time employees do you have? See next slide for an example)
Non-sampling errors are inevitable in the production of national statistics. It is important that:
At the planning stage, all potential non-sampling errors are listed and steps taken to minimise them are considered.
If data are collected from other sources, the procedures adopted for data collection are questioned, and the data are verified at each step of the data chain.
The data collected are viewed critically and queries are resolved as soon as they arise.
Sources of non-sampling errors are documented so that the results presented can be interpreted meaningfully.
What any researcher wants is to be right! They want to discover that there is an association between two variables: say, asthma and traffic pollution, but only if such an association really exists. If there is no such association, then they want their study to support the null hypothesis that the two are not related. (While the former may be more exciting, both are important findings.)
What no researcher wants is to be wrong! No-one wants to find an association which does not really exist, or, just as importantly, not find an association which does exist. Both such situations can arise in any piece of research. The first (finding an association which is not really there) is called a Type I error. It is the error of falsely rejecting a true null hypothesis. (Think through this carefully. What we are talking about here could also be called a false positive. An example would be a study which rejects the null hypothesis that there is no association between ill-health and deprivation. The findings suggest such an association, but in reality, no such relationship exists.)
The measurement of such generalisability of a study is done by statistical tests of inference. You may be familiar with some such tests: tests such as the chi-squared test, the t-test, and tests of correlation. We will not look at these tests in any detail, but we need to understand that the purpose of these and other tests of statistical inference is to assess the extent to which the findings of a study can be accepted as valid for the population from which the study sample has been drawn. If the statistics we use suggest that the findings are 'true', then we can be happy to conclude (within certain limits of probability) that the study's findings can be generalised, and we can act on them (to improve nutrition among children under five years, for instance).
From common sense, we see that the larger the sample is, the easier it is to be satisfied that it is representative of the population from which it is drawn: but how large does it need to be? This is the question that we need to answer, and to do so, we need to think a little more about the possibilities that our findings may not reflect reality: that we have committed an error in our conclusions.
For any piece of research that tries to make inferences from a sample to a population, there are four possible outcomes: two are desirable, and two render the research worthless. Figure 1 shows these four possible outcomes diagrammatically.
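Figure 1. The four possible outcomes of a study

                                   Null hypothesis in the POPULATION is:
                                   False                          True
Null hypothesis    False           Cell 1: Correct result         Cell 2: Type I error (alpha)
in the STUDY is:   True            Cell 3: Type II error (beta)   Cell 4: Correct result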
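The four cells of Figure 1 can be read as a simple lookup from two yes/no facts: whether the study rejected the null hypothesis, and whether the null hypothesis is actually true in the population (which, in practice, the researcher never knows). The sketch below uses a hypothetical helper, `classify_outcome`, written only to make that mapping explicit.

```python
def classify_outcome(study_rejects_null: bool, null_true_in_population: bool) -> str:
    """Return the Figure 1 cell for a study result (hypothetical teaching helper)."""
    if study_rejects_null:
        # The study claims an association exists.
        if null_true_in_population:
            return "Cell 2: Type I error (alpha)"
        return "Cell 1: correct result"
    # The study finds no association.
    if null_true_in_population:
        return "Cell 4: correct result"
    return "Cell 3: Type II error (beta)"


for rejects in (True, False):
    for null_true in (False, True):
        print(f"study rejects H0={rejects!s:5}  H0 true={null_true!s:5}  ->  "
              f"{classify_outcome(rejects, null_true)}")
```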
Cell 3. This cell similarly reflects an undesirable outcome of a study.
Here, as in Cell 4, a study supports the null hypothesis, implying that there is no association between ill-health and deprivation in the population under investigation. But in reality, the null hypothesis is false and there is an association in the real world which the study does not find. This mistake is the Type II error of accepting a false null hypothesis, and it is often the result of having a sample size which is too small to allow detection of the association by statistical tests at an acceptable level of significance (say p = 0.05). The likelihood of committing a Type II error is the beta (β) value of a statistical test, and the value (1 - β) is the statistical power of the test. Thus the statistical power of a test is the likelihood of avoiding a Type II error, i.e. the probability that the test will reject the null hypothesis when the null hypothesis is false. Conventionally, a value of 0.80 or 80% is the target value for statistical power, representing a likelihood that four times out of five a study will reject a false null hypothesis, although values greater than 80% (e.g. 90%)
are also sometimes used. Outcomes of studies which fall into Cell 3 are incorrect; β, or its complement (1 - β), measures the likelihood of such an outcome of a study.
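The meaning of β and power can be checked by simulation. The sketch below assumes a two-group comparison with normally distributed outcomes (standard deviation 1), a true difference of 0.5 standard deviations, 63 people per group, and a two-sided z-test at α = 0.05. These numbers are illustrative assumptions, chosen because under them the power should be close to the conventional 0.80; the function name `rejection_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n_per_group, alpha=0.05, trials=2000, seed=1):
    """Share of simulated studies whose two-sided z-test rejects H0 (no difference).

    Outcomes are drawn from normal distributions with SD = 1, so the standard
    error of the difference in means is sqrt(2 / n_per_group).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_b) - fmean(group_a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


power_hat = rejection_rate(true_diff=0.5, n_per_group=63)   # H0 is false here
beta_hat = 1 - power_hat                                    # share of Cell 3 outcomes
print(f"estimated power = {power_hat:.2f}, estimated beta = {beta_hat:.2f}")
```

Under these assumptions the estimates should come out near 0.80 and 0.20: roughly one simulated study in five lands in Cell 3.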
Not all quantitative studies involve hypothesis testing; some studies merely seek to describe the phenomena under examination. Whereas hypothesis testing will involve comparing the characteristics of two or more groups, a descriptive survey may be concerned solely with describing the characteristics of a single group. The aim of this type of survey is often to obtain an accurate estimate
of a particular figure, such as a mean or a proportion. For example, we may want to know how many times, in an average week, a general practitioner sees patients newly presenting with asthma. In addition, we may also want to know what proportion of these patients admit to smoking five or more cigarettes a day. In these
circumstances, the aim is not to compare this figure with another group, but rather to reflect accurately the real figure in the wider population.
2. The degree of precision which we can accept. This is often presented in the form of a confidence interval. For example, a survey of a sample of patients indicates that 35 per cent smoke. Are we willing to accept that the figure for the wider population lies between 25 and 45 per cent (allowing a margin for random error (MRE) of 10% either way), or do we want to be more precise, such that the confidence interval is three per cent each way, and the true figure falls between 32 and 38 per cent? As we can see from the following table, the smaller the allowed margin for random error, the larger the sample must be.
Margin for random error    Sample size
± 10%                      88
± 5%                       350
± 3%                       971
± 2%                       2188
± 1%                       8750
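The source does not say how the table was computed. Assuming the usual formula n = z²p(1 - p)/d², with z = 1.96 (95% confidence) and p = 0.35 (the 35 per cent of smokers in the example), the sketch below reproduces it closely: it gives 88, 350, 972, 2185 and 8740, within about 0.1 per cent of the tabulated values. The helper name `n_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, margin, confidence=0.95):
    """Sample size so that a proportion p is estimated within +/- margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


margins = (0.10, 0.05, 0.03, 0.02, 0.01)
sizes = [n_for_proportion(0.35, m) for m in margins]
for m, n in zip(margins, sizes):
    print(f"+/- {m:.0%}: {n}")
```

Because the margin enters the formula squared, halving it roughly quadruples the required sample.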
How large must a sample be to estimate the mean value of the population?
Suppose we wish to measure the number of times that the average patient with asthma consults her/his general practitioner for treatment.
a) First, the SE (standard error) is calculated by deciding upon the level of accuracy which you require. If, for instance, you wish your survey to produce a very accurate answer with only a small confidence interval, then you might decide that you want to be 95% confident that the mean produced by your survey is within plus or minus two visits to the GP of the true figure.
For example, if you thought that your survey might produce a mean estimate
of 12.5 visits per year, then your confidence interval in this case would be 12.5 ± two visits. Your confidence interval would then tell you that you could reasonably (more detail on what 'reasonably' means below!) expect the true average rate of visits in the population to be somewhere between 10.5 and 14.5 visits per year.
Now decide on your required confidence level. If you decide on 95%
(meaning that, if the survey were repeated many times, 19 out of 20 of the intervals calculated in this way would contain the true population mean), the standard error is calculated by dividing the MRE by 1.96. So, in this case, the standard error is 2 divided by 1.96 = 1.02.
If you want a 95% confidence interval, then divide the maximum acceptable MRE (margin for random error) by 1.96 to calculate the SE.
If instead you want a 99% confidence interval, then divide the maximum acceptable MRE by 2.576 (often rounded to 2.58).
b) The formula to calculate the sample size for a mean (or point) estimate is:

$$N = \left(\frac{SD}{SE}\right)^2$$

where N = the required sample size, SD = the standard deviation, and SE = the standard error of the mean. The standard deviation could be estimated either by looking at some previous study or by carrying out a pilot study. Suppose that previous data showed that the standard deviation of the number of visits made to a GP in a year was 10; then we would input this into the formula as follows:

$$N = \left(\frac{SD}{SE}\right)^2 = \left(\frac{10}{1.02}\right)^2 = (9.804)^2 \approx 96.12 \rightarrow 97 \text{ (rounded up to the next whole patient)}$$

If we are to be 95% confident that the answer achieved is correct to within ± two visits, then the required sample is 97, before making allowance for a proportion of the people leaving the study early and failing to provide outcome data.
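The two steps above (SE = MRE / z, then N = (SD / SE)²) can be sketched as one function. This is a minimal illustration, not the pack's own tool: `n_for_mean` is a hypothetical name, and it takes z from an exact normal quantile rather than the rounded SE of 1.02, which does not change the answer of 97.

```python
import math
from statistics import NormalDist


def n_for_mean(sd, margin, confidence=0.95):
    """Sample size for estimating a mean within +/- margin, via N = (SD / SE)^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 (95%) or about 2.576 (99%)
    se = margin / z                                  # standard error implied by the MRE
    return math.ceil((sd / se) ** 2)                 # round up to a whole patient


print(n_for_mean(sd=10, margin=2))                   # 97
print(n_for_mean(sd=10, margin=2, confidence=0.99))  # 166
```

The second call shows that demanding 99% rather than 95% confidence, with the same ± two visits, raises the requirement to 166 patients.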
$$N = \frac{P\,(100\% - P)}{(SE)^2}$$

With P = 70% and SE = 2.55 (an MRE of 5 percentage points divided by 1.96), we have:

$$N = \frac{70 \times (100 - 70)}{(2.55)^2} = \frac{2100}{6.50} \approx 323.08 \rightarrow 324 \text{ (rounded upwards)}$$

So, in order to be 95% confident that the true proportion of people saying they are satisfied lies within ± 5% of the answer, we will require a sample size of 324. This assumes that the likely answer is around 70%, with a range between 65% and 75%.
Of course, in real life, we often have absolutely no idea what the likely proportion is going to be. There may be no previous data and no time to carry out a pilot. In these circumstances, it is safer to assume the worst-case scenario and assume that the proportion is likely to be 50%. Other things being equal, this will give the largest possible sample size, and in most circumstances it is preferable to have a slight overestimate of the number of people needed rather than an underestimate.
(If we wished to use a 99% confidence level, so that we might be 99% confident that our confidence limits include the true figure, then we need to divide the margin for random error by 2.576. In this case, the standard error would be 5/2.576 ≈ 1.94. Using the formula above, we find that this would require a sample size of 558.)
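The proportion formula N = P(100 - P)/SE² can be sketched the same way, working in percentage points as the text does. This assumes exact normal quantiles for z; `n_for_percentage` is a hypothetical name. Without intermediate rounding the 95% case gives 323 rather than 324 (the text's 324 comes from rounding SE² to 6.50), the 99% case matches the text's 558, and the worst case P = 50% gives 385.

```python
import math
from statistics import NormalDist


def n_for_percentage(p_percent, margin_percent, confidence=0.95):
    """N = P(100 - P) / SE^2, with P and the margin expressed in percentage points."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = margin_percent / z                  # e.g. 5 / 1.96 = 2.55
    return math.ceil(p_percent * (100 - p_percent) / se ** 2)


print(n_for_percentage(70, 5))                   # 323 (324 if SE^2 is first rounded to 6.50)
print(n_for_percentage(70, 5, confidence=0.99))  # 558
print(n_for_percentage(50, 5))                   # 385, the worst case (P = 50%)
```

Since P(100 - P) is largest at P = 50, the last call gives the most conservative size for a ± 5% margin.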
As we saw earlier in this pack, studies which test hypotheses (seeking to generalise from a study to a population) need sufficient power to minimise the likelihood of Type I and Type II errors. Both statistical significance and statistical
power are affected by sample size. When a real effect exists, the chances of gaining a statistically significant result will be increased by enlarging a study's sample. Put another way, the statistical power of a study is enhanced as sample size increases. Let us look at
each of these aspects of inferential research in turn.
When a researcher uses a statistical test, what they are doing is testing their results against a gold standard. If the test gives a positive result (this is usually known as 'achieving statistical significance'), then they can be relatively satisfied
that their results are 'true', and that the real-world situation is that discovered in the study (Cell 1 in Fig 1). If the test does not give significant results (non-significant or NS), then they may tentatively conclude that the results reflect Cell 4, where they have found no association and no such association exists.
However, we can never be absolutely certain that we have a result which falls in Cells 1 or 4. The significance level represents the likelihood of committing a Type I error (Cell 2). Let us imagine that we have results suggesting an association between ill-health and deprivation, and a t-test (a test to compare the results of two different groups) gives a value which indicates that, at the 5% or 0.05 level of
statistical significance, there is more ill-health among a group of high scorers on the Jarman Index of deprivation than among a group of low scorers.
What this means is that, if there were in fact no difference between the groups in the population, a difference at least as large as the one observed would arise by chance (from random associations in the sample we chose) less than 5 per cent of the time. If the t-test value is higher, we might reach 1% or 0.01 significance: now a chance result this large would be expected less than one per cent of the time.
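The "5 per cent of the time" can be seen directly by simulation: when there is truly no difference, about 5 per cent of studies still come out 'significant' at the 0.05 level, and about 1 per cent at the 0.01 level. The sketch assumes normally distributed outcomes with SD 1, 30 people per group, and a z-test in place of the t-test for simplicity; `rejection_rate` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n_per_group, alpha, trials=2000, seed=7):
    """Share of simulated two-group studies declared significant by a two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5          # SD of a difference in means when each SD = 1
    hits = 0
    for _ in range(trials):
        a = fmean([rng.gauss(0.0, 1.0) for _ in range(n_per_group)])
        b = fmean([rng.gauss(true_diff, 1.0) for _ in range(n_per_group)])
        if abs((b - a) / se) > z_crit:
            hits += 1
    return hits / trials


# H0 is true here (no real difference), so every rejection is a Type I error (Cell 2).
type1_05 = rejection_rate(0.0, 30, alpha=0.05)
type1_01 = rejection_rate(0.0, 30, alpha=0.01)
print(f"false positives: {type1_05:.3f} at alpha .05, {type1_01:.3f} at alpha .01")
```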
Tests of statistical significance are designed to account for sample size; thus the
larger a sample, the 'easier' it is for results to reach significance. A study which compares two groups of 10 patients will have to demonstrate a much greater difference between the groups than a study with 1000 patients in each group. This is fair: the larger study is much more likely to be 'representative' of a population than the smaller one. To summarize: statistical significance tells us how unlikely our positive results would be if they were due to chance alone, and so helps us judge whether the findings can
be used to make conclusions about differences which really exist.
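The comparison of 10 versus 1000 patients per group can be made concrete. Assuming a two-sided test at α = 0.05 and using the normal (z) approximation (a t-test with only 10 per group would demand a somewhat larger difference still), the smallest significant difference in means scales with 1/√n. The helper `min_significant_difference` is hypothetical.

```python
from statistics import NormalDist


def min_significant_difference(n_per_group, sd=1.0, alpha=0.05):
    """Smallest difference in means reaching two-sided significance (z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * sd * (2 / n_per_group) ** 0.5


small_study = min_significant_difference(10)
large_study = min_significant_difference(1000)
print(f"10 per group: {small_study:.3f} SD; 1000 per group: {large_study:.3f} SD")
```

With 100 times as many patients, the required difference shrinks tenfold: from about 0.88 to about 0.09 standard deviations.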
Because of the way statistical tests are designed, as we have just seen, they build
in a safety margin to avoid generalising false positive results, which could have disastrous or expensive consequences. But researchers who use small samples also run the risk of not being able to demonstrate differences or associations which really do exist. Thus they are in danger of committing a Type II error (Cell 3 in Fig 1), of accepting a false null hypothesis. Such studies are 'under-powered', not possessing sufficient statistical power to detect the effects they set out to detect. Conventionally, the target is a power of 80% or 0.8, meaning that a study has an 80 per cent likelihood of detecting a difference or association which really exists.
Examination of research undertaken in various fields of study suggests that many studies do not meet this conventional 0.8 target for power (Fox and Mathers 1997).
What this means is that many studies have a much reduced likelihood of being able to discern the effects which they set out to seek: a study with a power of 0.66 for some specified treatment effect will only detect that effect (if true) two times out of three. A non-significant finding of a study may thus simply reflect the inadequate power of the study to detect differences or associations at levels which
are conventionally accepted as statistically significant.
When a study has only a small (say, less than 50%) power to detect
a useful result, one must ask the simple question of such research: 'Why did you bother when your study had little chance of finding what you set out to find?'
Sample size calculations need to be undertaken prior to a study to avoid both the
wasteful consequences of under-powering and those of over-powering, in which sample sizes are excessively large, with higher than necessary study costs and, perhaps, the needless involvement of too many patients, which has ethical implications.
Statistical power calculations are also sometimes undertaken after a study has been completed, to assess the likelihood of a study having discovered effects.
Statistical power is a function of three variables: sample size, the chosen level of statistical significance (α) and effect size. While calculation of power entails recourse to tables of values for these variables, the calculation is relatively straightforward in most cases.
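As a sketch of how the three variables interact, the function below uses the normal approximation for a two-sided comparison of two group means (an assumption; the text does not name a particular test), with the effect size expressed as the difference in means divided by the standard deviation. `approx_power` is a hypothetical name.

```python
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means.

    effect_size is the difference in means divided by the (common) SD.
    The small chance of rejecting in the 'wrong' direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)


for alpha in (0.05, 0.01):
    row = [round(approx_power(n, 0.5, alpha), 2) for n in (20, 50, 100)]
    print(f"alpha={alpha}: power for n = 20, 50, 100 per group -> {row}")
```

Reading along a row shows power rising with sample size; comparing the two rows shows the cost in power of a stricter α at the same n.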
As was mentioned earlier, there is a trade-off between significance and power, because as one tries to reduce the chances of generating false negative results, the likelihood of a false positive result increases. Researchers need to decide which is more crucial, and
set the significance level accordingly. In Exercise 3 you were asked to decide, in various situations, whether a Type I or Type II error was more serious, based on clinical and other criteria.
Fortunately, statistical power increases with sample size, so a larger sample allows a stricter significance level to be set without sacrificing power; in this way, increasing sample size can reduce the likelihood of both Type I and Type II errors. However, that does not mean that researchers necessarily need to increase the size of their samples vastly, at great expense of time and resources.
The other factor affecting the power of a study is the effect size (ES) under investigation in the study. This is a measure of 'how wrong the null hypothesis is'. For example, we might compare the efficacy of two bronchodilators for treating an asthma attack. The ES is the difference in efficacy between the two drugs. An effect size may be a difference between groups or the strength of an association between variables such as ill-health and deprivation.
If an ES is small, then many studies with small sample sizes are likely to be
underpowered. But if an ES is large, then a relatively small-scale study could have sufficient power to identify the effect under investigation. It is sometimes possible to increase the effect size (for example, by making more extreme comparisons, or undertaking a longer or more powerful intervention), but usually this is the intractable element in the equation, and accurate estimation of the effect size is essential for calculating power before a study begins, and hence the necessary sample size.
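The dependence of sample size on effect size can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²/ES², assuming 80% power, two-sided α = 0.05 and ES measured in standard-deviation units. This formula is an assumption; the pack itself refers to tables. `n_per_group_for_power` is a hypothetical name.

```python
import math
from statistics import NormalDist


def n_per_group_for_power(effect_size, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for es in (0.2, 0.5, 0.8):
    print(f"effect size {es}: about {n_per_group_for_power(es)} per group")
```

Under this approximation, shrinking the effect size from 0.5 to 0.2 raises the requirement from 63 to 393 per group. This is why a small ES leaves small studies underpowered.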
However, when critiquing business education research, Wunsch (1986) stated that “two of the most consistent flaws included (1) disregard for sampling error when determining sample size, and (2) disregard for response and nonresponse bias” (p. 31).
The question then is, how large a sample is required to infer research findings back to a population? Standard textbook authors and researchers offer tested methods that allow studies to take full advantage of statistical measurements, which in turn give researchers the upper hand in determining the correct sample size. Sample size is one of the four inter-related features of a study design that can influence the detection of significant differences, relationships or interactions (Peers, 1996). Generally, these survey designs try to minimize both alpha error (finding a difference that does not actually exist in the population) and beta error (failing to find a difference that actually exists in the population) (Peers, 1996).
However, improvement is needed. Researchers are learning experimental statistics from highly competent statisticians and then doing their best to apply the formulas and approaches they learn to their research design. A simple survey of published manuscripts reveals numerous errors and questionable approaches to sample size selection, and serves as proof that improvement is needed. Many researchers could benefit from a real-life primer on the tools needed to properly conduct research, including, but not limited to, sample size selection.
Primary Variables of Measurement
The researcher must make decisions as to which variables will be incorporated into formula calculations. For example, if the researcher plans to use a seven-point scale to measure a continuous variable, e.g., job satisfaction, and also plans to determine if the respondents differ by certain categorical variables, e.g., gender, tenured, educational level, etc., which variable(s) should be used as the basis for sample size? This is important because the use of gender as the primary variable will result in a substantially larger sample size than if one used the seven-point scale as the primary variable of measure. Cochran (1977) addressed this issue by stating that “One method of determining sample size is to specify margins of error for the items that are regarded as most vital to the survey. An estimation of the sample size needed is first made separately for each of these important items” (p. 81). When these calculations are completed, researchers will have a range of n's, usually ranging from smaller n's for scaled, continuous variables, to larger n's for dichotomous or categorical variables. The researcher should make sampling decisions based on these data. If the n's for the variables of interest are relatively close, the researcher can simply use the largest n as the sample size and be confident that the sample size will provide the desired results.
More commonly, there is a sufficient variation among the n's so that we are reluctant to choose the largest, either from budgetary considerations or because this will give an over-all standard of precision substantially higher than originally contemplated. In this event, the desired standard of precision may be relaxed for certain of the items, in order to permit the use of a smaller value of n (Cochran, 1977, p. 81).
The researcher may also decide to use this information in deciding whether to keep all of the variables identified in the study. “In some cases, the n's are so discordant that certain of them must be dropped from the inquiry; . . .” (Cochran, 1977, p. 81).
Cochran's (1977) formula uses two key factors: (1) the risk the researcher is willing to accept in the study, commonly called the margin of error, or the error the researcher is willing to accept, and (2) the alpha level, the level of acceptable risk the researcher is willing to accept that the true margin of error exceeds the acceptable margin of error; i.e., the probability that differences revealed by statistical analyses really do not exist, also known as Type I error. Another type of error will not be addressed further here, namely, Type II error, also known as beta error. Type II error occurs when statistical procedures result in a judgment of no significant differences when these differences do indeed exist.
Acceptable Margin of Error. The general rule relative to acceptable margins of error in educational and social research is as follows: for categorical data, a 5% margin of error is acceptable, and, for continuous data, a 3% margin of error is acceptable (Krejcie & Morgan, 1970). For example, a 3% margin of error would result in the researcher being confident that the true mean of a seven-point scale is within ±.21 (.03 times seven points on the scale) of the mean calculated from the research sample. For a dichotomous variable, a 5% margin of error would result in the researcher being confident that the proportion of respondents who were male was within ±5% of the proportion calculated from the research sample.
Researchers may increase these values when a higher margin of error is acceptable or may decrease these values when a higher degree of precision is needed.
A critical component of sample size formulas is the estimation of variance in the primary variables of interest in the study. The researcher does not have direct control over variance and must incorporate variance estimates into research design. Cochran (1977) listed four ways of estimating population variances for sample size determinations: (1) take the sample in two steps, and use the results of the first step to determine how many additional responses are needed to attain an appropriate sample size based on the variance observed in the first step data; (2) use pilot study results; (3) use data from previous studies of the same or a similar population; or (4) estimate or guess the structure of the population assisted by some logical mathematical results. The first three ways are logical and produce valid estimates of variance; therefore, they do not need to be discussed further.
However, in many educational and social research studies, it is not feasible to use any of the first three ways, and the researcher must estimate variance using the fourth method.
A researcher typically needs to estimate the variance of scaled and categorical variables. To estimate the variance of a scaled variable, one must determine the inclusive range of the scale, divide by the number of standard deviations that would include all possible values in the range, and then square this number. For example, if a researcher used a seven-point scale, and given that six standard deviations (three to each side of the mean) would capture almost all responses (about 99.7% for a normally distributed variable), the calculations would be as follows:

$$S = \frac{7 \text{ (number of points on the scale)}}{6 \text{ (number of standard deviations)}} \approx 1.167, \qquad S^2 \approx 1.36$$

When estimating the variance of a dichotomous (proportional) variable such as gender, Krejcie and Morgan (1970) recommended that researchers should use .50 as an estimate of the population proportion. This proportion will result in the maximization of variance, which will also produce the maximum sample size. This proportion can be used to estimate variance in the population. For example, multiplying .50 by (1 - .50) will result in a population variance estimate of .25 for a dichotomous variable.
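Both variance estimates described above are simple arithmetic, sketched below. The only addition is a check that p(1 - p) peaks at p = .5, which is why Krejcie and Morgan's choice produces the maximum sample size. The grid search is purely illustrative.

```python
scale_points = 7                 # inclusive range of a seven-point scale
sd_scale = scale_points / 6      # six SDs (three each side of the mean) span the scale
var_scale = sd_scale ** 2

p = 0.5                          # Krejcie & Morgan's conservative proportion
var_dichotomous = p * (1 - p)

# p * (1 - p) is largest at p = 0.5, which is why 0.5 gives the biggest sample size.
candidates = [i / 100 for i in range(1, 100)]
p_max = max(candidates, key=lambda q: q * (1 - q))

print(round(sd_scale, 3), round(var_scale, 3), var_dichotomous, p_max)  # 1.167 1.361 0.25 0.5
```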
Before proceeding with sample size calculations, assuming continuous data, the researcher should determine if a categorical variable will play a primary role in data analysis. If so, the categorical sample size formulas should be used. If this is not the case, the sample size formulas for continuous data described in this section are appropriate.
Assume that a researcher has set the alpha level a priori at .05, plans to use a seven-point scale, has set the level of acceptable error at 3%, and has estimated the standard deviation of the scale as 1.167. Cochran's sample size formula for continuous data and an example of its use are presented here, along with explanations of how these decisions were made.

$$n_0 = \frac{t^2 \, s^2}{d^2} = \frac{(1.96)^2 (1.167)^2}{(7 \times .03)^2} \approx 118.6$$

Where t = value for selected alpha level of .025 in each tail = 1.96 (the alpha level of .05 indicates the level of risk the researcher is willing to take that the true margin of error may exceed the acceptable margin of error).
Where s = estimate of standard deviation in the population = 1.167 (the estimated standard deviation for a 7-point scale, calculated by dividing 7 [inclusive range of scale] by 6 [number of standard deviations that include almost all (about 99.7% under a normal distribution) of the possible values in the range]).
Where d = acceptable margin of error for mean being estimated = .21 (number of points on primary scale × acceptable margin of error; points on primary scale = 7; acceptable margin of error = .03 [error researcher is willing to accept]).
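Cochran's formula for continuous data can be sketched directly from the definitions of t, s and d above; `cochran_n0_continuous` is a hypothetical name. Evaluated exactly, the formula gives about 118.6. The worked example reports this as 118, while rounding up would give 119.

```python
def cochran_n0_continuous(t, s, d):
    """Cochran's (1977) initial sample size for estimating a mean: t^2 s^2 / d^2."""
    return t ** 2 * s ** 2 / d ** 2


t = 1.96                  # two-tailed alpha = .05
s = 1.167                 # estimated SD of a seven-point scale (7 / 6)
d = 7 * 0.03              # 3% margin of error on a 7-point scale = .21
n0 = cochran_n0_continuous(t, s, d)
print(f"n0 = {n0:.1f}")   # n0 = 118.6
```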
Therefore, for a population of 1,679, the required sample size is 118 (118.6 before rounding). However, since this sample size exceeds 5% of the population (1,679 × .05 ≈ 84), Cochran's (1977) correction formula should be used to calculate the final sample size. These calculations are as follows:

$$n_1 = \frac{n_0}{1 + n_0 / \text{Population}} = \frac{118}{1 + 118/1679} \approx 110.25 \rightarrow 111$$

Where population size = 1,679.
Where n_0 = required return sample size according to Cochran's formula = 118.
Where n_1 = required return sample size because sample > 5% of population.
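The finite population correction, applied only when n₀ exceeds 5% of the population as the text prescribes, can be sketched as below. `corrected_sample_size` is a hypothetical name. Whether n₀ is taken as 118 or as the unrounded 118.6, the corrected size rounds up to 111.

```python
import math


def corrected_sample_size(n0, population, threshold=0.05):
    """Apply Cochran's finite population correction when n0 exceeds 5% of the population."""
    if n0 > threshold * population:
        return n0 / (1 + n0 / population)
    return n0


n1 = math.ceil(corrected_sample_size(118, 1679))
print(n1)  # 111
```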
These procedures result in the minimum returned sample size. If a researcher has a captive audience, this sample size may be attained easily. However, since many educational and social research studies often use data collection methods such as surveys and other voluntary participation methods, the response rates are typically well below 100%. Salkind (1997) recommended oversampling when he stated that “If you are mailing out surveys or questionnaires, . . . . Count on increasing your sample size by 40%-50% to account for lost mail and uncooperative subjects” (p. 107). Fink (1995) stated that “Oversampling can add costs to the survey but is often necessary” (p. 36). Cochran (1977) stated that “A second consequence is, of course, that the variances of estimates are increased because the sample actually obtained is smaller than the target sample. This factor can be allowed for, at least approximately, in selecting the size of the sample” (p. 396).
However, many researchers criticize the use of over-sampling to ensure that this minimum sample size is achieved, and suggestions on how to secure the minimal sample size are scarce.
If the researcher decides to use oversampling, four methods may be used to determine the anticipated response rate: (1) take the sample in two steps, and use the results of the first step to estimate how many additional responses may be expected from the second step; (2) use pilot study results; (3) use response rates from previous studies of the same or a similar population; or (4) estimate the response rate. The first three ways are logical and will produce valid estimates of response rates; therefore, they do not need to be discussed further. Estimating response rates is not an exact science. A researcher may be able to consult other researchers or review the research literature in similar fields to determine the response rates that have been achieved with similar and, if necessary, dissimilar populations.
Therefore, in this example, it was anticipated that a response rate of 65% would be achieved based on prior research experience. Given a required minimum sample size (corrected) of 111, the following calculations were used to determine the drawn sample size required to produce the minimum sample size:

$$n_2 = \frac{111}{.65} \approx 170.8 \rightarrow 171$$

Where anticipated return rate = 65%.
Where n_2 = sample size adjusted for response rate.
Where minimum sample size (corrected) = 111.
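The response-rate adjustment divides the minimum returned sample by the anticipated return rate and rounds up, so that the expected number of returns is at least the minimum. `drawn_sample_size` is a hypothetical name.

```python
import math


def drawn_sample_size(min_returned, response_rate):
    """Number of people to contact so that the expected returns reach min_returned."""
    return math.ceil(min_returned / response_rate)


print(drawn_sample_size(111, 0.65))  # 171
```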
The sample size formulas and procedures used for categorical data are very similar, but some variations do exist. Assume a researcher has set the alpha level a priori at .05, plans to use a proportional variable, has set the level of acceptable error at 5%, and has estimated the standard deviation of the scale as .5. Cochran's sample size formula for categorical data and an example of its use are presented here, along with explanations of how these decisions were made (with p = .5 and q = 1 - p = .5).

$$n_0 = \frac{t^2 \, p \, q}{d^2} = \frac{(1.96)^2 (.5)(.5)}{(.05)^2} = 384.16 \approx 384$$
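The categorical version replaces s² with pq. The sketch below (hypothetical name `cochran_n0_proportion`) evaluates it for the values in the example.

```python
def cochran_n0_proportion(t, p, d):
    """Cochran's (1977) initial sample size for estimating a proportion: t^2 p q / d^2."""
    q = 1 - p
    return t ** 2 * p * q / d ** 2


n0 = cochran_n0_proportion(t=1.96, p=0.5, d=0.05)
print(f"n0 = {n0:.2f}")  # n0 = 384.16
```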
These calculations are as follows:

$$n_1 = \frac{n_0}{1 + n_0 / \text{Population}} = \frac{384}{1 + 384/1679} \approx 312.5 \rightarrow 313$$

Where population size = 1,679.
Where n_0 = required return sample size according to Cochran's formula = 384.
Where n_1 = required return sample size because sample > 5% of population.

These procedures result in a minimum returned sample size of 313. Using the same oversampling procedures as cited in the continuous data example, and again assuming a response rate of 65%, a minimum drawn sample size of 482 should be used. These calculations were based on the following:

$$n_2 = \frac{313}{.65} \approx 481.5 \rightarrow 482$$

Where anticipated return rate = 65%.
Where n_2 = sample size adjusted for response rate.
Where minimum sample size (corrected) = 313.
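The full categorical pipeline (finite population correction followed by the response-rate adjustment) chains the same two steps used for continuous data. This is a minimal sketch starting from the reported n₀ = 384.

```python
import math

population = 1679
n0 = 384                                   # Cochran's formula for p = .5, d = .05 (as reported)
assert n0 > 0.05 * population              # so the finite population correction applies
n1 = n0 / (1 + n0 / population)            # about 312.5
min_returned = math.ceil(n1)               # 313
drawn = math.ceil(min_returned / 0.65)     # allow for a 65% response rate
print(min_returned, drawn)                 # 313 482
```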
Table 1 presents sample size values that will be appropriate for many common sampling problems. The table includes sample sizes for both continuous and categorical data, assuming alpha levels of .10, .05, or .01. The margins of error used in the table were .03 for continuous data and .05 for categorical data (Bartlett, Kotrlik, & Higgins, p. 48). Researchers may use this table if the margin of error shown is appropriate for their study; however, the appropriate sample size must be calculated if these error rates are not appropriate.
Situations exist where the procedures described in the previous paragraphs will not satisfy the needs of a study, and two examples will be addressed here. One situation is when the researcher wishes to use multiple regression analysis in a study. To use multiple regression analysis, the ratio of observations to independent variables should not fall below five. If this minimum is not followed, there is a risk of overfitting, “. . . making the results too specific to the sample, thus lacking generalizability” (Hair, Anderson, Tatham, & Black, 1995, p. 105). A more conservative ratio of ten observations for each independent variable was reported optimal by Miller and Kunce (1973) and Halinski and Feldt (1970).
These ratios are especially critical in using regression analyses with continuous data because sample sizes for continuous data are typically much smaller than sample sizes for categorical data. Therefore, there is a possibility that the random sample will not be sufficient if multiple variables are used in the regression analysis. For example, in the continuous data illustration, a population of 1,679 was utilized and it was determined that a minimum returned sample size of 111 was required. The sample size for a population of 1,679 in the categorical data example was 313. Table 2, developed based on the recommendations cited in the previous paragraph, uses both the five to one and ten to one ratios.
Table 2
Sample size for:             Maximum number of regressors if ratio is:
                             5 to 1         10 to 1
Continuous data: n = 111     22             11
Categorical data: n = 313    62             31
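Table 2 follows from whole-number division of the returned sample size by the ratio (rounding down, so the ratio is never violated). The sketch uses a hypothetical helper, `max_regressors`.

```python
def max_regressors(n, ratio):
    """Largest number of independent variables keeping at least `ratio` observations each."""
    return n // ratio


for label, n in (("Continuous data", 111), ("Categorical data", 313)):
    print(label, n, max_regressors(n, 5), max_regressors(n, 10))
```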
As shown in Table 2, if the researcher uses the optimal ratio of ten to one with continuous data, the number of regressors (independent variables) in the multiple regression model would be limited to 11. Larger numbers of regressors could be used in the other situations shown. Note that if a variable such as ethnicity is incorporated into the categorical example, it must be dummy coded, which results in multiple variables in the model rather than a single variable. Each ethnic group, e.g., White, Black, Hispanic, Asian, and American Indian, would be coded as its own variable (1 = yes, 2 = no) in the regression model, resulting in five variables rather than one. (When the model includes an intercept, one group is usually omitted as the reference category, leaving four.)

In the continuous data example, if a researcher planned to use 14
variables in a multiple regression analysis and wished to use the optimal ratio of ten to one, the returned sample size must be increased from 111 to 140. This figure is obtained by multiplying the number of independent variables to be entered in the regression (14) by the ratio (10). Caution should be used when making this decision: raising the sample size above the level indicated by the sample size formula increases statistical power, so even very small, practically unimportant effects become more likely to be declared statistically significant.
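The arithmetic in the last two paragraphs (expanding a categorical variable into dummy variables, then multiplying the number of regressors by the ratio) can be sketched in Python. The function names and the design with nine continuous predictors plus a five-group ethnicity variable are hypothetical. The `drop_reference` option reflects the common practice of omitting one group as a reference category when the model has an intercept; the text itself counts one variable per group.

```python
def count_regressors(continuous: int, categorical_levels: list[int],
                     drop_reference: bool = False) -> int:
    """Number of variables actually entered in the regression model.

    Each categorical variable with `levels` groups becomes `levels` dummy
    variables, or `levels - 1` if one group is the reference category.
    """
    dummies = sum((levels - 1) if drop_reference else levels
                  for levels in categorical_levels)
    return continuous + dummies


def required_sample_size(formula_n: int, n_regressors: int,
                         ratio: int = 10) -> int:
    """Returned sample size needed: the larger of the sample size formula
    result and the number of regressors times the ratio."""
    return max(formula_n, n_regressors * ratio)


# Text example: 14 independent variables, 10-to-1 ratio, formula result 111
print(required_sample_size(111, 14))               # 140

# Hypothetical design: 9 continuous predictors plus ethnicity with 5 groups
k = count_regressors(9, [5])                       # 14 variables, not 10
print(k, required_sample_size(111, k))             # 14 140
k_ref = count_regressors(9, [5], drop_reference=True)
print(k_ref, required_sample_size(111, k_ref))     # 13 130
```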
If the researcher plans to use factor analysis in a study, the same ratio considerations discussed under multiple regression should be used, with one additional criterion: factor analysis should not be done with fewer than 100 observations. Note also that an increase in sample size decreases the loading at which an item is considered significant on a factor. For example, assuming an alpha level of .05, an item would have to load on a factor at .75 or higher to be significant in a sample of 50, whereas it would only have to load at .30 to be significant in a sample of 350 (Hair et al., 1995).

Sampling non-respondents. Donald (1967), Hagbert (1968), Johnson (1959), and Miller and Smith (1983) recommend that the researcher take a random sample of 10-20% of non-respondents for use in non-respondent follow-up analyses. If non-respondents are treated as a potentially different population, this recommendation does not appear to be valid or adequate, because a fixed percentage ignores the precision that population requires. Rather, the researcher could consider using Cochran's formula to determine an adequate sample of non-respondents for the follow-up analyses.
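Two rules in this passage can be expressed as short calculations. The sketch below reads the factor analysis rule as "the ratio applied to the number of variables analyzed, but never fewer than 100 observations"; that reading is an assumption. For non-respondents it uses the categorical form of Cochran's formula, n0 = t²pq/d², with assumed values t = 1.96 (alpha = .05), p = q = .5 and a 5% margin of error d, followed by the finite population correction n = n0 / (1 + n0/N), applied here only when n0 exceeds 5% of N (a common rule of thumb, treated as an assumption). The function names and the pool of 200 non-respondents are hypothetical. With N = 1,679 the sketch returns 313, matching the categorical example in the text; for a follow-up study, N is the number of non-respondents.

```python
import math


def factor_analysis_min_n(n_variables: int, ratio: int = 10) -> int:
    """Minimum observations for factor analysis: the observations-to-variables
    ratio applied to the variables analyzed, but never fewer than 100."""
    return max(100, ratio * n_variables)


def cochran_categorical(population: int, t: float = 1.96,
                        p: float = 0.5, d: float = 0.05) -> int:
    """Cochran's sample size for a proportion, with the finite population
    correction applied when n0 exceeds 5% of the population."""
    n0 = (t ** 2) * p * (1 - p) / d ** 2     # about 384.16 with the defaults
    if n0 > 0.05 * population:
        n0 = n0 / (1 + n0 / population)      # finite population correction
    return math.ceil(n0)                      # round up to whole respondents


print(factor_analysis_min_n(8))    # 100: the 100-observation floor applies
print(factor_analysis_min_n(25))   # 250: the 10:1 ratio governs
print(cochran_categorical(1679))   # 313, the categorical example in the text
print(cochran_categorical(200))    # 132 for a hypothetical 200 non-respondents
```

Note that a 10-20% sample of the same 200 non-respondents would be only 20 to 40 cases.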
Budget, time and other constraints. Often, the researcher faces constraints that may force the use of inadequate sample sizes for practical rather than statistical reasons. These constraints may include budget, time, personnel, and other resource limitations. In these cases, researchers should report the appropriate sample sizes along with the sample sizes actually used in the study, the reasons for using inadequate sample sizes, and a discussion of the effect the inadequate sample sizes may have on the results of the study. The researcher should exercise caution when making programmatic recommendations based on research conducted with inadequate sample sizes.