The quality and validity of generalizations drawn from a research study are greatly influenced by its target
definition. After all, because different target definitions exclude and include different individuals, the data collected
from different groups of individuals is also likely to vary. Two studies designed to study the same thing, but with
different target definitions, are likely to have quite different results.
A recent study by the Pew Research Center1 provides a striking example of the relationship between target
definition and research findings. The Pew Center compared their own and Gallup’s target definitions of “Muslim”
and the resulting effect of these definitions on estimates of Muslim American demographics. The two target
definitions, which differed with respect to languages spoken and method of contact, were:
Pew Research: Nationally representative probability sample; speaks English, Arabic,
Urdu, Farsi; reachable via landlines
Gallup: Nationally representative probability sample; speaks English, Spanish;
reachable by landlines and cell phones
The demographic characteristics estimated by each approach are shown in Figure 4.2. Note how variations in the
language-spoken component of the target definition result in significant differences in demographic estimates.
Select sampling method
Once the target population is defined, the next step determines which of two types of sampling methods will be used
to identify items or individuals for study inclusion. As discussed earlier, a probability sample is a sample in which
each individual, household, or item (generally called a sample element) comprising the universe from which the
sample is drawn has a known chance, or probability, of being selected for inclusion in the research. The
selection of sample elements is done purely by chance, for example, with a table of random numbers, coin flips or
through random digit dialing. When a probability sample is used, the selection of elements from the sample universe
continues until the required number of elements has been selected and observed or interviewed. A non-probability
sample is a sample whose elements are not selected strictly by chance from the universe of all individuals, but are
rather selected in some less random, more purposeful way. Here, the selection of elements for study inclusion may
be made on the basis of convenience or judgment.
1 Pew Research Center (2009). “Why Surveys of Muslim Americans Differ” at http://pewresearch.org/pubs/1144/muslim-americans-pew-research-survey-gallup.
population definition. You can take one of two approaches to specifying the sample frame. You can either construct
or obtain a list to represent the target population or, when lists are incomplete or unavailable, you can specify a
procedure such as random digit dialing for identifying and contacting target individuals.
The adequacy of a sample frame is evaluated in terms of how well the frame represents the target
population.
A perfect sampling frame is identical to the target population, that is, the sample frame contains every
population element once and only once and only population elements are contained in the sampling frame. As might
be expected, perfect sample frames are quite rare in actual practice. Typically, sample frames will either over-
register or under-register the target population. A sample frame that consists of all the elements in the target
population plus additional elements suffers from over-registration. An over-registered sample frame is too broad. A
sample frame that contains fewer elements than the target population suffers from under-registration. An under-
registered sample frame is too narrow and excludes elements from the target population. Examples of sample frames
having under- and over-registration as well as perfect registration are provided in Figure 4.3.
Perfect Registration
A manufacturer of paper goods wishes to conduct a survey of attitudes and purchasing behaviors among his current clients. The target population is defined as companies that have purchased at least $100 worth of goods within the past three months. The names of all clients meeting these criteria are selected from the manufacturer's data base and are placed on a separate list (the sample frame) from which study participants will be selected.

Over-registration: Sample Frame Larger Than Target Population
You have just completed a six-month advertising test in metropolitan Atlanta and wish to determine levels of advertising and product awareness as well as brand perceptions. You decide to use random digit dialing among prefixes that are identified as "Atlanta." There are two over-registration problems. First, because of the way telephone companies assign telephone prefixes, not all telephones with an Atlanta prefix actually are in metropolitan Atlanta. Second, the research should be conducted among individuals who have lived in metropolitan Atlanta for at least six months, the time of the advertising test. Random digit dialing will not discriminate between those who have and have not resided for the required amount of time in Atlanta. A screener can be used to adjust the sample frame to better correspond with the target population.

Under-registration: Sample Frame Smaller Than Target Population
You want to assess teachers' reactions to corporate-sponsored educational materials. You select a list of members of the American Federation of Teachers as the sample frame. This frame suffers from under-registration because not all teachers are members of the Federation.
You want to conduct a telephone survey of individuals residing in New York. One potential sample frame might be the telephone book. However, this sample frame is incomplete and suffers from under-registration because a telephone book does not contain individuals with unlisted telephone numbers.

Figure 4.3
Neither over- nor under-registration is necessarily fatal to the integrity of an advertising research study, but either,
if left unaccounted for, can cause significant bias. If over-registration is suspected and the
elements that fall outside the target population can be identified, then it might be possible to eliminate its effects
by modifying your sampling plan or by using a supplemental questionnaire (called a screener) to
eliminate individuals not in the target population. If under-registration is suspected, then it might be
possible to modify the sample frame by updating it or by using some other procedure that adds the omitted units.
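A screener of the kind described for the Atlanta advertising test can be sketched in a few lines. This is only an illustration: the `Contact` fields, the function name, and the three contacts are all made up here, and a real screener would be a set of questions asked at the start of the interview.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One person reached by random digit dialing (hypothetical fields)."""
    phone: str
    lives_in_metro_atlanta: bool
    months_in_atlanta: int


def passes_screener(contact, min_months=6):
    """Return True if the contact belongs to the target population."""
    return contact.lives_in_metro_atlanta and contact.months_in_atlanta >= min_months


# Made-up contacts, for illustration only.
contacts = [
    Contact("404-555-0101", True, 48),
    Contact("404-555-0102", False, 120),  # Atlanta prefix, but lives outside the metro area
    Contact("404-555-0103", True, 2),     # moved in after the advertising test began
]

# Only contacts who pass the screener go on to the full interview.
qualified = [c for c in contacts if passes_screener(c)]
```

Both over-registration problems from the example are handled by the same filter: the second contact fails on location and the third fails on length of residence.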
Types of Probability Sampling
Once you know the characteristics of the population of interest (the target population) and how the population will
be identified (the sample frame), you next need to determine the specific probability sampling procedure by which
individuals are selected for study inclusion (see Figure 4.1). The three most common forms of probability sampling
used in advertising research are: simple random, systematic random, and stratified random samples.2
Simple Random Samples
Simple random samples are frequently used in advertising research. Here, each member of the population (as
represented in the sample frame) has a known and equal chance of being selected for inclusion in the research. You
can think of random sampling as a drawing where the name of each member of the population is placed on a ticket
and then placed into a drum. Individual names are selected from the drum. Every name in the drum has an equal
chance of being selected. In practice, random number tables or random digit dialing are often used to select a
random sample. A visual representation of random sampling is shown in Figure 4.4.
2 A fourth sampling method is cluster sampling, which is primarily used for research with data collection needs that require personal, at-home interviews. Cluster sampling is appropriate to this form of data collection because it shifts data collection to groups of sampling units rather than individual sampling units. Cluster sampling works as follows: First, the universe described in the sample universe definition is divided into groups, or clusters, where every element of the universe is contained in one and only one cluster. Second, clusters are examined for internal representativeness. Each cluster should be a "miniuniverse," that is, the characteristics of the cluster should mirror the characteristics of the total universe. Third, clusters are examined for external comparability. Clusters should be equivalent to each other with regard to important characteristics. Fourth, one or more of the clusters is selected to represent the total universe. Fifth, simple, systematic, or stratified sampling is used to select elements within the cluster. For further discussion of cluster sampling see Stat Trek, “Statistics Tutorial: Cluster Sampling” at http://stattrek.com/Lesson6/CLS.aspx?Tutorial=Stat.
Individual                          Average Number of Hours
Number      Gender   Education      on Social Media
 1          Male     High School    3.7
 2          Male     High School    3.4
 3          Male     High School    6.0
 4          Male     High School    1.1
 5          Male     College        3.7
 6          Male     College        4.1
 7          Male     College        2.5
 8          Female   High School    2.5
 9          Female   High School    4.5
10          Female   High School    1.6
11          Female   High School    4.9
12          Female   College        7.3
13          Female   College        1.8
14          Female   College        4.8
15          Female   College        1.1
16          Female   College        1.6
17          Female   College        2.1
18          Female   College        1.6
19          Female   College        3.7
20          Female   College        6.2

Figure 4.5
(If we were to interview every individual in this target universe, we would find that the average number of hours
spent on social media in the prior 24 hours was 3.4.) Now, instead of conducting a census of this population, a researcher might use
random digit dialing for both land lines and cell phones to contact individuals in this sample universe and ask them
to provide the required information.3 Different samples of five individuals each, the identification numbers of the
individuals in each sample, and the average number of hours spent on social networking sites are shown in the table
below.
Sample Sample Members Average Number of Hours
1 3, 4, 7, 13, 14 3.2
2 11, 12, 15, 17, 19 3.8
3 9, 10, 12, 15, 19 3.6
4 1, 8, 9, 15, 20 3.6
5 2, 10, 16, 19, 20 3.3
As can be seen, a simple random sample can provide an accurate estimate of the entire population without having to
survey the entire population. While there is some expected variation across samples, different random samples from
the universe shown in Figure 4.5 provide estimates comparable to the population as a whole. By randomly selecting
individuals from the sample universe we can accurately estimate the behaviors of the entire target population. In this
situation, a simple random sample satisfies the two characteristics of good sampling described earlier in this chapter:
it is efficient and it provides reliable generalizations about the population from which the sample is taken.
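The comparison between the population and the five samples can be reproduced with a short script. The sketch below hard-codes the twenty values from Figure 4.5, recomputes the population mean and the five sample means from the table, and then draws a fresh simple random sample with Python's standard `random` module. The variable and function names are assumptions introduced here for illustration.

```python
import random
from statistics import mean

# Hours on social media for individuals 1-20, in order (Figure 4.5).
HOURS = [3.7, 3.4, 6.0, 1.1, 3.7, 4.1, 2.5, 2.5, 4.5, 1.6,
         4.9, 7.3, 1.8, 4.8, 1.1, 1.6, 2.1, 1.6, 3.7, 6.2]

# Sample membership as listed in the table (1-based individual numbers).
SAMPLES = {
    1: [3, 4, 7, 13, 14],
    2: [11, 12, 15, 17, 19],
    3: [9, 10, 12, 15, 19],
    4: [1, 8, 9, 15, 20],
    5: [2, 10, 16, 19, 20],
}


def sample_mean(members):
    """Average hours for the given 1-based individual numbers."""
    return mean(HOURS[i - 1] for i in members)


def draw_simple_random_sample(n, seed=None):
    """Pick n distinct individuals, each with an equal chance of selection."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, len(HOURS) + 1), n))


population_mean = round(mean(HOURS), 1)  # 3.4, the census value
table_means = {k: round(sample_mean(v), 1) for k, v in SAMPLES.items()}
# table_means == {1: 3.2, 2: 3.8, 3: 3.6, 4: 3.6, 5: 3.3}, matching the table

new_sample = draw_simple_random_sample(5)
print(new_sample, round(sample_mean(new_sample), 1))
```

Running the last two lines repeatedly gives a feel for how much sample means of size five vary around the population value of 3.4.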
Systematic Random Samples
A variation of a simple random sample is a systematic random sample. Systematic random samples typically provide
data comparable to that of simple random samples, with the added advantage of simplicity: no table of random numbers or
coin toss is needed, and sample size can be firmly specified.
Similar to a simple random sample, a systematic sample begins with a sample frame, after which the
following steps are taken:
* Count the number of elements on the list,
* Determine the desired sample size,
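A minimal sketch of systematic selection follows. Only the first two steps appear in the list above; the remaining steps in the code (computing a skip interval, choosing a random start within the first interval, and then taking every k-th element) are the commonly taught form of the procedure and are an assumption here, not a transcription of this chapter. The function name is hypothetical.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Select sample_size elements from frame at a fixed skip interval.

    Assumed procedure: interval k = len(frame) // sample_size,
    random start within the first k elements, then every k-th element.
    """
    n_elements = len(frame)                  # count the elements on the list
    if not 0 < sample_size <= n_elements:    # desired sample size must fit the list
        raise ValueError("sample_size must be between 1 and len(frame)")
    interval = n_elements // sample_size     # assumed step: skip interval k
    start = random.Random(seed).randrange(interval)  # assumed step: random start
    return [frame[start + i * interval] for i in range(sample_size)]


# Example: a frame of 100 numbered elements and a desired sample of 10.
frame = list(range(1, 101))
print(systematic_sample(frame, 10))
```

Because the start is the only random choice, the sample size is fixed in advance and no random number table is needed for the remaining selections.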
3 The sample frame for this example, individuals aged 18 to 24 with either a land line or cell phone, is likely to provide good representation of the total universe as the vast majority of 18 to 24 year olds will have one of these telephone connections.
There are two approaches to using confidence interval and confidence level to determine required sample size. The
first approach is a manual calculation which uses formulae grounded in statistical theory. These formulae as well as
the underlying statistical theory are described in the addendum to this chapter. The second approach uses one of the
numerous sample size calculators available online. While all perform similar functions, we recommend the
calculator provided by Raosoft due to its ease of use and range of information automatically provided.4
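For readers who want to see the arithmetic, the sketch below implements one widely used formula for the sample size needed to estimate a proportion. It is not taken from this chapter's addendum, which is not reproduced here, and it is not claimed to be the method of any particular online calculator. It assumes that the "confidence interval" discussed above is the margin of error e, that the most conservative proportion p = 0.5 is used, and that a finite population correction is applied when the population size N is known: n = N·x / ((N − 1)·e² + x), where x = z²·p·(1 − p). The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence_level=0.95,
                         proportion=0.5, population_size=None):
    """Completed interviews needed to estimate a proportion.

    margin_of_error and proportion are fractions (0.05 means 5 points).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    x = z ** 2 * proportion * (1 - proportion)
    if population_size is None:
        n = x / margin_of_error ** 2          # very large population
    else:
        n = (population_size * x) / (
            (population_size - 1) * margin_of_error ** 2 + x
        )                                     # finite population correction
    return math.ceil(n)


print(required_sample_size(0.05))                         # 385
print(required_sample_size(0.05, population_size=20000))  # 377
```

Note how little the requirement changes between a population of 20,000 and an effectively unlimited one; the margin of error and confidence level drive the result far more than population size does.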
Estimate Number of Contacts
Sample size requirements identified in the prior step reflect the number of completed interviews required for a
desired confidence interval and confidence level. A research fact of life, however, is that not all individuals
contacted will agree to participate in the research and not all who agree will actually complete the survey or other
data gathering instrument. As a result, the number of people contacted is always greater than the final desired
sample size and is determined by the following formula, where DSS is the desired final sample size, ATP
4 The Raosoft sample size calculator is located at http://www.raosoft.com/samplesize.html. Other calculators can be found by typing “sample size calculator” in any search engine.
interviewing those dressed in "hippie clothes" because you feel they might not take the research
seriously and you avoid interviewing those in the fraternity and sorority seats because you feel
their opinions are not indicative of the "average" student. The systematic exclusion of these
individuals violates the principle of random selection and introduces a great deal of bias into the
research.5
In sum, sample selection bias prevents the conduct of sound research and can lead to inappropriate conclusions
about a sampled population. The sampling planning process should therefore include an explicit discussion of how
sample bias might be introduced into the study and how the sampling techniques used in the research will serve to
eliminate identified potential sources of bias.
Bias and Telephone Sampling
Americans, especially younger individuals, are increasingly adopting cell phones as their only form of telephone
communication. It is estimated that about 15% of the population now lives in a cell-phone-only household, with this
number even higher among certain segments: about 31% of those aged 18-24 and 20% of Hispanics are now cell
only.
Recent research demonstrates that the implications of this situation for population sampling differ across
survey topics. The Pew Center for the People & the Press notes that:
“Surveys that rely only on landline interviews are more likely to produce biased estimates if the
segment of the public unreachable on a landline differs substantially from the landline public. If
the cell-only respondents are not very different from the landline respondents, the survey estimates
will not be biased by the absence of the cell-only group. For example, the landline survey finds
that 54% of Americans favor bringing troops home from Iraq; among the cell-only respondents,
55% favor a U.S. troop withdrawal. Thus the overall survey estimate is unaffected when the cell-
only respondents are blended in. One way to consider the impact of adding cell-only interviews to
a survey is to ask the question: How different would the cell-only have to be for the total survey
estimates to be affected by their inclusion?”6
Thus, when cell-only respondents are believed to be similar to landline respondents, adding a separate cell phone sample may
not be necessary. Pew has found, for example, that “on key political measures such as presidential approval, Iraq
5 These examples are adapted from Earl Babbie (1986). The Practice of Social Research (4th Edition) (Belmont, CA: Wadsworth Publishing Company).
6 Pew Center for the People & the Press (2008). “The Impact Of "Cell-Onlys" On Public Opinion Polling” at http://people-press.org/report/391/.
policy, presidential primary voter preference, and party affiliation, respondents reached on cell phones hold attitudes
that are very similar to those reached on landline telephones. Analysis of two separate nationwide studies shows that
including interviews conducted by cell phone does not substantially change any key survey findings.”7
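Pew's question, "How different would the cell-only have to be for the total survey estimates to be affected by their inclusion?", can be worked through with simple weighted-average arithmetic. The sketch below uses the 54% and 55% figures from the quotation. Using the roughly 15% cell-only share cited earlier in this section as the blending weight is an assumption made for illustration; Pew's actual weighting procedure is not described here, and the function names are hypothetical.

```python
def blended_estimate(landline_pct, cell_only_pct, cell_only_share):
    """Overall estimate when cell-only respondents are blended in by share."""
    return (1 - cell_only_share) * landline_pct + cell_only_share * cell_only_pct


def cell_only_pct_needed(landline_pct, cell_only_share, shift_points):
    """Cell-only percentage required to move the blended estimate
    shift_points away from the landline-only estimate."""
    return landline_pct + shift_points / cell_only_share


# Troop-withdrawal example: 54% (landline) vs. 55% (cell-only).
overall = blended_estimate(54, 55, 0.15)      # about 54.15, essentially unchanged

# How different must the cell-only group be to move the total by one point?
needed = cell_only_pct_needed(54, 0.15, 1)    # about 60.7%
print(round(overall, 2), round(needed, 1))
```

Under these assumptions the cell-only group would have to differ from the landline group by nearly seven percentage points before the overall estimate moved by a single point.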
Cell phone only samples do need to be added, however, when separate analyses of high incidence cell
phone only individuals are of interest or when cell phone users are believed to be dissimilar in attitude or behavior
from the overall population. This occurs, for example, when examining use of technology. The Pew Internet &
American Life Project found that cell phone users are more likely than those in a landline sample to:
* live in households earning less than $50,000
* have no education beyond high school
* be students
* be white or African-American
* be childless
* have a broadband connection at home.
Further, in terms of online activities, cell users are more likely to be content creators and bloggers. They are also
more likely to have downloaded songs and videos, watched video-sharing sites such as YouTube, and consumed
news online. In cases such as this, adding a cell phone sample is very important.8 Pew describes the process of
merging landline and cell phone samples as follows:
“The design of the landline sample ensures representation of both listed and unlisted numbers
(including those not yet listed) by using random digit dialing. This method uses random generation
of the last two digits of telephone numbers selected on the basis of the area code, telephone
exchange, and bank number. A bank is defined as 100 contiguous telephone numbers, for example
800-555-1200 to 800-555-1299. The telephone exchanges are selected to be proportionally
stratified by county and by telephone exchange within the county. That is, the number of
telephone numbers randomly sampled from within a given county is proportional to that county's
share of telephone numbers in the U.S. Only banks of telephone numbers containing three or more
listed residential numbers are selected.
The cell phone sample is drawn through systematic sampling from dedicated wireless banks of
100 contiguous numbers and shared service banks with no directory-listed landline numbers (to
7 Pew Center, op. cit.
8 Lee Rainie (2008). “Polling in the Age of the Cell Phone” at http://www.pewinternet.org/Commentary/2008/June/Polling-in-the-age-of-the-cell-phone.aspx.
ensure that the cell phone sample does not include banks that are also included in the landline
sample). The sample is designed to be representative both geographically and by large and small
wireless carriers.”9
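The random digit dialing step in the landline design can be sketched as follows. The example covers only the part of the procedure the quotation spells out: given a bank (area code, exchange, and the first two digits of the line number), the last two digits are generated at random. The bank `800-555-12` is the one used in the quotation; the function names are hypothetical, and the selection of banks themselves (stratification by county, screening for listed residential numbers) is not modelled.

```python
import random


def numbers_in_bank(bank_prefix):
    """All 100 contiguous telephone numbers in a bank such as '800-555-12'."""
    return [f"{bank_prefix}{last_two:02d}" for last_two in range(100)]


def random_digit_dial(bank_prefix, how_many, seed=None):
    """Generate distinct numbers in one bank by randomizing the last two digits."""
    rng = random.Random(seed)
    suffixes = rng.sample(range(100), how_many)  # sampled without replacement
    return [f"{bank_prefix}{suffix:02d}" for suffix in suffixes]


bank = numbers_in_bank("800-555-12")
print(bank[0], bank[-1])                    # 800-555-1200 800-555-1299
print(random_digit_dial("800-555-12", 5))
```

Because the digits are generated rather than read from a directory, listed and unlisted numbers in the bank have the same chance of being dialed, which is the property the quotation highlights.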
Bias and Online Panels
Telephone contact remains an important mode of respondent contact. Nevertheless, as the mode of data collection
shifts from mail and telephone to online, researchers are turning to online panels for their source of respondents.
When using panels, a researcher identifies target population characteristics and the desired sample size, and then the
appropriate number of individuals with the specified characteristics are randomly selected from the panel for
participation in the research. The underlying assumption of panel use is that panel characteristics mirror those of the
broader U.S. population.
Many research companies offer online panels for research.10 Not all panels provide equal data quality,
however, as panels differ with regard to how the panel is formed, the demographics of panel members and the extent
to which the research company has verified the representativeness of panel composition. E-rewards provides a set of
excellent insights for evaluating panel quality, which are adapted below:11
* Invitation only panels are preferred over “opt-in” panels because this recruitment technique
helps to reduce “self-selection” bias.
* Panels should aggressively and continuously identify “professional respondents” and
immediately expel these individuals from the panel.
* All panel members’ demographic and other defining information should be verified.
* Panel demographic composition should be verified and the panel itself should mirror the general
U.S. adult population. Researchers should not have to resort to weighting results to compensate for
the lack of panel representativeness.
9 Pew Center for People & the Press (undated). “About Our Survey Methodology in Detail” at http://people-press.org/methodology/about/.
10 See Green Book (2009) for a listing at http://www.greenbook.org/market-research-firms.cfm/online-panels.
11 E-rewards (2009). “What Defines Online Panel Quality” at http://www.e-rewardsresearch.com/downloads/WhatDefinesOnlinePanelQual.pdf. All of the points noted apply to the e-Rewards panel. In addition, Knowledge Networks provides additional insight into panel evaluation and verification at http://www.knowledgenetworks.com/ganp/reviewer-info.html.
Percent of the target population falling into each quota cell

                             Education
           Less Than      High      Some       College
Gender     High School    School    College    and Above    Total %
Men        10             17        11         10            48
Women      11             18        12         11            52
Total %    21             35        23         21           100

Sample size for each cell, given a total sample of 400

                             Education
           Less Than      High      Some       College
Gender     High School    School    College    and Above    Total
Men        40             68        44         40           192
Women      44             72        48         44           208
Total      84            140        92         84           400
Figure 4.12
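The arithmetic behind Figure 4.12 is straightforward: each cell's quota is that cell's share of the target population multiplied by the total sample size. The sketch below reproduces the second table from the first. The dictionary layout and function name are assumptions made for illustration.

```python
# Percent of the target population in each gender-by-education cell (Figure 4.12).
POPULATION_PCT = {
    "Men": {"Less Than High School": 10, "High School": 17,
            "Some College": 11, "College and Above": 10},
    "Women": {"Less Than High School": 11, "High School": 18,
              "Some College": 12, "College and Above": 11},
}


def quota_cells(population_pct, total_sample):
    """Convert cell percentages into the number of interviews per cell.

    Cells are rounded to whole interviews, so for some totals the cells
    may not sum exactly to total_sample and need a manual adjustment.
    """
    return {
        gender: {level: round(pct * total_sample / 100)
                 for level, pct in row.items()}
        for gender, row in population_pct.items()
    }


cells = quota_cells(POPULATION_PCT, 400)
print(cells["Men"])    # 40, 68, 44 and 40 interviews
print(cells["Women"])  # 44, 72, 48 and 44 interviews
```

Changing the second argument shows how the quotas scale for a different total sample size while the cell proportions stay fixed.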
Snowball Sampling
Snowball sampling uses current study participants to help recruit future participants from among their friends and
acquaintances. Thus the sample group grows like a rolling snowball. Snowball sampling is typically used for very
small, hard to reach, or highly specialized populations of individuals; populations where access is facilitated through
personal introductions. Because sample members are not selected from a sampling frame, snowball samples are
subject to numerous biases. For example, people who have many friends are more likely to be recruited into the
sample. In addition, since snowball sampling relies on referrals from initial participants to generate additional
participants, any biasing characteristics of initial respondents are likely to affect all future participants. This reflects
the fact that people tend to associate with others like themselves. This increases the chance of correlations and other
relationships being found in the study that do not apply to the wider population of which participants are members.
Sample Size in Nonprobability Samples
The nature of nonprobability samples precludes the use of statistical techniques to determine confidence intervals
and associated sample size. As a result, sample sizes in nonprobability research typically reflect some form of
judgment. Some forms of judgment, however, are better than others.
Unaided judgment is the most arbitrary approach to nonprobability sample size determination. Here, the
client or researcher simply says: "A sample of fifty (or 100 or 1,000) will do. This is a good number. One that I feel
comfortable with." While the researcher or client may "feel comfortable" with sample sizes selected in this way,
there is no assurance that the sample is sufficient to satisfy informational needs. Consequently, this approach should
be avoided.
What Is the Budget? A second approach reflects budget considerations. The amount of funds available for
sampling is divided by the cost per sample unit (for example, the cost to interview one individual) and the result is
used to set the sample size. At ten dollars per interview, for example, a budget of $1,000 dictates a sample size of
100. This approach should also be avoided. Buying the largest sample size that you can afford has a high
potential to produce samples that are either too large or too small given the research's informational needs.
Frame of reference is a more reasonable approach to nonprobability sample size determination, where
sample decisions follow the practices of others. Here, you would first determine the sample sizes others have used
for similar types of research and would then select samples of comparable size. The strength of this approach lies in
the fact that there is often merit in historical precedent. A weakness, however, is that you may not know the
validity of the rationale underlying initial decisions of sample size.
Analytical requirements is probably the best method for determining nonprobability sample size. It is
recommended that the number of individuals or observations in each major study subgroup total at least 100 and
that there be a minimum of twenty to fifty individuals in each minor analytical group. These requirements are met, for
example, in the quota sample shown in Table 4.13.
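The rule of thumb can be checked mechanically. The sketch below applies it to the quota sample of 400, under two assumptions that the text does not state outright: that gender defines the major subgroups and that the gender-by-education cells are the minor analytical groups. The lower bound of twenty is used for minor groups. All names are hypothetical.

```python
# Interviews per education level within each gender (quota sample of 400).
QUOTA_SAMPLE = {
    "Men": [40, 68, 44, 40],
    "Women": [44, 72, 48, 44],
}


def meets_analytical_requirements(cells_by_major_group,
                                  major_min=100, minor_min=20):
    """Check every major subgroup total and every minor cell against the minimums."""
    for cells in cells_by_major_group.values():
        if sum(cells) < major_min:   # major subgroup is too small
            return False
        if min(cells) < minor_min:   # at least one minor group is too small
            return False
    return True


print(meets_analytical_requirements(QUOTA_SAMPLE))                # True
print(meets_analytical_requirements(QUOTA_SAMPLE, minor_min=50))  # False
```

The second call shows that the conclusion depends on where in the twenty-to-fifty range the minimum for minor groups is set: the smallest cells here hold 40 interviews.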
Sample Selection and Qualitative Research
This chapter began by discussing the differences between quantitative and qualitative research. Given quantitative
research’s goal of generalization, it follows that sampling focuses on people. Sampling in quantitative research is
designed to maximize the chances that observations made among a sample of people are true and generalizable to
the population from which they were sampled. Qualitative research has a different goal - to provide deep insights
into a group of people. Given this goal, sampling in qualitative research focuses on information. Events, incidents,
experiences, attitudes and behaviors, not people per se, are the focus of qualitative sampling.12 Sampling in qualitative
research is therefore considered successful not if the results are statistically generalizable but if the sample provides
information-rich cases.13
Purposive sampling is the most common form of qualitative sampling. In this approach, a researcher starts
with a specific purpose or information-need in mind, and the sample is then selected to include only those people
12 See, for example, M. B. Miles and A. M. Huberman (1994). Qualitative Data Analysis: An Expanded Sourcebook (2nd ed.) (Thousand Oaks, CA: Sage Publications) and A. Strauss and J. Corbin (1990). Basics of Qualitative Research: Grounded Theory Procedures and Techniques (Newbury Park, CA: Sage Publications).
13 See M. Q. Patton (1990). Qualitative Evaluation and Research Methods (2nd ed.) (Newbury Park, CA: Sage Publications).
who in the judgment of the researcher will be able to provide information relevant to satisfying the information
need. Hopefully, individuals selected will represent “information-rich cases for study in depth. Information-rich
cases are those from which one can learn a great deal about issues of central importance to the purpose of the
research.”14
The types of individuals selected for the purposive sample are determined by the researcher’s judgments,
approach, goals and information needs. Patton15 notes that a qualitative researcher has a great deal of latitude in
identifying and selecting appropriate individuals. Imagine that a researcher wants to explore why individuals use
Twitter. The most common options and approaches would include sampling:
* Extreme or deviant cases where individuals who are “outliers” are selected. Here, a researcher
looks at what rarely happens in order to better understand what usually happens.16 Twitter users
sending more than 200 messages a day would likely fall into this group.
* Typical cases where “average” or “typical” individuals are selected. Interviewing individuals
in this group is typically most productive after interviews with the prior group have
yielded their insights. This group might be composed of individuals who send around four Twitter messages
per day (as this is the reported average).
* Highly intense or passionate individuals who may not be extreme in their behaviors but are
highly involved in the area being explored. This group could include Twitter users who, regardless
of the number of messages sent, strongly believe that “I couldn’t live without my Twitter.”
* Confirming or disconfirming cases where the attitudes or behaviors of individuals selected either
support or negate the researcher’s pre-existing perspective. This group could include those whose
primary motivation for Twitter use is “Because it helps me feel more important” (the researcher’s
belief) and those who use Twitter for any number of other reasons.
Importantly, these approaches are not mutually exclusive. As opposed to quantitative research where sample
characteristics are unchanging from the start of the research, a qualitative researcher has a great deal of flexibility
with regard to the sample. A researcher can start the research process with one type of case and then, depending
upon what is learned, move to interviews with individuals who possess a different set of characteristics. This
approach, interviewing a broad heterogeneous sample of individuals, allows a researcher to acquire insights that cut
14 Patton, op. cit.
15 Patton, op. cit.
16 R. Gomm, G. Needham and A. Bullman (2000). Evaluating Research in Health and Social Care (London, England: Sage Publications).
across a wide variety of individuals and that allow for maximum contrast across cases with differing characteristics.
In addition, sample characteristics may change as the research progresses as a researcher’s knowledge increases and
new insights and potential areas for exploration are uncovered.17
Sample Size
Sample size in qualitative research is estimated at the start of the research, but the actual number of participants is
guided by information gain: interviewing continues as long as the budget permits a researcher to learn new things
and gain new insights. The “adequacy of sample size in qualitative research is relative, a matter of judging a sample
neither small nor large per se, but rather too small or too large for the intended purposes of”18 what needs to be
learned. A sample of 10, for example, may be adequate if nothing new is being learned from additional interviews
while a sample of 30 may be too small if each new interview provides additional insights. Thus, “determining an
adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the
quality of the information collected … and the intended research outcomes.”19
Summary
The sampling process involves the selection and examination of the elements of a population for drawing
conclusions about the larger population of which these elements are members. A good sample is efficient and
provides reliable generalizations about the larger population.
All sampling begins with a definition of the target population, the group of elements about which you wish
to make inferences and draw generalizations. A well defined target population unambiguously describes the group
of interest and clearly differentiates those things or individuals who are of interest from those who are not.
A determination of the sampling method occurs next. Given the informational needs motivating the
research, and the time and financial considerations, either a probability or nonprobability sampling technique will be
selected. A probability sample is one in which each individual, household, or item comprising the universe from which the
sample is drawn has a known chance, or probability, of being selected for inclusion in the research. The selection of
sample elements is done purely by chance. A nonprobability sample is one in which the elements are not selected strictly by
chance from the universe of all individuals, but are rather selected in some less random, often more purposeful way.
17 See Imelda T. Coyne (1997). “Sampling in Qualitative Research. Purposeful and Theoretical Sampling; Merging or Clear Boundaries?” Journal of Advanced Nursing 26: 623-630.
18 Margarete Sandelowski (1995). “Sample Size in Qualitative Research,” Research in Nursing & Health 18: 179 - 183.
19 Sandelowski, op. cit.