Sampling Methods and Survey Types
One of the world's best-known polling organisations, Gallup, say that one of the most
frequently asked questions they get from Americans is why they've never been
interviewed for a survey.
In an adult population of almost two hundred million, Americans express scepticism
about the scientific reliability of sampling. In particular, they do not believe that a
survey of 1500 - 2000 people can represent the views of all citizens.
Gallup's sampling principle is that selecting a sample of a small proportion of the whole
population can represent the opinions of all the people, provided that the sample is
properly selected.
• So how do Gallup select a sample?
Firstly, they have to locate a place where all or most Americans can be found. This
isn't in the shopping mall, but at home. From the 1930s to the mid-1980s, poll
respondents were interviewed face-to-face in their homes. But by the 1990s, with
approximately 95% of all U.S. homes having a telephone, the vast majority of
surveys were using this medium. Of course, this has the benefit of being a substantially less
expensive method.
• Identifying and describing the population.
Gallup is often asked to carry out polls on behalf of an organisation with the aim of
learning more about the population's attitudes and beliefs. Let's imagine that an
American national newspaper wants a poll done about U.S. golf fans; the target
population may be all Americans aged at least 18 who say that they're fans of golf.
But if the poll was conducted on behalf of the U.S. PGA (Professional Golf
Association), the target audience might be more specific; for instance, all people
over the age of 16 who watch at least 5 hours of golf (during the major
tournaments) each week. Two surveys about the same sport, including many of the
same respondents, but with very different target populations.
• Choosing a method to sample the target population randomly.
The polling organisations have lists of all household telephone numbers in
continental USA. A computerised system uses random digit dialling (RDD) to create
a new list of all possible American telephone numbers, then selects a subset of
numbers from that new list for the polling organisation to call. This is important
because approximately 30% of American residential numbers are unlisted, according
to recent estimates. The exclusion of these "hidden" numbers would introduce bias
into the sample.
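The idea behind random digit dialling can be sketched in a few lines of Python. This is a simplified illustration and not Gallup's actual system: the area-code and exchange prefixes are made-up placeholders, and the function name `random_digit_dial` is hypothetical. The point it shows is that the last four digits are generated rather than looked up in a directory, so listed and unlisted numbers are equally likely to be drawn.

```python
import random

def random_digit_dial(prefixes, sample_size, seed=None):
    """Draw distinct phone numbers by adding four random digits to a known prefix.

    prefixes: list of (area_code, exchange) string pairs assumed to be in use.
    """
    rng = random.Random(seed)
    # Each prefix can produce 10,000 different numbers (0000-9999).
    if sample_size > len(set(prefixes)) * 10_000:
        raise ValueError("sample_size exceeds the number of possible numbers")
    numbers = set()
    while len(numbers) < sample_size:
        area, exchange = rng.choice(prefixes)
        suffix = rng.randrange(10_000)  # random last four digits
        numbers.add(f"{area}-{exchange}-{suffix:04d}")
    return sorted(numbers)

# Hypothetical prefixes, for illustration only.
example_prefixes = [("555", "010"), ("555", "011")]
sample = random_digit_dial(example_prefixes, sample_size=5, seed=1)
print(sample)
```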
• Sample Accuracy.
With a sample size of 1000 adults, using the random selection process outlined
above, Gallup can be statistically certain that 95 times out of one hundred,
continued polling would produce the same result within a margin of error of +/- 3%.
If the sample size was doubled to 2000 adults, Gallup would incur roughly twice the
cost in conducting the survey, but the margin of error would decrease only to +/-
2%.
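These figures can be checked with the usual formula for the margin of error of a proportion from a simple random sample, z * sqrt(p(1 - p) / n). The sketch below assumes the worst case p = 0.5 and z = 1.96 for 95% confidence; the function name `margin_of_error` is just illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 2000):
    # Print the margin as a percentage, to one decimal place.
    print(n, round(100 * margin_of_error(n), 1))
```

This gives about 3.1% for 1000 adults and about 2.2% for 2000, which round to the 3% and 2% quoted above. Because the margin shrinks with the square root of the sample size, doubling the sample cuts it by only about 30%, not by half.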
• Interviewing the selected sample.
What if the people randomly selected to survey are not in?
What if some of the target population are busy on other phone calls when the
pollsters call? In these cases the target respondent's phone number is stored and
recalled later at regular times throughout the survey period.
Excluding people who don't answer the phone the first time Gallup calls them would
introduce bias into the survey sample: for instance, young single adults, who
are frequently out or using the phone, would be less likely to be included in the sample
than more sedentary people who are less frequent phone users.
In a household with more than one adult in residence, Gallup randomly select an
adult, either by asking for the person with the latest birthday or by asking the
person who answers the phone to list all the adults who live there. The pollster then
selects one of these adults at random.
• Asking the "right" questions.
Gallup assess that the greatest source of bias or error in survey data is probably the
wording of the questions themselves.
For example, you may have thought that conducting a pre-election poll of voting
intentions would be a simple process. But the question "Who will you vote for in the
next election?" can be just as open to bias as any other survey question. Does the polling
organisation list the vice-presidential candidates along with the names of the
presidential candidates? Should the party represented by the candidate be listed or
should there be no indication of party affiliation?
In these cases, Gallup tries to mimic the format and content of the ballot paper and
reads the names of the presidential and vice-presidential candidates and gives the
name of the party represented by them.
Questions to do with policy issues can also be very tricky: are things like food
stamps or housing grants to be called "welfare" or "programs for the poor"? If
members of the armed services are going abroad should this be termed "sending"
troops or "contributing" to a UN force? These are emotive topics and the wording of
the question can "slant" the answer received from poll respondents.
• The oldest one in the book.
One of the oldest question wordings concerns presidential job approval. Since the
1930s and Roosevelt's presidency, Gallup has used the following question: "Do you
approve or disapprove of the job .... is doing as president?"
This means that there is a reliable trend line provided by the continuity of the
question asked. If, for example, George W. Bush has a job approval rating of 48%
after one year of his presidency, what can be learned from such a rating? What the
trend line allows is for analysts to look into history and compare this figure with
ratings recorded earlier in the presidential term. Additionally, an analysis can be
made of this figure compared to ratings recorded during previous presidents' terms.
In this case the question may be asked: did previous presidents with this approval
rating at this stage in their term tend to get re-elected or not?
Sampling: Further examples
1. Surveys usually involve considerable expenditure of time, effort and cost.
It is vital to clarify at the outset what you want to find out in the survey, before
starting to use precious resources.
The Trendy Tea and Coffee Company (TTCC) are set to launch a new premium brand
of tea and want to get the packaging right. Four different designs are created, ranging from a
traditional dark green colour to a flashy black, silver and yellow look. TTCC employ
a market research organisation who survey 1000 people to find out which design
they prefer.
On the basis of the reported survey findings, TTCC launch the new tea in the flashy
design, and sales of the new product nosedive after the initial period. It becomes
clear upon review that no research was carried out on the drinking habits of those
people surveyed. If this work had been done, it would have shown that the regular
tea drinkers in the sample population all preferred the dark green packaging.
2. A Goods-In Inspector at a large drinks manufacturer in South-West
England has to deal with a consignment of 1000 cases of grape juice. In the past,
the drinks company has been affected by minor contamination in its fermenting
process that has led to the loss of some batches of its best-selling line: "UK - the
British Sherry for British Tastes".
The inspector has neither the time nor the staff to open all the cases to check for
possible sources of the contamination, but she wants to have an idea of what the
whole consignment is like. She decides to open twenty cases of the grape juice - one
case in every fifty delivered. She could just open every fiftieth case in turn, but this
seems to be too standard an approach. She wants to introduce a more random
method.
So instead, the inspector imagines that the cases are numbered one to one
thousand and then uses her computer to generate 4-figure numbers at random,
ignoring all those that exceed one thousand, until she has twenty case numbers. This gives the inspector her
sample. Because every case has the same chance of being chosen, there is no bias in her choice of cases to inspect.
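The inspector's procedure might be sketched as follows. It follows the description closely: draw 4-figure numbers (0000 to 9999), discard any that fall outside 1 to 1000, and carry on until twenty case numbers have been collected. Skipping repeated numbers is an assumption, since the text does not say what she does with duplicates, and the function name `pick_cases` is illustrative.

```python
import random

def pick_cases(total_cases=1000, sample_size=20, seed=None):
    """Pick distinct case numbers by rejecting random 4-figure numbers that are out of range."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10_000)  # a random 4-figure number, 0000-9999
        # Ignore numbers with no matching case, and (by assumption) repeats.
        if 1 <= number <= total_cases and number not in chosen:
            chosen.append(number)
    return chosen

cases_to_open = pick_cases(seed=42)
print(cases_to_open)
```

In practice, `random.sample(range(1, 1001), 20)` draws an equivalent simple random sample in a single call; the longer version is shown only because it matches the steps in the text.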
3. The sampling method outlined above will be very labour intensive to carry
out. The inspector may have to open case 972, followed by case 23, then case 427.
She realises this will be very tedious work and tries to think of a different solution -
one that combines random and multi-stage sampling methods:
She decides to split the consignment into batches of twenty-five, giving forty
batches in total. From each of these she chooses one case by selecting a random
number from one to twenty-five.
This multi-stage sampling approach saves the inspector time, cost and effort.
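A minimal sketch of this batch-based approach is given below, assuming the cases are numbered 1 to 1000 in delivery order and that the consignment divides evenly into batches. The function name `pick_one_per_batch` is illustrative. Note that one case from each of forty batches gives a sample of forty cases, and that the chosen cases come out in warehouse order, which is what saves the inspector from jumping around the consignment.

```python
import random

def pick_one_per_batch(total_cases=1000, batch_size=25, seed=None):
    """Choose one case at random from each consecutive batch of cases."""
    rng = random.Random(seed)
    chosen = []
    # batch_start is the number of the first case in each batch: 1, 26, 51, ...
    for batch_start in range(1, total_cases + 1, batch_size):
        position = rng.randint(1, batch_size)  # random position within the batch
        chosen.append(batch_start + position - 1)
    return chosen

batch_sample = pick_one_per_batch(seed=42)
print(len(batch_sample), batch_sample[:5])
```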
Correlation between variables
Let's start by looking at how a scatter diagram can illustrate these relationships:
• Scattergrams
The scattergram or XY chart can be a useful way of representing the relationship
between two variables. The usual conventions of dependent and independent
variable position on the axes are followed. Points on the diagram are not connected
as they are on a line graph. The relationship between the two variables displayed on
the chart may be positive, negative or non-existent.
In Chart 1 there is a very strong negative correlation shown between disposable
Chart 3 illustrates a lack of a statistical relationship. There is little or no
correlation between disposable income and amount of rainfall, unless of course we're
looking at the long-term effect on the global climate of taking all these extra foreign
holidays and driving all these new cars that our higher incomes can afford!
Chart 3: Scattergram showing little or no association
(Data for display purposes only)
But we can go further than just representing the correlation between two separate
variables; we can formally measure the strength of the association between them.
• The Correlation Coefficient
As indicated, the idea behind the correlation coefficient is that we can give a number
value to the strength of relationship between one variable and another. There are
two main measures commonly used: Spearman's Rank Correlation Coefficient
and Pearson's Product-moment Correlation Coefficient. The former of these
two is the less complicated to calculate and, because it works on rankings rather than
raw values, allows us to assess the aesthetic or qualitative characteristics of data. The
latter allows us to measure the strength of the linear association between two variables
by working out the dispersion of the scattergram points.
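To make the difference between the two measures concrete, here is a small sketch in plain Python. The data are made up for display purposes only, and the function names are illustrative. Pearson's coefficient is calculated from the raw values, while Spearman's is simply Pearson's coefficient applied to the ranks of the values (tied values share an average rank).

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up figures: spending rises with income, but not in a straight line.
income = [10, 20, 30, 40, 50]
spending = [1, 4, 9, 16, 25]
print(round(pearson(income, spending), 3), round(spearman(income, spending), 3))
```

Here the rank correlation is exactly 1 because spending always rises when income rises, whereas Pearson's coefficient is slightly below 1 because the points do not lie on a straight line.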
There is an illustration of correlation coefficient measures in the 'Crunching' section
on TimeWeb.
Normal Distribution Curve illustration
The chart below illustrates a normally distributed population. You will notice that the
curve conforms to the characteristics outlined in the explanation section: the most
frequent value is at the centre; there is symmetry about the central value; there is
diminishing frequency as you move away from the centre.
A line is drawn from each of the two points of inflexion (one on either side of the mean)
to the X-axis. The distance from that point to the mean point on the X-axis is equal to one standard deviation.
You begin the selection by pointing (with your eyes closed) to an area in the table.
Imagine you point to line 10 (the lines are numbered down the left-hand side of the
table). The first possible four digit number between 0001 and 1000 is 0177. Notice that
as the table contains five digit numbers, it's acceptable to start by taking the fifth digit
of the first number in line 10.
The second four digit number is 0568.
The third number is 0722.
The fourth is 0940.
The fifth is 0970.
The sixth is 0500.
You would continue down the table, gathering four digit numbers until you had collected
thirty numbers between 0001 and 1000. Each of these would represent one car in the