Sampling Methods and Survey Types

Apr 10, 2018

ahi5
Page 1: Sampling Methods and Survey Types

8/8/2019 Sampling Methods and Survey Types

http://slidepdf.com/reader/full/sampling-methods-and-survey-types 1/14

Sampling Methods and Survey Types:

One of the world's best-known polling organisations, Gallup, say that one of the most

frequently asked questions they get from Americans is why they've never been

interviewed for a survey.

With an adult population of almost two hundred million, many Americans are sceptical

about the scientific reliability of sampling. In particular, they do not believe that a

survey of 1500 - 2000 people can represent the views of all citizens.

Gallup's sampling principle is that selecting a sample of a small proportion of the whole

population can represent the opinions of all the people, provided that the sample is

properly selected.

• So how do Gallup select a sample?

Firstly, they have to locate a place where all or most Americans can be found. This

isn't in the shopping mall, but at home. From the 1930s to mid 1980s, poll

respondents were interviewed face-to-face in their homes. But by the 1990s, with approximately 95% of all U.S. homes having a telephone, the vast majority of

surveys use this medium. Of course, this has the benefit of being a substantially less

expensive method.

• Identifying and describing the population.

Gallup is often asked to carry out polls on behalf of an organisation with the aim of 

learning more about the population's attitudes and beliefs. Let's imagine that an

American national newspaper wants a poll done about U.S. golf fans; the target

population may be all Americans aged at least 18 who say that they're fans of golf.

But if the poll was conducted on behalf of the U.S. PGA (Professional Golf 

Association), the target audience might be more specific; for instance, all people

over the age of 16, who watch at least 5 hours of golf (during the major tournaments) each week. Two surveys about the same sport, including many of the

same target respondents, but with very different sample populations.

• Choosing a method to sample the target population randomly.

The polling organisations have lists of all household telephone numbers in

continental USA. A computerised system uses random digit dialling (RDD) to create

a new list of all possible American telephone numbers, then selects a subset of 

numbers from that new list for the polling organisation to call. This is important

because approximately 30% of American residential numbers are unlisted, according

to recent estimates. The exclusion of these "hidden" numbers would introduce bias

into the sample.

• Sample Accuracy.

With a sample size of 1000 adults, using the random selection process outlined

above, Gallup can be statistically certain that 95 times out of one hundred,

continued polling would produce the same result within a margin of error of +/- 3%.

If the sample size was doubled to 2000 adults, Gallup would incur roughly twice the

cost in conducting the survey, but the margin of error would decrease only to +/-

2%.
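The figures quoted above can be roughly reproduced with the usual margin-of-error formula for a proportion. This is only a sketch: it assumes a simple random sample, the worst-case split of 50% (p = 0.5) and a 95% multiplier of 1.96, none of which are stated in the text, and the function name is chosen here for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n (assumed worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare the two sample sizes discussed in the text, plus one more doubling.
for n in (1000, 2000, 4000):
    print(n, f"+/- {100 * margin_of_error(n):.1f}%")
```

Under these assumptions the margin is about 3.1% for 1000 adults and about 2.2% for 2000, which matches the rounded +/- 3% and +/- 2% above: doubling the sample does not halve the margin.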

• Interviewing the selected sample.

What if the people randomly selected to survey are not in?

What if some of the target population are busy on other phone calls when the

pollsters call? In these cases the target respondent's phone number is stored and

called again later at regular intervals throughout the survey period.

Excluding people who don't answer the phone the first time Gallup calls them would

introduce bias amongst the survey sample: for instance, young single adults, who

are frequently out or using the phone, are less likely to be included in the sample

population than more sedentary people who are less frequent phone users.

In a household with more than one adult in residence, Gallup randomly select an

adult, either by asking for the person with the latest birthday or by asking the person who answers the phone to list all the adults who live there. The pollster then

selects one of these adults at random.

• Asking the "right" questions.

Gallup's assessment is that the greatest source of bias or error in survey data is probably the

wording of the questions themselves.

For example, you may have thought that conducting a pre-election poll of voting

intentions would be a simple process. But the question "Who will you vote for in the

next election?" can be just as open to bias as any other survey question. Does the polling

organisation list the vice-presidential candidates along with the names of the

presidential candidates? Should the party represented by the candidate be listed or

should there be no indication of party affiliation?

In these cases, Gallup tries to mimic the format and content of the ballot paper and

reads the names of the presidential and vice-presidential candidates and gives the

name of the party represented by them.

Questions to do with policy issues can also be very tricky: are things like food

stamps or housing grants to be called "welfare" or "programs for the poor"? If 

members of the armed services are going abroad should this be termed "sending"

troops or "contributing" to a UN force? These are emotive topics and the wording of 

the question can "slant" the answer received from poll respondents.

• The oldest one in the book.

One of the oldest question wordings concerns presidential job approval. Since the

1930s and Roosevelt's presidency, Gallup has used the following question: "Do you

approve or disapprove of the job .... is doing as president?"

This means that there is a reliable trend line provided by the continuity of the

question asked. If, for example, George W. Bush has a job approval rating of 48%

after one year of his presidency, what can be learned from such a rating? What the

trend line allows is for analysts to look into history and compare this figure with

ratings recorded earlier in the presidential term. Additionally, an analysis can be

made of this figure compared to ratings recorded during previous presidents' terms.

In this case the question may be asked: did previous presidents with this approval rating at this stage in their term tend to get re-elected or not?

Sampling: Further examples

1. Surveys usually involve considerable expenditure of time, effort and cost.

It is vital to clarify at the outset what you want to find out in the survey, before

starting to use precious resources.

The Trendy Tea and Coffee Company (TTCC) are set to launch a new premium brand of tea and want to get the packaging right. Four different designs are created, ranging from a

traditional dark green colour, to a flashy black, silver and yellow look. TTCC employ

a market research organisation who survey 1000 people to find out which design

they prefer.

On the basis of the reported survey findings, TTCC launch the new tea in the flashy design, and sales of the new product nosedive after the initial period. It becomes

clear upon review that no research was carried out on the drinking habits of those

people surveyed. If this work had been done, it would have shown that the regular

tea drinkers in the sample population all preferred the dark green packaging.

2. A Goods-In Inspector at a large drinks manufacturer in South-West

England has to deal with a consignment of 1000 cases of grape juice. In the past,

the drinks company has been affected by minor contamination in its fermenting

process that has led to the loss of some batches of its best-selling line: "UK - the

British Sherry for British Tastes".

The inspector has neither the time nor the staff to open all the cases to check for

possible sources of the contamination, but she wants to have an idea of what the

whole consignment is like. She decides to open twenty cases of the grape juice - one

case in every fifty delivered. She could just open every fiftieth case in turn, but this

seems to be too standard an approach. She wants to introduce a more random

method.

So instead, the inspector imagines that the cases are numbered one to one

thousand and then uses her computer to generate at random, twenty 4-figure

numbers, ignoring all those that exceed one thousand. This gives the inspector her

sample population. As a result, there is no bias in her choice of cases to inspect.
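The inspector's procedure can be sketched in a few lines. This is an illustrative version only: the function name is invented here, and skipping 0000 and any repeated number are assumptions added so that exactly twenty distinct cases come out (the text mentions only ignoring numbers above one thousand).

```python
import random

def pick_cases(total_cases=1000, sample_size=20, rng=None):
    """Generate random 4-figure numbers and keep those that name a case."""
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 9999)   # a random 4-figure number, 0000 to 9999
        if number < 1 or number > total_cases:
            continue                    # ignore numbers that match no case
        if number in chosen:
            continue                    # assumption: never open the same case twice
        chosen.append(number)
    return chosen

cases = pick_cases(rng=random.Random(42))   # the seed is arbitrary
print(sorted(cases))
```

Because most 4-figure numbers exceed one thousand, roughly nine in ten draws are thrown away, but each case still has the same chance of being chosen.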

3. The sampling method outlined above will be very labour intensive to carry

out. The inspector may have to open case 972, followed by case 23, then case 427. She realises this will be very tedious work and tries to think of a different solution -

one that combines random and multi-stage sampling methods:

She decides to split the consignment into batches of twenty-five, giving forty

batches in total. From each of these she chooses one case by selecting a random

number from one to twenty-five.

This multi-stage sampling approach saves the inspector time, cost and effort.
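A minimal sketch of the batch approach follows. It assumes the batches are simply consecutive runs of twenty-five case numbers, which the text does not specify, and the function name is made up for the example.

```python
import random

def one_case_per_batch(total_cases=1000, batch_size=25, rng=None):
    """Split the consignment into consecutive batches and pick one case from each."""
    rng = rng or random.Random()
    picks = []
    for start in range(1, total_cases + 1, batch_size):
        offset = rng.randint(1, batch_size)   # random number from 1 to 25
        picks.append(start + offset - 1)      # case number within this batch
    return picks

picks = one_case_per_batch(rng=random.Random(7))   # the seed is arbitrary
print(len(picks), picks[:5])
```

With batches of twenty-five this yields forty cases, one from each batch, and the picks come out in delivery order, so the inspector can work through the consignment from one end to the other.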

Correlation between variables

Let's start by looking at how a scatter diagram can illustrate these relationships:

• Scattergrams

The scattergram or XY chart can be a useful way of representing the relationship

between two variables. The usual conventions of dependent and independent

variable position on the axes are followed. Points on the diagram are not connected

as they are on a line graph. The relationship between the two variables displayed on

the chart may be positive, negative or non-existent.

In Chart 1 there is a very strong negative correlation shown between disposable

income levels and the number of discount stores in existence. You may feel that this

makes sense as a hypothesis, in that as income levels fall, more discount retail

enterprises emerge.

Chart 1: Scattergram (XY Chart) showing a negative association

(Data for display purposes only)

In Chart 2, disposable income is plotted against number of overseas holidays taken.

It shows that in this case there is a strong positive correlation between income and

"luxury" items, such as foreign holidays. You may not agree that a foreign holiday is

a luxury, but may feel that, in general, the higher the income level the greater the

number of overseas holidays taken will be.

Chart 2: Scattergram showing a positive association

(Data for display purposes only)

The charts shown here are meant to illustrate the concept of a scattergram. In

practice, of course, the points on a scattergram are likely to lie scattered around the chart,

although a strong association between the two variables is likely to allow us to draw

a straight line through the points shown. Such a straight line is known as the "line of 

best fit". This is a straight line that seems to fit the points on the diagram best.

The line of best fit is usually drawn by eye. But there are more sophisticated ways of

making the line more accurate. For example, it is known that for a set of points on

a scattergram, the least-squares line of best fit will always pass through the point (x-bar, y-bar)

where x-bar is the mean of the horizontal (x) values and y-bar is the mean of the vertical (y)

values.
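The property can be checked numerically. The sketch below fits a least-squares line (one common, more exact way of finding a line of best fit; the text does not say which method it has in mind) to some made-up income and holiday figures and confirms that the line passes through (x-bar, y-bar).

```python
def least_squares_line(xs, ys):
    """Return slope, intercept and the two means for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # forces the line through (x_bar, y_bar)
    return slope, intercept, x_bar, y_bar

# Made-up data for display purposes only.
income = [10, 15, 20, 25, 30, 35]    # disposable income (thousands)
holidays = [0, 1, 1, 2, 2, 3]        # overseas holidays taken

slope, intercept, x_bar, y_bar = least_squares_line(income, holidays)
print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("passes through the means:", abs(slope * x_bar + intercept - y_bar) < 1e-9)
```

When drawing by eye, plotting (x-bar, y-bar) first and pivoting the ruler about that point is a practical use of the same fact.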

Chart 3 illustrates a lack of a statistical relationship. There is little or no

correlation between disposable income and amount of rainfall, unless of course we're

looking at the long term effect on the global climate of taking all these extra foreign

holidays and driving all these new cars that our higher incomes can afford!

Chart 3: Scattergram showing little or no association

(Data for display purposes only)

But we can go further than just representing the correlation between two separate

variables; we can formally measure the strength of the association between them.

• The Correlation Coefficient

As indicated, the idea behind the correlation coefficient is that we can give a number

value to the strength of relationship between one variable and another. There are

two main measures commonly used: Spearman's Rank Correlation Coefficient

and Pearson's Product-moment Correlation Coefficient. The former of these

two is the less complicated to calculate; because it works on rank order, it also allows

us to assess aesthetic or qualitative characteristics of data that can be ranked. The

latter allows us to measure the strength of the linear association between two variables

by working out how closely the scattergram points cluster around a straight line.

There is an illustration of correlation coefficient measures in the 'Crunching' section

on TimeWeb.
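As a rough illustration of the two measures, the sketch below computes both coefficients for some made-up income and discount-store figures. The data and function names are invented for the example, and the Spearman shortcut formula used here assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 = smallest value. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Made-up data for display purposes only.
income = [12, 18, 25, 31, 40, 52]
stores = [30, 27, 21, 16, 12, 5]

print("Pearson: ", round(pearson(income, stores), 3))
print("Spearman:", round(spearman(income, stores), 3))
```

Both values lie between -1 and +1. Here the store count falls every time income rises, so the rank coefficient is exactly -1, while Pearson's coefficient is slightly above -1 because the points do not lie on a perfectly straight line.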

Normal Distribution Curve illustration

The chart below illustrates a normally distributed population. You will notice that the

curve conforms to the characteristics outlined in the explanation section: the most

frequent value is at the centre; there is symmetry about the central value; there is

diminishing frequency as you move away from the centre.

A line is drawn from each of the two points of inflexion (one on either side of the mean)

to the X-axis. The distance from that point to the mean point on the X-axis is equal to

the standard deviation.

Four separate areas are now identifiable from the chart:

Area A shows the area between the mean and one standard deviation above the mean.

Area B shows the area between the mean and one standard deviation below the mean.

Area C indicates the area to the right of one standard deviation above the mean.

Area D indicates the area to the left of one standard deviation below the mean.

Because the normal curve is symmetrical, Area A equals Area B. Areas C and D are also

equal. The total of A, B, C and D equals the total area under the curve, or the entire

population.

Mathematical calculations show that in any normal distribution, approximately 68% of 

all observations fall within one standard deviation (SD) of the mean (Areas A plus B).

So, about 34% of observations lie between the mean and one standard deviation above

the mean (Area A) and 34% lie between the mean and one standard deviation below

the mean (Area B). By subtraction, we can tell that in a normal distribution 32% of the

observations fall outside one standard deviation, 16% on either side (16% in Area C

and 16% in Area D).

Let's now put this into the language of probability: In any normal distribution, there is a

.68 probability that a particular value will fall within one standard deviation of the

mean; there is approximately a .34 probability that a value will lie between the mean

and one SD above the mean (Area A) and a .34 probability that a value will lie between

the mean and one SD below the mean (Area B).

Also, there is a .16 probability that a particular value will lie above one SD from the

mean (Area C) and a .16 probability that the value will lie below one SD from the mean

(Area D).

Using this knowledge, we can re-draw our normal curve chart, now putting in six

separate areas:

The vertical lines from the curve to the X-axis represent the mean (at the centre) and

distances of one and two SDs on either side of the mean.

Areas A and B have the same characteristics as in the first chart; each being equal and

each containing approximately 34% of all the values in the normal distribution.

Areas C and D are also equal and are defined by the vertical lines indicating one and

two SDs from the mean (on either side). Each of these areas contains approximately

13.5% of all the values in the normal distribution.

Areas E and F at the extreme ends of the curve are defined by the vertical line

indicating two SDs from the mean and the tail ends of the distribution. Each of these

areas contains approximately 2.5% of all the values. In other words, in a normal

distribution, approximately 5% of a population will be beyond two SDs: 2.5% more than

two SDs above the mean and 2.5% more than two SDs below.

Let's restate this information in the language of probability:

1. In any normal distribution, there is a .34 probability that any particular

value will fall between the mean and one SD above the mean (Area A) and the same

probability of the value falling between the mean and one SD below the mean (Area

B).

2. There is a .135 probability of any value falling between one and two SDs

above the mean (Area C) and the same probability of the value falling between one

and two SDs below the mean (Area D).

3. There is a .475 probability that any value will fall between the mean and two SDs

above the mean (Areas A and C) and the same probability of the value falling between

the mean and two SDs below the mean (Areas B and D).

4. The mathematics of normal curves shows that the area contained by the

vertical lines representing three SDs from the mean contains 99.7% of the area

under the curve and 99.7% of all the values in the data set. There is, therefore, a

probability of .997 that in any normal distribution any particular value will fall within

three SDs from the mean.

Why not try the what samples tell us worksheet to see that you understand this?
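The probabilities quoted in this section can be checked with a short calculation. This sketch uses the standard normal cumulative distribution written in terms of the error function; the helper names are chosen here for illustration.

```python
import math

def normal_cdf(z):
    """Probability that a normal value lies below z standard deviations from the mean."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z_low, z_high):
    """Area under the normal curve between two points measured in SDs from the mean."""
    return normal_cdf(z_high) - normal_cdf(z_low)

print("mean to +1 SD (Area A):    ", round(area_between(0, 1), 4))
print("+1 SD to +2 SD (Area C):   ", round(area_between(1, 2), 4))
print("beyond +2 SD (Area E):     ", round(1 - normal_cdf(2), 4))
print("within 3 SDs of the mean:  ", round(area_between(-3, 3), 4))
```

The exact areas are about .3413, .1359, .0228 and .9973. The handout's figures of .34, .135 and .997 are roundings of these; the 2.5% quoted for each tail is a slightly looser rounding, since exactly 2.5% lies beyond about 1.96 SDs rather than 2.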

Random Sampling:

Random sampling is usually the preferred method of sampling, because of the lack of

built-in bias that is involved.

This method requires that a list of every member of the population is available. There

are times when this will be impossible, for instance when

an entire national or regional population is involved, or for example if you are studying

the whole population of small businesses in the UK. In these cases, the simple random

sampling method outlined below will not be appropriate.

In a simple random sample, with a list of the entire population being studied, the

sampler gives a number to every item on the list and selects the sample by using a

random number generator or a table of random numbers.

Here's how it works.

Imagine you want to study all the cars being stored in a warehousing complex, but you

don't have the time or other resources to deal with them all. You might decide to work

with a sample of 30 cars out of a total warehouse population of 1000.

So, you begin by assigning a number to every member of the total population. As the

largest number you need (1000) has four digits, every car in the warehouse is given a

four digit number, beginning with 0001, 0002, 0003 and so on, up to 1000.

You look at your list of random numbers, which looks like the following:

A TABLE OF RANDOM NUMBERS

00 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945

01 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665

02 08422 68953 19645 09303 23209 02560 15953 34764 35080 33606

03 99019 02529 09376 70715 38311 31165 88676 74397 04436 27659

04 12807 99970 80157 36147 64032 36653 98951 16877 12171 76833

05 66065 74717 34072 76850 36697 36170 65813 39885 11199 29170

06 31060 10805 45571 82406 35303 42614 86799 07439 23403 09732

07 85269 77602 02051 65692 68665 74818 73053 85247 18623 88579

08 63573 32135 05325 47048 90553 57548 28468 28709 83491 25624

09 73796 45753 03529 64778 35808 34282 60935 20344 35273 88435

10 98520 17767 14905 68607 22109 40558 60970 93433 50500 73998

11 11805 05431 39808 27732 50725 68248 29405 24201 52775 67851

12 83452 99634 06288 98083 13746 70078 18475 40610 68711 77817

13 88685 40200 86507 58401 36766 67951 90364 76493 29609 11062

14 99594 67348 87517 64969 91826 08928 93785 61368 23478 34113

15 65481 17674 17468 50950 58047 76974 73039 57186 40218 16544

16 80124 35635 17727 08015 45318 22374 21115 78253 14385 53763

17 74350 99817 77402 77214 43236 00210 45521 64237 96286 02655

18 69916 26803 66252 29148 36936 87203 76621 13990 94400 56418

19 09893 20505 14225 68514 46427 56788 96297 78822 54382 14598

20 91499 14523 68479 27686 46162 83554 94750 89923 37089 20048

21 80336 94598 26940 36858 70297 34135 53140 33340 42050 82341

22 44104 81949 85157 47954 32979 26575 57600 40881 22222 06413

23 12550 73742 11100 02040 12860 74697 96644 89439 28707 25815

24 63606 49329 16505 34484 40219 52563 43651 77082 07207 31790

25 61196 90446 26457 47774 51924 33729 65394 59593 42582 60527

26 15474 45266 95270 79953 59367 83848 82396 10118 33211 59466

27 94557 28573 67897 54387 54622 44431 91190 42592 92927 45973

28 42481 16213 97344 08721 16868 48767 03071 12059 25701 46670

29 23523 78317 73208 89837 68935 91416 26252 29663 05522 82562

30 04493 52494 75246 33824 45862 51025 61962 79335 65337 12472

31 00549 97654 64051 88159 96119 63896 54692 82391 23287 29529

32 35963 15307 26898 09354 33351 35462 77974 50024 90103 39333

33 59808 08391 45427 26842 83609 49700 13021 24892 78565 20106

34 46058 85236 01390 92286 77281 44077 93910 83647 70617 42941

35 32179 00597 87379 25241 05567 07007 86743 17157 85394 11838

36 69234 61406 20117 45204 15956 60000 18743 92423 97118 96338

37 19565 41430 01758 75379 40419 21585 66674 36806 84962 85207

38 45155 14938 19476 07246 43667 94543 59047 90033 20826 69541

39 94864 31994 36168 10851 34888 81553 01540 35456 05014 51176

40 98086 24826 45240 28404 44999 08896 39094 73407 35441 31880

41 33185 16232 41941 50949 89435 48581 88695 41994 37548 73043

42 80951 00406 96382 70774 20151 23387 25016 25298 94624 61171

43 79752 49140 71961 28296 69861 02591 74852 20539 00387 59579

44 18633 32537 98145 06571 31010 24674 05455 61427 77938 91936

45 74029 43902 77557 32270 97790 17119 52527 58021 80814 51748

46 54178 45611 80993 37143 05335 12969 56127 19255 36040 90324

47 11664 49883 52079 84827 59381 71539 09973 33440 88461 23356

48 48324 77928 31249 64710 02295 36870 32307 57546 15020 09994

49 69074 94138 87637 91976 35584 04401 10518 21615 01848 76938

You begin the selection by pointing (with your eyes closed) to an area in the table. Imagine you point to line 10 (the lines are numbered down the left-hand side of the

table). The first possible four digit number between 0001 and 1000 is 0177. Notice that

as the table contains five digit numbers, it's acceptable to start by taking the fifth digit

of the first number in line 10.

The second four digit number is 0568.

The third number is 0722.

The fourth is 0940.

The fifth is 0970.

The sixth is 0500.

You would continue down the table, gathering four digit numbers until you had collected thirty numbers between 0001 and 1000. Each of these would represent one car in the

warehouse, chosen at random to form a sample of thirty cars.

There is less bias in this selection method because every member of the population has

an equal chance of being selected, and represented in the sample. You have made no

attempt to organise the population into sections, so the selection process is free from

your direction.
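Where a computer is to hand, a random number generator can stand in for the printed table. The sketch below is one way of drawing the thirty cars; the variable names and the seed are arbitrary choices made for the example.

```python
import random

# Give every car a four digit label, 0001 up to 1000, as described above.
car_labels = [f"{number:04d}" for number in range(1, 1001)]

rng = random.Random(2018)              # arbitrary seed so the draw can be repeated
sample = rng.sample(car_labels, 30)    # thirty different cars, each equally likely

print(sorted(sample))
```

`random.sample` never picks the same item twice, so there is no need to discard repeated numbers as there would be when reading from a table.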

Probability

Jacques Bernoulli was the first to suggest what is now known as the 'law of large numbers',

which is based on his work on probability. Imagine that you have a container that holds

thousands of pebbles; you don't know how many there are, nor do you know that of

the 5000 pebbles, 3000 are white and 2000 black. The ratio of white to black

pebbles is therefore 3:2.

Bernoulli asked how many pebbles you would have to draw from the container before you could

make a reliable estimate of the actual ratio of white to black pebbles. Of course you would

begin to get a fairly clear idea pretty soon, as you picked out a pebble, noted its colour

and then replaced it in the container. But the key to the theorem is whether or not

you can repeat the experiment over and over until it's ten, or one hundred, times more

probable than not that your estimate lies close to the true 3:2 ratio.

Bernoulli states that this is the case; the more experiments are carried out, the more

likely it is that the estimated ratio will get close to the true ratio.
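Bernoulli's pebble experiment is easy to simulate. The sketch below draws pebbles with replacement and reports the estimated share of white pebbles (the true share is 3000 / 5000 = 0.6); the function name and the seed are assumptions made for the example.

```python
import random

def estimate_white_share(draws, rng, white=3000, black=2000):
    """Draw pebbles with replacement and return the observed share of white ones."""
    p_white = white / (white + black)
    # Each draw is white with probability p_white, since the pebble is replaced.
    whites = sum(1 for _ in range(draws) if rng.random() < p_white)
    return whites / draws

rng = random.Random(1)   # arbitrary seed so the run can be repeated
for draws in (10, 100, 1000, 10000, 100000):
    print(draws, round(estimate_white_share(draws, rng), 3))
```

Small runs can land some way from 0.6, but as the number of draws grows the estimates settle ever closer to it.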

Time series

To identify trends in time series data, other than drawing a trend curve onto a graph

freehand, there are two common measures used:

• using moving averages.

• using regression analysis to find the line of 'best fit'.
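The first of these methods can be sketched as follows. The figures are made up for illustration and the window of three periods is an arbitrary choice.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth the series."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Made-up quarterly sales figures, for display purposes only.
sales = [20, 24, 19, 27, 30, 26, 33, 35]
print([round(v, 1) for v in moving_average(sales, 3)])
```

The smoothed series is shorter than the original by one less than the window length, and a wider window gives a smoother trend at the cost of responding more slowly to genuine changes.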

ILLUSTRATION 

Sampling and Statistics

Contents:

• Degrees of Freedom example

• Example of a collection of sample means (s-means)

Degrees of Freedom example

There is an explanation available of degrees of freedom if you are not sure.

If a random sample of 16 light bulbs produced in a larger batch is selected and the

mean of the sample is 1450 hours and the estimated SD is 80 hours, estimate the

population mean at the 95% confidence level.

SE of the sample mean = 80 / √16

= 20 hours

Number of degrees of freedom = 16 - 1

= 15

The t statistic (read from the t distribution tables) at a 95% level and with 15 degrees

of freedom = 2.13

So the margin of error = 2.13 x 20 ≈ 43 hours

So, we can be 95% confident that m (population mean) lies within the range

1450 +/- 43 = 1407 to 1493 hours.
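The same arithmetic can be written as a small function. This is a sketch only: the t value of 2.13 is typed in from the tables as in the worked example, not computed, and the function name is invented here.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_value):
    """Confidence interval for a population mean from a small sample."""
    standard_error = sample_sd / math.sqrt(n)   # 80 / sqrt(16) = 20
    margin = t_value * standard_error           # 2.13 x 20 = 42.6
    return sample_mean - margin, sample_mean + margin

# Light bulb example: n = 16, mean 1450 hours, estimated SD 80 hours,
# t = 2.13 for 15 degrees of freedom at the 95% level (from the tables).
low, high = t_confidence_interval(1450, 80, 16, t_value=2.13)
print(round(low), "to", round(high), "hours")
```

For a different sample size the t value has to be looked up again, because the degrees of freedom (n - 1) change.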

Example of a collection of sample means (s-means)

Assume that we can properly identify a sample from a large population that we are

interested in studying, by using random, quota or stratified sampling techniques

outlined earlier.

We are interested in collecting a representative sample of a large population: for

instance, people in the workforce who are aged under eighteen. Let's say

we want to find out how many hours per week this group works on average.

Imagine that we sample a group of thirty people under the age of eighteen who are in

some form of paid work. We have a group of numbers that represent the number of 

hours worked in a week by each of the thirty people in our sample. We can then

calculate the mean of this sample, either by adding up all the values and dividing by

the total number in the sample, or by entering the values into an Excel worksheet and

getting the calculation done that way.

Now, suppose that in our desire to produce as representative a sample as possible

within the time and cost constraints of our project, we continue to draw samples of thirty

people under the age of eighteen. We use the same random process as with our first

sample and make sure that we do not include in the samples anyone who was part of 

the earlier samples.

What we have produced is a collection of sample means, one for each of the samples

we have drawn from the population. These are quite likely to be fairly close to each

other in value, but there will be some differences. In other words, the collection of 

means from the various samples taken will have a frequency distribution (with a mean

value, a median, a variation and a standard deviation).

Let's suppose that the following table represents the ten samples and their means:

Sample 1: S-Mean = 6.25 hours

Sample 2: S-Mean = 6.50 hours

Sample 3: S-Mean = 6.00 hours

Sample 4: S-Mean = 7.75 hours

Sample 5: S-Mean = 4.50 hours

Sample 6: S-Mean = 8.00 hours

Sample 7: S-Mean = 3.50 hours

Sample 8: S-Mean = 9.25 hours

Sample 9: S-Mean = 4.75 hours

Sample 10: S-Mean = 6.50 hours

Remember that each of these S-Means is the average for a sample of thirty under-

eighteens who carry out some form of paid work. The list of numbers will have a mean

value and a standard deviation. Can you place these into an Excel worksheet to

calculate these values?

The mean value in this case is 6.3 hours and the standard deviation is 1.74 hours.

Check that you can get the same result using a spreadsheet package.
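
For readers who prefer a script to a spreadsheet, the same check can be sketched in Python. This is only an illustration: the variable names are our own, and it assumes the sample standard deviation (dividing by n - 1, as Excel's STDEV function does), which is the version that reproduces the 1.74 quoted above.

```python
import statistics

# The ten sample means (hours worked per week) listed above.
sample_means = [6.25, 6.50, 6.00, 7.75, 4.50, 8.00, 3.50, 9.25, 4.75, 6.50]

mean_of_means = statistics.mean(sample_means)
# statistics.stdev divides by n - 1 (the sample standard deviation).
sd_of_means = statistics.stdev(sample_means)

print(f"Mean of the sample means: {mean_of_means:.2f} hours")
print(f"Standard deviation:       {sd_of_means:.2f} hours")
```

If statistics.pstdev is used instead, which divides by n, the result is about 1.65 hours, so it is worth knowing which version your spreadsheet function applies.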

Of course, if we kept collecting samples like the ten in this example, eventually we

would have sampled the entire population (as long as we made sure that no

under-eighteen was in more than one sample). The average of all of our sample means would then

be the average for the whole population, because taken together our samples would make up

the whole population.
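
This idea can be tried out with a small simulation. The sketch below is an assumption-laden illustration, not survey data: the population of 3,000 weekly-hours figures is invented, and it is deliberately a multiple of thirty so that every sample is the same size (which is what makes the two averages agree).

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: weekly hours for 3,000 working under-eighteens.
# These figures are invented for the sketch, not taken from any survey.
population = [round(random.uniform(1, 12), 2) for _ in range(3000)]

# Shuffle, then cut into non-overlapping samples of thirty,
# so nobody appears in more than one sample.
random.shuffle(population)
sample_size = 30
samples = [population[i:i + sample_size]
           for i in range(0, len(population), sample_size)]

sample_means = [statistics.mean(s) for s in samples]

print(f"Number of samples:        {len(samples)}")
print(f"Mean of the sample means: {statistics.mean(sample_means):.4f}")
print(f"Population mean:          {statistics.mean(population):.4f}")
```

The last two figures should agree apart from tiny rounding differences, while the individual sample means still vary from one sample to the next.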

In practice, we don't have the time or the money to conduct such a huge sampling task, and in most cases we don't have to.

There is a worksheet available on 'what samples tell us'

• Standard Errors and Increasing Sample Size

As we have seen, we can take a single sample of more than 30 items and draw

conclusions about the larger population from which it is drawn.

As we find out more about the standard error, we can notice other interesting details

that should aid our understanding of statistics in practice.

Firstly, the size of the confidence interval depends on the size of the standard error.

So, if we can minimise the standard error, we can narrow the range of values at each

confidence level, thus producing more precise conclusions.

This is because we calculate the standard error from the sample by taking the

standard deviation of the sample and dividing it by the square root of the number of 

observations in the sample. Increasing the number in the sample may only have a

small effect on reducing the size of the standard error.

You may ask yourself how much you would have to increase the sample size by in

order to have any significant impact. The answer is that increasing the sample size

will indeed narrow the range of results, but that the sample size has to be increased

so dramatically that the cost and time taken would often make it impractical.

This can be illustrated by the following example: If we were studying a sample of 

100 students and their exam performance and if the standard deviation of the list of 

results was, say, 14, then we could calculate the standard error by dividing the

standard deviation by the square root of the number in the sample. So, 14 divided

by the square root of 100, or 14 divided by 10 = 1.4.

This means that in estimating the confidence intervals for the entire population of 

students, we use the figure of 1.4 marks as the basis of the intervals to calculate the

ranges for .68, .95 and .99 probability.
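
The arithmetic above can be sketched in a few lines of Python. The figures 14 and 100 come from the example; the variable names are our own, and because the example gives no sample mean, the sketch only prints how far either side of the mean each range extends (using 1, 2 and 3 standard errors, as this guide does).

```python
import math

sd = 14   # standard deviation of the 100 exam results (from the text)
n = 100   # number of students in the sample

standard_error = sd / math.sqrt(n)
print(f"Standard error: {standard_error:.1f} marks")

# Multipliers of 1, 2 and 3 standard errors, as used in this guide
# for roughly .68, .95 and .99 probability.
for label, multiplier in [(".68", 1), (".95", 2), (".99", 3)]:
    half_width = multiplier * standard_error
    print(f"{label} probability: sample mean plus or minus {half_width:.1f} marks")
```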

We might think that we ought to try to reduce the range in order to get a more

precise result. We could increase the sample size, in order to increase the size of its

square root and therefore reduce the size of the standard error.

But, because we are dealing with the square root of the number in the sample, we

find that to have any significant impact on the standard error, we would have to increase the sample size considerably.

So in the example given above, we were studying a sample of 100 students, and

found the result for a standard error of 1.4 by dividing the standard deviation of the

sample (14) by the square root of 100 (10). If we wanted to reduce the standard

error by one half, we would have to divide 14 by 20. In order to do this we would

have to sample 400 students, as the square root of 400 is 20.

Do you see the relationship here between sample size and size of the standard

error? In order to halve the standard error, we have to increase the sample size to

four times its original size.
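
A short sketch makes the pattern easier to see. It assumes, purely for illustration, that the standard deviation stays at 14 as the sample grows (in a real study it would shift a little from sample to sample); the function name is our own.

```python
import math

sd = 14  # standard deviation from the exam example in the text

def standard_error(sd, n):
    """Standard error of the mean: standard deviation over root n."""
    return sd / math.sqrt(n)

# Each step quadruples the sample size; each step halves the standard error.
for n in [100, 400, 1600, 6400]:
    print(f"n = {n:>4}: standard error = {standard_error(sd, n):.3f}")
```

Going from 100 to 6,400 students, sixty-four times the work, only brings the standard error down from 1.4 to 0.175.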

What we should learn from this is that in many cases it is not worth the effort of 

increasing the sample size in order to achieve more precise results. If you bear in

mind that the really time-consuming part of the analysis is the selection of the

sample information, then you can see that it is usually more efficient to keep the

sample relatively small (as long as it is over 30 items) and to focus our efforts on

gathering the best sample we can. This means, of course, ensuring that our sample

is as free as possible from bias.

• Review of confidence interval analysis of a population from a single sample

It may be wise here to review the steps we should take in making generalisations

within confidence levels about an entire population from a single sample:

1. Firstly, we select a sampling strategy, which usually means a

random sample, and select our sample, making sure that we have at least

30 observations within it.

2. Then we collect the information from the sample and process it

(using Excel or a similar spreadsheet package) in order to find out the mean

and the standard error of the sample.

3. Finally, we draw conclusions at the different confidence levels:

roughly 68% for a range within plus or minus 1 standard error of the mean of the

sample; roughly 95% for a range within plus or minus 2 standard errors of the mean

of the sample; and over 99% for a range within plus or minus 3 standard errors

of the mean of the sample.
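
The three steps can be pulled together in one short Python sketch. Everything named here is an assumption made for illustration: the function confidence_ranges is our own, and the forty exam marks are generated at random rather than taken from a real sample. It uses the sample standard deviation (dividing by n - 1) when working out the standard error.

```python
import math
import random
import statistics

def confidence_ranges(sample):
    """Return the mean, standard error and the three ranges from step 3."""
    if len(sample) < 30:
        raise ValueError("this guide asks for at least 30 observations")
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    ranges = {
        level: (mean - k * standard_error, mean + k * standard_error)
        for level, k in [("68%", 1), ("95%", 2), ("99%", 3)]
    }
    return mean, standard_error, ranges

# Hypothetical data: 40 exam marks invented for the illustration.
random.seed(7)
marks = [random.randint(35, 90) for _ in range(40)]

mean, standard_error, ranges = confidence_ranges(marks)
print(f"mean = {mean:.2f}, standard error = {standard_error:.2f}")
for level, (low, high) in ranges.items():
    print(f"{level}: {low:.2f} to {high:.2f}")
```

Replacing the marks list with your own observations gives the ranges for your sample; the function refuses samples of fewer than 30 because the guide's rule of thumb does not cover them.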