Prof. Hemant Kombrabail

TYBMS Prof. Hemant Kombrabail

SAMPLING

SOME BASIC TERMS

1. Population – In statistical usage the term population is applied to any finite or infinite collection of individuals. It has displaced the older term universe, which is derived from the universe of discourse of logic. It is practically synonymous with aggregate and does not necessarily refer to a collection of living organisms.

2. Census - The complete enumeration of a population or group at a point in time with respect to well-defined characteristics such as population, production or traffic on particular roads. In some contexts the term is associated with the data collected rather than with the extent of the collection, so that the term Sample Census has a distinct meaning. The partial enumeration resulting from a failure to cover the whole population, as distinct from a designed sample enquiry, may be referred to as an 'incomplete census'.

3. Sample - A part of a population, or a subset from a set of units, which is provided by some process or other, usually by deliberate selection with the object of investigating the properties of the parent population or set.

4. Sample survey – A survey that is carried out using a sampling method, i.e. one in which only a portion, and not the whole population, is surveyed.

5. Sampling unit - One of the units into which an aggregate is divided, or regarded as divided, for the purposes of sampling, each unit being regarded as individual and indivisible when the selection is made. The definition of unit may be made on some natural basis, for example, households, persons, units of product, tickets, etc., or on some arbitrary basis, e.g. areas defined by grid coordinates on a map. In the case of multi-stage sampling the units are different at different stages of sampling, being 'large' at the first stage and growing progressively smaller with each stage in the process of selection. The term sample unit is sometimes used in a synonymous sense.

6. Sampling Frame - A list, map or other specification of the units which constitute the available information relating to the population designated for a particular sampling scheme. There is a frame corresponding to each stage of sampling in a multi-stage sampling scheme. The frame may or may not contain information about the size or other supplementary information of the units, but it should have enough detail so that a unit, if included in the sample, may be located and taken up for inquiry. The nature of the frame exerts a considerable influence over the structure of a sample survey. It is rarely perfect, and may be inaccurate, incomplete, inadequately described, out of date or subject to some degree of duplication. Reasonable reliability in the frame is a desirable condition for the reliability of a sample survey based on it. In multi-stage sampling it is sometimes possible to construct the frame at higher stages during the progress of the sample survey itself. For example, certain first-stage units may be selected in the first instance, and then more detailed lists or maps may be constructed, by compilation of available information or by direct observation, only for the first-stage units actually selected.

7. Sampling design - A sample design is a definite plan for obtaining a sample from the sampling frame. It refers to the technique or procedure the researcher would adopt in selecting the sampling units from which inferences about the population are drawn. The sampling design is determined before any data are collected.



8. Statistic(s) and parameter(s) - A statistic is a characteristic of a sample, whereas a parameter is a characteristic of a population. Thus, when we work out certain measures such as mean, median, mode, etc. from samples, they are called statistics, for they describe the characteristics of a sample. But when such measures describe the characteristics of a population, they are known as parameters. For instance, the population mean (μ) is a parameter, whereas the sample mean (X̄) is a statistic. Obtaining an estimate of a parameter from a statistic is the prime objective of sampling analysis.
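A minimal Python sketch of the distinction, using a hypothetical population of ten household spending figures (the data and the seed are assumptions for illustration): the mean of all ten values is a parameter, while the mean of a randomly drawn subset is a statistic used to estimate it.

```python
import random
from statistics import mean

# Hypothetical population: monthly spending (Rs.) of 10 households
population = [3200, 4100, 3900, 4500, 3800, 4200, 3600, 4400, 4000, 4300]

# Parameter: a characteristic of the whole population
mu = mean(population)

# Statistic: the same measure computed on a simple random sample of 4 units
rng = random.Random(7)               # fixed seed so the draw can be repeated
sample = rng.sample(population, 4)   # drawn without replacement
x_bar = mean(sample)

print(f"population mean (parameter): {mu}")
print(f"sample mean (statistic): {x_bar}")
```

Different seeds give different samples and hence different values of the statistic, while the parameter stays fixed.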

9. Sampling error - That part of the difference between a population value and an estimate thereof, derived from a random sample, which is due to the fact that only a sample of values is observed, as distinct from errors due to imperfect selection, bias in response or estimation, errors of observation and recording, etc. The totality of sampling errors in all possible samples of the same size generates the sampling distribution of the statistic that is being used to estimate the parent value.
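The definition can be illustrated with simple arithmetic. Assuming, hypothetically, that the sample below was drawn at random and recorded without any observation or response error, the whole difference between the sample mean and the population mean is sampling error:

```python
from statistics import mean

# Hypothetical population and one random sample drawn from it
population = [3200, 4100, 3900, 4500, 3800, 4200, 3600, 4400, 4000, 4300]
sample = [4100, 4500, 4200, 4400]

parameter = mean(population)   # population value
estimate = mean(sample)        # estimate derived from the sample

# With no non-sampling error assumed, the entire gap is sampling error
sampling_error = estimate - parameter
print(sampling_error)          # 300
```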

10. Precision - Precision is the range within which the population average (or other parameter) will lie, in accordance with the reliability specified in the confidence level. It may be expressed either as a percentage of the estimate (±) or as a numerical quantity. For instance, if the estimate is Rs. 4000 and the precision desired is ± 4%, then the true value will be no less than Rs. 3840 and no more than Rs. 4160 (4% of Rs. 4000 being Rs. 160). This is the range (Rs. 3840 to Rs. 4160) within which the true answer should lie. But if we desire that the estimate should not deviate from the actual value by more than Rs. 200 in either direction, the range would be Rs. 3800 to Rs. 4200.
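The two ways of stating precision reduce to simple arithmetic. A short Python sketch (the function names are illustrative, not standard) reproduces the figures in the example above:

```python
def range_from_percent(estimate, percent):
    """Precision stated as +/- a percentage of the estimate."""
    margin = estimate * percent / 100
    return estimate - margin, estimate + margin

def range_from_amount(estimate, amount):
    """Precision stated as +/- a fixed numerical quantity."""
    return estimate - amount, estimate + amount

print(range_from_percent(4000, 4))   # (3840.0, 4160.0)
print(range_from_amount(4000, 200))  # (3800, 4200)
```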

11. Confidence level and Significance level - The confidence level or reliability is the expected percentage of times that the actual value will fall within the stated precision limits. Thus, if we take a confidence level of 95%, we mean that there are 95 chances in 100 (or .95 in 1) that the sample results represent the true condition of the population within a specified precision range, against 5 chances in 100 (or .05 in 1) that they do not. Precision is the range within which the answer may vary and still be acceptable; the confidence level indicates the likelihood that the answer will fall within that range, and the significance level indicates the likelihood that the answer will fall outside that range. We can always remember that if the confidence level is 95%, then the significance level will be (100 - 95), i.e. 5%; if the confidence level is 99%, the significance level is (100 - 99), i.e. 1%, and so on. We should also remember that the area of the normal curve within the precision limits for the specified confidence level constitutes the acceptance region, and the area of the curve outside these limits in either direction constitutes the rejection regions.
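The relation between confidence level, significance level and the limits of the acceptance region can be sketched with Python's standard `statistics.NormalDist`. This assumes a two-sided region on the standard normal curve, which the text implies but does not spell out; the resulting z values are the familiar 1.96 and about 2.58.

```python
from statistics import NormalDist

def significance_level(confidence_pct):
    """Significance level (%) is what remains of 100% after the confidence level."""
    return 100 - confidence_pct

def acceptance_limit(confidence_pct):
    """z such that the central confidence_pct % of the standard normal curve
    lies between -z and +z (the acceptance region)."""
    alpha = significance_level(confidence_pct) / 100
    # half of the rejection area lies in each tail
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (95, 99):
    print(level, significance_level(level), round(acceptance_limit(level), 2))
# 95 5 1.96
# 99 1 2.58
```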

12. Sampling distribution - We are often concerned with sampling distributions in sampling analysis. If we take a certain number of samples and for each sample compute various statistical measures such as mean, standard deviation, etc., we find that each sample may give its own value for the statistic under consideration. All such values of a particular statistic, say the mean, together with their relative frequencies, constitute the sampling distribution of that statistic. Accordingly, we can have the sampling distribution of the mean, the sampling distribution of the standard deviation, or the sampling distribution of any other statistical measure. It may be noted that each item in a sampling distribution is a particular statistic of a sample. The sampling distribution of the mean tends to be closer to the normal distribution when the sample size is large. The significance of the sampling distribution follows from the fact that the mean of the sampling distribution of the mean is the same as the mean of the universe. Thus, the mean of this sampling distribution can be taken as the mean of the universe.
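The idea can be checked exhaustively on a tiny universe. The Python sketch below (the five-unit population and the sample size of 2 are assumptions chosen for illustration) lists every possible sample drawn without replacement, builds the sampling distribution of the mean with relative frequencies, and confirms that its mean equals the universe mean. Fractions are used so the comparison is exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical universe of 5 units
n = 2                            # sample size

pop_mean = Fraction(sum(population), len(population))

# The statistic (mean) for every possible sample of size n, without replacement
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Sampling distribution: each value of the statistic with its relative frequency
distribution = Counter(sample_means)
for value, freq in sorted(distribution.items()):
    print(value, Fraction(freq, len(sample_means)))

mean_of_means = sum(sample_means) / len(sample_means)
print(pop_mean, mean_of_means)   # 6 6
```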

13. Bias - Generally, an effect which deprives a statistical result of representativeness by systematically distorting it, as distinct from a random error, which may distort on any one occasion but balances out on the average.

14. Biased sample - A sample obtained by a biased sampling process, that is to say, a process which incorporates a systematic component of error, as distinct from random error, which balances out on the average. Non-random sampling is often, though not inevitably, subject to bias, particularly when entrusted to subjective judgment on the part of human beings.
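A small simulation makes the contrast concrete. In this Python sketch the universe of incomes and the "biased process" (a selector who only ever picks from the 200 largest households) are hypothetical. Errors from random samples roughly cancel when averaged over many repetitions, while the biased process misses in the same direction every time.

```python
import random
from statistics import mean

# Hypothetical universe: 1000 household incomes from 10,000 to 109,900 in steps of 100
population = list(range(10_000, 110_000, 100))
true_mean = mean(population)

rng = random.Random(1)

# Random selection: 200 repetitions of a simple random sample of 50
random_estimates = [mean(rng.sample(population, 50)) for _ in range(200)]

# Biased process: the selector only ever picks from the 200 largest households
largest = sorted(population)[-200:]
biased_estimates = [mean(rng.sample(largest, 50)) for _ in range(200)]

print(true_mean)
print(round(mean(random_estimates) - true_mean))   # average error of random samples
print(round(mean(biased_estimates) - true_mean))   # average error of the biased process
```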

CENSUS SURVEY AND SAMPLE SURVEY

Census survey means a survey or complete enumeration of the population with certain objectives. In India, the government conducts such a census survey every ten years. The entire geographical area and the entire population are covered in a census survey. The data collected are tabulated and published as the census report. Such census data are used for different purposes, including economic planning and policy decisions. A census survey is a costly and time-consuming activity and also needs a huge organization and manpower for its orderly conduct. In commercial research, census surveys are not conducted because of various constraints, particularly relating to funds, time and manpower.

Census implies the collection of information from each element of the group or population of interest (e.g. a survey of industrial consumers). In many cases, complete enumeration is not possible and the only alternative available is sampling.

Sample survey is the survey of a small, representative part of the population taken up for detailed scrutiny and study. A sample is a small representative of the whole, and conclusions drawn from such a sample can be applied to the entire population. A sample survey gives the benefits of a census survey with less time, expenditure and manpower, and is often a better substitute for it. Sample surveys are commonly conducted in marketing research projects and give promising results.

A survey carried out using a sampling method, i.e. using a representative portion of the whole population, is called a sample survey. It is a shortcut alternative to a census survey but gives similar benefits.

REASONS FOR IMPRACTICALITY OF CENSUS

There are certain reasons that make a census impractical or even impossible. The reasons are as follows:

1. Cost: Cost is an obvious constraint on the determination of whether a census should be taken. If information is desired on grocery purchase and use behavior (frequencies and amounts of purchase of each product category, average amount kept at home and the like) and the population of interest is all households in a country, the cost will preclude a census being taken. Thus a sample is the only logical way of obtaining new data from a population of this size.

2. Time: The kind of cost we have just considered is an outlay cost. The time involved in obtaining information from either a census or a sample also involves the possibility of incurring an opportunity cost. That is, delaying the decision until the information is obtained may result in a smaller gain or a larger loss than would have been the case from making the same decision earlier. The opportunity to make more (or save more, as the case may be) is, therefore, foregone. Because a census takes longer than a sample, this opportunity cost is usually greater for a census.

3. Accuracy: A study using a census, by definition, contains no sampling error. A study using a sample may involve sampling error in addition to other types of error. Other things being equal, a census will provide more accurate data than a sample. However, it has been argued that a more accurate estimate of the population of a country could be made from a sample than from a census. Taking a census of a population on a "mail out - mail back" basis requires that the names and addresses of almost all households be obtained, census questionnaires mailed, and interviews conducted with those not responding. The questionnaires are sent to a population of which only about half have completed high school. The potential for errors in a returned questionnaire is therefore high, and such non-sampling errors can outweigh the sampling error that a carefully conducted sample would introduce.

4. Destructive nature of the measurement: Measurements are sometimes destructive in nature. When they are, it is apparent that taking a census would usually defeat the purpose of the measurement. If one were producing firecrackers, electrical fuses, or grass seed, performing a functional use test on all products for quality control purposes would not be considered from an economic standpoint. A sample is then the only practical choice. On the other hand, if light bulbs, bicycles, or electrical appliances are to be tested, a 100% sample (census) may be entirely reasonable.

According to Crisp R. D., the fundamental idea of sampling is that if a small number of items or parts (called a sample) are chosen at random from a large number of items or a whole (called a universe or population), the sample will tend to have the same characteristics, and in approximately the same proportion, as the universe.
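Crisp's statement can be tried out with a quick simulation. Assuming a hypothetical universe of 100,000 households of which exactly 30% use a given brand, a random sample of 1,000 should show roughly the same proportion; the exact sample figure depends on the seed.

```python
import random

# Hypothetical universe: 1 = household uses the brand, 0 = does not
population = [1] * 30_000 + [0] * 70_000

rng = random.Random(2024)
sample = rng.sample(population, 1_000)   # random selection without replacement

print(sum(population) / len(population))   # proportion in the universe
print(sum(sample) / len(sample))           # proportion in the sample
```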



FEATURES OF SAMPLING

(1) Sampling is a small representative of the whole. It is an effective alternative to the census survey.

(2) Sampling reduces the time, effort and money the researcher spends on data collection without any adverse effect on its quality.

(3) The sampling technique is based on the assumption that a sample selected at random from the universe possesses the same features and characteristics as the universe.

(4) The findings of a sample survey are accurate and reliable. Generally, a larger sample is better, as its results are more accurate.

(5) Sampling is used in data collection as well as for different purposes in our daily life.

(6) The concept of sampling is quite common and popular in marketing research, as it helps researchers to finalize their findings and recommendations within a short period.

FEATURES / ATTRIBUTES OF A GOOD / RELIABLE SAMPLE

(1) Goal-oriented: A sample design should be goal-oriented. It is a means to an end and should be oriented to the research objectives and fitted to the survey conditions.

(2) Accurate representative of the universe: A sample should be an accurate representative of the universe from which it is taken. There are different methods for selecting a sample. It will be truly representative only when it represents all types of units or groups in the total population in fair proportions. In brief, the sample should be selected carefully, as improper sampling is a source of error in the survey.

(3) Proportional: A sample should be proportional. It should be large enough to represent the universe properly. The sample size should be sufficiently large to provide statistical stability or reliability, and should give the accuracy required for the purpose of the particular study.

(4) Random selection: A sample should be selected at random. This means that any item in the group has a full and equal chance of being selected and included in the sample. This makes the selected sample truly representative in character.

(5) Economical: A sample should be economical. The objectives of the survey should be achieved with minimum cost and effort.

(6) Practical: A sample design should be practical. It should be simple, i.e. capable of being understood and followed in the fieldwork.

(7) Actual information provider: A sample should be designed so as to provide the actual information required for the study and also an adequate basis for the measurement of its own reliability.

In brief, a good sample should be truly representative in character. It should be selected at random and should be adequately proportional. These, in fact, are the attributes of a good sample.

ADVANTAGES OF SAMPLING METHOD

(1) Saves time and money: Sampling makes primary data collection easier, quicker and less costly. It is a time-saving and economical method of survey for data collection.

(2) Provides reliable data: The conclusions drawn from a sample survey are reliable, accurate and also applicable to the whole population/universe. Sampling has no adverse effect on the quality of data collected. It gives quality results with a smaller volume of work.

(3) Scientific base: The concept of sampling has scientific backing, as it is based on the law of statistical regularity and the law of inertia of large numbers.

(4) Facilitates better supervision of data collection: The sampling method is restricted to a limited number of respondents. Naturally, effective monitoring and supervision of the data collection work is possible. This improves the quality of the data collected.

LIMITATIONS OF SAMPLING METHOD

(1) Findings are not completely accurate: The findings of the sampling method are reasonably accurate but not completely accurate. The findings and conclusions drawn from a sample survey may be less accurate than those available from the census technique, in which the entire population is covered.

(2) Findings may not be reliable: The findings may not be reliable if the sample selected is too small or is not adequately representative in character. In such cases the conclusions drawn may be misleading and this may affect the quality of research work.

(3) Difficulties in the selection of a representative sample: There are many practical difficulties in the selection of a representative sample. This may defeat the very purpose of sampling.

(4) Data collection is difficult in the case of a large sample: Data collection becomes difficult when a large sample size is decided upon. It also requires more time and money.

A sample survey is a better alternative to the census or complete investigation, which is lengthy and also costly. For example, census reports are published by the Government two or three years after the collection of data, whereas survey reports based on samples can be prepared and published within a few months. Thus, sampling is a widely used methodology in marketing research (MR). It is one vital element of research design.

STEPS IN SAMPLING PROCESS

Having looked into the major advantages and limitations of sampling, we now turn to the sampling process. It is the procedure required right from defining a population to the actual selection of sample elements. There are seven steps involved in this process.

Step 1: Define the population

It is the aggregate of all the elements defined prior to selection of the sample. It is necessary to define population in terms of

(i) elements

(ii) sampling units

(iii) extent

(iv) time.



A few examples are given here.

If we were to conduct a survey on the consumption of tea in Gujarat, then these specifications might be as follows:

(i) Element: Housewives

(ii) Sampling units: Households, then housewives

(iii) Extent: Gujarat State

(iv) Time: January 1-10, 1999

If we were to monitor the sales of a product recently introduced by us, the population might be:

(i) Element: Our product

(ii) Sampling units: Retail outlets, super markets, then our product

(iii) Extent: Delhi and New Delhi

(iv) Time: January 7-14, 1999

It may be emphasized that all four of these specifications must be contained in the designated population. Omission of any of them would render the definition of the population incomplete.
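One way to make this rule mechanical is to record the four specifications as a single structure and refuse to proceed when any is blank. The `PopulationDefinition` class below is a hypothetical Python sketch, not part of any standard tool; it encodes the tea survey example from the text and a deliberately incomplete variant of the product-monitoring example.

```python
from dataclasses import dataclass, fields

@dataclass
class PopulationDefinition:
    """Hypothetical record of the four specifications of a target population."""
    element: str
    sampling_units: str
    extent: str
    time: str

    def is_complete(self) -> bool:
        # Every one of the four specifications must be filled in
        return all(getattr(self, f.name).strip() for f in fields(self))

tea_survey = PopulationDefinition(
    element="Housewives",
    sampling_units="Households, then housewives",
    extent="Gujarat State",
    time="January 1-10, 1999",
)
# Time specification omitted on purpose
incomplete = PopulationDefinition(
    "Our product", "Retail outlets, super markets, then our product",
    "Delhi and New Delhi", "",
)

print(tea_survey.is_complete())   # True
print(incomplete.is_complete())   # False
```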

Step 2 : Identify the sampling frame

The sampling frame could be a telephone directory, a list of blocks and localities of a city, a map or any other list consisting of all the sampling units. It may be pointed out that if the frame is incomplete or otherwise defective, sampling will not be able to overcome these shortcomings.

The question is: how do we ensure that the frame is perfect and free from any defect? Leslie Kish has observed that a perfect frame is one where "every element appears on the list separately, once, only once, and nothing else appears on the list". Such a perfect frame would indicate one-to-one correspondence between frame units and sampling units. But such perfect frames are rather rare. Accordingly, one has to use frames with one deficiency or another, but one should ensure that the frame is not so deficient that it has to be given up altogether.

This raises a pertinent question: what are the criteria for a suitable frame? In order to examine the suitability or otherwise of a sampling frame, a number of questions need to be asked. These are:

1. Does it adequately cover the population to be surveyed?

2. How complete is the frame? Is every unit that should be included represented?

3. Is it accurate? Is the information about each individual unit correct? Does the frame as a whole contain units which no longer exist?

4. Is there any duplication? If so, the probability of selection is disturbed, as a unit can enter the sample more than once.

5. Is the frame up to date? It could have met all the criteria when compiled but could well be deficient when it came to be used. This could well be true of all frames involving the human population, as change is taking place continuously.

6. How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for sampling? Can it easily be re-arranged so as to enable us to introduce stratification and to undertake multi-stage sampling?

These are demanding criteria, and it is most unlikely that any frame would meet them all. Nevertheless, they are the factors to be borne in mind whenever we undertake random sampling.

In marketing research, most of the frames are drawn from census reports, electoral registers, lists of member units of trade and industry associations, lists of members of professional bodies, lists of dwelling units maintained by local bodies, returns from an earlier survey, and large-scale maps.
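Some of these checks can be automated when a trustworthy reference list of the population is available to compare against, which is itself an assumption that rarely holds in practice. The hypothetical `audit_frame` function below flags the three departures from Kish's perfect frame: missing units (incompleteness), duplicates, and entries that do not belong to the population (for example, units that no longer exist).

```python
from collections import Counter

def audit_frame(frame, population):
    """Compare a sampling frame against a reference list of population units."""
    counts = Counter(frame)
    listed = set(counts)
    members = set(population)
    return {
        "missing": sorted(members - listed),                          # units not on the frame
        "duplicates": sorted(u for u, c in counts.items() if c > 1),  # listed more than once
        "foreign": sorted(listed - members),                          # on the frame but not in the population
    }

def is_perfect(report):
    """Kish's ideal: every element once and only once, and nothing else."""
    return not any(report.values())

population = ["H1", "H2", "H3", "H4", "H5"]
frame = ["H1", "H2", "H2", "H4", "H5", "H9"]   # hypothetical: H3 omitted, H2 twice, H9 demolished
report = audit_frame(frame, population)
print(report)              # {'missing': ['H3'], 'duplicates': ['H2'], 'foreign': ['H9']}
print(is_perfect(report))  # False
```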

Step 3: Specify the sampling unit

The sampling unit is the basic unit containing the elements of the target population. The sampling unit may be different from the element. For example, if one wanted a sample of housewives, it might be possible to have access to such a sample directly. However, it is easier to select households as the sampling unit and then interview housewives in each of the households.

As mentioned in the preceding step, the sampling frame should be complete and accurate; otherwise the selection of the sampling unit might be defective. It is also necessary to specify the sampling unit further, both in personal interviews and in telephone interviews. Thus, in personal interviews, a pertinent question is: of the several persons in a household, who should be interviewed? If interviews were held during office hours, when the heads of families and other employed persons are away, interviewing would under-represent employed persons and over-represent elderly persons, housewives and the unemployed. In view of these considerations, it is necessary to have a random process for selecting among the adult residents of each household. One method that could be used for this purpose is to list all the eligible persons living at a particular address and then select one of them at random.
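The listing-and-selection method described above can be sketched in a few lines of Python. The household listing and the `select_respondent` helper are hypothetical; the point is that the choice among eligible adults is left to a random generator rather than to whoever happens to answer the door.

```python
import random

def select_respondent(eligible_adults, rng):
    """Give every listed eligible adult the same chance of being interviewed."""
    if not eligible_adults:
        return None            # no eligible adult lives at this address
    return rng.choice(eligible_adults)

household = ["head of family (employed)", "housewife", "retired grandparent"]
rng = random.Random(42)
print(select_respondent(household, rng))

# Over many households, each listed person is chosen about equally often
tally = {person: 0 for person in household}
for _ in range(3000):
    tally[select_respondent(household, rng)] += 1
print(tally)
```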

Step 4: Specify the sampling method

It indicates how the sample units are selected. One of the most important decisions in this regard is to determine which of the two, a probability or a non-probability sample, is to be chosen.

In the case of a probability sample, the probability or chance of every unit in the population being included in the sample is known. Further, the selection of specific units in the sample depends entirely on chance. No substitution of one unit for another is permissible. This means that no human judgment is involved in the selection of the sample. In contrast, in a non-probability sample, the probability that any given unit of the population is included in the sample is not known. In addition, the selection of units within the sample involves human judgment rather than pure chance.



In the case of a probability sample, it is possible to measure the sampling error and thereby determine the degree of precision in the estimates with the help of the theory of probability. This theory also enables us to consider, from amongst the various possible sample designs, the one that will give the maximum information per rupee. This is not possible when a non-probability sample is used.

Probability sampling enables us to choose representative sample designs. It also enables us to estimate the extent to which the results based on such a sample are likely to differ from what we would have obtained had we covered the whole population in our study. Further, the use of probability sampling enables us to determine the sample size for a given degree of precision, i.e. a size at which our sample results will not differ by more than a specified amount from those yielded by a study covering the entire population.
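As an illustration of how probability theory turns a sample into a measured precision, here is a Python sketch for a simple random sample drawn without replacement. It uses the usual normal approximation and the finite population correction sqrt(1 - n/N); the sample values, the population size of 500 and the helper name `estimate_mean` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def estimate_mean(sample, population_size, confidence=0.95):
    """Sample mean, its standard error and a normal-approximation interval
    (assumes simple random sampling without replacement)."""
    n = len(sample)
    x_bar = mean(sample)
    fpc = math.sqrt(1 - n / population_size)   # finite population correction
    se = stdev(sample) / math.sqrt(n) * fpc    # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar, se, (x_bar - z * se, x_bar + z * se)

sample = [3800, 4200, 4100, 3900, 4000, 4300, 3700, 4000]   # hypothetical spending (Rs.)
x_bar, se, (low, high) = estimate_mean(sample, population_size=500)
print(x_bar, round(se, 1), round(low), round(high))
```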

Although non-probability sampling does not yield these benefits, on account of its convenience and economy, it is often preferred to probability sampling. If the researcher is convinced that the risks involved in the use of a non-probability sample are more than offset by its being relatively cheap and convenient, his choice should be in favor of non-probability sampling.

There are various types of sample designs that can be covered under the two broad groups, random or probability samples and non-random or non-probability samples.

Step 5: Determine the sample size

At this step, one has to decide how many elements of the target population are to be chosen.
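The text does not give a formula here. As one common textbook approach (an assumption, not the author's method), the size needed to estimate a proportion within a margin e at a given confidence level is n0 = z²p(1 - p)/e², optionally reduced for a finite population N as n0/(1 + (n0 - 1)/N). Taking p = 0.5 gives the most conservative size. A Python sketch:

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population_size=None):
    """Sample size to estimate a proportion within +/- margin (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)   # finite population correction
    return math.ceil(n0)                              # always round up

print(sample_size_for_proportion(0.05))                       # 385
print(sample_size_for_proportion(0.05, population_size=500))  # 218
```

A higher confidence level or a narrower margin raises the required size, which is the trade-off the researcher weighs at this step.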

Step 6: Specify the sampling plan

This means that one should indicate how decisions made so far are to be implemented. For example, if a survey of households is to be conducted, a sampling plan should define a household, contain instructions to the interviewer as to how he should take a systematic sample of households, advise him on what he should do when no one is available on his visit to the household, and so on. These are some pertinent issues in a sampling survey to which a sampling plan should provide answers.
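As an example of the kind of instruction a sampling plan might spell out, here is a Python sketch of one common way to take a systematic sample: choose a random start within the first interval of k = N // n units and then take every k-th unit. The household list and the `systematic_sample` name are hypothetical, and other systematic schemes exist.

```python
import random

def systematic_sample(units, n, rng):
    """Every k-th unit after a random start, with k = N // n (assumes N >= n)."""
    k = len(units) // n            # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return units[start::k][:n]

households = [f"H{i:03d}" for i in range(1, 501)]   # hypothetical list of 500 households
sample = systematic_sample(households, 50, random.Random(3))
print(sample[:5])
```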

Step 7: Select the sample

This is the final step in the sampling process. A good deal of office and fieldwork is involved in the actual selection of the sampling elements. Most of the problems in this stage are faced by the interviewer while contacting the sample-respondents.

SAMPLING METHODS/SAMPLING DESIGNS

Sample designs are the different methods used for the conduct of a sample survey. Quota sampling, judgment sampling, etc. are non-probability sample designs, while random sampling, area sampling, etc. are probability sample designs. In brief, sample designs are divided into the following two categories:

(a) Probability Sampling Methods

(b) Non-Probability Sampling Methods



(a) Probability/Random Sampling Methods

In the probability sampling methods, the sample units are selected at random. This means the selection is left to chance rather than to the choice of the researcher. Every member of the universe has a known chance of being selected as a representative (an equal chance in simple random sampling). The selection of sampling items is impartial and independent of the person making the study. There is no scope for biased selection of sample units.

Probability sampling methods include simple random sampling, stratified sampling, cluster sampling, etc. Such methods are used extensively in marketing research. These methods provide unbiased information. The probability sampling methods are objectively designed. However, these methods are time-consuming and also costly to use. Greater statistical competence and time are required to plan and use probability sampling methods.

Types of Sampling Methods

Probability Sampling: Simple Random Sampling, Stratified Sampling, Cluster Sampling, Systematic Sampling, Area Sampling, Multi-Stage Sampling, Multi-Phase Sampling, Replicated Sampling, Sequential Sampling, Master Samples, Panel Samples

Non-Probability Sampling: Quota Sampling, Judgment Sampling, Convenience Sampling

(b) Non-probability Sampling Methods


Here, sample units are selected in a non-random manner. The selection may be purposive. It may be based on the convenience or the judgment of the researcher. The selection is deliberate, not random. Not every item is given a definite chance of being included in the sample. The non-probability sampling methods include convenience sampling, judgment sampling and quota sampling. In these methods, the sample is selected in a subjective manner and the decision regarding the sample is taken by the researcher himself. The sample selected may not be representative of the universe to be studied. The selection of the sample may be influenced by the subjective considerations of the person connected with the research work (the researcher).

Non-probability sampling methods are also used in marketing research along with probability methods. Such methods are sometimes preferred because they cost less per observation, require less time and need relatively little statistical sophistication in planning the sample design and in selecting the respondents. Probability sampling methods are more scientific and capable of yielding more representative samples than non-probability sampling methods. However, there is no sampling method (probability or non-probability) that can be considered the best in all situations. Any suitable method may be selected and used properly for promising results.

PROBABILITY SAMPLING V/S NON-PROBABILITY SAMPLING

Meaning

Probability sampling: (i) Probability sampling provides each element of the population a known chance (in simple random sampling, an equal chance) of being selected in the sample. (ii) A probability sample is one where the selected units have some specific chance of being included in the sample.

Non-probability sampling: (i) Non-probability sampling does not provide each element of the population an equal or known chance of being selected in the sample. (ii) A non-probability sample is arbitrarily selected.

Type of method

Probability sampling: It is a systematic and modern method of sampling.

Non-probability sampling: It is a traditional and rather outdated method of sampling.

Selection of sample

Probability sampling: The sample is selected by chance or at random.

Non-probability sampling: The sample is selected by choice.

Selection process

Probability sampling: The selection process is controlled objectively so that the items are chosen strictly at random.

Non-probability sampling: The selection process is, at least partially, subjective.

Benefit

Probability sampling: It helps to select a truly representative sample. Here, the selection of sample items is independent of the person making the study (the researcher).

Non-probability sampling: The sample selected may or may not be a true representative of the whole population, as it is selected as per the convenience of the researcher.

Nature of process

Probability sampling: It is a mechanical and mathematical process.

Non-probability sampling: It is a mental process/exercise of the researcher.

(A) PROBABILITY SAMPLING METHODS

(1) SIMPLE RANDOM SAMPLING

Random sampling is a popular and extensively used sampling method. In this method, each and every unit of the population has an equal chance of being selected or included in the sample. Random selection does not mean haphazard selection. It is a type of selection in which every item in the universe has an equal chance of being selected along with all other items. In random sampling, the complete list of the universe is taken, but the selection is made 'at random' from this list. However, some uniform system is used for the selection of the sample. Random sampling is useful for the conduct of telephone or mail surveys. It is an ideal method in surveys of a specialized nature.

The process of randomness does not mean that it is 'haphazard', as a layman may be inclined to think. What it means is that the process of selecting a sample is independent of human judgment. To ensure this, there are two methods that are followed when drawing a random sample. These are: (i) the lottery method and (ii) the use of random numbers.

In the lottery method, each unit of the population is numbered and the number is written on a chit of paper or a disc. The chits are folded and put in a box, from which a sample of the requisite size is to be drawn. If discs are used, they are well mixed up before a draw is made, so that no particular unit can be identified before it gets selected. The sample is drawn in the same manner as winning numbers in a lottery are drawn.
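The lottery method translates directly into code: one numbered chit per unit, a thorough mixing, and a draw without replacement. In this Python sketch the shuffle from the standard `random` module stands in for mixing the chits; the function name, the seed and the sizes (50 from 500, matching the illustration in the text) are assumptions.

```python
import random

def lottery_draw(population_size, n, rng):
    """Draw n of the numbered units 1..population_size, lottery style."""
    chits = list(range(1, population_size + 1))   # one chit per unit
    rng.shuffle(chits)                            # mix the chits well
    return sorted(chits[:n])                      # first n chits drawn, without replacement

drawn = lottery_draw(500, 50, random.Random(11))
print(drawn)
```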

In the second method, the tables of random numbers are used. The members of the population are numbered from 1 to N from which n members are selected. This process is explained below with the help of an illustration.

Suppose a sample of size 50 is to be selected from a population of 500. First, number the 500 units from 1 to 500, the order being quite immaterial. While numbering the units, ensure that each unit in the population has the same number of digits, in this case three. Thus, the 1st unit would have the three-digit number 001, the 2nd unit 002, the 10th unit 010, the 11th unit 011, and so on. After the units have been given three-digit numbers, the table of random numbers is used. One may start from the left-hand top corner of the table of random numbers and proceed systematically down sets of three-digit columns, rejecting numbers over 500 and those that have occurred earlier. Using the first thousand numbers from the table of random numbers (an excerpt from the table is given below), a sample of 50 out of 500 will thus be chosen.

231 055 148 389 117 433 495 367 070 313
092 259 113 455 126 426 062 401 100 488
434 325 211 207 398 225 485 035 171 047
318 263 239 108 379 420 122 441 493 310
032 194 144 337 224 006 068 043 500 222
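The random-number procedure above can be sketched in Python as a small filter over a stream of three-digit numbers. This is a minimal sketch. The function name is hypothetical, and the short input list in the usage line is made up. It stands in for numbers read off a printed table and is not taken from a real one.

```python
def select_from_random_numbers(random_numbers, population_size, sample_size):
    """Keep numbers from 1 to population_size in the order they are read,
    skipping out-of-range numbers and repeats, until sample_size are chosen."""
    chosen = []
    seen = set()
    for number in random_numbers:
        if len(chosen) == sample_size:
            break
        # Reject 000, numbers above N (here 500) and numbers already drawn.
        if 1 <= number <= population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
    return chosen


# Hypothetical stream of three-digit random numbers: 755 and the repeated 231 are rejected.
print(select_from_random_numbers([231, 755, 55, 231, 148, 389], 500, 3))  # [231, 55, 148]
```

In practice a pseudo-random generator replaces the printed table. In Python, `random.sample(range(1, 501), 50)` draws 50 distinct unit numbers from 1 to 500 in one call.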

Advantages of Simple Random Sampling Method

(1) Simplicity: Simple random sampling is the simplest method of probability sampling and can be used for different types of surveys.

(2) Scientific: This method is scientific, as every unit has an equal opportunity of being selected for the sample.

(3) Truly representative character: The samples selected by this method are truly representative in character.

(4) Quality results: Random sampling can be used effectively (for quality results) when the universe to be studied is small and can be listed accurately (e.g., motor car owners in a city).

Limitations of Simple Random Sampling Method

(1) Difficult when the universe is very large: In simple random sampling, the whole list of the universe is taken up for selection. Obtaining a complete and up-to-date list of the universe is difficult, particularly when the universe is very large.

(2) Costly: The cost of conducting a survey by this sampling method is high, as the sample units are selected at random and each of them must be contacted to collect the information.

(3) May prove inefficient: This method may prove to be statistically inefficient and give a larger standard error than other types of sampling designs.

(4) Administrative difficulties: Random sampling involves administrative difficulties in the selection of the sample and in the follow-up measures for the collection of information.

(5) May not be fully representative: The sample selected may not be fully representative, as the selection is from the whole population and not from the groups that constitute the population.

(2) STRATIFIED SAMPLING:

In stratified sampling, the units included in the sample are in roughly the same proportions as they are in the total population.

Stratified sampling is also called proportional random sampling. In this sampling, the population is first subdivided into certain mutually exclusive groups or strata. Such groups may be formed on the basis of geographical area, size of the household or income. After stratification, a random sample of a given size is selected from each stratum of the total population. In this way, an attempt is made to make the sample more representative in character. Here, each stratum is represented in the sample in relation to its importance.

The following example will make this clear.

Columns: (1) Strata: income per month (Rs); (2) Population: number of households; (3) Sample (proportionate); (4) Sample (disproportionate)

(1)          (2)       (3)    (4)
0-500        5,000     50     75
501-1000     4,000     40     20
1001-2000    3,000     30     20
2001-3000    2,000     20     25
3001+        1,000     10     10
Total        15,000    150    150

In the above example, the population consists of 15,000 households, divided into five strata on the basis of monthly income. Column (3) of the table shows the sample, i.e., the number of households selected from each stratum. The sample constitutes one per cent of the population. A sample of this type, where each stratum has a uniform sampling fraction, is called a proportionate stratified sample. If, on the contrary, the strata have variable sampling fractions, the sample is called a disproportionate stratified sample. The figures given in column (4) of the above table show a disproportionate stratified sample. It will be seen that the sampling fraction varies from one stratum to another. Thus, for example, it is 0.015 (75/5,000) for the monthly income stratum Rs 0-500 and 0.01 (10/1,000) for the stratum Rs 3001+.
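The sampling fractions quoted above can be checked directly from the table. This short sketch divides each stratum's sample size by its number of households. The dictionary layout and function name are illustrative choices, not part of the original notes.

```python
# Income strata from the table: band -> (households N_h, proportionate n_h, disproportionate n_h)
STRATA = {
    "0-500": (5000, 50, 75),
    "501-1000": (4000, 40, 20),
    "1001-2000": (3000, 30, 20),
    "2001-3000": (2000, 20, 25),
    "3001+": (1000, 10, 10),
}


def sampling_fractions(strata, column):
    """Sampling fraction n_h / N_h for each stratum.

    column=1 uses the proportionate sample, column=2 the disproportionate one.
    """
    return {band: values[column] / values[0] for band, values in strata.items()}


proportionate = sampling_fractions(STRATA, 1)
disproportionate = sampling_fractions(STRATA, 2)
for band in STRATA:
    print(f"Rs {band:>9}: proportionate {proportionate[band]:.4f}, "
          f"disproportionate {disproportionate[band]:.4f}")
```

Every proportionate fraction comes out at 0.01. The disproportionate fractions range from 0.005 (Rs 501-1000) to 0.015 (Rs 0-500).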

It may be noted that a stratified random sample with a uniform sampling fraction generally results in greater precision than a simple random sample of the same size. But this is possible only when the selection within strata is made on a random basis. Further, a proportionate stratified sample is generally convenient on account of practical considerations.

There are some other considerations in favor of the stratified random sample. The researcher may be interested in the results for separate strata rather than for the entire population. A simple random sample will not show results by strata, as it presents only an aggregate picture. Another consideration is that it may be administratively expedient to split the population into strata. Yet another consideration is that one can use different procedures for selecting samples from various strata. If the data are more variable in any particular stratum, a larger sampling fraction should be taken in that stratum. This would result in greater overall precision.
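Giving a more variable stratum a larger sampling fraction is usually formalised as Neyman (optimum) allocation. Under this rule, stratum h receives n_h = n × N_h S_h / Σ N_k S_k, where S_h is the standard deviation within that stratum. This is a standard textbook rule that the notes above do not name. The sketch below applies it to the household counts from the income table. The stratum standard deviations are purely hypothetical.

```python
def neyman_allocation(sizes, std_devs, total_sample):
    """Allocate total_sample across strata in proportion to N_h * S_h."""
    weights = [n_h * s_h for n_h, s_h in zip(sizes, std_devs)]
    total_weight = sum(weights)
    return [total_sample * w / total_weight for w in weights]


sizes = [5000, 4000, 3000, 2000, 1000]       # households per income stratum (from the table)
std_devs = [100, 150, 300, 500, 2000]        # hypothetical within-stratum standard deviations
print(neyman_allocation(sizes, std_devs, 150))  # [15.0, 18.0, 27.0, 30.0, 60.0]
```

The smallest but most variable stratum (Rs 3001+) receives the largest share. In practice the allocations would be rounded to whole households.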

This method reduces the sampling error, and it is a more accurate and representative sampling method. Naturally, it is treated as an improvement over simple random sampling. It provides information about different components of the total population. Use of stratified sampling also leads to administrative convenience. In order to use a stratified sample, some information regarding the population and its strata should be available to the researcher.

The process of stratified random sampling differs from simple random sampling. In simple random sampling, sample items are chosen at random from the entire universe, while in stratified random sampling, a separate random sample is chosen from each stratum. Stratified random sampling is used in order to increase the precision of sampling estimates.
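The key step, a separate random sample from each stratum, can be sketched with Python's `random.sample`. The frame below is hypothetical: the households are simply numbered consecutively within the income strata of the earlier table. The allocation uses the proportionate sample sizes from that table.

```python
import random


def stratified_random_sample(frame, allocation, rng=None):
    """Draw a separate simple random sample (without replacement) from each stratum.

    frame maps stratum -> list of unit IDs; allocation maps stratum -> sample size.
    """
    if rng is None:
        rng = random.Random()
    return {stratum: rng.sample(units, allocation[stratum]) for stratum, units in frame.items()}


# Hypothetical frame: households numbered consecutively within each income stratum.
sizes = {"0-500": 5000, "501-1000": 4000, "1001-2000": 3000, "2001-3000": 2000, "3001+": 1000}
frame, next_id = {}, 1
for stratum, size in sizes.items():
    frame[stratum] = list(range(next_id, next_id + size))
    next_id += size

allocation = {"0-500": 50, "501-1000": 40, "1001-2000": 30, "2001-3000": 20, "3001+": 10}
sample = stratified_random_sample(frame, allocation, random.Random(2024))
print({stratum: len(units) for stratum, units in sample.items()})
```

Passing in a different `allocation` dictionary, such as the disproportionate column, changes only the per-stratum sample sizes. The random selection inside each stratum works the same way.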

(3) SYSTEMATIC RANDOM SAMPLING:

In the systematic random sampling method, the units of a population are first listed and the sample is selected as per a well-defined system. The sample is drawn by selecting every nth item in the sampling frame, where "n" (the sampling interval) is determined on the basis of the desired size of the sample. The starting item is drawn at random from among the first n items; with an interval of 10, for example, a number between 1 and 10 is selected. Suppose we have 50,000 items in the universe and the sample size is decided as 5,000 items. In our case, 'n' is equal to 10 (50,000 / 5,000), so we have to select every 10th item from the universe. However, the first item is selected at random; if, say, 3 is drawn, the selected items are 3, 13, 23, 33, 43, and so on.

Advantages of Systematic Random Sampling

(a) It is a simple and unbiased sampling method.

(b) It ensures speedy selection of the sample.

(c) It is often more efficient statistically than simple random sampling.

(d) It ensures a more representative sample.

Disadvantages of Systematic Random Sampling

(a) It is time consuming and costly.

(b) It can go wrong if every sample is assumed to be similar.

(c) It can create more confusion if the selection of the sample is reckless.
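The every-nth-item rule described above can be sketched in Python as follows. The code calls the sampling interval `k`, which is the "n" of the text, to keep it distinct from the sample size. The optional `start` parameter is a hypothetical convenience that lets the worked example (start = 3) be reproduced exactly. Without it, the start is drawn at random from 1 to k.

```python
import random


def systematic_sample(frame, sample_size, start=None, rng=None):
    """Select every k-th unit of the frame after a random start.

    k = len(frame) // sample_size is the sampling interval; start (1-based)
    is drawn at random from 1..k unless supplied.
    """
    k = len(frame) // sample_size
    if start is None:
        if rng is None:
            rng = random.Random()
        start = rng.randint(1, k)                 # random start between 1 and k
    positions = range(start, len(frame) + 1, k)   # start, start + k, start + 2k, ...
    return [frame[pos - 1] for pos in positions][:sample_size]


universe = list(range(1, 50_001))                 # 50,000 numbered items
sample = systematic_sample(universe, 5_000, start=3)
print(len(sample), sample[:5])                    # 5000 [3, 13, 23, 33, 43]
```

Only one random number is drawn. Every later unit is fixed by the interval, which is why selection is quick.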

(4) CLUSTER SAMPLING:

In cluster sampling, individual units are not selected as the sample; instead, they are grouped together and selected group-wise for inclusion in the sample. Thus, groups are selected on a random basis as the sample. For example, the total universe may be divided into a number of groups, each containing an equal number of items, and the sample will then be selected in groups only. Similarly, if one family is selected as a sample unit, the information will be collected from each member of the family. Such selection of the sample in group form is called cluster sampling.

For example, if a survey is to be undertaken in a city to collect data from individual households, then, selection of households from all over the city would involve a considerable amount of fieldwork and consequently, would cost more. Instead, a few localities are first chosen. Then, all the households in these localities are covered in the sample. Apart from reduction in cost, such a cluster sample would be desirable in the absence of a suitable sampling frame for the whole population. If, on the other hand, a sample of individual households from the entire city is to be chosen, it will be necessary to first undertake the listing of all households. In view of the non-availability of a satisfactory sampling frame, in the case of cluster sampling, such a listing could be confined to only a few localities that are to be entirely covered in the sample.
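The locality example can be sketched as "pick whole clusters at random, then take every unit inside them." The city and its localities below are hypothetical. For simplicity, every locality's household list is passed in. In practice, only the localities actually chosen would need to be listed.

```python
import random


def cluster_sample(clusters, num_clusters, rng=None):
    """Choose whole clusters at random and include every unit within them."""
    if rng is None:
        rng = random.Random()
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical city: locality -> households in that locality.
city = {
    "Locality A": ["A-1", "A-2", "A-3"],
    "Locality B": ["B-1", "B-2"],
    "Locality C": ["C-1", "C-2", "C-3", "C-4"],
    "Locality D": ["D-1", "D-2", "D-3"],
}
sample = cluster_sample(city, 2, random.Random(5))
households = [h for units in sample.values() for h in units]
print(sorted(sample), len(households))
```

Randomness enters only at the level of localities. Within a chosen locality, no further selection is made.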

A few points regarding cluster sampling may be noted here. First, "whether or not a particular aggregate of units should be called a cluster" will depend on the circumstances of each case. In the foregoing example, localities were taken as clusters and households as individual units. In another case, the households may be taken as clusters and the members of the households as individual units. Second, it is not necessary that clusters should always be natural aggregates such as localities, constituencies, schools or classes. Artificial clusters may be formed, as is generally done in area sampling, where grids may be drawn on maps. Third, several levels of clusters may be used in any one sample design. Thus, in a city survey, localities or wards, streets and households may be selected, in which case localities or wards are the clusters at the first level, streets are the clusters at the second level, and households are the units.

Cluster sampling method is less costly as the expenditure on traveling of interviewers is minimized. It is useful when the researcher desires to study the characteristics of certain individuals or items of identical nature.

(5) AREA SAMPLING:

Area sampling is a form of multi-stage sampling in which maps, rather than lists or registers, are used as the sampling frame. This method is more frequently used in countries that do not have a satisfactory sampling frame such as population lists. In area sampling, the overall area to be covered in a survey is divided into several smaller areas, from which a random sample is selected. Thus, for example, a city map can be used for area sampling. Various blocks can be identified on the map, and these can provide a suitable frame. The entire city area can be divided into these blocks, which are then numbered and from which a random sample is finally drawn.

In sampling the blocks, stratification and sampling with probability proportional to a measure of size are commonly employed. However, stratification in area sampling is based on geographical considerations. Thus, when blocks are identified and numbered on the map, they can be grouped into meaningful strata representing the different neighborhoods of the town. The point to emphasize is that these blocks must be identifiable without any difficulty.
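Sampling with probability proportional to a measure of size (PPS) is often done with the cumulative-total method. In this method, a random number from 1 to the total size is drawn, and the block whose cumulative range contains it is selected. The sketch below assumes the number of dwellings is the measure of size and samples with replacement. The blocks and dwelling counts are hypothetical.

```python
import bisect
import itertools
import random


def pps_with_replacement(sizes, draws, rng=None):
    """Select block indices with probability proportional to size (with replacement).

    Cumulative-total method: draw a random integer from 1 to the total size and
    take the first block whose cumulative total reaches it.
    """
    if rng is None:
        rng = random.Random()
    cumulative = list(itertools.accumulate(sizes))
    total = cumulative[-1]
    return [bisect.bisect_left(cumulative, rng.randint(1, total)) for _ in range(draws)]


# Hypothetical map blocks and their dwelling counts (the measure of size).
blocks = ["Block 1", "Block 2", "Block 3", "Block 4"]
dwellings = [120, 40, 300, 40]
picks = pps_with_replacement(dwellings, 3, random.Random(11))
print([blocks[i] for i in picks])
```

Block 3, with 300 of the 500 dwellings, has a 60 per cent chance on each draw. Python's `random.choices(blocks, weights=dwellings, k=3)` performs the same kind of with-replacement PPS draw in a single call.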

On the basis of the blocks thus identified, numbered and assigned to strata, a stratified sample of dwellings can be selected. This can be done in either of two ways. First, a sample of dwellings may be drawn from all the dwellings included in a selected block. Second, blocks may be divided into segments of more or less equal size, a sample of these segments can be chosen, and finally all the dwellings in the selected segments may be taken into the sample. It will thus be seen that the second method introduces another stage of sampling, namely, segments.

Although the above discussion relates to area sampling with respect to a city or town, the same approach is applicable to a large area, say, a state or a country, the only difference being that one or more additional stages of sampling may have to be introduced.

Finally, it may be pointed out that area sampling is perhaps the only possibility if a suitable sampling frame is not available.



(6) MULTI-STAGE SAMPLING

Multi-stage sampling, as the name implies, involves the selection of units in more than one stage. In such a sampling, the population consists of a number of first stage units, called primary sampling units (PSUs). Each of these PSUs consists of a number of second-stage units. First, a sample is taken of the PSUs, and then a sample is taken of the second-stage units. This process continues until the selection of the final sampling units. It may be noted that at each stage of sampling, a sample can be selected with or without stratification.

An illustration would make the concept of multi-stage sampling clear. Suppose a sample of 5000 urban households from all over the country is to be selected. In such a case, the first stage sample may involve the selection of districts. Suppose 25 districts out of say 500 districts are selected. The second stage may involve the selection of cities, say four from each district. Finally, 50 households from each selected city may be chosen. Thus, one would have a sample of 5000 urban households, arrived at in three stages. It is obvious that the final sampling unit is the household.
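The three-stage selection in this illustration (districts, then cities, then households) can be sketched as a recursive function over a nested frame. The country below is entirely hypothetical: 500 districts, each with 5 cities of 60 households. Each final-stage frame is a `range` of unique household IDs.

```python
import random


def multistage_sample(frame, stage_sizes, rng=None):
    """Select units stage by stage.

    frame is nested: at every stage but the last it maps each unit to its own
    sub-frame; at the last stage it is a sequence of final sampling units.
    stage_sizes gives how many units to draw at each stage.
    """
    if rng is None:
        rng = random.Random()
    size, remaining_stages = stage_sizes[0], stage_sizes[1:]
    if not remaining_stages:
        return rng.sample(list(frame), size)          # final stage: households
    selected = []
    for unit in rng.sample(sorted(frame), size):      # districts first, then cities
        selected.extend(multistage_sample(frame[unit], remaining_stages, rng))
    return selected


# Hypothetical country: 500 districts x 5 cities x 60 households (IDs are unique ints).
country = {
    d: {c: range((d * 5 + c) * 60, (d * 5 + c + 1) * 60) for c in range(5)}
    for d in range(500)
}
households = multistage_sample(country, [25, 4, 50], random.Random(3))
print(len(households))  # 5000 = 25 districts x 4 cities x 50 households
```

A frame is needed only for the units actually reached: the cities of the 25 chosen districts and the households of the 100 chosen cities.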

In the absence of multi-stage sampling of this type, the selection of 5000 urban households from all over the country would be extremely difficult. Besides, such a sample would be very thinly spread over the entire country, and if personal interviews are to be conducted for collecting information, it would be an extremely costly affair. In view of these considerations, sampling from a widely spread population is generally based on multi-stage sampling.

The number of stages in a multi-stage sampling varies depending on convenience and the availability of suitable sampling frames at different stages. Often, one or more stages can be further included in order to reduce cost. Thus, in our earlier example, the final stage of sampling comprised 50 households from each of the four selected cities. Since this would involve the selection of households all over the city, it would turn out to be quite expensive and time consuming if personal interviews are to be conducted. In such a case, it may be advisable to select two wards or localities in each of the four selected cities and then to select 25 households from each of the 2 selected wards or localities. Thus, the cost of interviewing as also the time in carrying out the survey could be reduced considerably.

It will be seen that an additional stage comprising wards or localities has been introduced here. Thus the sample has become a four-stage sample:

1st stage – districts
2nd stage – cities
3rd stage – localities
4th stage – households

From the preceding discussion it should be clear that a multi-stage sample results in the concentration of fieldwork. This, in turn, leads to savings in time, labor and money. There is another advantage in its use: where a suitable sampling frame covering the entire population is not available, a multi-stage sample can still be used.



(7) MULTI-PHASE SAMPLING

A multi-phase sample should not be confused with a multi-stage sample. The former involves a design where some information is collected from the entire sample and additional information is collected from only a part of the original sample. Suppose a survey is undertaken to determine the nature and extent of health facilities available in a city and the general opinion of the people. In the first phase, a general questionnaire can be sent out to ascertain which of the respondents had at one time or other used the hospital services. Then, in the second phase, a comprehensive questionnaire may be sent only to these respondents to ascertain what they feel about the medical facilities in the hospitals. This is two-phase or double sampling.

The main point of distinction between a multi-stage and a multi-phase sampling is that in the former each successive stage has a different unit of sample whereas in the latter the unit of sample remains unchanged though additional information is obtained from a sub-sample.
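The hospital-services example can be sketched as two passes over the same respondents. A short screening question goes to the whole first-phase sample, and the detailed questionnaire goes only to those who qualify. The residents, the `used_hospital` flag and the screening rule below are hypothetical. In this made-up population, every third resident has used hospital services.

```python
import random


def two_phase_sample(population, first_phase_size, qualifies, rng=None):
    """Phase 1: a random sample answers a short screening questionnaire.
    Phase 2: the detailed questionnaire goes only to those who qualify.
    The unit (the respondent) is the same in both phases."""
    if rng is None:
        rng = random.Random()
    phase_one = rng.sample(population, first_phase_size)
    phase_two = [person for person in phase_one if qualifies(person)]
    return phase_one, phase_two


# Hypothetical residents: every third one has used hospital services.
residents = [{"id": i, "used_hospital": i % 3 == 0} for i in range(3000)]
phase_one, phase_two = two_phase_sample(
    residents, 600, lambda person: person["used_hospital"], random.Random(8)
)
print(len(phase_one), len(phase_two))
```

Unlike the multi-stage sketch, the second phase does not move to a smaller kind of unit. It asks more questions of a subset of the same respondents.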

The main advantage of multi-phase sampling is that it effects economy in time, money and effort. In our earlier example, if a detailed questionnaire were sent out to a large sample of individuals, many of them (those who had never used the hospital services) would not be able to provide the necessary information. Second, more time would be required. Finally, it would be far more expensive to carry out the survey, especially when personal interviews are involved.

(8) REPLICATED SAMPLING

Replicated sampling implies a sample design in which "two or more sub-samples are drawn and processed completely independent of each other". It was first introduced by Mahalanobis in 1936, who used the term inter-penetrating sub-samples.

In replicated sampling, several random sub-samples are selected from the population instead of one full sample. All the sub-samples have the same design, and each one of them is a self-contained sample of the population. For example, take the case of a random sample of 100 households. This sample may be divided into, say, 10 equal sub-samples to be assigned to 10 interviewers. Thus, each interviewer may be required to collect information from 10 households.

A replicated sample is particularly chosen on account of the convenience it affords in the calculation of standard error. In many complex sample designs, the calculation of standard error becomes too laborious. Selecting a replicated sample design can considerably reduce this difficulty. However, in modern times, when computers are increasingly used, the ease of calculating standard errors has made this advantage somewhat less important. Apart from this, there are certain other advantages of replicated sampling. First, if the size of a sample is too large, it may be advisable to split it up into two or more sub-samples, one of which may be used to get advance results of the survey. Second, replicated sampling can indicate the non-sampling errors.
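The reason replication simplifies the standard error can be seen in a short sketch. With k independent sub-samples of the same design, a common textbook estimator is as follows. The overall estimate is the mean of the k sub-sample estimates. Its standard error is the standard deviation of those estimates divided by √k. The formula is not stated in the notes above. The spending figures are hypothetical.

```python
import math
import statistics


def combine_replicates(estimates):
    """Combine k independent replicate (sub-sample) estimates.

    Overall estimate: mean of the replicate estimates.
    Standard error: s / sqrt(k), where s is the standard deviation of the
    replicate estimates (equivalently sqrt(sum((t_i - mean)**2) / (k * (k - 1)))).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(k)
    return mean, standard_error


# Hypothetical mean monthly spending (Rs) from the 10 interviewers' sub-samples.
replicates = [1850, 1920, 1780, 2010, 1890, 1960, 1830, 1905, 1870, 1945]
estimate, se = combine_replicates(replicates)
print(f"estimate = {estimate:.1f}, standard error = {se:.1f}")
```

However complex the design of each sub-sample, only the spread of the k final estimates is needed.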

However, replicated sampling would not be helpful in undertaking a detailed investigation of bias, as the numbers in the separate sub-samples tend to be small. Further, such samples do not reveal any systematic errors that may be more or less common to all interviewers, or the compensating errors which cancel each other out over an interviewer's assignment.

Apart from the above limitations, replicated samples have other disadvantages. If personal interviews are to be conducted, replicated samples turn out to be costlier. Likewise, tabulation costs would be higher than in the case of a single large sample. Finally, replicated samples are more complex to administer.

(9) SEQUENTIAL SAMPLING

In sequential sampling, a number of samples n1, n2, n3 … nx are randomly drawn from the population. It is not at all necessary that each sample should be of the same size. Generally, the first sample is the largest, the second is smaller than the first, the third is smaller than the second, and so on.

Sequential sampling is resorted to mainly to bring down the cost, and hence the smallest possible sample is used. The desired statistics from the first sample, n1, are computed and evaluated. If these statistics do not satisfy the criteria laid down, a second sample is drawn. The results of the first and second samples are combined and the statistics are recomputed. This process is continued until the specified criteria are satisfied. The criteria are usually a minimum significance level, a minimum cluster size, or a minimum confidence interval.
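The loop described above (draw, combine, recompute, stop when the criterion is met) can be sketched with a confidence-interval criterion. The stopping rule assumed here is an approximate 95 per cent confidence-interval half-width for the mean, z·s/√n, that must fall below a chosen limit. The batch sizes, the limit and the expenditure population are all hypothetical.

```python
import random
import statistics


def sequential_sample(population, batch_sizes, max_half_width, z=1.96, rng=None):
    """Draw successively smaller random batches (without replacement), pool them,
    and stop as soon as the approximate confidence-interval half-width for the
    mean, z * s / sqrt(n), is no larger than max_half_width.

    Returns (pooled observations, final half-width, whether the criterion was met).
    The first batch must contain at least two units.
    """
    if rng is None:
        rng = random.Random()
    remaining = list(population)
    rng.shuffle(remaining)                 # random order = random draws without replacement
    pooled, half_width = [], float("inf")
    for size in batch_sizes:
        batch, remaining = remaining[:size], remaining[size:]
        pooled.extend(batch)               # results of all samples so far are combined
        half_width = z * statistics.stdev(pooled) / len(pooled) ** 0.5
        if half_width <= max_half_width:
            return pooled, half_width, True
    return pooled, half_width, False


# Hypothetical population of monthly household expenditures (Rs).
gen = random.Random(1)
expenditures = [gen.gauss(2000, 400) for _ in range(10_000)]
pooled, hw, met = sequential_sample(expenditures, [200, 150, 100, 50], 40, rng=random.Random(2))
print(len(pooled), round(hw, 1), met)
```

The total sample size is not fixed in advance. It is whatever the pooled size happens to be when the criterion is first satisfied.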

The main advantage of sequential sampling is that it obviates the need for determining a fixed sample size before the commencement of the survey.

Suppose a firm is to decide whether or not a new product is to be introduced in the market. It feels that if it is able to acquire a 15 per cent market share in the country within a year, it should introduce the new product. Further, it feels that if a market share of 10 per cent is achieved in a few test markets, it would be possible to acquire a 15 per cent market share in the country, say, within a period of six months. Now, when the firm undertakes test marketing, it actually achieves far more than 10 per cent, say, 20 per cent, of the market share, and that too within three months of test marketing. The firm may then be sure of achieving the 15 per cent national market share within one year, even though it may not be possible for it to forecast accurately the test market share at the end of four months.

(B) NON-PROBABILITY SAMPLING METHODS

(1) CONVENIENCE SAMPLING

In convenience sampling, the convenience of the researcher is given importance while selecting the sample. The researcher decides the inclusion of units in the sample as per his convenience. The items that are easily accessible or easily measurable are included in the sample. No specific plan, system or method is used for the selection of items in the sample. As a result, bias is likely to enter into the sample selected.



Interviewing respondents on the street, at the bus stop or at the railway station are examples of convenience sampling. In this sense, convenience sampling is also called accidental sampling, as the respondents in the sample are included merely on account of their being available on the spot where the survey work is in progress. Convenience sampling is more suitable in exploratory research, where the focus is mainly on getting new ideas and insights into a given problem.

Advantages of Convenience Sampling

(a) It is profitably used in pre-testing of questionnaires.

(b) It keeps the researcher free of tension.

(c) It allows the respondents to answer questions at leisure.

Disadvantages of Convenience Sampling

(a) The sample could be non-representative of the population; e.g., students living in a college town may not represent the student community.

(b) Problem of element of chance.

(c) It cannot rule out bias of respondents.

(2) QUOTA SAMPLING

Quota sampling is quite frequently used in marketing research. It involves the fixation of certain quotas, which are to be fulfilled by the interviewers.

Suppose in a certain territory we want to conduct a survey of households. Their total number is 2,00,000. It is required that a sample of 1 per cent, i.e. 2000 households, is to be covered. We may fix certain controls, which can be either independent or inter-related. These controls are shown in the following tables.

A sample of 2000 households has been chosen, subject to the condition that 1200 of these should be from rural areas and 800 from the urban areas of the territory. Likewise, of the 2000 households, the rich households should number 150, the middle-class ones 650, and the remaining 1200 should be from the poor class.

Independent Controls

Area     Households        Class          Households
Rural    1200              Rich           150
Urban    800               Middle class   650
                           Poor           1200
Total    2000              Total          2000

Inter-related Controls

Class          Rural   Urban   Total
Rich           100     50      150
Middle class   400     250     650
Poor           700     500     1200
Total          1200    800     2000

The first table shows the independent quota controls and the second table shows the inter-related quota controls. As can be seen, inter-related quota controls allow less freedom of selection of the units than is available in the case of independent controls.
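How an interviewer fills inter-related quotas can be sketched as a simple counter per cell. Respondents are accepted in the order they are met until every cell is full. The cell targets are taken from the inter-related controls table. The function name, the record layout and the small demonstration plan are hypothetical. The order of meeting is up to the interviewer, not a random mechanism, which is why no standard error can be attached to the result.

```python
# Inter-related quota controls from the table: (class, area) -> households required.
QUOTAS = {
    ("Rich", "Rural"): 100, ("Rich", "Urban"): 50,
    ("Middle class", "Rural"): 400, ("Middle class", "Urban"): 250,
    ("Poor", "Rural"): 700, ("Poor", "Urban"): 500,
}


def fill_quotas(respondents, quotas):
    """Accept respondents in the order the interviewer meets them until every
    cell is full; anyone whose cell is already full (or not in the plan) is skipped."""
    counts = {cell: 0 for cell in quotas}
    accepted = []
    for person in respondents:
        cell = (person["class"], person["area"])
        if cell in quotas and counts[cell] < quotas[cell]:
            counts[cell] += 1
            accepted.append(person)
            if counts == quotas:          # all quotas fulfilled
                break
    return accepted, counts


# Hypothetical order in which an interviewer meets households, with a tiny plan.
met = [
    {"class": "Rich", "area": "Rural"},
    {"class": "Rich", "area": "Rural"},
    {"class": "Poor", "area": "Urban"},
]
small_plan = {("Rich", "Rural"): 1, ("Poor", "Urban"): 1}
accepted, counts = fill_quotas(met, small_plan)
print(len(accepted), counts)
```

With independent controls, the same routine would keep two separate counters, one for area and one for class. That gives the interviewer more freedom, as the text notes.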

There are certain advantages in both schemes. Independent controls are much simpler, especially from the viewpoint of interviewers. They are also likely to be cheaper, as interviewers may cover their quotas within a small geographical area. However, for this very reason, independent controls may affect the representativeness of the quota sample. Inter-related quota controls are more representative, though such controls may involve more time and effort on the part of interviewers. Also, they may be costlier than independent quota controls.

In view of the non-random element of quota sampling, it has been severely criticized, especially by statisticians, who consider it theoretically weak and unsound. There are points both in favor of and against quota sampling. These are given below.

Advantages of quota sampling

(a) It is economical, as traveling costs can be reduced. An interviewer need not travel all over a town to track down pre-selected respondents. However, if numerous controls are employed in a quota sample, it will become more expensive, though it will have less selection bias.

(b) It is administratively convenient. The labor of selecting a random sample can be avoided by using quota sampling. Also, the problem of non-contacts and call-backs can be dispensed with altogether.

(c) When the field work is to be done quickly, perhaps in order to minimize memory errors, quota sampling is most appropriate and feasible.

(d) It is independent of the existence of sampling frames. Wherever a suitable sampling frame is not available, quota sampling is perhaps the only choice available.

Limitations of Quota sampling

1. Since quota sampling is not based on random selection, it is not possible to calculate estimates of standard errors for the sample results.

2. It may not be possible to get a 'representative' sample within the quota, as the selection depends entirely on the mood and convenience of the interviewers.

3. Since too much latitude is given to the interviewers, the quality of work suffers if they are not competent.

4. It may be extremely difficult to supervise and control the field investigation under quota sampling.

(3) JUDGEMENT SAMPLING

The main characteristic of judgment sampling is that units or elements in the population are purposively selected. It is because of this that judgment samples are also called purposive samples. Since the process of selection is not based on the random method, judgment sampling is considered to be non-probability sampling.



Occasionally it may be desirable to use judgment sampling. Thus, an expert may be asked to select a sample of 'representative' business firms. The reliability of such a sample would depend upon the judgment of the expert. The quota sample, discussed earlier, is in a way a judgment sample, where the actual selection of units within the previously fixed quota depends on the interviewer.

It may be noted that when a small sample of a few units is to be selected, a judgment sample may be more suitable, as the errors of judgment are likely to be less than the random errors of a probability sample. However, when a large sample is to be selected, the element of bias in the selection could be quite large in the case of a judgment sample. Further, it may be costlier than random sampling.

(4) MASTER SAMPLES

A master sample is one from which repeated sub-samples can be taken as and when required from the same area or population. This was first used in the United States when the US Master Sample of Agriculture was taken. In this sampling, the rural area of over 3000 US counties was divided into segments of about four farms each. "After selecting a systematic sample of 1/8 of the segments, the materials were duplicated and made available, with instruction, at low cost." The crucial point to note in respect of master samples is that "the actual sample for each new survey is not selected directly from the entire population but from a frame of segments and dwellings that was selected earlier from the entire population."

The utility of master samples is limited to a relatively short period, for there may be changes in the population which would distort the representative character of the master sample. In view of this, master samples should be built from relatively permanent units, say, dwellings rather than individuals or households, which frequently undergo changes on account of births, deaths and migration. The main advantage of master samples is that samples can be expeditiously selected from them on account of their simplicity. Another advantage is that they are economical, because the same master frame is used for drawing samples for several surveys, as a result of which the cost incurred on the preparation of the master frame is spread over these surveys. Further, on account of this economy in each survey, one can initially spend more to create a good master frame. Thus, economy may lead to improved quality in the listing.

(5) PANEL SAMPLES

Panel samples are frequently used in marketing research. In panel samples, the same units or elements are measured on subsequent occasions. To give an example, suppose that one is interested in knowing the change in the consumption pattern of households. A sample of households is drawn. These households are contacted to gather information on their pattern of consumption. Subsequently, say after a period of six months, the same households are approached once again and the necessary information on their consumption is obtained. A comparison of the two sets of data would indicate whether there has been any change and, if so, to what extent. In fact, with the help of panel samples, information is collected on a more or less continuous basis.



Panel samples are extremely convenient and economical, as the cost of drawing a second sample is not incurred. But the main limitation of such samples is that it may be difficult to sustain the interest of individuals included in the panel for a long period. Many respondents on the panel may refuse to be interviewed again or may give poor answers. In either case, the quality of the survey will suffer. Another limiting factor in panel samples is that there may be bias on account of continued participation in the panel. It is felt that the individual is conditioned to some extent by the fact that data on purchases are reported. In such a case, the purchase behavior of panel members may become different from that of others not covered by the panel. Furthermore, panel samples may turn out to be more expensive when the same respondents have to be located after a lapse of, say, a year, by which time some of them might have migrated to other areas. This would involve travel costs in addition to being difficult.

CHARACTERISTICS OF A GOOD SAMPLE DESIGN

Kish mentions that a good sample design requires the judicious balancing of four broad criteria: goal orientation, measurability, practicality and economy.

Goal orientation

This suggests that a sample design "should be oriented to the research objectives, tailored to the survey design, and fitted to the survey conditions". If this is done, it should influence the choice of the population, the measurement and also the procedure of choosing a sample.

Measurability

A sample design should enable the computation of valid estimates of its sampling variability. Normally, this variability is expressed in the form of standard errors in surveys. However, this is possible only in the case of probability sampling. In non-probability samples, such as a quota sample, it is not possible to know the degree of precision of the survey results.

Practicality

This implies that the sample design can be followed properly in the survey, as envisaged earlier. It is necessary that complete, correct, practical and clear instructions be given to the interviewer so that no mistakes are made in the selection of sampling units and the final selection in the field does not differ from the original sample design. Practicality also refers to simplicity of the design, i.e. it should be capable of being understood and followed in the actual operation of the field work.

Economy

Finally, economy implies that the objectives of the survey should be achieved with minimum cost and effort. Survey objectives are generally spelt out in terms of precision, i.e. the inverse of the variance of survey estimates. For a given degree of precision, the sample design should give the minimum cost. Alternatively, for a given cost, the sample design should achieve maximum precision (minimum variance).



It may be pointed out that these four criteria come into conflict with each other in most cases, and the researcher should carefully balance the conflicting criteria so that he is able to select a really good sample design. As there is no unique method or procedure by which one can select a good sample, one has to compare several sample designs that can be used in a survey. This means that one has to weigh the pros and cons, the strong and weak points, of various sample designs in respect of these four criteria before selecting the best possible one.

METHODS OF DETERMINING SAMPLE SIZE

There are six methods of determining sample size in market research.

1. Unaided Judgment – When no specific method is used to determine sample size, it is called unaided judgment. Such an approach, when used to arrive at sample size, gives no explicit consideration to either the likely precision of the sample results or the cost of obtaining them (characteristics in which the client should have an interest). It is an approach to be avoided.

2. All-You-Can-Afford - In this method, a budget for the project is set by some (generally unspecified) process. After the estimated fixed costs of designing the project, preparing a questionnaire (if required), analyzing the data and preparing the report are deducted, the remainder of the budget is allocated to sampling. Dividing this remaining amount by the estimated cost per sampling unit gives the sample size. This method concentrates on the cost of the information and is not concerned with its value. Although cost always has to be considered in any systematic approach to sample size determination, one also needs to consider how much the information provided by the sample will be worth. This approach can produce sample sizes that are larger than required as well as sizes that are smaller than optimal.
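The all-you-can-afford arithmetic fits in one line. The sketch below makes the rule explicit. The budget, fixed costs and cost per interview are hypothetical figures chosen only for illustration.

```python
def affordable_sample_size(budget, fixed_costs, cost_per_unit):
    """All-you-can-afford rule: n = (budget - fixed costs) // cost per sampling unit."""
    remaining = budget - fixed_costs
    if remaining <= 0:
        return 0                      # nothing left over for sampling
    return remaining // cost_per_unit


# Hypothetical figures in Rs: budget 2,00,000, fixed costs 80,000, Rs 250 per interview.
print(affordable_sample_size(200_000, 80_000, 250))  # 480
```

Nothing in the function looks at the precision or the value of the information. This is exactly the weakness the text points out.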

3. Required Size Per Cell - This method of determining sample size can be used with simple random, stratified random, purposive and quota samples. For example, in a study of attitudes towards fast food establishments in a local marketing area, information was desired for two occupational groups and for each of four age groups. This resulted in 2 × 4 = 8 sample cells. A sample size of 30 was needed per cell for the types of statistical analyses that were to be conducted. The overall sample size was therefore 8 × 30 = 240.
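The fast food example can be reproduced by listing every combination of the classifying variables and multiplying by the required cases per cell. The group labels below are placeholders, because the example does not name the actual occupations or age bands.

```python
from itertools import product

occupations = ["occupation group A", "occupation group B"]             # 2 groups (placeholder labels)
age_groups = ["age group 1", "age group 2", "age group 3", "age group 4"]  # 4 groups (placeholder labels)
per_cell = 30  # cases needed in each cell for the planned analyses

cells = list(product(occupations, age_groups))  # every occupation x age combination
total_n = len(cells) * per_cell
print(len(cells), total_n)  # 8 240
```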

4. Use of Bayesian Statistical Model - The Bayesian model involves finding the difference between the expected value of the information to be provided by the sample and the cost of the sample. This difference is known as the expected net gain from sampling (ENG). The sample size with the largest positive ENG is chosen.

The procedure for finding the optimal value of 'n', the size of the sample, under this approach is as follows:

1. Find the expected value of the sample information (EVSI) for every possible n.
2. Work out a reasonably approximate cost of taking a sample of every possible n.
3. Compare the EVSI and the cost of the sample for every possible n. In other words, work out the expected net gain (ENG) for every possible n as stated below:

   For a given sample size (n): (EVSI) - (Cost of sample) = (ENG)

4. From the above step, the optimal sample size, i.e. that value of n which maximizes the difference between the EVSI and the cost of the sample, can be determined.

Computing the EVSI for every possible n and comparing it with the respective cost is often a very cumbersome task that is generally feasible only with computer help. Hence this approach, although theoretically optimal, is rarely used in practice.
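The four steps above can be sketched with the computer doing the scanning. The notes do not say how EVSI is calculated, so this sketch assumes a standard textbook Bayesian set-up: a two-action decision (launch or not) with linear payoffs, a normal prior on the population mean, and a known sampling standard deviation. Under those assumptions EVSI has a closed form based on the standard normal loss integral. All names and figures (`problem`, the break-even of 50, the costs) are hypothetical.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def unit_normal_loss(d):
    """Standard normal loss integral L(D) = phi(D) - D * (1 - Phi(D))."""
    return std_normal_pdf(d) - d * (1 - std_normal_cdf(d))

def evsi(n, prior_mean, prior_sd, break_even, sampling_sd, loss_per_unit):
    """EVSI for an assumed two-action decision with linear payoffs and a normal prior.
    loss_per_unit is the opportunity loss per unit by which the true mean lies
    on the wrong side of the break-even value."""
    prior_var = prior_sd ** 2
    # Posterior variance of the mean after a sample of n (known sampling SD)
    posterior_var = 1 / (1 / prior_var + n / sampling_sd ** 2)
    # SD of the posterior mean as seen before sampling ("preposterior" SD)
    preposterior_sd = math.sqrt(prior_var - posterior_var)
    d = abs(prior_mean - break_even) / preposterior_sd
    return loss_per_unit * preposterior_sd * unit_normal_loss(d)

def optimal_sample_size(max_n, fixed_cost, cost_per_unit, **problem):
    """Steps 1-4: ENG = EVSI - cost for n = 1..max_n; keep the largest positive ENG."""
    best_n, best_eng = 0, 0.0  # n = 0 means "take no sample" (ENG = 0)
    for n in range(1, max_n + 1):
        eng = evsi(n, **problem) - (fixed_cost + cost_per_unit * n)
        if eng > best_eng:
            best_n, best_eng = n, eng
    return best_n, best_eng

# Hypothetical decision: launch if mean monthly purchases per store exceed 50 units.
problem = dict(prior_mean=52.0, prior_sd=5.0, break_even=50.0,
               sampling_sd=20.0, loss_per_unit=10_000.0)
best_n, best_eng = optimal_sample_size(500, fixed_cost=2_000.0, cost_per_unit=50.0, **problem)
print(best_n, round(best_eng))
```

EVSI rises with n but flattens out, while cost keeps rising in a straight line. That is why ENG has an interior maximum instead of always favouring the biggest sample.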

5. Use of Traditional Statistical Model - The formula for the traditional statistical model depends upon the type of sample to be taken, and it always incorporates three common variables:

• an estimate of the variance in the population from which the sample is to be drawn,
• the error from sampling that the researcher will allow,
• the desired level of confidence that the actual sampling error will be within the allowable limits.

The statistical models for simple random sampling include estimation of means and estimation of proportions.
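For simple random sampling from a large population, the usual formulas are n = (zσ/E)² for a mean and n = z²pq/E² for a proportion. Here σ (or p) is the variance estimate, E is the allowable error and z reflects the confidence level (1.96 for 95%). The sketch below applies them to hypothetical inputs. It assumes no finite population correction.

```python
import math

def sample_size_for_mean(sigma, allowable_error, z=1.96):
    """n = (z * sigma / E)^2, rounded up; sigma is an estimate of the population SD."""
    return math.ceil((z * sigma / allowable_error) ** 2)

def sample_size_for_proportion(p, allowable_error, z=1.96):
    """n = z^2 * p * q / E^2 with q = 1 - p, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / allowable_error ** 2)

# Hypothetical: SD estimated at 10 (rupees per month), allowable error 2, 95% confidence
print(sample_size_for_mean(sigma=10, allowable_error=2))        # 97
# p = 0.5 is the most conservative guess (it maximises p*q); allowable error 5 points
print(sample_size_for_proportion(p=0.5, allowable_error=0.05))  # 385
```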

SAMPLING ERRORS

Whatever kind of sample is taken and whatever the sample size, there will always be error arising from the sampling process. The extent of such error may be defined as the difference between a sample result and the result that would have been achieved by undertaking a complete census. Such errors arise because particular types of cases are under-represented or over-represented in the sample compared with the population as a whole. If, for example, the cases are individual consumers, then the under- or over-representation of the sexes, ages or social classes will affect the measurement (and, more importantly, the estimates made from them) of a large number of variables. Lack of representation in the appropriate quantities may be a product of two factors: systematic error (or bias) and random error (or variance).

Systematic error

Bias arises when the sampling procedures used bring about over- or under-representation of types of cases in the sample, mostly in the same direction. This may happen because:

• the selection procedures are not random,
• the selection is made from a list that does not cover the population, or uses a procedure that excludes certain groups,
• non-respondents are not a cross-section of the population.

If the selection procedures are not random, then human judgement has entered into the selection process. For example, interviewers may be asked to choose respondents at some geographical location or to select households in specified streets. The result is likely to be that certain kinds of people or households or organizations are excluded from the sample. Thus choosing respondents in a shopping centre will miss out people who seldom or never go shopping; the selection of households by an interviewer may result in the omission of flats at the tops of stairs.

If the Electoral Register is used to select adults aged 16 or over, then, as indicated earlier, 16 and 17 year-olds and many of the 18 year-olds will be missing from the list and will be under-represented in the final sample. The use of telephone directories will under-represent certain social groups less likely to be in the telephone book (or those who are ex-directory). Duplication in lists, for example in the Yellow Pages, may result in some over-representation. If we try to estimate sales of soap from a sample of private households, then all users in institutions of various kinds will be excluded.

Non-response is a problem for both censuses and samples. For censuses it means that the enumeration will be incomplete. If large numbers are missing, it would be inappropriate to treat those successfully contacted as a 'representative sample'. For samples, it means that estimates made from the sample will be biased if non-respondents are not themselves representative of the population. If they are representative, then non-response is not so much of a problem, but it may still mean that analyses are made on the basis of too small a sample.

Whatever the reason for the systematic error, the effect will be that all samples that could be drawn from a population will tend to show over- or under-representation in the same direction. The average of all these samples will then not be the same as the real population average or proportion. Thus if we took many samples using a procedure that tended to omit working mothers with young children, all the samples would show such under-representation. We would not get some samples over-representing this group and others under-representing it, with the average of all samples very close to the real population proportion.

Systematic errors cannot be reduced simply by increasing the sample size. If certain kinds of people are not being selected, cannot be contacted or are not responding, the problem will not be 'solved' by taking a bigger sample. Indeed, some kinds of error will increase with more interviewers, more questionnaires and greater data-processing requirements. All the researcher can do is minimize the likelihood of bias by using appropriate sample designs. Biases for some variables can be checked, for example against Census data or data from other sources. Sometimes attempts are made to discover the characteristics of non-responders, for example by sending interviewers to non-respondents to a postal survey, taking 'late' responders as typical of non-responders, or obtaining demographic data from the results of another survey in which the non-responders have taken part.
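A small simulation makes the contrast between bias and random error concrete. It is a sketch built on a hypothetical population. One procedure can select anyone. The other draws from a frame that never reaches a lower-scoring group. Averaged over many samples, the first centres on the true mean and the second misses it in the same direction every time. Moving from n = 50 to n = 500 does not close the gap.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000. The 2,000 "hard to reach" people (e.g. those
# who seldom go shopping) score lower on the characteristic being measured.
hard_to_reach = [rng.gauss(10, 3) for _ in range(2_000)]
easy_to_reach = [rng.gauss(20, 3) for _ in range(8_000)]
population = hard_to_reach + easy_to_reach
true_mean = statistics.fmean(population)

def average_of_sample_means(frame, n, reps=500):
    """Draw `reps` simple random samples of size n from `frame` and average their means."""
    return statistics.fmean(statistics.fmean(rng.sample(frame, n)) for _ in range(reps))

results = {}
for n in (50, 500):
    unbiased = average_of_sample_means(population, n)   # every member can be selected
    biased = average_of_sample_means(easy_to_reach, n)  # frame misses the hard-to-reach group
    results[n] = (unbiased, biased)
    print(n, round(true_mean, 2), round(unbiased, 2), round(biased, 2))
```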

Random error

If we took a number of random, unbiased samples from the same population, there would almost certainly be a degree of fluctuation from one sample to another. Over a large number of samples such errors will tend to cancel out, so that the average of such samples will be close to the real population value. However, we usually take only one sample, and even a sample that has used unbiased selection procedures will seldom be exactly representative of the population from which it was drawn. Each sample will, in short, exhibit a degree of error. Such error is often called 'sampling error', but it would be clearer to think of it as 'random sampling error' to distinguish it from bias (which some statisticians and some textbooks, confusingly, categorize as 'non-sampling' error).

Unlike bias, which affects the general sample composition and relates to each variable being measured in unknown ways, random sampling error will differ from variable to variable. This is because the extent of such error depends on two factors:

• the size of the sample - the bigger the sample, the less the random sampling error (but by a declining amount),
• the variability in the population for that particular variable - a sample used to estimate a variable that varies widely in the population will show more random sampling error than one used for a variable that does not.

These two factors are used as a basis for calculating the likely degree of variability in a sample of a given size for a particular variable. This, in turn, is used to establish, with a specified probability, the range of accuracy of sample estimates, or to judge whether sample findings could be merely random sampling fluctuations from a population in which the findings are untrue.
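For a mean, the two factors combine in the standard error, sd/√n. That standard error is then used to build a confidence interval. The sketch below assumes simple random sampling and a large-sample, z-based 95% interval. The inputs are hypothetical.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean (large-sample, z-based)."""
    mean = statistics.fmean(sample)
    se = standard_error(statistics.stdev(sample), len(sample))
    return mean - z * se, mean + z * se

# Bigger samples help, but by a declining amount: quadrupling n only halves the error.
for n in (100, 400, 1600):
    print(n, standard_error(sd=10, n=n))  # 1.0, 0.5, 0.25
# A more variable characteristic gives more random sampling error at the same n.
print(standard_error(sd=5, n=400), standard_error(sd=20, n=400))  # 0.25 1.0
```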

NON-SAMPLING ERRORS

Not all errors in a piece of research are a result of the sampling process. Certain kinds of error may arise even if a complete census is taken. There are four main categories of such error:

• response errors,
• interviewer errors,
• non-response errors,
• processing errors.

Where research is based on asking people questions, response errors may arise where, for one reason or another, respondents give wrong answers. This may be through dishonesty, forgetfulness, faulty memory, unwillingness or misunderstanding of the questions being asked. Many of these errors arise as a result of poor or inadequate questionnaire design. Putting it the other way round, the potential for such errors can be minimized by careful design of question wording, question formulation and questionnaire layout. In interview surveys, whether face-to-face or by telephone, interviewers may themselves misunderstand questions or the instructions for filling them in. They may be dishonest or inaccurate, make mistakes or ask questions in a non-standard fashion. Interviewer training, along with field supervision and control, can to a large extent reduce the likelihood of such errors, but they will never be entirely eliminated, and there is always the potential for systematic differences between the results obtained by different interviewers.



In nearly all research there will be missing cases, but in survey research there will always be a degree of non-response. Some people will refuse to be interviewed or to complete a questionnaire. Some will be ineligible because they turn out not to be part of the survey population. Some will terminate the interview or refuse to answer some of the questions, and some will be non-contactable, for example because they have moved away, died, or are on holiday at the time of the survey. Even where a census is attempted, it will often remain incomplete. The extent of non-response will vary considerably according to the type of research, the topic of the research and, where based on face-to-face interviews, the experience and training of the interviewers. Calculating the amount of non-response can be confusing, since some researchers will, for example, take the proportion of refusals in the sample drawn, others will take refusals and non-contacts as a proportion of those found eligible, and so on.
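The confusion over non-response figures is easy to see when two of the definitions mentioned are applied to the same fieldwork outcome. The counts below are hypothetical.

```python
# Hypothetical fieldwork outcome for a sample of 1,000 drawn addresses
outcomes = {
    "interviewed": 620,
    "refused": 180,
    "non_contact": 120,  # moved away, on holiday, never at home, etc.
    "ineligible": 80,    # turned out not to be in the survey population
}
drawn = sum(outcomes.values())             # 1,000
eligible = drawn - outcomes["ineligible"]  # 920

# Definition 1: refusals as a proportion of the sample drawn
refusal_rate_drawn = outcomes["refused"] / drawn
# Definition 2: refusals plus non-contacts as a proportion of those found eligible
non_response_eligible = (outcomes["refused"] + outcomes["non_contact"]) / eligible
# Response rate on the same eligible base
response_rate = outcomes["interviewed"] / eligible

print(f"{refusal_rate_drawn:.1%}, {non_response_eligible:.1%}, {response_rate:.1%}")
# 18.0%, 32.6%, 67.4%
```

The same survey can thus be reported as having 18% or roughly 33% non-response. Any quoted figure needs its definition stated alongside it.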

Processing errors can arise back at the office, particularly at the stage of entering answers to questions onto a computerized database via a keyboard and screen. Agencies sometimes validate these entries by, in effect, entering them twice, and the computer checks to see if the two entries are identical. Alternatively, some agencies check samples of the entries. It is possible, in addition, to apply range checks and logical checks.
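The three checks described (double entry, range checks and logical checks) can be sketched as below. The field names, the valid age range of 16 to 99 and the records are all hypothetical. A real agency's rules would come from its own questionnaire.

```python
# Hypothetical questionnaire records keyed in twice by different operators
first_entry = [
    {"id": 1, "age": 34, "owns_car": "yes", "cars_owned": 1},
    {"id": 2, "age": 27, "owns_car": "no", "cars_owned": 0},
]
second_entry = [
    {"id": 1, "age": 43, "owns_car": "yes", "cars_owned": 1},  # digits transposed
    {"id": 2, "age": 27, "owns_car": "no", "cars_owned": 0},
]

def double_entry_mismatches(a, b):
    """(record id, field) pairs where the two keyings disagree; assumes the same record order."""
    return [(ra["id"], field)
            for ra, rb in zip(a, b)
            for field in ra
            if ra[field] != rb[field]]

def range_and_logic_errors(record):
    """Range check on age (assumed valid range 16-99) plus a consistency check between answers."""
    errors = []
    if not 16 <= record["age"] <= 99:
        errors.append("age out of range")
    if record["owns_car"] == "no" and record["cars_owned"] > 0:
        errors.append("says no car but reports cars owned")
    return errors

print(double_entry_mismatches(first_entry, second_entry))  # [(1, 'age')]
print(range_and_logic_errors({"id": 3, "age": 140, "owns_car": "no", "cars_owned": 2}))
```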

There are, then, a number of sources of non-sampling error, and it is important to bear these in mind when interpreting survey results, whether based on a sample or not. The crucial point is that such errors can arise even if a census is taken.

Total survey error

Any research that is based on addressing questions to people and recording their answers risks error resulting from the respondents themselves and from interviewers (where these are used). This is in addition to the kinds of error that arise in any research from data handling and from inadequacies of sampling. Total survey error is the sum of all these sources of error, both sampling and non-sampling. It is difficult to estimate the total survey error in any one survey, and it will tend to vary from question to question. What is certainly true is that the error resulting from random sampling fluctuations accounts for only a very small proportion of the total survey error, yet it is the only kind of error taken into account when confidence intervals are calculated or tests are made against the null hypothesis.

Errors of various kinds can always be reduced by spending more money, for example on more interviewer training and supervision, on random sampling techniques, on pilot testing or on getting a higher response rate. However, the reduction in error has to be traded off against the extra cost involved. Furthermore, errors are often interrelated, so that attempts to reduce one kind of error may actually increase another; for example, minimizing non-response errors by persuading more reluctant respondents may well increase response error. Non-sampling errors tend to be pervasive and not well-behaved, and they do not decrease (indeed, they may increase) with the size of the sample. It is sometimes even difficult to see whether they cause under- or over-estimation of population characteristics. There is, in addition, the paradox that the more efficient the sample design is in controlling random sampling fluctuations, the more important, in proportion, bias and non-sampling error become.
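One standard way to make this paradox precise, not given in these notes, is the mean squared error of a single estimate: MSE = variance + bias². The sketch treats all non-sampling error as a fixed bias, which is a simplifying assumption, and uses hypothetical numbers. As n grows the variance term shrinks, so bias accounts for an ever larger share of the total error.

```python
import math

def total_error_components(sd, n, bias):
    """Mean squared error of a sample mean = random sampling variance + bias squared."""
    variance = sd ** 2 / n       # random sampling error, shrinks as n grows
    mse = variance + bias ** 2   # the bias term does not shrink with n
    return mse, bias ** 2 / mse  # total error, and the share of it due to bias

# Hypothetical characteristic with SD 10 and a fixed bias of 0.5 units
for n in (100, 1_000, 10_000):
    mse, bias_share = total_error_components(sd=10, n=n, bias=0.5)
    print(n, round(math.sqrt(mse), 3), f"{bias_share:.0%}")
```

Here the bias share rises from about 20% at n = 100 to about 71% at n = 1,000 and 96% at n = 10,000.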

CONTROLLING NON-SAMPLING ERRORS

In practice, market research agencies make all reasonable attempts, within the limits imposed by cost and time constraints, to minimize non-sampling errors and bias in the sampling procedure, or at least to measure or estimate their impact. Thus, as far as response errors are concerned, agencies may:

• pilot-test questionnaires in order to check for misunderstandings of questions,
• analyse tendencies to overclaim or underclaim certain kinds of consumer behaviour, for example, the tendency to underclaim the consumption of alcohol, or to overclaim television watching,
• use aided-recall techniques (prompted lists) to help respondents remember products that they may have purchased and forgotten about, or radio programs that they forgot they had listened to,
• use questioning techniques that minimize the effort respondents need to make.

To minimize interviewer error, agencies will often:

• set rigorous training standards for interviewers,
• monitor the process of interviewing by doing 'back checks' (calling on or telephoning respondents who have already been interviewed to check that the interview was carried out properly), or by sending supervisors to accompany interviewers on a regular sample basis,
• carry out computer analyses of questionnaire errors to identify interviewers who may need retraining or reminding of particular points.

To minimize errors resulting from non-response, agencies do one or more of several things:

• for interview surveys, interviewers may be asked to make a specified number of callbacks if the respondent was not at home on the first call; three or four such callbacks may be made, ideally at different times and on different days of the week,
• interviewers may make an appointment by telephone with the respondent,
• self-completion questionnaires may be left where no contact has been made,
• monetary incentives or gifts may sometimes help to improve the response rate,
• interviewers may get a 'foot in the door' by having respondents comply with some small request before presenting them with the larger survey,
• interviewers may be sent to non-respondents to a postal survey to persuade them to complete the questionnaire, or non-respondents may be sent further reminders.

Processing errors will be minimized by careful editing and checking of the questionnaires in addition to the use of data entry validation procedures.

Market research agencies will try to minimize bias by using carefully constructed sample designs that use random procedures wherever possible, or by imposing restrictions on interviewer choices where it is not. These sample designs were described earlier. Biases will still remain, however, and sometimes these are known. Thus it may be known that there are too many women in the sample, or too few men aged 20-24, compared with known population proportions. Many agencies will make corrections to the data to adjust for these biases by 'weighting' them.
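The notes do not say how the weights are calculated. A common, simple choice is cell weighting: each group's weight is its known population share divided by its share of the achieved sample. The sketch below uses hypothetical figures for a sample with too many women.

```python
# Hypothetical achieved sample: 600 women and 400 men, where the population
# (e.g. from Census data) is known to be 50% women and 50% men.
sample_counts = {"women": 600, "men": 400}
population_share = {"women": 0.50, "men": 0.50}
# Hypothetical share in each group who bought the product last month
bought_share = {"women": 0.30, "men": 0.50}

n = sum(sample_counts.values())
# Weight = population share / sample share, so each group counts in its true proportion
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

unweighted = sum(sample_counts[g] * bought_share[g] for g in sample_counts) / n
weighted = (sum(sample_counts[g] * weights[g] * bought_share[g] for g in sample_counts)
            / sum(sample_counts[g] * weights[g] for g in sample_counts))

print(weights)  # women about 0.83, men 1.25
print(round(unweighted, 3), round(weighted, 3))  # 0.38 0.4
```

Weighting corrects only for the characteristics used to build the weights. It cannot remove bias in variables that are not related to them.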

In the real world of market research agencies and their clients, it is unfortunately true that many clients do not understand, or lack interest in, the basics of sampling. In consequence many clients do not ask for estimates of bias or calculations of random sampling error. At the same time, agencies feel that producing calculations, for example of confidence intervals, for a large number of variables will only add confusion and perhaps distrust of the data. In consequence, sampling errors are often quietly ignored, and the estimates given are taken to be the 'truth'. Agencies will instead try to assure their clients that the occurrence and impact of non-sampling errors have been minimized by:

• demonstrating that the procedures for the collection, analysis and reporting of the results are 'respectable', meticulous and thorough,
• showing that the research design features are such as to minimize sources of error within the parameters set by time and cost,
• emphasizing the extent of quality control checks that will uncover, correct and minimize the occurrence of 'mistakes',
• making corrections to the resulting data so that known biases are adjusted for.

Beyond these assurances, clients are sometimes given some indication of the extent of random sampling error that remains. Clients may be given 'read-off' tables for groups of products or types of variable, based on the 'average' variability for that group or type, given a particular sample size.

___

Important Sampling Distributions

Some important sampling distributions, which are commonly used, are:

1. Sampling distribution of mean: The sampling distribution of mean refers to the probability distribution of all the possible means of random samples of a given size that we take from a population. If samples are taken from a normal population, $N(\mu, \sigma)$, the sampling distribution of mean would also be normal, with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Here $\mu$ is the mean of the population, $\sigma$ is the standard deviation of the population and $n$ is the number of items in a sample. Sometimes we sample from a population which is not normal (it may be positively or negatively skewed). As per the central limit theorem, the sampling distribution of mean then still tends quite close to the normal distribution, provided the number of sample items is large, i.e., more than 30. If we want to reduce the sampling distribution of mean to the unit normal distribution, i.e., $N(0, 1)$, we can write the normal variate

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

for the sampling distribution of mean. This characteristic of the sampling distribution of mean is very useful in several decision situations for accepting or rejecting hypotheses.
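The central limit theorem claim can be checked by simulation. The sketch draws many samples of n = 36 from a positively skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen as an assumption for illustration). It then compares the spread of the sample means with σ/√n and standardises each mean with the z formula above.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
n, reps = 36, 5_000
mu, sigma = 1.0, 1.0    # exponential population with rate 1: mean 1, SD 1 (skewed)

sample_means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(sample_means), 3))  # close to mu = 1
print(round(statistics.stdev(sample_means), 3), round(sigma / math.sqrt(n), 3))  # both near 0.167

# Standardise each sample mean: z = (x_bar - mu) / (sigma / sqrt(n))
z_values = [(m - mu) / (sigma / math.sqrt(n)) for m in sample_means]
share_within_196 = sum(abs(z) <= 1.96 for z in z_values) / reps
print(share_within_196)  # roughly 0.95 if the normal approximation holds
```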

2. Sampling distribution of proportion: Like the sampling distribution of mean, we can also have a sampling distribution of proportion. This happens in the case of statistics of attributes. Suppose we have worked out the proportion of defective parts in a large number of samples, each with, say, 100 items, taken from an infinite population. If we plot a probability distribution of these proportions, we obtain what is known as the sampling distribution of proportion. Usually the statistics of attributes correspond to the conditions of a binomial distribution, which tends to become a normal distribution as n becomes larger and larger. Let $p$ represent the proportion of defectives, i.e., of successes, and $q$ the proportion of non-defectives, i.e., of failures (so $q = 1 - p$). If $p$ is treated as a random variable, then the sampling distribution of proportion of successes has mean $p$ and standard deviation $\sqrt{pq/n}$, where $n$ is the sample size. Presuming that the binomial distribution approximates the normal distribution for large $n$, the normal variate of the sampling distribution of proportion,

$$z = \frac{p_x - p}{\sqrt{pq/n}},$$

where $p_x$ is the sample proportion of successes, can be used for testing of hypotheses.

3. Student's t-distribution: When the population standard deviation ($\sigma$) is not known and the sample is of a small size (i.e., $n < 30$), we use the t-distribution for the sampling distribution of mean and work out the t variable as:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}},$$

i.e., the sample standard deviation. The t-distribution is also symmetrical and is very close to the distribution of the standard normal variate, z, except for small values of n. The variable t differs from z in that we use the sample standard deviation ($s$) in the calculation of t, whereas we use the standard deviation of the population ($\sigma$) in the calculation of z. There is a different t-distribution for every possible sample size, i.e., for different degrees of freedom. The degrees of freedom for a sample of size n is n - 1. As the sample size gets larger, the shape of the t-distribution becomes approximately equal to the normal distribution. In fact, for sample sizes of more than 30, the t-distribution is so close to the normal distribution that we can use the normal to approximate the t-distribution. When n is small, the t-distribution is far from normal, and as $n \to \infty$ it becomes identical with the normal distribution. Tables of the t-distribution are available which give the values of t for different degrees of freedom at various levels of significance. The table value of t for the given degrees of freedom at a certain level of significance is compared with the value of t calculated from the sample data. If the calculated value (in absolute terms) exceeds the table value, we infer that the null hypothesis cannot be accepted.
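The z variate for a proportion (item 2) and the t variable for a small-sample mean (item 3) can be computed directly from the formulas above. All data are hypothetical. The critical values still come from the printed tables, so the sketch stops at the test statistic and its degrees of freedom.

```python
import math
import statistics

def z_for_proportion(sample_p, p, n):
    """z = (p_x - p) / sqrt(p * q / n), using the hypothesised population proportion p."""
    q = 1 - p
    return (sample_p - p) / math.sqrt(p * q / n)

def t_for_mean(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s using divisor n - 1; returns (t, df)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    t = (statistics.fmean(sample) - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical: 100 parts inspected, 14 defective; claimed defective rate is 10%
print(round(z_for_proportion(sample_p=0.14, p=0.10, n=100), 2))  # 1.33

# Hypothetical small sample (n = 10) of package weights in grams; claimed mean 500 g
weights = [498, 502, 495, 499, 497, 501, 496, 494, 500, 498]
t, df = t_for_mean(weights, mu=500)
print(round(t, 2), df)  # -2.45 9
```

For 9 degrees of freedom the two-tailed 5% table value of t is 2.262. Here |t| is about 2.45, so the claimed mean of 500 g would not be accepted at that level.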

4. F distribution: Let $(s_1)^2$ and $(s_2)^2$ be the variances of two independent samples of size $n_1$ and $n_2$ respectively, taken from two independent normal populations having the same variance, $(\sigma_1)^2 = (\sigma_2)^2$. Then the ratio $F = (s_1)^2/(s_2)^2$, where

$$(s_1)^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2}{n_1 - 1} \quad \text{and} \quad (s_2)^2 = \frac{\sum (x_{2i} - \bar{x}_2)^2}{n_2 - 1},$$

has an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. The F ratio is computed so that the larger variance is always in the numerator. Tables have been prepared for the F distribution that give the value of F for various degrees of freedom for the larger as well as the smaller variance. The value of F calculated from the sample data is compared with the corresponding table value of F, and if the former exceeds the latter, we infer that the null hypothesis of the variances being equal cannot be accepted.
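The F ratio with the larger variance in the numerator can be sketched as follows. The two samples are hypothetical fill weights from two machines. The resulting F is compared with the F table for the returned degrees of freedom.

```python
import statistics

def f_ratio(sample_1, sample_2):
    """F = larger sample variance / smaller one (divisor n - 1), with (numerator df, denominator df)."""
    v1, v2 = statistics.variance(sample_1), statistics.variance(sample_2)
    df1, df2 = len(sample_1) - 1, len(sample_2) - 1
    if v1 >= v2:
        return v1 / v2, (df1, df2)
    return v2 / v1, (df2, df1)  # swap so the larger variance is in the numerator

# Hypothetical fill weights (grams) from two machines
machine_a = [502, 498, 505, 495, 500, 503, 497]
machine_b = [500, 501, 499, 500, 502, 498]
F, (df_num, df_den) = f_ratio(machine_a, machine_b)
print(round(F, 2), df_num, df_den)  # 6.33 6 5
```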

5. Chi-square ($\chi^2$) distribution: The chi-square distribution is encountered when we deal with collections of values that involve adding up squares. Variances of samples require us to add a collection of squared quantities and thus have distributions that are related to the chi-square distribution. Suppose we take each one of a collection of sample variances, divide it by the known population variance and multiply the quotient by (n - 1), where n is the number of items in the sample. The resulting values follow a chi-square distribution. Thus

$$\frac{(n - 1)s^2}{\sigma^2}$$

would have the same distribution as the chi-square distribution with (n - 1) degrees of freedom. The chi-square distribution is not symmetrical and all its values are positive. One must know the degrees of freedom to use the chi-square distribution. This distribution may also be used for judging the significance of the difference between observed and expected frequencies, and as a test of goodness of fit. The general shape of the $\chi^2$ distribution depends upon the degrees of freedom, and the $\chi^2$ value is worked out as:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected frequencies. Tables are available that give the value of $\chi^2$ for given degrees of freedom. These table values may be compared with the calculated value of $\chi^2$ for the relevant degrees of freedom at a desired level of significance for testing hypotheses.
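Both uses of chi-square mentioned above can be sketched with hypothetical data. The first is a goodness-of-fit test of observed against expected frequencies. The second is the variance form (n - 1)s²/σ² with a hypothetical known population variance.

```python
import statistics

def chi_square_statistic(observed, expected):
    """chi^2 = sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 200 customers choosing among four brands, tested against "no preference"
observed = [62, 48, 45, 45]
expected = [sum(observed) / len(observed)] * len(observed)  # 50 in each class
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
print(round(chi2, 2), df)  # 3.96 3

# Variance form: (n - 1) * s^2 / sigma^2 follows chi-square with n - 1 df
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3]
sigma_sq = 0.04  # hypothetical known population variance
chi2_var = (len(sample) - 1) * statistics.variance(sample) / sigma_sq
print(round(chi2_var, 3), len(sample) - 1)  # 9.375 5
```

The 5% table value of χ² for 3 degrees of freedom is 7.815. The brand-choice result of 3.96 therefore gives no evidence against "no preference".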
