T5 sampling

Sampling
By Rama Krishna Kompella

Learning Objectives
- Understand how to identify the target respondents
- Understand sampling and the different types of sampling
- Understand the sampling process
- Recognize the potential errors in sampling
- Determine the sample size

Census vs. Sampling
There are two methods of selecting the respondents: census and sampling.
A census is used when:
- The number of respondents or units of interest is limited, or
- Data must be gathered from every individual in the population

Census vs. Sampling
Sampling is used when:
- The population is too large
- The population is homogeneous
- Considerations of time and cost play a major role

Sampling Process
1. Define the population
2. Identify the sampling frame
3. Specify the sampling unit
4. Select the sampling method
5. Determine the sample size
6. Specify the sampling plan
7. Select the sample

Sampling Process
The population needs to be defined in terms of:
Term            Example
Element         Company's product
Sampling unit   Retail outlet, supermarket
Extent          Hyderabad & Secunderabad
Time            April 10 to May 25

Sampling Process
Identify the sampling frame:
- Clearly define the universe from which the sample will be picked
- Ex: When you are studying the purchase behaviour of consumers buying premium cars, your sampling frame will be all the premium car outlets in the city

Sampling Process
Specify the sampling unit:
- Decide whom to contact in order to obtain the required data
- Select the sampling unit carefully, making sure the respondent can actually provide the required data
- Ex: When studying intention to purchase a car, the sampling unit would be people who are employed and have a steady income. If we are studying the trends from a dealer perspective, the sampling unit will be the dealers

Sampling Process
Select the sampling method used to identify the respondents. There are two ways of selecting the sample:
- Probability methods
- Non-probability methods

Sampling Process
Determine the sample size:
- Decide how many respondents need to be chosen from the population
- Generally, the sample size depends on the type of research conducted
- For exploratory research the sample size tends to be small, whereas for conclusive research the sample size will be large

Sampling Process
Specify the sampling plan:
- A sampling plan needs to clearly specify who the target population is
- Ex: When we are planning to study the purchase pattern of groceries by households, we need to clearly specify what "household" means. Is it a family with kids, DINKs (dual income, no kids), empty nesters, etc.?

Sampling Design within the Research Process

Step 4: Specifying the sampling method
Probability Sampling
- Every element in the target population or universe [sampling frame] has a known, non-zero probability of being chosen in the sample for the survey being conducted (an equal probability in simple random sampling)
- Scientific, operationally convenient and simple in theory
- Results may be generalized
Non-Probability Sampling
- Elements in the universe [sampling frame] do not have a known or equal probability of being chosen in the sample
- Operationally convenient and simple in theory
- Results may not be generalized

Types of Sampling Designs
Probability         Nonprobability
Simple random       Convenience
Complex random      Purposive
  Systematic          Judgment
  Cluster             Quota
  Stratified        Snowball
  Double

Simple Random Sampling
In simple random sampling, every item of the population has an equal probability of being chosen.
Two methods are used in random sampling:
- Lottery method
- Random number table
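The lottery method can be imitated in a few lines of Python. This is only a sketch: the sampling frame of 1000 numbered respondents, the sample size of 50 and the function name `simple_random_sample` are hypothetical choices made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: respondents numbered 1 to 1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
print(sorted(sample))
```

Sampling without replacement mirrors drawing slips from a box: once a respondent is picked, the same respondent cannot be picked again.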

Simple Random
Advantages:
- Easy to implement with random dialing
Disadvantages:
- Requires list of population elements
- Time consuming
- Uses larger sample sizes
- Produces larger errors
- High cost

Systematic
Advantages:
- Simple to design
- Easier than simple random
- Easy to determine sampling distribution of mean or proportion
Disadvantages:
- Periodicity within population may skew sample and results
- Trends in list may bias results
- Moderate cost

Stratified
Advantages:
- Control of sample size in strata
- Increased statistical efficiency
- Provides data to represent and analyze subgroups
- Enables use of different methods in strata
Disadvantages:
- Increased error will result if subgroups are selected at different rates
- Especially expensive if strata on population must be created
- High cost

Cluster
Advantages:
- Provides an unbiased estimate of population parameters if properly done
- Economically more efficient than simple random
- Lowest cost per sample
- Easy to do without list
Disadvantages:
- Often lower statistical efficiency due to subgroups being homogeneous rather than heterogeneous
- Moderate cost

Stratified and Cluster Sampling
Stratified                                      Cluster
Population divided into few subgroups           Population divided into many subgroups
Homogeneity within subgroups                    Heterogeneity within subgroups
Heterogeneity between subgroups                 Homogeneity between subgroups
Choice of elements from within each subgroup    Random choice of subgroups

Area Sampling

Double Sampling
Advantages:
- May reduce costs if first stage results in enough data to stratify or cluster the population
Disadvantages:
- Increased costs if indiscriminately used

Nonprobability Samples
- No need to generalize
- Limited objectives
- Feasibility
- Time
- Cost

Nonprobability Sampling Methods
- Convenience
- Judgment
- Quota
- Snowball

Non-probability samples
Convenience sampling
- Drawn at the convenience of the researcher
- Common in exploratory research
- Does not support conclusive inferences about the population
Judgmental sampling
- Sampling based on the judgment, gut feeling or experience of the researcher
- Common in commercial marketing research projects
- Quite useful when drawing inferences is not necessary
Quota sampling
- An extension of judgmental sampling, something like a two-stage judgmental sample
- Quite difficult to draw
Snowball sampling
- Used in studies involving respondents who are rare to find
- To start with, the researcher compiles a short list of sample units from various sources
- Each of these respondents is contacted to provide names of other probable respondents

Quota Sampling
Example: select a quota sample of 3000 persons in country X using three control characteristics: gender, age and religion. The three control characteristics are considered independently of one another. To calculate the desired number of sample elements for each attribute of the control characteristics, the distribution of the general population of country X on each control characteristic is examined.

Control characteristic   Category          Population distribution   Sample elements
Gender                   Male              50.7%                     3000 x 50.7% = 1521
                         Female            49.3%                     3000 x 49.3% = 1479
Age                      20-29 years       13.4%                     3000 x 13.4% = 402
                         30-39 years       52.3%                     3000 x 52.3% = 1569
                         40 years & over   34.3%                     3000 x 34.3% = 1029
Religion                 Christianity      76.4%                     3000 x 76.4% = 2292
                         Islam             14.8%                     3000 x 14.8% = 444
                         Hinduism           6.6%                     3000 x 6.6% = 198
                         Others             2.2%                     3000 x 2.2% = 66
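The quota arithmetic can be checked with a short Python sketch. It uses the gender and religion percentages from the example; the function name `quota_counts` is a hypothetical helper, and rounding to the nearest whole person is an assumption that happens to give exact results for these figures.

```python
def quota_counts(sample_size, distribution):
    """Return the quota for each category from its percentage share."""
    return {
        category: round(sample_size * pct / 100)
        for category, pct in distribution.items()
    }

gender = {"Male": 50.7, "Female": 49.3}
religion = {"Christianity": 76.4, "Islam": 14.8, "Hinduism": 6.6, "Others": 2.2}

gender_quota = quota_counts(3000, gender)
religion_quota = quota_counts(3000, religion)
print(gender_quota)
print(religion_quota)
```

Because each control characteristic is treated independently, every set of quotas adds up to the full sample of 3000 on its own.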

Types of error
Non-sampling error
- Error associated with collecting and analyzing the data
Sampling error
- Error associated with failing to interview the entire population

Non-Sampling Error
Coverage error
- Wrong population definition
- Flawed sampling frame
- Interviewer or management error in following sampling frame
Response error
- Badly worded question results in invalid or incorrect response
- Interviewer bias changes response
Non-response error
- Respondent refuses to take survey or is away
- Respondent refuses to answer certain questions
Processing errors
- Error in data entry or recording of responses
Analysis errors
- Inappropriate analytical techniques, weighting or imputation are applied

Sampling Error
- Sampling error is known after the data are collected, by calculating the margin of error and confidence intervals
- Surveys don't have a margin of error; individual questions do
- Power analyses use estimates of the parameters involved in calculating the margin of error
- It is common to see sample sizes of 400 and 1000 for surveys (these are associated with margins of error of about 5% and 3% at a 95% confidence level)
- In most cases the size of the population being sampled from is irrelevant
- The margin of error should be calculated using the size of the subgroups sampled
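The link between sample sizes of 400 and 1000 and margins of error of roughly 5% and 3% can be sketched as follows. The slide does not state a formula, so this assumes the standard margin of error for a proportion, a 95% confidence level (z = 1.96) and the worst-case proportion p = 0.5; the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion (assumed formula: z * sqrt(p(1-p)/n))."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000):
    print(f"n = {n}: margin of error = {margin_of_error(n):.1%}")
```

Note that the population size does not appear in the formula, and that a subgroup of, say, 100 respondents inside a survey of 1000 has a much wider margin of error than the full sample.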

What's Next?
- Computation of sample size
- Sampling error

Key Terms
Area sampling
Census
Cluster sampling
Convenience sampling
Disproportionate stratified sampling
Double sampling
Judgment sampling
Multiphase sampling
Nonprobability sampling
Population
Population element
Population parameters
Population proportion of incidence
Probability sampling
Proportionate stratified sampling
Quota sampling
Sample statistics
Sampling
Sampling error
Sampling frame
Sequential sampling
Simple random sample
Skip interval
Snowball sampling
Stratified random sampling
Systematic sampling
Systematic variance

Random Number Table

Systematic Random Sampling
Three steps are followed:
1. Select the sampling interval, K, where K = Total Population / Desired Sample Size
2. Select a unit randomly between the first unit and the Kth unit
3. Add K repeatedly to the randomly chosen number to obtain the remaining units
Ex: If the total population = 1000 and the desired sample size is 50, then K = 1000/50 = 20. Randomly select a number between 1 and 20. Say the number is 17; the sample series will then be 17, 37, 57, ..., 997.
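The three steps can be sketched in Python as below. The function name `systematic_sample` and its parameters are hypothetical; the sketch assumes the population units are numbered from 1 and that the population size is a multiple of the sample size, as in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the unit numbers selected by systematic sampling."""
    k = population_size // sample_size        # step 1: sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # step 2: random start in 1..K
    # step 3: keep adding K to the starting number
    return [start + i * k for i in range(sample_size)]

# Reproduce the example: population 1000, sample 50, random start 17
units = systematic_sample(1000, 50, start=17)
print(units[:3], "...", units[-1])
```

With a start of 17 the list runs from 17 up to 997 in steps of 20, giving exactly 50 units.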

Stratified Random Sampling
1. Calculate the percentage of the population present in each stratum
2. Determine the sample to be drawn from each stratum
3. Randomly select the sample from each stratum
Eg: You need to select 40 people from an office that has the following staff:
Male, full time      90
Male, part time      18
Female, full time     9
Female, part time    63
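The allocation for the office example can be worked out with a short sketch. It assumes proportionate allocation (each stratum's share of the sample equals its share of the 180 staff); the function name `proportionate_allocation` is hypothetical.

```python
def proportionate_allocation(strata_sizes, sample_size):
    """Allocate the sample to strata in proportion to their population share."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

staff = {
    "Male, full time": 90,
    "Male, part time": 18,
    "Female, full time": 9,
    "Female, part time": 63,
}
allocation = proportionate_allocation(staff, 40)
print(allocation)
```

For these figures the shares are 50%, 10%, 5% and 35%, so the 40 places split into 20, 4, 2 and 14. With other figures the rounded counts may not add up exactly to the sample size and would need a manual adjustment.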

Some Notations to Remember

Population parameters                        Symbol
Size                                         $N$
Mean value                                   $\mu$
Percentage value (population proportion)     $P$; $Q$ or $[1 - P]$
Standard deviation                           $\sigma$
Variance                                     $\sigma^2$
Standard error (population parameter)        $S_{\mu}$ or $S_{P}$

Sample notations                             Symbol
Size                                         $n$
Mean value                                   $\bar{x}$
Percentage value (sample proportion)         $\bar{p}$; $\bar{q}$ or $[1 - \bar{p}]$
Estimated standard deviation                 $s$
Estimated sample variance                    $s^2$
Estimated standard error (sample statistics) $S_{\bar{x}}$ or $S_{\bar{p}}$

Other sampling concepts                      Symbol
Confidence intervals                         $CI_{\bar{x}}$ or $CI_{\bar{p}}$
Tolerance level of error                     $e$
Critical z-value                             $Z_B$
Confidence levels                            $CL$
Finite correction factor                     $fcf = \sqrt{(N - n)/(N - 1)}$
(also referred to as finite multiplier or finite population correction)

Central Limit Theorem
The theorem states that for almost all defined target populations (virtually regardless of the actual shape of the original population), the sampling distribution of the mean ($\bar{x}$) or the percentage ($\bar{p}$) value derived from a simple random sample will be approximately normally distributed, provided that the sample size is sufficiently large (as a rule of thumb, when $n \geq 30$).
In turn, the sample mean value ($\bar{x}$) of that random sample, with an estimated sampling error ($S_{\bar{x}}$), fluctuates around the true population mean value ($\mu$) with a standard error of $\sigma / \sqrt{n}$ and has an approximately normal sampling distribution, regardless of the shape of the probability frequency distribution curve of the overall target population.
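A small simulation can make the theorem concrete. This is only an illustrative sketch: the skewed (exponential) population with mean 1 and standard deviation 1, the 2000 repeated samples and the fixed seed are all hypothetical choices.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size, matching the n >= 30 rule of thumb
num_samples = 2000       # number of repeated samples (hypothetical choice)

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Even though the population itself is far from normal, the sample means cluster around the population mean, and their spread is close to $\sigma / \sqrt{n}$.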

Normal Curve

Sampling Error Sampling error is any type of bias that is attributable to mistakes made in either the selection process of prospective sampling units or determining the sample size

Statistical Precision
- Using several statistical methods, the researcher will be able to specify the critical tolerance level of error (i.e., allowable margin of error) prior to undertaking a research study
- This critical tolerance level of error ($e$) represents general precision ($S$) when no specific confidence level is required, or precise precision [$(S)(Z_{B,CL})$] when a specific level of confidence is required

Statistical Precision General precision can be viewed as the amount of general sampling error associated with the given sample of raw data that was generated through some type of data collection activity. Precise precision represents the amount of measured sampling error associated with the raw data at a specified level of confidence

Statistical Precision When attempting to measure the precision of raw data, researchers must incorporate the theoretical understanding of the concepts of sampling distributions, the central limit theorem, and estimated standard error in order to calculate the necessary confidence intervals.

Estimated Standard Error
Estimated standard error, also referred to as general precision, gives the researcher a measurement of the sampling error and an indication of how far the sample result lies from the actual target population parameter value.
The formula to compute the estimated standard error of a sample mean value ($S_{\bar{x}}$) is
$S_{\bar{x}} = s / \sqrt{n}$
where
$s$ = estimated standard deviation of the sample
$n$ = sample size
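As a minimal sketch, the estimated standard error of a sample mean can be computed like this. The eight data values are hypothetical, and the function name `estimated_standard_error` is an assumption made for illustration.

```python
import math
import statistics

def estimated_standard_error(values):
    """Estimated standard error of the mean: sample std. deviation / sqrt(n)."""
    s = statistics.stdev(values)        # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(values))

# Hypothetical responses from 8 respondents
responses = [2, 4, 4, 4, 5, 5, 7, 9]
se = estimated_standard_error(responses)
print(f"estimated standard error = {se:.4f}")
```

Because the sample size sits under a square root, quadrupling the sample size only halves the estimated standard error.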

Confidence Interval A confidence interval represents a statistical range of values within which the true value of the target population parameter is expected to lie
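A confidence interval for a mean is commonly built as the sample mean plus or minus a critical z-value times the estimated standard error. The sketch below assumes that form and a 95% confidence level (z = 1.96); the sample figures and the function name `confidence_interval` are hypothetical.

```python
import math

def confidence_interval(mean, s, n, z=1.96):
    """Return (lower, upper) bounds: mean +/- z * (s / sqrt(n))."""
    se = s / math.sqrt(n)     # estimated standard error of the mean
    half_width = z * se       # margin of error at the chosen confidence level
    return mean - half_width, mean + half_width

# Hypothetical sample: mean 50, standard deviation 10, 100 respondents
low, high = confidence_interval(mean=50.0, s=10.0, n=100)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

A higher confidence level uses a larger z-value and therefore gives a wider interval for the same data.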

Z-Score

Determining Sample Size
Three factors play an important role in determining appropriate sample sizes:
1. The variability of the population characteristic under investigation ($\sigma$ or $P$). The greater the variability of the characteristic, the larger the size of the sample necessary.
2. The level of confidence desired in the estimate ($CL$). The higher the level of confidence desired, the larger the sample size needed.
3. The degree of precision desired in estimating the population characteristic ($e$). The more precise the required sample results (i.e., the smaller the $e$), the larger the necessary sample size.
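The three factors can be seen at work in the standard sample size formulas. The slide text does not state a formula, so the sketch below assumes the usual forms $n = Z^2 \sigma^2 / e^2$ for a mean and $n = Z^2 P Q / e^2$ for a proportion; the function names and input values are hypothetical.

```python
import math

def sample_size_for_mean(sigma, e, z=1.96):
    """Assumed formula n = z^2 * sigma^2 / e^2, rounded up to a whole respondent."""
    return math.ceil((z ** 2) * (sigma ** 2) / (e ** 2))

def sample_size_for_proportion(p, e, z=1.96):
    """Assumed formula n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil((z ** 2) * p * (1 - p) / (e ** 2))

# Hypothetical inputs at a 95% confidence level (z = 1.96)
n_mean = sample_size_for_mean(sigma=10, e=2)
n_prop = sample_size_for_proportion(p=0.5, e=0.05)
print(n_mean, n_prop)
```

Raising the variability or the confidence level, or tightening the tolerance level of error, each increases the required sample size, in line with the three factors above.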

Determining the Sample Size

Q & As