The Advent of Internet Surveys for Political Research:
A Comparison of Telephone and Internet Samples
by
Robert P. Berrens and Alok K. Bohara
Department of Economics
University of New Mexico
Hank Jenkins-Smith and Carol Silva
Institute for Public Policy and Department of Political Science
University of New Mexico
David L. Weimer
Department of Political Science and La Follette School of Public Affairs
University of Wisconsin-Madison
May 2001
Correspondence: Dave Weimer
La Follette School of Public Affairs
University of Wisconsin-Madison
1225 Observatory Drive
Madison, WI 53706
[email protected]
* The authors thank the National Science Foundation (NSF Grant Number 9818108) for financial support for the project reported on in this paper. The authors also thank Harris Interactive and Knowledge Networks for their contributions of survey samples. John Bremer, Hui Li, and Zachary Talarek provided valuable assistance at various stages of the project. We also thank Charles Franklin, Ken Goldstein, Dana Mukamel, William Howell, Aidan Vining, and John Witte, as well as participants in the Public Affairs Seminar and the Methodology Workshop at the University of Wisconsin-Madison for helpful comments. Of course, the opinions expressed are solely those of the authors.
The Advent of Internet Surveys for Political Research:A Comparison of Telephone and Internet Samples
Abstract
The authors present the results of parallel telephone and Internet surveys to investigate their comparability. The telephone survey was administered to a national probability sample based on random digit dialing. The contemporaneous Internet survey was administered to a random sample of the database of willing respondents assembled by Harris Interactive. The survey was replicated by Harris Interactive six months later, and by Knowledge Networks, which employs a randomly recruited panel, nine months later. The data facilitate comparisons in terms of demographic characteristics, environmental knowledge, and political opinions across survey modes. Knowledge and opinion questions generally show statistically significant but substantively modest differences across modes. With inclusion of standard demographic controls, typical relational models of interest to political scientists produce similar estimates of parameters across modes. The use of commercial Internet samples may thus already be reasonable for many types of social science research.
INTRODUCTION
The data available to social scientists describing the attitudes, beliefs, and even behaviors of
individuals come mainly from surveys.1 These data often provide the most direct, and sometimes the
only, basis for the description of population characteristics or the testing of hypotheses derived from
theories. Consequently, the availability and quality of survey data have fundamental relevance to social
science research. The research presented here assesses the characteristics of samples from two prominent
commercial Internet panels by comparing them to a national probability sample of respondents to a
telephone survey on knowledge of and attitudes toward global climate change and a related international
treaty (Kyoto Protocol).
Survey design involves tradeoffs among validity, representativeness, and cost. During the 1970s
a number of factors changed the nature of design tradeoffs so that telephone surveys replaced in-person
interviews as the dominant mode of survey administration. Rising fuel and labor prices made in-person
interviews more expensive, and increased labor market participation by women made it more difficult to
complete interviews with sampled households. At the same time, telephone technology improved, the
percentage of households with telephones surpassed 90 percent, and the introduction of sampling through
random digit dialing (RDD) provided a way of reaching unlisted telephone numbers and more easily
drawing national probability samples. The relative advantages of telephone administration (lower cost,
less risk of interviewer bias, avoidance of cluster sampling, and greater ease of supervising interviewers)
had to be balanced against the relative advantages of in-person interviews (potentially greater coverage of
households, greater feasibility of long or complex survey instruments, and provision of non-verbal
informational aids to respondents). The acceptance of telephone administration by academic researchers
lagged somewhat behind its use by survey researchers – as late as the mid-1970s, many survey research
texts ignored telephone administration (Klecka and Tuchfarber, 1978). Though a number of very
prominent on-going social science surveys, such as the National Election Studies (NES) and the Survey
of Income and Program Participation (SIPP) continue to be administered through in-person interviews,
the majority of national surveys conducted for research purposes are now administered by telephone,
taking advantage of list-assisted RDD to sample and computer assisted telephone interview (CATI)
systems to collect data and monitor the quality of interviews.
Social and technological trends appear to be making it more costly to conduct valid and
representative telephone surveys. At the same time, the Internet has emerged as an important
communications technology. In terms of survey administration, it offers several advantages relative to the
telephone: dramatically lower marginal costs of producing completed surveys, superior capability for
providing information, including visual displays, to respondents and for asking complex questions, and
the minimization of interviewer bias. Its primary weakness involves the nature of the samples that it can
currently provide. One problem, which current trends are making much less important, is the incomplete
penetration of Internet use among U.S. adults. The other, more serious, problem is the difficulty of
drawing representative samples from among Internet users. The current absence of a feasible analog to
RDD, and norms and legal prohibitions against message broadcasting (spamming), prevent random
sampling of the universe of Internet users.
The potential uses of the Internet fall into three broad categories demanding different levels of
sample representativeness. First, Internet surveys might be used to estimate population characteristics
such as means and proportions. As classically formulated, the reliable inference of population
characteristics requires true probability samples, suggesting that as currently organized, Internet surveys
are ill suited to serve this function unless supplemented in some way with data from non-Internet sources.
Second, and potentially of most interest to social scientists, Internet surveys might be used to
investigate relationships among variables. In this context, true probability samples may not be necessary
to make valid inferences about relationships, especially when the variables are based on “treatments” that
are randomly applied to respondents. Indeed, witness the extensive use of convenience samples, such as
students in psychology courses, to test hypotheses implied by social science theories. Much econometric
analysis deals with estimating models based on data not generated through probability samples.
Additionally, studies in the rapidly growing area of experimental economics rarely employ samples
randomly drawn from the general population.
Third, Internet surveys might be used to investigate methodological issues in survey design that
can be reasonably treated as independent of mode. The low marginal cost of completed surveys
facilitates the comparison of such design issues as question order and format. The inferences about
design issues are unlikely to be highly sensitive to the characteristics of the sample. Consequently,
Internet surveys may prove useful both in investigating general methodological issues and as components
of pre-tests for surveys to be administered by other modes.
Clearly, survey researchers have much to gain if the hurdles facing Internet surveying can be
overcome.
In an attempt to solve the problem of randomly sampling Internet users, several commercial firms
have developed proprietary databases of willing respondents, typically recruited at the time people select
Internet providers. The largest such database has been developed by Harris Interactive. In January 2000,
the authors administered a survey on knowledge and attitudes related to global climate change and U.S.
ratification of the Kyoto Protocol to a national RDD sample of U.S. adults through telephone interviews
and to a sample of the Harris Interactive panel of willing respondents through web-based questionnaires.
A second Internet sample using the same instrument was collected in July 2000 from the Harris
Interactive panel. In November 2000, the instrument was administered by Knowledge Networks, which
uses Web TV technology to survey panels of respondents originally recruited through RDD. The
knowledge and attitude data collected in parallel by telephone and Internet provide a unique opportunity
for a more general assessment of the uses of the Internet for administration of social science surveys.
The comparisons of the samples address several different questions. Because the Knowledge
Networks sample is based on standard sampling theory, any differences between it and the telephone
sample can be interpreted as likely resulting from either the technology of survey administration or
conditioning of those in the panel. In contrast, the samples from the Harris Interactive panel are not
consistent with standard sampling theory (that is, they are not probability samples). Similarities between
the first Harris sample and the telephone sample, therefore, must be interpreted with caution as there is no
theoretical basis to believe that these similarities would be found in the administration of surveys asking
different sorts of questions. The second Harris Interactive sample, however, employs weights based on
information from an RDD telephone survey to correct for sample selection bias. Although this approach
cannot provide the robust protection against sampling bias provided by true probability samples, it does
provide a theoretical basis for believing that similarities between the telephone and second Harris
Interactive samples are likely to generalize to similar sorts of surveys.
Our objective is to provide insight into potential uses of surveys of Internet panels in social
science research. We begin by documenting two trends that are likely to make Internet surveys relatively
more attractive in the future: the increasing difficulty of doing valid telephone surveys and the increasing
representativeness of the population of Internet users. After describing the structure and purpose of the
survey on global climate change, we make several comparisons between survey modes. First, we
compare the socioeconomic characteristics of respondents. Second, as concern about sampling bias is
based on possible differences in knowledge, attitudes, and behaviors not directly observable in the
population being sampled, we compare the samples in terms of knowledge about global climate change,
degree of engagement in the survey as measured by the use and assessment of information offered to
splits of the Internet samples, and political attitudes. Third, as the focus of much social science research
is the testing of hypotheses about relationships among variables, we investigate the relationship between
political ideology and environmental attitudes, and support for ratification of the Kyoto Protocol as a
function of household costs. We conclude with some observations about the likely current and future
uses of Internet surveys.
INCREASING DIFFICULTY OF ADMINISTERING TELEPHONE SURVEYS
Three factors suggest that telephone surveys will become more difficult to administer in the
future: the gradual but long-term trend of increasing nonresponse rates in both in-person and telephone
surveys; technological changes in telecommunications; and public responses to surveys and pseudo-
surveys.
General Trends in Nonresponse
Unit nonresponse, or nonparticipation, refers to the failure to obtain a survey from a sampled
respondent. It consists of refusals to participate by sampled persons as well as sampled persons who are
not interviewed for reasons other than explicit refusal to be interviewed. Unit nonresponse appears to
have been increasing in the United States and Europe over the last two decades (de Leeuw, 1999: 127),
though documenting the trend is difficult for reasons of definition, comparability, and endogeneity of
effort.
Until fairly recently, most survey research organizations developed their own response
classifications, making it difficult to compare nonresponse rates across different surveys or over time.
Widespread adoption of the standardized definitions developed by the American Association for Public
Opinion Research (AAPOR, 1998) should increase comparability in the future. Nevertheless, some
discretion will remain in the classification of cases into the various response categories.
Surveys generally differ in terms of subject matter, format, respondent incentives, and sponsors,
all factors that are likely to affect response rates. Political and economic conditions prevalent at the time
may also affect response rates (Harris-Kojetin and Tucker, 1999). Consequently, inferences about trends
in nonresponse rates must generally be based on a limited number of on-going surveys, mainly
participate in a replication of the survey, yielding a sample size of 11,160 collected between July 10 and
17. The response rate, based on invitations sent and completed surveys, was 5.5 percent. This sample
was propensity-weighted based on attitudinal and behavioral questions concurrently being asked in HI
RDD telephone surveys.8
Knowledge Networks Sample (November 2000)
From November 25, 2000 to December 11, 2000, KN administered the survey to a random
sample of its panel based on previously estimated probability weights to correct for nonresponses in the
selection stages in the panel. Only one respondent was selected per household. Of those sampled, 76
percent completed surveys, yielding a sample size of 2,162 and a multi-stage response rate of 24.1 percent
(20.2 percent taking account of Web TV non-coverage). For this analysis, raking weights based on the Current
Population Survey were estimated for the sample with respect to age, gender, race, ethnicity, region,
education, and metropolitan versus non-metropolitan household location to correct further for
nonresponse bias. These weights also convert the data from a household to a population sample.
SURVEY MODE COMPARISONS
In the following sections we present a number of comparisons across the survey modes. In
general, the telephone sample is taken as the basis of comparison. Two important caveats are worth
noting. First, the telephone sample should not be viewed as a perfect sample. It certainly has all the
flaws common to RDD telephone samples.9 Consequently, the comparisons should be viewed as
answering the question: How do the Internet samples compare to a high-quality telephone sample of the
sort commonly used in social science research? Second, although all four surveys were collected within a
span of 11 months, only the first HI sample is contemporaneous with the telephone sample.
Consequently, underlying changes in the population cannot be ruled out as explanations for differences
between these two samples and the two collected subsequently.
Socioeconomic Characteristics
Demographic comparisons across the modes are presented in Table 1. The first two rows show
mean age and percent male. The weighted data (shown in bold) produce fairly close figures for all four
samples. The next row, percent of respondents with at least a college degree, shows considerable
difference between the telephone and Internet samples. As is often the case in telephone surveys, the
telephone sample overestimates the percentage of adults with a college degree – 41.4 percent as opposed
to 23.2 percent estimated in the March 2000 Current Population Survey. The percentages for the Internet
samples are very close to the Census Bureau estimate. Interestingly, while the HI unweighted sample
percentages for college degree are also gross overestimates, the KN sample percentage is very close,
reflecting to some degree the use of probability weights in sampling from its panel.
The three Internet samples slightly underestimate the population percentages of Hispanics and
African-Americans (both percentages appear close to 12.9 percent in the 2000 Census), while the
telephone sample substantially underestimates these percentages. One striking, but not unexpected
difference is in the percentage of households with a computer – the HI sample percentages are much
larger than those in the telephone and KN sample. Looking just at those in the telephone sample who use
the Internet at least weekly, the percentage with home computers is much closer to the HI samples.
Some caution is needed in interpreting the income figures as this variable, unlike the others
shown, had substantial item nonresponse.10 The mean household income was largest for the telephone
sample, and smallest for the first HI sample. As would be expected, the HI households, with universal
Internet use and high rates of home computer ownership, have substantially larger mean numbers of
telephone lines than do either the telephone or KN samples.
Overall, the weighted Internet samples do quite well in terms of matching the population in terms
of basic demographic information, though they show the expected differences in terms of computer and
telephone ownership. As is commonly the case, the telephone sample appears to substantially
overestimate the percentage of the population with college degrees and to underestimate the African-
American and Hispanic percentages in the population.
Environmental Knowledge
Socioeconomic differences among the samples do not necessarily impose a fundamental problem
in that statistical adjustments can be made in analyses to take account of the observable differences. At
the same time, even if the samples were identical in terms of socioeconomic characteristics, they could
still produce different inferences about relationships among variables in the population because they
differ in terms of unobservable characteristics. Although it is never possible to know which unobservable
characteristics are relevant to any particular analysis, it is interesting to explore differences in knowledge,
motivations, and attitudes across the samples where possible. To the extent that the samples appear
similar in terms of the knowledge and attitudes that we can measure, it gives us at least some confidence
in their external validity.
Survey questions intended to elicit respondents’ knowledge about scientific views on the likely
causes and consequences of global climate change provide a basis for comparison. Table 2 compares the
percentage of sample respondents with correct answers to ten environmental knowledge questions,
recognition of the Kyoto Protocol, and an overall knowledge score constructed as the sum of correct
answers and recognition of the Kyoto Protocol. When “don’t know” responses are treated as incorrect
answers (leftmost column under each mode), the KN sample percentages appear substantially and
systematically smaller than those for the telephone or HI samples. When “don’t know” is treated as a
missing value (the rightmost columns under each mode), the KN sample percentages are no longer
systematically smaller than those for the other modes.11 Figure 1 displays the correspondence between
the percentages of the Internet samples correctly answering each knowledge question and the percentage
of telephone respondents answering the question correctly. The many pluses lying below the
line, which represents equality, are the KN percentages when “don’t know” is taken as an incorrect response.
In order to investigate statistical significance, individual-level probit models for each of the
eleven knowledge questions in Table 2 were estimated: the dependent variable was whether or not the
respondent correctly answered the question (1 if yes, 0 if no), and the independent variables were
indicator variables for the three Internet samples.12 (The 11-point knowledge score, listed in the last row
of Table 2, was modeled as an ordered probit.) For any given knowledge question, asterisks indicate
those that are statistically significant at the 5 percent level. The large sample sizes for these estimations
mean that they have substantial power for detecting statistically significant differences. Inclusion of
demographic variables in the estimations generally did not wash out the mode effects.
Table 3 investigates the pattern of correct responses over the eleven knowledge questions. The
six possible Wilcoxon matched-pairs signed-rank tests are shown for the percentages of correct responses
with the different handling of the “don’t know” response. When “don’t know” is treated as an incorrect
answer, the patterns of responses do not statistically differ between the telephone and HI samples at the 5
percent level. Substantively, they show relatively small average percentage differences. The KN sample
differs statistically from all three of the others and shows large average percentage differences. The
picture changes substantially when “don’t know” is treated as a missing value. The KN distribution is no
longer statistically different from the telephone sample, but is statistically different from the second HI
sample. Additionally, although the percentage difference remains small, the distribution of the telephone
sample is statistically different from the distribution of the first HI sample.
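The matched-pairs comparison underlying Table 3 can be reproduced with standard tools. The sketch below uses illustrative percent-correct figures (not the paper's actual values), paired over the eleven knowledge items, for a telephone sample and a mode with uniformly lower percentages.

```python
from scipy.stats import wilcoxon

# Illustrative percent-correct figures for two modes, paired over the
# eleven knowledge items (hypothetical numbers, not the paper's data).
phone = [62, 55, 71, 48, 66, 59, 44, 73, 51, 38, 29]
other = [55, 49, 63, 40, 60, 51, 37, 66, 45, 31, 22]

# Wilcoxon matched-pairs signed-rank test on the paired percentages
stat, p = wilcoxon(phone, other)
```

Because every paired difference here has the same sign, the test rejects equality of the two response patterns; when the differences are small and mixed in sign, as for the telephone and HI comparisons with "don't know" treated as incorrect, it does not.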
Overall, there appear to be statistically significant differences in environmental knowledge among
the survey modes, but these differences generally appear to be substantively small. The higher rates of
“don’t know” in the Knowledge Networks sample could possibly be an indication of panel conditioning –
either fatigue or changing norms of response (i.e. greater willingness to admit a lack of knowledge
associated with greater exposure to surveys).13
Information Use in Internet Modes
The Internet respondents’ access to enhanced information provides an opportunity for comparing
survey motivation among the HI and KN samples. The first row of Table 4 shows the percentage of those
who viewed one or more pages of information. The use rates were relatively close (ranging from 72.7
percent for the first HI sample to 66.2 percent for the KN sample), indicating similar initial motivations
across the samples. The HI samples showed more intensity of use in terms of pages visited than did the
KN sample, but all three samples showed similar use times for those who visited at least one page.
Perceptions of the usefulness of the information and its perceived bias varied much less across the
samples.
Do the distributions of responses to the usefulness and bias questions show similar patterns across
the Internet samples? Figures 2 and 3 display response frequencies for these two evaluative questions.
The three samples show roughly similar patterns. Overall, information users in the Internet samples
appear to have perceived the information they accessed in roughly the same way.
Political Variables and Environmental Attitudes
Of particular interest to political scientists is the comparability of the samples with respect to
political attitudes and behavior. Table 5 compares the samples in terms of a number of politically
relevant variables. A number of differences appear. The KN sample has a lower rate of voter registration
than the other samples. It also seems to have a substantially lower rate of membership in environmental
groups than the other samples. All three of the Internet samples seem to be more liberal and have higher
fractions of identification with the Democratic Party than the telephone sample. The first HI sample has a
noticeably lower percentage of Republican party identifiers and a higher percentage of third party
identifiers.
Relationships between Environmental Views and Ideology
While making estimates of population parameters is often important in social science research,
much empirical work is directed at testing hypotheses about the relationships among variables of interest.
Only when analyses are based on probability samples can we be highly confident about their
generalization to the larger population. As the representativeness of at least the large panel Internet
samples is questionable, it is interesting to ask how inferences might differ across modes. In this spirit
we investigate the following general hypothesis: political ideology affects environmental attitudes.
Specifically, we investigate the relationship between ideology and the three general environmental
attitudes: (1) perceptions of environmental threat, (2) tradeoffs between property rights and the environment, and (3) reliance
on international treaties to deal with environmental problems.
Table 6 shows the effect of ideology on perception of environmental threat (11 point scale) as
estimated in three ordered probit specifications.14 In the first specification, ideology and its interaction
with each of the three Internet modes are the explanatory variables (with the telephone survey mode as
the base category). There are large negative and statistically significant coefficients for ideology under
all four of the survey modes. The small and statistically insignificant coefficient for the ideology-KN
interaction indicates that we would reach the same conclusion using either sample. The interaction terms
for the HI samples show statistically significant impacts of ideology that are about 50 percent larger than
in the other two samples.
As shown in the second column, however, the introduction of a set of standard covariates reduces
the size of the coefficients on the interaction terms for the HI samples, and washes out their statistical
significance. As there was substantial item nonresponse for income, the third column shows the model
estimated with all the demographic covariates except income. The ideology interactions for HI do not
lose statistical significance, but they are statistically indistinguishable from the ideology interaction for
the KN sample. Nevertheless, across all modes we find a large negative statistically significant
relationship between ideology and perception of environmental threat.
Table 7 repeats the analysis with perceptions of the validity of tradeoffs between property rights
and the environment as the dependent variable. In the absence of controls, the effect of ideology on the
perception of tradeoffs is statistically indistinguishable between the telephone sample and the first HI
sample, as well as between the telephone sample and the KN sample. With the introduction of the
demographic controls, the relationship also becomes statistically indistinguishable between the telephone
sample and the second HI sample. Removing income from among the demographic controls leaves a
statistically significant difference between the ideology effects for the telephone and second HI sample.
Table 8 tells virtually the same story as Table 7 for the perception of international environmental
treaties. There is no mode interaction for the first HI sample or the KN sample, and the mode interaction
for the second HI sample washes out statistically with a full set of demographic controls including
income.
To summarize, at least in these applications, researchers would not make different statistical
inferences using either the telephone or the KN samples. Further, if one included income and other
demographic controls in the estimation models, one would not make different statistical inferences using
the telephone, or either of the HI samples.
Referendum Voting Models
As a final comparison we investigate mode effects in the basic referendum voting model that
underlies CV analysis.15 We exclude respondents in the Internet studies who were either given access to
enhanced information or were asked to value the modified Kyoto protocol because these treatments did
not occur in the telephone sample. The mental accounts treatment, which asked respondents to estimate
the percentage of their monthly income that was available for discretionary spending and how much of
that discretionary income goes toward environmental causes and organizations, was included in all four
samples.16
The “elicitation method” for obtaining information about valuation from respondents employed in
this study was the advisory referendum format.17 After going through a series of questions that were used
as vehicles to explain the provisions and likely consequences of ratification of the Kyoto Protocol,
respondents were asked the following question:
The US Senate has not yet voted on whether to ratify the Kyoto Protocol. If the US does not ratify the treaty, it is very unlikely that the Protocol can be successfully implemented. Suppose that a national vote or referendum were held today in which US residents could vote to advise their Senators whether to support or oppose ratifying the Kyoto Protocol. If US compliance with the treaty would cost your household X dollars per year in increased energy and gasoline prices, would you vote for or against having your Senators support ratification of the Kyoto Protocol? Keep in mind that the X dollars spent on increased energy and gasoline prices could not be spent on other things, such as other household expenses, charities, groceries, or car payments.
In this case, we consider the simplest possible model: a logistic regression with the response to the vote
question as the dependent variable (yes=1, no=0) and the bid price, income, an indicator for the mental
accounts treatment, an interaction between the mental accounts indicator and bid price (X), and, in some
models, basic demographic controls, as the explanatory variables. If the focus of the analysis were
actually on the estimation of willingness-to-pay, then many additional variables would be included and
estimation would involve more complicated models that would blur our focus here on comparison across
modes. Nevertheless, this simple model, which is representative of the type typically estimated in CV
studies as an initial check to see whether the data meet minimal construct validity requirements (most
importantly declining probability of voting yes as the bid price increases), allows us to focus clearly on
mode effects.
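The specification just described can be sketched as follows. This is an illustrative simulation, not the study's data: the variable names (bid, income, mental) and all coefficient values are our own assumptions, and the estimator is a generic Newton-Raphson logit rather than the survey-weighted estimation the paper used.

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Fit a logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))  # fitted probabilities
        grad = X.T @ (y - p)                        # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 2000
bid = rng.choice([25.0, 100.0, 350.0, 700.0, 1400.0], size=n)  # hypothetical bid set
income = rng.lognormal(3.9, 0.6, n)           # household income in $1,000s (simulated)
mental = rng.integers(0, 2, n).astype(float)  # mental accounts treatment indicator

# Assumed data-generating process: yes-votes fall with bid price, rise with
# income; the treatment shifts the intercept down and flattens the bid slope.
true_xb = 0.5 - 0.002 * bid + 0.006 * income - 0.4 * mental + 0.0003 * mental * bid
vote = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_xb)))

X = np.column_stack([np.ones(n), bid, income, mental, mental * bid])
beta = fit_logit(X, vote)
# beta[1] (the bid price coefficient) should come out negative --
# the construct validity check described in the text.
```

In a real application the demographic controls would be appended as additional columns of X, and the standard errors would be adjusted for the survey design.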
The first column of Table 9 shows the basic model without demographic controls. It shows
patterns of coefficients similar to those in the models for the individual modes shown in the last four
columns. Bid price and income have the expected signs and statistical significance; the statistically
significant coefficients for the mental accounts indicator and its interaction with bid price show that the
mental accounts treatment reduces the probability of voting yes for bid amounts up to about $1,375, which
is near the upper extreme of the bid range. Thus, it appears that asking respondents to answer questions
about their discretionary income (and perhaps focusing their attention on their budget constraints)
generally lowers their probability of voting yes on the referendum. The negative and statistically
significant coefficients for the Internet samples indicate that, other things equal, Internet respondents are
less likely to vote yes. The first HI sample shows a relatively small effect whose statistical significance
washes out with the addition of demographic controls (column 2). The coefficients for the second HI and
the KN samples are roughly the same size and remain statistically significant with the addition of the
demographic controls.
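The crossover bid comes from the point at which the treatment's two coefficients cancel: the mental accounts indicator shifts the latent index by a fixed amount while the interaction adds a per-dollar increment, so the net effect is zero where bid = -(intercept shift)/(interaction slope). A sketch with hypothetical coefficients chosen only to reproduce a crossover near the value reported in the text (the actual Table 9 estimates are not restated here):

```python
# Hypothetical coefficients, chosen only to illustrate the calculation.
beta_mental = -0.55        # mental accounts intercept shift (assumed)
beta_interaction = 0.0004  # mental accounts x bid price slope (assumed)

# Treatment effect on the latent index at bid price b:
#   effect(b) = beta_mental + beta_interaction * b
# which equals zero at:
crossover = -beta_mental / beta_interaction
print(crossover)  # 1375.0: below this bid the treatment lowers P(yes)
```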
When the model in the first column of Table 9 is saturated with mode interactions for bid price,
income, the mental accounts indicator, and the mental accounts-bid price interaction (not shown), the only
statistically significant mode effect is the constant shift for the KN sample, which cannot be statistically
distinguished from the shift effect for the second HI sample. None of the adjusted Wald tests rejects the
hypothesis that the interaction triplets are simultaneously zero. Consequently, with the exception of a
generally lower acceptance rate for the KN sample, there appear to be no consistent mode effects in
the referendum model. Further, across all four samples, the analyst would draw the same inference from
the validity test: the probability of voting yes on the referendum is significantly and
inversely related to the respondent’s price for the policy.
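A joint test of a "triplet" of this kind is a standard Wald test on a subset of coefficients: W = b'V⁻¹b for the restricted subvector b and its covariance block V, compared against a chi-square with as many degrees of freedom as restrictions (survey packages such as Stata report a design-adjusted, F-based version). A generic sketch with entirely hypothetical estimates:

```python
import numpy as np

# Hypothetical coefficient vector and (diagonal, for simplicity) covariance
# matrix from some fitted model; positions 3-5 stand in for the three
# mode interactions with bid price being tested jointly.
beta = np.array([0.90, -0.0021, 0.05, -0.12, 0.03, -0.08])
vcov = np.diag([0.010, 1e-8, 0.001, 0.004, 0.002, 0.003])

idx = [3, 4, 5]                      # the "triplet" of interaction terms
b = beta[idx]
V = vcov[np.ix_(idx, idx)]
W = b @ np.linalg.solve(V, b)        # Wald statistic

CHI2_CRIT_3DF_05 = 7.815             # 5 percent critical value, chi-square(3)
reject = W > CHI2_CRIT_3DF_05
# Here W is about 6.18 < 7.815, so the joint null of zero interactions
# is not rejected -- the pattern the paragraph above describes.
```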
CONCLUSION
All survey methods involve errors. The appropriate question, therefore, is not “Can the Internet
replace the telephone as the primary mode of administration in social science survey research?” Rather, it
is “Under what circumstances is the use of Internet surveys appropriate?” We have explored this question by
making a variety of inferential comparisons among a standard RDD sample, samples from the leading
firm in the development of a large panel of willing Internet users (Harris Interactive), and a sample from
the leading firm in the development of a random panel of Web TV-enabled respondents (Knowledge Networks).
Although many differences arose, across a variety of tests on attitudes and voting intentions the Internet
samples produced relational inferences quite similar to those from the telephone sample. Readers will have
to judge for themselves whether the similarity we found gives them sufficient confidence to use Internet
samples for their particular research questions.
At the same time, Internet surveys based on either large panels or random panels offer
possibilities for some types of research that were previously prohibitively expensive. One of these
possibilities is the generation of large sample sizes to permit the investigation of methodological
questions within the context of the same survey – the large HI sample sizes that allowed us to use a three-
treatment design make this point clear. A second possibility is the opportunity to provide much more
information to respondents than is feasible in any other survey mode. Both Internet firms were able to
support our enhanced information treatment, and KN was also able to track visits to and time spent on
particular information pages. A third possibility, not explored in this study, is the capability to generate
samples of the population with rare characteristics. Finally, the extension of the HI panel to include
willing respondents from other countries opens up intriguing possibilities for comparative analysis.
We expect that the dialogue and debate over the use of Internet samples will continue and that, with
time, the weight of evidence will allow firmer judgments to be made. Political and other social scientists
who rely on survey data for their research should follow these developments closely. We hope that the
analysis presented here provides a catalyst for future inquiry.
NOTES
1. Between January 1990 and April 2001, for example, 21 percent, 35 percent, and 33 percent of the articles in the American Political Science Review, the American Journal of Political Science, and the Journal of Politics, respectively, were based on survey data.
2. Although we do not have information on the marginal costs of sampling from the Harris Interactive or Knowledge Networks panels, we can provide the following comparison of commercial rates for an 18-minute survey: Knowledge Networks ($60 thousand for 2,000 completions); Harris Interactive ($35 thousand for 2,000 completions; $72 thousand for 6,000 completions). By way of comparison, our telephone survey with about 1,700 completions cost approximately $50 thousand. The first Harris Interactive sample actually cost the project $40 thousand; as noted in the text, the second Harris Interactive and the Knowledge Networks samples were provided free of charge, suggesting relatively low marginal costs.
3. CentERdata, an affiliate of Tilburg University, The Netherlands, has maintained a panel of Internet respondents since 1991. Its panel consists of 2,000 Dutch households, each of which completes a weekly survey (centerdata.kub.nl).
4. For overviews of CV, see Mitchell and Carson (1989), Bishop and Heberlein (1990), Bateman and Willis (2000), and Boardman et al. (2001). Critical views are thoroughly reviewed in Hausman (1993).
5. Development of the survey instrument began in the summer of 1998 as part of the preparation of a grant application to the National Science Foundation. On short notice, HI generated an Internet sample (N=869) to provide comparisons with questions on global climate change that had appeared in a national telephone survey focusing on global climate change conducted by the Institute for Public Policy at the University of New Mexico in November and December 1997. After receipt of the grant, a focus group was held at the Institute for Public Policy to help determine question format and content. A “beta” version web survey instrument was constructed by the authors to help in the process of designing a survey instrument that could be administered by both telephone and Internet. The beta version included the 27 pages of information on global climate change and the Kyoto Protocol developed collaboratively by the authors and reviewed by students and others with varying degrees of knowledge about global climate change. A CATI version of the survey was prepared and provided to HI (and subsequently to KN). HI prepared and pre-tested its survey instrument in December 1999. Implementation of the telephone survey began prior to administration of the Internet version to allow for adjustment of the random bid prices.
6. Visitors to the web site are randomly assigned to treatments. Those wishing to see specific treatments, such as the enhanced information pages, may thus have to visit the site several times.
7. The formula used for the response rate is completes plus partials divided by completes plus partials plus “break-offs” plus unfinished appointments plus refusals plus those not interviewed due to a language barrier plus those too ill to be surveyed.
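Note 7's formula, restated with made-up disposition counts (these are not the study's actual dispositions):

```python
# Hypothetical final dispositions for an RDD survey, for illustration only.
counts = {
    "completes": 1699, "partials": 51, "break_offs": 120,
    "unfinished_appointments": 90, "refusals": 1300,
    "language_barrier": 40, "too_ill": 25,
}

# Note 7: (completes + partials) over the sum of all listed dispositions.
numerator = counts["completes"] + counts["partials"]
denominator = sum(counts.values())
response_rate = numerator / denominator
print(round(response_rate, 3))  # 0.526 for these invented counts
```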
8. HI uses several different question sets for propensity weighting. In this study, in addition to three attitudinal questions about whether Washington was in touch with the rest of the country, personal efficacy, and information overload, respondents were asked if they owned a retirement account and whether they had read a book, traveled, or participated in a team or individual sport over the last month.
9. The Institute for Public Policy has been conducting RDD polls for over a decade. Its surveys have provided data for studies published in a variety of social science journals.
10. The item response rates for income were as follows: telephone, 84.9 percent; first HI, 79.8 percent; second HI, 82.1 percent; and KN, 70.9 percent.
11. Mondak (1999) argues against the common practice of treating “don’t knows” as incorrect answers in the construction of knowledge scales. His analysis suggests that treating “don’t know” as missing provides a more meaningful comparison.
12. All statistical estimations presented in this paper treat the modes as survey strata, each with its own set of probability weights. The estimations were done using the STATA Version 6 statistical software package.
13. Only the number of previous surveys completed (as opposed to the number requested) is available in the KN data set. The number of previous completions does not appear to have any statistically significant effect on the total number of “don’t knows” in the eleven-question set for males. There appears to be a weak quadratic relationship between “don’t knows” and previous completions for females, suggesting that “don’t knows” fall during the first 14 completions and rise thereafter. In the sample, the mean number of previous completions was 18.
14. The results in this section would be qualitatively the same if linear regressions rather than ordered probit models were estimated. The results would not hold if the analyses were done using unweighted data: demographic controls generally do not wash out the mode interactions when the data are not weighted.
15. Although not done here, estimates of mean willingness-to-pay can be derived from models with randomly assigned bid prices (see Cameron and James, 1987).
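For readers unfamiliar with the calculation note 15 alludes to, the key identity in the binary-response setting is that with P(yes) modeled as a logistic function of an intercept and the bid price, the median willingness-to-pay is minus the intercept over the bid coefficient (mean WTP requires further distributional assumptions). A sketch with hypothetical coefficients, not estimates from this study:

```python
# Hypothetical logit coefficients (assumed for illustration):
#   P(yes) = 1 / (1 + exp(-(alpha + beta_bid * bid)))
alpha = 0.9      # intercept, with other covariates held at their means
beta_bid = -0.002

# P(yes) = 0.5 exactly where alpha + beta_bid * bid = 0:
median_wtp = -alpha / beta_bid
print(median_wtp)  # about 450 (dollars per year, in this made-up example)
```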
16. The mental accounts treatment had two compartments. First-level compartment: “Now think about your average monthly income and expenses. After you have paid all the necessary bills for such things as housing, transportation, groceries, insurance, debt, and taxes, what percent of your income is left over for optional uses on things like recreation, savings, and giving for charity and other causes?” Second-level compartment: “Now think about the portion of your total income available for optional uses. On average, what percent of that amount do you use for contributions to environmental causes, such as donations for specific programs or contributions and memberships to environmental advocacy groups?”
17. In the case of a public good, such as a reduction in the emissions of greenhouse gases, only a question of this sort, which elicits a binary response to a specific price, can be incentive compatible with honest revelation, and then only if the respondent anticipates having to pay the stated price upon provision of the public good. See Carson et al. (1999).
Table 1
Comparison of Respondent Socioeconomic Characteristics Across Surveys

Cell entries are means with standard errors in parentheses. Columns:
(1) Public Policy Institute January telephone (N=1,699), household weighted (note 1)
(2) Telephone respondents who do not use the Internet at least weekly (N=726)
(3) Telephone respondents who use the Internet at least weekly (N=973)
(4) Telephone, raking weighted (note 2)
(5) Harris Interactive January Internet (N=13,034), raw
(6) Harris Interactive January Internet, raking weighted (note 2)
(7) Harris Interactive July Internet (N=11,160), raw
(8) Harris Interactive July Internet, propensity weighted (note 3)
(9) Knowledge Networks November Internet (N=2,162), raw
(10) Knowledge Networks November Internet, full sample, raking weighted (note 4)

Mean age in years: (1) 42.0 (.46); (2) 46.8 (.68); (3) 39.3 (.49); (4) 44.7 (.48); (5) 41.6 (.10); (6) 44.4 (.71); (7) 42.6 (.13); (8) 44.1 (.50); (9) 45.8 (.36); (10) 44.6 (.42)
Percent male: (1) 47.6 (1.4); (2) 42.5 (2.0); (3) 51.7 (1.8); (4) 47.9 (1.3); (5) 44.3 (.44); (6) 48.0 (1.4); (7) 56.7 (.47); (8) 48.0 (1.3); (9) 49.4 (1.1); (10) 48.0 (1.2)
Percent college graduate: (1) 41.4 (1.3); (2) 26.5 (1.8); (3) 53.4 (1.8); (4) 42.7 (1.3); (5) 43.7 (1.2); (6) 22.0 (.71); (7) 45.9 (.47); (8) 22.9 (.79); (9) 23.9 (.92); (10) 21.2 (.94)
Percent Hispanic: (1) 6.8 (.74); (2) 6.8 (1.1); (3) 6.9 (.99); (4) 10.0 (.97); (5) 3.1 (.15); (6) 9.4 (.96); (7) 2.9 (.16); (8) 9.7 (1.1); (9) 9.8 (.63); (10) 10.4 (.76)
Percent African-American (note 5): (1) 7.6 (.71); (2) 8.7 (1.1); (3) 6.7 (.89); (4) 12.9 (1.1); (5) 3.0 (.15); (6) 12.4 (1.3); (7) 2.7 (.15); (8) 11.5 (1.1); (9) 9.3 (.63); (10) 10.8 (.82)
Household mean income ($1,000s): (1) 56.2 (1.2); (2) 44.8 (1.6); (3) 65.8 (1.6); (4) 57.4 (1.4); (5) 51.3 (.34); (6) 45.1 (1.6); (7) 55.7 (.40); (8) 52.2 (1.2); (9) 49.4 (.84); (10) 46.3 (.85)
Percent with computer at home: (1) 64.1 (1.3); (2) 37.0 (2.0); (3) 86.4 (1.3); (4) 62.7 (1.3); (5) 93.5 (.22); (6) 93.0 (.67); (7) 95.3 (.20); (8) 95.9 (.43); (9) 60.9 (1.1); (10) 58.2 (1.3)
Percent with computer at work: (1) 67.2 (1.3); (2) 51.6 (2.0); (3) 80.3 (1.4); (4) 66.7 (1.2); (5) 66.0 (.41); (6) 54.6 (1.4); (7) 66.1 (.45); (8) 50.3 (1.3); (9) 48.4 (1.1); (10) 47.4 (1.2)
Mean number of telephone lines: (1) 1.19 (.016); (2) 1.10 (.011); (3) 1.26 (.027); (4) 1.30 (.021); (5) 1.40 (.0058); (6) 1.40 (.023); (7) 1.41 (.0064); (8) 1.38 (.018); (9) 1.20 (.0074); (10) 1.06 (.0049)

Notes:
1. Weights proportional to the number of adults in the household divided by the number of telephone lines, to convert from household-level to individual-level probabilities.
2. Weights set to match 32 national marginals: region (4 categories), sex (2 categories), and age cohort (4 categories).
3. Weights based on propensity scores estimated by Harris Interactive using data from parallel telephone surveys.
4. Weights based on matches to known demographic marginals and corrections for sample selection bias.
5. Percent black or African-American, or most closely identifying with black or African-American if mixed race.
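The raking weights described in the table notes are produced by iterative proportional fitting: cell weights are alternately rescaled until every margin matches its population target. A toy two-margin sketch with invented counts and targets (the study's raking used 32 marginals across region, sex, and age):

```python
import numpy as np

# Invented 2 x 4 sex-by-region cross-tab of respondent counts.
sample = np.array([[120.0, 80.0, 60.0, 140.0],
                   [100.0, 90.0, 70.0, 110.0]])

# Invented population margins, scaled to the sample total.
row_targets = np.array([0.49, 0.51]) * sample.sum()        # sex shares
col_targets = np.array([0.19, 0.22, 0.36, 0.23]) * sample.sum()  # region shares

w = sample.copy()
for _ in range(100):  # iterate until both margins match
    w *= (row_targets / w.sum(axis=1))[:, None]  # rescale rows to sex margin
    w *= col_targets / w.sum(axis=0)             # rescale columns to region margin

weights = w / sample  # per-cell raking weight applied to each respondent
```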
Table 2
Comparison of Respondent Knowledge Across Surveys

Knowledge score (0 to 11): 6.74, 7.14, 6.55*, 7.37, 6.58, 7.54, 5.34*, 7.14*

Question stems for the eleven knowledge items:
Effects (E): “Scientists who specialize in the study of the Earth’s climate have debated the possible effects of climate change. Do most scientists expect any of the following changes in global climate to take place? Do most scientists expect ...”
Causes (C): “Many scientists have argued that global average temperatures have risen slightly and will continue to increase for many years as a result of human activities. To the best of your knowledge: Do scientists believe ...”
Treaty (K): “Have you heard about the proposed international treaty called the Kyoto Protocol?”

Telephone data weighted to individuals; Internet surveys use proprietary weights.
Cells marked with * indicate a statistically significant mode effect (relative to the telephone mode) in probit regressions on individual-level data. The eleven items are based on dichotomous probits; the knowledge score is based on an ordered probit.
Table 3
Distributions of Eleven Knowledge Questions Across Modes

Note: Numbers in bold indicate statistically significantly different distributions (at the 5 percent level) of the proportion of correct responses across the eleven knowledge questions.
Table 4
Comparison of Information Use and Assessment Across Internet Samples

Based on unweighted data; standard deviations in parentheses. Columns: Harris Interactive January (N=5,946); Harris Interactive July (N=5,187); Knowledge Networks November (N=957).

Percent of respondents who viewed one or more pages: 72.7 (N=4,320); 68.8 (N=3,571); 66.2 (N=634)
Mean number of pages viewed by those offered information: 7.1 (8.6); 5.5 (7.3); 3.8 (5.9)
Mean number of pages viewed by those viewing one or more pages: 9.8 (8.6); 8.0 (7.5); 5.8 (6.4)
Mean number of minutes spent on information pages by those viewing one or more pages: 9.4 (8.5); 9.4 (8.5); 9.0 (10.0)
Mean perception of usefulness of information by those viewing one or more pages (0 not at all useful; 10 extremely useful): 6.8 (2.7); 6.8 (2.7); 6.2 (2.8)
Mean perception of bias in information by those viewing one or more pages (0 strongly against GCC; 10 strongly in favor of GCC): 5.9 (1.8); 5.9 (1.7); 5.6 (1.7)
Table 5
Comparison of Political Variables and Environmental Attitudes Across Surveys

Cell entries are means with standard errors in parentheses. Columns:
(1) January Public Policy Institute telephone, household weighted
(2) Telephone respondents who do not use the Internet at least weekly
(3) Telephone respondents who use the Internet at least weekly
(4) Telephone, raking weighted
(5) January Harris Internet, raw
(6) January Harris Internet, raking weighted
(7) July Harris Internet, raw
(8) July Harris Internet, propensity weighted
(9) November Knowledge Networks, raw
(10) November Knowledge Networks, full sample, raking weighted

Percent registered to vote: (1) 86.7 (.93); (2) 84.7 (1.5); (3) 88.3 (1.2); (4) 87.3 (.87); (5) 89.5 (.27); (6) 84.5 (1.0); (7) 91.2 (.27); (8) 87.4 (.92); (9) 76.6 (.91); (10) 72.9 (1.2)
Percent Democrat: (1) 34.4 (1.3); (2) 37.6 (2.0); (3) 31.9 (1.6); (4) 37.4 (1.3); (5) 31.6 (.41); (6) 36.8 (1.5); (7) 28.5 (.43); (8) 37.5 (1.4); (9) 40.6 (1.1); (10) 41.5 (1.2)
Percent Republican: (1) 33.9 (1.3); (2) 32.1 (1.9); (3) 35.3 (1.7); (4) 31.1 (1.2); (5) 28.4 (.40); (6) 24.1 (.93); (7) 33.1 (.45); (8) 32.3 (1.1); (9) 29.6 (.98); (10) 27.7 (1.1)
Percent third party: (1) 2.8 (.44); (2) 2.4 (.60); (3) 3.1 (.62); (4) 2.3 (.34); (5) 5.1 (.19); (6) 4.1 (.34); (7) 5.5 (.22); (8) 2.8 (.22); (9) 3.9 (.42); (10) 3.6 (.43)
Percent members of environmental groups: (1) 10.9 (.82); (2) 8.1 (1.1); (3) 13.1 (1.2); (4) 11.3 (.81); (5) 11.6 (.78); (6) 11.8 (.74); (7) 16.3 (.32); (8) 9.5 (.52); (9) 6.5 (.53); (10) 6.4 (.59)
Ideology (7-point scale; 1 strongly liberal) (note 1): (1) 4.29 (.043); (2) 4.41 (.062); (3) 4.19 (.058); (4) 4.23 (.043); (5) 4.06 (.015); (6) 4.03 (.041); (7) 4.21 (.016); (8) 4.11 (.037); (9) 4.09 (.036); (10) 4.04 (.040)
Environmental threat (11-point scale; 0 no real threat, 10 brink of collapse): (1) 5.71 (.054); (2) 5.79 (.087); (3) 5.65 (.069); (4) 5.76 (.055); (5) 5.85 (.019); (6) 5.83 (.078); (7) 5.72 (.022); (8) 5.74 (.069); (9) 5.42 (.048); (10) 5.48 (.053)
Emphasis on property rights over environmental protection (4-point scale; 1 strongly disagree): (1) 2.66 (.021); (2) 2.73 (.034); (3) 2.61 (.029); (4) 2.67 (.021); (5) 2.44 (.0072); (6) 2.53 (.025); (7) 2.52 (.0079); (8) 2.59 (.022); (9) 2.53 (.017); (10) 2.53 (.020)
International environmental treaties (11-point scale; 0 very bad idea, 10 very good idea): (1) 7.20 (.073); (2) 7.22 (.12); (3) 7.19 (.092); (4) 7.22 (.073); (5) 6.90 (.026); (6) 6.93 (.086); (7) 6.69 (.029); (8) 6.78 (.085); (9) 6.87 (.059); (10) 6.82 (.068)

Note 1. Ideology (percent, rounded): strongly liberal (4.9), liberal (14.6), slightly liberal (13.8), middle-of-the-road (27.0), slightly conservative (15.3), conservative (17.6), strongly conservative (6.9).
Table 6
Effects of Ideology on Environmental Attitudes: Crisis?

Dependent variable: environmental threat (11-point scale with 0 “no real threat,” 10 “brink of collapse”)

Wald test, equality of H1 and KN interactions with ideology: do not reject; do not reject; do not reject (across the three model specifications)
Wald test, equality of H2 and KN interactions with ideology: reject; do not reject; do not reject

* statistically significant at the 5 percent level

Question: “Some people believe that pollution, population growth, resource depletion, and other man-made problems have put us on the brink of an environmental crisis that will make it impossible for humans to continue to survive as we have in the past. Others believe that these fears are overstated and that we are not in a serious environmental crisis. On a scale from zero to ten, where zero means that there is no real environmental crisis and ten means that human civilization is on the brink of collapse due to environmental threats, what do you think about the current environmental situation?”

Responses (percent, rounded): no real threat (2.7), 1 (2.3), 2 (4.1), 3 (6.7), 4 (7.8), 5 (19.4), 6 (15.6), 7 (18.8), 8 (14.9), 9 (4.3), brink of collapse (3.5)
Table 7
Effects of Ideology on Environmental Attitudes: Tradeoffs with Property Rights

Dependent variable: in tradeoffs between property rights and the environment, emphasis should be on property rights (4-point scale with 1 “strongly agree,” 4 “strongly disagree”)

Wald test, equality of H1 and KN interactions with ideology: reject; reject; reject (across the three model specifications)
Wald test, equality of H2 and KN interactions with ideology: reject; reject; reject

* statistically significant at the 5 percent level

Question: “Please indicate whether you strongly agree, agree, disagree, or strongly disagree with the following statement. Where tradeoffs must be made between environmental protection and property rights, the emphasis should be on protecting property rights.”
Table 8
Effects of Ideology on Environmental Attitudes: International Treaties

Wald test, equality of H1 and KN interactions with ideology: do not reject; do not reject; do not reject (across the three model specifications)
Wald test, equality of H2 and KN interactions with ideology: reject; do not reject; reject

* statistically significant at the 5 percent level

Question: “Government officials in the U.S. are currently considering a proposed international treaty that concerns global climate change, called the Kyoto Protocol. In 1997, representatives from the U.S. and approximately 150 other nations developed and signed the Kyoto Protocol, which calls for reducing the production of greenhouse gases. The U.S. has negotiated similar treaties with other nations to try to deal with other environmental problems, such as acid rain and ozone depletion. On a scale from zero to ten, where zero means it is a very bad idea and ten means it is a very good idea, how do you view international treaties as a way to deal with environmental problems?”

Responses (percent, rounded): very bad idea (6.4), 1 (1.4), 2 (2.9), 3 (3.4), 4 (3.6), 5 (15.0), 6 (7.0), 7 (10.2), 8 (15.6), 9 (7.6), very good idea (27.0)
Table 9
Logistic Models of Advisory Vote for Ratification

* statistically significant at the 5 percent level (income and bid coefficients based on one-sided tests)

Notes
1. The addition of full sets of mode interactions with bid price, income, mental accounts, and the bid price-mental accounts interaction results in no statistically significant mode interaction terms in either model. Further, there were no statistically significant adjusted Wald tests for particular sets of interactions (i.e., bid price interacted with Harris Interactive 1, Harris Interactive 2, and Knowledge Networks).