Jelke Bethlehem Web panels for Official Statistics
Official Statistics
The mission of national statistical institutes
‐ Publishing reliable and accurate statistical
information that meets the needs of society.
‐ Commitment to quality: the quality of the statistical
information must be guaranteed.
Challenges
‐ ICT developments.
‐ Decreasing response rates.
‐ Decreasing budgets.
Data collection for population surveys
Traditional data collection
‐ Face-to-face and telephone, paper.
‐ Interviewer-assisted.
‐ Good quality, slow, expensive.
Computer-assisted interviewing
‐ CAPI, CATI.
‐ Interviewer-assisted.
‐ Better quality, fast, easier, expensive.
Web surveys
‐ CAWI, cheaper, self-administered, quality issues.
Online data collection
Single mode web surveys
‐ Must be based on probability sampling.
‐ Self-administered: quality issues.
‐ Low response rates (30%).
Mixed-mode web surveys
‐ Sequential mixed-mode, start with web.
‐ Less expensive than CAPI or CATI.
‐ Normal response rates.
‐ Mode effects.
Web panels
Why a web panel?
‐ Instrument for longitudinal research.
‐ Sampling frame for cross-sectional research.
‐ Quick surveys.
Challenges
‐ Under-coverage (lack of internet access).
‐ How to recruit a representative web panel?
‐ Nonresponse (in recruitment and surveys).
‐ Measurement errors (self-administered).
‐ Maintenance (attrition, panel conditioning).
Under-coverage
The under-coverage problem
‐ People without internet cannot become panel members.
‐ Those with internet differ from those without it.
‐ Therefore, estimates may be biased.
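The bias:
$$B(\bar{y}_I) = E(\bar{y}_I) - \bar{Y} = \bar{Y}_I - \bar{Y} = \frac{N_{NI}}{N}\,(\bar{Y}_I - \bar{Y}_{NI})$$
(I: people with internet; NI: people without internet; N: population size.)
‐ The bias depends on internet coverage.
‐ The bias depends on the difference between those
with and without internet.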
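The under-coverage bias can be illustrated numerically. The sketch below multiplies the fraction of the population without internet by the difference between the two group means. The function name and all numbers are hypothetical and only serve as an illustration; the two coverage levels mimic the extremes of 94% and 45%.

```python
def undercoverage_bias(n_internet, n_no_internet, mean_internet, mean_no_internet):
    """Bias of the internet-population mean as an estimate of the full population mean."""
    n_total = n_internet + n_no_internet
    # Fraction without internet times the difference between the group means.
    return (n_no_internet / n_total) * (mean_internet - mean_no_internet)


# Hypothetical target variable: a proportion of 0.50 among people with internet
# and 0.30 among people without internet.
bias_high_coverage = undercoverage_bias(940, 60, 0.50, 0.30)   # 94% coverage
bias_low_coverage = undercoverage_bias(450, 550, 0.50, 0.30)   # 45% coverage

print(round(bias_high_coverage, 3), round(bias_low_coverage, 3))
```

With the same difference between the groups, the bias is roughly 0.012 at 94% coverage and 0.11 at 45% coverage.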
Under-coverage
Internet coverage in Europe (2011)
Internet coverage varies between 45% (Bulgaria) and 94% (The Netherlands)
Source: Eurostat
Under-coverage
Under-represented groups
‐ Low-educated, ethnic minorities, elderly.
‐ Only 34% of people aged 75+ use the internet (NL).
Reducing under-coverage
‐ Provide free internet access to those without it.
‐ Make a mixed-mode panel with CAPI, CATI or mail
for those without internet.
‐ Maybe the problem will solve itself in time.
Recruitment
Recruitment by means of self-selection (opt-in)
‐ People decide themselves whether or not to become
a member of the panel. No sample selection.
‐ Participation probabilities $\pi_k$ are unknown.
‐ Bias:
$$B(\bar{y}_S) = E(\bar{y}_S) - \bar{Y} \approx \frac{R_{\pi,Y}\, S_\pi\, S_Y}{\bar{\pi}}$$
‐ Bias depends on average participation probability.
‐ Bias depends on variation of the probabilities.
‐ Bias depends on relationship between target
variable and participation behaviour.
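A small numerical sketch of the self-selection bias, assuming the usual approximation in which the bias equals the correlation between participation probability and target variable, times both standard deviations, divided by the average participation probability. That product of correlation and standard deviations is the population covariance, which is what the code computes. The population, the probabilities and the function names are hypothetical.

```python
import statistics


def selection_bias(y, pi):
    """Approximate bias: Cov(pi, Y) / mean(pi), i.e. R * S_pi * S_Y / mean(pi)."""
    mean_y = statistics.fmean(y)
    mean_pi = statistics.fmean(pi)
    # Population covariance (denominator N) between probabilities and target values.
    cov = sum((p - mean_pi) * (v - mean_y) for p, v in zip(pi, y)) / len(y)
    return cov / mean_pi


def expected_panel_mean(y, pi):
    """Approximate expected value of the panel mean under self-selection."""
    return sum(p * v for p, v in zip(pi, y)) / sum(pi)


# Hypothetical population of 10 people; those with Y = 1 are more eager to join.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pi = [0.10, 0.10, 0.10, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]

bias = selection_bias(y, pi)
print(round(statistics.fmean(y), 3), round(expected_panel_mean(y, pi), 3), round(bias, 3))
```

In this made-up population the true mean is 0.3, while the expected panel mean is about 0.68, even though the participation probabilities differ by only a few percentage points.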
Recruitment
Other self-selection problems
‐ People from outside the target population can also
become members of the panel.
‐ Sometimes multiple membership is possible.
‐ Groups of people may attempt to manipulate the
outcomes of the polls.
Conclusion
‐ A self-selection panel is out of the question for
general population surveys.
Recruitment
Recruitment by means of probability sampling
‐ Allows for unbiased estimation.
‐ Allows for computation of margins of error.
‐ Required: a sampling frame with email addresses.
‐ Such a sampling frame is not available.
‐ Solution: Different mode(s) for recruitment:
mail, CATI or CAPI (or a combination).
‐ Traditional sampling frames can be used.
‐ Disadvantage: makes a web panel expensive.
Recruitment
Recruitment from other surveys
‐ Build panel from respondents of previous CAPI or
CATI surveys.
‐ Respondents may have agreed to participate in
future surveys.
‐ Recruitment may be less expensive.
‐ But these respondents may be a selective group,
and therefore the resulting panel may lack
representativity.
Nonresponse
The nonresponse problem
‐ Nonresponse leads to biased estimates.
‐ Bias:
$$B(\bar{y}_R) = E(\bar{y}_R) - \bar{Y} \approx \frac{R_{\rho,Y}\, S_\rho\, S_Y}{\bar{\rho}}$$
‐ Bias depends on response rate.
‐ Bias depends on variation of response probabilities.
Indicators
‐ Response rate
‐ Representativity indicator: $R = 1 - 2 S_\rho$
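The representativity indicator is one minus twice the standard deviation of the response probabilities. The sketch below computes it for a list of known probabilities. In practice the probabilities are unknown and have to be estimated from auxiliary variables; the use of the population standard deviation and the function name are assumptions made for this illustration.

```python
import statistics


def r_indicator(response_probabilities):
    """R = 1 - 2 * S_rho, with S_rho the standard deviation of the probabilities."""
    s_rho = statistics.pstdev(response_probabilities)
    return 1 - 2 * s_rho


# Everyone equally likely to respond: fully representative response.
r_equal = r_indicator([0.5] * 10)

# Half of the population always responds, the other half never does: worst case.
r_worst = r_indicator([0.0, 1.0] * 5)

print(r_equal, r_worst)
```

The indicator runs from 0 (maximal variation in response probabilities) to 1 (no variation at all).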
Nonresponse
Recruitment nonresponse
‐ High, as participation requires substantial
commitment.
‐ Bias reduction (adjustment weighting) difficult due
to lack of relevant auxiliary variables.
Survey/wave nonresponse (attrition)
‐ May be low, as people agreed to participate.
‐ Plenty of auxiliary variables for bias reduction, e.g.
from profile survey.
Nonresponse
Treatment of nonresponse
‐ Different treatment of recruitment and survey
nonresponse, as they are different phenomena.
‐ Treatment is only effective if response behaviour
can be explained by auxiliary variables.
‐ Treatment is only effective if target variable can be
explained by auxiliary variables.
‐ Consider reference survey for obtaining more
auxiliary variables.
Measurement errors
What about the quality of the answers?
‐ CAPI and CATI are interviewer-assisted surveys, but
web surveys are self-administered.
‐ How strong are the effects of satisficing (not the best
answer, but a reasonable answer)?
‐ How to handle “don’t know”?
‐ Are there device effects (desktop, laptop, tablet,
smartphone, etc.)?
‐ Include consistency checks?
‐ Do results in the literature apply to official statistics?
Maintenance
Panel must be kept stable over time
‐ Detected changes must be caused by real changes.
‐ How to handle attrition?
‐ How to handle panel conditioning?
Refreshment
‐ Refreshment is costly.
‐ Add a random sample from the population, or focus
on under-represented groups?
‐ Estimation more complex due to varying selection
probabilities.
A web panel pilot
Objectives
‐ Getting experience with setting up a web panel.
‐ Getting more information about the costs.
‐ Using a simple tool (NetQ), not yet Blaise.
Recruitment
‐ Invite respondents from Mobility Survey (OViN).
‐ This was a mixed-mode survey (web-CATI-CAPI).
‐ Recruitment by mail.
‐ Inference with respect to OViN respondents.
A web panel pilot
Recruitment process
‐ Response rates:
Step            n       % of sample   % of previous
Sample          12046
Response OViN    6928   57.5          57.5
Willingness      4251   35.3          61.4
Selected         4227   35.1          99.4
Registered       1231   10.2          29.1
Participates     1134    9.4          92.1
‐ Ultimate response rate is very low.
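The two percentage columns of the recruitment table follow directly from the counts. The sketch below recomputes them; the counts are taken from the table, while the variable names are only illustrative.

```python
# Counts per recruitment step, taken from the table.
steps = [
    ("Sample", 12046),
    ("Response OViN", 6928),
    ("Willingness", 4251),
    ("Selected", 4227),
    ("Registered", 1231),
    ("Participates", 1134),
]

sample_size = steps[0][1]
rows = []
# Pair every step with the step before it to get the conditional rate.
for (_, previous_n), (name, n) in zip(steps, steps[1:]):
    pct_of_sample = round(100 * n / sample_size, 1)
    pct_of_previous = round(100 * n / previous_n, 1)
    rows.append((name, n, pct_of_sample, pct_of_previous))

for row in rows:
    print(row)
```

The conditional rates show where the losses occur: registration is the weakest step, with only about 29% of the selected persons registering.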
A web panel pilot
Recruitment process
‐ Representativity:
Step            n       R-indicator
Sample          12046
Response OViN    6928   0.784
Willingness      4251   0.843
Selected         4227   0.842
Registered       1231   0.883
‐ Representativity improves.
‐ There is a risk of a large bias.
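One possible reading of why a better R-indicator can still go together with a risk of large bias: since R = 1 - 2 S_ρ, the standard deviation of the response probabilities equals (1 - R) / 2, and because a correlation is at most 1 in absolute value, the bias approximation implies an upper bound of (1 - R) S_Y / (2 × response rate). The sketch below evaluates this bound per unit of S_Y. Pairing each R-indicator with the "% of sample" rate of the same step is an assumption made here, not something stated on the slides.

```python
def max_standardized_bias(r_indicator, response_rate):
    """Upper bound on |bias| / S_Y implied by R = 1 - 2 * S_rho and |correlation| <= 1."""
    s_rho = (1 - r_indicator) / 2
    return s_rho / response_rate


# (R-indicator, response rate as a fraction of the sample) per recruitment step.
steps = {
    "Response OViN": (0.784, 0.575),
    "Willingness": (0.843, 0.353),
    "Selected": (0.842, 0.351),
    "Registered": (0.883, 0.102),
}

bounds = {name: max_standardized_bias(r, rate) for name, (r, rate) in steps.items()}
for name, bound in bounds.items():
    print(f"{name:15s} {bound:.3f}")
```

Under these assumptions the bound grows from about 0.19 for the OViN response to about 0.57 for the registered panel members, because the response rate drops much faster than the variation in response probabilities.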
A web panel pilot
Recruitment process
‐ Relation with OViN recruitment
Mode   Response OViN   Willing (% of response)   In panel (% of willing)
Web    2370            55.4                      55.4
CATI   2946            59.9                      16.9
CAPI   1612            72.8                      17.5
‐ Higher participation rates for web respondents.
‐ Socially desirable answers in recruitment?
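Multiplying the two percentages per mode gives the share of OViN respondents that ends up in the panel. The sketch below does this for the three modes; the derived column is not on the slide and is computed here only as an illustration.

```python
# Per mode: (OViN respondents, % willing among respondents, % in panel among willing).
modes = {
    "Web": (2370, 55.4, 55.4),
    "CATI": (2946, 59.9, 16.9),
    "CAPI": (1612, 72.8, 17.5),
}

# Share of OViN respondents per mode that ends up in the panel, in percent.
panel_rate = {
    mode: round(willing * in_panel / 100, 1)
    for mode, (_, willing, in_panel) in modes.items()
}

print(panel_rate)
```

CAPI respondents most often say they are willing, yet only about 13% of them end up in the panel, against about 31% of web respondents. This gap is consistent with the question about socially desirable answers in interviewer-assisted recruitment.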
A web panel pilot
Estimation
‐ Two target variables of which the values are known
for all OViN respondents: Level of education and
employment status.
‐ They are related to many other target variables.
Questions
‐ How close are panel estimates to OViN response?
‐ Does weighting adjustment help?
‐ Weight model: age × income × soc-eco-class.
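Weighting adjustment with a fully crossed weight model can be sketched as post-stratification: each panel member gets a weight equal to the reference share of his or her cell divided by the panel share of that cell. The example below is a minimal illustration with two hypothetical cells and made-up data; it does not use the pilot data, and the function names are assumptions.

```python
from collections import Counter


def poststratification_weights(panel_cells, reference_shares):
    """Weight per panel member: reference share of the cell / panel share of the cell."""
    counts = Counter(panel_cells)
    n = len(panel_cells)
    return [reference_shares[cell] / (counts[cell] / n) for cell in panel_cells]


def weighted_share(values, weights, category):
    """Weighted fraction of panel members whose value equals the given category."""
    return sum(w for v, w in zip(values, weights) if v == category) / sum(weights)


# Hypothetical panel of 10 members; a cell is a combination of auxiliary variables.
panel_cells = [("young", "high income")] * 6 + [("old", "low income")] * 4
education = ["bachelor"] * 5 + ["primary"] + ["bachelor"] + ["primary"] * 3

# Hypothetical cell shares in the reference group (here: the survey respondents).
reference_shares = {("young", "high income"): 0.4, ("old", "low income"): 0.6}

weights = poststratification_weights(panel_cells, reference_shares)
unweighted = education.count("bachelor") / len(education)
weighted = weighted_share(education, weights, "bachelor")
print(round(unweighted, 3), round(weighted, 3))
```

Weighting only helps to the extent that the cells explain both the participation behaviour and the target variable; within a cell, panel members are assumed to resemble the non-members.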
A web panel pilot
Estimation for level of education
Level of education   Panel (%)   Weighted (%)   OViN (%)
Primary               2.6         4.3            5.5
Lower secondary      15.2        16.5           21.0
Higher secondary     34.4        35.8           37.6
Bachelor/master      45.5        40.6           33.6
‐ The bias is somewhat smaller, but remains
substantial.
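The effect of weighting on the education estimates can be summarised by adding up the absolute differences with the OViN percentages. This summary measure is an illustrative choice made here, not one taken from the slides; the percentages are those in the table.

```python
# Percentages per level of education: (panel, weighted panel, OViN response).
education = {
    "Primary": (2.6, 4.3, 5.5),
    "Lower secondary": (15.2, 16.5, 21.0),
    "Higher secondary": (34.4, 35.8, 37.6),
    "Bachelor/master": (45.5, 40.6, 33.6),
}


def total_abs_deviation(column):
    """Sum over categories of |estimate - OViN|, in percentage points."""
    return sum(abs(row[column] - row[2]) for row in education.values())


unweighted_deviation = total_abs_deviation(0)
weighted_deviation = total_abs_deviation(1)
print(round(unweighted_deviation, 1), round(weighted_deviation, 1))
```

The total deviation drops from 23.8 to 14.5 percentage points, so weighting removes less than half of the difference between the panel and the OViN response.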
A web panel pilot
Estimation for employment status
Employment        Panel (%)   Weighted (%)   OViN (%)
Housewife/man     11.9        12.2           12.5
Pension           16.8        17.8           14.7
School/student     6.1        10.6            9.8
Disabled           2.4         2.8            2.8
Unemployed         1.9         2.1            2.3
Employed          59.2        52.4           56.1
‐ The correction is too strong, too weak, or in the wrong
direction, depending on the category.
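The verdict on the weighting correction can be checked per category by comparing the shift caused by weighting with the shift that would be needed to reach the OViN percentage. The classification rules below are an illustrative interpretation of "too strong", "too weak" and "wrong direction"; the percentages are those in the table.

```python
# Percentages per employment status: (panel, weighted panel, OViN response).
employment = {
    "Housewife/man": (11.9, 12.2, 12.5),
    "Pension": (16.8, 17.8, 14.7),
    "School/student": (6.1, 10.6, 9.8),
    "Disabled": (2.4, 2.8, 2.8),
    "Unemployed": (1.9, 2.1, 2.3),
    "Employed": (59.2, 52.4, 56.1),
}


def classify(panel, weighted, target):
    """Compare the shift caused by weighting with the shift needed to hit the target."""
    if weighted == target:
        return "exact"
    applied, needed = weighted - panel, target - panel
    if applied * needed < 0:
        return "wrong direction"
    return "too weak" if abs(applied) < abs(needed) else "too strong"


verdicts = {name: classify(*row) for name, row in employment.items()}
for name, verdict in verdicts.items():
    print(f"{name:15s} {verdict}")
```

Only one category is corrected exactly; the others are a mix of under-correction, over-correction and, for pensioners, a shift away from the OViN value.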
Web panels for Official Statistics
Conclusions
‐ Under-coverage is a problem that can be solved.
‐ Recruitment should be by means of probability sampling.
‐ Recruitment of a representative panel is expensive.
‐ Recruitment nonresponse is high.
‐ Relevant auxiliary variables are required to reduce
nonresponse bias.
‐ More research is required with respect to
measurement errors.
‐ A panel maintenance strategy must be implemented.