Web Panels in Official Statistics

Jelke Bethlehem


Official Statistics

The mission of national statistical institutes

‐ Publishing reliable and accurate statistical information that meets the needs of society.

‐ Commitment to quality: the quality of the statistical information must be guaranteed.

Challenges

‐ ICT developments.

‐ Decreasing response rates.

‐ Decreasing budgets.


Data collection for population surveys

Traditional data collection

‐ Face-to-face and telephone, paper.

‐ Interviewer-assisted.

‐ Good quality, slow, expensive.

Computer-assisted interviewing

‐ CAPI, CATI.

‐ Interviewer-assisted.

‐ Better quality, fast, easier, expensive.

Web surveys

‐ CAWI, cheaper, self-administered, quality issues.


Online data collection

Single mode web surveys

‐ Must be based on probability sampling.

‐ Self-administered: quality issues.

‐ Low response rates (30%).

Mixed-mode web surveys

‐ Sequential mixed-mode, start with web.

‐ Less expensive than CAPI or CATI.

‐ Response rates comparable to those of traditional surveys.

‐ Mode effects.


Web panels

Why a web panel?

‐ Instrument for longitudinal research.

‐ Sampling frame for cross-sectional research.

‐ Quick surveys.

Challenges

‐ Under-coverage (lack of internet access).

‐ How to recruit a representative web panel?

‐ Nonresponse (in recruitment and surveys).

‐ Measurement errors (self-administered).

‐ Maintenance (attrition, panel conditioning).


Under-coverage

The under-coverage problem

‐ People without internet cannot be a panel member.

‐ Those with internet differ from those without it.

‐ Therefore, estimates may be biased.

The bias:

‐ The bias depends on internet coverage.

‐ The bias depends on the difference between those with and without internet.


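$$B(\bar{y}_I) = E(\bar{y}_I) - \bar{Y} = \frac{N_{NI}}{N}\left(\bar{Y}_I - \bar{Y}_{NI}\right)$$

where $N$ is the population size, $N_{NI}$ the number of people without internet, and $\bar{Y}_I$ and $\bar{Y}_{NI}$ the means of the target variable for people with and without internet.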
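The under-coverage bias formula can be checked numerically. The sketch below uses a hypothetical population of 1000 people with 80% internet coverage. The function name `undercoverage_bias` and all numbers are illustrative assumptions, not data from the presentation.

```python
def undercoverage_bias(n_internet: int, n_no_internet: int,
                       mean_internet: float, mean_no_internet: float) -> float:
    """Bias of the internet-population mean: (N_NI / N) * (Ybar_I - Ybar_NI)."""
    n_total = n_internet + n_no_internet
    return n_no_internet / n_total * (mean_internet - mean_no_internet)


# Hypothetical population: 800 people with internet, 200 without.
# Share answering "yes": 60% with internet, 40% without.
bias = undercoverage_bias(800, 200, 0.60, 0.40)

# Cross-check: the bias equals the internet mean minus the full-population mean.
population_mean = (800 * 0.60 + 200 * 0.40) / 1000
assert abs((0.60 - population_mean) - bias) < 1e-12
print(f"bias = {bias:.3f}")  # bias = 0.040
```

In this example 20% of the population is missing and the two groups differ by 20 percentage points, so the estimate is 4 points too high. The bias shrinks as coverage grows or as the two groups become more alike.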

Under-coverage

Internet coverage in Europe (2011)


Internet coverage varies between 45% (Bulgaria) and 94% (the Netherlands).

Source: Eurostat

Under-coverage

Under-represented groups

‐ Low-educated, ethnic minorities, elderly.

‐ Only 34% of people aged 75 and over use the internet (NL).

Reducing under-coverage

‐ Provide free internet access to those without it.

‐ Make a mixed-mode panel with CAPI, CATI or mail for those without internet.

‐ Maybe the problem will solve itself in time, as internet coverage keeps growing.


Recruitment

Recruitment by means of self-selection (opt-in)

‐ People decide for themselves whether or not to become a member of the panel; there is no sample selection.

‐ Participation probabilities πk are unknown.

‐ Bias:

‐ Bias depends on average participation probability.

‐ Bias depends on variation of the probabilities.

‐ Bias depends on the relationship between the target variable and participation behaviour.


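$$B(\bar{y}_S) = E(\bar{y}_S) - \bar{Y} \approx \frac{R_{\pi Y}\, S_{\pi}\, S_Y}{\bar{\pi}}$$

where $\bar{\pi}$ is the average participation probability, $S_{\pi}$ the standard deviation of the participation probabilities $\pi_k$, $S_Y$ the standard deviation of the target variable, and $R_{\pi Y}$ the correlation between participation probabilities and target variable.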

Recruitment

Other self-selection problems

‐ People from outside the target population can also become members of the panel.

‐ Sometimes multiple membership is possible.

‐ Groups of people may attempt to manipulate the outcomes of the polls.

Conclusion

‐ A self-selection panel is out of the question for general population surveys.


Recruitment

Recruitment by means of probability sampling

‐ Allows for unbiased estimation.

‐ Allows for computation of margins of error.

‐ Required: a sampling frame with email addresses.

‐ Such a sampling frame is not available.

‐ Solution: use a different mode (or modes) for recruitment: mail, CATI or CAPI (or a combination).

‐ Traditional sampling frames can be used.

‐ Disadvantage: this makes a web panel expensive.
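Known selection probabilities are what make margins of error computable. The sketch below computes one for a proportion, assuming simple random sampling, full response and a 95% confidence level (z = 1.96). The function name is hypothetical and the finite population correction is ignored. No such calculation is valid for a self-selection panel.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error z * sqrt(p(1 - p) / n) for a proportion
    estimated from a simple random sample of size n (no finite population
    correction)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical example: 50% of a sample of 1000 panel members answer "yes".
print(f"50% ± {100 * margin_of_error(0.5, 1000):.1f} percentage points")
```

Because the margin scales with 1/√n, quadrupling the sample size halves it.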


Recruitment

Recruitment from other surveys

‐ Build the panel from respondents of previous CAPI or CATI surveys.

‐ Respondents may have agreed to participate in future surveys.

‐ Recruitment may be less expensive.

‐ But these respondents may be a selective group, and therefore the resulting panel may lack representativity.


Nonresponse

The nonresponse problem

‐ Nonresponse leads to biased estimates.

‐ Bias:

‐ Bias depends on response rate.

‐ Bias depends on variation of response probabilities.

Indicators

‐ Response rate

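‐ Representativity indicator: R = 1 – 2 Sρ, where Sρ is the standard deviation of the response probabilities (R = 1 if all response probabilities are equal)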


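$$B(\bar{y}_R) = E(\bar{y}_R) - \bar{Y} \approx \frac{R_{\rho Y}\, S_{\rho}\, S_Y}{\bar{\rho}}$$

where $\bar{\rho}$ is the response rate (the average response probability), $S_{\rho}$ the standard deviation of the response probabilities $\rho_k$, $S_Y$ the standard deviation of the target variable, and $R_{\rho Y}$ the correlation between response probabilities and target variable.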
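The R-indicator can be sketched as follows, assuming the response probabilities ρ_k are already available. In practice they have to be estimated, for example with a logistic regression model on auxiliary variables. This sketch uses the sample standard deviation (divisor n − 1). The function name and the probabilities are hypothetical.

```python
import statistics


def r_indicator(response_probs: list[float]) -> float:
    """R = 1 - 2 * S_rho, with S_rho the standard deviation of the response probabilities."""
    return 1 - 2 * statistics.stdev(response_probs)


# Everyone responds with the same probability: fully representative response.
equal = [0.6] * 100
# Two equally large groups with very different response probabilities.
unequal = [0.3] * 50 + [0.9] * 50

print(f"{r_indicator(equal):.3f}")    # 1.000
print(f"{r_indicator(unequal):.3f}")  # 0.397
```

Both examples have the same response rate (60%). Only the spread of the probabilities lowers R, which is why the response rate and the R-indicator are reported as separate indicators.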

Nonresponse

Recruitment nonresponse

‐ High, as participation requires substantial commitment.

‐ Bias reduction (adjustment weighting) is difficult due to a lack of relevant auxiliary variables.

Survey/wave nonresponse (attrition)

‐ May be low, as people have already agreed to participate.

‐ Plenty of auxiliary variables for bias reduction, e.g. from the profile survey.


Nonresponse

Treatment of nonresponse

‐ Different treatment of recruitment and survey nonresponse, as they are different phenomena.

‐ Treatment is only effective if response behaviour can be explained by auxiliary variables.

‐ Treatment is only effective if the target variable can be explained by auxiliary variables.

‐ Consider a reference survey to obtain more auxiliary variables.


Measurement errors

What about the quality of the answers?

‐ CAPI and CATI are interviewer-assisted surveys, but web surveys are self-administered.

‐ How strong are the effects of satisficing (giving not the best answer, but a reasonable one)?

‐ How to handle “don’t know”?

‐ Are there device effects (desktop, laptop, tablet, smartphone, etc.)?

‐ Include consistency checks?

‐ Do results in the literature apply to official statistics?


Maintenance

Panel must be kept stable over time

‐ Detected changes must be caused by real changes in the population, not by changes in the panel.

‐ How to handle attrition?

‐ How to handle panel conditioning?

Refreshment

‐ Refreshment is costly.

‐ Add a random sample from the population, or focus on under-represented groups?

‐ Estimation is more complex due to varying selection probabilities.
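After refreshment, panel members enter with different selection probabilities, so each observation must be weighted by the inverse of its own inclusion probability. The sketch below shows the Horvitz-Thompson estimator of a population mean, a standard textbook estimator offered here as an illustration. It assumes the inclusion probabilities π_k are known for every panel member; deriving them for a refreshed panel is itself the hard part. The numbers are hypothetical.

```python
def horvitz_thompson_mean(values: list[float], inclusion_probs: list[float],
                          population_size: int) -> float:
    """Horvitz-Thompson estimate of the population mean: (1/N) * sum(y_k / pi_k)."""
    return sum(y / p for y, p in zip(values, inclusion_probs)) / population_size


# Hypothetical panel: two original members (pi = 0.01) and two members from a
# refreshment sample that over-sampled an under-represented group (pi = 0.02).
values = [1.0, 0.0, 1.0, 1.0]
probs = [0.01, 0.01, 0.02, 0.02]
print(round(horvitz_thompson_mean(values, probs, population_size=300), 3))  # 0.667
```

The unweighted mean of the four values is 0.75. Weighting gives the over-sampled members half the influence of the original members.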


A web panel pilot

Objectives

‐ Getting experience with setting up a web panel.

‐ Getting more information about the costs.

‐ Using a simple tool (NetQ), not yet Blaise.

Recruitment

‐ Invite respondents from the Mobility Survey (OViN).

‐ This was a mixed-mode survey (web-CATI-CAPI).

‐ Recruitment by mail.

‐ Inference is with respect to the OViN respondents.


A web panel pilot

Recruitment process

‐ Response rates: the ultimate response rate is very low; only 9.4% of the original sample participates in the panel.


Step             n        % of sample   % of previous step
Sample           12046
Response OViN    6928     57.5          57.5
Willingness      4251     35.3          61.4
Selected         4227     35.1          99.4
Registered       1231     10.2          29.1
Participates     1134      9.4          92.1
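The percentages in the recruitment table follow directly from the counts. The short sketch below recomputes them; the function name is hypothetical.

```python
def recruitment_rates(steps: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """For each step after the first, return (step, % of sample, % of previous step)."""
    sample_size = steps[0][1]
    rows = []
    for (name, n), (_, n_previous) in zip(steps[1:], steps[:-1]):
        rows.append((name, round(100 * n / sample_size, 1),
                     round(100 * n / n_previous, 1)))
    return rows


steps = [("Sample", 12046), ("Response OViN", 6928), ("Willingness", 4251),
         ("Selected", 4227), ("Registered", 1231), ("Participates", 1134)]
for name, pct_sample, pct_previous in recruitment_rates(steps):
    print(f"{name:<14} {pct_sample:5.1f} {pct_previous:5.1f}")
```

The last row gives the ultimate response rate of 9.4%: fewer than one in ten sampled persons becomes an active panel member. The largest single loss is at registration (29.1% of those selected).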

A web panel pilot

Recruitment process

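‐ Representativity: the R-indicator improves at almost every step, from 0.784 for the OViN response to 0.883 for the registered panel members.

‐ Even so, because the final response rate is so low, there is still a risk of a large bias.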


Step             n        R-indicator
Sample           12046
Response OViN    6928     0.784
Willingness      4251     0.843
Selected         4227     0.842
Registered       1231     0.883
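A sketch can show why a better R-indicator does not remove the risk of bias. Because |R_ρY| ≤ 1, the nonresponse bias formula gives an upper bound on the absolute bias relative to S_Y: S_ρ / ρ̄ = (1 − R) / (2ρ̄). The sketch applies this bound to the table under two assumptions about how the figures were produced: each R-indicator is computed with respect to the full sample of 12046 persons, and ρ̄ is the cumulative response rate n / 12046. The function name is hypothetical.

```python
def max_standardized_bias(r_indicator: float, response_rate: float) -> float:
    """Upper bound on |bias| / S_Y: S_rho / mean(rho) = (1 - R) / (2 * mean(rho))."""
    return (1 - r_indicator) / (2 * response_rate)


SAMPLE_SIZE = 12046
table = [("Response OViN", 6928, 0.784), ("Willingness", 4251, 0.843),
         ("Selected", 4227, 0.842), ("Registered", 1231, 0.883)]
for name, n, r in table:
    bound = max_standardized_bias(r, n / SAMPLE_SIZE)
    print(f"{name:<14} R = {r:.3f}  max |bias| / S_Y = {bound:.2f}")
```

Under these assumptions the bound rises from about 0.19 to about 0.57 standard deviations of the target variable. The R-indicator improves, but the response rate falls much faster.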

A web panel pilot

Recruitment process

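‐ Relation with the OViN data collection mode:

‐ Higher participation rates for web respondents: 55.4% of the willing web respondents end up in the panel, against 16.9% for CATI and 17.5% for CAPI.

‐ CAPI respondents most often say they are willing (72.8%), but few of them join: socially desirable answers in recruitment?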


Mode   Response OViN   Willing (% of response)   In panel (% of willing)
Web    2370            55.4                      55.4
CATI   2946            59.9                      16.9
CAPI   1612            72.8                      17.5
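Multiplying the two percentages gives the share of OViN respondents in each mode who ended up in the panel. The sketch below does that arithmetic; the names are hypothetical.

```python
def panel_share(willing_pct: float, in_panel_pct: float) -> float:
    """Percentage of OViN respondents that ends up in the panel."""
    return willing_pct * in_panel_pct / 100


modes = {"Web": (2370, 55.4, 55.4), "CATI": (2946, 59.9, 16.9), "CAPI": (1612, 72.8, 17.5)}
total = 0.0
for mode, (respondents, willing_pct, in_panel_pct) in modes.items():
    share = panel_share(willing_pct, in_panel_pct)
    persons = respondents * share / 100
    total += persons
    print(f"{mode:<4} {share:4.1f}% of respondents, about {persons:.0f} persons")
print(f"Total: about {total:.0f} persons")
```

About 30.7% of web respondents join, against roughly 10% to 13% for CATI and CAPI respondents. The implied total of about 1231 persons agrees with the number of registered panel members in the recruitment table.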

A web panel pilot

Estimation

‐ Two target variables whose values are known for all OViN respondents: level of education and employment status.

‐ They are related to many other target variables.

Questions

‐ How close are the panel estimates to the estimates based on the OViN response?

‐ Does weighting adjustment help?

‐ Weight model: age × income × socio-economic class.
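One common way to implement a weight model like age × income × socio-economic class is post-stratification. Every panel member in cell h gets weight (N_h / N) / (n_h / n), so that the weighted panel reproduces the cell distribution of the reference population, here the OViN response. The sketch below assumes that technique. It uses hypothetical cells and values and assumes every cell contains at least one panel member. With a three-way cross-classification and a small panel, empty cells are a real problem, and cells may need to be collapsed.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_counts):
    """Weight (N_h / N) / (n_h / n) for each panel member, where h is its cell."""
    n = len(sample_cells)
    big_n = sum(population_counts.values())
    sample_counts = Counter(sample_cells)
    # Assumes every population cell occurs in the sample (no empty cells).
    return [(population_counts[h] / big_n) / (sample_counts[h] / n) for h in sample_cells]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical cells: (age group, income group, socio-economic class).
young = ("young", "low", "A")
old = ("old", "high", "B")
sample_cells = [young, young, old, old, old, old]  # older people over-represented
population_counts = {young: 500, old: 500}         # reference distribution: 50/50
has_degree = [0, 0, 1, 1, 1, 0]

weights = poststratification_weights(sample_cells, population_counts)
print(weighted_mean(has_degree, weights))  # 0.375 (unweighted: 0.5)
```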


A web panel pilot

Estimation for level of education

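‐ After weighting the bias is somewhat smaller, but it remains substantial (e.g. bachelor/master: 45.5% in the panel and 40.6% weighted, against 33.6% in OViN).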


Level of education   Panel (%)   Weighted (%)   OViN (%)
Primary              2.6         4.3            5.5
Lower secondary      15.2        16.5           21.0
Higher secondary     34.4        35.8           37.6
Bachelor/master      45.5        40.6           33.6
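The bias reduction can be quantified by comparing each column with OViN, taking the OViN response as the benchmark. The sketch below uses the table's percentages. The summary measure, a sum of absolute differences, is an illustrative choice and not a measure from the presentation.

```python
education = {  # level: (panel, weighted, OViN), in %
    "Primary": (2.6, 4.3, 5.5),
    "Lower secondary": (15.2, 16.5, 21.0),
    "Higher secondary": (34.4, 35.8, 37.6),
    "Bachelor/master": (45.5, 40.6, 33.6),
}


def total_absolute_bias(table, column):
    """Sum over categories of |estimate - OViN| for column 0 (panel) or 1 (weighted)."""
    return sum(abs(row[column] - row[2]) for row in table.values())


for level, (panel, weighted, ovin) in education.items():
    print(f"{level:<17} panel {panel - ovin:+5.1f}  weighted {weighted - ovin:+5.1f}")
print(f"Total |bias|: panel {total_absolute_bias(education, 0):.1f}, "
      f"weighted {total_absolute_bias(education, 1):.1f}")
```

The total absolute difference drops from 23.8 to 14.5 percentage points. Even after weighting, bachelor/master graduates are still over-represented by 7.0 points.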

A web panel pilot

Estimation for employment status

‐ The correction is too strong (e.g. employed, school/student), too weak (e.g. housewife/man, unemployed), or in the wrong direction (pension).


Employment status   Panel (%)   Weighted (%)   OViN (%)
Housewife/man       11.9        12.2           12.5
Pension             16.8        17.8           14.7
School/student      6.1         10.6           9.8
Disabled            2.4         2.8            2.8
Unemployed          1.9         2.1            2.3
Employed            59.2        52.4           56.1
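The three kinds of failure can be told apart mechanically by comparing two shifts: the shift caused by weighting (weighted − panel) and the shift that was needed (OViN − panel). The sketch below does this with a hypothetical tolerance of 0.05 percentage points for "on target". The function name is also hypothetical.

```python
def classify_correction(panel: float, weighted: float, reference: float,
                        tol: float = 0.05) -> str:
    """Compare the shift caused by weighting with the shift that was needed."""
    shift = weighted - panel
    needed = reference - panel
    if abs(weighted - reference) <= tol:
        return "on target"
    if shift * needed < 0:
        return "wrong direction"
    if abs(shift) > abs(needed):
        return "too strong"
    return "too weak"


employment = {  # status: (panel, weighted, OViN), in %
    "Housewife/man": (11.9, 12.2, 12.5),
    "Pension": (16.8, 17.8, 14.7),
    "School/student": (6.1, 10.6, 9.8),
    "Disabled": (2.4, 2.8, 2.8),
    "Unemployed": (1.9, 2.1, 2.3),
    "Employed": (59.2, 52.4, 56.1),
}
for status, (panel, weighted, ovin) in employment.items():
    print(f"{status:<15} {classify_correction(panel, weighted, ovin)}")
```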

Web panels for Official Statistics

Conclusions

‐ Under-coverage is a problem that can be solved.

‐ Recruitment must be by means of probability sampling.

‐ Recruitment of a representative panel is expensive.

‐ Recruitment nonresponse is high.

‐ Relevant auxiliary variables are required to reduce nonresponse bias.

‐ More research is required with respect to measurement errors.

‐ A panel maintenance strategy must be implemented.

