Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics

Mar 28, 2015

Transcript
Page 1:

Olav ten Bosch
MSIS, Dublin, 14-16 April 2014

On the use of internet robots for official statistics

Page 2:

Overview

– Why internet as a data source (IAD)?
– Internet robots, how do they work?
– Applications:
  ‐ Airline tickets
  ‐ Housing market
  ‐ Clothing
  ‐ “Robot assisted data collection”
– Conclusion

Page 3:

Why IAD? (1)

Administrative sources
– Tax, social security services
– Municipalities / Provinces
– Supermarkets

Surveys

Internet sources

Less!!!

Faster, better, more efficient

New indicators

Page 4:

Page 5:

Why IAD? (2)

Internet sources

– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?

Which content is original, reliable, stable, representative and accessible?

Page 6:

Robots / crawlers / bots / spiders / scrapers: how do they work? (1)

[Diagram. Labels: You, Commands, Browser, Requests, Internet, Website; returned content: code, images, style, data, etc.; Graphical markup]

Page 7:

Robots / crawlers / bots / spiders / scrapers: how do they work? (2)

[Diagram. Labels: You, Robot / spider / crawler, Navigation, Requests, Internet, Website; returned content: code, images, style, data, etc.; output: Data]

Page 8:

Robots / crawlers / bots / spiders / scrapers: how do they work? (3)

[Diagram. Labels: Robot / spider / crawler, Navigation, Requests, Internet, Website; returned content: code, images, style, data, etc.; output: Data (several data sets); Monitor actively; Agile]

Generic software for:
- site navigation
- product details
- monitoring
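The robot steps named on these slides (navigating a site and extracting product details from the returned code) can be illustrated with a minimal sketch. It is not the software described in the talk: the page layout, the CSS class names (`product`, `name`, `price`, `next`) and the sample HTML are assumptions made up for the example, and a real robot would first download the page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical listing page; a real robot would download this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Ticket A</span><span class="price">89.50</span></div>
  <div class="product"><span class="name">Ticket B</span><span class="price">120.00</span></div>
  <a class="next" href="/list?page=2">Next</a>
</body></html>
"""


class ListingParser(HTMLParser):
    """Collects (name, price) pairs and the link to the next listing page."""

    def __init__(self):
        super().__init__()
        self.products = []     # extracted product details
        self.next_link = None  # navigation target, if the page has one
        self._field = None     # "name" or "price" while inside such a span
        self._current = {}     # fields of the product being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css = attrs.get("class", "")
        if tag == "span" and css in ("name", "price"):
            self._field = css
        elif tag == "a" and css == "next":
            self.next_link = attrs.get("href")

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # A product is complete once both fields have been seen.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"], float(self._current["price"]))
                )
                self._current = {}


def scrape(page_html):
    """Return the product details on one page and the next page to visit."""
    parser = ListingParser()
    parser.feed(page_html)
    return parser.products, parser.next_link


products, next_link = scrape(SAMPLE_PAGE)
print(products, next_link)
```

In this sketch the robot would repeat `scrape` on `next_link` until no further page is found, storing the collected data after each step.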

Page 9:

Airline tickets (1)
Robot collection versus manual collection

[Chart: ticket price Amsterdam - Milano, 11 Feb to 10 Aug, price axis 0 to 250; series: Robot, Manual]

Page 10:

Airline tickets (2)
Price of a ticket over time

[Chart: price w.r.t. average (-80% to 60%) against days before departure (-120 to 0); series: Barcelona, London, Milan, Rome]
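The "price w.r.t. average" measure on this chart can be sketched as a percentage deviation from an average price. The slide does not define the average, so the sketch assumes the plain mean over all observations of one route; the observations themselves are invented for illustration.

```python
def relative_to_average(prices):
    """Return each price as a fractional deviation from the series mean."""
    avg = sum(prices) / len(prices)
    return [(p - avg) / avg for p in prices]


# Hypothetical observations: (days before departure, price in euro).
observations = [(-90, 80.0), (-60, 90.0), (-30, 100.0), (0, 130.0)]

days = [d for d, _ in observations]
deviations = relative_to_average([p for _, p in observations])

for d, dev in zip(days, deviations):
    print(f"{d:>5} days before departure: {dev:+.0%}")
```

Expressing prices relative to their own average makes routes with different price levels comparable on one axis.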

Page 11:

Housing Market (1)

Page 12:

Housing market (2)
Dynamics of the ‘database behind’ become visible

Page 13:

Clothing (1):

Page 14:

Clothing (2):

2 sites: very volatile data

Seasonal pattern

Challenges:
- from volatile data to stable statistics
- how to classify multiple less structured data sources

Page 15:

Robot-assisted data collection (1)

– Use case: few price observations on many sites
– Example: price of a cinema ticket
– “Robot tool” to automatically check whether prices have changed
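The check performed by such a "robot tool" can be sketched as a comparison between the last recorded price per site and the price the robot finds now. This is an illustration only, not the tool from the talk: the site names and prices are hypothetical, and the extraction of the current price from each page is assumed to have happened already.

```python
def detect_changes(previous, current):
    """Compare last recorded prices with freshly collected ones.

    Returns a dict site -> (old_price, new_price) for every site that needs
    attention; new_price is None when the robot could not find a price.
    """
    flagged = {}
    for site, old_price in previous.items():
        new_price = current.get(site)
        if new_price != old_price:
            flagged[site] = (old_price, new_price)
    return flagged


# Hypothetical cinema ticket prices per site.
previous = {
    "cinema-a.example": 9.50,
    "cinema-b.example": 10.00,
    "cinema-c.example": 8.75,
}
current = {
    "cinema-a.example": 9.50,   # unchanged, no action needed
    "cinema-b.example": 11.00,  # changed
    # cinema-c.example: price not found, e.g. after a site redesign
}

flagged = detect_changes(previous, current)
print(flagged)
```

Only the flagged sites need to be looked at by hand; unchanged prices can be carried forward.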

Page 16:

Robot-assisted data collection (2)

Page 17:

Conclusion

– Using the internet as a data source, we can measure statistical phenomena in a completely different way

– It is powerful to combine fast internet data with reliable (but slower) administrative data

– We should redesign statistics with the possibilities of internet data in mind

Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT