Olav ten Bosch MSIS, Dublin, 14-16 April 2014 On the use of internet robots for official statistics

Feb 26, 2016


Page 1

On the use of internet robots for official statistics

Olav ten Bosch
MSIS, Dublin, 14-16 April 2014

Page 2

Overview

– Why internet as a data source (IAD)?
– Internet robots: how do they work?
– Applications:
  ‐ Airline tickets
  ‐ Housing market
  ‐ Clothing
  ‐ “Robot-assisted data collection”
– Conclusion

Page 3

Why IAD? (1)

Administrative sources
– Tax, social security services
– Municipalities / provinces
– Supermarkets

Surveys

Internet sources

Less!!!

Faster, better, more efficient

New indicators

Page 5

Why IAD? (2)

Internet sources: which content is original, reliable, stable, representative and accessible?

– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?

Page 6

Robots / crawlers / bots / spiders / scrapers: how do they work? (1)

[Diagram: You give commands to the Browser; the Browser sends requests over the Internet to the Website; the Website returns code, images, style, data, etc.; the Browser shows you graphical markup.]

Page 7

Robots / crawlers / bots / spiders / scrapers: how do they work? (2)

[Diagram: the Robot / spider / crawler sends requests over the Internet to the Website and handles the navigation; the Website returns code, images, style, data, etc.; the robot delivers data to you.]

Page 8

Robots / crawlers / bots / spiders / scrapers: how do they work? (3)

[Diagram: the Robot / spider / crawler sends requests over the Internet to the Website and handles the navigation; the Website returns code, images, style, data, etc.; the robot produces several data sets. Additional labels: Monitor actively; Agile.]

Generic software for:
- site navigation
- product details
- monitoring
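The two robot tasks named on this slide, site navigation and product details, can be illustrated with a minimal sketch. It is not the software used in the presentation: the class name `ProductPageParser`, the `class="price"` convention and the sample page are assumptions made for the example, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class ProductPageParser(HTMLParser):
    """Collects links (navigation) and prices (product details) from one page.

    Assumption: prices sit in elements carrying class="price".
    Real sites differ, so this rule would be configured per site.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # hrefs the robot could visit next
        self.prices = []       # price strings found on this page
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if "price" in (attrs.get("class") or "").split():
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def scrape(html):
    """Return (links, prices) found in an HTML string."""
    parser = ProductPageParser()
    parser.feed(html)
    return parser.links, parser.prices


# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_PAGE = """
<html><body>
  <a href="/tickets/milano">Milano</a>
  <a href="/tickets/rome">Rome</a>
  <span class="price">129.00</span>
  <span class="price">89.50</span>
</body></html>
"""

links, prices = scrape(SAMPLE_PAGE)
print(links)
print(prices)
```

In a real robot the HTML string would come from a request to the website (for example via `urllib.request.urlopen`), and the collected links would feed the next round of requests.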

Page 9

Airline tickets (1): Robot collection versus manual collection

[Chart: Ticket price Amsterdam - Milano over time, 11 Feb to 10 Aug; vertical axis 0 to 250; series: Robot, Manual]

Page 10

Airline tickets (2): Price of a ticket over time

[Chart: Price w.r.t. average (-80% to 60%) against days before departure (-120 to 0); series: Barcelona, London, Milan, Rome]
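The quantity on the vertical axis, price with respect to the average, can be worked out as in the sketch below. The slide does not say which average is used; the sketch assumes the mean of all observed prices for one route, and the observations are invented for illustration.

```python
from statistics import mean


def relative_to_average(prices):
    """Return each price as a percentage deviation from the mean price."""
    avg = mean(prices)
    return [round(100 * (p - avg) / avg, 1) for p in prices]


# Hypothetical observations for one route: days before departure -> price.
observations = {-90: 80.0, -60: 90.0, -30: 100.0, 0: 130.0}

days = sorted(observations)
deviations = relative_to_average([observations[d] for d in days])

for day, dev in zip(days, deviations):
    print(f"{day:>4} days: {dev:+.1f}%")
```

With these made-up numbers the mean price is 100, so the ticket is 20% below average 90 days before departure and 30% above average on the day of departure.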

Page 11

Housing market (1)

Page 12

Housing market (2): Dynamics of the ‘database behind’ become visible

Page 13

Clothing (1):

Page 14

Clothing (2):

2 sites: very volatile data

Seasonal pattern

Challenges:
- from volatile data to stable statistics
- how to classify multiple less structured data sources

Page 15

Robot-assisted data collection (1)

– Use case: few price observations on many sites
– Example: price of a cinema ticket
– “Robot tool” to automatically check whether prices have changed
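A price-change check of this kind could look like the sketch below. It is an illustration, not the actual “robot tool”: the site names, the page texts and the rule "first number with two decimals is the price" are all assumptions, and the page texts are given as strings instead of being downloaded.

```python
import re

# Assumption: the first number with two decimals on the page is the price.
PRICE_PATTERN = re.compile(r"(\d+[.,]\d{2})")


def extract_price(page_text):
    """Return the first price found in the text, or None if there is none."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def check_prices(last_known, pages):
    """Compare the last known price per site with the price on the page."""
    report = {}
    for site, old_price in last_known.items():
        new_price = extract_price(pages.get(site, ""))
        if new_price is None:
            # No price found: leave this site to the human price collector.
            report[site] = "check manually"
        elif new_price != old_price:
            report[site] = f"changed: {old_price:.2f} -> {new_price:.2f}"
        else:
            report[site] = "unchanged"
    return report


# Hypothetical cinema sites and page contents.
last_known = {"cinema-a.example": 9.50, "cinema-b.example": 11.00,
              "cinema-c.example": 8.00}
pages = {"cinema-a.example": "Ticket: 9,50 euro",
         "cinema-b.example": "Ticket: 11.50 euro",
         "cinema-c.example": "Temporarily closed"}

report = check_prices(last_known, pages)
for site, status in report.items():
    print(site, status)
```

The design leaves the decision with the price collector: the robot only flags sites where the price changed or could not be found, so manual attention goes to the few sites that need it.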

Page 16

Robot-assisted data collection (2)

Page 17

Conclusion

– Using the internet as a data source, we can measure statistical phenomena in a completely different way
– It is powerful to combine fast internet data with reliable (but slower) administrative data
– We should redesign statistics with the possibilities of internet data in mind

Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT