www.statistik.at Wir bewegen Informationen
Automatic price collection on the internet (web scraping)
Ingolf Boettcher
Tokyo, 20 May 2015
Ottawa Group 2015
Topic 1: Alternate data sources and index number formulas
Session 2: Online prices and web scraping
www.statistik.at Slide 2 | 26.05.2015
Web scraping
There is a huge amount of data on the internet:
<HTML><HEAD><TITLE> DATA </TITLE></HEAD></HTML>
How can we best collect/scrape/harvest data from there for statistical purposes?
Web scraping
Internet data collection
Minimum goal for (price) statistics:
Turn website content into a spreadsheet
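As a rough illustration of this goal, the Python sketch below parses a small HTML fragment and writes product names and prices out as CSV, which any spreadsheet program can open. The sample markup, the class names `product`, `name`, and `price`, and the helper names `PriceParser` and `html_to_csv` are all hypothetical; a real shop page would need its own extraction rules.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Milk 1l</span><span class="price">1.09</span></li>
  <li class="product"><span class="name">Bread 500g</span><span class="price">2.49</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects [name, price] rows from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._field = None   # field currently being read, if any
        self._current = {}   # partially filled row

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only text inside a recognised span is stored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Keep the row only if both fields were found.
            if "name" in self._current and "price" in self._current:
                self.rows.append([self._current["name"], self._current["price"]])
            self._current = {}


def html_to_csv(html):
    """Return the extracted rows as CSV text with a header line."""
    parser = PriceParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["product", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = html_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing the result to a `.csv` file instead of printing it would give price collectors a table they can open directly in their usual spreadsheet software.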
Web scraping
Internet data collection
Options:
1. Manual price collection
2. Develop an API / web scraper
2.1 by writing custom computer code
2.2 by using point-and-click web tools
Web scraping
Reasons for not writing your own web scraper
IT developer needed, therefore:
• Expensive
• Inflexible
• Even maintenance cannot be handled by CPI staff
Web scraping
Reasons to use point-and-click web tools for web scraping:
No IT developer needed, therefore:
• Cheap
• Flexible
• No programming skills required
Web scraping
What point-and-click web scraping with import.io looks like:
• A web platform for structuring and extracting data from websites
Web scraping
Point-and-click web scraping on a web-based platform offers ways to:
• extract data by point-and-click
• record actions on a website
• crawl all the data of a webpage
More issues to be considered:
• Legality of crawling websites
• Internal IT security
• Training of staff
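One technical check that often accompanies the question of whether a site may be crawled is reading its robots.txt file. This does not settle any legal question, and the slides do not prescribe it; the sketch below only shows how such a check might look with Python's standard library. The robots.txt content, the user agent name `price-stats-bot`, and the `shop.example` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


AGENT = "price-stats-bot"  # hypothetical user agent name
rules = build_rules(ROBOTS_TXT)

# Ask whether the site operator permits crawlers on each path.
allowed_products = rules.can_fetch(AGENT, "https://shop.example/products/milk")
allowed_checkout = rules.can_fetch(AGENT, "https://shop.example/checkout/cart")

# Requested pause between requests in seconds, if the site states one.
delay = rules.crawl_delay(AGENT)

print(allowed_products, allowed_checkout, delay)
```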
Contact:
Ingolf Boettcher
Guglgasse 13, 1110 Wien
Tel: +43 (1) 71128-7917
Fax: +43 (1) [email protected]
Web scraping with import.io