Automatic price collection on the internet (web scraping). Ingolf Boettcher, Tokyo, 20 May 2015, Ottawa Group 2015

Post on 29-May-2020

www.statistik.at Wir bewegen Informationen

Automatic price collection on the internet (web scraping) 

Ingolf Boettcher

Tokyo, 20 May 2015

Ottawa Group 2015

Topic 1: Alternate data sources and index number formulas

Session 2: Online Prices and Web Scraping

www.statistik.at Slide 2 | 26.05.2015

Web scraping

There is a huge amount of data on the internet:

<HTML><HEAD><TITLE> DATA </TITLE></HEAD></HTML>

How can we best collect/scrape/harvest data from there for statistical purposes?

Web scraping

Internet data collection

Minimum goal for (price) statistics:

Turn website content into a spreadsheet
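
As an illustration of this minimum goal, the following is a minimal sketch, not the method used in the presentation. It assumes the prices sit in a plain HTML table; the sample page, the product names and the helper names (`TableParser`, `html_to_rows`, `rows_to_csv`) are hypothetical. Real shop pages are structured differently and usually need site-specific handling.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product listing, used only for illustration.
SAMPLE_HTML = """
<html><head><title>DATA</title></head>
<body>
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Milk 1l</td><td>1.09</td></tr>
  <tr><td>Bread 500g</td><td>2.49</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(html):
    """Return the table cells of an HTML string as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(html_to_rows(SAMPLE_HTML)))
```

The sketch uses only the Python standard library, so it would run without installing anything; downloading the page from a live website is deliberately left out.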

Web scraping

Internet data collection

Options:

1. Manual price collection
2. Develop an API / web scraper
   2.1 by writing custom computer code
   2.2 by using point-and-click web tools

Web scraping

Reasons for not writing your own web scraper

An IT developer is needed, therefore:
• Expensive
• Inflexible
• Even maintenance cannot be handled by CPI staff

Web scraping

Reasons to use point-and-click web tools for web scraping:

No IT developer needed, therefore:
• Cheap
• Flexible
• No programming skills required

Web scraping

What point-and-click web scraping with import.io looks like:

• a web platform that allows users to structure and extract data from websites

Web scraping

Point-and-click web scraping on a web-based platform offers solutions to:
• extract data by point-and-click
• record actions on a website
• crawl all the data of a webpage

Further issues to consider:
• Legality of crawling websites
• Internal IT security
• Training of staff
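
One technical input to the legality question is a site's robots.txt file. The sketch below checks such a file with Python's standard `urllib.robotparser` module. The robots.txt content, the domain `shop.example` and the user agent `price-stats-bot` are hypothetical. A robots.txt check is only a courtesy signal; it does not replace a legal assessment or the website's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be downloaded from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent="price-stats-bot"):
    """Return True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://shop.example/products/milk",
                "https://shop.example/checkout/cart"):
        print(url, may_fetch(rules, url))
    # Requested pause between requests, in seconds (None if not specified).
    print(rules.crawl_delay("price-stats-bot"))
```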

Contact:
Ingolf Boettcher
Guglgasse 13, 1110 Wien
Tel: +43 (1) 71128-7917
Fax: +43 (1) 7180718
Ingolf.boettcher@statistik.gv.at

Automatic price collection on the internet (web scraping)

Web scraping with import.io