Job Vacancies Experiment Boro Nikić Satellite workshop on Big Data, NTTS 2015

Dec 15, 2015


Miya Clymer
Page 1: Job Vacancies Experiment Boro Nikić Satellite workshop on Big Data, NTTS 2015.

Job Vacancies Experiment

Boro Nikić

Satellite workshop on Big Data, NTTS 2015

Page 2:

Job Vacancies experiment (1)

- Idea for the experiment: Rome Workshop (May 2014)

- Started by identifying websites that advertise jobs and searching for available APIs for those websites

- UNECE Task Team consisted of representatives from Austria, Hungary, Italy, the Netherlands, Sweden and Slovenia

Page 3:

Job Vacancies experiment (2)

Goals:

- Overview of the methodologies of calculation of JV statistics at NSIs

- Identification of possible web scraping tools

- Determination of BD methodology of calculation of JV statistics

- Testing the BD quality indicators proposed by the UNECE Quality Task Team

Page 4:

Overview of the methodologies of calculation of JV statistics at NSIs

EU regulation prescribes the publication of quarterly statistics on JV data:

- Totals of advertised JV at national level

- Totals for domains defined by size of units

- Totals for domains defined by NACE activity groups
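The three kinds of totals listed above can be sketched as simple counts over a list of vacancy records. This is only an illustration: the field names, the size bands and the records themselves are assumptions, not data or definitions from the experiment.

```python
from collections import Counter

# Hypothetical vacancy records: one dict per advertised vacancy.
# Field names, NACE sections and size bands are illustrative assumptions.
vacancies = [
    {"nace_section": "C", "size_class": "50-249"},
    {"nace_section": "C", "size_class": "250+"},
    {"nace_section": "G", "size_class": "1-9"},
    {"nace_section": "G", "size_class": "50-249"},
    {"nace_section": "M", "size_class": "1-9"},
]


def totals(records, key=None):
    """Count vacancies overall (key=None) or per value of one domain variable."""
    if key is None:
        return len(records)
    return dict(Counter(r[key] for r in records))


national_total = totals(vacancies)            # total at national level
by_size = totals(vacancies, "size_class")     # totals by size of unit
by_nace = totals(vacancies, "nace_section")   # totals by NACE activity group
```

The same function serves all three outputs because each is a count over a different grouping variable.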

Documents on Wiki: http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=100303739&metadataLink=true

Page 6:

Aim of the Irobot tool

• IRobotSoft for Visual Web Scraping

• IRobotSoft is a visual Web robot software for Web scraping and Web automation. With IRobotSoft, you can scrape tons of data from the deep Web with a single click! You don't need to have computer skills to do this! IRobotSoft is for Everyone! Follow our discussions and become a Web geek!

• for novice data collectors

• for Web testers

• for data experts

Link: http://www.irobotsoft.com/

Page 7:

Basic Steps

1. Define the name of the Irobot

2. Define the name of the Task

3. Copy and paste the link of desired website into the URL

4. Start Recording Actions

5. Give names to the „scraped“ variables

6. Save the variables

7. Use the option „Repeat Property“
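For readers without the visual tool, the same idea (name the variables to scrape, then repeat the extraction for every advertisement on the page) can be sketched in a few lines of Python. This is not how IRobotSoft works internally; the HTML snippet, the class names and the employer names are all made up for illustration, and a real run would first fetch the page from the website's URL.

```python
from html.parser import HTMLParser

# Made-up listing page standing in for a job portal.
SAMPLE_HTML = """
<div class="ad"><span class="employer">Example d.o.o.</span><span class="town">Kranj</span></div>
<div class="ad"><span class="employer">Sample s.p.</span><span class="town">Celje</span></div>
"""


class AdParser(HTMLParser):
    """Collect one record per <div class="ad">, keyed by the span class names."""

    FIELDS = {"employer", "town"}  # names given to the scraped variables

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "ad":
            self.records.append({})    # a new repeated block starts
        elif tag == "span" and css_class in self.FIELDS:
            self._field = css_class    # the next text belongs to this variable

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()
            self._field = None


parser = AdParser()
parser.feed(SAMPLE_HTML)
records = parser.records  # one dict per advertisement
```

Starting a new record at every `div` of class `ad` plays the role of the "Repeat Property" step: the same named variables are collected once per advertisement.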

Page 8:

Determination of BD methodology of calculation of JV statistics (1)

- Cleaning of data

- Methodology for the replacement of existing statistics (on the level of NSI)

- Methodology for the calculation of new statistics (on the level of NSI)

- Methodology for the calculation of new statistics (international level)

Page 9:

Interface with the parameters

Page 10:

Determination of BD methodology of calculation of JV statistics (2)

All the documentation about the experiment can be found at:

http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=100303739&metadataLink=true

Document:

Information which could be extracted from the Slovenian Websites and the proposed statistics for the job vacancies.doc

Page 11:

Determination of BD methodology of calculation of JV statistics (3)

One of the steps in the statistical processing of JV data is assigning the ID of the Legal Unit (LeU) from the Business Register.

Linking the ID to the „scraped“ unit gives us information about the activity and size of the LeU (according to the number of employees).
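Once an ID has been assigned, picking up activity and size is a plain lookup in the register. The sketch below assumes a register extract held as a dictionary; the IDs, NACE codes, employee counts and the size bands are invented for illustration and are not taken from the Slovenian Business Register.

```python
# Hypothetical Business Register extract keyed by Legal Unit ID.
business_register = {
    "1000000001": {"nace_code": "22.190", "employees": 320},
    "1000000002": {"nace_code": "68.200", "employees": 7},
}

# Scraped units after the ID assignment step (None = no match found).
scraped_units = [
    {"name": "Unit A", "leu_id": "1000000001"},
    {"name": "Unit B", "leu_id": "1000000002"},
    {"name": "Unit C", "leu_id": None},
]


def size_class(employees):
    """Assumed size bands; the bands used in the experiment are not stated."""
    if employees < 10:
        return "1-9"
    if employees < 50:
        return "10-49"
    if employees < 250:
        return "50-249"
    return "250+"


def enrich(unit, register):
    """Attach activity and size class to a scraped unit via its register ID."""
    entry = register.get(unit["leu_id"])
    if entry is None:
        return {**unit, "nace_code": None, "size_class": None}
    return {**unit, "nace_code": entry["nace_code"],
            "size_class": size_class(entry["employees"])}


enriched = [enrich(u, business_register) for u in scraped_units]
```

Units without an ID are kept with empty activity and size so that the unmatched share stays visible in later processing.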

Page 12:

„Scraped“ data

Name_LeU | Tel numb | Mob_numb | Town | Street | Streat_numb | Postal_code
AR PLANE d.o.o. | 03-809-4100 | 040 383840 | Bistrica ob Sotli | | |
Savatech, d.o.o. | | | Kranj | | |
ARENDA d.o.o. | | | Ljubljana | | |
Knauf Insulation d.o.o | 04 5114 219 | | Škofja Loka | Trata | 32 | 4220
AVIAT d.o.o. | | | Trzin | | |
VIP Virant d.o.o | | | Komenda | | |
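Before matching, the scraped town name has to be brought into the register's form. Judging only from the Town_BR values shown in the matched data (for example BISTRICAOBSOTLI for Bistrica ob Sotli), this seems to mean upper case with whitespace removed; the function below is a sketch under that assumption, and its name is hypothetical.

```python
def normalize_town(town):
    """Upper-case a town name and drop all whitespace (assumed register form)."""
    return "".join(town.split()).upper()


normalized = [normalize_town(t) for t in ["Bistrica ob Sotli", "Škofja Loka", "Kranj"]]
```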

Page 13:

„Matched“ data

iskani | Name_LeU | Town_BR | id | complete_nmae | nace_code | adressVID | dist1
1 | AR PLANE d.o.o. | BISTRICAOBSOTLI | | | | |
1 | AR PLANE d.o.o. | ZAGAJ | 3290476000 | AR PLANE, korporacijsko upravljanje in pravna pisarna, d.o.o. | 70.220 | 1474238 | 0
1 | APLANE d.o.o. | SOLKAN | 3307611000 | Letalska družba APLANE d.o.o. | 30.300 | 1034269 | 8
1 | ARTPLANET | SLOVENSKABISTRICA | 3498417000 | ARTPLANET, zavod za razvoj umetnosti, kulture in kakovosti življenja, Slovenska Bistrica | 72.200 | | 15
1 | ARTPLAN, d.o.o. | KRANJ | 6188265000 | ARTPLAN, proizvodnja in trgovina d.o.o. | 31.010 | 2429891 | 21
1 | ARPLAN, ANŽE REZAR s.p. | PROSENIŠKO | 3761843000 | ARPLAN, projektiranje, inženiring, svetovanje in storitve v gradbeništvu, ANŽE REZAR s.p. | 71.129 | 2315474 | 25
1 | AL PLANET, Dejan Janež s.p. | SEŽANA | 3356892000 | AL PLANET, Stavbno pohištvo iz aluminija, Dejan Janež s.p. | 25.120 | 930791 | 26
1 | AR-AL NET d.o.o. | ČENTIBA | 6072526000 | AR-AL NET, trgovina in posredništvo d.o.o. | 47.910 | | 28
1 | ARTLINE d.o.o. | MENGEŠ | 5333644000 | ARTLINE, studio za oblikovanje, d.o.o. | 73.110 | 1417055 | 28
2 | Savatech, d.o.o. | KRANJ | | | | |
2 | SAVATECH d.o.o. | KRANJ | 1661205000 | SAVATECH družba za proizvodnjo in trženje gumenotehničnih proizvodov in pnevmatike, d.o.o. | 22.190 | 2404555 | 0
2 | SAITECH d.o.o. | CELJE | 5311292000 | SAITECH podjetje za trgovino in storitve d.o.o. | 43.290 | 1428363 | 21
2 | SAVA TMC, d.o.o. | LJUBLJANA | 1893718000 | SAVA TURIZEM - TMC, podjetje za upravljanje dejavnosti turizem, d.o.o. | 70.100 | 2585325 | 21
2 | ASTECH d.o.o. | LOGATEC | 1661078000 | ASTECH d.o.o., Inženiring in servisiranje strojnih instalacij | 43.220 | 1617965 | 25
2 | AVTECH D.O.O. | VIDRGA | 3282058000 | AVTECH, SVETOVANJE, ZASTOPSTVO, PROIZVODNJA, D.O.O. | 70.220 | 284552 | 25
2 | SANOTECHNIK d.o.o. | MARIBOR | 5850908000 | SANOTECHNIK trgovsko podjetje d.o.o. | 46.730 | 1490149 | 27
3 | ARENDA d.o.o. | LJUBLJANA | | | | |
3 | ARENDA d.o.o. | LJUBLJANA | 1629417000 | ARENDA, nepremičninska družba, d.o.o. | 68.200 | 1242548 | 0
3 | OPTIKA ARENA d.o.o. | MARIBOR | 1873512000 | OPTIKA ARENA, družba za trgovino in storitve d.o.o. | 47.781 | 499981 | 10
3 | PEKARNA ARENA d.o.o. | LJUBLJANA | 3918076000 | PEKARNA ARENA, pekarstvo in trgovina, d.o.o. | 10.710 | 2313488 | 10
3 | ARENA SERVIS d.o.o. | OSLUŠEVCI | 6318797000 | ARENA SERVIS, izposojanje šotorov, šankov in gostinske opreme ter gostinske storitve, d.o.o. | 77.390 | | 10
3 | ADENDA d.o.o. | MIREN | 5743729000 | ADENDA d.o.o. grafične storitve in oblikovanje | 18.130 | 1365580 | 16
3 | AGENDA d.o.o. | MARIBOR | 5656222000 | AGENDA komunikacijski in informacijski inženiring d.o.o. | 62.020 | 163187 | 16
3 | RANDA d.o.o. | LJUBLJANA | 6011624000 | RANDA gradbeništvo, storitve in prevozi d.o.o. | 41.200 | 1890496 | 20
3 | AGENDA 2003 d.o.o. | LJUBLJANA | 1824775000 | AGENDA 2003 premoženjsko svetovanje in računovodske storitve d.o.o. | 69.200 | 63849 | 24
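The dist1 column looks like a string distance between the scraped name and each register candidate, with 0 for an exact match, but the slides do not say which measure was used. The sketch below uses plain Levenshtein edit distance as a stand-in, so its values will not reproduce the dist1 figures; it only illustrates how candidates can be ranked. The candidate names are taken from the table, the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def rank_candidates(scraped_name, register_names):
    """Return (distance, name) pairs, closest first; comparison ignores case."""
    scored = [(levenshtein(scraped_name.upper(), name.upper()), name)
              for name in register_names]
    return sorted(scored)


candidates = ["ARENDA d.o.o.", "ADENDA d.o.o.", "AGENDA d.o.o.", "RANDA d.o.o."]
ranking = rank_candidates("ARENDA d.o.o.", candidates)
```

In practice a threshold on the distance, possibly combined with the town, would decide whether the closest candidate is accepted as a match or sent for manual review.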

Page 14:

Testing the BD quality indicators proposed by Quality Team

The quality framework consists of three quality hyperdimensions: input, throughput and output.

http://www1.unece.org/stat/platform/pages/viewpageattachments.action?pageId=101158888&metadataLink=true

Page 15:

Conclusions (1)

BD could be used as a source:

• for new types of statistics

• for existing statistics

• for validation of existing statistics

In the case of scraping JV data:

• Change of mode of collection

• Validation of data collected in the traditional way (administrative sources, questionnaire)

• Flash statistics

Page 16:

Conclusions (2)

Before the JV BD source is employed in regular statistical production, the scraping tools, the data manipulation procedures and the statistics must be carefully tested over a period of at least one year in order to ensure the stability of sources and statistics.

More about the experiment can be found at

http://www1.unece.org/stat/platform/display/BDP/Sandbox+Task+Team

Page 17:

Thank you for your attention!
