Teaching Open Datasets to Dance Together
By Alon Peled
The Hebrew University of Jerusalem
What is Open Data?
• Datasets published by public authorities worldwide on the Internet
Economic Potential
Public sector organizations at all levels of government publish tens of millions of datasets on Open Data portals, as required by law.
Challenges
• Use: No tools to analyze and integrate the data
• Classification: Inconsistent and poor tagging of datasets
• Discovery: Lack of uniform publication standards and difficulty finding datasets
Smart integration of datasets from multiple sources
Example: Open Data about Natural Gas Projects
Search Results Compared to Google’s Search Engine
Example: Data about Toyota Flaws
Search Results Compared to Google’s Search Engine
Example: Data about Public Sector Tenders
Technological Innovation: The Process
[Diagram: pipeline from Open Data Portals to an Enterprise Data Warehouse (EDW) with Smart Tagging]
• Open Data Crawlers discover open data catalogues on Open Data Portals
• The visited Open Data repositories yield the original metadata of the Open Data publications
• ETL (Extract, Transform, Load) moves the metadata into the Enterprise Data Warehouse (EDW) server
• The Smart Tagging Algorithm (MECA) enriches the EDW, producing an EDW with Smart Tagging
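The crawl, ETL and warehouse steps can be illustrated with a minimal Python sketch. It assumes that the crawled portals expose CKAN-style package metadata, meaning the `name`, `title`, `notes`, `tags` and `organization` fields of a CKAN `package_show` result, and it uses an in-memory SQLite database as a stand-in for the EDW. The sample records, the `dataset_metadata` table and its columns are hypothetical. The presentation does not specify the actual schema.

```python
import json
import sqlite3

# Hypothetical crawler output: raw JSON shaped like CKAN "package_show" results.
RAW_PACKAGES = [
    json.dumps({
        "name": "garbage-collection",
        "title": "Garbage Collection",
        "notes": "Collection routes by district",
        "tags": [{"name": "Garbage"}, {"name": "services "}],
        "organization": {"title": "Division of Services"},
    }),
    json.dumps({
        "name": "dump-sites",
        "title": "Corporate report of dumps",
        "notes": None,
        "tags": [],
        "organization": None,
    }),
]


def extract(raw_records):
    """Extract: parse the raw JSON metadata harvested by the crawler."""
    return [json.loads(r) for r in raw_records]


def transform(package, portal):
    """Transform: map one package onto a uniform metadata row."""
    org = package.get("organization") or {}
    # Normalize the publisher's original tags: trim, lower-case, de-duplicate.
    tags = sorted({t["name"].strip().lower() for t in package.get("tags") or []})
    return (
        portal,
        package["name"],
        package.get("title") or "",
        package.get("notes") or "",
        org.get("title") or "",
        ",".join(tags),
    )


def load(conn, rows):
    """Load: upsert the uniform rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset_metadata ("
        " portal TEXT, name TEXT, title TEXT, description TEXT,"
        " publisher TEXT, original_tags TEXT,"
        " PRIMARY KEY (portal, name))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO dataset_metadata VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the EDW
rows = [transform(p, "example-portal") for p in extract(RAW_PACKAGES)]
load(conn, rows)
for row in conn.execute("SELECT name, original_tags FROM dataset_metadata ORDER BY name"):
    print(row)
# prints ('dump-sites', '') then ('garbage-collection', 'garbage,services')
```

Keeping the publisher's original tags in their own column leaves room for smart tags to be stored alongside them later, without overwriting what the portal published.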
Technological Innovation: Smart Tagging
[Diagram: Detailed Chronology of the Smart Tagging Process, adding smart tags to the metadata of a specific Open Dataset]
• Input: Open Dataset
• Tagging techniques: In-Database Affinity; Crowdsourcing - Behavioral; Crowdsourcing - Survey; Expert Dictionary; Text Analytics; Context Analysis
• Supporting components: Smart Tagging History Repository; Smart Tagging Selection Algorithm
• Output: smart tags added to the metadata of the Open Dataset
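One simple way to combine the outputs of several tagging techniques is weighted voting. Each technique proposes tags with a confidence, and a tag is kept only if its weighted score passes a threshold. This is a hypothetical sketch, not the MECA selection algorithm, whose details the slides do not give. The technique weights, the threshold, and `select_smart_tags` are illustrative assumptions. In a real system, the weights might instead be derived from something like the Smart Tagging History Repository.

```python
from collections import defaultdict

# Hypothetical trust weights per tagging technique (illustrative values only).
TECHNIQUE_WEIGHTS = {
    "in_database_affinity": 0.9,
    "crowdsourcing_behavioral": 0.6,
    "crowdsourcing_survey": 0.8,
    "expert_dictionary": 1.0,
    "text_analytics": 0.7,
    "context_analysis": 0.5,
}


def select_smart_tags(candidates, weights, threshold=1.0, max_tags=5):
    """Combine (technique, tag, confidence) proposals into a final tag list.

    Each proposal adds weight * confidence to its tag's score, so a tag
    proposed by several techniques accumulates more evidence.
    """
    scores = defaultdict(float)
    for technique, tag, confidence in candidates:
        # Unknown techniques get weight 0 and therefore contribute nothing.
        scores[tag.strip().lower()] += weights.get(technique, 0.0) * confidence
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked if score >= threshold][:max_tags]


proposals = [
    ("expert_dictionary", "waste", 0.9),
    ("text_analytics", "Waste", 0.8),
    ("context_analysis", "recycling", 0.7),
    ("crowdsourcing_survey", "recycling", 0.9),
    ("crowdsourcing_behavioral", "weather", 0.5),
]
print(select_smart_tags(proposals, TECHNIQUE_WEIGHTS))  # ['waste', 'recycling']
```

Here "waste" scores 1.46 and "recycling" scores 1.07, because each is backed by two techniques. The single weak "weather" proposal (0.3) falls below the threshold and is dropped.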
Patents
Metadata-Driven Smart Indexing
PCT Application No. PCT/IL2016/051052
"Advanced Computer Implementation For Crawling And/Or Detecting Related Electronically Catalogued Data Using Improved Metadata Processing"
Smart Tagging
United States Patent Application No. 15/272,058
"Method of enriching metadata usable for content searching and system thereof"
Existing Dataset Search Solutions
Search Engines
Portal Vendors
Marketplaces
Limitations of existing solutions:
• Limited to a single city or state, without smart tagging
• Limited to a single economic vertical or a single tagging technique
• Open data uploaded per individual client, without smart tagging
• Limited to data integration or data trading, without smart tagging
New Dataset Search Solutions
Search Engines
Portal Vendors
Marketplaces
Classification Example 01 - A NOAA Dataset
Classification Example 01 - MECA-in-Action
• Context Analysis: Twitter, WolframAlpha, Google Trends
• Data Analytics: Domain & Demographics, Column Analysis, Raw Analysis
• Crowdsourcing Analytics: People (Expert/Layman), Survey
• Expert Dictionary: Hints
• In-Database Affinity: Dataset Comparison, Corpus Comparison
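In-database affinity can be illustrated by comparing a new dataset with datasets already in the warehouse, then borrowing the tags of the closest match. This sketch assumes that "dataset comparison" means Jaccard similarity over word tokens of the column names. That choice of measure, the `min_similarity` cutoff, the helper names and the sample corpus are all hypothetical and are not taken from the presentation.

```python
import re


def column_tokens(columns):
    """Lower-case column names and split them into alphanumeric word tokens."""
    tokens = set()
    for column in columns:
        tokens.update(re.findall(r"[a-z0-9]+", column.lower()))
    return tokens


def jaccard(a, b):
    """Jaccard similarity len(a & b) / len(a | b); 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags_by_affinity(new_columns, tagged_corpus, min_similarity=0.3):
    """Return the tags of the most similar tagged dataset, or [] if none is close enough."""
    new_tokens = column_tokens(new_columns)
    best_name, best_score = None, 0.0
    for name, (columns, _tags) in tagged_corpus.items():
        score = jaccard(new_tokens, column_tokens(columns))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_similarity:
        return []
    return sorted(tagged_corpus[best_name][1])


# Hypothetical corpus: dataset name -> (column names, existing tags).
corpus = {
    "dump_sites": (["Site Name", "Latitude", "Longitude", "Waste Tonnage"], {"waste", "landfill"}),
    "rainfall": (["Station ID", "Latitude", "Longitude", "Rainfall mm"], {"weather"}),
}
print(suggest_tags_by_affinity(["site_name", "waste_tonnage", "year"], corpus))
# ['landfill', 'waste']  (Jaccard 4/7 with dump_sites, 0 with rainfall)
```

Tokenizing column names lets "Waste Tonnage" and "waste_tonnage" match despite different naming conventions. Comparing whole corpora or actual cell values would require richer measures than this.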
Dancing Together!
Data…Data…Everywhere!
The Politics of (Very Large) Datasets
The Whole is Greater Than Its Parts (OR “Where in the World is my Garbage?”)
• Asset #84834, Buenos Aires (Municipal, Spanish): Garbage Collection – Division of Services (API)
• Asset #49857, Queensland (State, English): Corporate report of dumps (CSV)
• Asset #106213, Germany (Federal, German): Contractors – years of installation (XLS)
• Asset #26470, Kenya (National, Swahili): County estimates of households (CSV)
• Asset #86888, European Union (International, English): Composition of municipal data (API)
Thank You!