Teaching Open Datasets to Dance Together By Alon Peled The Hebrew University of Jerusalem

Jun 27, 2020

Page 1

Teaching Open Datasets to Dance Together

By Alon Peled

The Hebrew University of Jerusalem

Page 2

What is Open Data?

• Datasets published by public authorities worldwide on the Internet

Page 3

Economic Potential

Public sector organizations, at all levels of government, publish tens of millions of datasets on Open Data portals, as required by law.

Page 4

Challenges

• Use: no tools to analyze and integrate the data

• Classification: inconsistent and poor tagging of datasets

• Discovery: lack of uniform publication standards and difficulty finding datasets

Page 5

Smart integration of datasets from multiple sources

Example: Open Data About Natural Gas Projects

Page 6

Search Results Compared to Google’s Search Engine

Example: Data about Toyota Flaws

Page 7

Search Results Compared to Google’s Search Engine

Example: Data about Public Sector Tenders

Page 8

Technological Innovation: The Process

Open Data Crawlers discover open data catalogues

Visited Open Data Repositories

Original Metadata of the Open Data Publications

ETL (Extract, Transform, Load)

Server (EDW)

Smart Tagging Algorithm (MECA)

Open Data Portals
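
The stages named on this slide (crawl, extract, transform, load) can be sketched in a few lines. This is only an illustration under stated assumptions: the record schema, the function names, and the two sample catalogues are hypothetical and do not come from the presentation, and the crawler reads from an in-memory dictionary rather than real portals.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Normalized metadata for one open dataset (hypothetical schema)."""
    source: str
    title: str
    file_format: str
    tags: list[str] = field(default_factory=list)


def crawl(catalogues):
    """Stand-in for a crawler: yield (source, raw metadata) pairs."""
    for source, entries in catalogues.items():
        for entry in entries:
            yield source, entry


def transform(source, raw):
    """Normalize field names and casing that differ between portals."""
    title = (raw.get("title") or raw.get("name") or "").strip()
    fmt = (raw.get("format") or raw.get("type") or "unknown").strip().upper()
    return DatasetRecord(source=source, title=title, file_format=fmt)


def load(records):
    """Load records into a list that stands in for the warehouse (EDW)."""
    return list(records)


def run_pipeline(catalogues):
    """Crawl every catalogue, normalize each entry, and load the result."""
    return load(transform(src, raw) for src, raw in crawl(catalogues))


# Made-up catalogues that use different field names for the same information.
catalogues = {
    "portal-a": [{"title": " Garbage collection routes ", "format": "csv"}],
    "portal-b": [{"name": "Waste dump report", "type": "xls"}],
}
warehouse = run_pipeline(catalogues)
```

The point of the transform step is that two portals rarely describe a dataset with the same field names, so the warehouse only becomes searchable after the metadata is brought into one schema.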

Page 9

Technological Innovation: Smart Tagging

Enterprise Data Warehouse (EDW)

Enterprise Data Warehouse (EDW) With Smart Tagging

Open Dataset

In-Database Affinity

Crowdsourcing - Behavioral

Crowdsourcing - Survey

Expert Dictionary Text Analytics

Smart Tagging History Repository

Smart Tagging Selection Algorithm

Detailed Chronology of the Smart Tagging Process

Adding Smart Tags to the Metadata of a Specific Open Dataset

Context Analysis

Page 10

Patents

Metadata-Driven Smart Indexing

PCT Application No. PCT/IL2016/051052

"Advanced Computer Implementation For Crawling And/Or Detecting Related Electronically Catalogued Data Using Improved Metadata Processing"

Smart Tagging

United States Patent Application No. 15/272,058

"Method of enriching metadata usable for content searching and system thereof"

Page 11

Existing Dataset Search Solutions

Search Engines

Portals Vendors

Marketplaces

Limited to a single city/state without smart tagging

Limited to a single economic vertical or a single tagging technique

Open data upload per individual client without smart tagging

Limited to data integration or data trading without smart tagging

Page 12

New Dataset Search Solutions

Search Engines

Portals Vendors

Marketplaces

Page 13

Classification Example 01 - A NOAA Dataset

Page 14

Classification Example 01 - MECA-in-Action

Context Analysis
-- Twitter
-- WolframAlpha
-- Google Trends

Data Analytics
-- Domain & Demographics
-- Column Analysis
-- Raw Analysis

Crowdsourcing Analytics
-- People (Expert/Layman)
-- Survey

Expert Dictionary
-- Hints

In-Database Affinity
-- Dataset Comparison
-- Corpus Comparison
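
One way to picture how tag suggestions from several techniques might be combined is a weighted vote. The sketch below is an assumption for illustration only: the slides do not describe how MECA selects tags, and the function name, weights, threshold, and sample scores are all invented.

```python
from collections import defaultdict


def select_tags(proposals, weights, threshold=1.0):
    """Sum weighted confidences per tag and keep tags at or above threshold.

    Illustrative weighted vote only; not the selection algorithm from the talk.
    """
    scores = defaultdict(float)
    for technique, tags in proposals.items():
        weight = weights.get(technique, 1.0)  # unlisted techniques count once
        for tag, confidence in tags.items():
            scores[tag.lower()] += weight * confidence  # merge case variants
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Made-up suggestions for a weather dataset, one entry per technique family.
proposals = {
    "context_analysis": {"weather": 0.9, "storm": 0.4},
    "data_analytics": {"weather": 0.7, "temperature": 0.6},
    "crowdsourcing": {"Weather": 0.8},
    "expert_dictionary": {"temperature": 0.5},
    "in_database_affinity": {"climate": 0.3},
}
weights = {"expert_dictionary": 1.5}  # hypothetical: trust the dictionary more
selected = select_tags(proposals, weights)
selected_names = [tag for tag, _ in selected]
```

With these sample numbers, only tags supported by more than one technique clear the threshold, which is the intuition behind combining techniques rather than relying on a single one.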

Page 15

Dancing Together!

Page 16

Data…Data…Everywhere!

Page 17

The Politics of (Very Large) Datasets

Page 18

The Whole is Greater Than Its Parts (OR “Where in the World is my Garbage?”)

Asset #84834, Buenos Aires (Municipal-Spanish), Garbage Collection – Division of Services. API

Asset #49857, Queensland (State-English), Corporate report of dumps. CSV

Asset #106213, Germany (Federal-German), Contractors – years of installation. XLS

Asset #26470, Kenya (National-Swahili), County estimates of households. CSV

Asset #86888, European Union (International-English), Composition of municipal data. API
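
The assets above come from different publishers, languages, and file formats, so they can only be found together if they share a common tag. A minimal sketch of that idea is an inverted index from tag to asset. The asset numbers, publishers, and formats are taken from this slide; the tags are invented for illustration and are not the tags the system assigned.

```python
from collections import defaultdict

# Asset IDs, publishers, and formats come from the slide; tags are hypothetical.
assets = [
    {"id": 84834, "publisher": "Buenos Aires", "format": "API", "tags": {"waste", "collection"}},
    {"id": 49857, "publisher": "Queensland", "format": "CSV", "tags": {"waste", "landfill"}},
    {"id": 106213, "publisher": "Germany", "format": "XLS", "tags": {"contractors"}},
    {"id": 26470, "publisher": "Kenya", "format": "CSV", "tags": {"households"}},
    {"id": 86888, "publisher": "European Union", "format": "API", "tags": {"waste", "municipal"}},
]


def build_tag_index(records):
    """Map each tag to the sorted list of asset IDs that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            index[tag].append(record["id"])
    return {tag: sorted(ids) for tag, ids in index.items()}


tag_index = build_tag_index(assets)
waste_assets = tag_index["waste"]  # assets from three different publishers
```

A single query on the shared tag returns assets from a city, a state, and an international body, regardless of the language or format each one was published in.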

Page 19

Thank You!