David Tarrant [email protected] @davetaz
Publishing Open Data
Aim
Provide an overview of current open data publishing practices.
Outcomes
Understand the difference between data on the web and the web of data.
Evaluate a number of different approaches for publishing open data.
Develop a strategy for publishing data applicable to a specific domain.
Publication phases
Phase 1: Get the data online, in some form. This helps build trust, transparency and community.
Phase 2: Increase the usability of the data, potentially by publishing it differently and keeping it up to date.
Data ON the web
Government data
Private sector data
Google advanced
Aggregators and portals
Scraping
data.gov.XX
Government
Government / Private
?
Suppliers
BP: You may not frame this site nor link to a page other than the home page without our express permission. To find this, Google "bp statistical review".
X
Suppliers
http://manufacturingmap.nikeinc.com/#
You agree not to change or delete any ownership notices from materials downloaded or printed from the Platform. You agree not to modify, copy, translate, broadcast, perform, display, distribute, frame, reproduce, republish, download, display, post, transmit or sell any Intellectual Property or Content appearing on the Platform
X
Aggregators and portals
Collect together data from across the web into one place.
enigma.io transportAPI
Data IN the web
The developer's secret
Linked data: amazing, but hard to publish and use.
ON the web
IN the web
Approaches to publishing data
Exercise
List 3 datasets that have been published ON the web (& where)?
List 1 that has been published IN the web (& where)?
5 minutes
Open Data Platforms
http://www.flickr.com/photos/wwarby
Types
Specialist Solution
+ Easy to set up and maintain
+ Open Data focused
+ Clear workflows for publishing open data
+ Visualisation tools
+ Data mashing tools
+ Best for transactional data
Integrated Solution
+ No new platform to learn
+ Data is provided in parallel to web pages
+ No separation from authoritative data
+ Easy discovery of data
+ Best for reference data
+ Best for Linked Open Data
Key characteristics of specialist solution
1. Separate from your main org website
2. Designed to publish open data, not to fulfill other organisation goals
Key characteristics of integrated solution
1. It is your main website
2. Publishes data alongside everything else that the organisation does
Merging specialist and integrated
Method 1: Build the functionality of your current website into a new open data platform.
Method 2: Hide the specialist solution behind your main website and use it as a loosely coupled CMS.
The sliding scale of specialist solutions
1. Catalogue: Point to data (leave it at source)
2. Re-present: Provide data services (leave it at source)
3. Host the data: Be the source
4. Control the data: Be the authority
5. Be the hub: Host the data and processor
Specialist Solutions
http://www.flickr.com/photos/okfn
CKAN: 1-2
Open Data Soft: 2-3
Socrata: 4-5
CKAN
Open Knowledge Foundation Supported
Data Catalogue
Open Source
Feels like a record manager
Simple API and search
Lots of community tools
http://demo.ckan.org/
Sliding scale position: 1-2
Evolution of CKAN
Updated July 2014
Early) Dataset catalogue (data.gov.uk): no data hosted or searched
Mid) Data and dataset catalogue: no data hosted but it is searchable
Now) Integrated data driven web site: data platform is integrated with data, search and content
Sliding scale position: 1-2
Features
Publish, Store and Manage Data and Metadata
Visual and Geospatial
Social
Full Stored History
Federate Your Data With Other Organizations
Rich RESTful JSON API for Developers
Sliding scale position: 1-2
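To make the "Rich RESTful JSON API" point concrete, here is a minimal sketch of a catalogue search against CKAN's Action API (the `package_search` action). It only builds the request URL and reads a decoded response, so it runs without network access. The search term and the sample response are invented for illustration; the base URL is the demo site named earlier in the deck.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url, query, rows=5):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Return dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [pkg.get("title", pkg.get("name"))
            for pkg in response["result"]["results"]]


# Search term "transport" is an arbitrary example.
url = ckan_search_url("http://demo.ckan.org/", "transport", rows=3)
print(url)

# Hand-written sample response (hypothetical, not real portal data).
sample = {
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example-dataset", "title": "Example dataset"}],
    },
}
print(dataset_titles(sample))
```

Fetching the printed URL with any HTTP client and passing the decoded JSON to `dataset_titles` would list matching datasets. The catalogue returns metadata that points to the data, which fits positions 1-2 on the sliding scale.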
Open Data Soft
Data as a Service (DaaS)
Hosted enterprise solution
Rich Interface
Query based API (3-Star)
Closed Source (main product)
EU Based
Sliding scale position: 2-3
Socrata
Data as a Service (DaaS)
Hosted enterprise solution
Allows user created content
Full linked API (5-Star)
https://opendata.socrata.com/
Sliding scale position: 4-5
Features
Data Publishing, Optimized for Business Users
Flexible Metadata Management
Federate Your Data With Other Organizations
Metrics of the Success of Your Initiative in Real-time
Anyone Can Create Maps and Charts
Data Becomes Social
Developers Are Supported Every Step of the Way
Sliding scale position: 4-5
Socrata
Data as a Service (DaaS)
Hosted Solution
Clean Interface
Powerful API (SODA)
Closed Source (main product)
US Based
Sliding scale position: 4-5
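As a rough illustration of what a SODA query looks like, the sketch below builds a request URL using the `$limit` and `$where` query parameters. The portal domain is the Chicago portal listed in the exercise later in the deck; the dataset identifier `abcd-1234` and the `year` column are hypothetical placeholders, not real resources.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, where=None, limit=10):
    """Build a SODA resource URL that returns JSON rows."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # safe="$" keeps the SODA parameter names readable in the URL.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "abcd-1234" and the "year" column are placeholders for illustration.
soda_example = soda_url("data.cityofchicago.org", "abcd-1234",
                        where="year > 2012", limit=5)
print(soda_example)
```

Because the platform hosts the data itself, the filter runs against the rows, not only against catalogue metadata.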
Integrated solutions
Integrated solutions expose data using the current infrastructure (web pages).
Data driven web site
Best for reference and live data
The developer's secret
Linked data: amazing, but hard to publish and use.
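One way to publish data IN the web is to embed machine-readable metadata in the page that already describes the dataset. The sketch below uses the schema.org `Dataset` vocabulary serialised as JSON-LD, which is one option among several and not something the deck prescribes. Every name and URL in the example is hypothetical.

```python
import json


def dataset_jsonld(name, description, page_url, download_url, licence_url):
    """Describe a dataset using the schema.org Dataset vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
        "license": licence_url,
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": download_url,
        }],
    }


def script_tag(metadata):
    """Wrap the metadata in a script element for an existing web page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are invented placeholders.
metadata = dataset_jsonld(
    name="Example bus stops",
    description="Locations of bus stops in an example city.",
    page_url="https://example.org/transport/bus-stops",
    download_url="https://example.org/transport/bus-stops.csv",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)
tag = script_tag(metadata)
print(tag)
```

The data description lives on the authoritative page itself, so there is no separate platform to maintain and crawlers can discover it alongside the human-readable content.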
Recap
Specialist Solution
+ Easy to set up and maintain
+ Open Data focused
+ Clear workflows for publishing open data
+ Visualisation tools
+ Data mashing tools
+ Best for transactional data
Integrated Solution
+ No new platform to learn
+ Data is provided in parallel to web pages
+ No separation from authoritative data
+ Easy discovery of data
+ Best for reference data
+ Best for Linked Open Data
Both great for open data
Integrated solutions are better suited to building a web of linked data
Exercise
Take a look at the following portals and list 3 things you like and 3 things you would improve about each:
CKAN (http://data.gov.uk)
Open Data Soft (http://public.opendatasoft.com/)
Socrata (http://data.cityofchicago.org/)
Outcomes
Understand the difference between data on the web and the web of data.
Evaluate a number of different approaches for publishing open data.
Develop a strategy for publishing data applicable to a specific domain.
Exercise
Which strategy do you feel best suits your domain and why?
David Tarrant [email protected] @davetaz
Thank-You