Addressing exploitability of Smart City data
Enrico Daga, Mathieu d’Aquin, Alessandro Adamou, Enrico Motta
Data Science Group, Knowledge Media Institute, The Open University, Milton Keynes (UK)
Feedback: @enridaga @datasciencegr #kmiou
September 13th, 2016, Trento (Italy)
IEEE International Smart Cities Conference (ISC2), http://events.unitn.it/en/isc22016
Top MK is a virtual card game where each card represents a ward in Milton Keynes, with characteristics such as area, population, level of qualifications, etc. Two players, one human and one automatic, each try to win the other’s cards by choosing the characteristic that has the best chance of beating the other card.
https://data.beta.mksmart.org/apps/topmk/
The problem of exploitability
• Data come from different owners and have different licenses.
• Data are processed into new data before being reused.
• What are the policies that apply to the output data?
• Can we use it in a commercial setting?
Could Top Trumps sell this game?
"Data exploitability" is the assessment of the policies associated with the data that result from processing diverse datasets in complex data flows.
Under the hood - 1/5
The Entity-Centric API (ECApi) offers an entity-based access point to the information offered by the Data Hub [2].
Licenses are described as machine-readable policies: permissions, prohibitions or duties [3].
Good news: this is the OGL (Open Government Licence), so it can be used in commercial applications.
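To make "machine-readable policies" concrete, here is a minimal Python sketch of a licence as sets of permissions, prohibitions and duties. The `Licence` class, the action strings and the partial OGL description are assumptions for illustration, modelled loosely on ODRL terms; they are not the catalogue's actual encoding.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Licence:
    """Hypothetical licence model: ODRL-style permissions, prohibitions and duties."""
    name: str
    permissions: frozenset = field(default_factory=frozenset)
    prohibitions: frozenset = field(default_factory=frozenset)
    duties: frozenset = field(default_factory=frozenset)

    def allows(self, action: str) -> bool:
        # An action is allowed only if it is permitted and not prohibited.
        return action in self.permissions and action not in self.prohibitions


# Assumed, partial description of the UK Open Government Licence (OGL);
# action names are illustrative, check the ODRL vocabulary for the real terms.
OGL = Licence(
    name="OGL",
    permissions=frozenset({"reproduce", "distribute", "derive", "commercialize"}),
    duties=frozenset({"attribute"}),
)

if __name__ == "__main__":
    print(OGL.allows("commercialize"))  # True
    print(sorted(OGL.duties))           # ['attribute']
```

Keeping duties separate from permissions matters later: a use can be allowed and still carry an obligation, such as attribution.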
Under the hood - 3/5: License
Under the hood - 4/5: Data flow
Data flows can be represented with the Datanode ontology [4] as graphs of data “nodes”.
The logic here: http://purl.org/datanode/ns/ and http://purl.org/datanode/docs/
This is the semantics behind the code!
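A data flow of this kind can be sketched as a list of (node, relation, node) triples. In this minimal Python sketch the node names and the flow itself are hypothetical, and the relation names `hasSelection` and `hasDerivation` are assumed to match terms in the Datanode namespace. The function walks the graph backwards to find every data node an output depends on, which is the set of nodes whose licences matter.

```python
from collections import defaultdict, deque

DN = "http://purl.org/datanode/ns/"

# Hypothetical Top MK flow: (source node, Datanode-style relation, target node).
FLOW = [
    ("population_dataset", DN + "hasSelection", "ward_population"),
    ("qualifications_dataset", DN + "hasSelection", "ward_qualifications"),
    ("ward_population", DN + "hasDerivation", "topmk_cards"),
    ("ward_qualifications", DN + "hasDerivation", "topmk_cards"),
]


def upstream(flow, node):
    """Return every data node from which `node` is (transitively) obtained."""
    parents = defaultdict(set)
    for src, _rel, dst in flow:
        parents[dst].add(src)
    seen, queue = set(), deque([node])
    while queue:  # breadth-first traversal over reversed edges
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream(FLOW, "topmk_cards")))
```

For the card deck this yields both source datasets and both intermediate selections.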
Under the hood - 5/5: Reasoning on Policy Propagation
Machine-readable policies and data flows allow us to reason on policy propagation by exploiting Policy Propagation Rules (PPR) [5].
Yes (but they must include attribution statements).
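The reasoning step can be sketched as a fixpoint computation: policies attached to the sources are pushed along each edge of the flow whenever a rule says that the policy propagates through that relation. This is a simplified stand-in for the PPR approach of [5], not its implementation. The rule base, the policy strings and the `can_sell` test (no propagated prohibition on commercial use) are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical Top MK flow: (source node, relation, target node).
FLOW = [
    ("population_dataset", "hasSelection", "ward_population"),
    ("qualifications_dataset", "hasSelection", "ward_qualifications"),
    ("ward_population", "hasDerivation", "topmk_cards"),
    ("ward_qualifications", "hasDerivation", "topmk_cards"),
]

# Assumed policies on the sources (both treated as OGL, which requires attribution).
SOURCE_POLICIES = {
    "population_dataset": {"duty:attribute"},
    "qualifications_dataset": {"duty:attribute"},
}

# Hypothetical rule base: (policy, relation) means "policy propagates through relation".
RULES = {
    ("duty:attribute", "hasSelection"),
    ("duty:attribute", "hasDerivation"),
    ("prohibition:commercialize", "hasSelection"),
    ("prohibition:commercialize", "hasDerivation"),
}


def propagate(flow, initial, rules):
    """Repeat until no new (node, policy) pair can be derived."""
    policies = defaultdict(set)
    for node, pols in initial.items():
        policies[node] |= set(pols)
    changed = True
    while changed:
        changed = False
        for src, rel, dst in flow:
            for pol in list(policies[src]):
                if (pol, rel) in rules and pol not in policies[dst]:
                    policies[dst].add(pol)
                    changed = True
    return policies


def can_sell(policies, node):
    # Simplification: commercial use is acceptable if no prohibition reached the node.
    return "prohibition:commercialize" not in policies[node]


if __name__ == "__main__":
    result = propagate(FLOW, SOURCE_POLICIES, RULES)
    print(can_sell(result, "topmk_cards"), sorted(result["topmk_cards"]))
    # True ['duty:attribute']
```

If either source instead carried `prohibition:commercialize`, that policy would reach `topmk_cards` and `can_sell` would return False. A full system would also need to validate the propagated policies against each other, which this sketch does not attempt.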
The problem of exploitability (reprise)
Could Top Trumps sell this game?
How can we make it work at scale?
• Represent the diversity of datasets, licenses and data flows
• Support developers in the assessment of policies associated with the data and how they affect their data flows
Data cataloguing as the backbone of data governance. Follow the journey of the data and trace its semantics, respecting the diversity of datasets, licenses and data flows.
Metadata Supply Chain - 1/2: Approach
[Figure: (Meta)data Catalogue diagram; labels: Onboarding, Acquisition, Processing, Delivery; Record, Content, Data flow, Provenance]
Onboarding: set up a catalogue record of the data source.
Processing: describe the data flow; reason on policy propagation.
Delivery: provide provenance information.
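The three described phases can be pictured as operations on a catalogue record. In this minimal Python sketch, `CatalogueRecord`, `onboard`, `process` and `deliver` are hypothetical names, propagation is reduced to a single step with a given set of (policy, relation) pairs, and the Acquisition phase is left out because no description of it is given here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogueRecord:
    """Hypothetical catalogue entry for a dataset (source or derived)."""
    dataset_id: str
    licence: Optional[str] = None                   # set for sources at onboarding
    policies: set = field(default_factory=set)
    data_flow: list = field(default_factory=list)   # (source, relation, target) triples
    provenance: dict = field(default_factory=dict)


def onboard(dataset_id, licence, policies):
    # Onboarding: set up a catalogue record of the data source.
    return CatalogueRecord(dataset_id, licence, set(policies))


def process(inputs, output_id, relation, propagating):
    # Processing: describe the data flow and keep only the input policies
    # that the (assumed) rule base lets propagate through `relation`.
    out = CatalogueRecord(output_id)
    for rec in inputs:
        out.data_flow.append((rec.dataset_id, relation, output_id))
        out.policies |= {p for p in rec.policies if (p, relation) in propagating}
    return out


def deliver(record):
    # Delivery: attach provenance information for end users.
    record.provenance = {
        "derived_from": sorted(src for src, _, _ in record.data_flow),
        "policies": sorted(record.policies),
    }
    return record.provenance


if __name__ == "__main__":
    pop = onboard("population_dataset", "OGL", {"duty:attribute"})
    qual = onboard("qualifications_dataset", "OGL", {"duty:attribute"})
    cards = process([pop, qual], "topmk_cards", "hasDerivation",
                    {("duty:attribute", "hasDerivation")})
    print(deliver(cards))
```

The derived record carries no licence of its own; what reaches the end user is the provenance and the propagated policies.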
Metadata Supply Chain - 2/2
Evaluation (can we really do that?)
Considering a given set of assumptions (details in the paper…):
• Data provider specifies a single License
• Same License for any user
• License is described in the catalogue
• License policies are referenced by Policy Propagation Rules

• Data source is accessible
• Acquisition processes respect the data source License

• Data flows can be described with Datanode
• ETL pipelines do not violate the policies
• Process executions do not influence policy propagation

• Data flow descriptions and License policies enable reasoning on policy propagation
• End-user access methods provide provenance information

An end-to-end solution for exploitability assessment can be implemented.
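Three of these assumptions lend themselves to automatic checks on the catalogue before its propagation results are trusted. This minimal Python sketch assumes a hypothetical catalogue layout (a dict of dataset entries with `licences` and `policies`), approximates "described with Datanode" by a whitelist of relation names, and does not cover the assumptions about access, ETL behaviour or process executions.

```python
def check_assumptions(catalogue, flow, rules, known_relations):
    """Check three of the assumptions; return a list of violations (empty if none)."""
    problems = []
    referenced = {policy for policy, _relation in rules}
    for dataset, entry in catalogue.items():
        # "Data provider specifies a single License"
        if len(entry["licences"]) != 1:
            problems.append(f"{dataset}: expected one licence, found {len(entry['licences'])}")
        # "License policies are referenced by Policy Propagation Rules"
        for policy in sorted(entry["policies"] - referenced):
            problems.append(f"{dataset}: policy {policy} not referenced by any rule")
    # "Data flows can be described with Datanode" (approximated by a relation whitelist)
    for src, rel, dst in flow:
        if rel not in known_relations:
            problems.append(f"{src} -> {dst}: unknown relation {rel}")
    return problems


if __name__ == "__main__":
    # Hypothetical catalogue: the second entry breaks two assumptions on purpose.
    catalogue = {
        "population_dataset": {"licences": ["OGL"], "policies": {"duty:attribute"}},
        "qualifications_dataset": {"licences": ["OGL", "CC-BY-4.0"],
                                   "policies": {"duty:attribute", "duty:shareAlike"}},
    }
    # "mergedWith" is a made-up relation used to trigger the whitelist check.
    flow = [("population_dataset", "hasSelection", "ward_population"),
            ("ward_population", "mergedWith", "topmk_cards")]
    rules = {("duty:attribute", "hasSelection")}
    for problem in check_assumptions(catalogue, flow, rules, {"hasSelection", "hasDerivation"}):
        print(problem)
```

Returning a list of violations rather than raising on the first one lets a catalogue maintainer see every gap in a single pass.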
Lessons learnt
• Assessing exploitability of smart city data is possible by following a holistic approach to data cataloguing:
  • understanding the semantics of data flows;
  • understanding the role of policies (licences).
• New open challenges:
  • Handle the diversity of policies and, consequently, the size of Policy Propagation Rules [3].
  • Support data providers in the selection of the right license [6].
  • Support developers in the definition of data flows [7].
  • Integrate validation of propagated policies [8].
  • Integrate validation of data flows with respect to policies.
  • Reason with process execution traces (not only at design time).
References
[1] M. d’Aquin, J. Davies, and E. Motta. Smart cities’ data: Challenges and opportunities for semantic technologies. IEEE Internet Computing, 19(6):66–70, 2015.
[2] A. Adamou and M. d’Aquin. On requirements for federated data integration as a compilation process. In Proceedings of the 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES), pages 75–80, 2015.
[3] Open Digital Rights Language (ODRL) Version 2.1 https://www.w3.org/ns/odrl/2/ODRL21 (accessed 09/09/2016)
[4] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Describing semantic web applications through relations between data nodes. Technical Report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, 2014.
[5] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Propagation of policies in rich data flows. In Proceedings of the 8th International Conference on Knowledge Capture, page 5. ACM, 2015.
[6] E. Daga, M. d’Aquin, E. Motta, and A. Gangemi. A bottom-up approach for licences classification and selection. In 2015 Workshop on Legal Domain And Semantic Web Applications (LeDA-SWAn 2015), Portoroz, Slovenia, 1 June 2015.
[7] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. An incremental learning method to support the annotation of workflows with data-to-data relations. In 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 19–23 November 2016 (accepted).
[8] H.-P. Lam and G. Governatori. The making of SPINdle. In A. Paschke, G. Governatori, and J. Hall, editors, Proc. RuleML’09, pages 315–322. Springer-Verlag, 2009.