Propagating Data Policies: A User Study. Enrico Daga (The Open University, UK), Mathieu d'Aquin (Insight Centre, NUI Galway, Ireland), Enrico Motta (The Open University, UK). K-CAP 2017, The 9th International Conference on Knowledge Capture, December 4th-6th, 2017, Austin, Texas, United States
Transcript
Propagating Data Policies: A User Study
1
Enrico Daga (The Open University, UK), Mathieu d'Aquin (Insight Centre, NUI Galway, Ireland), Enrico Motta (The Open University, UK)
K-CAP 2017, The 9th International Conference on Knowledge Capture, December 4th-6th, 2017, Austin, Texas, United States
City Data Hubs
2
Smart Bins to make garbage collection more efficient
Monitor parking spaces to support citizens' mobility
Observe busyness of places to better tune services
Forecast car accidents to improve drivers' awareness
MK:Smart is an integrated innovation and support programme leveraging large-scale city data to drive growth in Milton Keynes (UK).
Data flow descriptions: "Describing semantic web applications through relations between data nodes", Technical Report, 2015
Rules acquisition and management: "Propagation of policies in rich data flows", K-CAP, 2015
Policy knowledge acquisition: "A Bottom-Up Approach for Licences Classification and Selection", LeDA-SWAn, 2015
Process knowledge acquisition: "An incremental learning method to support the annotation of workflows with data-to-data relations", EKAW, 2016
Applicability in an end-to-end scenario: "Addressing exploitability of smart city data", IEEE ISC2, 2016
Efficiency of the reasoner: "Reasoning with Data Flows and Policy Propagation Rules", Semantic Web Journal, Special Issue, to appear 2017
Objective
To evaluate the system's accuracy and utility:
• Are the assumptions made about the needed components the correct ones?
• To what extent can a system supporting such a task be accurate? When it is not, are there any fundamental reasons?
• How difficult is the task? Is this support needed?
6
Feedback: @enridaga
User Study Design
A Data Hub manager takes decisions about policy propagation, having the same knowledge used by the system.
• 10 participants selected among researchers and PhD students with data analysis skills
• No legal expertise: developers and data managers don't have that
• Working in pairs: required to develop an agreement
• 5 teams: MAPI, ILAN, CAAN, ALPA, NIFR
• Real data sources (MK Data Hub), licences (RDF Licence Database), and processes (RapidMiner files found on GitHub.com)
• 5 data journeys designed by mapping data sources and processes into a narrative: SCRAPE, FOOD, CLOUD, AVG, CLEAN
• Distributed in a Latin square: each team works on 2 journeys and each journey is assigned to 2 teams, with at most 1 shared journey between teams
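The Latin-square-style distribution above can be sketched as a simple cyclic assignment. This is our own illustration of one allocation satisfying the stated constraints, not the authors' actual allocation; only the team and journey names come from the slide.

```python
# Cyclic assignment: team i gets journeys i and i+1 (mod 5), so each team
# covers 2 journeys, each journey is covered by 2 teams, and any two teams
# share at most 1 journey. (Illustrative sketch, not the study's allocation.)
TEAMS = ["MAPI", "ILAN", "CAAN", "ALPA", "NIFR"]
JOURNEYS = ["SCRAPE", "FOOD", "CLOUD", "AVG", "CLEAN"]

def assign(teams, journeys):
    """Map each team to two consecutive journeys in a cyclic order."""
    n = len(journeys)
    return {t: [journeys[i], journeys[(i + 1) % n]] for i, t in enumerate(teams)}

assignment = assign(TEAMS, JOURNEYS)
```

With five teams and five journeys this yields exactly the 2-per-team, 2-per-journey coverage the slide describes.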
7
User Study Design
We developed a tool to assist participants in a data journey:
1. Understand the process (on RapidMiner)
2. Understand the input datasets (from the MK Data Hub)
3. Understand the input policies (from the RDF Licence Database)
4. Decide what policies shall propagate to the output (1-5 Likert scale)
5. Compare with the automatic reasoner and explain differences
8
Understand a data journey
9
Table 6.1: Data Journeys (a,b).
(a) SCRAPE
SCRAPE: Milton Keynes Websites Scraper. The content of websites about Milton Keynes is downloaded. Each web page is processed in order to select only the relevant part of the content. After each iteration the resulting text is appended to a dataset. The resulting dataset is modified before being saved in the local data storage.
Datasets: Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved), Wikipedia pages about Milton Keynes (CC-By-SA 3.0)
Process: https://github.com/mtrebi/SentimentAnalyzer/tree/master/process/scraper.rmp
Teams: ILAN, MAPI
(b) FOOD
FOOD: Models for Food Rating Prediction. A lift chart graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. In this task, two techniques are compared, namely Decision Tree and Naive Bayes. The task uses data about food ratings and information about quality of life in Milton Keynes wards. The results are two Pareto charts to be compared.
Datasets: OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food Establishments Info and Ratings (Terms of use)
Process: https://github.com/samwar/tree/master/rapid_miner_training/16_lift_chart.rmp
Teams: NIFR, ALPA
Understand a data journey
10
Comparing with system
11
Explanation: blocked
12
Explanation: propagated
13
K-CAP 2017, (Submitted, 2017), Austin, Texas USA E. Daga et al.
defined as the assessment of the policies associated with the data resulting from the computation of diverse datasets implemented within a City Data Hub [4]. Indeed, assessing how the policies associated with the sensed data will be propagated to the results of a data processing pipeline is an important problem. Data consumers might need to check which original sources of the data need to be acknowledged because of an attribution requirement, and even whether the form of exposure or re-distribution they employ is allowed according to the policies attached to each individual piece of data they might obtain from the Data Hub. Research on policy models and reasoning focuses on the problem of licence compatibility and composition [2, 9]. However, reasoning on policy propagation is a necessary preliminary step to any policy validation or consistency check. In our work we reuse models developed within the Open Digital Rights Language [1] (ODRL) research community (for example [12]). A discussion on ODRL action dependencies and how they affect the policy semantics is included in [13]. Nonetheless, to the best of our knowledge, the first attempt to analyse how policies can propagate in manipulation processes is the one presented in one of our earlier papers [6]. In [6] we introduced
Figure 1: Explanation: propagation trace.
the notion of Policy Propagation Rule (PPR) in order to solve the task of automatically deciding what policies associated to a data source need to be enforced on the output of a process in which that data source is involved. PPRs establish a fundamental connection between a policy (a permission, prohibition or duty) and a semantic relation between two data objects, expressed with the Datanode ontology [7]. Thus,
[1] ODRL W3C Community Group: https://www.w3.org/community/odrl/
it is possible to derive that a certain policy of the source needs to be enforced on the target [5]. The Datanode Ontology [7] allows us to model a data manipulation scenario as a network of data objects, making it possible to reason upon the relations between those data objects and to apply PPRs. However, in our earlier work we focused on the feasibility of the approach in terms of knowledge acquisition and management [6], scalability of the reasoner [5], and applicability in an end-to-end user scenario [4]. In this paper, we go a step further by performing a user experiment in order to evaluate the feasibility of policy propagation as a solvable problem and the hypotheses behind the development of the system, relying both on quantitative and qualitative data analysis methods, particularly the Grounded Theory (GT) approach [2], in a comparison between the automatic process and a manual one performed by people with the typical skill set found in data consumers, processors and publishers who would be carrying out this task in a realistic context.
3 THE SYSTEM AT A GLANCE
The role of the PP Reasoner is to support users in the assessment of the impact of input data policies on the exploitation of the output data of processes and workflows. Consider the case where food rating data released by a trusted authority under a licence that prohibits distribution is used alongside public data about city roads in order to assess the best Machine Learning approach, among several options, to employ for the prediction of good quality restaurants. This task would produce two types of outputs: (a) a set of datasets about roads labelled with the expected food quality rating; and (b) a set of datasets including details about the performance of each one of the algorithms tested. While the prohibition of distribution should be taken into account when using the former datasets, the same constraint would not apply to the latter.
The system is designed to work with a set of reference knowledge bases:
• Data Catalogue. Provides datasets' general metadata, including the link to the associated policy set (licence, Terms and Conditions, and so forth).
• Licence Catalogue. Includes the set of licences represented using the ODRL Ontology [3].
• Process Catalogue. Defines the set of processes represented using the Datanode Ontology [4].
• Policy Propagation Rules (PPRs). A rule base, developed and managed as described in [6]. Rules have the form of a connection between an atomic policy and a relation that is supposed to propagate it. For instance, propagates(dn:cleanedInto, odrl:permission cc:DerivativeWorks) instructs the reasoner to propagate odrl:permission cc:DerivativeWorks whenever a data item is dn:cleanedInto another, so that the cleaned item would also have the given policy.
[2] https://en.wikipedia.org/wiki/Grounded_theory
[3] ODRL Version 2.1 Ontology: http://www.w3.org/ns/odrl/2/ODRL21
[4] Datanode Ontology: http://purl.org/datanode/ns/
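A minimal sketch of how such a PPR base could drive propagation is shown below. Only the propagates(dn:cleanedInto, odrl:permission cc:DerivativeWorks) rule comes from the paper; the other two rules and the `propagate` helper are our illustrative assumptions, not the authors' reasoner.

```python
# PPR base as (relation, atomic policy) pairs: a pair means the policy
# survives that Datanode relation. Only the first rule is from the paper;
# the others are made up for illustration.
PPRS = {
    ("dn:cleanedInto", "odrl:permission cc:DerivativeWorks"),
    ("dn:cleanedInto", "odrl:duty cc:Attribution"),
    ("dn:combinedIn", "odrl:prohibition cc:CommercialUse"),
}

def propagate(policies, relation, pprs=PPRS):
    """Return the subset of a source node's policies that a PPR carries
    across the given relation to the target node."""
    return {p for p in policies if (relation, p) in pprs}

source_policies = {
    "odrl:permission cc:DerivativeWorks",
    "odrl:prohibition cc:CommercialUse",
    "odrl:duty cc:Attribution",
}
# A dataset cleaned into another keeps the permission and the duty; this
# rule base does not carry the prohibition across dn:cleanedInto.
propagated = propagate(source_policies, "dn:cleanedInto")
```

Applying the same lookup edge by edge over a Datanode graph of a workflow would accumulate the policies reaching each output, which is the behaviour the PP Reasoner's rule base is described as supporting.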
Explaining differences
14
Analysis
• Accuracy analysis. To evaluate the system (agreement with users)
  • Teams agree the system is right
  • Teams agree the system is not right, e.g. the knowledge base needs to be improved (e.g. the rules)
  • Teams disagree: we don't know whether the system is right!
• Thematic analysis. Focusing on the disagreements between users
  • From explanations given in the last phase
  • From discussions during the study (reaching agreement)
  • Grounded Theory approach
• Questionnaire. To assess the value to users after the study experience.
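The three accuracy buckets above can be sketched as a small decision function. This is a hypothetical helper of our own, not part of the authors' tooling; it just encodes the logic that the system can only be judged where the two teams agree with each other.

```python
def categorize(team1_propagates, team2_propagates, system_propagates):
    """Bucket one per-policy decision according to the accuracy analysis:
    inter-team disagreement leaves the system's correctness unknown."""
    if team1_propagates != team2_propagates:
        return "teams disagree: system correctness unknown"
    if team1_propagates == system_propagates:
        return "teams agree the system is right"
    return "teams agree the system is not right"
```

For example, if both teams say a policy propagates but the system blocked it, the case lands in the "system is not right" bucket and points at a rule that may need improving.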
15
Questionnaire
16
Questionnaire
17
Table 1: Data Journeys.
SCRAPE: Milton Keynes Websites Scraper. Websites about Milton Keynes are scraped and indexed locally.
Datasets: Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved), Wikipedia pages about Milton Keynes (CC-By-SA 3.0)
Process: https://github.com/mtrebi/SentimentAnalyzer/tree/master/process/scraper.rmp
Teams: ILAN, MAPI

FOOD: Models for Food Rating Prediction. Two Machine Learning techniques are compared. The process uses data about food ratings and statistics about quality of life in MK wards and generates a lift chart and performance vectors.
Datasets: OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food Establishments Info and Ratings (Terms of use)
Process: https://github.com/samwar/tree/master/rapid_miner_training/16_lift_chart.rmp
Teams: NIFR, ALPA

CLOUD: A tag cloud from microblog posts. Twitter posts about Milton Keynes are collected and processed in order to obtain a clean vector of words, associated with an occurrence score.
Datasets: Twitter Feed #miltonkeynes (Terms of use)
Policies: Permissions: copy, display. Prohibitions: give, license,

AVG: Moving average of sensors' records. Calculation of a moving average and plotting from sensor records.
Dataset: Samsung Sensor Data (Terms of use)
Policies: Permissions: aggregate, anonymize, archive, derive, in-
Process: https://github.com/billcary/Rapid_Miner/tree/master/chapter03/MovingAveragePlotter.rmp
Teams: NIFR, MAPI

CLEAN: Sensor data cleaning workflow. The process performs a number of cleaning operations on sensor streams linked with postcodes in order to obtain a dataset ready for analysis.
Datasets: Postcode Locations (UK OGL 2.0), Netatmo Weather
Process: https://github.com/MartinSchmitzDo/RapidMinerDataCleaner/processes/clean.rmp
Teams: CAAN, ILAN
borderline cases selected from the quantitative results. To this aim, we transcribed the notes and conversations that occurred during the experiment from the audio recordings and the tool. From these data we derived a set of general themes about fundamental issues on policy propagation, adopting a method that is akin to Grounded Theory (GT). We illustrate some exemplary cases and present the extracted themes in the discussion, Section 7.
5 USER'S FEEDBACK
Before analysing the data journeys and how the decisions of the users relate to the behaviour of our system, it is worth showing the feedback received after the study was conducted, collected through a questionnaire. In the questionnaire, we posed some questions about the problem of policy propagation to assess the value of the system to the user. The questionnaire was completed by the study participants individually. Table 2 summarises the nine closed-ended Likert questions (Q.1-9), while Figure 3 shows the result of the single-choice question (Q.10).
Table 2: User's feedback. The shading of the cells reflects the distribution of the answers. Columns: left answer, <<, <, Unsure, >, >>, right answer.

Q.1 How difficult was it to take a single decision on whether a policy propagates to the output? Easy 0 1 3 6 0 Difficult
Q.2 Do you think you had enough information to decide? Yes 2 6 2 0 0 No
Q.3 How difficult was it to reach an agreement? Easy 1 5 2 2 0 Difficult
Q.4 Somebody with strong technical skills is absolutely required to take this decision. Do you agree? Yes 1 8 1 0 0 No
Q.5 Somebody with strong technical skills is absolutely required to take this decision even with the support of automated reasoning. Do you agree? Yes 1 5 1 3 No
Q.6 Understanding the details of the process is fundamental to take a decision. Do you agree? Yes 6 3 1 0 0 No
Q.7 How enjoyable was it to discuss and decide on policies and how they propagate in a process? Very 4 5 0 1 0 Not
Q.8 How feasible/sustainable do you think it is to discuss and decide on policies and how they propagate in a process? Feasible 3 1 3 1 2 Unfeasible
Q.9 How sustainable do you think it is to discuss and decide on policies and how they propagate in a process with the support of our system? Feasible 5 4 0 1 0 Unfeasible
Figure 3: Q10. Who should decide on what policies propagate to the output of a process? Answer options: the owner of the input data; the process executor; the consumer of the processed data; they must do it together; nobody (it cannot be done).
The majority of the participants of our study believe that the task can be a difficult one (Q.1). However, the knowledge provided was adequate for making an informed decision (Q.2). Deciding whether a policy propagates is possible, even if not always trivial (Q.3). Users agree on considering policy propagation a problem that cannot be solved without understanding the details of the data manipulation process (Q.6); therefore, someone with strong technical skills needs to be involved (Q.4, Q.5). The objective of Q.7 was to check whether users were positively involved in the study, assuming that an unengaged person would not put enough effort into expressing their opinion and taking thorough decisions. Question Q.8 focused on the sustainability of the task. Users' feedback on this matter was spread. Our hypothesis is that two data journeys are probably not enough to understand how much this task could scale in a real setting. However, our system can effectively support
Q.10 Who should decide on what policies propagate to the output of a process?
Accuracy analysis
18
D: number of policies of data sources / number of decisions
T_avg: agreement with the system, average of T1 and T2
T_12: agreement between T1 and T2
T_12+: agreement on certain answers
T_1+, T_2+: amount of certain answers
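A hedged sketch of how these agreement ratios could be computed from raw per-policy decisions. The function names and the toy numbers below are ours for illustration; they are not data from the study.

```python
# Each decision is True (propagate) or False (block). Agreement between two
# lists of decisions is the fraction of positions where they coincide.
def agreement(a, b):
    """Fraction of matching decisions between two raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Toy journey with 4 policies (illustrative, not study data):
system = [True, True, False, True]
t1 = [True, True, True, True]
t2 = [True, False, False, True]

T1 = agreement(t1, system)       # team 1 vs system
T2 = agreement(t2, system)       # team 2 vs system
T_avg = (T1 + T2) / 2            # average agreement with the system
T12 = agreement(t1, t2)          # inter-team agreement
```

The T_12+ and T_1+/T_2+ figures would be computed the same way, restricted to (or counting) the answers given with the extreme Likert values ("Certainly yes" / "Absolutely no").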
Permissions
the user in taking a decision (Q.1, Q.9). This feedback shows that policy propagation is a difficult problem, although it can be solved with the right knowledge models. Therefore, a tool supporting this task has good value for users. The last question (Q.10) was meant to understand whether the Data Hub manager could actually decide on policy propagation. It turns out that most of the users think he/she cannot solve the issue alone, but he/she should involve the data owner and the process executor in this task. This conclusion reflects some of the issues raised during the study, which are discussed in Section 7.

6 ACCURACY ANALYSIS
In this section we show how the decisions made by the users compare to the system. The decisions taken by the system are summarized in Table 3. For example, the SCRAPE data journey required 16 policies to be checked, and the system decided to propagate 15 of them: 4 of the 5 permissions and all the prohibitions and duties. Tables 4a-4h summarize the results of our study in a quantitative way. The values are shown in two sets including the full numbers and the computed ratio, considering all the decisions (Tables 4a and 4b), and then split into permissions (Tables 4c and 4d), prohibitions (Tables 4e and 4f), and duties (Tables 4g and 4h). The values are first shown for each one of the user studies (data journey of each team), aggregated for each data journey (average of both teams) and then as totals considering the decisions from all data journeys (at the bottom). The data journeys required from seven to twenty-two policies to be analysed, for a total of seventy-seven decisions. Table 4a shows the number of decisions for each data journey (column D) and how much the teams agreed with the system (Tavg being the average value of the teams on the same data journey).
The agreement with the system is good, distributed differently across the data journeys and the teams, with an average ratio of 0.8. Moreover, this result is supported by the high agreement rate between the two teams (T12 = 0.7). We observe that in more than half of the cases the decisions were made with the same degree of confidence (T12+ = 0.6), and that in 70% of the cases users made a sharp decision about whether a policy would propagate or not (T1+/T2+ total average is 0.6). Inspecting the table, we see that the data journeys showing a lower agreement are FOOD/T1, AVG/T2 and CLEAN/T2. We will discuss these in the next section. The low scores on CLOUD/T12+ and CLOUD/T2+ only show a difference in the degree of confidence of the decisions, which is not especially relevant in this global view,
Table 4: Agreement analysis. D: total number of decisions; T1, T2: agreement between system and each team; Tavg: average agreement between teams and system; T12: agreement between teams; T12+: agreement between teams (only Certainly Yes/Absolutely No answers); T1+, T2+: amount of Certainly Yes/Absolutely No answers. Tables on the left indicate totals, while the ones on the right
although this aspect will be discussed when looking at specific classes of policies.
Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system and the users considering all the decisions is 0.6. In particular, the SCRAPE data journey for T2 shows a low agreement (0.6), also reflected in the number of common sharp decisions (0.4). This is a low score compared with the agreement ratio of prohibitions (0.9) and duties (0.8) that can be observed in Tables 4f and 4h. It is sufficient to consider at this stage how it was much easier to take decisions on prohibitions and duties, while permissions were a greater source of discussions and disagreements with the system. Moreover, decisions about prohibitions and duties appeared to be sharper than the ones about permissions, as both the agreement between the teams (T12) and the choices with strong confidence (T1+, T2+) received higher scores. However, on both types of policies the source of disagreement is on the FOOD data journey. We showed that this is the case (~80% agreement). We complement this data with a statistical analysis based on Cohen's kappa coefficient (CKC), which takes into account the possibility of
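Cohen's kappa discounts the agreement that two raters would reach by chance, via kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement derived from each rater's marginal rates. A sketch for binary propagate/block decisions follows; the data are illustrative, not the study's, and the paper's exact computation may differ in detail.

```python
# Cohen's kappa for two raters over binary propagate(1)/block(0) decisions.
def cohens_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                    # marginal "propagate" rates
    p_e = pa * pb + (1 - pa) * (1 - pb)                # expected chance agreement
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Illustrative decisions (not study data): a team vs the system.
team = [1, 1, 0, 1, 0, 1, 1, 0]
system = [1, 1, 0, 0, 0, 1, 1, 1]
kappa = cohens_kappa(team, system)
```

Here the raw agreement is 0.75, but both raters say "propagate" most of the time, so the kappa value is substantially lower, which is exactly the correction the CKC analysis brings over the plain ratios in Table 4.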
K-CAP 2017, (Submitted, 2017), Austin, Texas USA E. Daga et al.
the user on taking a decision (Q.1, Q.9). This feedback showsthat policy propagation is a di�cult problem, although itcan be solved with the right knowledge models. Therefore, atool supporting this task has good value for users. The lastquestion (Q.10) was meant to understand whether the DataHub manager could actually decide on policy propagation. Itturns out that most of the users think he/she cannot solvethe issue alone, but he/she should involve the data ownerand the process executor in this task. This conclusion reflectssome of the issues raised during the study, that are discussedin Section 7.
6 ACCURACY ANALYSISIn this Section we show how the decisions made by the userscompare to the system. The decisions taken by the systemare summarized in Table 3. For example, the SCRAPE datajourney required to check 16 policies and the system decidedto propagate 15 of them: 4 of the 5 permissions and all theprohibitions and duties. Tables 4a-4h summarize the resultsof our study in a quantitative way. The values are shownin two sets including the full numbers and the computedratio, considering all the decisions (Tables 4a and 4b), andthen split in permissions (Tables 4c and 4d), prohibitions(Tables 4e and 4f), and duties (Tables 4g and 4h). The valuesare first shown for each one of the user study (data journeyof each team), aggregated for each data journey (averageof both teams) and then as totals considering the decisionsfrom all data journeys (at the bottom). The data journeysrequired from seven to twenty-two policies to be analysedfor a total of seventy-seven decisions. Table 4a shows thenumber of decisions for each data journey (column D) andhow much the teams agreed with the system (T
avg
being theaverage value of the teams on the same data journey).
The agreement with the system is good, distributed differently across the data journeys and the teams, with an average ratio of 0.8. Moreover, this result is supported by the high agreement rate between the two teams (Tavg = 0.7). We observe that in more than half of the cases the decisions were made with the same degree of confidence (T12+ = 0.6), and that in 70% of the cases users made a sharp decision about whether a policy would propagate or not (T1+/T2+ total average is 0.6). Inspecting the table we see that the data journeys showing a lower agreement are FOOD/T1, AVG/T2 and CLEAN/T2. We will discuss these in the next section. The low scores on CLOUD/T12+ and CLOUD/T2+ only show a difference in the degree of confidence of the decisions, which is not especially relevant in this global view, although this aspect will be discussed when looking at specific classes of policies.

Table 4: Agreement analysis. D: total number of decisions; T1, T2: agreement between system and each team; Tavg: average agreement between teams and system; T12: agreement between teams; T12+: agreement between teams (only Certainly Yes/Absolutely No answers); T1+, T2+: amount of Certainly Yes/Absolutely No answers. Tables on the left indicate totals, while the ones on the right show the computed ratios.

Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system and the users considering all the decisions is 0.6. In particular, the SCRAPE data journey for T2 shows a low agreement (0.6), also reflected in the number of common sharp decisions (0.4). This is a low score compared with the agreement ratio of prohibitions (0.9) and duties (0.8) that can be observed in Tables 4f and 4h. It is sufficient to consider at this stage that it was much easier to take decisions on prohibitions and duties, while permissions were a greater source of discussions and disagreements with the system. Moreover, decisions about prohibitions and duties appeared to be sharper than the ones about permissions, as both the agreement between the teams (T12) and the choices with strong confidence (T1+, T2+) received higher scores. However, on both types of policies the source of disagreement is the FOOD data journey. We showed that this is the case (~80% agreement). We complement this data with a statistical analysis based on the Cohen's kappa coefficient (CKC), which takes into account the possibility of agreement occurring by chance.
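The agreement statistic mentioned above can be made concrete with a minimal sketch. This is not the authors' code: the decision lists below are hypothetical, and only the standard Cohen's kappa formula (observed vs. chance agreement) is assumed.

```python
# Cohen's kappa for two raters labelling the same items (here: two teams
# deciding "Y"/"N" on whether each policy propagates).
from collections import Counter

def cohens_kappa(team1, team2):
    assert len(team1) == len(team2)
    n = len(team1)
    # Observed agreement: fraction of items both raters labelled the same.
    p_o = sum(a == b for a, b in zip(team1, team2)) / n
    # Expected chance agreement, from each rater's label frequencies.
    c1, c2 = Counter(team1), Counter(team2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in set(team1) | set(team2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions ("Y" = policy propagates) for one data journey.
t1 = ["Y", "Y", "N", "Y", "N", "Y", "Y"]
t2 = ["Y", "N", "N", "Y", "N", "Y", "Y"]
print(round(cohens_kappa(t1, t2), 2))  # 0.7
```

Kappa discounts agreement that would occur by chance, which is why it is a stricter check than the raw agreement ratios reported in Table 4.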
Prohibitions
K-CAP 2017, (Submitted, 2017), Austin, Texas USA E. Daga et al.
Duties
All policies
Feedback: @enridaga
Thematic analysis
• Expected disagreements:
A. system should propagate a policy, it didn't
B. system should block a policy, it didn't
C. system cannot decide: information is not enough
• (C) never occurred: the system has enough information to make a decision!
Example FOOD journey: high disagreement between teams, due to the different interpretation of the output:
(a) an enhanced version of the input (input included in the output)
(b) an independent dataset (input not included in the output)
(More discussion in the paper)
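The FOOD disagreement above comes down to which data-to-data relation a team assumed between input and output. A minimal sketch of the idea, with hypothetical relation names (not the paper's actual rule set):

```python
# Whether a policy propagates from input to output depends on the assumed
# data-to-data relation between them.
PROPAGATION_RULES = {
    # (relation, policy type) -> does the policy reach the output?
    ("enhanced_version_of", "prohibition"): True,   # input included in output
    ("enhanced_version_of", "duty"): True,
    ("independent_dataset", "prohibition"): False,  # input not in output
    ("independent_dataset", "duty"): False,
}

def propagates(relation, policy_type):
    # Default to non-propagation for relations not covered by a rule.
    return PROPAGATION_RULES.get((relation, policy_type), False)

# The two readings of the FOOD output lead to opposite decisions:
print(propagates("enhanced_version_of", "prohibition"))  # True
print(propagates("independent_dataset", "prohibition"))  # False
```

Under reading (a) the input's policies follow the output; under reading (b) they do not, which explains why the teams could both be internally consistent yet disagree.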
Thematic analysis
• Incomplete knowledge: rules missing or wrong, data flows inaccurate, …
• Data reverse engineering: recurring theme (e.g. FOOD)
• Content-dependent decisions: current experiments assumed reasoning at design time, although it is also possible to reason on execution traces
• Dependent policies: permission:modify implies permission:use - but we haven't considered that
• The legal knowledge: users commented on the lack of legal expertise.
– a support tool is useful
– a legal framework on top of the components is theoretically possible, although unlikely in the short term …
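The dependent-policies point above amounts to closing a policy set under implication rules. A minimal sketch, with a hypothetical policy encoding and a single example rule taken from the slide (permission:modify implies permission:use):

```python
# Map each policy to the set of policies it directly implies.
IMPLIES = {
    "permission:modify": {"permission:use"},
}

def close_under_dependencies(policies):
    """Add every policy implied (transitively) by one already present."""
    closed = set(policies)
    frontier = list(closed)
    while frontier:
        p = frontier.pop()
        for implied in IMPLIES.get(p, ()):
            if implied not in closed:
                closed.add(implied)
                frontier.append(implied)
    return closed

print(sorted(close_under_dependencies({"permission:modify", "duty:attribute"})))
# ['duty:attribute', 'permission:modify', 'permission:use']
```

Running such a closure before (or after) propagation would let the reasoner account for implied permissions the study found it missing.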
Conclusions
• The system is as accurate as developers/data managers can be
• variance between the teams is similar to the variance between each team and the system (Cohen's kappa coefficient)
• The task is perceived as difficult, although not impossible, and the system is therefore of good value for users
• The assumption behind the system is correct: there is a fundamental correspondence between the possible data-to-data relations and the way policies are propagated
• (and reuse does not necessarily imply derivation)
Future work
• How to consider the rights of stakeholders other than data publishers (e.g. the process designer)?
• How to assess consistency of processes with respect to policies
• Policy and process knowledge must be accurate and reflect the relevant data-to-data relations. How can we assure that?
Challenge: "They must do it together"
• Need for shared knowledge about "data actions", from jargon to consensus (remodelling, refactoring, extraction, …)
• Negotiation of policies between data providers, processors and