Privacy Framework for RDF Data Mining Master’s Thesis Project Proposal By: Yotam Aron.
Post on 04-Jan-2016
Overview
• Motivation and Goal
• Background
• Proposed Solution and Design
• Example
• Conclusion
Motivation
• Data mining continues to become more widespread.
  ◦ Useful for research, public policy, etc.
• Want to maintain privacy of participants in the database.
• Little work has been done on privacy for semantic web data.
Previous Work
• Anonymization: K-Anonymity [1]
• Differential privacy systems: PINQ [2], AIRAVAT [3]
• Drawbacks:
  ◦ Do not apply to semantic web data.
  ◦ Do not support SPARQL.
Goal
• Develop a system that protects dataset participants’ personal data when it is queried through SPARQL.
• Integrates well with existing SPARQL endpoints.
• Relatively easy for both the user and the administrator to use.
Background
• Rule-based Privacy Policies in AIR
• Differential Privacy
Rule-based Privacy Policies in AIR [4]
• Rules define patterns in a SPARQL query.
• If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.
AIR Example [5]
AIR Policy (extract):
    air:if { :W s:TriplePattern :T .
             :T log:includes { :X type:F :V } . } ;
    air:then [ air:description ("type:F was selected in " q:QUERY) ;
               air:assert { q:QUERY air:non-compliant-with q:Policy4 . } ] .
Query:
    SELECT ?s WHERE { ?s type:F ?p }
AIR will show that the query is non-compliant with Policy4.
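The idea of matching a rule against the triple patterns of a query can be sketched in a few lines of Python. This is only an illustration, not AIR itself: the names `FORBIDDEN_PREDICATES`, `extract_triple_patterns`, and `check_query` are hypothetical, and the parser is a toy that assumes a single flat WHERE group with patterns separated by `.`.

```python
# Hypothetical policy: predicates that a query may not select (mirrors type:F).
FORBIDDEN_PREDICATES = {"type:F"}


def extract_triple_patterns(query):
    """Return (subject, predicate, object) tuples from a simple WHERE body.

    Toy parser: assumes one flat group and patterns separated by '.'.
    It is not a SPARQL parser.
    """
    body = query[query.index("{") + 1 : query.rindex("}")]
    patterns = []
    for chunk in body.split("."):
        parts = chunk.split()
        if len(parts) == 3:
            patterns.append(tuple(parts))
    return patterns


def check_query(query, forbidden=FORBIDDEN_PREDICATES):
    """Return (compliant, matched_predicates) for the given query string."""
    matched = [p for (_, p, _) in extract_triple_patterns(query) if p in forbidden]
    return (not matched, matched)


compliant, matched = check_query("SELECT ?s WHERE {?s type:F ?p}")
print(compliant, matched)
```

A real deployment would hand the query to the AIR reasoner with the policy file; the sketch only shows the shape of the decision (pattern matched, so the query is flagged).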
Differential Privacy Overview
• Minimize probability of privacy breach.
• Maximize statistical accuracy.
• The definition requires that, given two similar datasets, a query function on those two datasets gives similar results with high probability.
• Makes no assumptions about the underlying dataset.
Differential Privacy
• Definition: We say a randomized computation M provides ɛ-differential privacy if, for any two data sets A and B and any set of possible outputs S ⊆ Range(M),
    Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|),
  where |A ⊕ B| is the number of records in which A and B differ (the size of their symmetric difference).
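Differential Privacy in Practice
• Each user is given an ɛ value that cannot be exceeded.
• Each query qi has some value ɛi that sets its noise level. In total, the user’s queries must satisfy the property Σi ɛi ≤ ɛ.
• Noise (usually Laplace), which depends on the aggregate function, is added with a variance that grows as ɛi shrinks; for the standard Laplace mechanism with sensitivity Δf, the variance is 2(Δf/ɛi)².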
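A minimal sketch of the per-user bookkeeping, assuming the usual sequential-composition rule that the ɛi values of a user's queries may sum to at most that user's ɛ. The class `PrivacyBudget` and the helper `laplace_variance` are hypothetical names introduced for illustration; the variance formula is the standard one for Laplace noise with scale sensitivity/ɛi.

```python
class BudgetExceeded(Exception):
    """Raised when a query would push a user past the allotted epsilon."""


class PrivacyBudget:
    """Tracks how much of a user's total epsilon has been spent."""

    def __init__(self, total_epsilon):
        self.total = float(total_epsilon)
        self.spent = 0.0

    def can_afford(self, query_epsilon):
        # Small tolerance guards against floating-point round-off.
        return self.spent + query_epsilon <= self.total + 1e-12

    def charge(self, query_epsilon):
        if query_epsilon <= 0:
            raise ValueError("query epsilon must be positive")
        if not self.can_afford(query_epsilon):
            raise BudgetExceeded("query would exceed the user's epsilon")
        self.spent += query_epsilon


def laplace_variance(sensitivity, query_epsilon):
    """Variance of Laplace noise with scale sensitivity / query_epsilon."""
    scale = sensitivity / query_epsilon
    return 2 * scale ** 2


budget = PrivacyBudget(2.0)
budget.charge(1.0)             # first query is accepted
print(budget.can_afford(1.5))  # a second query at 1.5 would not fit
```

Halving ɛi quadruples the variance, which is the accuracy cost of asking for stronger privacy on a single query.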
Limitations of Differential Privacy
• Only statistical data is protected.
• High variance in the data yields poor query results.
• Theory is not always perfect in practice.
  ◦ Assumes no collusion among users.
  ◦ Covert channel attacks. [6]
  ◦ What value of ɛ to choose?
Example, No DP
    Name     Salary
    Alice    31,000
    Bob      47,000
    Charlie  20,000
    David    21,000
SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 2
Example, No DP (David removed)
    Name     Salary
    Alice    31,000
    Bob      47,000
    Charlie  20,000
SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 1
Big difference in answers!
Example, With DP
    Name     Salary
    Alice    31,000
    Bob      47,000
    Charlie  20,000
    David    21,000
SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 2 + noise ≈ 2 (with high probability)
Example, With DP (David removed)
    Name     Salary
    Alice    31,000
    Bob      47,000
    Charlie  20,000
SELECT COUNT(Name) WHERE (Salary < 25,000)
Result: 1 + noise ≈ 2 (with high probability)
With high probability, the two sets of records are indistinguishable!
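The four example slides can be reproduced with a small noisy-count sketch. It assumes the filter is on the Salary column with a 25,000 threshold (the reading under which the tables give the exact counts 2 and 1), and it uses the standard Laplace mechanism with sensitivity 1 for a count. The function names and the seeded generator are illustrative choices, not part of the proposal.

```python
import random

WITH_DAVID = {"Alice": 31000, "Bob": 47000, "Charlie": 20000, "David": 21000}
WITHOUT_DAVID = {"Alice": 31000, "Bob": 47000, "Charlie": 20000}


def under_25k(salary):
    return salary < 25000


def true_count(table, predicate):
    """Exact count of rows whose salary satisfies the predicate."""
    return sum(1 for salary in table.values() if predicate(salary))


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(table, predicate, epsilon, rng):
    """Count plus Laplace noise; a count has sensitivity 1, so scale = 1/epsilon."""
    return true_count(table, predicate) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random()  # pass a seed here for reproducible runs
print(true_count(WITH_DAVID, under_25k), true_count(WITHOUT_DAVID, under_25k))
print(noisy_count(WITH_DAVID, under_25k, 1.0, rng))
print(noisy_count(WITHOUT_DAVID, under_25k, 1.0, rng))
```

Because the two noisy answers are drawn from heavily overlapping distributions, a single released value does not reveal with confidence whether David's record was present.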
Practical Consequences of DP
• An individual’s inclusion in the dataset is unlikely to be a privacy risk.
• The answers to the queries can still be useful.
Achieving Differential Privacy in RDF
• Current techniques for differential privacy were developed for relational databases.
• As a first approximation, reduce the triple-store to a relational database.
• Improve the mechanism as the project progresses.
Example of RDF-RDBS Reduction
    :Person1 foaf:name "Alice" ;
             foaf:member :DIG ;
             foaf:age "21" ;
             foaf:knows :Person2, :Person3 .
    :Person2 foaf:name "Bob" ;
             foaf:member :DIG ;
             foaf:knows :Person3 .
    :Person3 foaf:name "Charlie" ;
             foaf:age "22" .

    ID       foaf:name  foaf:member  foaf:knows          foaf:age
    Person1  "Alice"    DIG          [Person2, Person3]  "21"
    Person2  "Bob"      DIG          [Person3]           None
    Person3  "Charlie"  None         None                "22"
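One way to picture the reduction is to group triples by subject and turn each predicate into a column. The sketch below is an assumption about how such a reduction might look, not the proposal's implementation; `triples_to_rows` is a hypothetical helper, and the rule "a predicate that is multi-valued for any subject becomes a list column" is chosen only so the output matches the table in the example.

```python
TRIPLES = [
    ("Person1", "foaf:name", "Alice"),
    ("Person1", "foaf:member", "DIG"),
    ("Person1", "foaf:age", "21"),
    ("Person1", "foaf:knows", "Person2"),
    ("Person1", "foaf:knows", "Person3"),
    ("Person2", "foaf:name", "Bob"),
    ("Person2", "foaf:member", "DIG"),
    ("Person2", "foaf:knows", "Person3"),
    ("Person3", "foaf:name", "Charlie"),
    ("Person3", "foaf:age", "22"),
]


def triples_to_rows(triples):
    """Group triples by subject into one row (dict) per subject."""
    grouped = {}   # subject -> predicate -> list of objects
    columns = []   # predicates in order of first appearance
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
        if p not in columns:
            columns.append(p)

    # Predicates with several values for some subject become list columns.
    multi = {p for props in grouped.values()
             for p, values in props.items() if len(values) > 1}

    rows = []
    for s, props in grouped.items():
        row = {"ID": s}
        for p in columns:
            values = props.get(p)
            if values is None:
                row[p] = None          # missing value, shown as None in the table
            elif p in multi:
                row[p] = values        # keep every value as a list
            else:
                row[p] = values[0]     # single-valued column
        rows.append(row)
    return rows


for row in triples_to_rows(TRIPLES):
    print(row)
```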
Proposed Solution
• SPARQL Privacy Insurance Module (SPIM)
• Build a layer between the user and the endpoint.
• Integrate both AIR and differential privacy.
• Integrate a credential-checking system.
• Modify an existing differential privacy framework for use with triple-stores.
Contributions
• Complete privacy protection for triple-stores.
• Differential privacy sensitivity for SPARQL 1.1 aggregate functions, including count, sum, avg, min, and max.
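To make "sensitivity" concrete, here is a hedged sketch of textbook sensitivity bounds for some of these aggregates, assuming each individual contributes one value and that values are known to lie in a range [lower, upper]. These bounds are standard worst-case figures, not the sensitivities the thesis will derive; avg is deliberately left out, since one common approach is to build it from a noisy sum and a noisy count rather than bound it directly. The function name and signature are hypothetical.

```python
def sensitivity(aggregate, lower=None, upper=None):
    """Worst-case change in the aggregate when one record is added or removed.

    Assumes one value per individual, each within [lower, upper].
    """
    if aggregate == "count":
        return 1.0  # one record changes a count by at most 1
    if lower is None or upper is None:
        raise ValueError("value bounds are required for " + repr(aggregate))
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if aggregate == "sum":
        # One record shifts the sum by at most the largest absolute value.
        return float(max(abs(lower), abs(upper)))
    if aggregate in ("min", "max"):
        # Crude upper bound: the result can move across the whole range.
        return float(upper - lower)
    raise ValueError("no sensitivity rule for " + repr(aggregate))


print(sensitivity("count"))
print(sensitivity("sum", lower=0, upper=500))
```

The range-wide bound for min and max shows why those aggregates tend to need much more noise than counts.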
System Overview
[Diagram of components: User Interface, TAAC Credential Checking, SPIM Privacy Module, AIR Rule Based Privacy, Differential Privacy Module, Policy Files, User Data, Service Description, SPARQL Endpoint]
• TAAC will:
  ◦ Verify that the user has permission to access the triple-store.
  ◦ Send the central module data about the user.
• SPIM:
  ◦ Controls the order of privacy operations.
  ◦ Interfaces with the SPARQL endpoint.
• AIR:
  ◦ Reasoner that uses rule-based policies to check queries for privacy hazards.
  ◦ Extracts information for differential privacy.
• Policy Files:
  ◦ Contain the rules for AIR.
• Differential Privacy Module:
  ◦ Checks query limits (based on ɛ use).
  ◦ Applies noise to statistical data.
• User Data:
  ◦ Contains user ɛ data.
• Service Description:
  ◦ Contains information to be used for the addition of noise.
• Miscellaneous:
  ◦ Interface to SPARQL Endpoint
  ◦ Transaction File
  ◦ Improved Differential Privacy Output
  ◦ Service Description Generator
• Potential Extensions:
  ◦ Robustness against attacks
  ◦ Concurrency
  ◦ Optimization for large systems
  ◦ Customizable UI
  ◦ Accountability
Sample Scenario
• Triple-store data mining in biotechnological applications.
• Biofirm provides data about hospitals in the US.
• Alice is a PhD student at MIT.
• Alice would like to query Biofirm’s database for research purposes. She received her permissions yesterday and is logging in for the first time.
Preprocessing
• Biofirm installs SPIM and runs the service description generation code.
  ◦ May need to create the correct interface.
• Makes sure the UI is accessible online.
Sample Compliant Query
• Alice would like to know the total number of visits that Boston hospitals received.
    SELECT (SUM(?s) AS ?people) WHERE {
      ?h a biofirm:Hospital .
      ?h biofirm:visits ?s .
      ?h biofirm:location geo:Boston .
    }
• Epsilon value: 1.0
• Alice enters the query into the provided user interface.
• TAAC ensures that Biofirm has given Alice access to its triple-store.
• The query request arrives at the SPIM central module.
• Policyrunner is called upon to check the query for triple patterns that are in violation.
• No violations are found.
• Since this is Alice’s first time, AIR extracts what type of permissions Alice has.
• SPIM creates a profile for Alice.
  ◦ Gives her an ɛ value (suppose it is 2.0).
  ◦ Stores it in the triple-store.
• SPIM extracts which variables will yield statistical results and will have differential privacy applied.
• The Differential Privacy module ensures that running the query will not push Alice’s total ɛ use past her given epsilon value.
• This is Alice’s first query, so none of her epsilon value of 2.0 has been used, and the epsilon for this query is 1.0. Everything looks good.
• The query is sent to the endpoint.
• Results are received.
• The Differential Privacy module adds noise to the appropriate fields and updates epsilon values.
• SPIM is ready to return the results.
• Alice receives the results.
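The walkthrough above can be condensed into a short orchestration sketch. Everything in it is a stand-in introduced for illustration: `handle_query` and its parameters are hypothetical, the credential and policy checks are reduced to set lookups and a crude substring test, and the endpoint is any callable that returns a number. It is meant to show the order of operations, not how SPIM, TAAC, or AIR are implemented.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query's epsilon exceeds the user's remaining epsilon."""


def handle_query(user, query, query_epsilon, *, authorized_users,
                 forbidden_predicates, budgets, default_epsilon,
                 endpoint, sensitivity, rng):
    # 1. Credential check (stand-in for TAAC).
    if user not in authorized_users:
        raise PermissionError(user + " has no access to the triple-store")
    # 2. Policy check (stand-in for AIR; crude substring match).
    if any(pred in query for pred in forbidden_predicates):
        raise ValueError("query is non-compliant with policy")
    # 3. Create a profile with a fresh epsilon on the user's first visit.
    remaining = budgets.setdefault(user, default_epsilon)
    # 4. Refuse the query if it would exceed the remaining epsilon.
    if query_epsilon > remaining:
        raise BudgetExceeded("not enough epsilon left for this query")
    # 5. Send the query to the endpoint and receive the exact result.
    exact = endpoint(query)
    # 6. Add Laplace noise (difference of two exponentials) and update epsilon.
    scale = sensitivity / query_epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    budgets[user] = remaining - query_epsilon
    return exact + noise


budgets = {}
result = handle_query(
    "alice", "SELECT (SUM(?s) AS ?people) WHERE { ?h biofirm:visits ?s . }", 1.0,
    authorized_users={"alice"}, forbidden_predicates={"type:F"},
    budgets=budgets, default_epsilon=2.0,
    endpoint=lambda q: 1000.0,   # fake endpoint returning a made-up total
    sensitivity=500.0,           # assumed cap on visits per hospital
    rng=random.Random(),
)
print(result, budgets)
```

After this call Alice has 1.0 of her 2.0 left, so one more query at epsilon 1.0 would be accepted and a third would be refused.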
Summary
• System will combine rule-based privacy with differential privacy.
• Develop differential privacy techniques for semantic web data.
• Make the privacy module client and administrator friendly.
References
[1] K-Anonymity: http://spdp.dti.unimi.it/papers/k-Anonymity.pdf
[2] PINQ: http://research.microsoft.com/pubs/80218/sigmod115-mcsherry.pdf
[3] AIRAVAT: http://www.cs.utexas.edu/~shmat/shmat_nsdi10.pdf
[4] AIR: http://dig.csail.mit.edu/TAMI/2008/12/AIR/
[5] AIR Policy Example: http://dig.csail.mit.edu/2009/IARPA-PIR/usecase1/generic-policies.n3
[6] Differential Privacy Under Fire: http://www.usenix.org/events/sec11/tech/full_papers/Haeberlen.pdf