Privacy-aware Regression Modeling of Participatory Sensing Data

Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han

Pallavi Arora


Dec 17, 2015

Hannah Jones
Page 1: Privacy-aware Regression Modeling of Participatory Sensing Data

Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han

Pallavi Arora

Page 2: Outline

Introduction
Problem Formulation: Linear regression, Privacy Filter, Application Server
Model Construction
Privacy Analysis
Case Study
Discussion
Related Work
Conclusion

Page 3: Introduction

Crowdsourcing, also known as participatory sensing: predict statistics or extrapolate from the collected data.
Approach in the paper: private data, public model.
Private data samples: population density + eco-friendly behavior → pollution.
Model (public): predict pollution elsewhere.

Page 4: Linear Regression

Linear regression analyzes the relationship between two variables, X (input) and y (output).
Regression model: $y = X\beta + \epsilon$, where $\beta$ are the regression coefficients and $\epsilon$ is an error term with zero mean and constant variance.
Data (combination of X and y) → model ($\beta$): given X and y, estimate $\beta$.
Model → prediction: given X and $\beta$, predict y.
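A minimal sketch of the two directions on this slide, in Python with NumPy (an assumption; the slides contain no code). The data are synthetic and the coefficients arbitrary, chosen only for illustration: the hypothetical `fit_coefficients` estimates β from X and y by ordinary least squares, and the hypothetical `predict` maps X and β back to y.

```python
import numpy as np

def fit_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Given X (n x k) and y (n,), estimate beta by ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Given X and beta, predict y."""
    return X @ beta

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
true_beta = np.array([2.0, -1.0, 0.5])      # arbitrary illustrative coefficients
noise = rng.normal(scale=0.1, size=n)       # zero-mean, constant-variance error
y = X @ true_beta + noise

beta_hat = fit_coefficients(X, y)           # data -> model
y_hat = predict(X, beta_hat)                # model -> prediction
print(beta_hat)                             # close to true_beta
```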

Page 5: Example

Private data: electricity usage + time of year → energy consumption.
Public model: given a usage pattern, predict energy consumption, helping users save on energy costs.
Other questions such models can answer:
How much gas will a vehicle use on a given route?
How much energy will a household save if it installs motion-activated light controls?
How much weight might a 300 lb person lose by following a particular diet and exercise routine?

Page 6: Sharing private data

Ways to share private data:
Ensure anonymity.
Security mechanisms.
Users modify the data, either by perturbation or by irrecoverably altering it (the approach in the paper).

Page 7: Problem Formulation

Page 8: Problem Formulation

Data (time series): output variables (e.g., household energy consumption) + input variables (good predictors of the output).
Data → neutral features → reconstruction.
Reconstruction: computing the private data from the features. A higher reconstruction error means higher privacy.

Page 9: Assumptions

The model relating user inputs to the outputs is public.
Each data sample collected by an individual is private and may not be revealed.
The models used in the service are linear in the coefficients.
The time-series data can be packed into uncorrelated data samples by aggregation (for example, over time).

Page 10: Design Goals

Minimize the modeling error: accuracy equals the accuracy without alteration (perfect modeling).
Maximize the reconstruction (breach) error: perfect neutrality, i.e., the information available with the shared data equals the information available without it.

Page 11: Privacy Filter

Data segmentation: aggregation over time (sum or average) to remove correlation.
The length of the time interval (a day? a month?) depends on the application. It should be large enough to remove correlation, still result in accurate prediction, and be usable by the participatory sensing application.
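A sketch of segmentation by aggregation over fixed-length time intervals, assuming NumPy and evenly sampled readings. The function name `segment`, the choice to drop an incomplete trailing interval, and the hourly example data are all hypothetical; the slide leaves the interval length to the application.

```python
import numpy as np

def segment(series: np.ndarray, interval: int, how: str = "sum") -> np.ndarray:
    """Aggregate time-ordered rows into consecutive segments of `interval` samples.

    Works for a 1-D series or a 2-D array (rows = time, columns = attributes).
    Trailing samples that do not fill a whole segment are dropped (an assumption).
    """
    n_segments = len(series) // interval
    windows = series[: n_segments * interval].reshape(n_segments, interval, *series.shape[1:])
    if how == "sum":
        return windows.sum(axis=1)
    if how == "mean":
        return windows.mean(axis=1)
    raise ValueError(f"unknown aggregation: {how}")

hourly = np.arange(48, dtype=float)          # two days of hypothetical hourly readings
daily_sum = segment(hourly, 24)              # one value per day
daily_mean = segment(hourly, 24, how="mean")
```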

Page 12: Segmentation

Segmentation produces n data points, each with d input values, that are independent in time.
$y_i$ denotes the value of the output attribute in the $i$th segment, and $x_{ij}$ denotes the value of the $j$th input of segment $i$. Goal: estimate $y_i$ from the inputs $x_{i1}, \ldots, x_{id}$.
Segmentation alone does not guarantee privacy: for example, the monthly appliance usage and indoor temperature of a house show whether the residence is occupied in a particular month.

Page 13: Neutral Features

Notation: input variables, output variable, and predictor variables.
Model of the system relating them.

Page 14: Neutral Features

Neutral features: correlations of the data.
The size of the features is independent of the number of samples n.
Larger n gives more privacy.
Constant: O(n²)
Vector of length k: O(kn²)
Matrix of size k × k: O(k²n²)
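The slide lists the neutral features only by shape. A minimal sketch, assuming (not stated in the transcript) that for a user's n × k predictor matrix W and output vector y the constant, vector, and matrix are $y^\top y$, $W^\top y$, and $W^\top W$; the helper `neutral_features` is hypothetical. It shows that the feature sizes depend on k but not on n.

```python
import numpy as np

def neutral_features(W: np.ndarray, y: np.ndarray):
    """Return (y^T y, W^T y, W^T W) for one user's segmented data (assumed form)."""
    return float(y @ y), W.T @ y, W.T @ W

rng = np.random.default_rng(1)
k = 4
shapes = {}
for n in (10, 1000):
    W = rng.normal(size=(n, k))   # n segments, k predictors
    y = rng.normal(size=n)        # one output value per segment
    c, v, M = neutral_features(W, y)
    shapes[n] = (v.shape, M.shape)

print(shapes)  # both entries are ((4,), (4, 4)), regardless of n
```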

Page 15: The Application Server

The server constructs the regression model with the least squares estimator (LSE).
Let $u_1, \ldots, u_m$ be the m users of the participatory sensing application, each of whom provides their neutral features.
Let

Page 16: The Application Server

Define

Page 17: The Application Server

Model coefficients: computed using only the neutral features, which gives exact model construction.
Regression error: the error can also be computed using only the neutral features.
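A sketch of why least squares can be computed exactly from such features, assuming (not stated in the transcript) that each user u shares $y_u^\top y_u$, $W_u^\top y_u$, and $W_u^\top W_u$. Summing these over users gives the pooled $W^\top W$ and $W^\top y$, so the normal equations $\hat\beta = (\sum_u W_u^\top W_u)^{-1} \sum_u W_u^\top y_u$ give the same coefficients as ordinary least squares on the raw data, and the residual sum of squares expands to $y^\top y - 2\hat\beta^\top W^\top y + \hat\beta^\top W^\top W \hat\beta$. The function names and data are hypothetical; the paper's server is written in C++.

```python
import numpy as np

def user_features(W, y):
    # Per-user neutral features (assumed form): y^T y, W^T y, W^T W
    return float(y @ y), W.T @ y, W.T @ W

def server_model(features):
    """Combine per-user features and solve the normal equations.

    Returns (beta, sse): coefficients and residual sum of squares,
    both computed from the aggregated features only.
    """
    c = sum(f[0] for f in features)
    v = sum(f[1] for f in features)
    M = sum(f[2] for f in features)
    beta = np.linalg.solve(M, v)                # beta = (sum W^T W)^{-1} (sum W^T y)
    sse = c - 2 * beta @ v + beta @ M @ beta    # ||y - W beta||^2 expanded
    return beta, float(sse)

rng = np.random.default_rng(2)
k = 3
true_beta = np.array([1.0, 0.5, -2.0])          # arbitrary illustrative coefficients
users = []
for n_u in (40, 55, 70):                        # three hypothetical users
    W = rng.normal(size=(n_u, k))
    y = W @ true_beta + rng.normal(scale=0.2, size=n_u)
    users.append((W, y))

beta, sse = server_model([user_features(W, y) for W, y in users])

# Reference: ordinary least squares on the pooled raw (private) data
W_all = np.vstack([W for W, _ in users])
y_all = np.concatenate([y for _, y in users])
beta_ref, *_ = np.linalg.lstsq(W_all, y_all, rcond=None)
sse_ref = float(np.sum((y_all - W_all @ beta_ref) ** 2))
```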

Page 18: Privacy Analysis

Reconstruction error.
Reconstruction error of mean values.
Effective reconstruction: reconstruction error < 1.
Privacy-enabling transformation: reconstruction error > 1.
Quantities used: the segmented data, the reconstructed data, and the variance of the reconstructed data.
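The slide's formulas did not survive extraction. One way to obtain a threshold of 1, assumed here rather than taken from the paper, is to divide the attacker's mean squared error by the error of simply reconstructing every value as the mean of the segmented data (i.e., its variance): a value below 1 means the reconstruction beats the mean-value guess. The function name and data are hypothetical.

```python
import numpy as np

def normalized_reconstruction_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """MSE of the reconstruction divided by the MSE of guessing the mean (assumed normalization)."""
    mse = np.mean((original - reconstructed) ** 2)
    baseline = np.mean((original - original.mean()) ** 2)   # error of the mean-value reconstruction
    return float(mse / baseline)

data = np.array([3.0, 5.0, 7.0, 9.0])
print(normalized_reconstruction_error(data, np.full(4, data.mean())))  # 1.0: no better than the mean
print(normalized_reconstruction_error(data, data + 0.5))               # 0.05: effective reconstruction
```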

Page 19: Privacy Enabling Properties

Optimal reconstruction: find the values $Y_u$ and $W_u$ that produce the given transformed matrices $\rho_u$, $\nu_u$, $\Theta_u$ while maximizing the joint probability of observing such values.
The probability of observing the values is known to the attacker.
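A hedged formalization of this optimization, assuming (not confirmed by the transcript) that $\rho_u$, $\nu_u$, $\Theta_u$ are the scalar $Y_u^\top Y_u$, the vector $W_u^\top Y_u$, and the matrix $W_u^\top W_u$ of user $u$, with $W_u$ the $n \times k$ predictor matrix and $Y_u$ the output vector:

```latex
\begin{aligned}
(\hat{Y}_u, \hat{W}_u) = \arg\max_{Y_u,\, W_u} \;& \Pr(Y_u, W_u) \\
\text{subject to} \;& Y_u^\top Y_u = \rho_u, \\
& W_u^\top Y_u = \nu_u, \\
& W_u^\top W_u = \Theta_u .
\end{aligned}
```

Under this reading the constraints are quadratic equalities (for example, $Y_u^\top Y_u = \rho_u$ is a sphere), so the feasible set is not convex.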

Page 20: Inaccuracy and Inefficiency of Reconstruction

Constraints and data points: if the number of data points is smaller than the number of constraints, reconstruction is 100% accurate (0% privacy).
As n → ∞, it becomes difficult to reconstruct the private data from the optimal solution.
The constraints are not affine, so reconstruction is a non-convex optimization problem: NP-hard, with running time exponential in the number of variables.
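A small numerical illustration of why many different datasets satisfy the same constraints, assuming the features are $y^\top y$, $W^\top y$, and $W^\top W$ (an assumption; the transcript does not give their formulas). Multiplying the data by any n × n orthogonal matrix Q leaves all three unchanged, because $Q^\top Q = I$. This does not reproduce the paper's maximum-likelihood reconstruction or its KNITRO setup; it only shows that the features alone do not pin down the data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 3
W = rng.normal(size=(n, k))   # hypothetical private predictors
y = rng.normal(size=n)        # hypothetical private outputs

# Random n x n orthogonal matrix from the QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
W_alt, y_alt = Q @ W, Q @ y

# Different data, identical features: (QW)^T (QW) = W^T Q^T Q W = W^T W, and so on
same = (
    np.allclose(W_alt.T @ W_alt, W.T @ W)
    and np.allclose(W_alt.T @ y_alt, W.T @ y)
    and np.isclose(y_alt @ y_alt, y @ y)
)
print(same, np.allclose(W_alt, W))   # True False
```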

Page 21: Conditions to Protect Privacy

Assumption: the maximum likelihood is obtained when the solution is close to the expected value; n is also known.
The reconstruction is computed with the KNITRO non-linear solver.

Page 22: Best value of n?

The number of constraints equals the number of variables.

Simulation (figure): n > k gives high reconstruction error; n < k gives a single feasible solution.

Page 23: Correlation

Vertical correlation: correlation among different attributes.
Horizontal correlation: correlation within a single attribute.

Page 24: Conjecture

If n > 2k, the reconstruction error is ≈ 1.

Page 25: Case Study

Predict fuel-efficient routes.
Compare a white-noise perturbation technique with the proposed method.

Page 26: Case Study

Client (C++):
• Data trace file: location trace from GPS.
• Configuration file: unique application ID, segmentation interval, segmentation attributes (e.g., time), Euclidean distance between values, and a predictor function mapping X → W.
• Feature matrices are transferred as XML to the server.

Server (C++):
• List of models with unique application IDs.
• Creates aggregation matrices.
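The paper's client and server are in C++ and exchange feature matrices as XML, but the transcript gives no schema. The following Python sketch, using the standard-library `xml.etree.ElementTree`, shows one possible round trip; the element names (`features`, `constant`, `vector`, `matrix`), the `appId` attribute, the helper names, and the application ID `"fuel-routes"` are all hypothetical.

```python
import xml.etree.ElementTree as ET
import numpy as np

def features_to_xml(app_id: str, c: float, v: np.ndarray, M: np.ndarray) -> str:
    """Client side: serialize one user's feature matrices (hypothetical schema)."""
    root = ET.Element("features", attrib={"appId": app_id})
    ET.SubElement(root, "constant").text = repr(float(c))
    ET.SubElement(root, "vector").text = " ".join(repr(float(x)) for x in v)
    matrix = ET.SubElement(root, "matrix", attrib={"rows": str(M.shape[0]), "cols": str(M.shape[1])})
    matrix.text = " ".join(repr(float(x)) for x in M.ravel())
    return ET.tostring(root, encoding="unicode")

def xml_to_features(payload: str):
    """Server side: parse the payload back into (app_id, c, v, M)."""
    root = ET.fromstring(payload)
    c = float(root.find("constant").text)
    v = np.array([float(x) for x in root.find("vector").text.split()])
    m = root.find("matrix")
    M = np.array([float(x) for x in m.text.split()]).reshape(int(m.get("rows")), int(m.get("cols")))
    return root.get("appId"), c, v, M

# Usage with tiny hypothetical data
W = np.arange(6.0).reshape(3, 2)
y = np.array([1.0, 2.0, 3.0])
payload = features_to_xml("fuel-routes", float(y @ y), W.T @ y, W.T @ W)
app_id, c, v, M = xml_to_features(payload)
```

The server can then group received payloads by `appId` before summing the matrices, mirroring the "list of models with unique application IDs" above.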

Page 27: Case Study

Data: 16 users (different cars) over 3 months; geo-tagged engine sensor measurements; 650 segments of about 2 miles each.
Inputs:
$w_1 = m(S_T + v\,T_L)$, where m and v are the mass and velocity of the vehicle, $S_T$ is the number of stop signs, and $T_L$ is the number of traffic lights
$w_2 = m v^2$
$w_3 = m$
$w_4 = A v^2$, where A is the frontal area of the car
Output: fuel consumption.

Page 28: Case Study

Reconstruction error

Page 29: Case Study

Dependence on the number of samples: high reconstruction error for n > 2k.

Page 30: Case Study

Page 31: Related Work

Randomization (perturbation) and differential privacy: introduce error in modeling.
k-anonymity: loss of useful information.
Distributed privacy preservation: horizontally or vertically partitioned data, aggregate features; gives the user fine-grained control to protect their privacy.
Cryptographic techniques (homomorphic encryption): computationally expensive, limited scope.

Page 32: Conclusion

The regression model is the same as the one built from the private data.
Derives a safe number of samples.
Studies privacy: neutral features give high reconstruction error.
The quantification of privacy does not capture all privacy breaches.
A narrow distribution of the original data, or higher correlation, makes reconstruction easier.
Privacy cannot be guaranteed in theory.