
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics


Abstract—In the research of location privacy protection, most existing methods are based on traditional anonymization, fuzzification, and cryptography technology, and have achieved little success in the big data environment; for example, sensor networks contain sensitive information that must be appropriately protected. Current trends such as "Industrie 4.0" and the Internet of Things (IoT) generate, process, and exchange vast amounts of security-critical and privacy-sensitive data, which makes them attractive targets of attacks. However, previous methods overlooked the privacy protection issue, leading to privacy violations. In this paper, we propose a location privacy protection method that satisfies the differential privacy constraint to protect location data privacy and to maximize the utility of data and algorithm in the Industrial Internet of Things. In view of the high value and low density of location data, we combine utility with privacy and build a multilevel location information tree model. Furthermore, the exponential mechanism of differential privacy is used to select data according to the accessing frequency of tree nodes. Finally, the Laplace mechanism is used to add noise to the accessing frequency of the selected data. As shown in the theoretical analysis and the experimental results, the proposed strategy achieves significant improvements in security, privacy, and applicability.

Index Terms—Differential privacy, location privacy protection, location privacy tree, Internet of Things.

I. INTRODUCTION

The development of information technology has accumulated a great deal of data for today's digital systems. Big data is a very important research and development resource, and the demand for data publishing, sharing, and analysis is also growing rapidly. People pay more and more attention to the security of data protection, and governments, enterprises, and individuals are also improving their understanding of privacy protection.

This work was funded by the National Natural Science Foundation of China (61772282, 61373134, 61772454, and 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). (Corresponding author: Jin Wang.)

Chunyong Yin is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jinwen Xi is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Ruxia Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jin Wang is with the College of Information Engineering, Yangzhou University, Yangzhou 225127, China (e-mail: [email protected]).

Body sensor networks (BSNs), as a special application of wireless sensor networks (WSNs) [1], are deployed on the surface of bodies to periodically monitor physical conditions. For a typical social participatory sensing application, it is important to motivate participation; at the same time, the participatory sensing process should not disclose the private information of any participating party (the private data) or of the community (patterns, distribution, etc.). Therefore, it is essential to develop a privacy-protecting data strategy for the protection of the users and the community [2].

Devices in the Internet of Things (IoT) generate, process, and exchange vast amounts of security and safety-critical data as well as privacy-sensitive information, and hence are appealing targets of various attacks. To ensure the correct and safe operation of IoT systems, it is crucial to assure the integrity of the underlying devices, especially their code and data privacy, against malicious modification [3].

The privacy threats of the Industrial Internet of Things (IIoT) [4] can be simply divided into two categories: privacy threats based on data and privacy threats based on location. Data privacy issues mainly refer to the leakage of secret information in the process of data acquisition and transmission in the Internet of Things.

Location privacy is an important part of privacy protection in the Internet of Things. It mainly refers to the location privacy of each node in the Internet of Things and the location privacy involved in providing various location services, including RFID reader location privacy, RFID user location privacy, sensor node location privacy, and privacy issues in location-based services [3], [5].

Usually, the data collected, aggregated, and transmitted in sensor networks contain personal and sensitive information, which directly or indirectly reveals the condition of a person. If the data are not properly preserved and are exposed to the public, privacy will be destroyed. Therefore, protecting the privacy of sensitive data is greatly important [6], [7].

Location data involves moving objects, spatial coordinates, and current time, and has unique features that distinguish it from other data: it is discrete and of high value. Before the concept of big data, most privacy protection methods focused on a small

Chunyong Yin, Jinwen Xi, Ruxia Sun, Jin Wang, Member, IEEE

Location Privacy Protection based on Differential Privacy Strategy for Big Data in Industrial Internet-of-Things


number of non-positional data. There are some limitations in the protection of location data privacy in big data, for two main reasons: (1) the fusion of multiple data sources in big data makes traditional anonymity [8], [9], [10] and fuzzification technology difficult to apply effectively to location privacy protection [11]; (2) traditional cryptography technology has little effect on the real-time analysis required by big data.

In summary, location privacy protection faces great challenges for big data in IIoT. Therefore, we propose a more rigorous LPT-DP-k location privacy protection method based on differential privacy strategy for big data in sensor networks.

Our specific contributions mainly include:

(I) We introduce a tree structure, called the LPT (location privacy tree), to represent the location data in sensor networks, according to the characteristics and retrieval difficulty of location data.

(II) The differential privacy strategy is suitable for location privacy protection because it is insensitive to background knowledge, and the DP-k (differential privacy-k) model has a better protection effect. The Laplace and exponential mechanisms are the main implementation mechanisms of differential privacy; they quantify the degree of privacy protection by allocating the privacy budget and are relatively reliable and rigorous.

(III) We conduct extensive experiments on real-world datasets, which show that the proposed location privacy protection method can protect users' location privacy without significantly affecting availability, security, and effectiveness.

Section II presents the necessary background and related work. Section III introduces the preliminary knowledge. Section IV details the proposed location privacy protection method. The real-world datasets and experimental results are presented in Section V, and conclusions are given in Section VI.

II. RELATED WORK

In the process of data mining and data publishing, the privacy protection of location data mainly involves two aspects: privacy model and utility.

A. Privacy Model

In recent years, with the widespread application of location services and of data mining and publishing, location privacy protection models for location services can be divided into two categories: one is the traditional anonymous model based on grouping; the other is the differential privacy model, which is independent of the attacker's background knowledge.

(1) Anonymous model based on grouping

The traditional anonymous model based on grouping plays an important role in location privacy protection. Samarati et al. [12] first proposed the k-anonymity method, and a large number of methods based on k-anonymity followed [13], [14], [15]. Previous research [15], [16] shows that anonymous methods alone do not provide good protection for a wide range of data. In [17], an encryption-based privacy protection method is proposed, which can completely protect the privacy of data and prevent data leakage in the process of location service, but the availability of the data is insufficient. The development of traditional location privacy protection technology has gone through three stages: the "informed and consent" method proposed in [18] was first extended to handle anonymous processing of a single location, and then to handle anonymous processing of a user's trajectory data. Heuristic privacy measurement, probabilistic inference, and private-information-retrieval-based technologies are common methods to protect location privacy. Heuristic privacy measurement methods, such as k-anonymity, t-closeness [19], m-invariance [20], and l-diversity, mainly protect users who are not in a strict privacy protection environment. However, these three kinds of methods assume a unified attack model and protect location data under the premise that the attacker has accumulated a certain amount of background knowledge. As attackers grasp more background knowledge, these methods cannot effectively protect the user's location data privacy; to address the lack of protection for relational privacy, a corresponding protection method is proposed in [21].

(2) Differential privacy model

The differential privacy model is a strong privacy concept that is completely independent of the attacker's background knowledge and computing ability, and it has become a research hotspot in recent years. Compared with the traditional privacy protection model, differential privacy has unique advantages. Firstly, the model assumes that the attacker has the maximum background knowledge. Secondly, differential privacy has a solid mathematical foundation, a strict definition of privacy protection, and a reliable quantitative evaluation method. In recent years, the differential privacy model has been widely applied in privacy protection. In particular, the FM (functional mechanism) is introduced in [22], in which an objective function of an $\varepsilon$-differential-privacy disturbance optimization problem is used to protect privacy. The Diff-FPM (privacy-preserving frequent pattern mining) algorithm is proposed in [23], [24], [25] and combined with MCMC (Markov chain Monte Carlo), which provides privacy protection and maintains high data availability while satisfying $(\varepsilon, \delta)$-differential privacy conditions. The PrivBasis and SmartTrunc methods proposed in [26] adopt the differential privacy model in the mining of frequent itemsets, ensuring the privacy and utility of data analysis and anonymity. The DiffP-C4.5 and DiffGen algorithms introduced in [27] combine differential privacy with decision trees and other data structures to maintain a balance between data privacy and availability.

In a word, differential privacy is an effective privacy protection mechanism, which protects the user’s location privacy while keeping enough useful information for data analysis.

B. Utility Maximization

In the process of location service, data mining and data publishing need to protect location privacy while providing enough information for data analysis. Therefore, data utility is the core issue that needs attention. Reference [28] uses compressive sensing theory to propose a perception mechanism


to solve the publishing problem of statistical results, which can effectively alleviate the insufficiency of data utility, but it destroys the links among data. The DP-topk method is proposed in [29], which has a more rigorous definition of effectiveness. However, this method ignores the relations among transaction data, and because it processes data individually with poor efficiency, the availability of the algorithm is not high. To maximize the utility of the results and meet the requirement of location privacy protection, this paper proposes the LPT-DP-k algorithm, a more rigorous differential privacy protection method that not only guarantees high availability of the data but also enhances the usability of the algorithm.

III. PRELIMINARY KNOWLEDGE

Differential privacy is a common privacy protection framework supported by solid mathematical theory, which can provide privacy protection even when the attacker grasps the largest background knowledge.

Definition 1 (Differential privacy). Suppose there is a random algorithm $M$ and $P_M$ is the set of all possible outputs of $M$. For any given adjacent data sets $D$ and $D'$ (there is at most one different record between them, $|D \triangle D'| \le 1$) and any subset $S_M \subseteq P_M$, the algorithm $M$ satisfies $\varepsilon$-differential privacy protection if

$$\Pr[M(D) \in S_M] \le \exp(\varepsilon) \cdot \Pr[M(D') \in S_M], \tag{1}$$

in which $\Pr[\cdot]$ represents the randomness of the algorithm $M$ on the data sets $D$ and $D'$.

Definition 2 (Sensitivity). The differential privacy protection method defines two kinds of sensitivity, called global sensitivity and local sensitivity. Suppose there is a query function $f: D \to R^d$ whose input is a data set and whose output is a $d$-dimensional real vector. For any adjacent data sets $D$ and $D'$,

$$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_1 \tag{2}$$

is called the global sensitivity of the function $f$; $\Delta f$ represents the maximum change in the output results, and $\| f(D) - f(D') \|_1$ is the 1-order norm distance between $f(D)$ and $f(D')$.
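As a concrete check of the global sensitivity in Definition 2, the short sketch below (illustrative names, not from the paper) evaluates a counting query on two adjacent data sets; removing one record changes the count by at most 1, so $\Delta f = 1$.

```python
# Illustrative sketch of Definition 2 (not from the paper): the global
# sensitivity of a counting query f(D) = "number of records in D that
# visit a given location". Adjacent data sets differ in one record, so
# the count changes by at most 1 and Delta_f = 1.

def count_visits(dataset, location):
    # f(D): how many records in D contain the given location id.
    return sum(1 for record in dataset if location in record)

D = [{"1", "2"}, {"1"}, {"3"}, {"1", "2", "3"}]
D_adj = D[:-1]  # adjacent data set: the last record removed

delta_f = abs(count_visits(D, "1") - count_visits(D_adj, "1"))
print(delta_f)  # → 1
```

This is the sensitivity value used to calibrate the Laplace noise scale in Definition 3.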

Definition 3 (Implementation mechanism). The Laplace mechanism and the exponential mechanism [30] are the two most basic implementation mechanisms of differential privacy protection. In this paper, the Laplace mechanism is used to add noise that obeys the Laplace distribution to realize differential privacy. Assume the privacy protection algorithm $f$ is based on the Laplace mechanism; the added noise obeys the Laplace distribution with scale $\lambda$ and mean 0. The probability density function is

$$\Pr[x \mid \lambda] = \frac{1}{2\lambda} \exp\left(-\frac{|x|}{\lambda}\right), \tag{3}$$

in which $x$ represents the specific variable and $\lambda = \Delta f / \varepsilon$.

If the algorithm $f$ selects $r$ from the output range $O$ with probability proportional to $\exp\left(\frac{\varepsilon q(D, r)}{2 \Delta q}\right)$ and exports $r$, then the algorithm $f$ provides $\varepsilon$-differential privacy protection. The concrete formula is as follows:

$$A(D, q) = \left\{ r : \Pr[r] \propto \exp\left(\frac{\varepsilon q(D, r)}{2 \Delta q}\right) \right\}, \tag{4}$$

in which $\Delta q$ is the global sensitivity of the scoring function $q(D, r)$. Equation (4) shows that the higher the score is, the larger the probability of the selected output will be.
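The Laplace side of Definition 3 (Eq. (3)) can be sketched as follows; this is a minimal illustration with assumed function names, not the paper's implementation.

```python
import math
import random

# Minimal sketch of the Laplace mechanism in Definition 3 (Eq. (3));
# names are illustrative, not from the paper. Noise is drawn from the
# Laplace distribution with mean 0 and scale lambda = Delta_f / epsilon.

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of Lap(scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_answer, delta_f, epsilon, rng):
    return true_answer + laplace_noise(delta_f / epsilon, rng)

# A counting query has Delta_f = 1; a smaller epsilon gives larger noise.
rng = random.Random(42)
noisy = laplace_mechanism(true_answer=28, delta_f=1.0, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

With $\varepsilon = 0.5$ and $\Delta f = 1$ the noise scale is 2, so the perturbed count stays close to the true value of 28 while hiding the exact frequency.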

IV. LPT-DP-K LOCATION PRIVACY PROTECTION

This section presents an overview of the LPT-DP-k algorithm and its specific implementation details.

A. LPT-DP-K Algorithm

Firstly, we construct the location privacy tree to maintain the relations among the location data. Secondly, we select the sensitive location information that is most likely to disclose privacy and add noise to it. The algorithm is summarized as follows:

Algorithm 1 LPT-DP-k algorithm
1: Initialize: data set $D^*$, location data set $D$ with $D \subseteq D^*$, differential privacy parameter $\varepsilon$, parameter $k$, the accessing-count threshold $count$, and empty sets $A$, $B$.
2: Construct a location privacy tree (LPT).
3: Divide the node data into two categories and record in the set the frequent patterns whose accessing frequency is not less than $count$; see Section IV-B.
4: Calculate the output probability by the formula $\Pr[a_i] = a_i.w / \sum_{j=1}^{n} a_j.w$; see Section IV-C.
5: Add noise and combine with $LPT_D^{D^*}$ to construct the noisy location privacy tree $LPT\text{-}k_D^{D^*}$; see Section IV-D.
6: Finally, publish the noisy tree $LPT\text{-}k_D^{D^*}$.

B. Construction of Location Information Tree Corresponding to Location Data

Assume the location privacy tree $LPT_D^{D^*}$ corresponds to the data set $D$, which records each person's visiting records within a month. According to the location information, the LPT-DP-k algorithm uses a tree structure with better correlation. The corresponding Table I is shown as follows:

TABLE I LOCATION CORRESPONDENCE
Number   Location information                Accessing count


1        Computer building                   28
2        Remote Sensing Institute            13
3        All-in-one card management center    2
4        Mingde Building                      6
5        Training Gym                         5
6        Dormitory 3                         30
……       ……                                  ……

As shown in Table I, the user's location information and the accessing frequency of each place within a month are listed; from this location correspondence table we can derive the location data information in Table II.

TABLE II LOCATION DATA INFORMATION
Number   Location label   Accessing count
1-28     1                28
29-41    2                13
42-43    3                 2
44-49    4                 6
50-54    5                 5
55-84    6                30
85-97    1,2              13
……       ……               ……

According to the content in Table II, we can construct the location privacy tree like Fig. 1.

Fig. 1. Location privacy tree. Each node is labeled with its location-id path and accessing count (e.g., 1#28, 2#13, 1,2#13, 1,3#2, 1,2,3#2, 1,2,4#6, 1,2,3,4#2, 1,2,3,5#2, 1,2,3,6#2), with root 0.

As shown in Fig. 1, the total number of nodes in $LPT_D^{D^*}$ is $\sum_{i=1}^{|D^*|} C_{|D^*|}^{i} = 2^{|D^*|} - 1$.
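The tree construction above can be sketched as follows, assuming a simple dictionary-based tree over the pattern/count pairs of Table II (class and method names are illustrative, not from the paper).

```python
# Sketch (illustrative names): build a location privacy tree (LPT) in which
# each node is identified by a location-id path and stores its accessing
# count, as in Fig. 1 (e.g., node "1,2" with count 13). A child extends its
# parent's path by one location id.

class LPTNode:
    def __init__(self, path=()):
        self.path = path          # tuple of location ids, e.g., (1, 2)
        self.count = 0            # accessing count of this pattern
        self.children = {}        # location id -> LPTNode

    def insert(self, pattern, count):
        node = self
        for loc in pattern:
            node = node.children.setdefault(loc, LPTNode(node.path + (loc,)))
        node.count = count

# Pattern/count pairs taken from Table II.
root = LPTNode()
for pattern, count in [((1,), 28), ((2,), 13), ((3,), 2), ((4,), 6),
                       ((5,), 5), ((6,), 30), ((1, 2), 13)]:
    root.insert(pattern, count)

print(root.children[1].children[2].count)  # → 13 (the node "1,2")
```

The dictionary of children makes lookup of a location path proportional to the path length, which matches the retrieval-oriented motivation for the tree structure.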

C. Weighted Selection based on Exponential Mechanism

We use the exponential mechanism in the differential privacy protection model to perform weighted selection. The formula is as follows:

$$\Pr[a_i] = \frac{a_i.w}{\sum_{j=1}^{n} a_j.w}. \tag{5}$$

$\Pr[a_i]$ represents the probability of being selected and $a_i.w$ represents the weight of the pattern $a_i$. The selection algorithm based on the exponential mechanism is as follows:

Algorithm 2 Selection algorithm
1: Initialize: records in the frequent pattern set $A$.
2: Input the set $A$ of frequent pattern records and score each record $a_i$.
3: Calculate the weight $a_i.w$ of each pattern record.
4: Substitute the weight $a_i.w$ of each pattern in the set $A$ into formula (5), select $k$ frequent pattern records $a_i$, and record them as the set $B$.

The scoring function is

$$M(A, a_i) = Q(a_i), \tag{6}$$

where $Q(a_i)$ represents the accessing frequency of the pattern $a_i$ and $M(A, a_i)$ represents the scoring value of $a_i$. The weight $a_i.w$ of each pattern record is calculated as

$$a_i.w = \exp\left(\frac{\varepsilon M(A, a_i)}{2 \Delta M}\right). \tag{7}$$

The formula to calculate $\Delta M$ is as follows:

$$\Delta M = \max_{1 \le i, j \le N} \left| Q(a_i) - Q(a_j) \right|, \tag{8}$$

where $\Delta M$ represents the maximum difference in accessing frequency among the $N$ data record patterns.
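Steps 3-4 of Algorithm 2, together with Eqs. (5)-(8), can be sketched as follows. The helper name `weighted_select` is assumed, not from the paper, and sampling with replacement is used as a simplification of selecting $k$ distinct records.

```python
import math
import random

# Sketch of the exponential-mechanism weighted selection (Algorithm 2):
# each pattern is scored by its accessing frequency Q(a_i) (Eq. 6),
# weighted by exp(eps * M / (2 * DeltaM)) (Eq. 7) with DeltaM the maximum
# pairwise frequency difference (Eq. 8), and sampled with probability
# proportional to its weight (Eq. 5). Assumes not all frequencies are equal.

def weighted_select(frequencies, epsilon, k, rng):
    delta_m = max(frequencies.values()) - min(frequencies.values())  # Eq. (8)
    weights = {a: math.exp(epsilon * q / (2.0 * delta_m))            # Eqs. (6)-(7)
               for a, q in frequencies.items()}
    total = sum(weights.values())
    patterns = list(frequencies)
    probs = [weights[a] / total for a in patterns]                   # Eq. (5)
    return rng.choices(patterns, weights=probs, k=k)

A = {"1": 28, "2": 13, "3": 2, "4": 6, "5": 5, "6": 30}  # Table I frequencies
B = weighted_select(A, epsilon=0.5, k=3, rng=random.Random(0))
print(len(B))  # → 3
```

Because the weights grow exponentially with the accessing frequency, high-frequency (i.e., more privacy-sensitive) patterns are the most likely to be selected for noising.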

D. Noise Enhancement based on Laplace Mechanism

This paper proposes a node data processing method to improve the efficiency of data processing and the effectiveness of the algorithm. The concrete steps are as follows:

Algorithm 3 Laplace noising algorithm
1: Initialize: the privacy budget $\varepsilon$, the set $B$ of $k$ frequent patterns after weighted selection, and the query function $f$.
2: Add noise obeying the Laplace distribution into the frequent pattern set $B$ to perturb the real accessing frequency of the $k$ records. The added noise obeys the Laplace distribution with the probability density function given in (3).
3: According to (9), the noise set $C$ can be calculated. Finally, the noisy location privacy tree $LPT\text{-}k_D^{D^*}$ is generated by combining the original data tree $LPT_D^{D^*}$ with the set $C$.

Therefore, the following must be satisfied in every round of noise addition:

$$\tilde{f}(b) = f(b) + \mathrm{Lap}\left(\frac{\Delta f}{\varepsilon}\right), \tag{9}$$

where $B$ is the set of $k$ frequent patterns $b$, $f(b)$ is the query function on the record $b$, and $\mathrm{Lap}(\Delta f / \varepsilon)$ represents noise obeying the Laplace distribution, as shown in Fig. 2.
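Step 2 of Algorithm 3 (Eq. (9)) can be sketched as follows; the function names are illustrative, not from the paper, and a counting query with $\Delta f = 1$ is assumed.

```python
import math
import random

# Sketch of Step 2 of Algorithm 3 (Eq. (9)); illustrative names. Each
# selected pattern's accessing frequency f(b) is perturbed with noise
# drawn from Lap(Delta_f / epsilon); a counting query has Delta_f = 1.

def lap(scale, rng):
    # Inverse-CDF sample from the Laplace distribution with mean 0.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def add_laplace_noise(B, epsilon, delta_f=1.0, seed=7):
    rng = random.Random(seed)
    return {b: freq + lap(delta_f / epsilon, rng) for b, freq in B.items()}

B = {"1": 28, "6": 30, "2": 13}        # the k = 3 selected frequent patterns
C = add_laplace_noise(B, epsilon=1.0)  # noisy accessing frequencies
print(sorted(C) == sorted(B))          # → True (same patterns, perturbed counts)
```

Combining these perturbed frequencies with the original tree yields the noisy tree that is finally published, so an observer never sees the exact accessing counts of the sensitive nodes.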


Fig

be

LP

efth

ap

ap

A.

(GamIII

Us

Mpy

g. 2. Relationship

True positivee used to mea

*DDPT k . Fal

ffectiveness ofe effectivenesTrue Positive

ppear in kf D

False Positiv

ppear not in kf

Accuracy:

False Rejectio

F

Error:

Experiment

Data sThis paper u

Gowalla total-mount of data I shows the ex

ser id Check-0 2010-10-190 2010-10-180 2010-10-170 2010-10-12

OperaThe experi

MATLAB(R20ython). Hardw

p between privacy

e (TP), false Poasure the utili

lse rejection r

f the algorithmss of the algorie (TP): the n

D and in kf D

kTP f

ve (FP): the n

kf D , but in

kFP f D

Accuracy

on Rate (FRR

kf DFRR

LPError

V. EXPERIM

Setting

set uses the chec-check-in dataand records th

xperimental daTA

EXPERIME

-in time LT23:55:27Z 30.2T22:17:43Z 30.2T23:42:03Z 30.2T19:44:40Z 30.2

ating environmimental envi13b) and PyCware environ

y budget and nois

ositive (FP) anty of noisy lo

rate (FRR) is

m. The smallerithm will be. number of fre

D :

k kD f D

number of fre

kf D :

k kf D f

k k

k

f D f

f D

R):

kf D f

k

* *

*

2

D DD D

DD

PT LPT

LPT

MENTAL ANALY

ck-in data set a set). This dhe informationata set. ABLE III ENTAL DATA SET

Latitude Lon23590912 -97.7926910295 -97.7425573099 -97.7626910295 -97.74

ment and paramironment in

Charm (Part ofnment is: 2

se size

nd accuracy [3ocation privac

used to analy

r FRR is, the

equent pattern

.

equent pattern

.kf D

.

D

.

kf D

2 .k

YSIS

from the Godata set has an of the users.

ngitude Locat9513958 2284939537 4206338577 3164939537 420

meter configurn this papef the program.60GHz Cor

31] can cy tree

yze the

higher

ns that

(10)

ns that

(11)

(12)

(13)

(14)

owalla a large Table

tion id847

0315 6637 0315

ration er is

m using e(TM)

i7-6para

Lo

B.

T(1) Timpriv

TIM

patten163264122551

102204409

Arecois co

k

Aextrincrprottime

C.

Twithreprsens3, 4

6700HQ CPUameters are sh

TABLE IV
PARAMETER SETTING

Parameter | Value
k | 20, 40, 60, 80, 100, 150, 200, 300, 500, 1000, 2000, 5000, 10000, 20000
privacy budget | 0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, ..., 0.75, 1, 1.1, 1.25, 1.5, 1.75

B. Analysis of Algorithm Timeliness

The timeliness analysis is mainly analyzed in three aspects: (1) the timeliness of constructing the location privacy tree; (2) the weighted selection and updating of location data; (3) the efficiency of extracting location data.

TABLE V
TIME OF BUILDING AND ADDING NOISE TO RECONSTRUCT THE LOCATION INFORMATION TREE

pattern n | time.build (s) | time.update (s)
(n = 16, 32, 64, 128, 256, 512, 1024, 2048, 4096; the build time grows with n but stays on the order of 10^-6 to 10^-5 s, while the update time remains nearly constant at about 1.0 x 10^-4 s)

As shown in Table V, the efficiency of constructing the location privacy tree is high even when the node number is constantly changing.

TABLE VI
EXTRACTION EFFICIENCY OF LOCATION DATA

privacy budget | 0.01 | 0.03 | 0.05 | 0.1 | 0.5 | 1.0 | 1.5
(the number of locations extracted in unit time increases from 3432 at 0.01 to 3691 at 0.03 and 4254 at 0.05, and continues to grow for larger budgets)

As is shown in Table VI, within a certain range the extraction efficiency of location data increases with the increase of the privacy budget. As the budget increases, the degree of privacy protection decreases and the number of extractions in unit time increases.

C. Analysis of Data Utility

This paper extracts data from multiple groups of patterns with 500, 1000, and 2000 data. The blue dots are used to represent the insensitive locations and the red ones indicate the sensitive locations. The experimental results are shown in Fig. 3, 4, and 5.
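The update step timed in Table V centers on drawing Laplace noise with scale sensitivity/epsilon and adding it to each node's accessing frequency. A minimal stand-alone sketch of that step (stdlib only; the node names and counts are made up for illustration, and the inverse-CDF sampler is one standard way to draw Laplace noise, not necessarily the paper's implementation):

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample via inverse-CDF sampling."""
    u = rng.random() - 0.5                       # uniform on [-0.5, 0.5)
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def add_laplace(frequencies, epsilon, sensitivity=1.0, seed=None):
    """Return a noisy copy of the node accessing frequencies.

    Counting queries change by at most 1 when one record is added or
    removed, so the sensitivity defaults to 1.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {node: f + laplace_noise(scale, rng) for node, f in frequencies.items()}

# Hypothetical node access counts from a location privacy tree.
freqs = {"home": 42.0, "office": 17.0, "cafe": 5.0}
noisy = add_laplace(freqs, epsilon=0.5, seed=0)
```

A smaller epsilon yields a larger noise scale and hence stronger protection, which is exactly the privacy/utility trade-off the experiments below sweep over.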


Fig. 3. Data distribution before and after protection (k is 500). (a) The locations are shown before adding noise. (b) The ones after adding noise.

The red sensitive locations are 103 and the blue insensitive locations are 397 in Fig. 3(a); after protection, the red sensitive locations are 137 and the blue insensitive locations are 353 in Fig. 3(b).

Fig. 4. Data distribution before and after protection (k is 1000). (a) The locations are shown before adding noise. (b) The ones after adding noise.

There are 141 red sensitive locations and 859 blue insensitive locations in Fig. 4(a); after protection, there are 219 red sensitive locations and 781 blue insensitive locations in Fig. 4(b), which shows that the sensitive locations have increased.

Fig. 5. Data distribution before and after protection (k is 2000). (a) The locations are shown before adding noise. (b) The ones after adding noise.

There are 235 red sensitive locations and 1765 blue insensitive locations in Fig. 5(a); after protection, there are 380 red sensitive locations and 1620 blue insensitive locations in Fig. 5(b). Table VII gives the statistical results of the data utility.

TABLE VII
UTILITY STATISTICS UNDER DIFFERENT DATA MODELS

Pattern number | Sensitive (red dots), before | Insensitive (blue dots), before | Sensitive (red dots), after | Insensitive (blue dots), after
500  | 103 | 397  | 137 | 353
1000 | 141 | 859  | 219 | 781
2000 | 235 | 1765 | 380 | 1620

We compare the proposed method with the three methods TDPS_LP_Result, TDPS_Signal, and TDPS_EP. The experimental results are shown in Fig. 6.

Fig. 6. Error comparison of adding-noise methods.

From Fig. 6 we can see that: (1) the noise error of the proposed FLPT-DP-k algorithm is the smallest, and when the privacy parameter reaches 0.015 the error begins to stabilize, which shows that the utility of the proposed method is higher and better than the previous methods; (2) as the privacy parameter increases, the error becomes smaller; to meet the differential privacy protection, the smaller the privacy parameter is, the more noise needs to be added and the higher the privacy protection level will be; (3) the TDPS_LP_Result, TDPS_Signal, and TDPS_EP methods require larger privacy parameters to achieve the lower error; when their privacy parameters are small, the error is relatively larger. The experimental results are shown in Tables VIII and IX.

TABLE VIII
UTILITY COMPARISON OF TOP-k FREQUENT ITEMSET MINING UNDER FIXED
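As described earlier, the index (exponential) mechanism of differential privacy is used to select data according to the tree node accessing frequency. A minimal sketch of that selection step, using the standard exponential-mechanism weighting exp(epsilon * u / (2 * sensitivity)) with the accessing frequency as the utility u (the node names and scores are hypothetical):

```python
import math
import random

def exponential_mechanism(utilities, epsilon, sensitivity=1.0, seed=None):
    """Sample one key with probability proportional to exp(eps * u / (2 * sens))."""
    rng = random.Random(seed)
    keys = list(utilities)
    # Subtract the max utility for numerical stability; this rescales every
    # weight by the same factor and does not change the sampling probabilities.
    m = max(utilities.values())
    weights = [math.exp(epsilon * (utilities[k] - m) / (2.0 * sensitivity))
               for k in keys]
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in zip(keys, weights):
        acc += w
        if r <= acc:
            return k
    return keys[-1]

# Hypothetical node accessing frequencies; frequently accessed nodes
# are selected far more often, but never deterministically.
freqs = {"home": 42, "office": 17, "cafe": 5}
picks = [exponential_mechanism(freqs, epsilon=1.0, seed=i) for i in range(200)]
```

Because selection stays randomized, an observer cannot conclude from one released node which frequencies the underlying data set contains, while high-utility nodes are still favored.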


PRIVACY PARAMETER (privacy parameter is 1)

(LPR = TDPS_LP_Result, LPS = TDPS_Signal, EP = TDPS_EP, DPk = the proposed method)

k    | TP: LPR LPS EP DPk   | FP: LPR LPS EP DPk    | Accuracy: LPR LPS EP DPk
50   | 46  47  26  49       | 4    3    4    1      | 0.93 0.95 0.53 0.98
100  | 80  78  39  96       | 20   22   61   4      | 0.80 0.78 0.39 0.96
200  | 126 126 70  190      | 74   74   130  10     | 0.63 0.63 0.35 0.95
500  | 305 290 150 455      | 195  210  350  45     | 0.61 0.58 0.30 0.91
1000 | 530 540 230 890      | 470  460  770  110    | 0.53 0.54 0.23 0.89
2000 | 680 940 400 1720     | 1320 1060 1960 280    | 0.38 0.47 0.20 0.86

As shown in Table VIII: (1) with the value of k increasing, the accuracy of the four methods decreases; (2) when k is 200, the accuracy of the previous three methods is less than 70%, while the accuracy of the proposed method still remains at about 95%; (3) when k exceeds 500, the accuracy of the proposed method decreases but still maintains more than 80%.

TABLE IX
UTILITY COMPARISON OF TOP-k FREQUENT PATTERNS MINING UNDER FIXED k (k = 200)

privacy parameter | TP: LPR LPS EP DPk | FP: LPR LPS EP DPk | Accuracy: LPR LPS EP DPk
0.01 | 136 184 70  188 | 64 16 130 12 | 0.68 0.92 0.35 0.94
0.05 | 178 182 64  188 | 22 18 136 12 | 0.89 0.91 0.32 0.94
0.1  | 174 180 90  190 | 26 20 110 10 | 0.87 0.90 0.45 0.95
0.5  | 180 190 160 190 | 20 10 40  10 | 0.90 0.95 0.80 0.95
1    | 182 188 168 188 | 18 12 32  12 | 0.91 0.94 0.84 0.94
1.5  | 184 190 174 190 | 16 10 26  10 | 0.92 0.95 0.87 0.95

As shown in Table IX, the smaller the privacy parameter is, the higher the privacy protection level will be.

Finally, this paper compares the utility of DP-topk and the proposed method by FRR. The experimental results are shown in Fig. 7 and 8.

Fig. 7. FRR comparison with DP-topk when k is 100.

Fig. 8. FRR comparison with DP-topk when the privacy parameter is 1.

Assuming k is 100, the result is shown in Fig. 7: with the privacy parameter changed, the value of FRR is low, but the value of the proposed method is lower. From this we can see that the availability of the proposed method is better when the protection level is high. Assuming the privacy parameter is 1, the result is shown in Fig. 8: with k increasing, the value of FRR increases, but the value of DP-topk is larger than that of this paper. To sum up, the proposed method has obvious advantages both in the level of location privacy protection and the effectiveness of the algorithm.

VI. CONCLUSION

This paper proposes a location privacy protection method based on a differential privacy strategy for big data in sensor networks. The method expresses the position data set by constructing the location information tree model, which solves the problem that the location data is difficult to express because of its characteristics of high dispersion and low density, and adds noise information to cover the original trajectory and position data. It is more effective in protecting the privacy of data and maintaining high availability of data and algorithm. The differential privacy protection model is applied to the protection of location privacy; compared with the traditional location privacy protection algorithm, the proposed algorithm is more rigorous and has higher utility and processing efficiency. In the next step, we will discuss a more efficient data structure to express location information in the process of location data expression and propose more target functions for different application scenarios.

REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks: The International Journal of Computer & Telecommunications Networking, vol. 38, no. 4, pp. 393-422, 2002.
[2] K. Xing, C. Q. Hu, J. G. Yu, X. Z. Cheng and F. J. Zhang, "Mutual Privacy Preserving k-Means Clustering in Social Participatory Sensing," IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 2066-2076, 2017.
[3] Z. H. Lv, H. B. Song, P. Basanta-Val, A. Steed and M. Jo, "Next-Generation Big Data Analytics: State of the Art, Challenges, and Future Research Topics," IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 1891-1899, 2017.
[4] F. Xiao, L. T. Sha, Z. P. Yuan and R. C. Wang, "VulHunter: A Discovery for unknown Bugs based on Analysis for known patches in Industry Internet of Things," IEEE Transactions on Emerging Topics in Computing, no. 99, pp. 1-13, 2017.
[5] Z. Q. Wang, F. Xiao, N. Ye, R. C. Wang and P. L. Yang, "A See-through-wall System for Device-free Human Motion Sensing Based on Battery-free RFID," ACM Transactions on Embedded Computing Systems, vol. 17, no. 1, pp. 1-21, 2017.
[6] H. Alemdar and C. Ersoy, "Wireless sensor networks for healthcare: A survey," Computer Networks, vol. 54, no. 15, pp. 2688-2710, 2010.
[7] Z. J. Fu, K. Ren, J. G. Shu, X. M. Sun and F. X. Huang, "Enabling personalized search over encrypted outsourced data with efficiency improvement," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546-2559, 2016.
[8] H. Rong, T. H. Ma, M. L. Tang and J. Cao, "A novel subgraph K+-isomorphism method in social network based on graph similarity detection," Soft Computing, pp. 1-19.
[9] T. H. Ma, Y. L. Zhang, J. Cao, J. Shen, M. L. Tang, Y. Tian, A. D. Abdullah and A. R. Mznah, "KDVEM: a k-degree anonymity with Vertex and Edge Modification algorithm," Computing, vol. 70, no. 6, pp. 1336-1344, 2015.


[10] C. Y. Yin, S. Zhang, J. W. Xi and J. Wang, "An improved anonymity model for big data security based on clustering algorithm," Concurrency & Computation: Practice & Experience, vol. 29, no. 7, 2017.
[11] C. Y. Yin, J. W. Xi and R. X. Sun, "Location Privacy Protection Based on Improved K-Value Method in Augmented Reality on Mobile Devices," Mobile Information Systems, pp. 1-7, 2017.
[12] P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information," in Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, ACM, New York, 1998, pp. 188-202.
[13] R. C. W. Wong, J. Y. Li, A. W. C. Fu and K. Wang, "(a, k)-Anonymity: An enhanced k-anonymity model for privacy-preserving data publishing," in Proc. of the 12th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, ACM, New York, 2006, pp. 754-759.
[14] N. H. Li, T. C. Li and S. Venkatasubramanian, "t-closeness: Privacy beyond k-anonymity and l-diversity," in Proc. of the 23rd IEEE Int'l Conf. on Data Engineering, IEEE, Piscataway, 2007, pp. 106-115.
[15] E. Zheleva and L. Getoor, "Preserving the privacy of sensitive relationships in graph data," in Proc. of the 1st ACM SIGKDD Workshop on Privacy, Security, and Trust in KDD, ACM, New York, 2007, pp. 153-171.
[16] A. Korolova, R. Motwani, S. U. Nabar and Y. Xu, "Link privacy in social networks," in Proc. of the 24th Int'l Conf. on Data Engineering, 2008, pp. 1355-1357.
[17] E. D. Cristofaro, C. Soriente, G. Tsudik and A. Williams, "Hummingbird: Privacy at the time of twitter," in Proc. of the IEEE Symposium on Security and Privacy, IEEE, Piscataway, 2012, pp. 285-299.
[18] A. R. Beresford, A. Rice, N. Skehin and R. Sohan, "MockDroid: Trading privacy for application functionality on smartphones," in Proc. of the 12th Workshop on Mobile Computing Systems and Applications, ACM, New York, 2011, pp. 49-54.
[19] B. Bamba, L. Liu, P. Pesti and T. Wang, "Supporting anonymous location queries in mobile environments with privacy grid," in Proc. of the 17th Int'l Conf. on World Wide Web, ACM, New York, 2008, pp. 237-246.
[20] L. Liu, "From data privacy to location privacy: Models and algorithms," in Proc. of the 33rd Int'l Conf. on Very Large Data Bases, ACM, New York, 2007, pp. 1429-1430.
[21] C. Dwork, "Differential privacy in new settings," in Proc. of the 21st Annual ACM-SIAM Symp. on Discrete Algorithms, Society for Industrial and Applied Mathematics, ACM, New York, 2010, pp. 174-183.
[22] B. Gu, V. S. Sheng, K. Y. Tay, W. Romano and S. Li, "Incremental support vector learning for ordinal regression," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 7, pp. 1403-1416, 2014.
[23] B. Gu, V. S. Sheng, Z. Wang, D. Ho, S. Osman and S. Li, "Incremental learning for v-support vector regression," Neural Networks, vol. 67, pp. 140-150, 2015.
[24] B. Gu and V. S. Sheng, "A robust regularization path algorithm for v-support vector classification," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 5, pp. 1241-1248, 2017.
[25] B. Gu, X. Sun and V. S. Sheng, "Structural Minimax Probability Machine," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1646-1656, 2017.
[26] E. T. Shen and T. Yu, "Mining frequent graph patterns with differential privacy," in Proc. of the 19th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD), ACM, Chicago, USA, 2013, pp. 545-553.
[27] C. Zeng, J. F. Naughton and J. Y. Cai, "On differentially private frequent itemset mining," in Proc. of the 39th Conf. on Very Large Data Bases (VLDB), Trento, Italy, 2013, pp. 1087-1098.
[28] A. Friedman and A. Schuster, "Data mining with differential privacy," in Proc. of the ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD), ACM, Washington, USA, 2010, pp. 493-502.
[29] Y. D. Li, Z. J. Zhang, M. Winslett and Y. Yang, "Compressive mechanism: Utilizing sparse representation in differential privacy," in Proc. of the 10th Annual ACM Workshop on Privacy in the Electronic Society, ACM, New York, 2011, pp. 177-182.
[30] X. J. Zhang, M. Wang and X. F. Meng, "An Accurate Method for Mining top-k Frequent Pattern Under Differential Privacy," Journal of Computer Research and Development, vol. 51, no. 1, pp. 104-114, 2014.
[31] C. Y. Yin and J. W. Xi, "Maximum entropy model for mobile text classification in cloud computing using improved information gain algorithm," Multimedia Tools & Applications, pp. 1-17, 2016.

Chunyong Yin received the B.S. degree from Shandong University of Technology, China, in 1998. He received the M.S. and Ph.D. degrees in computer science from Guizhou University, China, in 2005 and 2008, respectively. He was a post-doctoral research associate at the University of New Brunswick, Canada, in 2011 and 2012.
He is a Professor and Dean with the Nanjing University of Information Science & Technology, China. He has authored more than twenty journal and conference papers. His current research interests include privacy preserving, sensor networking, machine learning, and network security. He is a member of ACM.

Jinwen Xi received his bachelor degree in computer science and technology from Huaiyin Institute of Technology, China, in 2011.
Now he is studying for his master's degree in computer science and technology at Nanjing University of Information Science and Technology. His current research interests are in applied cryptography, big data security, and privacy.

Ruxia Sun received the B.E. degree in electrical engineering from the Shandong University of Technology, China, in 1997.
She is an associate Professor at the Nanjing University of Information Science & Technology, China. Her current research interests include cyber physical systems, machine learning, and mathematical modeling.

Jin Wang received the B.S. and M.S. degrees from Nanjing University of Posts and Telecommunications, China, in 2002 and 2005, respectively. He received the Ph.D. degree from Kyung Hee University, Korea, in 2010.
He is a professor in the College of Information Engineering, Yangzhou University. His research interests mainly include routing algorithm design, performance evaluation, and optimization for wireless ad hoc and sensor networks. He is a Member of IEEE and ACM.