
2168-6750 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2016.2614383, IEEE Transactions on Emerging Topics in Computing

On Reliable Task Assignment for Spatial Crowdsourcing

Xinglin Zhang, Member, IEEE, Zheng Yang, Member, IEEE, Yunhao Liu, Fellow, IEEE, and Shaohua Tang, Member, IEEE

Abstract: The large quantity of mobile devices equipped with various built-in sensors and the easy access to high-speed wireless networks have recently drawn much attention to spatial crowdsourcing in the research community. Generally, the objective of spatial crowdsourcing is to outsource location-based sensing tasks (e.g., traffic monitoring and pollution monitoring) to ordinary mobile workers (e.g., users carrying smartphones) efficiently. In this paper, we study a reliable task assignment problem for spatial crowdsourcing in a large worker market. Specifically, we use worker confidence to represent the reliability of successfully completing the assigned sensing tasks, and we formulate two optimization problems: maximum reliability assignment (MRA) under a recruitment budget and minimum cost assignment (MCA) under a task reliability requirement. We reveal the special structural properties of these problems, based on which we design effective approaches to assign tasks to the most suitable workers. The performance of the proposed algorithms is verified by theoretical analysis and experimental results on both real and synthetic datasets.

Keywords: Spatial Crowdsourcing, Task Assignment, Reliability, Budget, Minimum Cost

I. INTRODUCTION

The proliferation of mobile devices equipped with built-in sensors and the ubiquity of high-speed wireless networks have enabled a promising distributed problem-solving paradigm, spatial crowdsourcing [1], in which ordinary mobile users are requested to perform location-based sensing tasks, such as traffic monitoring [2], [3], pollution monitoring [4], and geographical data generation [5]. With the large volume and various types of data (e.g., location, time, picture, audio, and video) that can be collected through spatial crowdsourcing, more and more sensing applications that can greatly benefit people's lives are going to be developed in both the academic and industrial communities.

Generally, a spatial crowdsourcing system consists of three roles: task requester, spatial crowdsourcing server (SC-server), and mobile worker. Task requesters and workers first register with the SC-server. Then task requesters can outsource sensing tasks to the SC-server, which subsequently determines the task assignment to workers according to certain predefined criteria. Finally, workers complete the assigned tasks by physically travelling to the specified locations of interest and send the results back to requesters through the SC-server.

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61502178, Natural Science Foundation of Guangdong Province under Grant No. 2016A030313480, the China Postdoctoral Science Foundation under Grant No. 2015M572318, and the Fundamental Research Funds for the Central Universities.

X. Zhang and S. Tang are with the School of Computer Science and Engineering, South China University of Technology, China. E-mail: [email protected], [email protected].

Z. Yang and Y. Liu are with the School of Software and Tsinghua National Lab for Information Science and Technology (TNLIST), Tsinghua University, China. E-mail: {yang, yunhao}@greenorbs.com.

The goal of spatial crowdsourcing is to achieve high social efficiency by matching tasks and workers effectively. The evaluation metrics vary with the task assignment models. Most existing works deal with the situation in which the numbers of tasks and workers are comparable, and the proposed methods strive to maximize the number of successful assignments, given the reliability of assigned tasks and the spatial-temporal constraints, such as worker travelling distance and task expiration time [1], [6]–[8]. In this paper, we study task assignment for spatial crowdsourcing in a large worker market, i.e., the number of workers is so large that a subset of workers is sufficient for completing the sensing tasks. As an example, in road condition detection (e.g., reporting potholes on the road), the number of vehicles and passengers (i.e., workers) is more than enough for the road segments of interest. Another example comes from an emergency response scenario, where the requester (e.g., Red Cross) intends to collect pictures or videos of specific damaged locations. The available workers in the vicinity are usually abundant and a subset of them can complete the tasks.

Under the aforementioned realistic scenario, we propose a task-and-worker assignment that takes reliability into account. The intuition is that answers provided by workers are sometimes incorrect (e.g., the uploaded photos/videos might be fake), or workers may fail to complete the tasks before the task expiration time. Thus, we take into account the confidence of workers, and define the reliability of each sensing task as the confidence that at least one worker assigned to this task can give correct answers. Workers are assumed to have preferences over the sensing tasks broadcast by the SC-server. For example, some workers might only accept tasks that are within a predefined walking distance, while others might accept tasks on the way to their destinations (e.g., home).

The assignment frameworks we present here are designed for two main objectives: maximizing the reliability of the assigned tasks under a recruitment budget given by requesters, and minimizing the number of assigned workers for a given reliability requirement of the assigned tasks. We formulate the problems by constructing bipartite graphs between tasks and workers, and then investigate the structural properties of the formulations, based on which we propose effective algorithms with theoretical guarantees.

To summarize, the main contributions of this work are as follows:

• We propose a reliable task assignment problem, which guarantees the quality of sensing tasks, for spatial crowdsourcing systems with large worker markets.

• We investigate the structural properties of the proposed problems, based on which we propose effective algorithms with theoretical guarantees.

• We conduct extensive experiments on both synthetic and real datasets and show the effectiveness of the proposed methods.

The rest of the paper is organized as follows. Section II presents the related work. The problem definitions are given in Section III. Then we discuss the solutions for the two assignment objectives in Sections IV and V, respectively. The experimental results are presented in Section VI, and finally the conclusion is drawn in Section VII.

II. RELATED WORK

A. Spatial Crowdsourcing

Spatial crowdsourcing is closely related to participatory sensing [9], which aims at exploiting mobile users to actively collect and report data by using their mobile devices equipped with various sensors for a given campaign. Participatory sensing is mostly application oriented. The Mobile Millennium project [10] employs mobile phones to collect and upload traffic information to a server in real time. Based on the reported traffic data, the server then builds prediction models for the future traffic condition. In the same spirit, researchers designed CalTel [11] and Nericell [12] to pool information about traffic and road conditions by leveraging smartphones and mobile sensors mounted on vehicles. DroneSense [13] is an environmental monitoring system that integrates traditional wireless sensor networks and participating users with smartphones. GreenGPS [14] offers a navigation service that allows drivers to follow the most fuel-efficient routes customized for their vehicles between arbitrary end-points by leveraging sparse sensed data. GreenGPS is not only fuel-efficient, but also economical and easy to deploy.

Spatial crowdsourcing [1] generalizes the aforementioned participatory sensing applications by using a general management framework. In spatial crowdsourcing, multiple campaigns can be processed together and the goal is to assign tasks efficiently. Kazemi and Shahabi [1] proposed a maximum task assignment problem. They designed heuristic algorithms to maximize the number of assigned tasks for a given time interval. He et al. [15] modeled the travelling distance constraint and the task completion redundancy to ensure high efficiency when allocating location-dependent tasks to mobile workers. In [6], Kazemi et al. explicitly modeled the worker quality, and designed efficient assignment protocols to pursue high completion quality of assigned tasks. Reddy et al. [16] developed a recruitment framework that can identify well-suited participants based on their geographic and temporal availability as well as participation habits. Pournajaf et al. [17] studied the assignment scenario in which workers are able to utilize spatial cloaking to obfuscate their locations. To et al. [7] designed a task assignment scheme that pursues multiple objectives, i.e., minimizing the travelling distance of a worker, maximizing the task assignment success rate, and protecting worker location privacy. Cheng et al. [8] considered reliability and spatial-temporal diversity of the tasks that rely on multiple observation angles. The common assumption of these works is that the numbers of tasks and workers are comparable, and the frugality of assignment is not incorporated. In this paper, we study the application scenario with a large worker market, for which we accommodate both assignment efficiency and frugality.

The frameworks discussed above concern the overall assignment efficiency. Another direction is to maximize a single worker's utility. Deng et al. [18] considered workers' perspectives for task selection, and designed a method to maximize the total number of a worker's selected tasks under the travelling budget.

B. Crowdsensing

Spatial crowdsourcing is also related to crowdsensing [19]–[21], which emphasizes using the built-in sensors of mobile devices to continuously collect sensed data in the background. Pioneering crowdsensing applications include LiFS [22] and TrMCD [23] for indoor localization, FlierMeet [24] for cross-space public information reposting, tagging, and sharing, and SmartTrace [25] for 3G/WiFi discovery. The fundamental objective of such applications is to select a set of workers that can cover a large spatial area in the region of interest. In [26], Cardone et al. proposed a mobile crowdsensing platform, where a simple participant selection mechanism is adopted to maximize the sensing area coverage with a recruitment budget. Zhang et al. [27] and Xiong et al. [28] tried to satisfy the sensing requirement by selecting as few workers as possible. Trajectory histories are used to evaluate the coverage potential of a worker. In contrast, Ahmed et al. [29] and Hachem et al. [30] used user mobility to select a minimal number of mobile users to meet the coverage requirement.

The above crowdsensing applications emphasize the automatic sensing capability of smartphones, while spatial crowdsourcing studied in this work highlights people's intellectual capability and mobility in completing location-based query tasks that are difficult for passive sensing with smartphones.

C. Incentive Mechanisms

Incentive mechanisms offer ordinary workers benefits to compensate for their contribution of sensing effort. Jaimes et al. [31] considered the mechanism design based on user locations, the coverage, and the platform budget. Sun and Tham [32] proposed an information-driven incentive mechanism with consumer demand awareness. An information utility metric is adopted to measure the quality of a worker's sensor information according to the consumer demand. The proposed mechanism can select and incentivize the most informative workers, which maximizes the satisfaction of the consumers with fewer workers. Wen et al. [33] incorporated the sensing quality and workers' willingness for participation in the mechanism. Recently, coverage functions with the property of submodularity have been applied to construct efficient incentive mechanisms for user selection in crowdsensing [34]–[37]. These incentive methods are mostly designed for a single coverage application in passive sensing; hence they are different from the models in spatial crowdsourcing, where sensing tasks are of large quantity and efficient task assignment is required.

D. Internet of Things

Spatial crowdsourcing is part of the emerging Internet of Things (IoT) networking architecture [38]. On one hand, the workers equipped with mobile devices in spatial crowdsourcing form a powerful set of mobile sensors, which can not only collect various sensing data with the built-in sensors, but also connect with the traditional sensor networks deployed in the environment. In this way, spatial crowdsourcing enhances the Objects Layer and the sensing capability of the IoT [38]. On the other hand, Device-to-Device (D2D) communication in LTE-A networks [39], [40] is promising for facilitating the implementation of task assignment for spatial crowdsourcing. As an example, a worker can deliver the sensed data to nearby available workers through D2D communication if s/he cannot transmit the data through cellular networks [41], [42]. This alternative transmission can help improve the task completion rate and, in consequence, improve the quality of experience and service for spatial crowdsourcing.

III. PROBLEM DEFINITION

In this paper, we study the case in which a spatial task is associated with a specific location and a valid time duration. For example, in order to monitor the air quality condition at a specified location L, we construct sensing tasks in the form of "please report the air quality condition at the location L at nine o'clock". The location and time resolution here is application dependent and can be varied. A worker's report outside the specified location and time is considered invalid. The formal definitions of a spatial task and a worker are given in the following.

Definition 1 (Spatial task) A spatial task $j$ of the form $(l_j, q_j, s_j, e_j)$ is a sensing query $q_j$ to be answered at location $l_j$ within the valid time interval $[s_j, e_j]$.

The definition implies that the sensing query $q_j$ of a spatial task $j$ can be answered by a mobile worker only if s/he physically travels to the location $l_j$ in the given time interval, where a worker is defined as:

Definition 2 (Worker) A worker $i$ is a registered mobile user in the SC-server, who is currently located at position $l_i$, and has requested to perform a spatial task by setting the task preference $\mathit{pref}_i$. Each worker $i$ is associated with a confidence $p_{i,j}$, which indicates the reliability of the worker $i$ to complete the assigned task $j$.
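Definitions 1 and 2 can be mirrored by two small record types. The following is a minimal sketch, not part of the original system: the class and field names, the use of (x, y) tuples for locations, and numeric timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Location = Tuple[float, float]  # assumed representation of l_j and l_i


@dataclass(frozen=True)
class SpatialTask:
    """Spatial task j = (l_j, q_j, s_j, e_j) as in Definition 1."""
    location: Location  # l_j
    query: str          # q_j
    start: float        # s_j
    end: float          # e_j

    def accepts(self, location: Location, time: float) -> bool:
        # A report is valid only at l_j and within [s_j, e_j].
        return location == self.location and self.start <= time <= self.end


@dataclass
class Worker:
    """Worker i as in Definition 2 (hypothetical field names)."""
    location: Location                                          # l_i
    preferences: Set[int] = field(default_factory=set)          # pref_i, as task ids
    confidence: Dict[int, float] = field(default_factory=dict)  # p_{i,j} keyed by task id


task = SpatialTask(location=(1.0, 2.0), query="report air quality", start=9.0, end=10.0)
worker = Worker(location=(0.0, 0.0), preferences={0}, confidence={0: 0.8})
```

Exact location equality is a simplification; as the text notes, the location and time resolution is application dependent.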

Note that a worker $i$ may want to do spatial tasks on the way to some destination, or perform spatial tasks that are located within walking distance of the worker's current location. The worker designates such a preference $\mathit{pref}_i$ to the SC-server, such that s/he will only be assigned tasks that s/he can feasibly complete.

Fig. 1. An example of the set notations. The ground set $E = \{(i_1, j_1), (i_2, j_1), (i_3, j_2), (i_5, j_1)\}$. Define $A = \{(i_1, j_1), (i_3, j_2), (i_5, j_1)\}$, then $A_n = \{i_1, i_3, i_5\}$, $A_m = \{j_1, j_2\}$, $A_n^{j_1} = \{i_1, i_5\}$, $A_n^{j_2} = \{i_3\}$, $A_m^{i_1} = \{j_1\}$, $A_m^{i_3} = \{j_2\}$, $A_m^{i_5} = \{j_1\}$. Given $tb = 2$, then $A' = \{(i_1, j_1), (i_3, j_2)\}$ is a legitimate assignment.

The SC-server processes spatial tasks in a periodical fashion (e.g., two-hour cycles from 08:00 to 20:00 per day). At the beginning of each period, the SC-server pools the valid tasks and available workers, and assigns tasks to appropriate workers. We assume that each worker can be assigned one task in each period and s/he can participate in multiple periods [1]. After being assigned a spatial task $j$, a worker $i$ may sometimes be unable to answer the query correctly (e.g., taking a wrong photo). Thus, each worker $i$ is associated with a confidence $p_{i,j}$, which is the reliability (or skill) with which the worker $i$ can successfully complete the task $j$. The confidence of a worker can be inferred from historical data of performing tasks [43], [44]. As learning worker confidence is not the focus of this paper, we assume that the system has acquired the confidence values before assigning tasks.

With the definitions of spatial tasks and workers, we are ready to present our reliable spatial task assignment problems. The reliability of a spatial task indicates the confidence that at least one worker assigned to this task can successfully finish the task. Intuitively, assigning more workers to a task leads to higher reliability, which in turn results in greater confidence of completing the task successfully in a large worker market. However, workers usually require compensation, or incentives, to perform tasks. Even in self-motivated scenarios, assigning too many workers to a task leads to a waste of social energy. Therefore, we take frugality into account in modeling the assignment problems. We adopt a uniform cost model, where the payment for each task/worker is fixed and is typically specified by the task requester. One example adopting this payment model is the popular crowdsourcing platform Amazon's Mechanical Turk. Under this model, researchers have treated workers as equally expensive and studied efficient task assignment schemes without budgets [6], [8] and with budgets [43], [44] (where the budget is the total number of recruited workers). Under this model, we aim to maximize the overall task reliability for assigned tasks under the budget constraint, and to minimize the number of assigned workers under the reliability requirement. The two problems are defined as follows.

Definition 3 (Maximum reliability assignment (MRA)) Given a total recruitment budget $tb$ of the SC-server and the confidence $p_{i,j}$ for each worker $i$ and task $j$, the maximum reliability assignment (MRA) problem is to maximize the overall expected task reliability by assigning tasks to workers under the budget constraint.

Definition 4 (Minimum cost assignment (MCA)) Given a completion reliability threshold $\epsilon$ for each task and the confidence $p_{i,j}$ for each worker $i$ and task $j$, the minimum cost assignment (MCA) problem is to minimize the number of assigned workers under the constraint that each task achieves a completion reliability of at least $\epsilon$.
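To make the reliability threshold in Definition 4 concrete, the sketch below counts how many workers a single task needs, in isolation, to reach a reliability of at least $\epsilon$. It takes the task reliability to be one minus the probability that every assigned worker fails, and it deliberately ignores that tasks compete for workers, so it is not the MCA algorithm of this paper. The function name `workers_needed` is hypothetical.

```python
def workers_needed(confidences, eps):
    """Smallest number of workers (taken in decreasing confidence) whose
    combined reliability 1 - prod(1 - p) reaches eps for one isolated task.

    Returns None if even all candidate workers together fall short.
    """
    if eps <= 0:
        return 0
    fail = 1.0  # probability that every chosen worker fails
    for count, p in enumerate(sorted(confidences, reverse=True), start=1):
        fail *= 1.0 - p
        if 1.0 - fail >= eps:
            return count
    return None


# Three workers with confidence 0.5 give reliability 1 - 0.5**3 = 0.875.
needed = workers_needed([0.5, 0.5, 0.5], 0.875)
```

For a single task, taking the most confident workers first uses the fewest workers, since each step removes the largest possible share of the remaining failure probability.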

IV. RELIABILITY MAXIMIZATION

In this section, we formulate the MRA problem from the perspective of set function optimization. Define a set element in the form of an ordered worker-task pair $(i, j)$, which represents that the task $j$ is in the preference set of the worker $i$. The ground set $E$ is the set of all possible elements, i.e., $E = \{(i, j) \mid i \in [n], j \in [m]\}$, where $[n]$ and $[m]$ represent the worker set of size $n$ and the task set of size $m$, respectively. Given a set $A \subseteq E$, we construct a set function

$$f(A) = \sum_{j \in A_m} \Bigl(1 - \prod_{i \in A_n^j} (1 - p_{i,j})\Bigr), \tag{1}$$

where $A_n$ and $A_m$ represent the worker and task dimensions in $A$, i.e., $A_n = \{i \mid (i, j) \in A\}$ and $A_m = \{j \mid (i, j) \in A\}$. We use $A_n^j$ to represent the set of workers in $A_n$ that are assigned to the task $j$. It follows that $\prod_{i \in A_n^j} (1 - p_{i,j})$ is the probability that all the workers assigned to the task $j$ fail to finish it. Thus, the term $1 - \prod_{i \in A_n^j} (1 - p_{i,j})$ is the probability that the task $j$ is successfully performed by at least one assigned worker. The function $f(\cdot)$ is then the reliability metric that we would like to maximize for the overall assignment. As the SC-server has a recruitment budget $tb$ and each worker can accomplish one task at a time, a legitimate assignment $A$ must satisfy $|A| \le tb$ and $|A_m^i| \le 1, \forall i \in A_n$, where $A_m^i$ represents the set of tasks assigned to worker $i$. Fig. 1 is an example illustrating the set notations and a legitimate assignment.
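As an illustration of Eq. (1), the following sketch evaluates $f(A)$ for the set $A$ of Fig. 1. The confidence values (all 0.5) are assumed purely for the example, and the function name `reliability` and the string labels for workers and tasks are hypothetical.

```python
from collections import defaultdict


def reliability(assignment, p):
    """f(A) from Eq. (1): sum over tasks of 1 - prod(1 - p[i, j])."""
    fail = defaultdict(lambda: 1.0)  # per-task probability that all workers fail
    for i, j in assignment:
        fail[j] *= 1.0 - p[(i, j)]
    return sum(1.0 - f for f in fail.values())


# Set A from Fig. 1 with assumed confidences of 0.5 for every pair.
A = {("i1", "j1"), ("i3", "j2"), ("i5", "j1")}
p = {pair: 0.5 for pair in A}

# Task j1 has two workers: 1 - 0.5 * 0.5 = 0.75; task j2 has one: 0.5.
value = reliability(A, p)  # 0.75 + 0.5 = 1.25
```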

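By constructing the above terms, the MRA problem is equivalent to solving the following constrained optimization problem:

$$\begin{aligned} \underset{A \subseteq E}{\text{maximize}} \quad & f(A) \\ \text{subject to} \quad & |A| \le tb, \\ & |A_m^i| \le 1, \ \forall i \in A_n. \end{aligned}$$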
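The two constraints can be checked mechanically. The sketch below tests them on the sets from Fig. 1; the function name `is_legitimate` and the string labels are assumptions for illustration.

```python
from collections import Counter


def is_legitimate(assignment, tb):
    """Check |A| <= tb and that every worker is matched to at most one task."""
    tasks_per_worker = Counter(i for i, _ in assignment)
    within_budget = len(assignment) <= tb
    one_task_each = all(count <= 1 for count in tasks_per_worker.values())
    return within_budget and one_task_each


A = {("i1", "j1"), ("i3", "j2"), ("i5", "j1")}
A_prime = {("i1", "j1"), ("i3", "j2")}

ok_prime = is_legitimate(A_prime, tb=2)  # two pairs, distinct workers
ok_full = is_legitimate(A, tb=2)         # three pairs exceed the budget of 2
```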

To solve the problem, we propose a greedy strategy to select the most valuable element in an iterative manner. Note that the objective function $f(\cdot)$ is decomposable with respect to the summation of the terms of task reliability, i.e., adding an element $(i', j')$ only influences the reliability of task $j'$. With this observation, we can simplify the computation process of finding the element with the largest marginal increment value. Specifically, given a worker $i'$, we calculate the marginal increment value by matching it to a task $j'$:

$$\begin{aligned} f(A \cup \{(i', j')\}) - f(A) &= \Bigl(1 - \prod_{i \in A_n^{j'}} (1 - p_{i,j'}) \cdot (1 - p_{i',j'})\Bigr) - \Bigl(1 - \prod_{i \in A_n^{j'}} (1 - p_{i,j'})\Bigr) \\ &= \prod_{i \in A_n^{j'}} (1 - p_{i,j'}) \cdot p_{i',j'}. \end{aligned}$$

After calculating all legitimate marginal increment values, the task with the largest marginal increment value is taken as the candidate for the actual assignment.

Algorithm 1 Greedy MRA (G-MRA)
1: $A \leftarrow \emptyset$, $A_n \leftarrow \emptyset$;
2: $\pi_j \leftarrow 1, \ \forall j \in [m]$;
3: for $idx = 1$ to $tb$ do
4:     for $j \in [m]$ do
5:         $\sigma_j \leftarrow \max_{i \in [n] \setminus A_n} p_{i,j} \cdot \pi_j$;
6:     end for
7:     $j^* \leftarrow \arg\max_{j \in [m]} \sigma_j$;
8:     $A \leftarrow A \cup \{(i(\sigma_{j^*}), j^*)\}$;
9:     $A_n \leftarrow A_n \cup \{i(\sigma_{j^*})\}$;
10:    $\pi_{j^*} \leftarrow \pi_{j^*} \cdot (1 - p_{i(\sigma_{j^*}), j^*})$;
11: end for

Algorithm 1 sketches the procedure of the proposed greedy strategy. Lines 4–6 compute the largest marginal increment value for each task $j$. Note that $\pi_j = \prod_{i \in A_n^j} (1 - p_{i,j})$, where $A_n^j$ is the set of workers that have been assigned to task $j$ in the previous steps. In the current step, $\sigma_j$ is the maximum marginal increment value for the task $j$. We use $i(\sigma_j)$ to record the corresponding matched worker. In line 7, the algorithm returns the task $j^*$ with the largest marginal value with respect to all available tasks. Lines 8–10 update the selected worker-task pair set $A$ and the corresponding worker set $A_n$, as well as the probability product term. Considering the time complexity of Algorithm 1, the inner loop (lines 4–6) takes time $O(m \cdot n)$. Line 7 has time complexity $O(m)$. The remaining operations inside the outer loop take constant time. Therefore, the time complexity of Algorithm 1 is $O(tb \cdot m \cdot n)$.
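The procedure described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation. It assumes workers and tasks are indexed from 0, that `p` is a dictionary holding a confidence only for pairs in the ground set $E$, that ties are broken by the first pair encountered, and that the loop stops early when no unassigned worker remains. The name `greedy_mra` is hypothetical.

```python
def greedy_mra(p, n, m, tb):
    """Greedy MRA sketch: repeatedly add the pair with the largest marginal gain.

    p  : dict mapping (worker, task) -> confidence for pairs in the ground set
    n  : number of workers, m : number of tasks, tb : recruitment budget
    """
    assignment = []   # selected worker-task pairs (the set A)
    used = set()      # already assigned workers (the set A_n)
    pi = [1.0] * m    # pi[j]: probability that all workers on task j fail
    for _ in range(tb):
        best = None   # (gain, worker, task)
        for j in range(m):
            for i in range(n):
                if i in used or (i, j) not in p:
                    continue
                gain = p[(i, j)] * pi[j]  # marginal increment of adding (i, j)
                if best is None or gain > best[0]:
                    best = (gain, i, j)
        if best is None:  # no assignable worker left
            break
        _, i, j = best
        assignment.append((i, j))
        used.add(i)
        pi[j] *= 1.0 - p[(i, j)]
    return assignment


# Hypothetical instance: 3 workers, 2 tasks.
p = {(0, 0): 0.9, (1, 0): 0.8, (1, 1): 0.5, (2, 1): 0.4}
picked = greedy_mra(p, n=3, m=2, tb=2)
```

In this instance the first round picks worker 0 for task 0 (gain 0.9). In the second round, worker 1 adds only about 0.08 to task 0 but 0.5 to task 1, so the pair (1, 1) is chosen.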

To analyze the performance of the proposed algorithm, we resort to the theory of submodular functions. We first prove the following lemma regarding the objective function $f(\cdot)$.

Lemma 1 The set function $f(\cdot)$ defined in Eq. (1) is monotone submodular.

Proof: We prove the lemma by using the definition of submodular functions.

Definition 5 (Submodular function) Given a ground set $\Omega$, a function $g : 2^{\Omega} \to \mathbb{R}$ is submodular if for any $X \subseteq Y \subseteq \Omega$ and $e \in \Omega \setminus Y$, we have

$$g(X \cup \{e\}) - g(X) \ge g(Y \cup \{e\}) - g(Y),$$

and $g$ is monotone if $g(X) \le g(Y)$.

Given $A \subseteq B \subseteq E$ and a new worker-task pair $(i', j') \in E \setminus B$, we verify the inequality given by the definition in three cases:

• $j' \notin B_m$. In this case, there are three possibilities for the new worker element $i'$ with respect to the worker sets $A_n$ and $B_n$ (Fig. 2a-2c). In all three subcases, $j'$ is a new task and has no matched workers in the sets $B_n$ and $A_n$; therefore we can easily get

$$f(B \cup \{(i', j')\}) - f(B) = 1 - (1 - p_{i',j'}) = p_{i',j'} = f(A \cup \{(i', j')\}) - f(A).$$

• $j' \in B_m$ and $j' \notin A_m$. There is only one option for $i'$ in this case (Fig. 2d) due to the constraint $A_n \subseteq B_n$. Without loss of generality, we assume that task $j'$ has one matched worker with reliability $p$ in $B_n$. Then we can compute the marginal increment as

$$\begin{aligned} f(B \cup \{(i', j')\}) - f(B) &= \bigl(1 - (1 - p)(1 - p_{i',j'})\bigr) - \bigl(1 - (1 - p)\bigr) \\ &= (1 - p)\, p_{i',j'} \le p_{i',j'} \\ &= f(A \cup \{(i', j')\}) - f(A). \end{aligned} \tag{2}$$

• $j' \in A_m$. As shown in Fig. 2e, $i' \notin A_n$ in this case. As $A \subseteq B$, task $j'$ has at least as many matched workers in $B_n$ as in $A_n$. Following the same computation process as in the second case, we can conclude that $f(B \cup \{(i', j')\}) - f(B) \le f(A \cup \{(i', j')\}) - f(A)$.

Fig. 2. The relation between the new element and the sets. Panels (a)-(e) show the new pair $(i', j')$ relative to the sets $A_n$, $A_m$, $B_n$, and $B_m$.

Next we show that $f$ is also monotone. As $A \subseteq B$, each task $j' \in A_m$ has at least as many matched workers in $B$ as in $A$, i.e., $A_n^{j'} \subseteq B_n^{j'}$. By the same computation as in Eq. (2), the reliability of $j'$ under $B$ is no lower than its reliability under $A$. The overall reliability is the sum of the reliabilities of all tasks, hence $f(B) \ge f(A)$.
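Lemma 1 can be sanity-checked numerically on a tiny instance by enumerating all nested subset pairs. The sketch below is only an illustration of the definition, not part of the proof; the function names, the tolerance `tol`, and the example confidences are assumptions, and the enumeration is exponential in the size of the ground set.

```python
from itertools import combinations


def f(p, A):
    """Objective of Eq. (1): sum over tasks of 1 - prod(1 - p_{i,j})."""
    fail = {}
    for (i, j) in A:
        fail[j] = fail.get(j, 1.0) * (1.0 - p[(i, j)])
    return sum(1.0 - v for v in fail.values())


def is_monotone_submodular(p, tol=1e-12):
    """Brute-force check of Definition 5 over every X ⊆ Y ⊆ E and e ∉ Y."""
    E = list(p)
    subsets = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]
    for X in subsets:
        for Y in subsets:
            if not X <= Y:
                continue
            if f(p, X) > f(p, Y) + tol:          # monotonicity violated
                return False
            for e in E:
                if e in Y:
                    continue
                gain_x = f(p, X | {e}) - f(p, X)
                gain_y = f(p, Y | {e}) - f(p, Y)
                if gain_x < gain_y - tol:        # diminishing returns violated
                    return False
    return True
```

Such a check is only feasible for a handful of worker-task pairs, but it is a convenient regression test when experimenting with variants of the objective.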

The feasible solutions of the MRA problem have a special structural property, which can be derived using matroid theory [45].

Definition 6 (Matroid) A matroid $M = (\Omega, \mathcal{I})$ is an ordered pair consisting of a finite set $\Omega$ and a non-empty collection $\mathcal{I}$ of subsets of $\Omega$ satisfying the following conditions: (1) if $Y \in \mathcal{I}$ and $X \subseteq Y$, then $X \in \mathcal{I}$; (2) if $X, Y \in \mathcal{I}$ and $|X| < |Y|$, then there exists an element $e \in Y \setminus X$ such that $X \cup \{e\} \in \mathcal{I}$.

In the above definition, $\Omega$ is called the ground set of $M$. In the context of optimizing the function $f(\cdot)$, $\Omega$ is exactly the set $E = \{(i, j) \mid i \in [n], j \in [m]\}$, which consists of all legitimate worker-task pairs, and $\mathcal{I}$ is the collection of all subsets of $E$, which includes all possible assignments when maximizing $f(\cdot)$ without constraints.

The two constraints in the MRA problem shrink the solution space, which can be described by uniform and partition matroids.

Definition 7 (Uniform matroid) A matroid $M = (\Omega, \mathcal{I})$ is a uniform matroid if there exists some integer $k$ such that $X \in \mathcal{I}$ if and only if $|X| \le k$.

Definition 8 (Partition matroid) A matroid $M = (\Omega, \mathcal{I})$ is a partition matroid if there exists a partition $Y_1, Y_2, \ldots, Y_K$ of $\Omega$ such that $X \in \mathcal{I}$ if and only if $|X \cap Y_k| \le 1$ for all $k \in [K]$.

According to these definitions, the constraint $|A| \le t_b$ in the MRA problem makes an assignment $A$ lie in a uniform matroid. Let $B^i = \{(i, j') \mid j' \in E_m^i\}$ for all $i \in E_n$. These sets form a partition of $E$, as $B^i \cap B^{i'} = \emptyset$ for all $i \ne i' \in E_n$ and $\bigcup_{i \in E_n} B^i = E$. Then the constraint $|A_m^i| \le 1, \forall i \in A_n$ can be restated as $|A \cap B^i| \le 1, \forall i \in E_n$. Therefore, the solution $A$ is also restricted to a partition matroid. In brief, we connect the solution of the MRA problem and matroids by the following lemma.

Lemma 2 The feasible solution space of the MRA problem constitutes an intersection of a uniform matroid and a partition matroid.
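In code, membership in the intersection of the two matroids reduces to two simple tests. The following sketch assumes an assignment is given as a collection of (worker, task) pairs; the function name `is_feasible` is hypothetical.

```python
def is_feasible(A, budget):
    """True if A is independent in both matroids of Lemma 2.

    Uniform matroid:   |A| <= budget (t_b).
    Partition matroid: each worker i appears in at most one pair of A.
    """
    workers = [i for (i, _) in A]
    within_budget = len(workers) <= budget
    one_task_per_worker = len(workers) == len(set(workers))
    return within_budget and one_task_per_worker
```

A greedy routine only needs this oracle to decide whether a candidate pair may still be added to the current assignment.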

As Algorithm 1 returns a greedy solution for the MRA problem, combining Lemmas 1–2 with Theorem 2.1 in [46], it can be readily derived that Algorithm 1 returns a 3-approximation of the optimal solution.


Theorem 1 Algorithm 1 is a 3-approximation algorithm for the MRA problem.

Note: The above results can be easily generalized to scenarios where spatial tasks have different weights. Specifically, assume that each task $j$ is associated with a weight $w_j$, which reflects the utility of finishing task $j$. Then the objective function in Eq. (1) can be rewritten as:

$$f(A) = \sum_{j \in A_m} w_j \Bigl(1 - \prod_{i \in A_n^j} (1 - p_{i,j})\Bigr).$$

This function is submodular by the same argument as for Eq. (1), hence the derivation and greedy heuristic in this section naturally apply to the weighted objective function.
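A short sketch of the weighted objective is given below. The names `weighted_reliability`, `p`, and `w` are illustrative assumptions; the only change with respect to the unweighted objective is the factor $w_j$ per task, so the greedy marginal gain of a pair $(i, j)$ becomes $w_j \cdot \pi_j \cdot p_{i,j}$.

```python
def weighted_reliability(p, w, A):
    """Weighted objective: sum_j w_j * (1 - prod_{i in A_n^j} (1 - p_{i,j})).

    p: dict mapping (worker, task) -> confidence
    w: dict mapping task -> non-negative weight (assumed for this sketch)
    A: iterable of (worker, task) pairs
    """
    fail = {}
    for (i, j) in A:
        fail[j] = fail.get(j, 1.0) * (1.0 - p[(i, j)])
    return sum(w[j] * (1.0 - v) for j, v in fail.items())
```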

V. COST MINIMIZATION

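Adopting the same set notation as in the MRA problem, we formulate the MCA problem as a constrained set cardinality minimization problem:

$$\begin{aligned} \underset{A \subseteq E}{\text{minimize}} \quad & |A| \\ \text{subject to} \quad & 1 - \prod_{i \in A_n^j} (1 - p_{i,j}) \ge \epsilon, \quad \forall j \in A_m, \\ & |A_m^i| \le 1, \quad \forall i \in A_n. \end{aligned}$$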

We design a greedy algorithm based on the link between the formulated MCA problem and the monotone submodular function.

Theorem 2 The formulated MCA problem is reducible to the submodular coverage problem with a matroid constraint.

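Proof: Consider a set of functions $f^j(A) = 1 - \prod_{i \in A_n^j} (1 - p_{i,j})$. Define a truncated function $h^j(A) = \min\{f^j(A), \epsilon\}$, and let $h(A) = \frac{1}{m} \sum_{j \in [m]} h^j(A)$. A key observation is that:

$$f^j(A) \ge \epsilon, \ \forall j \in [m] \iff h(A) = \frac{1}{m} \sum_{j \in [m]} h^j(A) = \epsilon. \tag{3}$$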

Following the same argument as in the proof of Lemma 1, we obtain:

Lemma 3 The functions $f^j(\cdot)$, $h^j(\cdot)$ and $h(\cdot)$ are monotone submodular for all $j \in [m]$.

Therefore, the optimization problem

$$\underset{A \subseteq E}{\text{minimize}} \ |A| \quad \text{subject to} \quad f^j(A) \ge \epsilon, \ \forall j \in A_m,$$

is equivalent to

$$\underset{A \subseteq E}{\text{minimize}} \ |A| \quad \text{subject to} \quad h(A) \ge \epsilon,$$

which is a submodular coverage problem [47], [48]. We complete the proof by adding the constraints $|A_m^i| \le 1, \forall i \in A_n$, which have been shown to form a partition matroid in Lemma 2.
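The equivalence in Eq. (3) can be illustrated with a few lines of Python. The helper names and the example confidences are assumptions for illustration; `math.isclose` is used instead of exact equality because $h(A)$ is computed in floating point.

```python
import math


def task_reliability(p, A, j):
    """f^j(A) = 1 - prod over workers assigned to task j of (1 - p_{i,j})."""
    prod = 1.0
    for (i, jj) in A:
        if jj == j:
            prod *= 1.0 - p[(i, jj)]
    return 1.0 - prod


def h(p, A, tasks, eps):
    """Average of the truncated per-task reliabilities min{f^j(A), eps}."""
    return sum(min(task_reliability(p, A, j), eps) for j in tasks) / len(tasks)


def all_tasks_reliable(p, A, tasks, eps):
    """Left-hand side of Eq. (3): every task meets the threshold."""
    return all(task_reliability(p, A, j) >= eps for j in tasks)


def covers(p, A, tasks, eps):
    """Right-hand side of Eq. (3): h(A) equals eps (up to rounding)."""
    return math.isclose(h(p, A, tasks, eps), eps)
```

Truncation is what makes the single constraint $h(A) \ge \epsilon$ sufficient: surplus reliability on one task cannot compensate for a shortfall on another.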

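Algorithm 2 Greedy MCA (G-MCA)
 1: $A \leftarrow \emptyset$, $A_n \leftarrow \emptyset$, $B \leftarrow [m]$;
 2: $\pi_j \leftarrow 1$, $f_j \leftarrow 0$, $\forall j \in [m]$;
 3: while $B \ne \emptyset$ do
 4:   for $j \in B$ do
 5:     $\sigma_j \leftarrow \min\{\epsilon - f_j,\ \max_{i \in [n] \setminus A_n} p_{i,j} \cdot \pi_j\}$;
 6:   end for
 7:   $j^* \leftarrow \arg\max_{j \in B} \sigma_j$;
 8:   $A \leftarrow A \cup \{(i(\sigma_{j^*}), j^*)\}$;
 9:   $A_n \leftarrow A_n \cup \{i(\sigma_{j^*})\}$;
10:   $f_{j^*} \leftarrow f_{j^*} + p_{i(\sigma_{j^*}), j^*} \cdot \pi_{j^*}$;
11:   $\pi_{j^*} \leftarrow \pi_{j^*} \cdot (1 - p_{i(\sigma_{j^*}), j^*})$;
12:   if $f_{j^*} \ge \epsilon$ then
13:     $B \leftarrow B \setminus \{j^*\}$;
14:   end if
15: end while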
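The pseudocode of G-MCA can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: `p` maps legitimate (worker, task) pairs to confidences, ties are broken by Python's tuple ordering, and a `ValueError` is raised when the remaining workers cannot bring some task up to the threshold, a case the pseudocode does not address.

```python
def greedy_mca(p, tasks, eps):
    """Greedy sketch for MCA: recruit workers until every task reaches eps."""
    pi = {j: 1.0 for j in tasks}   # product of (1 - p_{i,j}) over assigned workers
    f = {j: 0.0 for j in tasks}    # current reliability of task j
    open_tasks = set(tasks)        # the set B in Algorithm 2
    used = set()                   # the set A_n of recruited workers
    A = []
    while open_tasks:
        best = None                # (sigma, worker, task)
        for j in sorted(open_tasks):
            cands = [(conf, i) for (i, jj), conf in p.items()
                     if jj == j and i not in used]
            if not cands:
                continue
            conf, i = max(cands)   # most confident available worker for task j
            sigma = min(eps - f[j], conf * pi[j])
            if best is None or sigma > best[0]:
                best = (sigma, i, j)
        if best is None:           # assumption: report infeasible instances
            raise ValueError("remaining tasks cannot reach the threshold")
        _, i, j = best
        A.append((i, j))
        used.add(i)
        f[j] += p[(i, j)] * pi[j]
        pi[j] *= 1.0 - p[(i, j)]
        if f[j] >= eps:
            open_tasks.discard(j)
    return A
```

The loop terminates because every iteration either recruits a previously unused worker or raises, so it runs at most $n$ times.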

The greedy algorithm has been proved to be efficient for the submodular coverage problem, hence we propose a similar greedy heuristic that takes the partition matroid constraint into account. Note that the function $h(\cdot)$, akin to $f(\cdot)$, is decomposable with respect to spatial tasks. Given a currently selected set $A$ and a new element $(i', j')$, the marginal increment value is computed as:

$$\begin{aligned} h^{j'}(A \cup \{(i', j')\}) - h^{j'}(A) &= \min\{\epsilon - f^{j'}(A),\ f^{j'}(A \cup \{(i', j')\}) - f^{j'}(A)\} \\ &= \min\Bigl\{\epsilon - f^{j'}(A),\ \prod_{i \in A_n^{j'}} (1 - p_{i,j'}) \cdot p_{i',j'}\Bigr\}. \end{aligned}$$

G-MCA (Algorithm 2) depicts the procedure for computing the assignments. Lines 4–7 select the worker-task pair with the largest marginal increment value according to the formulation derived above. Line 11 updates the probability product term of a task, which enables fast computation of the marginal increment value. Lines 10–14 record the reliability of each task and remove the tasks that meet the threshold requirement.

To analyze the time complexity of G-MCA, consider the task with the set of least reliable workers. Assume that the smallest confidence value is $p_{\min}$ and the required number of workers is $k$. Then we have the inequality

$$1 - (1 - p_{\min})^k \ge \epsilon,$$

from which we get $k \ge \log_{1 - p_{\min}}(1 - \epsilon)$. Therefore, for the task linked to the least confident workers, G-MCA requires selecting $\log_{1 - p_{\min}}(1 - \epsilon)$ workers to satisfy the reliability constraint. We need to ensure that all tasks meet the constraint, which takes $O(m)$ time (line 4), and selecting each worker takes $O(n)$ time (line 5). In summary, the time complexity of G-MCA is $O(m \cdot n \cdot \log_{1 - p_{\min}}(1 - \epsilon))$.

Next we analyze the performance of the greedy heuristic.

Let $\Delta_e h(A) = h(A \cup \{e\}) - h(A)$. To facilitate the proof, we assume that selecting an element $e$ incurs a cost $c(e)$. Let $e_1, e_2, \ldots, e_k$ be the sequence of elements selected by the greedy heuristic with respect to $\Delta_e h(A) / c(e)$, and let $S$ be a cover with the minimum cost $c^*$. Set $A_0 = \emptyset$ and $A_r = \{e_t \mid 1 \le t \le r\}$ for each $1 \le r \le k$. Let $l_r = h(S) - h(A_r)$ for each $0 \le r \le k$. Let $\alpha = \min \frac{\Delta_e h(A)}{c(e)}$ and $\beta = \frac{\Delta_{e_k} h(A_{k-1})}{\Delta_{e_1} h(A_0)}$. Then we have the following theorem:

Theorem 3 Algorithm 2 returns a $\bigl(1 + \frac{1}{\beta} \ln \frac{h(E)}{\alpha c^*}\bigr)$-approximation solution for the MCA problem.

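Proof: First, we show that $c(e_r) \le \frac{c^*}{\beta\, l_{r-1}} (l_{r-1} - l_r)$. Indeed,

$$\frac{l_{r-1} - l_r}{c(e_r)} = \frac{\Delta_{e_r} h(A_{r-1})}{c(e_r)} \ge \beta \max_{e \in S} \frac{\Delta_e h(A_{r-1})}{c(e)} \ge \beta\, \frac{\sum_{e \in S} \Delta_e h(A_{r-1})}{\sum_{e \in S} c(e)} \ge \beta\, \frac{h(S) - h(A_{r-1})}{c^*} = \frac{\beta\, l_{r-1}}{c^*}.$$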

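Since $h(S) = l_0 > l_1 > \cdots > l_k = 0$ and $h(S) \ge \alpha c^*$, there exists a unique index $t$ such that $l_t \ge \alpha c^* > l_{t+1}$. For $1 \le r \le t + 1$, we have

$$\begin{aligned} \sum_{r=1}^{t} c(e_r) + \frac{l_t - \alpha c^*}{l_t - l_{t+1}}\, c(e_{t+1}) &\le \frac{c^*}{\beta} \Bigl( \sum_{r=1}^{t} \frac{l_{r-1} - l_r}{l_{r-1}} + \frac{l_t - \alpha c^*}{l_t} \Bigr) \\ &\le \frac{c^*}{\beta} \int_{\alpha c^*}^{l_0} \frac{1}{y}\, dy = \frac{c^*}{\beta} \ln \frac{l_0}{\alpha c^*} = \frac{c^*}{\beta} \ln \frac{h(E)}{\alpha c^*}. \end{aligned} \tag{4}$$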

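Since $c(e_r) \le \frac{l_{r-1} - l_r}{\alpha}$ for $t + 1 \le r \le k$, we have

$$\frac{\alpha c^* - l_{t+1}}{l_t - l_{t+1}}\, c(e_{t+1}) + \sum_{r=t+2}^{k} c(e_r) \le \frac{1}{\alpha} \Bigl( \alpha c^* - l_{t+1} + \sum_{r=t+2}^{k} (l_{r-1} - l_r) \Bigr) = c^*. \tag{5}$$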

Combining (4) and (5), we obtain

$$\begin{aligned} \sum_{r=1}^{k} c(e_r) &= \sum_{r=1}^{t} c(e_r) + \frac{l_t - \alpha c^*}{l_t - l_{t+1}}\, c(e_{t+1}) + \frac{\alpha c^* - l_{t+1}}{l_t - l_{t+1}}\, c(e_{t+1}) + \sum_{r=t+2}^{k} c(e_r) \\ &\le \Bigl( 1 + \frac{1}{\beta} \ln \frac{h(E)}{\alpha c^*} \Bigr) c^*. \end{aligned}$$

By setting an equal cost for each element $e$, $\sum_{r=1}^{k} c(e_r)$ is the number of workers selected by the greedy heuristic, and $c^*$ is the minimum number of required workers. This completes the proof.

TABLE I. EXPERIMENT SETTINGS.

Parameter                      Values
m (synthetic tasks)            600, 800, 1000, 1200, 1400
n (synthetic workers)          6000, 8000, 10000, 12000, 14000
m (T-drive tasks)              100, 150, 200, 250, 300
n (T-drive workers)            3000, 3500, 4000, 4500, 5000
d_t (distance threshold)       0.02, 0.03, 0.04, 0.05, 0.06
[p_min, p_max]                 [0.5, 0.7], [0.6, 0.8], [0.7, 0.9], [0.8, 1.0]
t_b (synthetic budget)         1000, 1500, 2000, 2500, 3000
t_b (T-drive budget)           200, 300, 400, 500, 600
ϵ (reliability)                0.75, 0.8, 0.85, 0.9, 0.95

The greedy heuristic of G-MCA is inspired by the monotone submodular structure of the formulated problem. We can also exploit another property: the objective function $h^j(\cdot)$ is capped by the reliability threshold. Therefore, instead of using only the absolute marginal increment of each task's reliability, we also consider how large the residual between the current function value and the reliability threshold is. Specifically, consider the currently selected set $A$ and a new element $(i', j')$. Denote the scaled marginal contribution of $(i', j')$ by $\delta_{i',j'}$; we design a scaled greedy heuristic:

$$\delta_{i',j'} = \min\Bigl\{1,\ \frac{f^{j'}(A \cup \{(i', j')\}) - f^{j'}(A)}{\epsilon - f^{j'}(A)}\Bigr\} = \min\Bigl\{1,\ \frac{\prod_{i \in A_n^{j'}} (1 - p_{i,j'}) \cdot p_{i',j'}}{\epsilon - f^{j'}(A)}\Bigr\}.$$

By replacing the marginal increment value computation in line 5 of Algorithm 2 with the above scaled formulation, we obtain a new algorithm called SG-MCA. Intuitively, SG-MCA gives higher priority to the task whose reliability function value is closer to the threshold.
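The difference between the two selection rules can be seen on a two-task example. The helper names below are hypothetical, and the numbers are chosen only to show that the absolute rule of G-MCA and the scaled rule of SG-MCA can rank the same candidates differently.

```python
def absolute_gain(f_j, pi_j, p_ij, eps):
    """G-MCA score: min{eps - f_j, pi_j * p_ij} for a task with f_j < eps."""
    return min(eps - f_j, pi_j * p_ij)


def scaled_gain(f_j, pi_j, p_ij, eps):
    """SG-MCA score: min{1, (pi_j * p_ij) / (eps - f_j)} for f_j < eps."""
    if f_j >= eps:
        raise ValueError("task already meets the threshold")
    return min(1.0, pi_j * p_ij / (eps - f_j))


# Task a: no worker yet; best available worker has confidence 0.8.
task_a = dict(f_j=0.0, pi_j=1.0, p_ij=0.8)
# Task b: reliability 0.7 so far; best available worker has confidence 0.6.
task_b = dict(f_j=0.7, pi_j=0.3, p_ij=0.6)
eps = 0.9

prefers_a_absolute = absolute_gain(eps=eps, **task_a) > absolute_gain(eps=eps, **task_b)
prefers_b_scaled = scaled_gain(eps=eps, **task_b) > scaled_gain(eps=eps, **task_a)
```

Here task a offers the larger absolute increment, while task b is relatively closer to the threshold, so the two heuristics would pick different tasks in this round.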

Note: The above results can be generalized to scenarios where each task has a different reliability threshold. Specifically, we can associate a reliability threshold $\epsilon_j$ with task $j$, and change Eq. (3) to

$$f^j(A) \ge \epsilon_j, \ \forall j \in [m] \iff h(A) = \frac{1}{m} \sum_{j \in [m]} \epsilon_j.$$

Then the results can be derived in the same way as in this section.

VI. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed methods on synthetic and real-world data. We first describe the experiment settings. Then we present the results for the MRA and MCA problems.

A. Experiment Settings

Datasets. For synthetic data, locations of workers and tasks are generated in a 2D data space $[0, 1]^2$, following a uniform distribution. The task preference of workers is represented by distance: a worker is interested in performing tasks located within a distance threshold. The confidence of a worker is drawn uniformly at random from the range $[p_{\min}, p_{\max}]$.
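A synthetic instance of this kind could be generated as in the sketch below. Several details are assumptions not stated in the text: the distance is taken to be Euclidean, each worker has a single confidence value shared by all of its preferred tasks, and the `seed` parameter and function name are introduced only for reproducibility of the example.

```python
import math
import random


def make_synthetic(n, m, dt, p_min, p_max, seed=0):
    """Return a dict mapping (worker, task) -> confidence for legitimate pairs.

    Workers and tasks are placed uniformly in [0, 1]^2; a pair is legitimate
    when the task lies within distance dt of the worker.
    """
    rng = random.Random(seed)
    workers = [(rng.random(), rng.random()) for _ in range(n)]
    tasks = [(rng.random(), rng.random()) for _ in range(m)]
    conf = [rng.uniform(p_min, p_max) for _ in range(n)]
    p = {}
    for i, (wx, wy) in enumerate(workers):
        for j, (tx, ty) in enumerate(tasks):
            if math.hypot(wx - tx, wy - ty) <= dt:
                p[(i, j)] = conf[i]
    return p
```

The returned dictionary has the same shape as the input assumed by a greedy assignment routine, so a generated instance can be passed to it directly.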

For real data, we use the T-drive dataset [49], [50]. This dataset contains the GPS traces of 10,357 taxis in Beijing for one week in 2008. There are around 15 million GPS sampling points in the dataset. The total travelling distance is around 9 million kilometers. We consider each taxi as a worker, and assume that each worker is interested in taking spatial tasks located along her/his trajectory. Spatial tasks are generated in the road network of Beijing. Specifically, the road network is a set of road segments with the attributes of segment ID, segment length, and the GPS coordinates of the head and end of the segment. Spatial tasks are randomly generated among the GPS coordinates of the segment ends. We use an HMM algorithm [51] to perform map matching, which maps each GPS point of a worker's trace to the corresponding road segment. As a result, we can obtain a worker's preferred tasks by checking the tasks located in the sequence of the worker's travelled road segments. The reliability of each worker is generated using the same scheme as for the synthetic data.

Fig. 3. Effect of the Number of Tasks. (a) Synthetic Data; (b) T-drive Data. [Plot: Average Reliability versus Number of Tasks for G-MRA, GMM, and RANDOM.]

Fig. 4. Effect of the Number of Workers. (a) Synthetic Data; (b) T-drive Data. [Plot: Average Reliability versus Number of Workers for G-MRA, GMM, and RANDOM.]

Fig. 5. Effect of the Total Budget. (a) Synthetic Data; (b) T-drive Data. [Plot: Average Reliability versus Total Budget for G-MRA, GMM, and RANDOM.]

Fig. 6. Effect of the Range of Worker Confidence. (a) Synthetic Data; (b) T-drive Data. [Plot: Average Reliability versus Range of Confidence for G-MRA, GMM, and RANDOM.]

Fig. 7. Effect of the Distance Threshold. [Plot: Average Reliability versus Distance Threshold for G-MRA, GMM, and RANDOM.]

Configuration. The numerical settings are listed in Table I, where the default values of parameters are in bold font. In each set of experiments, only one parameter is adjusted while the others are fixed to their default values. We implement two schemes, RANDOM and GMM, for comparison. RANDOM selects a task randomly and assigns an available worker to the task if possible. It serves as a baseline for the proposed problems. For the MRA problem, we implement RANDOM by running the random selection and assignment process repeatedly until the budget is used up. For the MCA problem, we execute the random selection and assignment process until the predefined reliability of each task is satisfied. GMM is adapted from the greedy strategy for the maximum task assignment problem [1], where the confidence of all workers is considered equal. It is a proper candidate for comparison, as the proposed methods take into account the differences in worker confidence. For the MRA problem, we execute GMM for multiple rounds until the budget is used up. For the MCA problem, we execute GMM for multiple rounds until the predefined reliability of each task is satisfied.
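For concreteness, the RANDOM baseline for the MRA problem might look like the sketch below. The text does not specify how the worker is chosen or what happens when no worker is available, so two assumptions are made here: the worker is drawn uniformly among the unassigned workers interested in the chosen task, and the procedure stops early when no task has an available worker. The `seed` parameter is also an addition for reproducibility.

```python
import random


def random_mra(p, budget, seed=0):
    """RANDOM baseline sketch for MRA.

    p: dict mapping (worker, task) -> confidence for legitimate pairs.
    Returns a list of (worker, task) pairs with at most `budget` entries.
    """
    rng = random.Random(seed)
    tasks = sorted({j for (_, j) in p})
    used = set()
    A = []
    while len(A) < budget:
        # Tasks that still have at least one unassigned interested worker.
        open_tasks = [j for j in tasks
                      if any(jj == j and i not in used for (i, jj) in p)]
        if not open_tasks:
            break
        j = rng.choice(open_tasks)
        candidates = sorted(i for (i, jj) in p if jj == j and i not in used)
        i = rng.choice(candidates)
        A.append((i, j))
        used.add(i)
    return A
```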

B. Results for MRA Problem

Effect of the number of tasks. Fig. 3 illustrates the performance of the algorithms for different numbers of tasks. As the number of tasks increases, the average task reliability decreases. The rationale is that, in a fixed worker market with a budget, the overall capability of performing tasks is restricted. Therefore, more tasks lead to fewer resources per task. The curves also show that G-MRA outperforms GMM and RANDOM for all task sizes.

Effect of the number of workers. Fig. 4 shows the results for different numbers of candidate workers. In both the synthetic and T-drive datasets, the average task reliability obtained by G-MRA is not sensitive to the number of workers. This indicates that, once the worker market is sufficiently large, the algorithm is able to assign tasks to the most suitable workers and achieve high reliability. The results also show that G-MRA performs consistently better than the two compared approaches.

Effect of the budget. Fig. 5 reports the effect of the recruitment budget: a larger budget leads to higher reliability. The G-MRA algorithm has a noteworthy advantage over the two compared algorithms when the budget is relatively small. This frugal property is important for achieving a good reliability-budget tradeoff.

Effect of the range of worker confidence. Fig. 6 shows the effect of the confidence range $[p_{\min}, p_{\max}]$ of workers on the reliability achieved by the algorithms. The G-MRA algorithm obtains high reliability for different ranges, while GMM and RANDOM show a large deviation (Fig. 6b). This reveals that, with a few thousand workers, the two compared algorithms perform well when all workers have high confidence. In contrast, the G-MRA algorithm can assign tasks efficiently even in a worker market with low confidence.

Effect of the distance. Fig. 7 shows the results for different distance thresholds on the synthetic data. Recall that in the synthetic data, the task preference of a worker is reflected by the distance between the task and the worker. Hence the distance threshold indicates the worker-task link probability. A larger distance threshold leads to a larger preferred task set for each worker. The figure shows that the performance of the algorithms is not very sensitive to the variation of the size of the workers' preferred task sets in the experiment, and that G-MRA performs better than the baseline methods.

C. Results for MCA Problem

Effect of the number of workers. Fig. 8 shows the results for different numbers of workers. When the number of workers increases, the proposed G-MCA and SG-MCA algorithms require fewer recruited workers to meet the task reliability requirement. On the contrary, GMM and RANDOM require nearly the same number of workers for different numbers of candidates. The two proposed methods are shown to be effective in reducing the number of assignments in comparison with the two baseline methods. Furthermore, G-MCA performs slightly better than SG-MCA when the number of candidates is large. Recall that G-MCA selects a task for assignment based on the absolute marginal increment value of the reliability function, while SG-MCA gives higher priority to the task whose current reliability function value is closer to the threshold. In Fig. 8a, when the number of worker candidates is large (i.e., larger than $0.8 \times 10^4$ candidates in the experiment), the strategy of G-MCA seems more effective in assignment, as there are more workers with high confidence, so selecting these workers is likely to make the corresponding task meet the reliability requirement immediately, even though this task is not favored by SG-MCA. Fig. 8b also supports this phenomenon, where the cost of SG-MCA is smaller than that of G-MCA given a smaller candidate size.

Fig. 8. Effect of the Number of Workers. (a) Synthetic Data; (b) T-drive Data. [Plot: Total Cost versus Number of Candidates for G-MCA, SG-MCA, GMM, and RANDOM.]

Fig. 9. Effect of the Number of Tasks. (a) Synthetic Data; (b) T-drive Data. [Plot: Total Cost versus Number of Tasks for G-MCA, SG-MCA, GMM, and RANDOM.]

Fig. 10. Effect of the Range of Worker Confidence. (a) Synthetic Data; (b) T-drive Data. [Plot: Total Cost versus Range of Confidence for G-MCA, SG-MCA, GMM, and RANDOM.]

Fig. 11. Effect of the Task Reliability Threshold. (a) Synthetic Data; (b) T-drive Data. [Plot: Total Cost versus Reliability Threshold for G-MCA, SG-MCA, GMM, and RANDOM.]

Effect of the number of tasks. Fig. 9 reveals the effect of the number of tasks. The number of workers required by all three algorithms increases with the number of tasks at a similar pace. This is because, given a sufficiently large worker market and a predefined task reliability, each extra task is assigned approximately the same number of workers under each algorithm. Again, the two proposed algorithms have comparable performance, and are superior to the two compared algorithms.

Effect of the range of worker confidence. Fig. 10 reports the effect of the range of worker confidence. G-MCA and SG-MCA require fewer workers than GMM and RANDOM in all cases except for the range [0.7, 0.9]. This is because the default task reliability is 0.9 and, in the experiments, $[p_{\min}, p_{\max}]$ is implemented as an open interval, so G-MCA and SG-MCA need at least two workers per task. On the other hand, two workers with confidence in the range [0.7, 0.9] suffice to ensure a task reliability of 0.9 for GMM and RANDOM.
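The arithmetic behind this observation can be checked with a small helper. The function below is an illustrative sketch that assumes all workers assigned to a task share the same confidence `p`; it returns the smallest $k$ with $1 - (1 - p)^k \ge \epsilon$ by direct iteration.

```python
def workers_needed(p, eps):
    """Smallest k such that 1 - (1 - p)**k >= eps, for identical confidence p."""
    if not (0.0 < p <= 1.0) or not (0.0 < eps < 1.0):
        raise ValueError("expects 0 < p <= 1 and 0 < eps < 1")
    k = 1
    while 1.0 - (1.0 - p) ** k < eps:
        k += 1
    return k
```

With a threshold of 0.9, any single worker whose confidence is strictly below 0.9 falls short, whereas two workers at the lower end of the range already give $1 - 0.3^2 = 0.91$, which is why two workers per task are both necessary and sufficient in the (0.7, 0.9) setting.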

Effect of the task reliability threshold. Fig. 11 shows the results for different task reliability requirements. The number of workers recruited by G-MCA and SG-MCA is less sensitive to the reliability threshold than that of GMM and RANDOM. The reason is that the two proposed heuristics are able to select the most confident workers given a worker budget, and hence can fulfill a larger range of task reliability with a fixed number of workers.

Effect of the distance. Fig. 12 shows the effect of different distance thresholds in the synthetic data. The baseline approaches are not sensitive to the variation of the distance threshold, while G-MCA and SG-MCA perform better as the distance threshold increases. As the distance threshold reflects the size of the preferred task set of a worker, the result indicates that the G-MCA and SG-MCA approaches are effective in assigning the most suitable tasks to workers.

Fig. 12. Effect of the Distance Threshold. [Plot: Total Cost versus Distance Threshold for G-MCA, SG-MCA, GMM, and RANDOM.]

VII. CONCLUSION

In this paper, we propose the problem of reliable spatial crowdsourcing in a large worker market, which aims at assigning tasks to workers efficiently and economically. Specifically, we formulate two assignment problems: maximum reliability assignment (MRA) under a recruitment budget, and minimum cost assignment (MCA) under a task reliability requirement. We prove that the two formulated problems possess submodular properties, based on which we design efficient greedy heuristics with performance guarantees. Extensive experiments have been conducted to validate the efficiency and effectiveness of the proposed approaches on both real and synthetic datasets. In the future, we plan to build a crowdsensing system that incorporates the proposed methods.

REFERENCES

[1] L. Kazemi and C. Shahabi, “Geocrowd: enabling query answeringwith spatial crowdsourcing,” in Proceedings of ACM InternationalConference on Advances in Geographic Information Systems (GIS),2012.

Page 12: 1 On Reliable Task Assignment for Spatial Crowdsourcingstatic.tongtianta.site/paper_pdf/0f040f08-9e56-11e9-8f2c... · 2019-07-04 · objective of spatial crowdsourcing is to outsource

2168-6750 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2016.2614383, IEEETransactions on Emerging Topics in Computing

12

[2] E. Koukoumidis, L.-S. Peh, and M. R. Martonosi, “Signalguru: lever-aging mobile phones for collaborative traffic signal schedule advisory,”in Proceedings of ACM international conference on Mobile systems,applications, and services (MobiSys), 2011, pp. 127–140.

[3] “Waze,” https://www.waze.com/.

[4] N. Maisonneuve, M. Stevens, M. Niessen, and L. Steels, “Noisetube:Measuring and mapping noise pollution with mobile phones,” Informa-tion Technologies in Environmental Engineering, pp. 215–228, 2009.

[5] “Openstreetmap,” http://www.openstreetmap.org/.

[6] L. Kazemi, C. Shahabi, and L. Chen, “Geotrucrowd: trustworthyquery answering with spatial crowdsourcing,” in Proceedings of ACMSIGSPATIAL International Conference on Advances in GeographicInformation Systems (GIS), 2013.

[7] H. To, G. Ghinita, and C. Shahabi, “A framework for protecting workerlocation privacy in spatial crowdsourcing,” Proceedings of the VLDBEndowment, vol. 7, no. 10, pp. 919–930, 2014.

[8] P. Cheng, X. Lian, Z. Chen, R. Fu, L. Chen, J. Han, and J. Zhao,“Reliable diversity-based spatial crowdsourcing by moving workers,”Proceedings of the VLDB Endowment, vol. 8, no. 10, pp. 1022–1033,2015.

[9] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy,and M. Srivastava, “Participatory sensing,” 2006.

[10] “University of california berkeley, 2008-2009,” http://traffic.berkeley.edu/.

[11] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu,E. Shih, H. Balakrishnan, and S. Madden, “Cartel: a distributed mo-bile sensor computing system,” in Proceedings of ACM InternationalConference on Embedded Networked Sensor Systems (SenSys), 2006.

[12] P. Mohan, V. Padmanabhan, and R. Ramjee, “Nericell: rich monitoringof road and traffic conditions using mobile smartphones,” in Proceed-ings of ACM International Conference on Embedded Networked SensorSystems (SenSys), 2008.

[13] W. Sun, Q. Li, and C.-K. Tham, “Wireless deployed and participatorysensing system for environmental monitoring,” in Proceedings of IEEEInternational Conference on Sensing, Communication, and Networking(SECON), 2014, pp. 158–160.

[14] F. Saremi, O. Fatemieh, H. Ahmadi, H. Wang, T. Abdelzaher, R. Ganti,H. Liu, S. Hu, S. Li, and L. Su, “Experiences with greengps–fuel-efficient navigation using participatory sensing,” IEEE Transactions onMobile Computing, vol. 15, no. 3, pp. 672–689, 2016.

[15] S. He, D.-H. Shin, J. Zhang, and J. Chen, “Toward optimal allocationof location dependent tasks in crowdsensing,” in Proceedings of IEEEInternational Conference on Computer Communications (INFOCOM),2014, pp. 745–753.

[16] S. Reddy, D. Estrin, and M. Srivastava, “Recruitment framework for par-ticipatory sensing data collections,” in Pervasive Computing. Springer,2010, pp. 138–155.

[17] L. Pournajaf, L. Xiong, V. Sunderam, and S. Goryczka, “Spatial taskassignment for crowd sensing with cloaked locations,” in Proceedings ofIEEE International Conference on Mobile Data Management (MDM),2014.

[18] D. Deng, C. Shahabi, and U. Demiryurek, “Maximizing the number ofworker’s self-selected tasks in spatial crowdsourcing,” in Proceedings ofACM International Conference on Advances in Geographic InformationSystems (GIS), 2013.

[19] B. Guo, Z. Wang, Z. Yu, Y. Wang, N. Y. Yen, R. Huang, and X. Zhou,“Mobile crowd sensing and computing: The review of an emerginghuman-powered sensing paradigm,” ACM Computing Surveys (CSUR),vol. 48, no. 1, pp. 1–31, 2015.

[20] B. Guo, C. Chen, D. Zhang, Z. Yu, and A. Chin, “Mobile crowd sensing and computing: when participatory sensing meets participatory social media,” IEEE Communications Magazine, vol. 54, no. 2, pp. 131–137, 2016.

[21] R. K. Ganti, F. Ye, and H. Lei, “Mobile crowdsensing: Current state and future challenges,” IEEE Communications Magazine, vol. 49, no. 11, pp. 32–39, 2011.

[22] C. Wu, Z. Yang, and Y. Liu, “Smartphones based crowdsourcing for indoor localization,” IEEE Transactions on Mobile Computing, vol. 14, no. 2, pp. 444–457, 2015.

[23] X. Zhang, Z. Yang, C. Wu, W. Sun, Y. Liu, and K. Xing, “Robust trajectory estimation for crowdsourcing-based mobile applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 7, pp. 1876–1885, 2014.

[24] B. Guo, H. Chen, Z. Yu, X. Xie, S. Huangfu, and D. Zhang, “Fliermeet: A mobile crowdsensing system for cross-space public information reposting, tagging, and sharing,” IEEE Transactions on Mobile Computing, vol. 14, no. 10, pp. 2020–2033, 2015.

[25] C. Costa, C. Laoudias, D. Zeinalipour-Yazti, and D. Gunopulos, “Smarttrace: Finding similar trajectories in smartphone networks without disclosing the traces,” in Proceedings of IEEE ICDE, 2011.

[26] G. Cardone, L. Foschini, P. Bellavista, A. Corradi, C. Borcea, M. Talasila, and R. Curtmola, “Fostering participaction in smart cities: a geo-social crowdsensing platform,” IEEE Communications Magazine, vol. 51, no. 6, pp. 112–119, 2013.

[27] D. Zhang, H. Xiong, L. Wang, and G. Chen, “Crowdrecruiter: selecting participants for piggyback crowdsensing under probabilistic coverage constraint,” in Proceedings of ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp), 2014, pp. 703–714.

[28] H. Xiong, D. Zhang, G. Chen, L. Wang, and V. Gauthier, “Crowdtasker: maximizing coverage quality in piggyback crowdsensing under budget constraint,” in Proceedings of IEEE International Conference on Pervasive Computing and Communications (PerCom), 2015, pp. 55–62.

[29] A. Ahmed, K. Yasumoto, Y. Yamauchi, and M. Ito, “Distance and time based node selection for probabilistic coverage in people-centric sensing,” in Proceedings of IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2011.

[30] S. Hachem, A. Pathak, and V. Issarny, “Probabilistic registration for large-scale mobile participatory sensing,” in Proceedings of IEEE International Conference on Pervasive Computing and Communications (PerCom), 2013.

[31] L. G. Jaimes, I. Vergara-Laurens, and M. A. Labrador, “A location-based incentive mechanism for participatory sensing systems with budget constraints,” in Proceedings of International Conference on Pervasive Computing and Communications (PerCom), 2012, pp. 103–108.

[32] W. Sun and C.-K. Tham, “An information-driven incentive scheme with consumer demand awareness for participatory sensing,” in Proceedings of IEEE International Conference on Sensing, Communication, and Networking (SECON), 2015, pp. 319–326.

[33] Y. Wen, J. Shi, Q. Zhang, X. Tian, Z. Huang, H. Yu, Y. B. Cheng, and X. Shen, “Quality-driven auction based incentive mechanism for mobile crowd sensing,” IEEE Transactions on Vehicular Technology, vol. 64, no. 9, pp. 4203–4214, 2015.

[34] D. Yang, G. Xue, X. Fang, and J. Tang, “Crowdsourcing to smartphones: incentive mechanism design for mobile phone sensing,” in Proceedings of ACM International Conference on Mobile Computing and Networking (MobiCom), 2012.

[35] A. Singla and A. Krause, “Incentives for privacy tradeoff in community sensing,” in Proceedings of AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2013.

[36] X. Zhang, Z. Yang, Z. Zhou, H. Cai, L. Chen, and X. Li, “Free market of crowdsourcing: Incentive mechanism design for mobile sensing,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3190–3200, 2014.

[37] X. Zhang, Z. Yang, W. Sun, Y. Liu, S. Tang, K. Xing, and X. Mao, “Incentives for mobile crowd sensing: A survey,” IEEE Communications Surveys and Tutorials, vol. 18, no. 1, pp. 54–67, 2016.

[38] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys and Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[39] J. Liu, N. Kato, J. Ma, and N. Kadowaki, “Device-to-device communication in lte-advanced networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 17, no. 4, pp. 1923–1940, 2014.

[40] J. Liu, S. Zhang, N. Kato, H. Ujikawa, and K. Suzuki, “Device-to-device communications for enhancing quality of experience in software defined multi-tier lte-a networks,” IEEE Network, vol. 29, no. 4, pp. 46–52, 2015.

[41] J. Liu, Y. Kawamoto, H. Nishiyama, N. Kato, and N. Kadowaki, “Device-to-device communications achieve efficient load balancing in lte-advanced networks,” IEEE Wireless Communications, vol. 21, no. 2, pp. 57–65, 2014.

[42] J. Liu, H. Nishiyama, N. Kato, and J. Guo, “On the outage probability of device-to-device-communication-enabled multichannel cellular networks: An rss-threshold-based perspective,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 1, pp. 163–175, 2016.

[43] C.-J. Ho and J. W. Vaughan, “Online task assignment in crowdsourcing markets,” in Proceedings of AAAI, 2012.

[44] C.-J. Ho, S. Jabbari, and J. W. Vaughan, “Adaptive task assignment for crowdsourced classification,” in Proceedings of ICML, 2013, pp. 534–542.

[45] D. J. Welsh, Matroid theory. Courier Dover Publications, 2010.

[46] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions-I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.

[47] J. Bar-Ilan, G. Kortsarz, and D. Peleg, “Generalized submodular cover problems and applications,” Theoretical Computer Science, vol. 250, no. 1, pp. 179–200, 2001.

[48] P.-J. Wan, D.-Z. Du, P. Pardalos, and W. Wu, “Greedy approximations for minimum submodular cover with submodular cost,” Computational Optimization and Applications, vol. 45, no. 2, pp. 463–474, 2010.

[49] J. Yuan, Y. Zheng, X. Xie, and G. Sun, “Driving with knowledge from the physical world,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011.

[50] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y. Huang, “T-drive: driving directions based on taxi trajectories,” in Proceedings of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS), 2010.

[51] P. Newson and J. Krumm, “Hidden Markov map matching through noise and sparseness,” in Proceedings of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS), 2009.

Xinglin Zhang received a B.E. degree from the School of Software, Sun Yat-sen University, in 2010 and a Ph.D. degree from the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, in 2014. He is currently with the South China University of Technology. His research interests include wireless ad-hoc/sensor networks, mobile computing, and crowdsourcing. He is a student member of the IEEE and the ACM.

Zheng Yang received a B.E. degree in computer science from Tsinghua University in 2006 and a Ph.D. degree in computer science from Hong Kong University of Science and Technology in 2010. He is currently an associate professor at Tsinghua University. His main research interests include wireless ad-hoc/sensor networks and mobile computing. He is a member of the IEEE and the ACM.

Yunhao Liu received the B.S. degree in automation from Tsinghua University, China, in 1995, and the M.S. and Ph.D. degrees in computer science and engineering from Michigan State University in 2003 and 2004, respectively. He is Chang Jiang Chair Professor and Dean of the School of Software at Tsinghua University. His research interests include wireless sensor networks, peer-to-peer computing, and pervasive computing. He is a fellow of the IEEE and a fellow of the ACM.

Shaohua Tang received the B.Sc. and M.Sc. degrees in applied mathematics and the Ph.D. degree in communication and information systems, all from the South China University of Technology, in 1991, 1994, and 1998, respectively. He has been a full professor with the School of Computer Science and Engineering, South China University of Technology, since 2004. His current research interests include information security, networking, and information processing. He is a member of the IEEE and the IEEE Computer Society.