A trust-aware task allocation method using deep Q-learning for uncertain mobile crowdsourcing
Human-centric Computing and Information Sciences, volume 9, Article number: 25 (2019)
Abstract
Mobile crowdsourcing has emerged as a promising collaboration paradigm in which each spatial task requires a set of mobile workers in the vicinity of the target location. Considering the desired privacy of the participating mobile devices, trust is an important factor in enabling effective collaboration in mobile crowdsourcing. The main impediment to the success of mobile crowdsourcing is the allocation of trustworthy mobile workers to nearby spatial tasks for collaboration. This process becomes substantially more challenging for large-scale online spatial task allocation in uncertain mobile crowdsourcing systems. The uncertainty can mislead the task allocation, resulting in performance degradation. Moreover, the large-scale nature of real-world crowdsourcing poses a considerable challenge to spatial task allocation in uncertain environments. To address these challenges, first, an optimization problem of mobile crowdsourcing task allocation is formulated to maximize the trustworthiness of workers and minimize movement distance costs. Second, for the uncertain crowdsourcing scenario, a Markov decision process-based mobile crowdsourcing model (MCMDP) is formulated to illustrate the dynamic trust-aware task allocation problem. Third, to solve large-scale MCMDP problems in a stable manner, this study proposes an improved deep Q-learning-based trust-aware task allocation (ImprovedDQLTTA) algorithm that combines trust-aware task allocation and deep Q-learning for uncertain mobile crowdsourcing systems. Finally, experimental results illustrate that the ImprovedDQLTTA algorithm converges stably within a number of training iterations. Compared with the reference algorithm, the proposed algorithm achieves effective solutions on the experimental data sets.
Introduction
With the advancing technology of mobile devices with numerous built-in sensors, mobile crowdsourcing has recently emerged as a new collaboration paradigm in numerous intelligent mobile information systems [1]. Mobile crowdsourcing has applications in numerous domains, including urban planning, traffic monitoring, ride sharing, environmental monitoring and intelligent disaster response [2]. Mobile crowdsourcing is a combination of spatial crowdsourcing and smartphone technology that employs mobile workers to perform certain tasks in a specific location [3]. For example, in a disaster search and rescue scenario, the requester urgently needs to collect images and videos of search areas from different locations in a country [4]. The requester submits a query to a mobile crowdsourcing server. Then, the server allocates the spatial tasks to the available workers in the vicinity of the disaster location.
Geographic information is a key factor in many aspects of mobile activities [5]. The goal of typical mobile crowdsourcing is to allocate multiple tasks to a team of suitable workers located within their time zone [6]. The required tasks of mobile crowdsourcing are optimized with a strong emphasis on spatial proximity [2]. In the above example, it is important to respond rapidly to emergencies or disasters [3]. Mobile crowdsourcing systems must assign emergency tasks to workers who are in the vicinity of the target location. Most tasks must be accomplished within a set time, making it impossible for mobile workers to travel long distances to accomplish the required tasks [4].
Moreover, the success of mobile crowdsourcing relies heavily on the quality of location-related workers [7]. Existing crowdsourcing systems depend mainly on mobile workers allocating tasks to themselves when logging on to the systems [8], and many spatial tasks may not be allocated to suitable workers [9]. The execution quality of the crowdsourcing tasks suffers because the workers may be malicious participants [10,11,12,13]. The trustworthiness of mobile workers must be considered in the mobile crowdsourcing setting [12]. In this context, mobile crowdsourcing should consider both the trustworthiness and location of mobile workers. This paper focuses on the trust-aware task allocation (TTA) optimization problem of mobile crowdsourcing systems.
The objective of optimizing TTA is to maximize the trust score and minimize the distance cost of mobile crowdsourcing. In the real world, mobile crowdsourcing systems are inherently dynamic, and the trust scores of mobile workers are unknown. The mobile crowdsourcing scenario in Fig. 1 has a location-based task \(t_i\ (i=1)\), shown as a red circle, and two crowd workers \(w_i\ (1 \le i \le 2)\), shown as blue triangles. At the time stamps \(P_i\ (1 \le i \le 3)\), workers \(w_i\) and location-based tasks \(t_i\) join the mobile crowdsourcing system. Assume that spatial task \(t_1\) can be accomplished by \(w_1\) and \(w_2\), each of which incurs a traveling distance \(dist(w_i, t_j)\) and has a trust score \(tr_i\), as described in Table 1. Based on our observation, at time stamp \(P_1\), mobile worker \(w_1\) can be recommended to perform spatial task \(t_1\) because of its high trust score and low travel cost; by contrast, at time stamps \(P_2\) and \(P_3\), mobile worker \(w_2\) may be selected for task \(t_1\).
As mentioned above, mobile workers frequently move to different locations, and trust scores for performing the required tasks are unstable in mobile crowdsourcing systems. Mobile crowdsourcing systems require optimization to be dynamic and adaptive to address this uncertainty [2]. However, mobile crowdsourcing emerged very recently and is typically considered to be a static environment in most existing research [3,4,5]. Most of the current crowdsourcing approaches have vital drawbacks. Mathematical optimization algorithms, in which the evaluation parameters are considered to be certain and fully known in advance, are used to solve the allocation problems of static crowdsourcing systems [2, 6].
The advantage of mobile crowdsourcing is that it enables a crowd of mobile workers to offer collaboration services, which enhances the efficiency of performing cooperative tasks while reducing the cost. Unfortunately, static approaches may fail when dealing with task allocations in uncertain mobile crowdsourcing. TTA optimization algorithms should adapt to frequent changes in crowdsourcing systems. The inherently dynamic changes in mobile crowdsourcing systems are difficult to handle. An attempt has been made to design an adaptive learning algorithm to solve this uncertain problem. In [6], Q-learning is employed for the dynamic task allocation of crowdsourcing systems. However, the applicability of Q-learning has mainly been limited to medium-sized optimization problems [14,15,16]. The majority of real-world TTA problems are fundamentally large-scale, and the crowdsourcing state space is extremely large because massive numbers of spatial tasks and mobile workers exist on mobile crowdsourcing systems. It is a considerable challenge to solve large-scale task allocation in uncertain scenarios, which further highlights the need for designing innovative and highly effective learning algorithms to optimize real-world TTA problems.
In summary, a dynamic TTA algorithm is needed to enhance the model performance by fully exploiting the potential advantage in uncertain mobile crowdsourcing systems. The difficulty lies in accurately modeling the dynamic characteristics of task allocations and making better crowdsourcing decisions, with the aim of maximizing the model performance over a long period of time. Specifically, the dynamic TTA optimization can be formulated as a Markov decision process (MDP) problem. The emerging deep Q-learning (DQL) algorithm shows distinct advantages for large-scale MDP problems and has been widely used in dynamic sequential decision-making problems [17,18,19,20]. By combining the advantages of both deep neural networks and Q-learning, DQL is able to predict the next state and action in large-scale crowdsourcing environments.
This paper investigates a practical and important problem, namely dynamic trust-aware task allocation, which aims to maximize the trust score and minimize the travel distance cost in uncertain crowdsourcing environments. This paper mainly focuses on addressing uncertain large-scale task allocation in location-based mobile crowdsourcing. Our proposed DQLTTA algorithm can be directly extended to the scenario of large-scale task allocation in real-world mobile crowdsourcing. The principal contributions of our work can be summarized as follows:

TTA optimization is formally defined in mobile crowdsourcing systems. For an uncertain scenario, dynamic TTA optimization is formalized as an MDP-based mobile crowdsourcing model (MCMDP). The core elements of MCMDP include the crowdsourcing state, allocation action and immediate reward.

We first study a Q-learning algorithm to optimize the dynamic allocations in mobile crowdsourcing. In addition, deep Q-learning-based trust-aware task allocation (DQLTTA) is proposed to handle large-scale MCMDP optimization problems, which are intractable for traditional Q-learning. Tabular Q-learning is further advanced by deep neural networks to estimate the Q-value of the next crowdsourcing state in a practical manner. The DQLTTA algorithm extends the TTA problems to dynamic optimization by combining the advantages of both the deep Q-network and bi-objective trust optimization.

To improve the overall performance of DQLTTA, this paper proposes an improved DQLTTA (ImprovedDQLTTA) algorithm to handle large-scale MCMDP problems much more stably in real-world crowdsourcing scenarios. A novel deep neural network architecture with an action advantage function is integrated in ImprovedDQLTTA, which performs better than the DQLTTA algorithm in mobile crowdsourcing. The pivotal idea of ImprovedDQLTTA is to design two estimators that separately learn the state value and action advantage functions with two streams of fully connected neural network layers. Additionally, mini-batch stochastic gradient descent with advanced training mechanisms and an epsilon-decreasing greedy policy are integrated into ImprovedDQLTTA. In this context, ImprovedDQLTTA can maintain good stability when solving large-scale trust-aware allocation problems in uncertain mobile crowdsourcing. Theoretical analysis is conducted to demonstrate the applicability of ImprovedDQLTTA.

The experimental results illustrate that the proposed ImprovedDQLTTA algorithm can achieve greater effectiveness and stability in uncertain scenarios of large-scale mobile crowdsourcing systems.
The rest of this paper is organized as follows. We discuss related work on dynamic mobile crowdsourcing in the "Related work" section. The preliminaries and formulation of mobile crowdsourcing are presented in the "Preliminaries and problem formulation" section. In the "Trust-aware task allocation with deep Q-learning" section, the improved deep Q-learning-based trust-aware task allocation algorithm is proposed for uncertain mobile crowdsourcing. An experimental study conducted to illustrate the value of the proposed algorithm is discussed in the "Experimental results and analysis" section, followed by conclusions.
Related work
Crowdsourcing is a newly emerging field that enables organizations or companies to make their requests on numerous intelligent web platforms [21], such as Amazon MTurk, Upwork, and Crowdflower [22]. Crowdsourcing has been widely applied to annotations [23], graph searching [24], data analysis [25], query processing [26], and social-aware collaborations [27, 28]. In such applications, the required tasks can be accomplished by online workers on the basis of crowdsourcing techniques. However, these workers do not have to travel to the target locations to accomplish the required tasks. Unlike general crowdsourcing, location-based mobile crowdsourcing systems usually require mobile workers to move to the specified location to perform tasks.
Task allocation in location-based mobile crowdsourcing
Mobile crowdsourcing entails a novel mechanism for tasks performed by mobile workers. Task allocation in location-based mobile crowdsourcing has gained increasing attention in recent years [29, 30]. Location-based mobile crowdsourcing is a subclass of spatial crowdsourcing that allocates available mobile workers to spatial tasks on a mobile crowdsourcing system [2]. A task allocation framework was formally presented for location-based mobile crowdsourcing [3]. Kazemi and Shahabi proposed a task assignment problem for spatial crowdsourcing [12]. They proposed a network-flow-based algorithm for handling the allocation problem. The goal of this framework is to maximize the number of tasks matched with workers [31]. This spatial allocation problem was extended to the maximum score assignment problem for skills-based crowdsourcing [3]. To handle the large-scale query problem, Li et al. proposed R-tree-based approximation algorithms for task allocation in mobile crowdsourcing [5]. Recently, an optimal task allocation problem was presented to address quality constraints [29].
Considering the privacy of participating mobile devices [8], Tran and To et al. proposed a real-time algorithm for spatial task allocation in server-assigned crowdsourcing [5]. This framework can be employed to protect the real locations of mobile workers and to maximize the crowdsourcing success rates [10, 11]. Unlike private location-based queries, this study focuses on trust-aware task allocation in mobile crowdsourcing. The workers in crowdsourcing processing are not always trusted [13]; thus, another work by Kazemi aimed to address the optimization of trust in task allocation [12]. The trustworthiness of mobile workers must be considered in the mobile crowdsourcing setting [12]. The goal is to optimize spatial proximity and trustworthiness management in task allocation. For participatory mobile systems, trust evaluation is an effective mechanism to promote mobile system performance by identifying the trustworthiness of potential participants [13]. This process can formally be defined as processing spatial tasks for a crowd of trustworthy mobile workers close to the target location.
Uncertain mobile crowdsourcing and deep reinforcement learning
In the existing research, mobile crowdsourcing is considered to occur in a stationary environment, in which the crowdsourcing quality is considered to be a certain parameter that is fully known in advance [2]. Crowdsourcing is defined as a static problem, and the multiple quality objectives are invariant [2]. By contrast, in uncertain crowdsourcing systems, all parameter values may change spontaneously [6]. Thus, Cheng et al. proposed a prediction-based allocation strategy to estimate the location and quality distributions of future workers and tasks for global optimal task allocation [32]. In comparison, our proposed method aims to maximize the reward utility on the basis of deep Q-learning algorithms by adapting the dynamic trust-aware allocation strategy to the uncertain mobile crowdsourcing environment.
Q-learning algorithms have been found to be more suitable for uncertain problems of crowdsourcing [2, 6]. A Q-learning agent learns how to address uncertain decision-making problems in dynamic environments [14]. Since tabular Q-learning requires iterative updating to converge, an optimal policy is difficult to find in limited time [14,15,16]. To solve this problem, the deep Q-network (DQN) algorithm was proposed by combining Q-learning with deep neural networks [17,18,19,20]. Van Hasselt, Guez and Silver proposed double DQN to address the overestimation of Q-learning [17]. Schaul et al. designed a prioritized experience replay mechanism to enhance the efficiency of training data [33]. Wang et al. proposed dueling DQN to further improve the convergence speed [34]. The dueling DQN algorithm represents both the state value function and the related advantage function [35].
A Q-learning algorithm was adopted in our previous work to obtain the optimal allocation policy in uncertain crowdsourcing [6]. However, the Q-learning algorithm is limited by its slow convergence in large crowdsourcing state and action spaces. To address this limitation, we propose a novel neural network with an advanced deep Q-learning algorithm. This algorithm extends the TTA optimization to dynamic crowdsourcing problems by means of a deep Q-learning algorithm. Most importantly, we propose an improved adaptive optimization algorithm by combining TTA optimization and advanced deep Q-learning, which maintains great efficiency in solving large-scale TTA problems in uncertain mobile crowdsourcing.
Preliminaries and problem formulation
In this section, a trust-aware task allocation scenario is formally introduced to address challenges in unreliable mobile crowdsourcing environments.
Mobile crowdsourcing preliminaries
The basic concepts of mobile crowdsourcing systems [2, 3, 6] are formally defined as follows.
Definition 1
(Spatial task) Denote a spatial task st as a tuple: \(st=\langle expir, loca, stype \rangle\). The textual property describes the task submitted by the requester. The location property loca denotes the location coordinates in relation to the required task. The expiry property expir is the specific time of task completion. The type property stype indicates the spatial task type.
A task st can be accomplished by a mobile worker only if the mobile worker physically travels to the target location loca before the expiry time expir. All spatial tasks have time constraints, and the mobile workers must physically travel to the target location on time. Mobile workers are formulated as follows.
Definition 2
(Mobile worker) Denote a mobile worker as a tuple: \(cw=\langle hisinfo, exptype, loc \rangle\). The property hisinfo is a crowdsourcing data sequence that records the series of spatial tasks allocated to mobile worker cw. A mobile worker \(cw_{i}\) has expertise exptype that makes the worker competent for a type stype of crowdsourcing task \(st_{j}\). The competence of worker \(cw_i\) for task \(st_j\) can be evaluated in terms of a quality score. The location loc represents the current location of the mobile worker.
A worker \(cw_i\) is associated with the traveling cost \(dist(cw_i, st_j)\), which is the traveling distance between \(cw_i\) and \(st_j\). Accordingly, a mobile worker \(cw_i\) is recommended to perform a spatial task \(st_j\) if the task is in the near vicinity of the worker.
Definition 3
(Travel distance) Distance \(f_{dist}(x_{i}^{j})\) specifies the travel cost in terms of the movement required to get from the location aloc of mobile worker \(cw_{j}\) to the location bloc of spatial task \(st_{i}\). The distance may be computed on the basis of the Euclidean distance metric:
$$\begin{aligned} dist(aloc, bloc) = \sqrt{(aloc_x - bloc_x)^2 + (aloc_y - bloc_y)^2}, \end{aligned}$$
where \((aloc_x, aloc_y)\) and \((bloc_x, bloc_y)\) are the coordinates of aloc and bloc.
In the optimization process, the algorithms aim to allocate workers \(cw_i\) to spatial tasks \(st_j\) with a minimum traveling cost so that the sum quality value of the allocation is maximized and the total distance cost is minimized [6]. However, in uncertain environments, numerous discrete events cause the execution failure of spatial tasks. Therefore, a trust-aware allocation optimization metric is required for solving the unreliable quality problem of mobile crowdsourcing systems.
Trust assessment metric
The main impediment to the success of spatial task allocation is the issue of trust evaluation for mobile workers. To evaluate the trustworthiness of a mobile worker, we consider and evaluate two parameters: reputation and expertise. The calculation of each trust parameter is discussed first; then, the trust evaluation is explained in detail.
The reputation of a worker reflects the probability, calculated from historical data, of completing a spatial task. In general, the reputation of mobile workers can be described with reference to their mobile worker IDs and trust parameters. The reputation metric of a worker is formally represented by Definition 4.
Definition 4
(Worker reputation) The reputation of a mobile worker is denoted as \(wqos=\langle id_{cw}, rep \rangle\), where \(id_{cw}\) represents the mobile worker ID and rep is the reputation value of the mobile worker. \(rep_i\) denotes the reputation attribute of the \(i\)th mobile worker cw. The reputation of the \(i\)th mobile worker is determined by observing whether previous interactions among mobile workers resulted in successful task execution. The observation is often described by two variables: \(n_{i}\), denoting the number of successful interactions, and \(N_{i}\), denoting the total number of interactions for the \(i\)th mobile worker. The trust value can be calculated as:
where the trust value of a service is initialized to 1/2.
Definition 5
(Worker expertise) Denote expertise as the knowledge estimation of a mobile worker, which is especially important for spatial tasks that require particular knowledge, such as geomatics skills and familiarity with geographical information science. Denote the matching between the \(i\)th mobile worker's skills and the \(j\)th task's requirements by \(E_{i}^{j}\). Suppose that \(E_{j}^{ta}\) is the set of requirements for the \(j\)th task and that \(E_{i}^{cw}\) is the collection of expertise of mobile worker \(cw_i\); then,
where the trust value of a service is initialized to 1/2.
All the trust parameters are combined into a single value for computing the trust score of the mobile worker.
where \(w_{rep}\) and \(w_{exp}\) are the weights of each crowdsourcing parameter, \(w_{rep} + w_{exp} = 1\).
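The trust assessment described above can be illustrated with a minimal Python sketch. The display formulas are not reproduced in this text, so the specific forms below are assumptions: reputation is taken as the smoothed ratio \((n_i + 1)/(N_i + 2)\), chosen only because it equals 1/2 when no interactions have been recorded; expertise is taken as the fraction of task requirements covered by the worker; and the trust score is taken as the weighted sum of the two with \(w_{rep} + w_{exp} = 1\). The function names are hypothetical.

```python
def reputation(successes, total):
    """Smoothed success ratio (assumed form); equals 0.5 with no history."""
    if not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total")
    return (successes + 1) / (total + 2)


def expertise(required, offered):
    """Fraction of the task's required skills the worker covers (assumed form)."""
    if not required:
        return 1.0  # a task with no stated requirements is trivially covered
    return len(set(required) & set(offered)) / len(set(required))


def trust_score(rep, exp, w_rep=0.5, w_exp=0.5):
    """Weighted combination of reputation and expertise, w_rep + w_exp = 1."""
    if abs(w_rep + w_exp - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_rep * rep + w_exp * exp


rep = reputation(successes=8, total=10)                # (8 + 1) / (10 + 2) = 0.75
exp = expertise({"gis", "photo"}, {"photo", "video"})  # 1 of 2 requirements met
score = trust_score(rep, exp, w_rep=0.6, w_exp=0.4)    # 0.6 * 0.75 + 0.4 * 0.5
```

Under these assumptions, a worker with no recorded interactions starts from a neutral reputation of 1/2 and moves toward the observed success ratio as history accumulates.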
In reality, task allocation is not a static decision process, as spatial tasks and mobile workers interact dynamically with the system. The allocation decision process is conducted iteratively, and each iteration involves allocating spatial tasks to trustworthy mobile workers in uncertain scenarios.
Weighting TTA approach with normalization
TTA problems focus not only on trust management [36] but also on spatial optimization. The aim of a crowdsourcing problem is to match tasks and mobile workers such that the trust score is maximized and the allocation distance is minimized. The objective functions of TTA are normalized between 0 and 1, and the bi-objective allocation optimization problem is formulated as follows:
for \(\sum _{k=1}^{2}w_k = 1, w_k > 0\). The minimum-travel-cost, maximum-trust allocation optimization is converted into a weighted-sum single-objective problem with the max-min operator [36], where \(z_{k}^{U}\) is the minimum value of the \(k\)th objective, \(z_{k}^{N}\) is the maximum value of the \(k\)th objective, \(f_k(x)\) is the \(k\)th objective function, and \(X = (x_{1,1},...,x_{m,n})\) is the matrix of decision variables. To solve this problem, a trust-distance weighted function is defined as an integrated optimization of trust scores and allocation distance costs. Owing to the dynamic nature of uncertain crowdsourcing scenarios, the trust values of mobile workers and tasks cannot be known in advance. In addition, many workers may be unavailable on the mobile crowdsourcing system at run time.
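The weighted-sum scalarization described in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's exact formulation: the decision matrix is taken to be binary (an entry of 1 assigns a task to a worker), each objective is min-max normalized using the smallest and largest totals observed over a hypothetical set of candidate allocations, and the distance term is inverted so that shorter travel scores higher. All names and numbers are illustrative.

```python
def totals(x, trust, dist):
    """Total trust and total distance of the pairs selected by binary matrix x."""
    pairs = [(i, j) for i, row in enumerate(x) for j, v in enumerate(row) if v]
    return (sum(trust[i][j] for i, j in pairs),
            sum(dist[i][j] for i, j in pairs))


def scalarize(total_trust, total_dist, trust_bounds, dist_bounds, w_trust, w_dist):
    """Weighted sum of min-max normalized objectives (bounds must differ)."""
    t_lo, t_hi = trust_bounds
    d_lo, d_hi = dist_bounds
    t = (total_trust - t_lo) / (t_hi - t_lo)  # larger trust is better
    d = (d_hi - total_dist) / (d_hi - d_lo)   # smaller distance is better
    return w_trust * t + w_dist * d


# Hypothetical data: rows are tasks, columns are workers.
trust = [[0.9, 0.6], [0.5, 0.8]]
dist = [[2.0, 1.0], [3.0, 4.0]]
candidates = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}

stats = {name: totals(x, trust, dist) for name, x in candidates.items()}
trust_bounds = (min(t for t, _ in stats.values()), max(t for t, _ in stats.values()))
dist_bounds = (min(d for _, d in stats.values()), max(d for _, d in stats.values()))
scores = {name: scalarize(t, d, trust_bounds, dist_bounds, 0.6, 0.4)
          for name, (t, d) in stats.items()}
best = max(scores, key=scores.get)
```

With a trust weight of 0.6, allocation "a" (higher trust, longer travel) is preferred; shifting weight toward distance would favor "b", which shows how the weights trade the two objectives against each other.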
Trust-aware task allocation with deep Q-learning
The majority of TTA optimization approaches require prior knowledge, but such approaches are not applicable in dynamic mobile crowdsourcing environments, where the availability of mobile workers is subject to frequent and unpredictable changes [2, 6]. Let us consider the submission of spatial tasks from requesters through a mobile crowdsourcing system, whereby spatial tasks arrive in an online manner. In such a scenario, the mobile crowdsourcing system possesses no prior information regarding spatial tasks and mobile workers.
The crowdsourcing TTA optimization problem is modeled as a Markov decision process-based mobile crowdsourcing (MCMDP) problem. Deep Q-learning is introduced to address the MCMDP problem. Furthermore, we propose an improved deep Q-learning-based trust-aware task allocation (ImprovedDQLTTA) algorithm by combining trust crowdsourcing optimization and deep Q-learning, which enables the learning agent to solve large-scale MCMDP problems in an uncertain scenario.
MDP model for uncertain mobile crowdsourcing
To address the dynamic problems of uncertain crowdsourcing TTA, a Markov decision process is adopted. The Markov decision process, a machine learning model, is a typical intelligence framework for modeling sequential decision-making problems under uncertainty [15]. In this paper, the MDP is applied to demonstrate the trust-aware task allocations and adaptation processes schematically in uncertain mobile crowdsourcing.
A mobile crowdsourcing MDP consists of a five-tuple \(\langle S, A, P, R, O\rangle\), where S is a state space composed of a finite set of crowdsourcing states, A is a crowdsourcing action space composed of a finite set of actions, P is the transition function for reaching the next crowdsourcing state \(s'\) from state s when an action \(a \in A(s)\) is performed by a crowdsourcing agent, R is a real-valued crowdsourcing reward function, where the agent receives an immediate reward \(r = R(s' \mid s,a)\), and O is the crowdsourcing observation space in which the agent can fully observe the mobile crowdsourcing decision environment. On this basis, the mobile crowdsourcing MDP can be defined as follows.
Definition 6
(Mobile Crowdsourcing MDP (MCMDP)) An MCMDP is formally defined as a seven-tuple: MCMDP\(=\langle S^{i}, s_{0}^{i}, s_{r}^{i}, A^{i}, P^{i}, R^{i}, O^{i} \rangle\), where:

\(S^{i}\) is the set of tasks in the state space of a particular crowdsourcing system partially observed by agent i.

\(s_{0}^{i} \in S\) is the initial task; any execution of the mobile crowdsourcing begins from this task.

\(s_{r}^{i} \in S\) represents the terminal task. When arriving at the terminal task, an execution of mobile crowdsourcing is terminated.

\(A^{i}\) is the set of mobile workers that can perform tasks \(s \in S^{i}\), and mobile worker cw belongs to \(A^{i}\) only if the precondition is satisfied by s.

P is a transition distribution \(P(s' \mid s, a)\) that determines the probability of reaching the next state \(s'\) from state s if action \(a \in A(s)\) is performed by a crowdsourcing agent. The probability distribution \(P(s' \mid s, a)\) satisfies
$$\begin{aligned} \sum _{s' \in S} P(s' \mid s, a) = 1, \forall s \in S,\forall a \in A. \end{aligned}$$(7)
\(R^{i}\) is the reward function: when mobile worker \(cw \in A^{i}\) is invoked, agent i transits from s to \(s'\), and the learning agent obtains an immediate reward \(r^i\). The expected value is \(R^{i}(s' \mid s, cw)\). Consider selecting mobile worker cw with multiple quality criteria, where agent i receives the following quality vector as a reward:
$$\begin{aligned} \begin{aligned} QoS(s,cw,s') =&[f_{tr}(s,cw,s'), f_{dist}(s,cw,s')]^{T}, \end{aligned} \end{aligned}$$(8)where each \(f_k(\cdot )\) denotes a quality attribute of mobile worker cw.

O is the crowdsourcing observation space in which the agent can fully observe the mobile crowdsourcing decision environment.
The MCMDP solution is a collection of TTA decision policies, each of which can be described as a procedure of trust-aware task allocation \(cw \in A\) by agent i in each state s. These policies, denoted as \(\pi\), map spatial tasks to mobile workers, defined as \(\pi : S \rightarrow A\). The MCMDP policy can be defined as a mobile crowdsourcing model. The main idea is to identify the optimal policy for trust-aware allocation in uncertain mobile crowdsourcing.
Deep Q-learning-based trust-aware task allocation algorithm
The above section analyzed the optimization problem of trust-aware allocation by means of the MCMDP model. The optimization objective is to maximize the long-term rewards of the MCMDP. The solution of the MCMDP can be denoted as a policy \(\pi\) that guides a learning agent to take the right action for the specific crowdsourcing state.
Dynamic task allocation with Q-learning
The uncertain mobile crowdsourcing problem can be formulated as an MCMDP model. However, the transition probabilities are not known, and we do not initially know the rewards of taking the allocation action. In this case, Q-learning is suggested for a crowdsourcing agent to determine the optimal policy. Q-learning is a temporal difference learning algorithm [14, 15] that takes into account the fact that the agent initially has only partial knowledge of the crowdsourcing MCMDP. In general, assume that an agent learns from experience to address uncertain mobile crowdsourcing. The agent can obtain a sequence of state-action rewards \(\langle s_1, a_1, r_1,s_2, a_2, r_2,\cdots , s_t, a_t, r_t \rangle\), which indicates that the agent was in state \(s_t\), selected action \(a_t\), and obtained reward \(r_t\). Figure 2 illustrates the sequence of the crowdsourcing state and state-action reward pairs.
Temporal difference learning agents determine the increment to \(V(s_t)\) in each time step. At time t, the agents immediately create an update by using discounted rewards and computing \(V(s_t)\). Temporal difference learning [15] can be defined as
$$\begin{aligned} V(s_t) \leftarrow V(s_t) + \alpha \left[ R(s_t) + \gamma \cdot V(s_{t+1}) - V(s_t) \right] . \end{aligned}$$
The goal of temporal difference learning agents is to move \(V(s_t)\) toward the target \(R(s_t)+\gamma \cdot V(s_{t+1})\) in each step, where \(\gamma\) is the discount factor. Tabular Q-learning is a common approach in temporal difference learning for maximizing total rewards. For each state s and action a, the tabular Q-learning algorithm takes an action, observes a reward r, enters a next state \(s'\), and updates Q(s, a). The key of the Q-learning algorithm is a straightforward iterative update of the value Q(s, a). Q(s, a) is accumulated for the current estimate of \(Q^{\pi }\) in each training iteration. The learning table values of Q(s, a) are revised by the following function:
$$\begin{aligned} Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \cdot \max _{a'} Q(s', a') - Q(s, a) \right] . \end{aligned}$$
The learning rate \(\alpha \in [0,1]\) indicates the extent to which the existing estimation of \(Q^{\pi }(s, a)\) contributes to the next estimation. The Q(s, a) values ultimately converge to the optimum value \(Q^{*}(s, a)\) [15]. Thus, the Q-learning-based allocation algorithm ultimately discovers an optimal policy for any finite MCMDP [6]. The basic optimization involves incorporating both the travel distance and the trust score of mobile workers into the dynamic mobile crowdsourcing decisions. Thus, the reward function of Q-learning-based TTA is defined as in Definition 7.
Definition 7
(Reward function) Suppose that a mobile worker completing a task is evaluated by a trust score \(f_{tr}(x_i^j) = tr(x_i^j)\). Each mobile worker is required to move from location aloc to bloc when completing the spatial task, which incurs a distance cost \(f_{dist}(x_i^j)\). The distance cost is evaluated in terms of the distance \(f_{dist}(x_i^j)=dist(aloc, bloc)\) between aloc and bloc. As a result, the reward function is determined by the QoS vector \([f_{tr}(x_i^j), f_{dist}(x_i^j)]\). Owing to the different scale of each QoS objective, each QoS value is mapped into the interval [0, 1]. With the min-max operator, the learning reward function adopts the linearly weighted sum approach to calculate the value of all QoS objectives:
In the training iterations, the learning agent estimates its optimal policy by maximizing the total of received crowdsourcing rewards in the uncertain scenario.
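The tabular update rule discussed in this subsection can be sketched in a few lines of Python. This is a minimal illustration of the standard Q-learning update, not the paper's implementation: states are labelled by hypothetical task identifiers, actions by hypothetical worker identifiers, the rewards are made-up scalars standing in for the normalized reward of Definition 7, and the values of the learning rate and discount factor are arbitrary.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Greedy estimate of the next state's value (0.0 if it has no actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Assign worker w1 to task t1, observe reward 1.0, and move on to task t2.
first = q_update(q, "t1", "w1", reward=1.0,
                 next_state="t2", next_actions=["w1", "w2"])

# A later update for task t0 bootstraps from the value just learned for t1.
second = q_update(q, "t0", "w2", reward=0.5,
                  next_state="t1", next_actions=["w1"])
```

The second call shows how value propagates backwards: the estimate for (t0, w2) already reflects the reward observed one step later at t1. The table grows with every distinct state-action pair, which is why this approach becomes impractical at large scale.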
Dynamic task allocation with deep Qlearning
Tabular Qlearning is not a feasible solution owing to the largescale state and action spaces in uncertain mobile crowdsourcing systems. Moreover, a Qlearning table is environmentspecific and not generalized. In largescale uncertain systems, there are too many states and actions to store in machine memory, and learning the value of each state is a slow process. This section introduces a new and highly effective QLearningbased task allocation mechanism.
To adapt to changes in large-scale mobile crowdsourcing systems, we propose a deep Q-learning-based trust-aware task allocation (DQLTTA) algorithm that combines advances in deep neural networks and Q-learning techniques. Specifically, the dynamic TTA problem is formalized as a Markov decision process-based mobile crowdsourcing model. The experience of a crowdsourcing state transition is denoted as \(\langle s, a, r, s' \rangle\), and a set of crowdsourcing states and allocation actions with a transition policy constitutes an MCMDP. One episode of an MCMDP forms a finite sequence of crowdsourcing states, allocation actions and rewards:
where \(s_t\) denotes the current state, \(a_t\) denotes the current action, \(r_t\) denotes the reward after performing an action, and \(s_{t+1}\) denotes the next state in the dynamic mobile crowdsourcing system.
The DQLTTA algorithm directly combines a deep neural network and Q-learning to solve the dynamic trust-aware allocation problem. The DQLTTA learning algorithm uses a value iteration approach, in which the crowdsourcing value function \(Q = Q(s, a; \theta )\) is a parameterized function with parameter \(\theta\) that takes crowdsourcing state S and crowdsourcing action space A as inputs and returns a crowdsourcing Q value for each action \(a \in A\). Then, we can use a greedy approach to select a crowdsourcing action:
DQLTTA iteratively solves the mobile crowdsourcing MDP problem by learning the weights of the deep neural network towards the optimization objective. The DQLTTA algorithm differs from Q-learning in two ways. Traditional Q-learning is based on the Bellman equation, and the Q value is iteratively updated: \(Q_{t+1}(s,a) = {\mathbb {E}}[r + \gamma \cdot \max _{a'}Q_t(s', a') \mid s, a]\). Q-learning algorithms with value iterations are impractical for large-scale crowdsourcing problems. Thus, it is practical to employ a dynamic crowdsourcing function approximation to assess the action value function \(Q(s, a; \theta )\approx Q^{*}(s, a)\), which is a typical function approximation.
DQLTTA is designed as a function approximation with weight \(\theta\) for the mobile crowdsourcing MDP problem. The parameters of the DQLTTA function approximation can be learned by minimizing the loss function \(L(\theta _t)\), which is optimized at iteration t:
where \(y_t\) is the target value for iteration t and can be computed as
DQLTTA considers the crowdsourcing states and allocation actions as the inputs of a deep Q-network and outputs the Q-value for dynamic allocations. Figure 3 illustrates the deep Q-learning-based trust-aware task allocation (DQLTTA) algorithm framework.
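The loss and target described above can be sketched in the standard deep Q-learning form, assuming the target is \(y = r + \gamma \max_{a'} Q(s', a')\) evaluated with frozen parameters and the loss is the mean squared error between \(y\) and \(Q(s, a; \theta)\). The two Q functions below are hypothetical stand-ins for neural networks, used only so that the arithmetic can be followed.

```python
def td_targets(batch, target_q, gamma=0.9):
    """y = r + gamma * max_a' Q(s', a') using the frozen target function."""
    return [r + gamma * max(target_q(s_next)) for (_, _, r, s_next) in batch]


def mse_loss(batch, online_q, targets):
    """Mean squared error between the targets and Q(s, a; theta)."""
    errors = [(y - online_q(s)[a]) ** 2
              for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


# Hypothetical stand-ins: each maps a state to one Q-value per action.
def online_q(state):
    return [0.0, 0.5]


def target_q(state):
    return [1.0, 2.0]


# Transitions are (state, action, reward, next_state) tuples.
batch = [("s0", 1, 1.0, "s1"), ("s1", 0, 0.0, "s2")]
targets = td_targets(batch, target_q)
loss = mse_loss(batch, online_q, targets)
print(targets, loss)
```

In a real implementation the loss would be differentiated with respect to the network weights; here it is only evaluated.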
Dynamic task allocation with improved deep Q-learning
As discussed in [17, 33,34,35], the performance of deep Q-learning algorithms may not be stable. To improve the overall performance of DQLTTA, an improved DQLTTA algorithm (ImprovedDQLTTA) is further proposed to handle large-scale MCMDP problems much more stably in uncertain mobile crowdsourcing environments. The proposed ImprovedDQLTTA algorithm incorporates the following mechanisms: (i) a mini-batch stochastic gradient descent approach with advanced training mechanisms; (ii) an \(\epsilon\)-decreasing greedy policy; and (iii) a novel deep neural network architecture with an action advantage function.
Mini-batch stochastic gradient descent. The parameters of ImprovedDQLTTA from an earlier training iteration \(\theta _{t-1}\) are fixed while optimizing the loss function \(L(\theta _t)\). Note that the targets rely on the ImprovedDQLTTA weight parameters. Differentiating the loss function with respect to the weights yields the following gradient:
Instead of calculating the full expectation in the above gradient, the loss function of the ImprovedDQLTTA is computationally optimized by stochastic gradient descent [17]. The weights of the ImprovedDQLTTA approximation are trained using a gradient descent rule, and the parameter \(\theta\) can be updated using stochastic gradient descent by
Stochastic gradient descent is simple and appealing for DQLTTA; however, it is not sample efficient. In this paper, mini-batch stochastic gradient descent learning is therefore proposed to discover the optimal fitting value function of ImprovedDQLTTA by training on mini-batches of crowdsourcing data. Instead of making decisions based solely on the current allocation experience, allocation experience replay helps the ImprovedDQLTTA network to learn from several mini-batches of crowdsourcing data. Each of these allocation experiences is stored as a four-element tuple \(\langle state, action, reward, next state\rangle\). During training iteration t, the allocation experience \(e_t = (s_t, a_t, r_t, s_{t+1})\) is stored in a replay memory \(D = \{e_1, ..., e_t\}\). The memory buffer of the allocation experience replay has a fixed size: as new allocation experiences are inserted, earlier experiences are removed [19]. To train the ImprovedDQLTTA neural networks, mini-batches of experiences are sampled uniformly at random from the allocation memory buffer.
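A fixed-size replay memory with uniform sampling can be sketched as follows. The class name, the capacity, and the first-in-first-out eviction order are assumptions for illustration; the paper only states that the buffer is fixed and that earlier experiences are removed as new ones arrive.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item once it is full.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Hypothetical usage: five transitions pushed into a buffer of capacity three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(t, 0, 1.0, t + 1)
minibatch = buffer.sample(batch_size=2)
print(len(buffer), minibatch)
```

Sampling at random rather than in arrival order breaks the correlation between consecutive transitions, which is the usual motivation for experience replay.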
To obtain stable Q-values, a separate target network, whose weights change more slowly than those of the primary Q-network, is used to compute the targets in the loss function [35]. In this context, the ImprovedDQLTTA algorithm learns to optimize two separate neural networks \(Q(s, a; \theta )\) and \(Q(s, a; {\hat{\theta }})\) with current learning parameters \(\theta\) and previous learning parameters \({\hat{\theta }}\). The parameters \(\theta\) are updated many times during the training iterations and are cloned to the previous parameters \({\hat{\theta }}\) after every \(NUM_{training}\) iterations.
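The periodic cloning of \(\theta\) into \({\hat{\theta }}\) can be sketched as below. The dictionary of weights, the period of 100 steps, and the constant increment standing in for a gradient step are all hypothetical; only the idea of copying the online parameters every \(NUM_{training}\) iterations comes from the text.

```python
import copy

NUM_TRAINING = 100  # assumed synchronization period


def maybe_sync(step, online_params, target_params, period=NUM_TRAINING):
    """Return the target parameters, refreshed from the online ones every `period` steps."""
    if step % period == 0:
        return copy.deepcopy(online_params)
    return target_params


# Hypothetical one-weight "network".
online = {"w": [0.0]}
target = copy.deepcopy(online)
for step in range(1, 251):
    online["w"][0] += 0.01  # stand-in for one gradient step
    target = maybe_sync(step, online, target)
print(online["w"][0], target["w"][0])
```

After 250 steps the target copy still holds the weights from step 200, so the targets used in the loss stay fixed between synchronizations.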
ImprovedDQLTTA is refreshed with a batch of collected samples in the experience replay buffer by means of mini-batch stochastic gradient descent at each decision epoch.
Theorem 1
(The convergence analysis of mini-batch stochastic gradient descent) Assume that there are two constants A and B that satisfy \({\mathbb {E}}[\Vert \nabla h_b(\theta )\Vert ^2]\le A\) and \({\mathbb {E}}[\Vert \theta ^* - \theta _t\Vert ^2]\le B\), where t denotes the gradient optimization iteration and
Let \(h_{min}(\theta )=min\{h(\theta _1), h(\theta _2), \cdots , h(\theta _t)\}\) and assume that
When the optimization of the minibatch approach reaches \(t+1\) iterations, then
According to the conditional expectation of mathematics, we can obtain
Taking the expectation of \(\theta _t\) in Equation (22) yields
Accordingly,
Since \({\mathbb {E}}[\Vert \theta _{t+1} - \theta ^{*}\Vert ^2] \ge 0\), we obtain
Since \({\mathbb {E}}[\Vert \theta _{t} - \theta ^{*}\Vert ^2] \le B\), we obtain
and
Since \(\sum _{t=0}^{\infty } \eta _t=\infty\), it is clear that \(h_{min}(\theta ) \rightarrow h(\theta ^*)\).
Therefore, it can be concluded that ImprovedDQLTTA with minibatch stochastic gradient descent converges to \(h(\theta ^*)\).
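The chain of inequalities behind Theorem 1 can be summarized compactly. The following is a sketch of the standard argument, not the paper's exact derivation. Beyond the bounds A and B stated in the theorem, it assumes that h is convex, that the mini-batch gradient \(\nabla h_b\) is an unbiased estimate of \(\nabla h\), and that the step sizes satisfy \(\sum _t \eta _t = \infty\) and \(\sum _t \eta _t^2 < \infty\).

```latex
\begin{align*}
\theta_{t+1} &= \theta_t - \eta_t \nabla h_b(\theta_t), \\
\Vert \theta_{t+1} - \theta^* \Vert^2
  &= \Vert \theta_t - \theta^* \Vert^2
   - 2 \eta_t \nabla h_b(\theta_t)^{\top} (\theta_t - \theta^*)
   + \eta_t^2 \Vert \nabla h_b(\theta_t) \Vert^2, \\
\mathbb{E}\big[\Vert \theta_{t+1} - \theta^* \Vert^2\big]
  &\le \mathbb{E}\big[\Vert \theta_t - \theta^* \Vert^2\big]
   - 2 \eta_t \big(\mathbb{E}[h(\theta_t)] - h(\theta^*)\big)
   + \eta_t^2 A, \\
2 \sum_{k=1}^{t} \eta_k \big(\mathbb{E}[h(\theta_k)] - h(\theta^*)\big)
  &\le \mathbb{E}\big[\Vert \theta_1 - \theta^* \Vert^2\big]
   - \mathbb{E}\big[\Vert \theta_{t+1} - \theta^* \Vert^2\big]
   + A \sum_{k=1}^{t} \eta_k^2
   \le B + A \sum_{k=1}^{t} \eta_k^2, \\
\mathbb{E}[h_{min}(\theta)] - h(\theta^*)
  &\le \frac{B + A \sum_{k=1}^{t} \eta_k^2}{2 \sum_{k=1}^{t} \eta_k}.
\end{align*}
```

The third line uses unbiasedness and convexity, \(\nabla h(\theta _t)^{\top }(\theta _t - \theta ^*) \ge h(\theta _t) - h(\theta ^*)\); the fourth sums the third over the iterations and drops the non-negative distance term; the last uses \(h_{min}(\theta ) \le h(\theta _k)\) for every k. Under the assumed step-size conditions, the right-hand side of the last line tends to zero.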
\(\epsilon\)-decreasing greedy policy. The ImprovedDQLTTA algorithm selects the allocation action a with the maximum Q value by exploiting the knowledge already learned for the current state s. To build a better estimate of the optimal ImprovedDQLTTA function, the algorithm should also explore by selecting allocation actions other than the current best one. In this paper, the \(\epsilon\)-greedy policy is employed: with probability \(\epsilon\) (\(0 \le \epsilon \le 1\)) a random allocation action is selected, and with probability \(1-\epsilon\) the allocation action that maximizes the Q value is selected [15]. By means of this strategy, ImprovedDQLTTA can achieve a trade-off between exploration and exploitation in uncertain mobile crowdsourcing systems. The \(\epsilon\)-greedy policy can be illustrated as follows:
where actnum denotes the total number of available allocation actions.
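The selection rule can be sketched as follows, assuming the common form in which the greedy action is chosen with probability \(1 - \epsilon + \epsilon / actnum\) and every other action with probability \(\epsilon / actnum\). The function names and the example Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    actnum = len(q_values)
    if rng.random() < epsilon:
        return rng.randrange(actnum)  # explore
    return max(range(actnum), key=lambda a: q_values[a])  # exploit


def action_probabilities(q_values, epsilon):
    """Selection probability of each action under the rule above."""
    actnum = len(q_values)
    greedy = max(range(actnum), key=lambda a: q_values[a])
    probs = [epsilon / actnum] * actnum
    probs[greedy] += 1.0 - epsilon
    return probs


# Hypothetical Q-values for three candidate workers.
q_values = [0.2, 0.9, 0.4]
probs = action_probabilities(q_values, epsilon=0.1)
action = epsilon_greedy(q_values, epsilon=0.0)
print(probs, action)
```

Setting `epsilon=0.0` recovers the purely greedy choice, and `epsilon=1.0` gives uniform random exploration.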
Theorem 2
(\(\epsilon\)-greedy policy improvement) For any \(\epsilon\)-greedy policy \(\pi\), the \(\epsilon\)-greedy policy \(\pi '\) with respect to \(q_{\pi }\) is an improvement, that is, \(v_{\pi '}(s) \ge v_{\pi }(s)\).
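A sketch of the standard argument for this result, following the textbook treatment in [15], is given here for reference. It writes m for actnum, assumes \(\epsilon < 1\), and uses only the fact that an \(\epsilon\)-greedy policy \(\pi\) satisfies \(\pi (a \mid s) \ge \epsilon / m\) for every action.

```latex
\begin{align*}
q_{\pi}(s, \pi'(s))
  &= \sum_{a} \pi'(a \mid s)\, q_{\pi}(s, a) \\
  &= \frac{\epsilon}{m} \sum_{a} q_{\pi}(s, a)
   + (1 - \epsilon) \max_{a} q_{\pi}(s, a) \\
  &\ge \frac{\epsilon}{m} \sum_{a} q_{\pi}(s, a)
   + (1 - \epsilon) \sum_{a}
     \frac{\pi(a \mid s) - \epsilon / m}{1 - \epsilon}\, q_{\pi}(s, a) \\
  &= \sum_{a} \pi(a \mid s)\, q_{\pi}(s, a) = v_{\pi}(s).
\end{align*}
```

The inequality holds because the weights \((\pi (a \mid s) - \epsilon / m)/(1 - \epsilon )\) are non-negative and sum to one, so the weighted average cannot exceed the maximum. The policy improvement theorem then extends \(q_{\pi }(s, \pi '(s)) \ge v_{\pi }(s)\) to the value functions.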
Therefore, the \(\epsilon\)-greedy policy is an improvement, \(v_{\pi '}(s) \ge v_{\pi }(s)\).
To maintain a good balance of exploration and exploitation, a suitable learning parameter should be selected for the \(\epsilon\)-greedy strategy. Early in training, a more random policy should be used to encourage initial exploration, and as training progresses, a more greedy policy should be used. The training performance of ImprovedDQLTTA can be improved by using an \(\epsilon\)-greedy parameter that changes during training, which is defined as follows.
where \(\epsilon _i\) is the initial value of \(\epsilon\), \(\epsilon _f\) is the final value of \(\epsilon\), and explore is the total number of training steps.
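One simple schedule consistent with these three parameters is a linear decay from \(\epsilon _i\) to \(\epsilon _f\) over explore steps. The linear shape and the value of `explore` are assumptions made for this sketch and may differ from the schedule used in the paper; the default values 0.9 and 0.01 are the initial and final values reported in the parameter study.

```python
def epsilon_at(step, eps_initial=0.9, eps_final=0.01, explore=30000):
    """Linearly anneal epsilon from eps_initial to eps_final over `explore` steps."""
    if step >= explore:
        return eps_final
    fraction = step / explore
    return eps_initial - fraction * (eps_initial - eps_final)


# Epsilon at the start, halfway through, and at the end of exploration.
schedule = [epsilon_at(s) for s in (0, 15000, 30000)]
print(schedule)
```

The schedule starts almost fully exploratory and ends almost fully greedy, matching the behaviour described in the preceding paragraph.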
Novel neural network architecture with action advantage function. To further improve the convergence stability, a novel deep network architecture is integrated into ImprovedDQLTTA for learning the crowdsourcing decision process with an action advantage function [33,34,35]. The key idea of this mechanism is to design a novel neural network with two streams of fully connected layers. In this way, the state values and the action advantages are learned separately by the novel ImprovedDQLTTA neural network. Figure 4 illustrates the novel neural network architecture.
For a stochastic policy \(\pi\), \(Q_{\pi }(s, a)\) and \(V_{\pi }(s)\) can be formulated as
The action advantage function can be defined as
Note that \({\mathbb {E}}_{a \sim \pi (s)}[G_{\pi }(s, a)]=0\). Intuitively, \(V_{\pi }(s)\) measures the value of a particular state s, and \(Q_{\pi }(s, a)\) measures the value of selecting action a in that state; their difference isolates the contribution of the action itself. Based on this definition, the relative importance of each crowdsourcing action can be obtained from the action advantage function \(G_{\pi }(s, a)\).
To estimate the values of the V and G functions, ImprovedDQLTTA is implemented with a novel neural network in which two streams of fully connected layers output the state value \(V(s;\beta )\) and the advantage vector \(G(s,a;\alpha )\). ImprovedDQLTTA combines \(V_{\pi }(s)\) and \(G_{\pi }(s, a)\) to obtain \(Q_{\pi }(s, a)\), as follows
and
where \(\alpha\) and \(\beta\) are the parameters of the two streams of the novel neural network layers. This construction forces the action advantage function to have zero advantage at the selected action. For \(a^*=argmax_{a \in A}Q(s, a; \alpha , \beta ) = argmax_{a \in A}G(s, a;\alpha )\), the function obtains \(Q(s, a^*; \alpha , \beta )=V(s;\beta )\). Furthermore, for better stability, an alternative module of ImprovedDQLTTA replaces the max operator with an average operator:
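The two aggregation variants can be sketched as below, assuming the usual dueling form [34] in which a baseline (the maximum or the mean of the advantages) is subtracted before the state value is added. The function name and the numbers are hypothetical; in the network, `state_value` and `advantages` would be the outputs of the two streams.

```python
def dueling_q(state_value, advantages, use_max=False):
    """Combine V(s) and G(s, a) into Q(s, a) by subtracting a baseline."""
    if use_max:
        baseline = max(advantages)  # max operator
    else:
        baseline = sum(advantages) / len(advantages)  # average operator
    return [state_value + (g - baseline) for g in advantages]


# Hypothetical stream outputs for one state with three actions.
q_mean = dueling_q(state_value=2.0, advantages=[1.0, 3.0, -1.0])
q_max = dueling_q(state_value=2.0, advantages=[1.0, 3.0, -1.0], use_max=True)
print(q_mean, q_max)
```

With the max operator the greedy action's Q-value equals V(s); with the average operator the mean of the Q-values equals V(s). Neither variant changes the ranking of the actions.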
ImprovedDQLTTA is an intelligent algorithm for addressing sequential decision-making problems of mobile crowdsourcing systems. ImprovedDQLTTA is implemented with mini-batch stochastic gradient descent, an \(\epsilon\)-decreasing greedy policy, and a novel network architecture with an action advantage function. To intelligently develop an appropriate strategy, ImprovedDQLTTA is built with a multi-layer network that takes the crowdsourcing state encoded as a \([1 \times statenum]\) vector and learns the best action (mobile worker), mapping all possible actions to a vector of length actnum. In summary, the pseudocode for improved deep Q-learning-based trust-aware task allocation is given in Algorithm 1.
ImprovedDQLTTA is able to effectively identify an optimal solution for the large-scale MCMDP. ImprovedDQLTTA operates by learning to optimize the expected reward of selecting an action for a given state and discovering the optimal action-selection policy to stably adapt to changes in a large-scale environment.
Experimental results and analysis
The prototype applications were developed in JetBrains PyCharm Community. All the algorithms were implemented in Python 3.5, and the experiments were run on 64-bit Windows 10 with an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz, 16 GB of RAM, and 500 GB of disk storage. The performance of the proposed ImprovedDQLTTA algorithm was compared to that of the reference algorithms. A series of experiments were performed on synthetic data derived from real-world data. In this section, computer simulations are conducted to illustrate the performance of the proposed ImprovedDQLTTA algorithm in mobile crowdsourcing systems. We first present the experimental setting; then, the performance under different scenarios is simulated and analyzed. Finally, the convergence of the proposed ImprovedDQLTTA algorithm is illustrated.
Experimental setting
Existing research has addressed spatial task allocation by simulating mobile crowdsourcing environments by means of experimental data sets. Data sets from location-based social networks have been used to evaluate dynamic crowdsourcing algorithms. A similar approach is followed here to evaluate the performance of the proposed algorithm. The experimental data set is presented and evaluated in the following subsections.
The synthetic data set consists of real-world data obtained from Gowalla, a popular location-based social network. Gowalla was selected as our experimental data set for evaluating ImprovedDQLTTA, and San Francisco was chosen as the experimental region, within the boundary \([37.709, 37.839, -122.373, -122.503]\). The Gowalla data set includes check-ins by a large number of users at numerous locations in San Francisco. The data set comprises 1,083 persons, 38,333 locations, and 227,428 check-ins. For synthetic experimentation purposes, the task and worker locations were randomly initialized with \(latitude \sim \mu (37.71, 37.83)\) and \(longitude \sim \mu (-122.50, -122.37)\). Table 2 summarizes both data sets used for the data-driven initialization [6].
Figure 5 illustrates the geographical map and its data table for mobile crowdsourcing systems in San Francisco.
The synthetic data were used to study the proposed algorithm. Users in the Gowalla data set are regarded as mobile workers, and the locations and check-ins are initialized in relation to the mobile users. Mobile crowdsourcing requesters are randomly generated by sampling Gowalla check-ins [6]. To evaluate the scalability of our proposed ImprovedDQLTTA, the mobile crowdsourcing parameters are set as in Table 3.
Furthermore, the trustworthiness of mobile workers is evaluated in terms of their trust value \(tr_j\), sampled from a parameterized uniform distribution, that is, \(tr_j \sim \mu (tr_{min},tr_{max})\). For workers, the qualities of the tasks are also randomly generated from a uniform distribution, that is, \(sw_{i,j} \sim \mu (tr_j, 0.1)\). The trust parameter tr of the mobile workers takes values in the ranges [0.5, 1], [1, 2], [2, 3], [3, 4], and [4, 4.5). To satisfy the experimental requirements, we evaluate our proposed ImprovedDQLTTA algorithm on synthetic and real-world data sets. The quality data for each mobile worker are simulated in the MCMDP model with a random vector. The parameters of the quality vector are obtained from a Gaussian distribution. Figure 6 illustrates the quality distribution of mobile workers.
With the purpose of solving dynamic decision problems in uncertain mobile crowdsourcing, the ImprovedDQLTTA algorithm iteratively runs until convergence. As the trust score and travel distance of mobile workers are dynamic, the trust and distance values of \(10\%\) of the mobile workers are regenerated periodically every 30,000 episodes.
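The simulated uncertainty can be sketched as follows: trust values are drawn uniformly, and a random 10% of the workers are resampled at each regeneration point. The helper names, the fixed seed, and the trust bounds of 0.5 and 4.5 are assumptions for illustration; the distance values would be regenerated in the same way.

```python
import random


def sample_trust(num_workers, tr_min, tr_max, rng):
    """Draw one trust value per worker from a uniform distribution."""
    return [rng.uniform(tr_min, tr_max) for _ in range(num_workers)]


def regenerate(trust, fraction, tr_min, tr_max, rng):
    """Resample trust for a random `fraction` of workers; return their indices."""
    count = round(len(trust) * fraction)
    chosen = rng.sample(range(len(trust)), count)
    for j in chosen:
        trust[j] = rng.uniform(tr_min, tr_max)
    return chosen


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trust = sample_trust(50, 0.5, 4.5, rng)
changed = regenerate(trust, 0.10, 0.5, 4.5, rng)
print(sorted(changed))
```

In a training loop, `regenerate` would be called once per regeneration period so that the learned policy has to keep adapting.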
Algorithm parameter study
The parameters of the algorithm are defined for our experiments to ensure high-quality crowdsourcing solutions, and the number of iterations is set to 30,000. This section discusses two core parameters of ImprovedDQLTTA: the learning rate \(\eta\) and the \(\epsilon\)-greedy rate. The following experiments investigate the two learning parameters.
Experiment 1: ImprovedDQLTTA learning rate evaluation. To improve the learning efficiency of the proposed algorithm, a suitable learning rate must be selected. This experiment varies the learning rate \(\eta\) to investigate the learning efficiency. As shown in Fig. 7, when \(\eta =0.001\), the Q value continues its downward trend after approximately 20,000 iterations, which indicates that learning with this parameter setting is inefficient; when \(\eta =0.005\), the Q values continue their upward trend after approximately 30,000 iterations, which indicates that \(\eta =0.005\) also results in inefficient learning; when \(\eta =0.09\), the Q values reach a maximum after approximately 15,000 iterations, but the learned value is not stable; when \(\eta =0.1\), the Q values rapidly reach the maximum after around 15,000 iterations, and the final Q value with \(\eta =0.1\) is higher than that of the other \(\eta\) settings.
Experiment 2: ImprovedDQLTTA \(\epsilon\)-greedy rate evaluation. In this experiment, the number of spatial tasks is set to 50, the number of candidate mobile workers is set to 50, and the greedy rate \(\epsilon\) is varied. To investigate the impact of the greedy rate on the proposed ImprovedDQLTTA algorithm, the final epsilon parameter of \(\epsilon\)-greedy is varied in [0.01, 0.05, 0.1, 0.5]. Figure 8 illustrates the average cumulative rewards with different final \(\epsilon\) values. From the results, we observe that the learning quality with \(\epsilon _f=0.01\) is higher than with the other \(\epsilon _f\) settings.
In the ImprovedDQLTTA algorithm, the learning rate \(\eta\) is set to 0.1, the initial \(\epsilon\)-greedy value is set to 0.9, and the final \(\epsilon\)-greedy value is set to 0.01, as shown in Table 4.
ImprovedDQLTTA performance study
Experiment 3: Learning efficiency with different numbers of workers. As illustrated in Fig. 9, the Q value performance is evaluated with respect to the number of mobile workers. The number of mobile workers is varied in [5, 10, 50], and the number of spatial tasks is set to 50.
Figure 9 shows that the Q values increase with increasing number of mobile workers because a greater number of mobile workers increases the chances of selecting better workers.
Experiment 4: Average rewards with different worker and task scales. As illustrated in Fig. 10, the average cumulative reward performance is estimated with respect to the number of mobile workers. The number of mobile workers varies in [10, 50, 100], and the number of spatial tasks is varied in [10, 50, 100].
Figure 10 illustrates that the average cumulative rewards increase with increasing number of candidate mobile workers because a greater number of workers increases the chances of selecting higher-quality workers by using ImprovedDQLTTA.
Experiment 5: Comparison of DQLTTA and ImprovedDQLTTA. The mobile crowdsourcing scale is denoted by the number of spatial tasks. The number of mobile workers is set to 50, and the number of required spatial tasks is varied in [10, 50, 100]. The experiment creates \(10 \times 50\), \(50 \times 50\), and \(100 \times 50\) task-and-worker pair matrices for comparing the proposed ImprovedDQLTTA algorithm with DQLTTA. The x-axes indicate the training steps.
Evaluation of Q values: The proposed ImprovedDQLTTA algorithm is run in this setting and the Q values are compared with those of DQLTTA. Figure 11 shows the Q value results for the proposed ImprovedDQLTTA algorithm and the DQLTTA algorithm. The ImprovedDQLTTA algorithm consistently produces higher Q values than the DQLTTA algorithm after approximately 15,000 iterations.
Evaluation of training loss cost: The proposed ImprovedDQLTTA algorithm is run in this setting and the training loss costs are compared with those of DQLTTA. Figure 12 illustrates the loss function cost results of the proposed ImprovedDQLTTA algorithm and the DQLTTA algorithm. The two algorithms converge within a certain number of training steps. Therefore, the learning accuracy of ImprovedDQLTTA gradually improves as training progresses.
Evaluation of the average accumulated reward: The proposed ImprovedDQLTTA algorithm is run in the setting of Experiment 5 and the average accumulated reward is compared with that of DQLTTA. Figure 13 illustrates the average accumulated reward results of the proposed ImprovedDQLTTA algorithm and the DQLTTA algorithm. The ImprovedDQLTTA algorithm consistently leads to higher rewards than the DQLTTA algorithm during training, which indicates that the proposed ImprovedDQLTTA algorithm produces a better allocation solution in uncertain mobile crowdsourcing systems.
Efficiency evaluation of ImprovedDQLTTA: We compare our proposed ImprovedDQLTTA algorithm to DQLTTA in terms of runtime performance. According to the experimental requirements, the number of spatial tasks is set to 50, and the number of available mobile workers is varied in [10, 50, 100]. The proposed ImprovedDQLTTA algorithm is run in this setting. Figure 14 illustrates the runtime performance of the proposed ImprovedDQLTTA algorithm in comparison to that of DQLTTA when varying the number of mobile workers. The blue bar describes the average runtime of ImprovedDQLTTA with different mobile worker scales. As illustrated in the figure, ImprovedDQLTTA is more efficient than the DQLTTA algorithm in terms of calculation time.
The above experiments illustrate the performance of the proposed ImprovedDQLTTA in terms of Q value, loss cost, average accumulated reward and run time. The experimental results on the data sets of uncertain mobile crowdsourcing show that the ImprovedDQLTTA algorithm outperforms the DQLTTA algorithm. Therefore, our proposed ImprovedDQLTTA produces better solutions than DQLTTA. Moreover, ImprovedDQLTTA is much more stable when solving large-scale MCMDP problems of uncertain mobile crowdsourcing systems. Given enough iterations, the ImprovedDQLTTA algorithm will converge to the optimal Q value. Therefore, ImprovedDQLTTA can learn to optimize its efforts to solve the dynamic trust-aware task allocation problems in an adaptive and effective manner.
Conclusion
Owing to the advancing technology of smartphones with numerous built-in sensors, mobile crowdsourcing has recently promoted the combination of collective intelligence beyond geographical boundaries. Mobile workers need to collaborate with other workers to accomplish multiple tasks. Trustworthiness is considered a key factor in mobile crowdsourcing to enable effective collaboration. In this paper, a new and highly effective learning algorithm has been proposed to process dynamic Trust-aware Task Allocations (TTA) in uncertain mobile crowdsourcing systems. Specifically, the TTA optimization problem, which aims at maximizing the trust score and minimizing the travel distance cost, is formulated as a Mobile Crowdsourcing Markov Decision Process (MCMDP). Furthermore, to solve the large-scale MCMDP problem, an Improved Deep Q-Learning-based Trust-aware Task Allocation (ImprovedDQLTTA) algorithm is proposed as an improvement over trust collaboration optimization modelling in uncertain crowdsourcing systems. The proposed algorithm combines both trust-aware task allocation optimization and deep Q-learning techniques. A theoretical analysis was conducted to prove the applicability of ImprovedDQLTTA. Experimental simulations were carried out to establish the advantage of our proposed algorithm through comparisons with the reference algorithm. The ImprovedDQLTTA algorithm exhibits distinct advantages that make it effective for large-scale spatial collaboration problems in uncertain mobile crowdsourcing systems.
Availability of data and materials
Not applicable.
References
 1.
Ju R, Zhang Y, Zhang K (2015) Exploiting mobile crowdsourcing for pervasive cloud services: challenges and solutions. IEEE Commun Mag 53(3):98–105
 2.
Hassan UU, Curry E (2016) Efficient task assignment for spatial crowdsourcing: a combinatorial fractional optimization approach with semi-bandit learning. Expert Syst Appl 58(C):36–56
 3.
To H (2016) Task assignment in spatial crowdsourcing: challenges and approaches. In: Proceedings of the 3rd ACM SIGSPATIAL PhD symposium, San Francisco, CA, USA, 26 June–1 July 2016
 4.
Tran L, To H, Fan L (2018) A real-time framework for task assignment in hyperlocal spatial crowdsourcing. ACM Trans Intell Syst Technol 9(3):37:1–37:26
 5.
Li Y, Shin B (2017) Task-management method using R-tree spatial cloaking for large-scale crowdsourcing. Symmetry 9(12):311
 6.
Sun Y, Wang J, Tan W. Dynamic worker-and-task assignment on uncertain spatial crowdsourcing. In: IEEE CSCWD 20th international conference on computer supported cooperative work in design, 2018. pp 755–760
 7.
Guo B, Liu Y, Wang L (2018) Task allocation in spatial crowdsourcing: current state and future directions. IEEE Internet Things J 5(3):1749–1764
 8.
Wang Y, Cai Z, Tong X (2018) Truthful incentive mechanism with location privacy-preserving for mobile crowdsourcing systems. Comput Netw 135:32–43
 9.
Zhao Y, Han Q (2016) Spatial crowdsourcing: current state and future directions. IEEE Commun Mag 54(7):102–107
 10.
Liu A, Wang W, Shang S (2018) Efficient task assignment in spatial crowdsourcing with worker and task privacy protection. GeoInformatica 22(2):335–362
 11.
Chi Z, Wang Y, Huang Y (2018) The novel location privacy-preserving CKD for mobile crowdsourcing systems. IEEE Access 6:5678–5687
 12.
Kazemi L, Shahabi C, Chen L, et al. GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In: ACM Sigspatial international conference on advances in geographic information systems. ACM, 2013. pp 314–323
 13.
Hayam M, Sonia BM, Omar H (2015) Trust management and reputation systems in mobile participatory sensing applications: a survey. Comput Netw 90:49–73
 14.
Watkins CJ (1989) Learning from delayed rewards. Ph.D. thesis. Cambridge University, Cambridge
 15.
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge
 16.
Azevedo CR, Von Zuben FJ (2015) Learning to anticipate flexible choices in multiple criteria decision-making under uncertainty. IEEE Trans Cybern 46(3):778–791
 17.
Mnih V et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533
 18.
Silver D et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359
 19.
Liu N, et al (2017) A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning. In: Proc. IEEE 37th Int. Conf. Distrib. Comput. Syst. (ICDCS), Atlanta, GA, USA, pp 372–382
 20.
Sun Y, Peng M, Mao S (2018) Deep reinforcement learning based mode selection and resource management for green fog radio access networks. IEEE Internet Things J 99:1
 21.
Chittilappilly AI, Chen L, Amer-Yahia S (2016) A survey of general-purpose crowdsourcing techniques. IEEE Trans Knowl Data Eng 28(9):2246–2266
 22.
Doan A, Ramakrishnan R, Halevy AY (2011) Crowdsourcing systems on the World-Wide Web. ACM, New York
 23.
Whitehill J, Wu TF, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems. pp 2035–2043
 24.
Parameswaran A, Sarma AD, Garcia-Molina H (2011) Human-assisted graph search: it’s okay to ask questions. Proc VLDB Endow 4(5):267–278
 25.
Liu X, Lu M, Ooi BC, Shen Y, Wu S, Zhang M (2012) CDAS: a crowdsourcing data analytics system. Proc VLDB Endow 5(10):1040–1051
 26.
Bulut MF, Yilmaz YS, Demirbas M. Crowdsourcing location-based queries. In: 2011 IEEE international conference on pervasive computing and communications workshops (PERCOM Workshops). IEEE, New York, pp 513–518
 27.
Sun Y, Tan W, Li LX (2016) A new method to identify collaborative partners in social service provider networks. Inform Syst Front 18(3):565–578
 28.
Awal GK, Bharadwaj KK (2014) Team formation in social networks based on collective intelligence: an evolutionary approach. Appl Intell 41(2):627–648
 29.
Miao C, Yu H, Shen Z (2016) Balancing quality and budget considerations in mobile crowdsourcing. Decis Support Syst 90:56–64
 30.
Feng Z, Zhu Y, Zhang Q et al (2014) Trac: Truthful auction for location-aware collaborative sensing in mobile crowdsourcing. In: Proceedings of the IEEE INFOCOM conference, pp 1231–1239
 31.
Kazemi L, Shahabi C (2012) GeoCrowd: enabling query answering with spatial crowdsourcing. In: Advances in geographic information systems. pp 189–198
 32.
Cheng P, Lian X, Chen L et al (2017) Prediction-based task assignment in spatial crowdsourcing. In: International conference on data engineering, pp 997–1008
 33.
Schaul T, Quan J, Antonoglou I, et al (2016) Prioritized experience replay. In: International conference on learning representations, ICLR
 34.
Wang Z, Schaul T, Hessel M, et al (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning, pp. 1995–2003
 35.
Li Y (2017) Deep reinforcement learning: an overview
 36.
Tan W, Sun Y, Li L (2014) A trust service-oriented scheduling model for workflow applications in cloud computing. IEEE Syst J 8(3):868–878
Acknowledgements
The authors thank the reviewers for their suggestions which helped in improving the quality of the paper.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61272036, Anhui Provincial Natural Science Foundation under Grant No. 1908085MF191, and the University Natural Science Foundation of Jiangsu Province under Grant No. 18KJB520008.
Author information
Contributions
Yong Sun contributed to the original idea, algorithm and the whole manuscript writing. Wenan Tan supervised the work and helped with revising and editing the manuscript. Both authors read and approved the final manuscript.
Corresponding author
Correspondence to Yong Sun.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Collaborative computing
 Mobile crowdsourcing
 Dynamic task allocation
 Collaborative partners selection
 Trust optimization