Heuristics for spatial finding using iterative mobile crowdsourcing
Seng W. Loke^{1}
DOI: 10.1186/s1367301600616
© Loke et al. 2016
Received: 20 July 2015
Accepted: 24 November 2015
Published: 17 April 2016
Abstract
Crowdsourcing has become a popular method for involving humans in socially-aware computational processes. This paper proposes and investigates algorithms for finding regions of interest using mobile crowdsourcing. The algorithms are iterative, using cycles of crowd-querying and feedback until specified targets are found, each time adjusting the query according to the feedback using heuristics. We describe three computationally simple heuristics, incorporated into crowdsourcing algorithms, to reduce the costs (the number of questions required) and increase the efficiency (i.e., reduce the number of rounds required) of such crowdsourcing: (i) using additional questions in each round in the expectation of failures, (ii) using neighbourhood associations in the case where regions of interest are clustered, and (iii) modelling regions of interest via spatial point processes. We demonstrate the improved performance obtained with these heuristics using a range of stylised scenarios. Our research suggests that finding in the city is not as difficult as it can be, especially for phenomena that exhibit some degree of clustering.
Keywords
Mobile crowdsourcing, Spatial finding, Heuristics
Background
Crowdsourcing, or crowd computing, is an important and powerful approach to problem-solving and critical information gathering, harnessing the power of the crowd, and creatively combining machine and human computations [1–3]. Crowdsourcing can be used to do tasks that either no machine alone can do or where involving humans is better (e.g., CrowdDB [4]). Mobile crowdsourcing, i.e. crowdsourcing to mobile users, presents significant new opportunities and challenges, with enormous possibilities for human computations and tasks with spatial and temporal properties [5–12].
One important class of crowdsourcing applications is where the information and tasks to be crowdsourced have spatio-temporal properties and are advantageously done by mobile device users (e.g., crowdsourcing for car park spaces, locations of crowds, maps of areas, transport demand, emergency needs, photos/video at different locations of a parade, location of flora and fauna), in contrast to general tasks that can be crowdsourced, such as language translation or copy-editing. Such crowdsourcing may be done over extended periods of time with data centralised for analytics, or can be done in an ad-hoc, real-time, on-demand manner (e.g., issuing crowdsourced queries to assess a situation in the vicinity within the next few minutes). Real-time crowdsourcing of queries can be useful in two settings. First, even when prior information is available from some source (e.g., a database of historical traffic or car parking availability for a given area, obtained from stationary sensors, previous crowdsourcing efforts or other means), queries can obtain up-to-date information that complements or updates it, especially where information can easily deviate from history (e.g., car parking) or historical data are too coarse-grained (e.g., the database only has average car park availability for a large area over days). Second, without any prior information, queries can obtain new information about regions in the area in a just-in-time, on-demand manner. Real-time responsiveness in crowdsourcing is a challenge, but methods have been explored that make it a real possibility. For example, Bernstein has done interesting work in preparing crowds via additional incentives in order to achieve real-time performance [13].
Crowdsourcing involves incentives, and hence, costs, such as monetary costs for payments for answers as well as efficiency costs, e.g., in terms of time taken to achieve an adequate response. Algorithms where humans are viewed as data processors have been explored for finding the maximum [14], filtering [15] and for finding a subset of items with a given property among a given unstructured collection [16], taking into consideration the need to optimise cost and efficiency at the same time.
In this paper, we propose and investigate iterative crowdsourcing processes, based on work from [17], to find regions of particular interest (e.g., regions satisfying particular properties) from among a collection of regions, with all such regions lying within a given fixed-size area. This is the typical setting of mobile crowdsourcing, where the human contributors (or workers) for crowdsourced tasks/queries are people within the area with mobile devices, so that queries or jobs posed to them (and their answers) have spatial properties. We also have in mind real-time crowdsourcing where results are intended for the here-and-now, rather than obtained over a long period of time, but our work does not deal specifically with strict real-time constraints.
In particular, we consider cases where association between regions can be exploited to reduce the costs and increase efficiency in crowdsourcing. Often, information about a region provides clues about its neighbouring regions: a region that is crowded might be adjacent to another crowded region, or a polluted region might be adjacent to another polluted region, even if this is not always the case. We argue that this holds for a number of real-world phenomena including car parking, 3G/4G bandwidth, crowded areas, and noise pollution. So, for example, if one wants to use crowdsourced queries to find regions where there are car parks available, regions where there is currently high 3G/4G bandwidth, regions which are crowded, or regions with noise pollution above some threshold, then neighbourhood or proximity associations can be exploited.
In the rest of this paper, we first outline the spatial finding problem in "The spatial finding problem" section, and discuss possible solutions in "A crowdsourcing solution" section. Then, we introduce three heuristics for spatial finding in crowdsourcing and describe experimentation to demonstrate their effectiveness: a heuristic using more queries than the minimal in "A heuristic that embraces failure: redundant questioning" section, a heuristic using immediate neighbourhood associations in "A heuristic for spatial finding: neighbourhood associations" section, and a heuristic using spatial point processes in "A heuristic based on spatial point processes" section. We then review related work in "Related work" section and conclude in "Conclusion" section.
The spatial finding problem
The spatial finding problem is a simple variation of the problem first proposed in [17]. The basic version of the original problem is as follows: given a (large) set of items, a predicate, and a number k, use humans to find k items from the given set that satisfy the given predicate. An example of an instance of this problem is given as follows: “Consider a data set I of images, from which we want to find 10 images that satisfy a predicate or filter f, e.g., whether it is a photo of a cat. We consider each image, and ask humans the question, e.g., ‘does this image show a cat?’ Suppose on average that 20% of the photos are of cats. For the purposes of this example, we assume that humans do not make mistakes while answering questions.” A solution to the above problem might be sequential (asking about one image at a time), stopping whenever 10 images are found; this algorithm is cost-optimal, asking only as many questions as needed, but could take a long time (requiring many rounds of questioning), and its cost and latency depend on which images are picked. Another solution is to ask about all images in parallel; this is fast (requiring only one round of questioning), but is costly since one needs to pay for all the questions asked. A third possible solution, which lies between the first and second in terms of the cost-latency tradeoff, is to ask \((10 - x)\) questions in the current round of questioning if we have found x cat images so far.
This original version does not deal with spatial properties of items as we do in this paper. We define below a spatial version of the above problem and, while we explore the above solution ideas, we consider spatial heuristics for picking items to ask about. The general notion of cost-latency tradeoffs, however, also applies here.
Our Problem. Assume a large area R partitioned into n regions \(\{ r_1,\ldots, r_n \}\). The problem is to find a set \(S \subseteq R\) of at least \(k \le n\) regions, each of which evaluates to true for a given predicate F representing some criteria, i.e. \(F(r) = TRUE\) for each \(r \in S\). We also want to solve this problem with the lowest cost (assuming we need to pay to get a question about a region answered) and in the most efficient way (i.e., with the fewest rounds of questions).
For example, we may want to find at least k regions with available car parking spaces: we divide a large area into a set of regions about which we can then ask the crowd, but each time we ask the crowd about a region, we assume that we incur a cost. Another example is to find a not-so-crowded cafe by issuing a query to find at least k regions with a not-so-crowded cafe, answers being given by people near or within the region. A third example is to find a high bandwidth (WiFi, 4G or otherwise) region.
There are two factors to deal with in any solution to the problem. One is the cost, where we assume that each time a query is issued to find out about a region, a cost \(\phi\) is incurred (which includes the cost of issuing the query as well as incentives paid for an answer to the query), so that if we ask about k regions, we incur a total cost of \(k \cdot \phi\). The other is efficiency, which we define to be the number of rounds of querying required, where in each round, a set of queries is issued in parallel to find out about a particular chosen set of regions (assuming one query per region).
While we do not prescribe the mechanism used by users to issue queries to other users, we assume that some cost is incurred per query issued (or per answer about a region obtained). We assume, for simplicity, here that each query issued is always answered, and answered accurately and truthfully. The cost could be measured in different forms, e.g., to the query issuer, the cost could be a monetary incentive to a user providing an answer about a query (about a region). This means that a general mechanism to ask everyone could incur a high cost if everyone actually answers, as we discuss below.
A crowdsourcing solution
An initial solution to the above problem is to simply adapt the solutions from [16], which were initially developed to find particular items from a database of items. Assuming that each region is an item requiring a binary answer YES or NO (TRUE or FALSE), we obtain the solutions below for finding particular regions from a collection of regions. Note that YES/NO questions are very easy for users to respond to (but, of course, tend to provide less information than more general responses).
(1) It has been shown in [16] that a way to minimise total cost is to ask about one region at a time (say, in any order) and stop whenever k regions satisfying the criteria are found, so that we never ask more questions than required or get more than k positive answers; in the worst case, however, this scheme can lead to \(|R|\) rounds.
(2) A more expensive but very efficient solution (requiring one round) is to ask \(|R|\) questions about all the regions in parallel. This could be done, for example, by posting a query like ‘where can I find a parking spot?’ on a wide-audience medium, such as a (mobile accessible) Website, where anyone or everyone in any region can answer the question; in effect, we are asking about all regions at the same time. Since we assume that a query about any region is always answered, the cost is then \(|R| \cdot \phi\). Now, suppose we want a solution that can achieve a cost less than \(|R| \cdot \phi\). To find k positive regions, one method might be to issue \(|R|\) queries, wait for a certain fixed period of time for k positive responses, and pay for the first \(K \ge k\) answers obtained on a first-come first-served basis.^{1} However, this has already incurred costs in issuing the \(|R|\) queries and in paying for what may be largely \(K - k\) negative answers; to avoid such costs, we want to select regions to ask about (reducing the cost of issuing queries) and focus search queries on where there is a higher likelihood of getting a positive response. For each query issued on a region, the first answer obtained could be used and paid for, or the answer could be obtained by taking a majority vote of the first z answers (where z is the number of answers that can be paid for from the budget \(\phi\)).
(3) A third solution, which aims to minimise cost and maximise efficiency at the same time, will be the main focus in the rest of this paper. In each round, we ask no more questions than would be required if all the answers were positive. More precisely, in round i, if \(k_i < k\) regions have already been found where F evaluates to TRUE, we ask questions, in parallel, about a further \(k - k_i\) regions which we have not asked the crowd about previously. It can be seen that this solution never asks more questions than required in this case, and hence minimises cost, while at the same time providing a means to finish in fewer rounds than solution (1). More precisely, chooseCandidates(\(k,D,R \backslash O\)) is as in Algorithm 2.
This algorithm is essentially that in [16] but tailored to spatial finding, where it was shown that the total number of questions it requires is comparable to solution (1), when both are operating on the same input.
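The loop of solution (3) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a reproduction of the paper's algorithm listing: the function name `spatial_crowdsourcing`, the 0/1 grid that stands in for the crowd's (always given, always truthful) answers, and the bookkeeping variables are all illustrative.

```python
import random

def spatial_crowdsourcing(grid, k, rng=None):
    """Sketch of solution (3): each round asks about exactly k - k_i unobserved regions."""
    rng = rng or random.Random()
    # All regions start unobserved; a region is a (row, column) pair.
    unobserved = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    found, questions, rounds = [], 0, 0
    while len(found) < k and unobserved:
        need = k - len(found)
        # chooseCandidates: here a uniform random choice among unobserved regions.
        batch = rng.sample(unobserved, min(need, len(unobserved)))
        batch_set = set(batch)
        unobserved = [reg for reg in unobserved if reg not in batch_set]
        questions += len(batch)
        rounds += 1
        # "Ask the crowd": the grid value plays the role of the truthful answer.
        found.extend(reg for reg in batch if grid[reg[0]][reg[1]] == 1)
    return found, questions, rounds
```

Because a round never asks more than the number of positives still needed, the sketch cannot return more than k positive regions, and the total cost is the returned question count multiplied by \(\phi\).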
A key feature of the algorithm is how \(chooseCandidates(\cdot)\) actually selects the regions to ask about. In contrast to Algorithm 2 which, in each round, randomly selects regions to ask about (which we call random spatial crowdsourcing), the algorithm we propose later uses a neighbourhood association heuristic to select regions to ask about, and will be called associative spatial crowdsourcing. We first discuss the performance of two versions of random spatial crowdsourcing below, one without a heuristic as above and another with a heuristic that embraces failure.
Analysis of spatial crowdsourcing via solution (3). In the worst case, the total cost is \(|R| \cdot \phi\), with the largest number of rounds being \(|R|\). The best case total cost is \(k \cdot \phi\), with the least number of rounds being 1.
Let us consider the average case. We describe the typical case of the algorithm via a success factor \(0 < \sigma \le 1\), where we assume that in each round, the fraction of queries answered positively with F evaluating to TRUE is \(\sigma\). In random spatial crowdsourcing, we would have \(\sigma = \alpha\), where \(\alpha\), as given earlier, is the proportion of regions in R where F evaluates to TRUE, since our choice of regions to ask about in each round is random.
Thus, to look for k positive regions, the total cost of the algorithm is \(Cost(k) = \phi \cdot Q(k)\).
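The definitions of Q (total questions) and T (total rounds) are not reproduced in the text above. One plausible reading, assumed here purely for illustration, is that each round asks about all positives still missing and that \(\lceil \sigma \cdot (number~of~queries) \rceil\) of those queries succeed. Under that assumption, a minimal Python sketch (the function name `rounds_and_questions` is hypothetical) is:

```python
import math

def rounds_and_questions(k, sigma):
    """Return (T(k), Q(k)) assuming a fraction sigma of each round's queries succeed."""
    rounds = questions = 0
    remaining = k  # positives still to be found
    while remaining > 0:
        questions += remaining  # ask exactly as many questions as positives still needed
        rounds += 1
        # With 0 < sigma <= 1 at least one query succeeds, so the loop terminates.
        remaining -= math.ceil(sigma * remaining)
    return rounds, questions
```

For instance, with k = 20 and \(\sigma = 0.5\) the remaining count goes 20, 10, 5, 2, 1, 0, giving 5 rounds and 38 questions, so the cost would be \(38 \cdot \phi\).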
Lemma 1
The function T is monotonically increasing, i.e. for any m, \(T(n) \ge T(m)\) for all \(n \ge m\).
Intuitively, a larger \(\sigma\) can improve performance both of the cost and number of rounds of the algorithm. But \(\alpha\) is assumed fixed, and so, we introduce heuristics to increase the success rate of queries in each round.
A heuristic that embraces failure: redundant questioning
We set \(\sigma = \alpha\), which means that we use the proportion of regions in R where F evaluates to TRUE as an estimate of the success factor. Now, since each query only has an \(\alpha\) chance of success, we can improve performance by having more queries in each round, as also noted in [17]: we will have \(\gamma\) times more, where \(1 \le \gamma \le 1/\alpha\). Note that \(\gamma = 1\) means zero redundancy, as in the solution above in "A crowdsourcing solution" section. That is, we can obtain k regions faster by having \(\gamma \cdot (k - k_i)\) queries in the next round \(i+1\), where \(k_i\) is the number of regions found to be TRUE so far, up to and including round i. In round \(i+1\), by asking more queries, the number of successes is then \(\lceil (number~of~queries) \cdot \alpha \rceil = \lceil (\gamma \cdot (k - k_i)) \cdot \alpha \rceil\).
Lemma 2
Given \(\gamma\), the function \(Q'_{\gamma }\) is monotonically increasing, i.e. for any m, \(Q'_{\gamma }(n) \ge Q'_{\gamma }(m)\) for all \(n \ge m\).
Thus, to look for k positive regions, the total cost of the algorithm RSCR is \(Cost'_{\gamma }(k) = \phi \cdot Q'_{\gamma }(k)\).
We also have the following relationship between the number of rounds taken by RSCR (denoted by \(T'\)) and the number of rounds taken by RSCNR (denoted T).
Theorem 1
For any \(\frac{1}{\sigma } \ge \gamma \ge 1\) and a given nonnegative integer n, we have \(T'_{\gamma }(n) \le T(n)\).
We also have the following relationship between the cost of RSCR (denoted by \(Cost'\)) and the cost of RSCNR (denoted by Cost).
Theorem 2
For any \(\frac{1}{\sigma } \ge \gamma \ge 1\) and a given nonnegative integer n, we have \(Q'_{\gamma }(n) \le \lceil \gamma \rceil \cdot Q(n)\), i.e. \(Cost'_{\gamma }(n) \le \lceil \gamma \rceil \cdot Cost(n)\).
Theorem 1 means that RSCR never needs more rounds than RSCNR (and can need fewer), while Theorem 2 bounds the cost of RSCR at no more than a factor of \(\lceil \gamma \rceil\) times that of RSCNR.
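The two bounds can be spot-checked numerically. The Python sketch below is illustrative only: the function name `redundant_rounds_and_questions` is hypothetical, and rounding the number of queries up to an integer and capping successes at the number still needed are assumptions added here. It follows the success count \(\lceil (number~of~queries) \cdot \sigma \rceil\) given in the text, and \(\gamma = 1\) corresponds to the no-redundancy case.

```python
import math

def redundant_rounds_and_questions(k, sigma, gamma):
    """Return (rounds, questions) when each round asks gamma times the positives still needed."""
    rounds = questions = 0
    remaining = k  # positives still to be found
    while remaining > 0:
        asked = math.ceil(gamma * remaining)  # redundant questioning
        # Successes follow the text's formula, capped at what is still needed.
        successes = min(remaining, math.ceil(asked * sigma))
        questions += asked
        rounds += 1
        remaining -= successes
    return rounds, questions

# One setting only (k = 20, sigma = 0.5); this is a spot check, not a proof.
base_rounds, base_questions = redundant_rounds_and_questions(20, 0.5, 1.0)
for gamma in (1.0, 1.5, 2.0):
    rounds, questions = redundant_rounds_and_questions(20, 0.5, gamma)
    assert rounds <= base_rounds                            # consistent with Theorem 1
    assert questions <= math.ceil(gamma) * base_questions   # consistent with Theorem 2
```

In this setting, doubling the questions per round (\(\gamma = 2 = 1/\sigma\)) finishes in a single round with 40 questions, against 5 rounds and 38 questions without redundancy.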
We conducted experiments with this technique of posing more queries (with the expectation of \(\sigma = \alpha\) proportion of successes) in each round to see how it helps the performance and how much extra costs it incurs.
Experiments
Typical run
Comparing RSCNR and RSCR
We compare the performance of RSCNR and RSCR with \(k = 20\), using different values of \(\gamma\) and different kinds of distributions of positive regions, with the total number of regions fixed at 1600 (a 40 × 40 grid). Below, we give the average number of rounds and the average number of questions asked over 1000 runs. Note that RSCNR is the case of RSCR with \(\gamma = 1\).

It can be seen that, for all values of \(\alpha\) tested, using additional questions in each round can substantially reduce the number of rounds, even to 1, without substantially increasing the number of questions asked, since convergence is quick. For example, averaging over 1000 runs with \(\alpha = 0.40125\) in Fig. 2a, \(\gamma = 1.246\) uses four rounds and 50 questions on average, compared to seven rounds and 49 questions on average with \(\gamma = 1\) (no redundant questions), while \(\gamma = 2.492\) uses only 1 round and 53 questions on average.

When \(\alpha\) is smaller, i.e. in the cases where ‘1’s are sparse and harder to find, we see that more questions are asked with RSCR, but with a much greater reduction in the number of rounds. For example, with \(\alpha = 0.110625, 0.050625, 0.034375\) in Fig. 2d–f, increasing \(\gamma\) results in a substantial drop in the number of rounds (e.g., 31 to 11, 69 to 12, and 100 to 12), with only a small increase in the number of questions asked on average.

In all cases of \(\alpha\) tested, for a large enough \(\gamma\), on average, asking 11% more questions in total can lead to a reduction to only 1 round required.
Discussion
The results show that if we ask \(\gamma\) times more questions than the minimum required in each round, with \(1 < \gamma \le \frac{1}{\alpha }\) (the additional questions are so-called redundant questions), we can significantly reduce the number of rounds, and because we reduce the number of rounds, we end up asking only a small number of additional questions in total, compared to no redundancy. We see that this result holds for a large range of scenarios. However, it must be noted that the gains are greater only when \(\alpha\) is small and \(\gamma\) is large enough, and the value of \(\gamma\) relies on some prior knowledge (or estimate) of \(\alpha\), which might be difficult to obtain in practice; using too small a \(\gamma\) does not result in much improvement in efficiency, but using too large a \(\gamma\) wastes questions. The results of 1 round on average in Fig. 2a–f are all obtained with \(\gamma\) having a value of approximately \(\frac{1}{\alpha }\), as each of the \(k \cdot \frac{1}{\alpha }\) questions randomly chooses a region with an \(\alpha\) chance of being positive.
Results for \(\alpha = 0.418125\) (less clustering) and, in parentheses, \(\alpha = 0.40125\) (more clustering), averaged over 1000 runs
| \(\gamma\) | 1 (1) | 1.196 (1.246) | 2.392 (2.492) |
| --- | --- | --- | --- |
| Avg. no. of questions | 47 (49) | 48 (50) | 52 (53) |
| Avg. no. of rounds | 6 (7) | 4 (4) | 1 (1) |
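As a worked check of the \(\gamma\) values in the table above, the largest value in each column pair is approximately \(1/\alpha\), and the middle value is half of that:

\[ \frac{1}{0.418125} \approx 2.392, \qquad \frac{1}{0.40125} \approx 2.492, \qquad \frac{2.392}{2} = 1.196, \qquad \frac{2.492}{2} = 1.246. \]

Assuming the number of queries in a round is rounded up to an integer (an assumption made here for illustration), the first round with \(k = 20\), \(\alpha = 0.418125\) and \(\gamma = 2.392\) asks \(\lceil 2.392 \cdot 20 \rceil = 48\) questions, and the expected number of positives is

\[ 48 \cdot 0.418125 \approx 20.07 \approx k, \]

which is consistent with a single round sufficing on average at this value of \(\gamma\).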
A heuristic for spatial finding: neighbourhood associations
In the case where we know nothing about the unobserved regions, any selection is as good as any other. The solutions have not so far considered the case where knowing something about a region tells us something about another region. For example, we assume initially no information, i.e. \(F(r) = TRUE\) with probability \(\alpha\) (and \(F(r) = FALSE\) with probability \(1 - \alpha\)). The value of \(\alpha\) may be estimated from some initial density measure if there is some a priori information, but here, we take \(\alpha\) to be the true proportion of regions in R that satisfy F. Hence, the success factor \(\sigma\) has been approximated via \(\alpha\) in the experiments above.
It can be seen from the definitions of Q, \(Q'\), T and \(T'\) above and their monotonically increasing properties that if \(\sigma\) was to increase, we can reduce the number of questions and the number of rounds. In this section and the next section, we consider heuristics that can improve the success factor in each round of querying.
Given direct observation of a region, then F(r) must evaluate to true or false, but without direct observation of a region, we can only compute the probability of F(r) being true or false in some way. Note that we say we observe a region whenever we ask the crowd a question about it.
More precisely, let N be a function that returns the immediate neighbours of a region, i.e. \(N(r) \subseteq R\) is the set of regions sharing a boundary with r defined in some way. N(r) would have eight members at most if R is divided into a grid of rectangular regions (including diagonally adjacent regions).
From the point of view of the unobserved region r, it is possible that multiple neighbouring regions have been observed, and so, we need to combine the influence from multiple observed neighbours.
\(Pr(F(r) = TRUE \mid obs(N(r))) = Pr(F(r) = TRUE \mid obs(R)).\)
If we are using solution (3), in each round i, for simplicity, we compute probabilities only for regions not yet observed, with the aim of choosing the \(k - k_i\) regions most likely to evaluate F to TRUE, and we use only observed information. For example, an unobserved region r that has no observed neighbours will have \(Pr(F(r) = TRUE) = \alpha\) even if all its unobserved neighbours q have estimated \(Pr(F(q) = TRUE \mid obs) > \alpha\) given some observations obs.
In the random spatial crowdsourcing algorithm SpatialCrowdsourcing(k, F, R) given above, chooseCandidates(\(c,R \backslash O\)) chooses c candidates from \(R \backslash O\) at random. In the associative spatial crowdsourcing algorithm, chooseCandidates(\(c,R \backslash O\)) instead selects the c regions with the highest probability of F evaluating to TRUE: for each region \(r \in R \backslash O\), we compute the probability of \(F(r) = TRUE\) using the neighbourhood formula above and select the c regions with the highest probabilities, choosing randomly among regions of equal probability.
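The selection step of associative spatial crowdsourcing can be sketched as below. The paper's neighbourhood formula is not reproduced in this text, so `estimate` uses a stand-in score chosen here for illustration (the share of observed neighbours that are positive, falling back to \(\alpha\) when no neighbour has been observed); it should not be read as the author's formula. All function names are hypothetical, and `observed` is assumed to map each observed region to True or False.

```python
import random

def neighbours(region, rows, cols):
    """N(r): up to eight grid neighbours, including diagonally adjacent regions."""
    r, c = region
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols]

def estimate(region, observed, alpha, rows, cols):
    """Stand-in for Pr(F(r) = TRUE | obs(N(r))); an assumption, not the paper's formula."""
    seen = [observed[n] for n in neighbours(region, rows, cols) if n in observed]
    if not seen:
        return alpha  # no observed neighbours: fall back to the prior alpha
    return sum(seen) / len(seen)  # share of observed neighbours that are positive

def choose_candidates(c, unobserved, observed, alpha, rows, cols, rng=random):
    """Pick the c unobserved regions with the highest estimate, breaking ties randomly."""
    scored = [(estimate(reg, observed, alpha, rows, cols), rng.random(), reg)
              for reg in unobserved]
    scored.sort(reverse=True)  # highest estimate first; the random key orders ties
    return [reg for _, _, reg in scored[:c]]
```

Only observed answers feed the score, matching the statement that an unobserved region with no observed neighbours keeps probability \(\alpha\).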
Experiments with randomly generated area maps
We study the effect that the extent of clustering has on the use of this heuristic as k varies and as \(\delta\) is varied. In the first set of experiments, we generate area maps with \(\alpha\) set to values within the range [0.15, 0.20] and clustering introduced so that whenever there are three ‘1’s surrounding a region, the region will be a ‘1’ (otherwise the region is either ‘1’ or ‘0’ with equal probability).
Note, however, that with little clustering, associative spatial crowdsourcing provides little to no advantage, and can even do slightly worse because it assumes clustering when there is none. However, as we show in the following examples, contiguous and clustered regions (fortunately) occur in a range of real-world scenarios. Below, we use maps sourced from real-world applications as a starting point representing the current state of the world from which we want to find regions of interest.
Experiments on finding parking
We consider using spatial crowdsourcing to look for regions with parking spaces. For our experiments, we use a parking map abstracted from San Francisco parking census data, dividing an area into 26 × 20 regions, as illustrated in Fig. 5, which shows the location of parking lots. The problem we address here is then: given the parking map, which we assume captures the current state of the world with regard to parking in that area, we want to find k = 5 or k = 40 regions where there is parking available, using crowdsourcing. (Note that, in reality, there could be fewer regions with available parking since some of the parking spaces would have been taken up.) Hence, a query will ask if there are parking spaces in a region of size 37 by 37 m, and for simplicity, answers are binary, YES or NO, and we assume truthfulness in the answers given.
Experiments on finding crowds
Experiments on finding coverage/bandwidth
A heuristic based on spatial point processes
To go beyond simple immediate neighbourhood influences, we explore an alternative heuristic for choosing potential regions to query about, i.e. to guess the location of positive regions (labelled ‘1’s) within an area, based on modelling the distribution of positive regions using spatial point processes [18].
Suppose that an area R has been divided into I disjoint subareas \(R_1, R_2, \ldots, R_I\), i.e. each \(R_i\) has a set of regions. At time t, for a subarea \(R_i\), positive regions are assumed to be distributed according to a Poisson process with intensity \(\lambda _i(t)\) (where intensity here is defined to be the average number of positive regions per unit area, or the potential of an event to appear at any location). The expected number of positive regions in area \(R_i\) is given by \(\lambda _i(t) \cdot |R_i|\), where \(|R_i|\) denotes the size of (or the number of regions in) \(R_i\), and the positive regions in \(R_i\) are assumed distributed uniformly within \(R_i\).
Note that we recompute the intensity after each round since we observe more regions after each round and so can improve the model after each round. The method for \(chooseCandidates(\cdot )\) in the spatial crowdsourcing algorithm then chooses the c regions with the highest \(\lambda _h(\cdot )\).
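A selection step driven by per-subarea intensities can be sketched as follows. The estimator for \(\lambda_i\) is not reproduced in this text; the sketch assumes, for illustration only, that the intensity of a subarea is estimated as the number of positives observed in it divided by the number of its regions observed so far, with unobserved subareas falling back to \(\alpha\). The names `intensities`, `choose_candidates_spp` and `subarea_of` (a map from region to subarea index) are hypothetical.

```python
import random

def intensities(subarea_of, observed, alpha):
    """Estimate lambda_i per subarea as (positives observed) / (regions observed)."""
    pos, seen = {}, {}
    for region, is_positive in observed.items():
        i = subarea_of[region]
        seen[i] = seen.get(i, 0) + 1
        pos[i] = pos.get(i, 0) + int(is_positive)
    # Subareas with no observations fall back to the prior alpha (an assumption).
    return {i: (pos[i] / seen[i] if i in seen else alpha)
            for i in set(subarea_of.values())}

def choose_candidates_spp(c, unobserved, subarea_of, observed, alpha, rng=random):
    """Pick c unobserved regions from the subareas with the highest estimated intensity."""
    lam = intensities(subarea_of, observed, alpha)  # recomputed on every call, i.e. every round
    ranked = sorted(unobserved,
                    key=lambda reg: (lam[subarea_of[reg]], rng.random()),
                    reverse=True)
    return ranked[:c]
```

Since positives are assumed uniformly distributed within a subarea, every unobserved region of a subarea shares that subarea's score, so the choice within a subarea is random.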
Below, for short, the associative spatial crowdsourcing algorithm using the immediate neighbourhood heuristic given in the previous section is termed ASCIN, and the associative spatial crowdsourcing algorithm using the spatial point process modelling is termed ASCSPP.
Experiments with a randomly generated map
Experiments on finding noisy areas
But for k = 5, as in Fig. 14c and d, there is considerable variation (high standard deviation) and, in fact, ASCSPP has, on average, poorer performance than both ASCIN and RSCNR; with k = 5, ASCSPP performs extremely well in some cases but extremely poorly in particular cases, which raised the average substantially. The reason is that ASCSPP can end up exploring large sparse areas. For example, dividing the area in Fig. 13b into four subareas, top-left, top-right, bottom-left and bottom-right (i.e., \(\frac{TL \mid TR}{BL \mid BR}\)), ASCSPP could sometimes have a good start (for all of ASCIN, ASCSPP and RSCNR, the initial regions to query in the first round are random), finding the first few ‘1’s in the top-left subarea quickly, but end up exploring almost the entire sparse subarea in the top-left or bottom-left to find the final ‘1’. But with higher k (e.g., with k = 40), as the search continues, ASCSPP starts to find areas of higher density better than ASCIN and RSCNR and then, on average, outperforms ASCIN and RSCNR.
Related work
Algorithms for crowdsourcing have been a relatively new endeavour but are currently a very active area of work.^{3} The past half decade has seen much development in the area, e.g., the work in [17] on crowdsourcing algorithms, the work in [19] on crowdsourcing for discovery, the work in [20, 13, 21] on achieving real-time results in crowdsourcing, and the work in [22] on crowd-selection for microtasks. There is already a range of commercial frameworks such as Amazon Mechanical Turk,^{4} CrowdFlower,^{5} and CrowdCloud.^{6} The focus has been varied, from user interaction aspects to algorithmic and framework aspects.
There is also the emerging trend of geo-crowdsourcing, where ever-expanding groups of users collaboratively, and often voluntarily (or for payment), contribute different types of spatial or geographic information [9, 23, 24]. Also emerging is work on location/context-based crowdsourcing, where location is key in distributing jobs to workers [5, 25, 11, 26], with applications in transportation and so on, and work in [27] where different strategies are explored for answering location-based crowdsourced queries. However, we believe our work is original in approaching the spatial finding problem. Microsoft Research has an interesting set of spatial crowdsourcing projects, focusing on mechanisms to encourage ordinary people to perform tasks at specific locations.^{7} The gMission system [28] is a platform to support spatial crowdsourcing and provides a range of features including matching potential workers with tasks, but our work focuses on which areas to query rather than on workers.
As mentioned earlier, this work is partly motivated by the increasing work on mobile crowdsourcing and mobile crowdsensing [29–32], where mobile context provides valuable situational knowledge that can be crowdsourced. We did not deal with incentives in this paper but assumed that we will get a response about a region whenever a query is asked; how to use incentives to get appropriate responses is also an active area of research.
In [25], a Gaussian process model was used to predict future traffic saturation at junctions with sensors with generalisation to junctions without sensors. A Gaussian approach might be used in modelling the distribution of positive regions, but with too small a k and too small a proportion of observed regions, it is uncertain if meaningful predictions can be made with this approach but it could be investigated. Spatial sampling techniques such as spatial simulated annealing using prior information [33] can be employed in place of our random sampling approach and compared to our heuristics.
The work in [34] reviews mobile crowdsourcing, pointing out further challenges such as incentive mechanisms, reputation management, and task allocation.
Conclusion
This paper proposed and investigated finding regions of interest from a set of regions of an area using iterative crowdsourcing processes controlled by the principle of a query-feedback loop interleaved with query adjustment based on responses and heuristics. We have described three simple, though effective, heuristics for reducing the costs (the number of questions required) and increasing the efficiency (or reducing the number of rounds required) in using crowdsourcing for finding regions of interest:

using a proportionate number of redundant questions in each round in the expectation of failure, as already pointed out in earlier work by [17],

immediate neighbourhood associations in the case where regions of interest are clustered contiguously, and

spatial point processes for approximating the distribution of positive regions, which work even without contiguous positive regions, with approximations improved on each round and historical information used to guide starting queries in cases of low k.
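As an illustration only, the first two heuristics above can be sketched on a rectangular grid of regions. This is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the function names, the 4-connected neighbourhood, the `redundancy` fraction, and the callable `is_positive` (standing in for a crowd response about a region) are all hypothetical.

```python
import math
import random


def neighbours(cell, rows, cols):
    """4-connected neighbours of a grid cell (an assumed definition of N(r))."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def find_regions(is_positive, rows, cols, k, redundancy=0.5, seed=0):
    """Query regions in rounds until k positives are found or none remain.

    is_positive: hypothetical stand-in for the crowd's answer about a region.
    redundancy:  extra fraction of questions per round, anticipating failures.
    Returns (found_regions, questions_asked, rounds_used).
    """
    rng = random.Random(seed)
    unasked = {(i, j) for i in range(rows) for j in range(cols)}
    frontier = []  # unasked neighbours of known positives, queried first
    found, questions, rounds = [], 0, 0
    while len(found) < k and unasked:
        rounds += 1
        needed = k - len(found)
        # Heuristic (i): ask more questions than strictly needed.
        batch_size = min(len(unasked), needed + math.ceil(redundancy * needed))
        batch = []
        # Heuristic (ii): prefer neighbours of regions already found positive.
        for cell in frontier:
            if len(batch) == batch_size:
                break
            if cell in unasked and cell not in batch:
                batch.append(cell)
        # Fill the rest of the round with randomly chosen unasked regions.
        remaining = sorted(unasked - set(batch))
        batch += rng.sample(remaining, batch_size - len(batch))
        for cell in batch:
            unasked.discard(cell)
            questions += 1
            if is_positive(cell):
                found.append(cell)
                frontier.extend(neighbours(cell, rows, cols))
        frontier = [cell for cell in frontier if cell in unasked]
    return found, questions, rounds
```

For example, with a clustered set of positive cells `positives`, calling `find_regions(lambda c: c in positives, 6, 6, k=3)` returns the positives found together with the number of questions and rounds used, which are the cost and efficiency measures discussed above.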
We demonstrated, via a range of maps (synthetic and real-world based), that our heuristics lead to improved performance over randomly choosing regions to ask about. Although we used stylised maps based on real-world distributions of parking, crowds, bandwidth coverage, and noise, our research suggests that finding in the city need not be as difficult as it could be for phenomena that exhibit some degree of clustering. While our focus has been on spatial problems, we also note that the heuristics are generalisable to non-spatial problems as long as meaningful associations can be defined among items.
Future work involves exploring a combination of the heuristics in real deployments, as well as other application-specific spatial and geographically based heuristics, and heuristics that exploit historical information. For example, we can include historical information in computing probabilities, i.e. for a region r, we calculate \(Pr(F(r) = TRUE \mid obs(N(r)) \wedge history(r))\). There are also many applications to explore, from finding vacant or quiet coffee shops to finding strategic points of interest in emergency situations. Dealing with uncertainties and unresponsive crowds are further issues to consider, e.g., taking into account regions with a low density of people. We did not deal with the problem of incentives, and strategic sampling will also need to be considered in the future.
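One possible way to combine neighbourhood observations with historical information when estimating the probability that a region is positive is a naive odds update. The sketch below is an assumption made for illustration, not a method proposed or evaluated in this paper: it treats smoothed historical counts as a prior and multiplies the odds by fixed, hypothetical likelihood ratios (`lr_pos`, `lr_neg`) for each observed neighbour, assuming the neighbour observations are conditionally independent.

```python
def prob_positive(hist_pos, hist_total, nbr_obs, lr_pos=3.0, lr_neg=0.5):
    """Estimate Pr(F(r) = TRUE | obs(N(r)) and history(r)) for one region r.

    hist_pos, hist_total: hypothetical counts of past positive answers and
        past queries for r; Laplace smoothing keeps the prior away from 0 and 1.
    nbr_obs: booleans, one per neighbour of r observed so far.
    lr_pos, lr_neg: assumed likelihood ratios for a positive or negative
        neighbour (illustrative values, not estimated from data).
    """
    prior = (hist_pos + 1) / (hist_total + 2)  # history(r) as a smoothed prior
    odds = prior / (1 - prior)
    for positive in nbr_obs:  # obs(N(r)) as independent pieces of evidence
        odds *= lr_pos if positive else lr_neg
    return odds / (1 + odds)
```

With no history and no observed neighbours the estimate is 0.5; positive neighbours or a favourable history raise it, so regions could be ranked by this value when choosing the next queries.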
\(K - k\) if we have that only positive responses need reply. Sometimes, no reply could be taken as a negative response.
Declarations
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Brabham DC (2013) Crowdsourcing. The MIT Press, Cambridge
 Law E, Ahn LV (2011) Human computation. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San Rafael
 Michelucci P (2013) Handbook of Human Computation. Springer Publishing Company, Incorporated, New York
 Franklin MJ, Kossmann D, Kraska T, Ramesh S, Xin R (2011) CrowdDB: answering queries with crowdsourcing. In: Proc. of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, ACM, New York, pp 61–72
 Alt F, Shirazi AS, Schmidt A, Kramer U, Nawaz Z (2010) Location-based crowdsourcing: extending crowdsourcing to the real world. In: Proceedings of the 6th Nordic Conference on Human-Computer Interaction: extending boundaries, NordiCHI ’10, ACM, New York, pp 13–22
 Georgios G, Konstantinidis A, Christos L, Zeinalipour-Yazti D (2012) Crowdsourcing with smartphones. IEEE Internet Computing 16(5):36–44
 Gupta A, Thies W, Cutrell E, Balakrishnan R (2012) mClerk: enabling mobile crowdsourcing in developing regions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, ACM, New York, pp 1843–1852
 Charoy F, Benouare K, Valliyur-Ramalingam R (2013) Answering complex location-based queries with crowdsourcing. In: Proc. of the 9th IEEE Int. Conf. on Collaborative Computing: Netw., App. and Worksharing, IEEE Computer Society
 Kazemi L, Shahabi C (2012) Geocrowd: enabling query answering with spatial crowdsourcing. In: Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’12, ACM, New York, pp 189–198
 Della Mea V, Maddalena E, Mizzaro S (2013) Crowdsourcing to mobile users: a study of the role of platforms and tasks. In DBCrowd, pp 14–19
 Tamilin A, Carreras L, Ssebaggala E, Opira A, Conci N (2012) Context-aware mobile crowdsourcing. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp ’12, ACM, New York, pp 717–720
 Yan T, Marzilli M, Holmes R, Ganesan D, Corner M (2009) mCrowd: a Platform for Mobile Crowdsourcing. In Proc. of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys ’09, ACM, New York, pp 347–348
 Bernstein MS, Brandt J, Miller RC, Karger DR (2011) Crowds in two seconds: enabling realtime crowd-powered interfaces. In: UIST, pp 33–42
 Guo S, Parameswaran A, Garcia-Molina H (2012) So who won?: dynamic max discovery with the crowd. In: SIGMOD Conference, pp 385–396
 Parameswaran AG, Garcia-Molina H, Park H, Polyzotis N, Ramesh A, Widom J (2012) Crowdscreen: algorithms for filtering data with humans. In: SIGMOD Conference, pp 361–372
 Das Sarma A, Parameswaran A, Garcia-Molina H, Halevy A (2014) Crowd-powered find algorithms. In: Data engineering (ICDE), 2014 IEEE 30th International Conference on, pp 964–975
 Parameswaran AG (2013) Human-powered data management. PhD thesis, Department of Computer Science, Stanford University, Stanford
 Cressie N, Wikle CK (2011) Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics. Wiley, Hoboken
 Faridani S (2012) Models and Algorithms for crowdsourcing discovery. PhD thesis, Berkeley, University of California
 Bernstein MS (2013) Crowd-powered systems. KI 27(1):69–73
 Bigham JP, Jayant C, Ji H, Little G, Miller A, Miller RC, Miller R, Tatarowicz A, White B, White S, Yeh T (2010) Vizwiz: nearly real-time answers to visual questions. In: Proceedings of the 23nd Annual ACM Symposium on User Interface Software and Technology, UIST ’10, ACM, New York, pp 333–342
 Abraham I, Alonso O, Kandylas V, Slivkins A (2013) Adaptive crowdsourcing algorithms for the bandit survey problem. CoRR, abs/1302.3268
 Shahabi C (2013) Towards a generic framework for trustworthy spatial crowdsourcing. In: Proceedings of the 12th International ACM Workshop on Data Engineering for Wireless and Mobile Acess, MobiDE ’13, ACM, New York, pp 1–4
 Sui D, Elwood S, Goodchild M (2013) Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice, Springer
 Schnitzler F, Liebig T, Mannor S, Morik K (2014) Combining a gauss-markov model and gaussian process for traffic prediction in dublin city center. In: EDBT/ICDT Workshops, pp 373–374
 Wu D, Zhang Y, Bao L, Regan AC (2013) Location-based crowdsourcing for vehicular communication in hybrid networks. IEEE Trans Intell Transp Sys 14(2):837–846
 Benouaret K, Valliyur-Ramalingam R, Charoy F (2013) Answering complex location-based queries with crowdsourcing. In: CollaborateCom, pp 438–447
 Chen Z, Fu R, Zhao Z, Liu Z, Xia L, Chen L, Cheng P, Cao CC, Tong Y, Zhang CJ (2014) gmission: a general spatial crowdsourcing platform. PVLDB 7(13):1629–1632
 Ganti RK, Ye F, Lei H (2011) Mobile crowdsensing: current state and future challenges. IEEE Commun Mag 49(11):32–39
 Hu X, Chu T, Chan H, Leung V (2013) Vita: a crowdsensing-oriented mobile cyber-physical system. IEEE Trans Emerg Top Comput 1(1):148–165
 Lane ND, Chon Y, Zhou L, Zhang Y, Li F, Kim D, Ding G, Zhao F, Cha H. Piggyback crowdsensing (pcs): energy efficient crowdsourcing of mobile sensor data by exploiting smartphone app opportunities. In: Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, SenSys ’13, ACM, New York, pp 1–7
 Sherchan W, Jayaraman PP, Krishnaswamy S, Zaslavsky A, Loke S, Sinha A (2012) Using on-the-move mining for mobile crowdsensing. In: Proceedings of the 2012 IEEE 13th International Conference on Mobile Data Management (mdm 2012), MDM ’12, IEEE Computer Society, Washington, pp 115–124
 Van Groenigen JW (1997) Spatial simulated annealing for optimizing sampling. In: Soares AO, Gómez-Hernández J, Froidevaux R (eds) geoENV I - Geostatistics for Environmental Applications, vol 9. Quantitative Geology and Geostatistics. Springer, Netherlands, pp 351–361
 Ren J, Zhang Y, Zhang K, Shen X (2015) Exploiting mobile crowdsourcing for pervasive cloud services: challenges and solutions. Commun Mag IEEE 53(3):98–105