Experimental settings
The system evaluation consisted of two phases. In the first phase, we analyzed the performance of the baseline system, without user feedback, in field tests, calculating its accuracy and precision. These two performance metrics allow us to determine whether the baseline system is suitable for comparison purposes. In the second phase, we explored how the proposed user feedback model improved the system performance.
Experiments and evaluations with this feedback model were conducted in a complex indoor office environment, part of the 2nd floor of the Engineering Building at Memorial University. We chose this experimental field because we can fully control the evaluation process in this setting. The space was divided into a grid with a 3 m × 3 m cell size. 33 positions within the hallways were selected for training the baseline system (denoted the training area), and an additional 20 positions were selected as untrained positions for testing purposes (denoted the non-training area). A diagram of the setting is provided in Figure 4. System anchors were created in the training area only. Note that the non-training area lacks valid system or user anchors; it can be treated as an area resulting from environment alteration, a new Wi-Fi coverage area, or a region that was neglected in the training of the system.
The prototype system was developed for iPhone OS 3.1.2; experiments were conducted using the Apple iPhone and iPod Touch devices.
The system training was conducted during the semester break (April 2010). Each RSS fingerprint was generated by extracting features from 20 Wi-Fi scans, which took approximately two minutes. The baseline system evaluation was conducted during the summer semester (May–July 2010), with much more interference from other people and their electronic devices. Thus, the RSS data provided by users better describe the Wi-Fi characteristics of the environment at the time of use.
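The paper does not spell out which features are extracted from the 20 scans; a common choice for RSS fingerprinting is the mean RSS per access point, with rarely seen APs discarded. The following is a minimal sketch under that assumption; the function name, the scan representation, and the visibility threshold are all illustrative, not the system's actual implementation.

```python
# Hypothetical sketch: building an RSS fingerprint from repeated Wi-Fi scans.
# Assumption: the fingerprint is the mean RSS per AP, keeping only APs seen
# in at least half of the scans.
from collections import defaultdict

def build_fingerprint(scans, min_visibility=0.5):
    """scans: list of dicts mapping AP MAC address -> RSS (dBm)."""
    readings = defaultdict(list)
    for scan in scans:
        for mac, rss in scan.items():
            readings[mac].append(rss)
    fingerprint = {}
    for mac, values in readings.items():
        # Keep only APs that appear often enough across the scans.
        if len(values) / len(scans) >= min_visibility:
            fingerprint[mac] = sum(values) / len(values)
    return fingerprint
```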
As mentioned earlier in Section ‘Human-Centric collaborative feedback model’, the parameters in the feedback model are used to adjust the rate of change of the α and β factors (i.e., the sensitivity of our user feedback model). In production environments, the sensitivity of the user feedback model will depend on the number of users and the degree of trust of those users. For the purpose of evaluation, we increased the sensitivity of the user feedback model in order to speed up the rate at which the system is able to learn from user feedback.
We set the value of parameter α to 1, which corresponds to a magnification factor of 2, and the value of parameter β to 0.6. According to the design of our user feedback model, these parameter settings weight the first four users much more heavily than subsequent users, granting the system a fast learning ability.
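To make the sensitivity concrete, the numeric sketch below shows one plausible reading of these settings, in which the k-th piece of feedback contributes a weight proportional to (1 + α)·β^(k−1). This is an illustration only; the actual update rule is the one defined in Section ‘Human-Centric collaborative feedback model’.

```python
# Illustrative only: assumed weighting (1 + alpha) * beta**(k - 1) for the
# k-th piece of feedback, showing why the first few users dominate when
# alpha = 1 (magnification factor 2) and beta = 0.6.
alpha, beta = 1.0, 0.6

weights = [(1 + alpha) * beta ** (k - 1) for k in range(1, 9)]
for k, w in enumerate(weights, start=1):
    print(f"feedback #{k}: relative weight {w:.2f}")
# The first four contributions (2.00, 1.20, 0.72, 0.43) clearly outweigh the
# later ones (0.26, 0.16, ...), which is what gives the evaluation setup its
# fast learning behaviour.
```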
Baseline system evaluation
Since the time that a user is willing to spend waiting for a positioning result influences the service quality, we conducted an experiment to investigate the relationship between time (i.e., the number of Wi-Fi scans) and system performance. We use the baseline system to determine the smallest number of Wi-Fi scans (measured at one scan per second) needed to produce a reasonably accurate result. At the same time, this allows the performance of our baseline system to be compared against similar systems described in the literature.
In the training area, for each survey point, we collected 20 Wi-Fi RSS scans and used them incrementally to query the positioning system. The average positioning error after each scan is plotted as the bottom curve in Figure 5. We can observe that for a small number of scans, the system has an error between 2 and 4 m. As more scanned RSS data are used (i.e., more than four scans), the accuracy stabilizes at around 2 m.
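The sketch below illustrates this incremental evaluation procedure. It assumes a nearest-fingerprint (KNN-style) matcher over the system anchors; `rss_distance`, `query_position`, and the anchor representation are illustrative stand-ins for the actual baseline positioning algorithm, not its published form.

```python
# Sketch of the incremental evaluation: use the first n scans (n = 1..20) to
# build a fingerprint, query the positioning system, and record the error.
import math

def rss_distance(fp_a, fp_b, missing=-100.0):
    # Euclidean distance over the union of APs; unseen APs default to a
    # floor value (assumption).
    aps = set(fp_a) | set(fp_b)
    return math.sqrt(sum((fp_a.get(mac, missing) - fp_b.get(mac, missing)) ** 2
                         for mac in aps))

def query_position(fingerprint, anchors):
    """anchors: list of (position, fingerprint) pairs known to the system."""
    return min(anchors, key=lambda a: rss_distance(fingerprint, a[1]))[0]

def error_vs_scans(scans, true_pos, anchors, build_fingerprint, dist):
    errors = []
    for n in range(1, len(scans) + 1):
        fp = build_fingerprint(scans[:n])
        estimate = query_position(fp, anchors)
        errors.append(dist(estimate, true_pos))
    return errors
```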
System precision, another important performance metric, is plotted in Figure 6. We selected the positioning precision for 9 of the 20 scans, illustrating three phases of Wi-Fi sampling. The early phase consists of scans 1, 2, and 3 (red curves); in this phase, due to insufficient Wi-Fi RSS data, the precision is low. The second phase includes scans 5, 10, and 15 (green curves); it lies in the middle of the Wi-Fi sampling and has more Wi-Fi RSS data than the first phase. The last phase is at the end of Wi-Fi sampling (scans 18, 19, and 20), which includes all RSS vectors (blue curves). From Figure 6, we can see that the green and blue curves are very close to each other, meaning that using more than four scans yields no significant improvement in precision. However, if the number of Wi-Fi scans is small (i.e., fewer than four), the probability of generating outliers is considerably higher.
Similarly, in the non-training area, we collected 20 scans for each position. The positioning accuracy versus the number of scans is plotted as the top curve in Figure 5, and the positioning precision in Figure 7. In this case, the system performance is significantly lower than in the training area due to the lack of system anchors. However, in both the training and non-training areas, four scans provide a reasonable trade-off between performance and positioning time. Therefore, we use this number of scans in the remainder of our experiments.
According to the analysis of our baseline system, the average positioning error is between 2 m and 4 m, depending on the Wi-Fi sampling time. This is in fact only marginally worse than the 0.7 m to 4 m average positioning error yielded by the best-performing but intensively trained Horus system (which uses 100 Wi-Fi scans and much smaller grid spacings of 1.52 m and 2.13 m) [16]. Thus, we believe this baseline system is adequate for evaluating the value of the proposed human-centric collaborative feedback model.
Collaborative feedback model evaluation
In order to evaluate the benefits of the collaborative feedback model, we defined a number of scenarios that represent specific types of user behaviour. While we do not claim that any of these evaluations represents what would occur in real-world use, they allow us to examine how the system reacts to different types of feedback. Our plans for real-world field trials are discussed in Section ‘Conclusions and future work’.
Knowledgeable and helpful feedback
Next, we investigate how the user feedback model improves the system performance. In this scenario, feedback was provided whenever the system returned a position that did not match the true position of the user. We modelled the user as knowledgeable and helpful: whenever the position was inaccurate, the user provided positive feedback 80% of the time and negative feedback 20% of the time. We believe this is a reasonable model for situations where users are highly motivated to provide accurate and positive feedback. In practice, there may be many other users who provide null feedback (i.e., they use the system and trust the results); however, since such users do not affect the evolution of the model, they are not discussed here.
Within the training area, we define a round as a traversal of all grid cells. In a round, the user stops at each survey position to scan the RSS of nearby APs (using four scans). If the result is correct, the user moves to the next position; otherwise, the user provides feedback before moving on. The average positioning accuracy after nine such rounds of visiting and testing each position is plotted in Figure 8. In the course of providing this user feedback, the positioning error within the training area improved from approximately 2.5 m to 1.5 m after just four rounds; from there, little change was observed. Note that the baseline system accuracy ranged from 4 m to 2 m without feedback. Thus, with the integration of human-centric collaborative feedback, the system performance is further improved even in the well-trained area.
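A minimal sketch of one such round is given below, following the 80%/20% knowledgeable-and-helpful model described above. The `system` interface (`collect_scans`, `estimate_position`, `add_positive_feedback`, `add_negative_feedback`) is a hypothetical wrapper around the positioning system, and positive feedback is assumed to suggest the true position.

```python
# Sketch of one feedback round in the training area with a knowledgeable and
# helpful user.  All system methods are assumed interfaces, not the actual API.
import random

def run_round(system, survey_positions, scans_per_query=4, p_positive=0.8):
    for true_pos in survey_positions:
        scans = system.collect_scans(true_pos, n=scans_per_query)
        estimate = system.estimate_position(scans)
        if estimate == true_pos:
            continue  # correct result: the user simply moves on (null feedback)
        if random.random() < p_positive:
            # Positive feedback: suggest the true position (assumption).
            system.add_positive_feedback(scans, suggested_position=true_pos)
        else:
            # Negative feedback: reject the returned estimate.
            system.add_negative_feedback(scans, rejected_position=estimate)
```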
The precision is also improved after four rounds of user-involved positioning within the training area, as shown in Figure 9, where the green and blue curves lie closer to the y axis than the red curves. Furthermore, the green and blue curves are close to each other, which indicates that the model reaches its optimal performance after approximately four rounds of knowledgeable and helpful feedback.
Within the non-training area, the experiment followed the same procedure as in the training area, producing the data plotted in Figure 8. Because there were no training data in these regions, the initial positioning error was rather large. However, after 13 rounds of collecting user feedback, the error decreased from 9 m to 2 m. The precision also increased significantly, as plotted in Figure 10. As a result, the system performance in an area that had not been previously trained became comparable to that of the training area.
Reliable user feedback contains information (a user fingerprint) that characterizes the current Wi-Fi RSS features, and such information helps the system improve its performance. At the beginning of the test within the non-training area, the model contained only system anchors, and therefore could only return the position of a system anchor (i.e., within the training area) to the user. These positions were often far from the true position of the user. As a result of positive feedback, user anchors were added and their relative weight was increased by the α factor. Similarly, with negative feedback, the weight of the corresponding system anchors was reduced by the β factor. As a result, the positioning accuracy increased as more user anchors became valid candidate positions.
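The sketch below illustrates only the direction of this anchor bookkeeping: positive feedback creates a user anchor whose weight is boosted via α, and negative feedback down-weights the anchor behind the rejected estimate via β. The multiplicative forms used here are assumptions for illustration; the exact update equations are those of the feedback model described earlier.

```python
# Minimal sketch of the anchor bookkeeping, with assumed multiplicative
# alpha/beta updates (illustration only, not the published equations).
from dataclasses import dataclass, field

@dataclass
class Anchor:
    position: tuple          # grid-cell centre (x, y) in metres
    fingerprint: dict        # AP MAC -> mean RSS (dBm)
    weight: float = 1.0
    is_user_anchor: bool = False

@dataclass
class AnchorStore:
    anchors: list = field(default_factory=list)

    def positive_feedback(self, fingerprint, position, alpha):
        # Add a user anchor at the suggested position and boost its weight.
        self.anchors.append(Anchor(position, fingerprint,
                                   weight=1.0 + alpha, is_user_anchor=True))

    def negative_feedback(self, rejected_anchor, beta):
        # Reduce the weight of the anchor that produced the rejected estimate.
        rejected_anchor.weight *= beta
```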
For indoor positioning systems, this means that system training and maintenance costs can be reduced significantly by relying on knowledgeable and helpful end users working with a partially trained system, eventually achieving the same level of accuracy as a fully trained system. In addition, the resolution of the positioning system is improved because many reliable user anchors fill the gaps between system anchors, effectively reducing the grid spacing (i.e., increasing the grid resolution).
At this point, the optimal combination of different types of user feedback is not considered; conducting experiments to test every possible combination is impractical within a limited time period. This problem could instead be explored with a simulation testbed: a large amount of real Wi-Fi RSS data could be collected to simulate the Wi-Fi scans, and once the simulated positioning process finished, virtual positive or negative user feedback could be generated to evolve the model. In this way, the system performance under an arbitrary combination of positive and negative feedback could be estimated.
Mixed feedback
In a real environment, user feedback can be either helpful or malicious. Here, we assume that the accuracy of user feedback follows a normal distribution; thus, feedback from malicious users should appear as outliers. We could employ supervised classification algorithms such as logistic regression or SVM to identify malicious users. However, Wi-Fi RSS fingerprinting-based positioning is essentially an unsupervised, instance-based approach (similar to KNN). For instance-based learning, we can cluster different user feedback based on its RSS features and locations, which avoids labelling users as benign or malicious. As described in the previous section, we take a grid-based clustering approach with predefined centres, and the reliability of each cluster is adjusted by our user feedback model. Furthermore, the performance of instance-based approaches is highly dependent on the size of the dataset and on the noise level in the training data. Thus, if the noise level is very high (e.g., all user feedback comes from malicious users), the performance of the system will not be acceptable.
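A minimal sketch of this grid-based clustering with predefined centres follows: each piece of feedback is simply assigned to the nearest grid-cell centre, so no benign/malicious label is ever required. The feedback record layout and function names are illustrative assumptions.

```python
# Sketch of grid-based clustering with predefined centres: feedback is grouped
# by the nearest 3 m x 3 m cell centre (illustrative, not the exact pipeline).
def assign_to_cell(feedback_position, grid_centres):
    """Return the predefined grid centre closest to the suggested position."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(grid_centres, key=lambda c: sq_dist(c, feedback_position))

def cluster_feedback(feedback_records, grid_centres):
    """feedback_records: list of (suggested_position, fingerprint) pairs."""
    clusters = {centre: [] for centre in grid_centres}
    for position, fingerprint in feedback_records:
        clusters[assign_to_cell(position, grid_centres)].append(fingerprint)
    return clusters
```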
In this experiment, we test the model's ability to recover from incorrect feedback. In particular, we model the user feedback as completely malicious at the beginning and completely informative thereafter. Such behaviour is not typical, but it provides a “worst-case scenario” study of the system and of its ability to recover from incorrect or malicious feedback.
Our focus here is on the training area only. As seen in the previous experiments, the non-training area can become nearly as good as the training area given sufficient user feedback; as such, we expect the non-training area to behave similarly to the training area with respect to mixed feedback.
During the initial phase of this experiment, whenever the system returns a correct position estimate, the malicious user has a 50% chance of providing negative feedback and a 50% chance of suggesting a random false position. When the system is incorrect, the malicious user provides null feedback. Following a methodology similar to the previous experiments, such malicious feedback was provided for four rounds, followed by another eight rounds of feedback from a knowledgeable and helpful user.
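For clarity, a sketch of the malicious-user model is given below; it mirrors the hypothetical `system` interface used in the earlier round sketch, and the function itself is an illustrative assumption rather than the experiment's actual code.

```python
# Sketch of the malicious-user behaviour in the initial phase: when the system
# is correct, flip a fair coin between negative feedback and a random false
# position; when the system is wrong, stay silent.
import random

def malicious_feedback(system, scans, estimate, true_pos, all_positions):
    if estimate != true_pos:
        return  # incorrect estimate: the malicious user provides null feedback
    if random.random() < 0.5:
        system.add_negative_feedback(scans, rejected_position=estimate)
    else:
        false_pos = random.choice([p for p in all_positions if p != true_pos])
        system.add_positive_feedback(scans, suggested_position=false_pos)
```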
The position errors for this experiment are plotted in Figure 11. We observe that the system error starts at around 4 m and quickly increases to 14 m as a result of the malicious feedback. At the same time, the system precision is reduced to an unacceptable level, shown as the red curves in Figure 12. With an error of 14 m and extremely low precision, the system is considered to have been severely disturbed by the malicious users. At this point, we switch to a knowledgeable and helpful user who provides positive feedback whenever the system is incorrect.
The user behaviour in this case is the same as in the previous subsection. The helpful feedback quickly corrects the significant positioning errors, recovering to the starting accuracy after five rounds of feedback and dropping below 3 m after eight rounds. At the same time, the system precision stabilizes, as indicated by the blue curves in Figure 13. As a result, the system recovers from the low-accuracy state by integrating helpful and knowledgeable feedback.
In real life, helpful and malicious feedback are often interleaved when fed to the model, so the behaviour described in this experiment may rarely be observed. However, it provides the worst case: if the model can eliminate the negative effect introduced by sustained malicious or unreliable user feedback, then it is reasonable to deduce that it is robust to malicious feedback in more moderate or general cases.