In this section, we measure the prediction accuracy of the PDPA algorithm using a transition matrix with two different memory sizes. In addition, we developed a Markovchain algorithm for predicting the Dtrajectory of a targeted individual; this algorithm is used only for comparison with the PDPA algorithm.
PDPA with two memory sizes

PDPA_M1 The PDPA algorithm uses a transition matrix that holds the probabilities of visiting different destinations after the current location of individual(x). Hence, the suffix _M1 refers to a memory size of one destination.

PDPA_MN The suffix _MN indicates that the PDPA algorithm uses a transition matrix that holds the probabilities of visiting different destinations after visiting n destinations. For example, if individual(x) visited location(i), location(j) and location(k), then the PDPA algorithm will predict the next destination that has the highest probability of being visited after those three destinations.
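The paper does not give code for the two memory sizes, but the idea can be sketched as frequency counting over contexts of the last one or n visited destinations. This is a minimal illustration under the assumption that trajectories are lists of destination labels; the function and variable names are hypothetical, not the authors' implementation:

```python
from collections import defaultdict

def build_transition_counts(trajectories, memory):
    """Count how often each destination follows each context of the last
    `memory` visited destinations (memory=1 mirrors PDPA_M1, memory=n
    mirrors PDPA_MN)."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for i in range(memory, len(traj)):
            context = tuple(traj[i - memory:i])   # last `memory` stops
            counts[context][traj[i]] += 1         # destination that followed
    return counts

def predict_next(counts, recent):
    """Return the destination most often seen after the given context,
    or None if the context never occurred in the training set."""
    context = tuple(recent)
    if context not in counts:
        return None
    return max(counts[context], key=counts[context].get)
```

With memory=3, a visit history of location(i), location(j), location(k) is looked up as a single three-destination context, as in the PDPA_MN example above.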
Markovchain algorithm with no memory
The Markovchain algorithm is used in several human-trajectory approaches to find the next location of an individual [8–11]. Our Markovchain algorithm consists of the following steps:

The training set. Two years of different trajectories from the VMall dataset and 60 % of the GPS dataset of individual(x) are used by the Markovchain algorithm for training purposes.

Predefined prediction length. Different lengths are used when predicting the Dtrajectory of individual(x). The suffixes _3, _5 and _7 indicate that the Markovchain algorithm predicts a trajectory with a length of 3, 5 and 7, respectively. For example, MC_5 means that the Markovchain algorithm will predict a trajectory of five destinations.

The initial distribution matrix \(\pi\) that contains the probabilities of visiting every destination by individual(x).

The transition matrix T that holds the probabilities of the different transitions of individual(x) from one destination to another. The training set of individual(x) is used to build \(\pi\) and T.
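As a sketch, \(\pi\) and T can be estimated from the training trajectories by frequency counting. This is an assumed implementation (the paper does not provide one); the names and the row-normalization choice are illustrative:

```python
import numpy as np

def build_markov_model(trajectories, destinations):
    """Estimate the initial distribution pi (visit frequencies) and the
    transition matrix T (row-normalized transition counts) from a
    training set of destination trajectories."""
    idx = {d: k for k, d in enumerate(destinations)}
    n = len(destinations)
    pi = np.zeros(n)
    T = np.zeros((n, n))
    for traj in trajectories:
        for d in traj:
            pi[idx[d]] += 1               # visit counts per destination
        for a, b in zip(traj, traj[1:]):
            T[idx[a], idx[b]] += 1        # destination-to-destination moves
    pi /= pi.sum()
    rows = T.sum(axis=1, keepdims=True)
    # avoid division by zero for destinations that are never left
    T = np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)
    return pi, T
```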

The Dtrajectory prediction process includes the following steps:

i.
The first destination to be visited by individual(x) is the destination that has the highest probability in \(\pi\).

ii.
The Markovchain algorithm does not keep a memory of the last visited destination(s) when predicting the possible next destination. Hence, the remaining destinations on the predicted trajectory are found using the following formula:
$$\begin{aligned} \pi ^i=\pi ^{i-1} * T \end{aligned}$$
(1)
The next destination to be visited by individual(x) is the destination that has the highest probability in \(\pi ^i\).

iii.
Based on the predefined prediction length, the previous step will be repeated to find the remaining destinations of the predicted Dtrajectory.
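The three steps above can be sketched as follows, reusing an initial distribution \(\pi\) and a transition matrix T. This is an illustrative implementation of the memoryless prediction loop, not the authors' code:

```python
import numpy as np

def predict_trajectory(pi, T, destinations, length):
    """Memoryless Markov-chain prediction: the first destination is the
    argmax of pi; each following distribution is pi_i = pi_{i-1} * T
    (Eq. 1), and its argmax gives the next predicted destination."""
    predicted = [destinations[int(np.argmax(pi))]]   # step i
    dist = np.asarray(pi, dtype=float)
    for _ in range(length - 1):                      # step iii: repeat
        dist = dist @ T                              # step ii, Eq. (1)
        predicted.append(destinations[int(np.argmax(dist))])
    return predicted
```

MC_5, for instance, would correspond to calling predict_trajectory(pi, T, destinations, 5).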
Measuring prediction success
Note that while we predict a sequence of stores (a Dtrajectory), the evaluation in the rest of the paper focuses on precision and recall measures (as in information retrieval, since the relevance of ads depends more on which destinations are visited than on their order); ordering is considered elsewhere for the shopping mall scenario in [12].
We measure the prediction accuracy of the PDPA algorithm as follows:
$$\begin{aligned} Precision= |D \cap A| / |D| \end{aligned}$$
(2)
$$\begin{aligned} Recall= |D \cap A| / |A| \end{aligned}$$
(3)
$$\begin{aligned} Fmeasure= 2*((Precision*Recall)/(Precision+Recall)) \end{aligned}$$
(4)
where A is the actual Dtrajectory (i.e., its corresponding set of destinations) used by individual(x), and D is the Dtrajectory (i.e., its corresponding set of destinations) predicted by the PDPA algorithm.
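Since D and A are treated as sets of destinations, Eqs. (2)–(4) amount to the following sketch (the function name is illustrative, not from the paper):

```python
def prediction_scores(predicted, actual):
    """Set-based Precision, Recall and Fmeasure between the predicted
    Dtrajectory D and the actual Dtrajectory A; order is ignored."""
    D, A = set(predicted), set(actual)
    overlap = len(D & A)
    precision = overlap / len(D) if D else 0.0     # Eq. (2)
    recall = overlap / len(A) if A else 0.0        # Eq. (3)
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)           # Eq. (4)
    return precision, recall, f
```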
As mentioned earlier, a synthetic dataset of indoor Dtrajectories and a real dataset of outdoor Dtrajectories (derived from a GPS dataset), altogether involving 14 subjects/clients (five in a virtual mall and nine GPS based), were used to evaluate the accuracy of a PDPA predicted Dtrajectory compared to the actual trajectory used by the subject/client.
Dynamic prediction length vs. fixed length
Figures 5 and 6 show how using a dynamic prediction length, rather than a fixed length, resulted in a significantly more accurate prediction of the number of destinations that could be visited by a targeted individual. Figure 5 shows the length prediction accuracy for every virtual client in the VMall dataset. By using a dynamic prediction length, the PDPA algorithm was able to predict the number of destinations with an accuracy of 75.94 % with clientS1, 92.99 % with clientM4, 82.30 % with clientM5 and 100 % with clientS2. On the other hand, using a fixed length with the Markovchain algorithm resulted in lower and less consistent accuracy values. The Markovchain algorithm with a fixed length of 3 achieved a length prediction accuracy of 49.49 % with clientS1, 13.73 % with clientS2 and 9.53 % with clientF8. Using a fixed length of 7 helped the Markovchain algorithm predict the number of destinations with an accuracy of 43.32 % with clientM4 and 81.04 % with clientM5.
Figure 6 shows the length prediction accuracy for every subject in the GPS dataset of [6]. In comparison to the Markovchain algorithm with different fixed-length values, the PDPA algorithm achieved the highest prediction accuracy regarding the number of destinations that could be visited, for every subject. The PDPA algorithm was able to predict the length of the Dtrajectory with an accuracy of 71.13 % with SUB1, 75.57 % with SUB2, 71.46 % with SUB3, 60.37 % with SUB4, 69.42 % with SUB7, 68.88 % with SUB8, 71.90 % with SUB9, 70.67 % with SUB10 and 67.23 % with SUB12. On the other hand, the Markovchain algorithm with a fixed length of 3 achieved a length prediction accuracy of 63.16 % with SUB2 and 61.76 % with SUB9, while using a fixed length of 7 resulted in an accuracy of 60.56 % with SUB8 and 66.08 % with SUB12. With a fixed length of 5, the Markovchain algorithm predicted the number of destinations with an accuracy of 67.23 % with SUB1, 67.08 % with SUB3, 66.16 % with SUB4, 65.00 % with SUB7 and 63.67 % with SUB10.
Prediction accuracy with VMall dataset
Figure 7 shows the Precision value for every virtual client in the VMall dataset. With clientS1, who has extremely irregular behavior, the Markovchain algorithm with a fixed length of 3 achieved the highest Precision value of 18.15 %, while PDPA_M1 achieved 12.90 % and PDPA_MN achieved 16.80 %. With the virtual clients clientM4, clientM5 and clientF8, who have regular behavior, the PDPA algorithm achieved higher Precision values than the Markovchain algorithm with its different fixed lengths. With clientM4, PDPA_M1 achieved a Precision value of 83.09 % and PDPA_MN achieved 82.38 %, while the Markovchain algorithm with a fixed length of 3 achieved 31.76 %. The highest Precision values with clientM5 were achieved by PDPA_MN and PDPA_M1, with Precision of 64.79 and 64.21 %, respectively; the Markovchain algorithm with a fixed length of 3 was only able to obtain a Precision of 39.01 % with that client. Both prediction algorithms struggled to obtain accurate predictions with clientF8, who is a mother of two children: PDPA_MN achieved a Precision of 50.03 %, PDPA_M1 achieved 48.92 % and Markovchain achieved 44.20 % using a fixed length of 3. The extreme regularity of clientS2 helped both the PDPA and Markovchain algorithms achieve a Precision value of 100 %.
Figure 8 shows the Recall value with the virtual clients in VMall. The extreme irregularity of clientS1 was a significant obstacle for PDPA; PDPA_M1 achieved a Recall value of 9.57 % and PDPA_MN achieved 9.69 %, while the Markovchain algorithm with a fixed length of 7 achieved the highest Recall of 62.31 %. With clientM4, clientM5 and clientF8, the PDPA algorithm achieved the highest Recall values. With clientM4, PDPA_M1 achieved a Recall value of 82.09 % and PDPA_MN achieved 81.24 %, whereas the Markovchain algorithm with a fixed length of 7 achieved only 25.89 %. PDPA_M1 and PDPA_MN achieved a Recall value of 66.94 % with clientM5, while Markovchain using a fixed length of 7 achieved 32.61 %. The main challenge for both algorithms was clientF8; with that virtual client, PDPA_MN achieved a Recall of 46.97 % and PDPA_M1 achieved 45.99 %, while Markovchain with a fixed length of 3 could not achieve more than 4.32 %. Even though clientS2 has extremely regular behavior, the Markovchain algorithm achieved only a Recall of 12.43 % using a fixed length of 3, while both PDPA_M1 and PDPA_MN achieved a Recall value of 100 %.
Figure 9 shows the prediction accuracy of both algorithms, PDPA and Markovchain, through the Fmeasure values with every client in the VMall dataset. The highest Fmeasure value of 22.26 % with clientS1 was achieved by the Markovchain algorithm with a fixed length of 5, while PDPA_M1 and PDPA_MN could not obtain more than 10.29 and 11.26 %, respectively. On the other hand, with clientM4, clientM5 and clientF8, the PDPA algorithm achieved higher Fmeasure values than the Markovchain algorithm with its different fixed lengths. PDPA_M1 achieved an Fmeasure value of 82.33 % and PDPA_MN achieved 81.51 % with clientM4, while the Markovchain algorithm with a fixed length of 7 could not achieve more than 25.25 % with the same client. With clientM5, PDPA_M1 and PDPA_MN achieved Fmeasure values of 64.49 and 64.76 %, respectively, whereas Markovchain with a fixed length of 5 achieved only 31.91 %. Even with the significant shopping history of clientF8, who is a mother of two children, PDPA_M1 was able to achieve an Fmeasure value of 46.75 % and PDPA_MN achieved 47.87 %, while Markovchain with a fixed length of 3 obtained only 7.69 %. The last virtual client is clientS2, who has extremely regular behavior; with that client, PDPA_M1 and PDPA_MN were easily able to achieve an Fmeasure of 100 %, while an Fmeasure value of 22.56 % was the highest that the Markovchain algorithm achieved, with a fixed length of 3.
Prediction accuracy with GPS dataset
Figure 10 shows the Precision values with every subject in the GPS dataset of [6]. With SUB1, Markovchain with a fixed length of 3 achieved a Precision value of 79.87 %, but the other fixed lengths caused a significant drop: with the same subject, MC_5 achieved a Precision of 57.74 % and MC_7 achieved 42.28 %, while PDPA_M1 and PDPA_MN, using a dynamic length, achieved Precision values of 51.06 and 54.21 %, respectively. With SUB2, PDPA_M1 achieved the highest Precision value of 77.78 % and Markovchain with a fixed length of 3 achieved 72.03 %, while PDPA_MN came in third place with a Precision of 70.26 %. The highest Precision values with SUB3, SUB4, SUB8 and SUB10 were achieved by PDPA_M1, with Precision values of 68.51, 70.85, 38 and 61.70 %, respectively. On the other hand, Markovchain with a fixed length of 3 achieved the highest Precision values with SUB7 and SUB12. MC_3 achieved a Precision of 58.33 % with SUB7, while PDPA_M1 obtained only 48.81 %. With SUB12, PDPA_MN achieved a Precision of 56.80 % and PDPA_M1 achieved 53.66 %, whereas MC_3 achieved the highest Precision value of 66.64 %. With SUB9, both algorithms managed to achieve a high Precision value, with PDPA_M1 achieving 62.27 % and MC_3 achieving 62.90 %.
Figure 11 shows the Recall values with every subject in the GPS dataset. With SUB1, PDPA_M1 achieved a Recall value of 47.04 % and PDPA_MN achieved 37.25 %, while Markovchain with a fixed length of 7 achieved the highest Recall value of 59.49 %. MC_7 also achieved the highest Recall value of 77.03 % with SUB2, whereas PDPA_M1 achieved a Recall of 74.08 % and PDPA_MN achieved 62.17 %. On the other hand, PDPA_M1 achieved the highest Recall with SUB3, SUB4 and SUB8, with Recall values of 60.17, 52.54 and 40.16 %, respectively; with those subjects, MC_7 achieved a Recall value of 48.06 % with SUB3, 48.35 % with SUB4 and 18.89 % with SUB8. With SUB9 and SUB12, the highest Recall values of 75.21 and 47.16 %, respectively, were achieved by MC_7, while PDPA_M1 achieved only a Recall value of 56.23 % with SUB9 and 43.59 % with SUB12. The Recall values with SUB7 were very close, with MC_7 achieving a Recall of 43.90 % and PDPA_M1 achieving 42.85 %. With SUB10, both algorithms achieved almost identical Recall values, with MC_7 achieving 65 % and PDPA_M1 achieving 64.60 %.
Figure 12 shows the prediction accuracy of both algorithms, PDPA and Markovchain, through the Fmeasure values with every subject in the GPS dataset. The highest Fmeasure value of 57.92 % with SUB1 was achieved by the Markovchain algorithm with a fixed length of 3, while PDPA_M1 and PDPA_MN could not obtain more than 46.17 and 41.45 %, respectively. On the other hand, with SUB2, SUB3, SUB4, SUB7, SUB8, SUB10 and SUB12, PDPA_M1 achieved the highest Fmeasure values of 72.51, 60.26, 55.39, 43.15, 38.18, 59.95 and 45.52 %, respectively. With those subjects and in the same order, MC_3 achieved Fmeasure values of 65.21, 45.14, 40.69, 42.35, 22.24, 54.10 and 43.10 %. With SUB9, MC_3 took first place with an Fmeasure value of 61.24 %, followed by PDPA_M1 with an Fmeasure of 55.63 % and PDPA_MN with 50.96 %.
Self-histories vs. group-histories
In comparison to the prediction accuracy results using self-histories in Figs. 9 and 12, Fig. 13 shows the Fmeasure values that represent the prediction accuracy of both algorithms, PDPA and Markovchain, when using group-histories instead of self-histories with the virtual clients in the VMall dataset. From Fig. 13, we can see a significant drop in accuracy when using group-histories with the PDPA algorithm. With clientS1, PDPA achieved an Fmeasure of 0 %, in comparison to 10.29 % when using self-histories. With the PDPA algorithm and group-histories, the drop in Fmeasure values was as follows: from 82.33 to 74.16 % with clientM4, from 64.49 to 50.85 % with clientM5, from 46.75 to 43.16 % with clientF8, and from 100 to 45.32 % with clientS2, who has an extremely regular behavior. The prediction accuracy of the Markovchain algorithm with different fixed lengths also suffered a serious drop when using group-histories: from 22.26 to 7.18 % with clientS1, from 25.25 to 12.89 % with clientM4, and from 31.91 to 28.88 % with clientM5. However, MC_7 with group-histories achieved a significant increase in Fmeasure value, from 7.38 to 66.87 %, with clientF8, who is a mother of two children. In addition, MC_7 achieved another increase in Fmeasure value, from 22.42 to 33.66 %, with clientS2.
Figure 14 shows the Fmeasure values that represent the prediction accuracy of both algorithms, PDPA and Markovchain, when using group-histories instead of self-histories with the subjects in the GPS dataset. From Fig. 14, we can see a significant drop in accuracy when using group-histories with the PDPA algorithm. With the PDPA algorithm and group-histories, the drop in Fmeasure values was as follows: from 46.17 to 0 % with SUB1, from 72.51 to 0 % with SUB2, from 60.26 to 17.85 % with SUB3, from 55.39 to 4.71 % with SUB4, from 43.15 to 0 % with SUB7, from 38.18 to 7.08 % with SUB8, from 55.63 to 0 % with SUB9, from 59.95 to 0 % with SUB10, and from 45.52 to 35.15 % with SUB12. In addition, with the Markovchain algorithm and group-histories, the significant drop in Fmeasure values was as follows: from 57.92 to 0 % with SUB1, from 65.21 to 0 % with SUB2, from 45.14 to 3.41 % with SUB3, from 40.69 to 0 % with SUB4, from 42.35 to 0 % with SUB7, from 61.24 to 46.77 % with SUB9, from 54.10 to 0 % with SUB10, and from 44.15 to 1.12 % with SUB12.
The results show that there is, in this case, greater value in predicting based on what the individual him/herself did in the past rather than what other people might typically do.
Monday-for-Monday vs. no preselected days
The PDPA algorithm focuses on activities and trajectories that occurred on previous weekdays of the same day of the week as the weekday being predicted for a targeted individual. For example, if the current weekday is Monday, then the PDPA algorithm will focus on recorded trajectories from all previous Mondays, going back up to 2 years. Figures 9 and 12 show the good prediction accuracy that PDPA achieved when using this approach. In comparison, Figs. 15 and 16 show the significant drop in Fmeasure value when using recorded trajectories from all previous days over the same 2-year span. With the VMall dataset, the Fmeasure value dropped by 2.19 % with clientS1, 15.96 % with clientM4, 6.56 % with clientM5 and 3.35 % with clientF8, as shown in Fig. 15. With the GPS dataset, the drop in Fmeasure value was between 3.30 and 11.85 %: 4.36 % with SUB1, 11.72 % with SUB2, 7.71 % with SUB3, 11.85 % with SUB4, 3.30 % with SUB7, 9.24 % with SUB8, 4.10 % with SUB9, 3.60 % with SUB10 and 11.25 % with SUB12, as shown in Fig. 16.
The results show that selecting histories of the same day of the week as the day whose Dtrajectory is to be predicted helps, rather than simply using the history of all days.
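The Monday-for-Monday selection can be sketched as a simple filter over dated trajectory records. This sketch assumes records are (date, trajectory) pairs and approximates a year as 365 days; both choices, and the function name, are illustrative rather than the paper's implementation:

```python
from datetime import date, timedelta

def same_weekday_history(records, target_day, years=2):
    """Keep only trajectories recorded on the same day of the week as
    target_day, within the last `years` years of history."""
    cutoff = target_day - timedelta(days=365 * years)
    return [traj for day, traj in records
            if day.weekday() == target_day.weekday()
            and cutoff <= day < target_day]
```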