 Research
 Open Access
Personal destination pattern analysis with applications to mobile advertising
Human-centric Computing and Information Sciences, volume 6, Article number: 17 (2016)
Abstract
Many researchers expect mobile advertising to be the killer application in mobile business. In this paper, we introduce a trajectory prediction algorithm called personal destination pattern analysis (PDPA) to analyse the different destinations in the various trajectories of an individual, and to predict a trajectory or a set of destinations that could be visited by that individual. The PDPA algorithm works at the individual level: every destination-pattern analysis is based on the self-history and the personal profile of the targeted individual, not on what others do. In addition, we developed a prototype system called SmartShopper, a personal destination-pattern-aware pervasive system for mobile advertising in (outdoor and indoor) retail environments. The destinations predicted by the PDPA algorithm are used by SmartShopper to generate a list of relevant advertisements adapted to the personal profile and previous destinations of the targeted individual. We tested the destination prediction accuracy of the PDPA algorithm with a synthetic dataset of a virtual mall and a real GPS dataset.
Introduction
The rapid growth of mobile advertising is making advertising companies consider mobile phones as a new platform for advertising revenue [1, 2]. One of the strategic places for advertising is the retail environment, as most mobile phone users visit a mall to purchase products or services, and many researchers are exploring different methods to generate a personalized list of advertisements that could capture the interest of a mobile phone user [3–5].
The aim of this paper is to propose an approach for generating a list of relevant advertisements for a targeted individual by first predicting the set of destinations that could be visited by that individual. Here, a D-trajectory is a sequence of destinations visited by a user, not necessarily the actual physical path/walk taken by the user when visiting those destinations. Hence, the term D-trajectory is used to indicate that we predict a sequence of meaningful destinations that could be visited by a targeted individual, rather than simply a path. Consequently, a D-trajectory is a subsequence of the actual physical path/walk of the targeted individual. We investigate an approach that uses only self-histories, rather than the histories of others, for prediction; i.e., rather than predicting the behaviour of a person based on what others typically do, as in other work, we predict based only on what the person him/herself did. The key contributions of this paper are:

a novel approach to predict a set of destinations that could be visited by a targeted individual using only the personal history of that individual,

a new perspective on generating a list of advertisements for users, i.e. by using the predicted set of destinations and the personal profile of a targeted individual,

highlighting the use of self-histories for destination prediction, rather than group-histories (as in typical clustering-style prediction approaches), i.e., we predict based on what the person typically does, rather than what people typically do,

highlighting the notion that taking into account the day of the week is crucial in capturing human routines and regularity, i.e. to predict what a person will do on a given day, say a Tuesday, it can be more useful to look at what the person typically does on Tuesdays, rather than what the person does on all days of the week, and

highlighting two kinds of behavioural regularity, in terms of the number of destinations visited and the actual destinations visited.
It must also be noted that our algorithm predicts a D-trajectory of a person for a day, given a history of D-trajectories of that person; the history is a set of D-trajectories, one for each day in the person's history. That is, we are not simply predicting the next destination given the destinations visited so far, as in other work.
In order to illustrate the application we have in mind, we developed a prototype system (SmartShopper) and a destination prediction algorithm called personal destination pattern analysis (or PDPA, for short). SmartShopper uses the PDPA algorithm to predict the set of destinations that could be visited by a shopping mall visitor. Then, SmartShopper uses the predicted destinations and the personal history of the targeted mall visitor to generate a list of advertisements that aims to capture the interest of that visitor with high probability. We tested the prediction accuracy of the PDPA algorithm on a synthetic dataset of an indoor mall and on a real GPS dataset of outdoor D-trajectories for nine persons (the GPS dataset was collected by the LifeMap system of [6]). The experimental results show that the PDPA algorithm achieves good prediction accuracy.
The rest of this paper is organised as follows. "Related work" section reviews related work. "The PDPA algorithm" section presents our D-trajectory prediction algorithm. "Datasets" section gives the details of the synthetic dataset of the virtual mall and of the GPS dataset of [6]. "Evaluations and results" section shows the experimental results. The details of the prototype system, SmartShopper, are discussed in "System prototype" section. "Conclusion and future work" section concludes the paper with future work.
Related work
Asahara et al. [7] developed a prediction method that uses a mixed Markov-chain model (MMM) to predict the next location of an individual. In their method, they categorize individuals into groups, assuming that the individuals in each group have similar behavior. Accordingly, they build a mixed Markov-chain model for every group, and each MMM uses one unobservable parameter. The unobservable parameter determines which mixed model of which group of individuals should be used to generate the transition matrix. They tested the prediction accuracy of their MMM on a dataset collected during a big event conducted at a famous shopping mall in Japan. The trajectories of the 691 participants in the experiment were recorded for 90 min during the event. The recorded trajectories were then divided into 10 sets; 9 sets were used for training and one set was used to test the prediction accuracy of the MMM. For the prediction accuracy test, they built 190 mixed Markov-chain models and the average accuracy was 64 % [7]. However, recording the participants' trajectories during a particular event for a specific period of time could result in significant similarity between most of the recorded trajectories, because the movements of the participants mostly followed the designated path of the event in the mall. Moreover, they used 190 mixed Markov-chain models to achieve a prediction accuracy of 64 %, whilst a single Markov-chain model with one transition matrix for the 691 participants was able to achieve a prediction accuracy of 45.6 %. The first process of dividing the participants into groups and the second process of building 190 mixed Markov-chain models both take time, and time is also required to predict the next location of an individual.
Kolodziej et al. [8] developed an algorithm using an activity-based continuous-time Markov jump process, which is an extension of the AMPuMM algorithm of [9]. In their algorithm, instead of the discrete time set used in AMPuMM [9], they use a continuous-time Markov process to build a model of the users' movements. Their algorithm was able to predict the next location of a mobile user within a range of 115–250 m from the correct location of that user. Nevertheless, when the prediction algorithm uses only a set of four locations and the predicted location is that far from the correct one, the algorithm needs a number of improvements to handle a larger set of locations and to provide more accurate predictions. In contrast, our PDPA algorithm is tested using a synthetic dataset of a virtual mall that has 72 stores and a real GPS dataset of nine persons with between 109 and 448 locations each.
Gambs et al. [10] developed an algorithm called n-MMC. The n-MMC algorithm predicts the next place of an individual by using the Mobility Markov-chain (MMC) model and the last two visited locations. In their algorithm, they built a transition matrix that holds the probabilities of all the different transitions of an individual to place k after visiting place i and place j. The n-MMC algorithm managed to achieve a prediction accuracy ranging from 70 to 95 %. However, n-MMC predicts the next place of an individual among only a small set of three possible locations: home, work, and other. Furthermore, if the next place predicted by the n-MMC algorithm for a targeted individual was "other", the researchers do not mention how such a prediction could be used to generate and send useful and relevant information to that individual. Moreover, in their algorithm, n has a constant value of 2, representing the two previously visited locations, whereas our prediction algorithm, PDPA, uses a version of the transition matrix with a memory size of n, i.e. it holds the probabilities of visiting different destinations after a targeted individual has visited n locations, where n is determined by analysing the dataset.
Another group of researchers designed a framework called LASA (Location Aware Shopping Advertisement) that uses an ontology-based formulation of client and product profiles to generate a list of ads related to the selection history of a targeted client [5]. LASA uses the shopping mall's WiFi access points to detect the current location of a client; once the client's location is detected, it sends a list of titles of the different products available in all stores in the coverage area of the access point that discovered the client's current location. Nevertheless, the coverage area of an access point may include a large number of stores, which could result in generating a long list of product titles from those stores. Furthermore, several access points could detect the signal of a client's mobile phone, which could affect the accuracy of determining the current location of that client. Consequently, failing to discover the correct location of a targeted client will hinder the process of determining the list of product titles that should be sent to that client. In contrast, our PDPA algorithm predicts a set of stores that could be visited and then generates a list of ads for those stores, instead of generating a possibly long list of ads for every store in the discovered area of a targeted client as in LASA.
Kim et al. [3] developed a system called AdNext, which uses Bayesian networks to build a transition matrix for the shopping mall clients in order to predict the business type of the next location that a targeted client could visit. AdNext uses the business type of the last two locations visited by a targeted client to predict the business type of the next location for that client. In addition, AdNext detects the current location of the targeted client by using the shopping mall's access points. Based on the discovered location of the targeted client and the predicted business type, AdNext generates a list of ads for every store that falls under the predicted business type and is also in the area of the detected location. However, predicting the exact next store that could be visited by a client is significantly different from predicting the business type of the next location. Predicting the business type of the next location could result in generating many ads, because such a list will include ads from every store that falls under the predicted type of business. Furthermore, the knowledge of the business type of the last two visited locations is crucial for AdNext to be able to predict the business type of the next location. Consequently, if a targeted client plans to visit only one or two stores, then AdNext will not be able to make any predictions. Instead of predicting the business type, our prediction algorithm, PDPA, predicts a set of stores that could be visited by a targeted client, and then uses the predicted stores to generate a list of advertisements for that client. In addition, the PDPA algorithm can predict the first store a targeted client will visit, while AdNext needs to wait for the targeted client to visit at least two stores before it can make any predictions.
The PDPA algorithm
We developed a trajectory prediction algorithm called PDPA (personal destination pattern analysis). The PDPA algorithm predicts what we call a D-trajectory, which is a trajectory of destinations. The D-trajectory is a subsequence of the actual walking trajectory of a targeted individual (containing meaningful destinations, with "meaningful" determined by the domain).
Different from other work, note that our algorithm does not predict based on what others do (e.g., as in collaborative/clustering-style prediction approaches), but based purely on the self-history of the individual him/herself. The PDPA algorithm consists of the following steps.

Targeted analysis period and targeted trajectories. The PDPA algorithm will analyse up to a history window of 2 years of the different trajectories made by a targeted individual on the same day of the week, e.g. to predict a D-trajectory for a Wednesday, we look at the D-trajectories for all Wednesdays in the last 2 years. We observe that the same day of the week is a useful selection filter for the history to be used (we also show this experimentally later).

Dynamic prediction length. It is necessary to determine the length of the predicted D-trajectory. The dynamic trajectory length is used by the PDPA algorithm to determine the number of destinations that are most likely to be visited by a targeted individual on a specific day. The dynamic length is determined by finding the number of destinations that were visited by the individual on the majority of days similar to the day of his/her current activity. For example, if the current activity of individual(x) occurs on a Sunday and the number of destinations most often visited by individual(x) on previous Sundays is 7, then the length of the D-trajectory predicted by PDPA is seven destinations.

The initial distribution matrix. D is an initial distribution matrix that contains, for every destination, the probability of its being the first destination visited by individual(x). The matrix is computed from the self-history of individual(x).

The transition matrix. R is a transition matrix that contains the probabilities of the different transitions of individual(x) from one destination to another. R is used as a transition model for individual(x), and it is built from the set of targeted analysis trajectories. The PDPA algorithm uses two versions of the transition matrix. The first version has a memory size of one, meaning that it holds the probabilities of visiting different destinations after visiting the current location of individual(x). The second version has a memory size of n, meaning that it holds the probabilities of visiting different destinations after individual(x) has visited n locations, where n is obtained by analysing the dataset.

D-trajectory prediction. The following steps show how PDPA predicts the D-trajectory that is most likely to be followed by individual(x) on the day of his/her current activity.

1.
Once individual(x) starts leaving his/her house or parking his/her car at a shopping mall, PDPA will use the current date and location of individual(x) to fetch the records of his/her various trajectories for a period of up to 2 years preceding the current date.

2.
Then, PDPA will filter and analyse only the trajectories used on the selected historical days, i.e. days that fall on the same day of the week as the day of the current activity (e.g. if the current day whose D-trajectory is to be predicted is a Monday, then only the trajectories on Mondays in the history are selected and used).

3.
From the fetched trajectories, the dynamic prediction length (i.e. length of the Dtrajectory to be predicted) will be determined by finding the number of destinations that were visited by individual(x) on the majority of the selected historical days.

4.
Then, PDPA will build the initial distribution matrix D, which contains the probabilities for every destination to be visited first by individual(x), and the transition matrix R, which contains the probabilities of the different transitions of individual(x) from one destination to another.

5.
The destination that is most likely to be visited first by individual(x) is the destination that has the highest probability in D.

6.
The next destination to be visited by individual(x) is the destination that has the highest probability in R to be visited after visiting the current destination.

7.
We use the dynamic prediction length to determine the number of times required to repeat the previous step to find all the remaining destinations of the predicted Dtrajectory.

Algorithm 1 shows how PDPA predicts the D-trajectory of individual(x), and Fig. 1 shows an example of a D-trajectory predicted by the PDPA algorithm. Suppose we have an individual named Jack who is about to leave his house, and the current day is Monday; the PDPA algorithm will then focus on and analyse the D-trajectories that were recorded on previous Mondays. In Fig. 1, the current day is Day29, A(Mon) is the actual trajectory of Jack, and D(Mon) is the D-trajectory predicted by the PDPA algorithm.
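The steps above can be sketched as follows. This is a minimal illustration under our own data-layout assumptions (each history entry is a `(date, [destinations])` pair for one day), not the authors' implementation; tie-breaking in the argmax steps is left to `Counter`'s ordering.

```python
# A minimal sketch of the PDPA prediction steps (not the authors' code).
# `history` is assumed to already be limited to the 2-year window.
from collections import Counter
from datetime import date

def pdpa_predict(history, today):
    """Predict a D-trajectory for `today` from a per-person history.

    history: list of (datetime.date, [destination, ...]) pairs, one per day.
    today:   datetime.date of the day whose D-trajectory is to be predicted.
    """
    # Steps 1-2: keep only days with the same day of the week as `today`.
    same_day = [traj for d, traj in history
                if d.weekday() == today.weekday() and traj]
    if not same_day:
        return []

    # Step 3: dynamic prediction length = the trajectory length that
    # occurred on the majority of the selected historical days.
    length = Counter(len(t) for t in same_day).most_common(1)[0][0]

    # Step 4: initial distribution D and memory-1 transition matrix R,
    # both as counts (proportional to the probabilities).
    first = Counter(t[0] for t in same_day)                 # D
    trans = {}                                              # R
    for t in same_day:
        for a, b in zip(t, t[1:]):
            trans.setdefault(a, Counter())[b] += 1

    # Steps 5-7: start from the argmax of D, then repeatedly take the
    # most probable successor of the current destination.
    current = first.most_common(1)[0][0]
    predicted = [current]
    while len(predicted) < length:
        followers = trans.get(current)
        if not followers:
            break
        current = followers.most_common(1)[0][0]
        predicted.append(current)
    return predicted
```

For instance, with a history whose Mondays are mostly `['a', 'b', 'c']`, predicting for a Monday yields that three-destination D-trajectory.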
Datasets
In this section we discuss the details of the two datasets that we used to measure the prediction accuracy of the PDPA algorithm. Using seed information from real individuals, we generated a synthetic dataset that has 3 years of different indoor trajectories and various shopping activities inside a virtual mall. The second dataset is a real GPS dataset of outdoor trajectories for nine persons, which was collected by the LifeMap system of [6] (over roughly 8 months). Note that for our work, the important aspects of the dataset are the size of the history and the behaviour variability across clients, rather than the number of clients (though there should be sufficient variability among the clients to demonstrate that our system can work with different types of client behaviours, i.e. different extents of behavioural (ir)regularity); hence, the nine clients in that dataset and the different generated synthetic behaviours were more than adequate to demonstrate the applicability of our algorithms across a range of behaviours (from extremely regular to very irregular).
The virtual mall dataset
We built a simulation system that consists of a virtual mall and a number of virtual clients. The purpose of the simulation system is to generate a synthetic dataset that consists of an adequate number of records of various shopping activities, which could allow us to test the prediction accuracy of the PDPA algorithm.
The virtual mall
The simulation system consists of a virtual mall, called VMall. Table 1 shows the components that we used to build VMall. We used a significant number of components in order to make VMall very close to a mall in the real world.
Shopping behavior model
After the initial construction of VMall, we prepared a survey of 14 questions to assist in understanding the shopping behavior of different individuals and then developed a shopping model for every one of the participants in the survey. The survey’s questions are divided into two groups. The first group consists of questions intended to help VMall understand the relationship between the participant and VMall, to obtain information as follows:

1.
How certain are you about visiting VMall at least one time in any month of the 12 months?

2.
How certain are you about the number of visits that you usually make to VMall in any month of the 12 months?

3.
How certain are you about visiting VMall at least one time on any day of the week?

4.
How certain are you about starting your visit to VMall by using any gate of the 4 gates of VMall as your favorite gate?

5.
How certain are you about starting your visit to VMall at any of the following times (9 a.m., 10 a.m., 11 a.m., 12 p.m., 1 p.m., 2 p.m., 3 p.m., 4 p.m.) on any day of the 7 days in a week?

6.
How certain are you about the number of stores that you usually visit on any day of the 7 days in a week?
The second set of questions is designed to evaluate the relationship between the participant and different stores in VMall, e.g., to get information such as:

1.
How certain are you about visiting a store of the 72 stores?

2.
How certain are you about visiting any of the 72 stores at least one time every month?

3.
How certain are you about visiting any of the 72 stores at least one time on every day of a week?

4.
How certain are you about the time that you usually spend in any of the 72 stores on different days of a week? (Use: 5, 10, 15, 20, 25, >25 min)

5.
How certain are you about using any of the 72 stores as your "first store to visit" in your different visits to VMall?

6.
How certain are you about buying at least one item from each of the 72 stores in every visit you make to VMall?

7.
How certain are you about the number of different items that you usually buy from each of the 72 stores in every visit you make to VMall?

8.
How certain are you about the chances of visiting a new store in every trip you make to VMall? (A new store is a store that has not been visited before.)
Note that the idea of the survey is not only to see if the participant visits any particular store or uses a particular gate or visits at a particular time, but for those questions where the answer is positive, for each store/gate in the mall or time slot, we also seek to find out the probability of visiting it and which times are more likely.
We carefully selected three individuals to participate in the survey, after an initial observation that they would likely provide diverse (in terms of regularity of visiting stores) stereotypical behaviors. The participants have the following characteristics: a married female who has two children, and two single males. Note that the number of participants is not as significant, as we aim towards distinguishable stereotypical behaviors, and the survey is only meant to provide seed data for data generation. The collected answers are used by our simulation system to generate a shopping behavior model for every one of the participants. Such models help VMall generate a synthetic dataset of the various purchasing activities that could be performed by each stereotypical participant in VMall. The system generated 3 years of different trajectories and shopping activities for each one of the participants. Note that the PDPA algorithm analyses only the personal history of a specific individual, which is different from collaborative/clustering-style prediction approaches.
Clients creation and dataset generation
After building the shopping behavior models of the participants, VMall will start generating various shopping activities through the following settings:

Virtual clients creation. VMall will create five virtual clients. Three virtual clients will be created based on the seed data obtained from the shopping behavior model of each of the survey's participants (using the questions described earlier). Those virtual clients will have the following tags: clientM4, clientM5, and clientF8. In addition, VMall will create another two virtual clients using a shopping behavior model developed completely by VMall. The first virtual client will be created as a client with a nonspecific and extremely irregular shopping behavior, and it will be tagged clientS1. Having an extremely irregular shopping behavior means that such a client can visit any number of stores on any day and at any time, with absolutely no specific reason/pattern. The fifth virtual client will be created as a client with an extremely regular shopping behavior, and it will be tagged clientS2. This client has an extremely regular behavior because he always visits the same stores on the same days and at the same times. Figure 2 shows a sample of different trajectories conducted by clientF8 in VMall. The purpose of creating five virtual clients with three different types of shopping behavior is to generate a synthetic dataset that consists of various shopping activities with three main stereotypical shopping behaviors, irregular, regular and extremely regular, which allows us to cover the range of possible trajectories better than even real data could (real data may not exhibit the range of stereotypical behaviours needed for this study). The different transitions of clientS1 and clientS2, as shown in Figs. 3 and 4, represent two extreme ends, and it can be intuitively observed that the shopping behavior of any individual will lie between these two ends (in terms of regularity rather than the actual stores visited).
Therefore, having three or more respondents to our survey will not change the fact that the shopping behavior of any individual will be between the two extreme ends, irregular and extremely regular.

Generation period. VMall will generate various shopping activities and trajectories for a period of 3 years for every virtual client. 2 years of the generated activities will be assigned to a training set, and the activities of the 3rd year will be used to evaluate the prediction accuracy of the PDPA algorithm. After the completion of the generation step, we will have a dataset that consists of 3 years of various shopping activities for five virtual clients in VMall. Table 2 describes the sizes of the contents of the virtual mall dataset. For this dataset, each destination is a visited store and each Dtrajectory is a sequence of visited stores in a visit to the shopping mall. Note that for five clients, there are five datasets of Dtrajectories, one for each client.
GPS dataset
D-trajectories based on a real GPS dataset were used to evaluate the prediction accuracy of the PDPA algorithm. The GPS dataset contains fine-grained mobility data from the mobile phones of 12 persons during a period of two months in Seoul, Korea, but only the records of nine persons were extractable (Footnote 1). This dataset was collected by the LifeMap (Footnote 2) system of [6]. Table 3 describes the sizes of the contents of the D-trajectory dataset derived from this GPS dataset. It should be noted that, for this dataset, we define a destination as a location with a stay time >2 min, and a D-trajectory is a sequence of destinations of an individual in 1 day. Note that for the nine clients, there are nine datasets of D-trajectories, one for each client/subject.
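The "stay time >2 min" rule above can be illustrated with a small sketch that derives a day's D-trajectory from timestamped location fixes. The data layout, function name, and the place-id abstraction are our assumptions for illustration, not the LifeMap pipeline:

```python
# Illustrative sketch (not the LifeMap implementation): derive a
# D-trajectory from timestamped fixes using the "stay time > 2 min" rule.

def to_d_trajectory(fixes, min_stay_s=120):
    """fixes: list of (timestamp_seconds, place_id) for one person on one
    day, sorted by time. Returns the day's D-trajectory (list of place_ids).
    """
    trajectory = []
    i = 0
    while i < len(fixes):
        j = i
        # Extend the run of consecutive fixes at the same place.
        while j + 1 < len(fixes) and fixes[j + 1][1] == fixes[i][1]:
            j += 1
        stay = fixes[j][0] - fixes[i][0]
        if stay > min_stay_s:  # long enough to count as a destination
            trajectory.append(fixes[i][1])
        i = j + 1
    return trajectory
```

Brief stops (a single fix, or a run shorter than the threshold) are discarded, so the D-trajectory is a subsequence of the raw trace, as defined earlier.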
Evaluations and results
In this section, we measure the prediction accuracy of the PDPA algorithm using a transition matrix with two different memory sizes. In addition, we developed a Markov-chain algorithm for predicting the D-trajectory of a targeted individual; this Markov-chain algorithm is used only for comparison with the PDPA algorithm.
PDPA with two memory sizes

PDPA_M1 The PDPA algorithm uses a transition matrix that has the probabilities of visiting different destinations after visiting the current location of individual(x). Hence, the suffix _M1 refers to a memory size of one destination.

PDPA_MN The suffix _MN indicates that the PDPA algorithm uses a transition matrix that has the probabilities of visiting different destinations after visiting n destinations. For example, if individual(x) visited location(i), location(j) and location(k), then the PDPA algorithm predicts as the next destination the one with the highest probability of being visited after those three destinations.
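A minimal sketch of how the memory-n transition model behind PDPA_MN could be built and queried. The data structures and function names here are our own illustration, not the authors' implementation; with n = 1 this reduces to the PDPA_M1 case:

```python
# Sketch of a memory-n transition model: the key is the tuple of the
# last n visited destinations, the value counts what followed it.
from collections import Counter

def build_transition_n(trajectories, n):
    """Map each n-tuple of consecutive destinations to a Counter of the
    destinations that followed it in the given history of trajectories."""
    trans = {}
    for t in trajectories:
        for i in range(len(t) - n):
            key = tuple(t[i:i + n])
            trans.setdefault(key, Counter())[t[i + n]] += 1
    return trans

def predict_next(trans, recent, n):
    """Most probable next destination after the last n visited ones,
    or None if that n-tuple never occurred in the history."""
    followers = trans.get(tuple(recent[-n:]))
    return followers.most_common(1)[0][0] if followers else None
```

For example, if the history contains `['i', 'j', 'k', 'm']` twice and `['i', 'j', 'x']` once, a memory-2 model predicts `'k'` after `['i', 'j']`.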
Markov-chain algorithm with no memory
A Markov-chain algorithm is used in several human-trajectory approaches to find the next location of an individual [8–11]. Our Markov-chain algorithm consists of the following steps:

The training set. Two years of different trajectories from the VMall dataset and 60 % of the GPS dataset of individual(x) are used by the Markov-chain algorithm for training purposes.

Predefined prediction length. Different fixed lengths are used in the process of predicting the D-trajectory of individual(x). The suffixes _3, _5 and _7 indicate that the Markov-chain algorithm predicts a trajectory with a length of 3, 5 and 7, respectively. For example, MC_5 means that the Markov-chain algorithm predicts a trajectory of five destinations.

The initial distribution matrix \(\pi\) that contains the probabilities of visiting every destination by individual(x).

The transition matrix T that has the probabilities of the different transitions of individual(x) from one destination to another. The training set of individual(x) is used to build \(\pi\) and T.

The D-trajectory prediction process includes the following steps:

i.
The first destination to be visited by individual(x) is the destination that has the highest probability in \(\pi\).

ii.
The Markov-chain algorithm does not keep a memory of the last visited destination(s) when predicting the possible next destination. Hence, the remaining destinations on the predicted trajectory are found by using the following formula:
$$\begin{aligned} \pi ^{i} = \pi ^{i-1} \, T \end{aligned}$$
(1)
The next destination to be visited by individual(x) is the destination that has the highest probability in \(\pi ^i\).

iii.
Based on the predefined prediction length, the previous step will be repeated to find the remaining destinations of the predicted Dtrajectory.
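The memoryless baseline above can be sketched as follows. This is a minimal illustration assuming a row-stochastic transition matrix T and destinations indexed by integers; it is not the authors' code:

```python
# Sketch of the memoryless Markov-chain baseline: pi is updated via
# Eq. (1), pi^i = pi^{i-1} T, and the argmax of each pi^i is predicted.
import numpy as np

def markov_predict(pi0, T, length):
    """pi0: initial distribution (1D array over destinations),
    T: row-stochastic transition matrix,
    length: fixed prediction length (e.g. 3, 5 or 7).
    Returns the list of predicted destination indices."""
    pi = np.asarray(pi0, dtype=float)
    predicted = [int(np.argmax(pi))]      # step i: argmax of pi
    for _ in range(length - 1):           # steps ii-iii, repeated
        pi = pi @ T                       # Eq. (1)
        predicted.append(int(np.argmax(pi)))
    return predicted
```

Note that, unlike PDPA, the update never conditions on the destination actually predicted at the previous step, only on the evolving distribution.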

Measuring prediction success
Note that while we predict a sequence of stores (a D-trajectory), the evaluation in the rest of the paper focuses on precision and recall measures (analogous to information retrieval, since the relevance of ads relies on this more) rather than on order (order is considered elsewhere for the shopping mall scenario in [12]).
We measure the prediction accuracy of the PDPA algorithm using set-based precision and recall:
$$\begin{aligned} \mathrm{Precision} = \frac{|A \cap D|}{|D|}, \qquad \mathrm{Recall} = \frac{|A \cap D|}{|A|} \end{aligned}$$
where A is the actual D-trajectory (i.e., its corresponding set of destinations) used by individual(x), and D is the D-trajectory (i.e., its corresponding set of destinations) predicted by the PDPA algorithm.
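The set-based precision and recall of a predicted D-trajectory D against the actual one A can be computed as follows (a minimal sketch, not the authors' evaluation code; we return 0.0 when a set is empty):

```python
# Set-based precision/recall of a predicted D-trajectory against the
# actual one, following the definition in this section.

def precision_recall(actual, predicted):
    A, D = set(actual), set(predicted)
    hit = len(A & D)                       # |A intersect D|
    precision = hit / len(D) if D else 0.0
    recall = hit / len(A) if A else 0.0
    return precision, recall
```

For example, predicting `['a', 'b', 'x']` when the actual destinations were `['a', 'b', 'c', 'd']` gives a precision of 2/3 and a recall of 1/2.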
As mentioned earlier, a synthetic dataset of indoor D-trajectories and a real dataset of outdoor D-trajectories (derived from a GPS dataset), altogether involving 14 subjects/clients (five in a virtual mall and nine GPS-based), were used to evaluate the accuracy of a PDPA-predicted D-trajectory compared to the actual trajectory used by the subject/client.
Dynamic prediction length vs. fixed length
Figures 5 and 6 show how using a dynamic prediction length, versus a fixed length, resulted in significantly more accurate prediction of the number of destinations that could be visited by a targeted individual. Figure 5 shows the length prediction accuracy for every virtual client in the VMall dataset. By using a dynamic prediction length, the PDPA algorithm was able to predict the number of destinations with an accuracy of 75.94 % with clientS1, 92.99 % with clientM4, 82.30 % with clientM5 and 100 % with clientS2. On the other hand, using a fixed length with the Markov-chain algorithm resulted in different accuracy values. The Markov-chain algorithm with a fixed length of 3 achieved a length prediction accuracy of 49.49 % with clientS1, 13.73 % with clientS2 and 9.53 % with clientF8. Using a fixed length of 7 helped the Markov-chain algorithm predict the number of destinations with an accuracy of 43.32 % with clientM4 and 81.04 % with clientM5.
Figure 6 shows the length prediction accuracy for every subject in the GPS dataset of [6]. In comparison to the Markov-chain algorithm with different fixed length values, the PDPA algorithm achieved the highest prediction accuracy, regarding the number of destinations that could be visited, for every subject. The PDPA algorithm was able to predict the length of the D-trajectory with an accuracy of 71.13 % with SUB1, 75.57 % with SUB2, 71.46 % with SUB3, 60.37 % with SUB4, 69.42 % with SUB7, 68.88 % with SUB8, 71.90 % with SUB9, 70.67 % with SUB10 and 67.23 % with SUB12. On the other hand, the Markov-chain algorithm with a fixed length of 3 achieved a length prediction accuracy of 63.16 % with SUB2 and 61.76 % with SUB9, while using a fixed length of 7 resulted in an accuracy of 60.56 % with SUB8 and 66.08 % with SUB12. With a fixed length of 5, the Markov-chain algorithm predicted the number of destinations with an accuracy of 67.23 % with SUB1, 67.08 % with SUB3, 66.16 % with SUB4, 65.00 % with SUB7 and 63.67 % with SUB10.
Prediction accuracy with VMall dataset
Figure 7 shows the Precision value for every virtual client in the VMall dataset. With clientS1, who has extremely irregular behavior, the Markov-chain algorithm with a fixed length of 3 achieved the highest Precision value of 18.15 %, while PDPA_M1 achieved 12.90 % and PDPA_MN achieved 16.80 %. With the virtual clients clientM4, clientM5 and clientF8, who have regular behavior, the PDPA algorithm achieved the highest Precision values over the Markov-chain algorithm with different fixed lengths. With clientM4, PDPA_M1 achieved a Precision value of 83.09 % and PDPA_MN achieved a Precision of 82.38 %, while the Markov-chain algorithm with a fixed length of 3 achieved 31.76 %. The highest Precision values with clientM5 were achieved by PDPA_MN and PDPA_M1, with Precisions of 64.79 and 64.21 %, respectively. The Markov-chain algorithm with a fixed length of 3 was only able to obtain a Precision of 39.01 % with clientM5. Both prediction algorithms struggled to obtain accurate predictions with clientF8, who is a mother of two children. With clientF8, PDPA_MN achieved a Precision of 50.03 %, PDPA_M1 achieved 48.92 % and Markov-chain achieved 44.20 % using a fixed length of 3. The extreme regularity of clientS2 helped both algorithms, PDPA and Markov-chain, to achieve a Precision value of 100 %.
Figure 8 shows the Recall values for the virtual clients in VMall. The extreme irregularity of clientS1 was a significant obstacle for PDPA, and hence PDPA_M1 achieved a Recall value of 9.57 % and PDPA_MN achieved 9.69 %, while the Markov-chain algorithm with a fixed length of 7 was able to achieve the highest Recall of 62.31 %. With clientM4, clientM5 and clientF8, the PDPA algorithm achieved the highest Recall values. With clientM4, PDPA_M1 achieved a Recall value of 82.09 % and PDPA_MN achieved 81.24 %, where the Markov-chain algorithm with a fixed length of 7 achieved only 25.89 %. PDPA_M1 and PDPA_MN achieved a Recall value of 66.94 % with clientM5, and Markov-chain using a fixed length of 7 achieved 32.61 %. The main challenge for both algorithms was clientF8. With that virtual client, PDPA_MN achieved a Recall of 46.97 % and PDPA_M1 achieved 45.99 %, while Markov-chain with a fixed length of 3 could not achieve more than 4.32 %. Even though clientS2 has extremely regular behavior, the Markov-chain algorithm achieved only a Recall of 12.43 % using a fixed length of 3, while both PDPA_M1 and PDPA_MN achieved a Recall value of 100 %.
Figure 9 shows the prediction accuracy of both algorithms, PDPA and Markov-chain, through the F-measure values for every client in the VMall dataset. The highest F-measure value of 22.26 % with clientS1 was achieved by the Markov-chain algorithm with a fixed length of 5, while PDPA_M1 and PDPA_MN could not obtain more than 10.29 and 11.26 %, respectively. On the other hand, with clients clientM4, clientM5 and clientF8, the PDPA algorithm achieved the highest F-measure values over Markov-chain with different fixed lengths. PDPA_M1 achieved an F-measure value of 82.33 % and PDPA_MN achieved 81.51 % with clientM4, while the Markov-chain algorithm with a fixed length of 7 could not achieve more than 25.25 % with the same client. With clientM5, PDPA_M1 and PDPA_MN achieved F-measure values of 64.49 and 64.76 %, respectively, whereas Markov-chain with a fixed length of 5 achieved only 31.91 %. Even with the significant shopping history of clientF8, who is a mother of two children, PDPA_M1 was able to achieve an F-measure value of 46.75 % and PDPA_MN achieved 47.87 %, while Markov-chain with a fixed length of 3 obtained only 7.69 %. The last virtual client is clientS2, who has extremely regular behavior. With that client, PDPA_M1 and PDPA_MN were easily able to achieve an F-measure of 100 %, while an F-measure value of 22.56 % was the highest that the Markov-chain algorithm achieved, with a fixed length of 3.
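The Precision, Recall and F-measure values reported above can be computed in the standard way over the predicted versus actually visited destination sets; a minimal sketch (the store names are made up for illustration):

```python
def prf(predicted, actual):
    """Precision, Recall and F-measure of a predicted destination set
    against the set of destinations actually visited."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)  # correctly predicted destinations
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Predicted 3 stores; the client actually visited 4, with 2 in common.
p, r, f = prf({"cafe", "bookshop", "grocer"},
              {"cafe", "grocer", "pharmacy", "bank"})
```

Here Precision is 2/3, Recall is 2/4, and the F-measure is their harmonic mean, 4/7.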
Prediction accuracy with GPS dataset
Figure 10 shows the Precision values for every subject in the GPS dataset of [6]. With SUB1, Markov-chain with a fixed length of 3 achieved a Precision value of 79.87 %, but with a different fixed length there was a significant drop in the Precision value. For instance, with the same subject, SUB1, MC_5 achieved a Precision of 57.74 % and MC_7 of 42.28 %, while PDPA_M1 and PDPA_MN, using a dynamic length, achieved Precision values of 51.06 and 54.21 %, respectively. PDPA_M1 achieved the highest Precision value of 77.78 % with SUB2, and Markov-chain with a fixed length of 3 achieved 72.03 %, while PDPA_MN came in third place with a Precision of 70.26 %. The highest Precision values with subjects SUB3, SUB4, SUB8 and SUB10 were achieved by PDPA_M1, with Precision values of 68.51, 70.85, 38 and 61.70 %, respectively. On the other hand, Markov-chain with a fixed length of 3 achieved the highest Precision values with SUB7 and SUB12. MC_3 achieved a Precision of 58.33 % with SUB7 while PDPA_M1 obtained only 48.81 %. With SUB12, PDPA_MN achieved a Precision of 56.80 % and PDPA_M1 achieved 53.66 %, whereas MC_3 achieved the highest Precision value of 66.64 %. With SUB9, both algorithms managed to achieve a high Precision value, with PDPA_M1 achieving a Precision of 62.27 % and MC_3 achieving 62.90 %.
Figure 11 shows the Recall values for every subject in the GPS dataset. With SUB1, PDPA_M1 achieved a Recall value of 47.04 % and PDPA_MN achieved 37.25 %, while Markov-chain with a fixed length of 7 achieved the highest Recall value of 59.49 %. MC_7 also achieved the highest Recall value of 77.03 % with SUB2, whereas PDPA_M1 achieved a Recall of 74.08 % and PDPA_MN achieved 62.17 %. On the other hand, PDPA_M1 achieved the highest Recall with SUB3, SUB4 and SUB8, with Recall values of 60.17, 52.54 and 40.16 %, respectively. With those subjects, MC_7 achieved a Recall value of 48.06 % with SUB3, 48.35 % with SUB4 and 18.89 % with SUB8. With SUB9 and SUB12, the highest Recall values of 75.21 and 47.16 %, respectively, were achieved by MC_7, while PDPA_M1 achieved only a Recall value of 56.23 % with SUB9 and 43.59 % with SUB12. The Recall values with SUB7 were very close, with MC_7 achieving a Recall of 43.90 % and PDPA_M1 achieving 42.85 %. With SUB10, both algorithms achieved almost identical Recall, with MC_7 achieving a Recall value of 65 % and PDPA_M1 achieving 64.60 %.
Figure 12 shows the prediction accuracy of both algorithms, PDPA and Markov-chain, through the F-measure values for every subject in the GPS dataset. The highest F-measure value of 57.92 % with SUB1 was achieved by the Markov-chain algorithm with a fixed length of 3, while PDPA_M1 and PDPA_MN could not obtain more than 46.17 and 41.45 %, respectively. On the other hand, with subjects SUB2, SUB3, SUB4, SUB7, SUB8, SUB10 and SUB12, PDPA_M1 achieved the highest F-measure values of 72.51, 60.26, 55.39, 43.15, 38.18, 59.95 and 45.52 %, respectively. With those subjects, and in the same order, MC_3 achieved F-measure values of 65.21, 45.14, 40.69, 42.35, 22.24, 54.10 and 43.10 %. With SUB9, the first place went to MC_3 with an F-measure value of 61.24 %, followed by PDPA_M1 with an F-measure of 55.63 %, and then PDPA_MN with an F-measure of 50.96 %.
Self-histories vs. group-histories
In comparison to the prediction accuracy results using self-histories in Figs. 9 and 12, Fig. 13 shows the F-measure values representing the prediction accuracy of both algorithms, PDPA and Markov-chain, using group-histories instead of self-histories with the virtual clients in the VMall dataset. From Fig. 13, we can see a significant drop in accuracy when using group-histories with the PDPA algorithm. With clientS1, PDPA achieved an F-measure of 0 %, in comparison to 10.29 % when using self-histories. With the PDPA algorithm and group-histories, the drop in F-measure values was as follows: from 82.33 to 74.16 % with clientM4, from 64.49 to 50.85 % with clientM5, from 46.75 to 43.16 % with clientF8, and from 100 to 45.32 % with clientS2, who has extremely regular behavior. The prediction accuracy of the Markov-chain algorithm with different fixed lengths also suffered a serious drop when using group-histories. With the Markov-chain algorithm and group-histories, the drop in F-measure values was as follows: from 22.26 to 7.18 % with clientS1, from 25.25 to 12.89 % with clientM4, and from 31.91 to 28.88 % with clientM5. However, MC_7, with the use of group-histories, was able to achieve a significant increase in F-measure value from 7.38 to 66.87 % with clientF8, who is a mother of two children. In addition, MC_7 achieved another increase in F-measure value from 22.42 to 33.66 % with clientS2.
Figure 14 shows the F-measure values representing the prediction accuracy of both algorithms, PDPA and Markov-chain, using group-histories instead of self-histories with the subjects in the GPS dataset. From Fig. 14, we can see a significant drop in accuracy when using group-histories with the PDPA algorithm. With the PDPA algorithm and group-histories, the drop in F-measure values was as follows: from 46.17 to 0 % with SUB1, from 72.51 to 0 % with SUB2, from 60.26 to 17.85 % with SUB3, from 55.39 to 4.71 % with SUB4, from 43.15 to 0 % with SUB7, from 38.18 to 7.08 % with SUB8, from 55.63 to 0 % with SUB9, from 59.95 to 0 % with SUB10, and from 45.52 to 35.15 % with SUB12. In addition, with the Markov-chain algorithm and group-histories, the significant drop in F-measure values was as follows: from 57.92 to 0 % with SUB1, from 65.21 to 0 % with SUB2, from 45.14 to 3.41 % with SUB3, from 40.69 to 0 % with SUB4, from 42.35 to 0 % with SUB7, from 61.24 to 46.77 % with SUB9, from 54.10 to 0 % with SUB10, and from 44.15 to 1.12 % with SUB12.
The results show that there is, in this case, greater value in predicting based on what the individual him/herself did in the past rather than what other people might typically do.
Monday for Monday vs. no preselected days
The PDPA algorithm focuses on activities and trajectories that occurred on previous weekdays of the same day of the week as the current predicted weekday of a targeted individual. For example, if the current weekday is Monday, then the PDPA algorithm will focus on the trajectories recorded on all previous Mondays, over up to 2 years. Figures 9 and 12 show the good prediction accuracy that PDPA managed to achieve when using this approach. In comparison, Figs. 15 and 16 show the significant drop in F-measure value when using the trajectories recorded over up to 2 years of all previous days. With the VMall dataset, there was a drop in F-measure value of 2.19 % with clientS1, 15.96 % with clientM4, 6.56 % with clientM5 and 3.35 % with clientF8, as shown in Fig. 15. With the GPS dataset, the drop in F-measure value was between 3.30 and 11.85 %: a drop of 4.36 % with SUB1, 11.72 % with SUB2, 7.71 % with SUB3, 11.85 % with SUB4, 3.30 % with SUB7, 9.24 % with SUB8, 4.10 % with SUB9, 3.60 % with SUB10 and 11.25 % with SUB12, as shown in Fig. 16.
The results show that selecting histories of the same day of the week as the day whose D-trajectory is to be predicted helps, rather than simply using the history of all days.
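This day-of-week selection over the self-history can be sketched as a simple filter; the record format (a list of date/trajectory pairs) and the 730-day window for "up to 2 years" are illustrative assumptions:

```python
from datetime import date, timedelta

def same_weekday_history(history, target_day, window_days=730):
    """Keep only trajectories recorded on the same weekday as target_day,
    within roughly the last 2 years (730 days)."""
    cutoff = target_day - timedelta(days=window_days)
    return [traj for day, traj in history
            if day.weekday() == target_day.weekday()
            and cutoff <= day < target_day]

history = [
    (date(2016, 1, 4), ["cafe", "grocer"]),  # a Monday
    (date(2016, 1, 5), ["gym"]),             # a Tuesday
    (date(2016, 1, 11), ["cafe", "bank"]),   # a Monday
]

# Predicting a Monday: only the two Monday trajectories are kept.
mondays = same_weekday_history(history, date(2016, 1, 18))
```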
Discussion
Based on the experimental results, we make the following observations.

Compared to the use of a predefined length, using a dynamic length to predict the number of destinations that could be visited by a targeted individual allowed the PDPA algorithm to achieve good prediction accuracy with most subjects and virtual clients, as shown in Figs. 9 and 12.

Figure 17 shows the average F-measure over all virtual clients in the VMall dataset. The PDPA algorithm achieved the highest average F-measure of 60.77 %, while with the Markov-chain algorithm, the best average F-measure of 21.71 % was achieved when using a fixed length of 7. Figure 18 shows the average F-measure over all subjects in the GPS dataset of [6]. The PDPA algorithm achieved the highest average F-measure of 52.97 %, and the Markov-chain algorithm with a fixed length of 3 achieved an average F-measure of 48 %. The problem with showing only the average F-measure is that it does not show exactly how accurate the predictions of a prediction algorithm are for each individual. For example, as shown in Fig. 17, the average F-measure of 60.77 % achieved by the PDPA algorithm does not explicitly show how the prediction algorithm was able to achieve a significant F-measure value of 100 % with clientS2 or a low F-measure of 10.29 % with clientS1. In addition, with the GPS dataset, the PDPA algorithm achieved a high F-measure value of 72.51 % with SUB2 and a low F-measure of 38.18 % with SUB8, but both results are hidden inside the average F-measure value of 52.97 %, as shown in Fig. 18.

In our work, we focus on the self-history of an individual in order to produce predictions related only to that individual. We also show the accuracy of our prediction algorithm for every single individual. Accordingly, in order to better understand why the results were significantly better with some individuals while our algorithm struggled to produce accurate predictions with others, we measured the behavior regularity of every individual and show how such a level of regularity could affect the prediction accuracy of our PDPA algorithm, positively or negatively. Williams et al. [13] defined regularity as repeated activity over time. For example, in their approach, the behavior of a targeted individual can be described as highly regular when that individual visits a location at very similar times each week. They tested their approach on three datasets and found that the majority of individuals in those datasets are deemed to have highly irregular behavior. The three datasets in [13] are at a fine-grained scale, and two of them are at a city-wide scale; hence, highly irregular behavior for the majority of individuals might have been an expected result.
Here, we use a different approach to measure the behavior regularity of a targeted individual. The level of regularity for a targeted individual is assessed by using two types of regularity. The first type of regularity is Destination regularity, which is about measuring the tendency of a targeted individual to visit a similar set of destinations on the same day of the week, e.g. Monday this week is similar to Monday next week. Accordingly, we are trying to find an answer to the following question: what are the chances of visiting the same set of destinations that were visited on the last same day of the week?
We use the following formula to measure the destination regularity of a targeted individual:

\[ R_{dest}(x, W_k) = \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{|W_k(L_i) \cap W_k(L_{i+1})|}{|W_k(L_i) \cup W_k(L_{i+1})|} \]

where

\(W_k \in W = \{Mon, Tue, Wed, Thu, Fri, Sat, Sun\}\),

\(W_k(L_i)\) is the set of locations visited on the previous \(W_k\),

\(W_k(L_{i+1})\) is the set of locations visited on the next \(W_k\), and

\(n\) is the number of recorded occurrences of weekday \(W_k\) in the history of subject \(x\).
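As a sketch, destination regularity can be computed as the average overlap between the destination sets of consecutive occurrences of the same weekday; the Jaccard-style normalisation below is an assumption for illustration:

```python
def destination_regularity(weekday_visits):
    """weekday_visits: list of destination sets, one per occurrence of a
    given weekday (e.g. all Mondays in order). Returns the average overlap
    between each weekday occurrence and the next one (1.0 = identical sets,
    0.0 = no destination in common)."""
    overlaps = []
    for cur, nxt in zip(weekday_visits, weekday_visits[1:]):
        union = cur | nxt
        overlaps.append(len(cur & nxt) / len(union) if union else 1.0)
    return sum(overlaps) / len(overlaps) if overlaps else 0.0

# Three Mondays: the first two are identical, the third partly differs.
mondays = [{"cafe", "grocer"}, {"cafe", "grocer"}, {"cafe", "bank"}]
print(destination_regularity(mondays))  # (1.0 + 1/3) / 2 = 2/3
```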
The second type of regularity is Length Regularity, which is about measuring the tendency of a targeted individual to visit the same number of destinations on the same day of the week. Consequently, we are trying to find an answer to the following question: what are the chances of visiting the same number of destinations that were visited on the last same day of the week?
We use the following formula to measure the length regularity of a targeted individual:

\[ \Delta_i(W_k) = \big|\,|W_k(L_{i+1})| - |W_k(L_i)|\,\big|, \qquad R_{len}(x, W_k) = \frac{1}{n-1} \sum_{i=1}^{n-1} \Delta_i(W_k) \]

where \(\sigma(W_k)\) is the standard deviation of the length differences \(\Delta_i(W_k)\) over all trajectories of subject \(x\) on \(W_k\).
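Length regularity can be sketched as the mean and standard deviation of the absolute length differences between consecutive occurrences of the same weekday; this particular formulation is an assumption, with small values indicating regular behavior:

```python
import statistics

def length_regularity(weekday_lengths):
    """weekday_lengths: trajectory lengths for successive occurrences of one
    weekday. Returns (mean, std) of the absolute length differences between
    consecutive occurrences; small values indicate regular behavior."""
    diffs = [abs(b - a) for a, b in zip(weekday_lengths, weekday_lengths[1:])]
    mean = statistics.mean(diffs)
    std = statistics.pstdev(diffs)  # sigma(W_k) over the length differences
    return mean, std

# Four Mondays with 4, 4, 5 and 4 destinations: differences are 0, 1, 1.
mean, std = length_regularity([4, 4, 5, 4])
```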
These two types of regularity, destination and length, were chosen because of their direct impact on our PDPA algorithm. The process to predict the number of destinations that could be visited by a targeted individual will be affected by the level of regularity of that individual regarding the length of his/her trajectory from one weekday to another similar weekday. In addition, the PDPA algorithm is using an initial distribution matrix and a transition matrix. Both matrices will be affected by the level of regularity of a targeted individual regarding the set of destinations that he/she tends to visit from one weekday to another similar weekday.
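The initial distribution matrix and transition matrix mentioned above can be estimated from the selected trajectories by counting; a minimal sketch with made-up destinations (the maximum-likelihood counting scheme shown here is an assumption, not necessarily PDPA's exact estimator):

```python
from collections import Counter, defaultdict

def estimate_matrices(trajectories):
    """Estimate an initial distribution (over first destinations) and a
    first-order transition matrix from destination sequences."""
    # Initial distribution: relative frequency of each starting destination.
    starts = Counter(t[0] for t in trajectories if t)
    total = sum(starts.values())
    initial = {d: c / total for d, c in starts.items()}

    # Transition matrix: relative frequency of each destination-to-destination move.
    counts = defaultdict(Counter)
    for t in trajectories:
        for a, b in zip(t, t[1:]):
            counts[a][b] += 1
    transition = {a: {b: c / sum(row.values()) for b, c in row.items()}
                  for a, row in counts.items()}
    return initial, transition

trajs = [["cafe", "grocer", "bank"], ["cafe", "grocer"], ["gym", "cafe"]]
initial, transition = estimate_matrices(trajs)
# initial["cafe"] is 2/3; transition["cafe"]["grocer"] is 1.0
```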
Figures 19 and 20 show the average destination regularity value for every subject in the GPS dataset for every weekday, and Fig. 21 shows the average and standard deviation of the length difference regularity for the same subjects. The PDPA algorithm managed to achieve an F-measure value of 72.51 % with SUB2 and an F-measure of 38.18 % with SUB8. Both subjects have a fairly substantial level of regularity regarding visiting a similar set of destinations, as shown in Fig. 19. However, Fig. 21 shows that SUB2 has good average and standard deviation values regarding the regularity of visiting a similar number of destinations on the same day of the week, whereas SUB8 has the worst average and standard deviation values on most weekdays compared to the other subjects. Consequently, the good level of regularity of SUB2 in both types, destination and length, helped the PDPA algorithm achieve an F-measure of 72.51 %, whereas the behavior of SUB8, who has a high level of regularity regarding the set of destinations but a low level of regularity regarding the number of destinations, hindered the PDPA algorithm from capturing a distinctive D-trajectory pattern, resulting in a low F-measure value of 38.18 %. The behavior regularity level of the remaining subjects can be assessed using the same two types of regularity, which helps explain why with some individuals the results were significantly better while, with others, our algorithm struggled to produce accurate predictions. PDPA successfully exploits regularities when present.
System prototype
We used the Android platform to develop SmartShopper, a personal destination-pattern-aware pervasive system, in order to test the interaction between a client’s mobile phone and the server of VMall. Figure 22 shows the welcome screen of SmartShopper.
Figures 23 and 24 show the PDPA-based predicted set of stores that could be visited by the targeted client, and the prediction processing time, which is 0.14 seconds.
Then, the client can click on any of the predicted stores and a list of advertisements will be generated and sent to him/her, as shown in Fig. 25. Figure 26 shows the details of an ad chosen by the targeted client.
Conclusion and future work
In this paper, we have introduced a trajectory prediction algorithm, PDPA, that uses the self-history of a targeted individual in order to predict the set of destinations that could be visited by him/her. We show that self-histories can provide reasonable predictions of future destinations, which can be exploited for ads. In future work, extensive analysis of the behavior of an individual is required in order to identify which other behavior attributes can affect the prediction accuracy of PDPA, positively or negatively. Other personal profile information could be integrated to possibly improve the predictions, such as past histories of transactions or purchases and the current situation of the user. We think that our approach of using personal/self-histories, rather than the combined histories of many people, can provide more personalised ads, and will address issues of variations of behavior across individuals. Further work will experimentally compare and contrast predictions that use the self-history of what a person typically does with predictions that use the combined histories of what people typically do in other scenarios.
Notes
 1.
Data for SUB5, SUB6 and SUB11 are not extracted.
 2.
CRAWDAD dataset yonsei/lifemap (v. 2012-01-03), http://crawdad.cs.dartmouth.edu/yonsei/lifemap, Jan. 2012.
References
 1.
Dhar S, Varshney U (2011) Challenges and business models for mobile location-based services and advertising. Commun ACM 54(5):121–128
 2.
Krumm J (2011) Ubiquitous advertising: the killer application for the 21st century. IEEE Pervasive Comput 10(1):66–73
 3.
Kim B, Ha JY, Lee S, Kang S, Lee Y, Rhee Y, Nachman L, Song J (2011) AdNext: a visit-pattern-aware mobile advertising system for urban commercial complexes. In: Proceedings of the 12th workshop on mobile computing systems and applications. ACM pp. 7–12
 4.
Sánchez JM, Cano JC, Calafate CT, Manzoni P (2008) BlueMall: a Bluetooth-based advertisement system for commercial areas. In: Proceedings of the 3rd ACM workshop on performance monitoring and measurement of heterogeneous wireless and wired networks. ACM pp. 17–22
 5.
Liapis D, Vassilaras S, Yovanof GS (2008) Implementing a low-cost, personalized and location-based service for delivering advertisements to mobile users. In: 3rd international symposium on wireless pervasive computing 2008. ISWPC 2008, IEEE pp. 133–137
 6.
Chon Y, Shin H, Talipov E, Cha H (2012) Evaluating mobility models for temporal prediction with high-granularity mobility data. In: IEEE international conference on pervasive computing and communications (PerCom) 2012. IEEE pp. 206–212
 7.
Asahara A, Maruyama K, Sato A, Seto K (2011) Pedestrian-movement prediction based on mixed Markov-chain model. In: Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems. ACM pp. 25–33
 8.
Zhang D, Xia F, Yang Z, Yao L, Zhao W (2010) Localization technologies for indoor human tracking. In: 5th international conference on future information technology (FutureTech), 2010. IEEE pp. 1–6
 9.
Kolodziej J, Khan SU, Wang L, Min-Allah N, Madani SA, Ghani N, Li H (2011) An application of Markov jump process model for activity-based indoor mobility prediction in wireless networks. In: Frontiers of information technology (FIT). IEEE pp. 51–56
 10.
Gambs S, Killijian MO, del Prado Cortez MN (2012) Next place prediction using mobility Markov chains. In: Proceedings of the first workshop on measurement, privacy, and mobility. ACM p. 3
 11.
Mathivaruni R, Vaidehi V (2008) An activity-based mobility prediction strategy using Markov modeling for wireless networks. In: Proc of the world congress on engineering and computer science 2008 WCECS. Citeseer pp. 379–384
 12.
Barzaiq O, Loke SW, Lu H (2015) On trajectory prediction in indoor retail environments for mobile advertising using selected self-histories. In: Proceedings of the 12th IEEE international conference on ubiquitous intelligence and computing (UIC 2015)
 13.
Williams MJ, Whitaker RM, Allen SM (2012) Measuring individual regularity in human visiting patterns. In: Privacy, security, risk and trust (PASSAT), 2012 international conference on social computing (SocialCom). IEEE, pp 117–122
Authors' contributions
Both coauthors contributed significantly to the research and this paper, and the lead author is the main contributor. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Additional information
Osama O. Barzaiq and Seng W. Loke contributed equally to this work
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Barzaiq, O.O., Loke, S.W. Personal destination pattern analysis with applications to mobile advertising. Hum. Cent. Comput. Inf. Sci. 6, 17 (2016). https://doi.org/10.1186/s13673-016-0073-2
Keywords
 Personal destination pattern analysis
 Mobile advertising
 Human mobility
 D-trajectory prediction