Next Place Prediction Based on Spatiotemporal Pattern Mining of Mobile Device Logs

Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user’s next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP) pattern is proposed to capture visits with spatiotemporal periodicity at a detailed level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user’s past visiting behaviors and predict the next places based on the patterns. Experimental results obtained using a real-world dataset show that, in most cases, the proposed methods predict the user’s next places more effectively than the previous approaches considered.


Introduction
Owing to the recent exponential growth of location-aware services based on mobile devices, such as smart phones, smart watches and tablet PCs, predicting a user's next place has become an important research topic in both academia and industry [1][2][3][4][5]. The problem is to predict a place that a user will visit before she/he arrives, on the basis of the user's past visiting behaviors inferred from sensors, such as Global Positioning System (GPS) and wireless fidelity (WiFi) sensors, that are commonly available in modern mobile devices.
When the level of geographical granularity for prediction comes into consideration, a more precise level is desired to enable further sophisticated services. Through discovering the next places at the level of users' daily lives, such as local shops and school cafeterias, various customized applications can be enabled, including recommendation of tailored information, such as automated reservation and personalized advertisements [6][7][8].
To predict a user's next place, three types of patterns, namely sequential, temporal sequential and periodic patterns, have been intensively studied. Mobile sequential patterns have been utilized to predict next places based on frequently-observed sequential patterns of the places visited [9]. Gambs et al. proposed a modified version of a Markov chain to predict next places by analyzing mobile movement behaviors [10]. Similarly, Alvarez-Garcia et al. and Jeung et al. employed a hidden Markov chain-based method to infer a user's final place [11,12].
Gidófalvi et al. extended the Markov chain-based approach for predicting next places in a continuous manner by adopting an inhomogeneous continuous-time Markov model [13,14].
Rodriguez-Carrion et al. suggested a light version of the Lempel-Ziv (LZ) based prediction algorithm to perform predictions on mobile devices [15,16]. Morzy and Pei et al. proposed a rule-based approach that discovers associations between an individual user and a place by utilizing a modified Apriori algorithm [17][18][19].
Furthermore, several attempts have been made to enhance the performances of next place prediction by considering both spatial and temporal aspects. Giannotti et al. classified users' moving intentions into geographically-triggered and temporally-triggered intentions in terms of place and time [20]. Lu et al. proposed a methodology for mining two types of trajectory patterns, periodic behavior and swarm pattern [21]. By using a nonlinear time series analysis, Scellato et al. attempted to additionally consider arrival time in predicting next places [22]. Wang and Prabhala proposed a user-specific periodicity model based on each user's visiting history [23].
While the previous work mainly focused on the problem of predicting the user's next locations in terms of cell IDs [15,16,24] or at the levels of intra-city or inter-cities [9,10,14,22,25,26], the geographical granularity considered in this paper is at the level of people's daily lives (e.g., buildings). When applied to such a fine level of location granularity, the previous approaches suffer from one or more limitations due to the following unique characteristics of the next place prediction problem discussed in this paper.
First, mobile device logs are likely to contain much noise and missing data related to users' past visits, caused by various factors, such as measurement errors, wireless connection problems or unpowered mobile devices. Dealing with such noise and missing data is crucial, since they make it difficult to achieve accurate parameter estimation and rule generation, eventually leading to unrealistic predictions of next places. Moreover, as the geographical granularity becomes finer, the impact of such error-prone data on the performance of a prediction method becomes more severe.
Next, compared to the cases with coarse location granularity, there is a larger amount of irregular visits in the past history of a user in the considered problem, which makes it even more difficult for a prediction model to accurately identify the user's visiting patterns. For instance, various irregular visits, such as going shopping or to movies, are frequently found between the regular visits of going to and returning from work. As a result, if those irregular visits are simply ignored, prediction models often fail to capture the patterns hidden among them.
Finally, it is necessary to be able to predict the next place for a user by utilizing only a small number of observations available for the user, since collecting a user's mobile device log is usually a time-consuming and costly task. Accordingly, methods such as rule mining and decision trees, which require a significant amount of history information for prediction, do not appear to be a viable option when time and cost are an issue.
Motivated by the above remarks, we attempt to develop a novel framework that aims to predict a user's next place based on the user's past visiting behaviors through considering periodicity in addition to time and location. To address the three challenges mentioned above, the proposed framework maps the individual visit of a user to one of the visiting patterns by utilizing the pattern extraction algorithms and the pattern similarity function proposed in this research.
The proposed framework constructs spatiotemporal (ST) trajectories, each of which represents a sequence of stays in terms of place and time, from a limited amount of past visit data for each user. Spatiotemporal-periodic (STP) patterns are then extracted from the user's ST trajectories by the proposed STP extraction algorithm. The algorithm searches for STP patterns through considering both occurrence frequencies and associations with ST trajectories with respect to time for effective recognition of irregular or new visits as STP patterns. In particular, we employ a smoothing function to deal with the noisy and missing data.
Subsequently, STP trajectories are built by mapping each ST trajectory to the STP pattern that is most similar to the trajectory among the extracted STP patterns. Based on gapped sequence mining [27], the proposed framework is able to identify a user's sporadic visits in her/his daily life by constructing gapped STP (GSTP) trajectories that allow gaps to accommodate irregular visits that cannot be specified in advance. The next place visited by a user is then predicted by the proposed prediction algorithm based on the user's current and recent visits.
This paper is organized as follows: In Section 2, the details of the proposed methods are described. In Section 3, the data collection details and experimentation results are described. The conclusions are presented in Section 4.

Proposed Framework
In this section, we describe in detail how the proposed framework extracts GSTP trajectories and predicts the user's next places through considering sequential, temporal and periodic characteristics of a mobile device log. Figure 1 illustrates the overall training process of the proposed framework that consists of four steps to generate GSTP trajectories. The training process proceeds as follows.  First, an ST trajectory, defined as a sequence of stays in which each stay is represented in terms of a place visited, and the arrival and departure time, as well as the day of week for the visit, is constructed from raw data. Second, we extract STP patterns from ST trajectories to capture periodic revisits by taking periodicity into consideration. The existence of an STP pattern for a user indicates that the user tends to periodically revisit a particular place at a specific time associated with the pattern.
Next, in the STP trajectory construction step of Figure 1, ST trajectories are mapped into a sequence of the extracted STP patterns, named the STP trajectory, based on the similarity between an STP pattern and an element of an ST trajectory. Finally, gap-constrained sequential pattern mining is applied to the STP trajectories to construct a user's GSTP trajectory that allows unobserved places in the user's STP trajectories. The generated GSTP trajectories from the training process are then used for prediction of the next place when the user's most recent STP trajectory data are provided as test data. The detailed descriptions are presented in the following sections.

WiFi-Based Place Identification
We employ a WiFi fingerprint-based localization method [28] for extracting the places visited from a user's mobile device log. It is well known that this method has advantages over GPS-based approaches when tracking and identifying people's movements in indoor environments, particularly in urban areas, and the method also provides several benefits in terms of energy efficiency, compared to the GPS-based ones, as it utilizes WiFi sensor data.
The localization method requires a WiFi fingerprinting database containing WiFi access point (AP) records, each of which consists of a place, p, the basic service set identifier (BSSID) and a range of received signal strength indicator (RSSI) values observed at p. The database is used to infer a user's visit to a place by matching the observed WiFi APs with those in the database according to their BSSIDs and RSSI ranges. Table 1 shows an example of the WiFi fingerprint-based localization method: a WiFi fingerprinting database, raw WiFi data and the localization result are given in Table 1a-c, respectively, where distinct places are indexed. For the instance observed at 15:00 on 11 November 2013 in Table 1b, the place is identified as p_2, since only p_2 in the database has matching BSSIDs whose RSSI ranges contain the observed values. In particular, one observation in Table 1b indicates that the place can be either p_2 or p_3 according to Table 1a, but only the RSSI range of p_2 contains −55, the observed RSSI value. Similarly, the observations of BSSID_2 and BSSID_3 at 15:00 on 11 November 2013 also indicate that the place must be p_2, and therefore, the place at 15:00 on 11 November 2013 is inferred to be p_2. By repeating this process for all of the instances in the raw WiFi data collected, the places visited by a user, as well as the corresponding timestamps, can be constructed. An example of the localization result is shown in Table 1c, where each timestamp is interpreted as the time when the user arrived at a place, indicating, for instance, that the user was at p_2 from 15:00 to 15:45 on 11 November 2013.
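The matching rule described above can be illustrated with a minimal sketch. It is not the implementation of [28]: the database layout, the RSSI ranges and the rule that every observed AP must match are assumptions made for illustration, and the values are not taken from Table 1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fingerprint database: place -> {BSSID: (min RSSI, max RSSI)}.
# The ranges are illustrative only and are not the values of Table 1a.
FINGERPRINTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "p2": {"BSSID_1": (-60, -50), "BSSID_2": (-75, -65), "BSSID_3": (-85, -70)},
    "p3": {"BSSID_1": (-90, -70), "BSSID_4": (-60, -40)},
}


def locate(scan: Dict[str, int]) -> Optional[str]:
    """Return the single place whose fingerprint matches every observed AP.

    Assumption: a place matches only if each observed BSSID is registered
    for it and the observed RSSI lies inside the registered range.
    """
    candidates: List[str] = []
    for place, aps in FINGERPRINTS.items():
        if all(
            bssid in aps and aps[bssid][0] <= rssi <= aps[bssid][1]
            for bssid, rssi in scan.items()
        ):
            candidates.append(place)
    # No match or an ambiguous match is reported as an unknown location.
    return candidates[0] if len(candidates) == 1 else None
```

With these illustrative ranges, an observation of BSSID_1 at −55 is compatible with p2 only, because the range registered for p3 does not contain −55.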

ST Trajectory Construction
Once the places and their associated timestamps are identified, the ST trajectory of the i-th day, T_i, is constructed. T_i denotes a sequence of stays, and each stay, T_{i,j}, is defined as a four-tuple, (p, t_s, t_f, d), where t_s and t_f, respectively, represent the start and finish time of the stay in minutes, and d ∈ {Mo, Tu, We, Th, Fr, Sa, Su} is the day of the week for T_i. The stays T_{i,j} are ordered chronologically in T_i.
T_{i,j} is identified from the log containing the localization result by grouping consecutive logs corresponding to the same place while preserving the ascending order of the timestamps. t_s and t_f are determined to be the start and the finish time of the group, respectively, and p is set to the place of the group. Subsequently, T_i is constructed from the identified stays T_{i,j} according to their day index i. Table 2 shows an example of ST trajectories constructed from the data in Table 1c, assuming that 11 November 2013 is Monday. The first stay, T_{1,1}, shown in Table 2, corresponds to the first instance of Table 1c, since the places of the first and second instances of Table 1c are different. As a result, t_s and t_f for T_{1,1} are set to the timestamps of the first and second instances of Table 1c, respectively, leading to T_{1,1} = (p_2, 900, 945, Mo). Since the minutes are measured from the beginning of a day, the user is inferred to have stayed at p_2 between 15:00 and 15:45 on Monday. The other stays are generated similarly from Table 1c, and the trajectories T_i are constructed as shown in Table 2.
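The grouping step can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the record layout `(minute, place)` and the rule that a stay ends when the next group begins are assumptions, and the sample values beyond the first stay are hypothetical.

```python
from itertools import groupby
from typing import List, NamedTuple, Optional, Tuple


class Stay(NamedTuple):
    p: str    # place
    t_s: int  # start time, minutes from the beginning of the day
    t_f: int  # finish time, minutes from the beginning of the day
    d: str    # day of the week, e.g. "Mo"


def build_st_trajectory(log: List[Tuple[int, Optional[str]]], day: str) -> List[Stay]:
    """Group consecutive localization results of one day into stays."""
    log = sorted(log, key=lambda rec: rec[0])  # ascending timestamps
    groups = [
        (place, [t for t, _ in recs])
        for place, recs in groupby(log, key=lambda rec: rec[1])
    ]
    stays: List[Stay] = []
    for k, (place, times) in enumerate(groups):
        if place is None:  # unknown location: no stay is produced
            continue
        # Assumption: a stay ends when the next group begins; the last
        # group of the day ends at its own final timestamp.
        t_f = groups[k + 1][1][0] if k + 1 < len(groups) else times[-1]
        stays.append(Stay(place, times[0], t_f, day))
    return stays


# First record mirrors the stay at p2 from 15:00 (minute 900) to 15:45 (minute 945).
trajectory = build_st_trajectory(
    [(900, "p2"), (945, "p1"), (955, "p2"), (1000, "p2")], "Mo"
)
```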

STP Pattern Extraction
In this paper, we consider a weekly periodicity for extracting STP patterns of a user, as most people have weekly visiting patterns [29]. Extraction of the STP patterns from ST trajectories consists of three steps: grouping ST trajectories based on weekly periodicities, computing the probabilities of a stay according to its periodicity group membership and generating STP patterns from the probabilities.
Specifically, ST trajectories are grouped according to the day of the week contained in T i to accommodate the weekly periodicity of user's movements. For each group of ST trajectories, the probability of a stay is computed through examining whether or not a user has visited a place at a specific time based on the arrival and departure time. Then, STP patterns are extracted by finding time segments that exceed a certain threshold in terms of the probability. Detailed description of each step is given in the following.

ST Trajectory Grouping for Periodicity Identification
In order to take the periodicity into account, ST trajectories are grouped based on a weekly periodicity, denoted as D ∈ 𝒟, where 𝒟 denotes the set of all possible combinations of the days of the week, {Mo, Tu, We, Th, Fr, Sa, Su}. For instance, D = {Mo, We} represents a periodicity of visits that tend to be made on every Monday and Wednesday. We let T_D be the set of trajectories T_i whose stays T_{i,j} have a day of the week d that belongs to D. For instance, if D = {Mo}, T_{Mo} formed from Table 2 can be expressed as
$$T_{\{Mo\}} = \{\, T_{7n-6} \mid n \in \mathbb{N} \,\},$$
where $\mathbb{N}$ is the set of positive integers.
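As a small illustration of the grouping, the sketch below collects the day indices i that belong to T_D for each periodicity D. The function name and the 21-day calendar starting on a Monday are assumptions for the example.

```python
from typing import Dict, FrozenSet, Iterable, List


def group_by_periodicity(
    day_of_week: Dict[int, str],
    periodicities: Iterable[FrozenSet[str]],
) -> Dict[FrozenSet[str], List[int]]:
    """Map each periodicity D to the sorted day indices i whose weekday is in D."""
    return {
        D: sorted(i for i, d in day_of_week.items() if d in D)
        for D in periodicities
    }


DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
# Hypothetical calendar: day 1 is a Monday, three weeks of data.
day_of_week = {i: DAYS[(i - 1) % 7] for i in range(1, 22)}
groups = group_by_periodicity(
    day_of_week, [frozenset({"Mo"}), frozenset({"Mo", "We"})]
)
```

Under this calendar, the group for {Mo} contains days 1, 8 and 15, and the group for {Mo, We} additionally contains days 3, 10 and 17.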

Computing the Probability of a Stay
Given D, T_D is used for computing the probability of a stay at place p at a discrete time, t ∈ {1, ..., t_max}, denoted as q_{D,p,t}, where t and t_max, respectively, are the time since the beginning of a day and the time at the end of the day. Both are measured in minutes, so t ranges from 1 to 1440 and t_max = 1440.
A procedure for calculating q_{D,p,t} by counting the number of stays in the ST trajectories that belong to T_D is shown in Algorithm 1, in which P(T_D) is a function that returns the set of places included in the ST trajectories contained in T_D. In Line 2, the temporary variable, Q_{D,p,t}(T_i), that stores the probability of a stay for T_i is initialized to zero for all t. In Lines 5 to 11, Q_{D,p,t}(T_i) is set to one if there exists a stay T_{i,j} = (p, t_s, t_f, d) ∈ T_i such that t_s ≤ t ≤ t_f; otherwise, it is set to the maximum of its current value and the result of smoothing. Finally, q_{D,p,t} is computed by averaging Q_{D,p,t}(T_i) over all T_i ∈ T_D.
Linear smoothing is applied to q_{D,p,t} to accommodate the variability of a stay, as well as noise in the raw WiFi data. λ_s is a slope parameter that determines the penalty amount for the stays that are not exactly matched in terms of time. The penalty is proportional to the distance between t and t_s or between t and t_f, so that Q_{D,p,t}(T_i) increases as t approaches t_s or t_f, but only up to Q_{D,p,t_s}(T_i) or Q_{D,p,t_f}(T_i), respectively.
Algorithm 1: Algorithm for Calculating the Probability of a Stay.
Figure 2a,b respectively shows the calculation results of Q_{{Mo},p_2,t}(T_1) and Q_{{Mo},p_2,t}(T_15) over time, based on Table 2. Figure 2c shows the plot for q_{{Mo},p_2,t} based on T_{1,1}, T_{1,3}, T_{1,5}, T_{15,1} and T_{15,3} for each t, where the trapezoid shapes are attributed to the application of smoothing to the stay probabilities.
Specifically, the value of q_{{Mo},p_2,t} is computed as follows: First, we consider T_{1,1} and compute Q_{{Mo},p_2,t}(T_1). Since t_s and t_f for T_{1,1} are 900 and 945, respectively, Q_{{Mo},p_2,t}(T_1) = 1 when t ∈ [900, 945]. For other values of t, the linear smoothing function in Algorithm 1 is applied (with a slope of λ_s = 0.05 in this example). As a result, Q_{{Mo},p_2,t}(T_1) = 0 when t < 880 or t > 965, Q_{{Mo},p_2,t}(T_1) = 1 − 0.05(900 − t) when t ∈ [880, 900) and Q_{{Mo},p_2,t}(T_1) = 1 − 0.05(t − 945) when t ∈ (945, 965]. The probabilities of stays at p_2 for T_{1,3} and T_{1,5} are computed similarly, and the results are shown in Figure 2a. Subsequently, we repeat the above probability calculations for all T_i ∈ T_D and set q_{{Mo},p_2,t} as the average of the probabilities computed for T_1 and T_15, resulting in the thick line in Figure 2c.
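The calculation walked through above can be sketched in a few lines. This is a simplified reading of Algorithm 1 rather than a transcription of it: stays are represented as `(place, t_s, t_f)` tuples, and the function names and the default slope of 0.05 (taken from the worked example) are choices made for the illustration.

```python
from typing import List, Sequence, Tuple

StayT = Tuple[str, int, int]  # (place, t_s, t_f) in minutes of the day


def stay_indicator(stays: Sequence[StayT], place: str, t: int, lam_s: float = 0.05) -> float:
    """Q for one trajectory: 1 inside a stay, linearly smoothed outside it."""
    best = 0.0
    for p, t_s, t_f in stays:
        if p != place:
            continue
        if t_s <= t <= t_f:
            return 1.0
        # Distance in minutes to the nearest end of the stay.
        dist = t_s - t if t < t_s else t - t_f
        # The smoothed value never drops below zero because best starts at 0.
        best = max(best, 1.0 - lam_s * dist)
    return best


def stay_probability(trajectories: List[Sequence[StayT]], place: str, t: int,
                     lam_s: float = 0.05) -> float:
    """q: average of the per-trajectory values over all trajectories in T_D."""
    if not trajectories:
        return 0.0
    return sum(stay_indicator(T, place, t, lam_s) for T in trajectories) / len(trajectories)
```

For a single stay at p2 from minute 900 to 945, the sketch gives 1 inside the stay, 1 − 0.05(900 − t) on [880, 900) and 0 before minute 880, in line with the worked example.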

Extracting STP Patterns
After computing q_{D,p,t} for all D, p and t, we proceed to compute the set of STP patterns for D, denoted as Π_D. We define each STP pattern π ∈ Π_D as a triplet, (p, τ_s, τ_f), where τ_s and τ_f, respectively, stand for the start and finish time of π. Given periodicity D and place p, STP patterns are extracted from q_{D,p,t} by finding the time segments whose associated probabilities are greater than or equal to a threshold, θ. A detailed description of how to extract the STP patterns is given in Algorithm 2.
Algorithm 2: Algorithm for STP Pattern Extraction.
In Lines 3 to 15 of Algorithm 2, STP patterns are identified only for the consecutive time epochs whose probabilities are greater than or equal to θ. E is a temporary variable that records the set of consecutive time epochs such that q_{D,p,t} ≥ θ. When q_{D,p,t} falls below θ and E is not empty, a new STP pattern is identified by respectively setting τ_s and τ_f to be the start and finish time of the new pattern in Lines 8 to 9. Then, the new pattern is added to Π_D. The above procedure is repeated for all p ∈ P(T_D), and the algorithm finally returns the set of extracted STP patterns, Π_D.
Table 3a shows an example of STP patterns extracted from the ST trajectories in Table 2 when T_{Mo} = {T_1, T_8, T_15} and θ = 0.5. In Table 3a, π_1 is extracted from T_{1,1}, T_{1,3} and T_{15,1}, while π_3 is from T_{1,5} and T_{15,3}. STP patterns π_1 and π_3 are illustrated as shaded areas in Figure 2c. For instance, π_1 is extracted from q_{{Mo},p_2,t} as follows: the goal is to find the consecutive time segments that satisfy q_{{Mo},p_2,t} ≥ θ. Since
$$q_{\{Mo\},p_2,t} = \frac{2\,\bigl(1 - 0.05\,(900 - t)\bigr)}{3} \quad \text{for } t \in [880, 900]$$
and
$$q_{\{Mo\},p_2,t} = \frac{2 - 0.05\,(t - 1035)}{3} \quad \text{for } t \in [1035, 1055],$$
the condition q_{{Mo},p_2,t} ≥ θ is satisfied for t ∈ [895, 1045]. Therefore, τ_s and τ_f of π_1 are set to 895 and 1045, respectively, and as a result, π_1 = (p_2, 895, 1045).
Table 3. Examples of (a) STP patterns; and (b) STP trajectories.
Note that the existence of T_{1,2} between T_{1,1} and T_{1,3} was ignored during the construction of π_1. This is due to the smoothing applied to the probability of a stay, q_{{Mo},p_2,t}, after the finish time of T_{1,1} and before the start time of T_{1,3}, resulting in the effect of treating the user's stay at p_1 for 10 min as a temporary visit that is often observed while a user is moving to another location. Indeed, the smoothing allows us to effectively combine multiple revisits to the same place even though the time intervals of their stays do not overlap, while providing a means to deal with temporary or irregular visits.
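The thresholding step can be sketched as a single pass over the minutes of a day. This is an illustrative simplification of Algorithm 2: it tracks only the start of the current run instead of the set E, and the function name and list-based input are assumptions.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, int, int]  # (place, tau_s, tau_f)


def extract_stp_patterns(q: Sequence[float], place: str, theta: float) -> List[Pattern]:
    """Return maximal runs of minutes whose stay probability is at least theta.

    q[t - 1] holds the stay probability of `place` at minute t (1-based).
    """
    patterns: List[Pattern] = []
    run_start = None
    for t, prob in enumerate(q, start=1):
        if prob >= theta:
            if run_start is None:
                run_start = t  # a new run of qualifying minutes begins
        elif run_start is not None:
            patterns.append((place, run_start, t - 1))
            run_start = None
    if run_start is not None:  # the run reaches the end of the day
        patterns.append((place, run_start, len(q)))
    return patterns
```

Applied to the profile of the worked example, which rises from minute 880, stays at 2/3 between minutes 900 and 1035 and decays until minute 1055, a threshold of 0.5 yields the single segment from 895 to 1045.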

Generation of STP Trajectories
The user's movement pattern is represented as an STP trajectory that is generated from ST trajectories by utilizing the extracted STP patterns for the user. We let s denote an STP trajectory, which is a sequence of symbols, each of which corresponds to an STP pattern or an event. It starts with event e_s and ends with event e_f, respectively indicating the start and finish of s. The set of STP trajectories generated for weekly periodicity D is denoted as S_D.
s is constructed by replacing each stay in an ST trajectory with the STP pattern that is most similar to the stay, while exploring the stays of each ST trajectory in ascending order of time. The similarity between a stay and an STP pattern is calculated based on the overlap between the time segments of the stay and the pattern.
Specifically, the similarity between a stay T_{i,j} and an STP pattern π, denoted as t-sim(T_{i,j}, π), is defined as Equation (1).
$$\text{t-sim}(T_{i,j}, \pi) = \frac{\text{length of overlap between the time intervals of } T_{i,j} \text{ and } \pi}{\text{length of the time interval of } T_{i,j}} \qquad (1)$$
Algorithm 3 shows the detailed procedure for generating a set of STP trajectories from T_D and Π_D, given a threshold for pattern similarity. In Algorithm 3, ⟨·⟩ is used for representing a sequence, and ⊕ denotes an operator for the concatenation of two sequences.
Algorithm 3: Algorithm for STP Trajectory Construction.
In Line 2, e_s is added to s to represent the start of s, and in Line 6, the stay T_{i,j} with the smallest t_s is selected, so that the stays are traversed in ascending order of time, and is removed from T_i in the next line. From Line 8 to 14, the algorithm attempts to find the matching pattern for T_{i,j} based on t-sim() among the candidate STP patterns by traversing the patterns in Π_D one by one in chronological order of τ_s.
When no matching pattern is found for T_{i,j}, the event of visiting place P(T_{i,j}), denoted as e(p), is added to s instead of an STP pattern, as in Lines 15 to 18, where P() is a function that returns the place contained in a stay. The STP trajectory for T_i is augmented with s in Line 19, and e_f is appended to s to indicate the end of the sequence in Line 21. Finally, the constructed STP trajectory s is added to S_D as a member, and the algorithm returns S_D as an output. Table 3b shows an example of the STP trajectory construction result from the ST trajectories in Table 2 by applying Algorithm 3 with the STP patterns defined in Table 3a. In Table 3b, STP trajectory s_1 consists of three STP patterns, π_1, π_2 and π_3. For constructing s_1, π_1 is selected as the matching pattern for T_{1,1}, since π_1 has the highest similarity among the STP patterns considered. In fact, t-sim(T_{1,1}, π_1) is one, as the time interval of T_{1,1} is included in that of π_1. For s_3, an event of visiting place p_4, rather than an STP pattern, is inserted at the fourth position, since there exists no matching pattern related to visiting p_4 around that time.
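The similarity of Equation (1) and the mapping of stays to patterns can be sketched as below. The sketch simplifies Algorithm 3 and rests on assumptions that are not spelled out in the text: a pattern is a candidate only if its place equals that of the stay, the most similar candidate at or above the threshold is chosen, and consecutive repeats of the same pattern are not merged. The names `t_sim`, `build_stp_trajectory` and `theta_sim` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

StayT = Tuple[str, int, int]    # (place, t_s, t_f)
Pattern = Tuple[str, int, int]  # (place, tau_s, tau_f)


def t_sim(stay: StayT, pattern: Pattern) -> float:
    """Overlap of the two time intervals divided by the length of the stay."""
    _, t_s, t_f = stay
    _, tau_s, tau_f = pattern
    length = t_f - t_s
    if length <= 0:
        return 0.0
    overlap = max(0, min(t_f, tau_f) - max(t_s, tau_s))
    return overlap / length


def build_stp_trajectory(stays: Sequence[StayT], patterns: Dict[str, Pattern],
                         theta_sim: float) -> List[str]:
    """Replace each stay by its best-matching pattern name or by an event e(p)."""
    s = ["e_s"]
    for stay in sorted(stays, key=lambda x: x[1]):  # ascending start time
        best_name, best_sim = None, 0.0
        for name, pattern in patterns.items():
            if pattern[0] != stay[0]:  # assumption: places must agree
                continue
            sim = t_sim(stay, pattern)
            if sim >= theta_sim and sim > best_sim:
                best_name, best_sim = name, sim
        s.append(best_name if best_name is not None else f"e({stay[0]})")
    s.append("e_f")
    return s
```

For the stay (p2, 900, 945) and the pattern (p2, 895, 1045), the similarity is one because the stay lies entirely inside the pattern, which agrees with the value of t-sim(T_{1,1}, π_1) discussed above.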

Gapped Sequence Mining
Among many sequential pattern mining algorithms that have been proposed in the past to discover frequent patterns from sequences, the gapped sequence mining algorithm has been known to provide satisfactory results in many applications [30]. It extracts patterns with consideration of gap constraints when finding frequent subsequences to relax the consecutiveness requirement on the subsequences. We employ a gap-constrained sequential pattern mining algorithm, known as cSPADE (Sequential Pattern Discovery using Equivalence classes with constraints) [27], to discover frequent subsequences from STP trajectories. It allows us to deal with irregular visits, as well as uncertainties in a mobile device log due to the presence of noisy data by using gap symbols. Table 4a presents the result of applying the cSPADE algorithm to the STP trajectories in Table 3b. The outputs of the cSPADE algorithm are frequent subsequences with gaps, as well as their confidence values, which are then used to generate GSTP trajectories. The confidence of a sequence indicates the likelihood of occurrence of the last symbol in the sequence, given that all of the preceding symbols before the last one have been observed.

More formally, a GSTP trajectory, defined as a four-tuple σ = (s_p, s_c, s_s, u), is obtained from a frequent subsequence in such a way that s_s and s_c, respectively, are the symbols at the last and the second to last positions of the frequent subsequence, and s_p corresponds to the rest. That is, a frequent subsequence found by cSPADE is split into three parts that respectively represent the current STP pattern or event (i.e., s_c), the preceding patterns or events (i.e., s_p) before s_c and the succeeding pattern or event (i.e., s_s) after s_c. When the gap symbol, S_g, is located at the second to last position in a frequent subsequence, its immediate predecessor together with S_g forms s_c, and all of the other predecessors constitute s_p. cSPADE is applied for each weekly periodicity D ∈ 𝒟, and the resulting GSTP trajectories are stored in Σ_D.
Once GSTP trajectories are obtained, the average length of S_g contained in s_p of σ can be computed by counting the number of symbols corresponding to S_g for each STP trajectory used to discover σ during the training process and taking their average. For instance, since σ_5 of Table 4b has been derived from s_1 and s_3 of Table 3b, and S_g = {π_2} for s_1 and S_g = ∅ for s_3, the average length of S_g for s_p of σ_5 is (1 + 0)/2 = 0.5. The average length of S_g in s_c can be computed in the same way.
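The splitting rule and the average gap length can be illustrated with a short sketch. The list-of-strings representation, the marker `"S_g"` and the function names are assumptions for the example; cSPADE itself is not reimplemented here.

```python
from typing import List, Sequence, Tuple

GAP = "S_g"  # marker used for the gap symbol in this sketch


def split_subsequence(seq: Sequence[str]) -> Tuple[List[str], List[str], str]:
    """Split a frequent subsequence into (s_p, s_c, s_s)."""
    if len(seq) < 2:
        raise ValueError("a subsequence needs a current and a succeeding symbol")
    s_s = seq[-1]
    if seq[-2] == GAP and len(seq) >= 3:
        # The gap and its immediate predecessor together form s_c.
        s_c, s_p = list(seq[-3:-1]), list(seq[:-3])
    else:
        s_c, s_p = list(seq[-2:-1]), list(seq[:-2])
    return s_p, s_c, s_s


def average_gap_length(gap_lengths: Sequence[int]) -> float:
    """Average number of symbols absorbed by a gap over the supporting trajectories."""
    return sum(gap_lengths) / len(gap_lengths) if gap_lengths else 0.0


s_p, s_c, s_s = split_subsequence(["pi1", "S_g", "pi3", "S_g", "e_f"])
```

For a subsequence of the form ⟨π_1, S_g, π_3, S_g, e_f⟩, the sketch returns s_p = ⟨π_1, S_g⟩, s_c = ⟨π_3, S_g⟩ and s_s = e_f, and `average_gap_length([1, 0])` reproduces the value (1 + 0)/2 = 0.5 computed above.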
Finally, u(σ) represents the utility of GSTP trajectory σ when making a prediction of the next place, and it is defined as Equation (2).
$$u(\sigma) = \mathrm{conf}(\sigma) + \lambda_p \cdot \frac{1}{1 + \bar{g}(s_p)} + \lambda_c \cdot \frac{1}{1 + \bar{g}(s_c)} \qquad (2)$$
where conf(σ) is the confidence of the frequent subsequence from which σ is derived, $\bar{g}(s_p)$ and $\bar{g}(s_c)$ are the average lengths of S_g in s_p and s_c, respectively, and λ_p and λ_c are weight parameters. u(σ) considers not only the confidence of a frequent subsequence, but also the average length of the gaps located in s_p and s_c to accommodate the uncertainty associated with a GSTP trajectory. Note that the utility of a GSTP trajectory decreases as the gaps become longer. Furthermore, we set λ_c to be greater than λ_p to put more emphasis on the utility related to the current and next places. Table 4b presents an example of GSTP trajectories generated from the frequent subsequences in Table 4a, where λ_p and λ_c were set to 0.1 and 0.5, respectively. There is a one-to-one correspondence between the subsequences of Table 4a and the GSTP trajectories of Table 4b. As an example, we consider the fifth subsequence in Table 4a, which is ⟨π_1, S_g, π_3, S_g, e_f⟩. s_s, s_c and s_p of σ_5 are e_f, ⟨π_3, S_g⟩ and ⟨π_1, S_g⟩, respectively, as S_g is at the second to last position in the subsequence. Therefore, u(σ_5) = 0.49 + 0.1 × 1/(1 + 0.5) + 0.5 × 1/(1 + 0.5) = 0.89.

Next Place Prediction
Figure 3 depicts the test process for predicting the next location of a user when a new observation on the user's movement is made. In order to predict the next place, it is necessary to convert the user's movement logs to an STP trajectory and then to compare it to the GSTP trajectories identified during the training process. The steps involved in the test process are exactly the same as those in the training process in Figure 1, except that the STP pattern extraction step is skipped when generating an STP trajectory. Once an STP trajectory is obtained from the test data, the user's next place is predicted by Algorithm 4, which finds the GSTP trajectory most similar to the STP trajectory and predicts the next location by following the GSTP trajectory found.
Algorithm 4 (Lines 3 to 5): Line 3 selects σ over all σ ∈ Σ_D; Line 4 breaks ties by picking σ with the highest u; Line 5 sets p to the place of s_s.
Algorithm 4 describes how the proposed framework infers the next place from an STP trajectory of user A, based on the input data s^A, which is the STP trajectory of user A given as test data, and Σ_D, the GSTP trajectories constructed during the training process. s^A is split into s_c^A and s_p^A, which respectively are the last symbol, which can be either an STP pattern or an event corresponding to the currently visited place, and all of the symbols preceding s_c^A, denoting the past movements.
In Line 3 of Algorithm 4, the best matching GSTP trajectory σ is found by examining all of the GSTP trajectories in Σ_D. σ = (s_p, s_c, s_s, u) is obtained by use of the similarity between two sequences, p-sim(s_p, s_p^A), that measures the length of the overlapping subsequences between the sequences corresponding to the past movements. The similarity function is defined in Equation (3), in which S_g is counted as having a length of one when calculating the length.
When more than one GSTP trajectory has the same similarity value, the tie is broken by picking σ with the highest u. The algorithm then returns p, which is the place contained in s_s representing a pattern or an event. Finally, execution of Algorithm 4 is repeated for all D ∈ 𝒟 to select the best σ across the various weekly periodicities.
As an example, we assume s^A = ⟨π_1, π_3⟩, implying that user A is currently at p_2 (from Table 3a). Among the GSTP trajectories in Table 4b, σ_3 and σ_4 have the highest similarity, 1.0, as s_p^A = ⟨π_1⟩ is the same as the s_p of σ_3 and σ_4. Between σ_3 and σ_4, we choose σ_4, since u(σ_4) > u(σ_3). Accordingly, p_4 is predicted to be user A's next place, as e(p_4) in s_s of σ_4 indicates an event of visiting place p_4.
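The selection logic of this example can be sketched as follows. Several parts are assumptions and are labeled as such in the code: `p_sim` is a stand-in (a normalized longest common subsequence) because Equation (3) is not reproduced in this text, candidates are restricted to GSTP trajectories whose current symbol equals s_c^A, and the utility values in the usage are hypothetical rather than those of Table 4b. The `utility` function follows the arithmetic of the worked example for σ_5, namely 0.49 + 0.1 × 1/(1 + 0.5) + 0.5 × 1/(1 + 0.5).

```python
from typing import Dict, List, NamedTuple, Optional, Sequence

GAP = "S_g"  # marker used for the gap symbol in this sketch


class GSTP(NamedTuple):
    s_p: List[str]  # preceding patterns or events
    s_c: List[str]  # current pattern or event (optionally followed by a gap)
    s_s: str        # succeeding pattern or event
    u: float        # utility


def utility(conf: float, gap_p: float, gap_c: float,
            lam_p: float = 0.1, lam_c: float = 0.5) -> float:
    """Confidence plus weighted terms that shrink as the average gaps grow."""
    return conf + lam_p / (1.0 + gap_p) + lam_c / (1.0 + gap_c)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def p_sim(s_p: Sequence[str], s_p_a: Sequence[str]) -> float:
    """ASSUMED stand-in for Equation (3); a gap counts as length one."""
    denom = max(len(s_p), len(s_p_a))
    if denom == 0:
        return 1.0
    return lcs_length([x for x in s_p if x != GAP], s_p_a) / denom


def predict_next_place(s_a: Sequence[str], gstps: Sequence[GSTP],
                       place_of: Dict[str, str]) -> Optional[str]:
    """Pick the most similar GSTP trajectory, break ties by utility, return its place."""
    if not s_a:
        return None
    s_p_a, s_c_a = list(s_a[:-1]), s_a[-1]
    # Assumption: only trajectories with the same current symbol are candidates.
    candidates = [g for g in gstps if g.s_c and g.s_c[0] == s_c_a]
    if not candidates:
        return None
    best = max(candidates, key=lambda g: (p_sim(g.s_p, s_p_a), g.u))
    return place_of.get(best.s_s)


# Hypothetical stand-ins for two GSTP trajectories with equal similarity.
sigma_a = GSTP(["pi1"], ["pi3"], "e_f", 0.70)
sigma_b = GSTP(["pi1"], ["pi3"], "e(p4)", 0.90)
predicted = predict_next_place(["pi1", "pi3"], [sigma_a, sigma_b], {"e(p4)": "p4"})
```

Both candidates reach a similarity of 1.0 for s^A = ⟨π_1, π_3⟩, so the one with the higher utility decides the prediction, mirroring the choice between σ_3 and σ_4 described above.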

Dataset
Among several types of mobile devices, we adopted smartphones as data collection devices, since they are equipped with WiFi sensors and frequently carried by users anywhere they go throughout their daily activities. For experimentation, we implemented an Android mobile app that records the data pertaining to user's visits, such as timestamps and WiFi signals, every minute. The mobile app was then distributed to eight students at Seoul National University (SNU), and the data were collected during two months spanning from September to November 2013.
The subjects were chosen in such a way that they have different majors; half of them are residents of a campus dormitory; and half of them take classes for more than 4 days a week, so that they can represent different campus lifestyles. As all of the participants were undergraduate students and the experiments were conducted during a semester, most activities they performed during the study period were related to typical campus life, including having a meal at a cafeteria, taking a class in a classroom, sleeping in a dormitory, doing homework in the library and doing exercise at a gym.
Since our research was a part of a smart campus project that aims to study intelligent services facilitating better campus life, data collection was conducted only inside the SNU campus, and all of the places considered were located within the campus. Another reason for limiting the scope to the SNU campus was the availability of the WiFi fingerprinting database required by the proposed approach. Building a WiFi fingerprinting database involves time-consuming tasks and is very costly, and only the database for the SNU campus was available at the time of this research.
Throughout the experiments, all participants were instructed to carry their mobile devices with them as much as possible to gather comprehensive data that can reflect their actual daily movements. The full dataset contains 714,448 WiFi signal logs, and 19.85 WiFi APs were detected on average for each observation. Since the logs also include locations outside campus, only about 52 percent of logs were successfully mapped into meaningful places based on a localization method using the WiFi fingerprinting database for campus buildings. Furthermore, the first 42 days' logs out of 60 days were selected as training data for constructing the prediction model, and the rest were used for evaluating the model's performance.
Figure 4 shows an example of a subject's ST trajectories retrieved from the collected data. In this figure, blocks of the same gray level indicate visits to the same place, and white backgrounds represent unknown locations. The horizontal axis corresponds to the time from 0:00 to 24:00 of a day, while the vertical axis represents the number of days from the beginning of the experimentation. That is, a horizontal block stands for the subject's stay at some place from the time at which the block begins until the time at which the block ends, and appearances of blocks with the same gray level along the vertical axis indicate that the subject visited the same place at similar time slots across the days.
From Figure 4, it can be observed that frequent revisits to the same place were usually made with weekly rather than daily periodicities due to the characteristics of campus life, and accordingly, we have extracted patterns based on the weekly periodicity. Yet, there are many irregular or exceptional visits that can be attributed to noisy observations, errors during localization or participants' peculiarities, making the problem of next place prediction difficult.
We address this difficulty by applying smoothing when constructing STP patterns and gapped sequence mining when generating GSTP trajectories.
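The chronological split and the weekly grouping described above can be illustrated with a minimal sketch. The record layout (a date paired with a place label), the function names and the start date are assumptions made for illustration only; they are not taken from the dataset or from the authors' implementation.

```python
from collections import defaultdict
from datetime import date, timedelta

def split_by_day(visits, start, train_days=42):
    """Chronological split: visits in the first `train_days` days train, the rest test."""
    cutoff = start + timedelta(days=train_days)
    train = [v for v in visits if v[0] < cutoff]
    test = [v for v in visits if v[0] >= cutoff]
    return train, test

def group_by_weekday(visits):
    """Group (day, place) records by day of the week (0 = Monday)."""
    groups = defaultdict(list)
    for day, place in visits:
        groups[day.weekday()].append((day, place))
    return dict(groups)

# Hypothetical 60-day log: one visit per day, "library" every 7th day.
start = date(2024, 1, 1)  # assumed start date; it happens to be a Monday
visits = [(start + timedelta(days=d), "library" if d % 7 == 0 else "lab")
          for d in range(60)]

train, test = split_by_day(visits, start)
by_weekday = group_by_weekday(train)
print(len(train), len(test))   # 42 18
print(len(by_weekday[0]))      # 6 Mondays fall within the training period
```

Grouping by weekday makes a weekly revisit (here the toy "library" visits) appear as a consistent column of observations, which is the kind of regularity visible in Figure 4.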

Parameter Settings
For STP pattern extraction and STP trajectory construction, the parameters were determined experimentally by taking the values that maximize the performance of the proposed model. Figure 5a,b shows the prediction accuracy results when varying θ, θ and λ_s individually while the other parameters were fixed. From Figure 5a,b, it can be seen that a large θ hurts the performance as more false STP patterns are introduced, and roughly 50 min of smoothing are appropriate for identifying a stay. The highest performance was achieved when we respectively set θ, θ and λ_s to 0.06, 0.16
The maximum gap, maximum window size, minimum support, λ and λ are the parameters involved in the gapped sequence mining. Individual effects of λ and the minimum support on the accuracy are respectively plotted in Figure 5c,d, where the maximum performance was achieved when setting λ and the minimum support to 0.5 and 0.15, respectively. Performance differences were negligible when varying the values of the maximum gap, maximum window size and λ, and we set them to 3, 7 and 0.1, respectively.
Finally, several weekly periodicities were selected in consideration of the characteristics of campus life, which include the periodicities based on a single day, except Saturday and Sunday, and those based on typical class schedules at SNU, resulting in D

Evaluation Results
In order to demonstrate the effectiveness of the proposed framework, we implemented two first-order Markov chain-based methods that predict the next place by calculating the probabilities of all possible next places from the transition probabilities among places and choosing the place with the highest probability. We remark that the same ST trajectory data (like those in Table 2) were used for both the proposed methods and the first-order Markov chain methods, so that all methods were compared fairly under the same noisy data.
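The first-order Markov chain baseline described above can be sketched as follows. This is a minimal illustration, assuming each trajectory is an ordered list of place labels; the function names and the toy trajectories are hypothetical and do not reproduce the authors' implementation.

```python
from collections import Counter, defaultdict

def fit_transitions(trajectories):
    """Count place-to-place transitions over all trajectories."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, current):
    """Return (most likely next place, its transition probability), or None if unseen."""
    followers = counts.get(current)
    if not followers:
        return None
    total = sum(followers.values())
    place, n = followers.most_common(1)[0]
    return place, n / total

# Hypothetical daily trajectories of one subject.
trajectories = [
    ["dorm", "lab", "cafeteria"],
    ["dorm", "lab", "library"],
    ["dorm", "lab", "cafeteria"],
]
counts = fit_transitions(trajectories)
print(predict_next(counts, "lab"))  # "cafeteria" with probability 2/3
print(predict_next(counts, "gym"))  # None: "gym" never appears as a current place
```

Because the prediction depends only on the current place, a single irregular visit changes the conditioning state entirely, which is one way to see why such a model is sensitive to noisy trajectories.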
The comparison results for the proposed methods and the Markov chain methods in terms of the accuracy metric are presented in Figures 6 and 7, in which MC, MC-P, STP and GSTP respectively stand for (1) the Markov chain method without periodicity consideration; (2) the Markov chain method with periodicity consideration; (3) the prediction based on STP trajectories; and (4) the prediction based on GSTP trajectories. While MC predicted the next locations by using all of the available ST trajectory data without taking the day of the week into account, MC-P exploited the day of the week by selectively utilizing ST trajectories grouped by weekly periodicities according to the day on which the prediction was made. Since MC prediction was performed on all of the trajectories in the training data, its accuracy results are the same across the days of the week, as shown in Figures 6 and 7.
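The difference between MC and MC-P can be illustrated with a small sketch in which a separate transition table is kept for each day of the week. Grouping by a single weekday is a simplifying assumption here (the weekly periodicities in the paper may also span several days), and all names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_by_weekday(daily_trajectories):
    """Build one transition table per weekday from (weekday, [place, ...]) pairs."""
    models = defaultdict(lambda: defaultdict(Counter))
    for weekday, traj in daily_trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            models[weekday][cur][nxt] += 1
    return models

def predict_next(models, weekday, current):
    """Most frequent successor of `current` on the given weekday, or None if unseen."""
    followers = models.get(weekday, {}).get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical data: weekday 0 = Monday, 2 = Wednesday.
data = [
    (0, ["dorm", "lab", "cafeteria"]),
    (0, ["dorm", "lab", "cafeteria"]),
    (2, ["dorm", "lab", "gym"]),
    (2, ["dorm", "lab", "gym"]),
    (2, ["dorm", "lab", "gym"]),
]
models = fit_by_weekday(data)
print(predict_next(models, 0, "lab"))  # cafeteria
print(predict_next(models, 2, "lab"))  # gym
print(predict_next(models, 4, "lab"))  # None: no Friday data in this toy set
```

A pooled model over the same toy data would always predict "gym" after "lab" (three transitions against two), so conditioning on the weekday changes the Monday prediction. It also leaves fewer observations per table, which makes each table more exposed to irregular visits.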
On the other hand, STP makes its prediction from the STP trajectory data (e.g., Table 3b): it computes the transition probabilities between patterns or events and chooses the pattern or event with the highest transition probability from the current one. Finally, the GSTP method uses GSTP trajectories (e.g., Table 4b) to predict the next location.
As shown in Figure 8, the overall accuracy results of MC and MC-P were worse than those of STP and GSTP. The poor performance of the Markov chain-based methods is due to their inability to address the irregularity of visits, a characteristic often observed in campus life. In particular, the performance results of MC-P imply that periodicity alone does not increase the accuracy. Figure 8 also shows that GSTP slightly outperformed STP on average, while their performance variabilities barely differ. Furthermore, it can be observed from Figure 8 together with Figures 6 and 7 that STP and GSTP tend to provide more stable performance across the different days of the week than MC-P. For some subjects, the next places were far from predictable with MC and MC-P owing to the high irregularity of their visiting behaviors, but the prediction performance for them improved greatly when the proposed methods, STP and GSTP, were applied. In particular, STP and GSTP significantly outperformed MC and MC-P for Subjects 3-5, as shown in Figures 6 and 7.
Accordingly, it appears that the proposed notion of STP trajectory facilitates accuracy enhancement through generalizing observations into patterns, as well as accommodating periodicities. In addition, incorporation of gaps into the pattern sequence by the GSTP method was also successful for further increasing the accuracy. These together imply that the proposed framework was effective at predicting the user's next location.

Effects of Movement Regularity
Besides the overall accuracy, we found that the performance of the proposed methods varied significantly depending on the lifestyle of a subject. After the data collection experiment, we conducted a short survey in which each subject rated the regularity of his or her movements during the study period on a 3-point Likert scale. A score of 3 was reported by Subjects 1-3, indicating that they maintained highly regular life patterns. On the other hand, the score of Subject 5 was 1, whereas the score of the rest was 2.
Based on this survey result, it appears that the performance of the proposed method was satisfactory when a subject exhibited highly regular behaviors, leading to the average prediction accuracy of more than 0.7 for Subjects 1 and 3. In contrast, when the visiting behavior of a subject was not very regular, the prediction performances of STP and GSTP were low, as suggested by the results for Subjects 6 and 8.
To further explore the relationship between the regularity of movements and the prediction performance, we computed the Jaccard similarity [31], which measures the overlap among the places visited by a subject on each day of the week, and employed it as a metric for assessing the regularity. Figure 9 shows the result as 40 data points, each corresponding to one of five days of the week for one of the eight subjects, plotted against the resulting performance.
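A regularity score of this kind can be sketched as follows. The Jaccard similarity itself is standard, but the aggregation used here (the mean over all pairs of place sets observed on the same weekday in different weeks) is an assumption made for illustration, since the exact aggregation is not specified in this section; the names and data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of places."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

def regularity(place_sets):
    """Mean pairwise Jaccard similarity over the given sets (assumed aggregation)."""
    pairs = list(combinations(place_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical places visited by one subject on three different Mondays.
mondays = [
    {"lab", "library", "cafeteria"},
    {"lab", "library"},
    {"lab", "library", "gym"},
]
print(round(regularity(mondays), 3))  # 0.611, i.e., the mean of 2/3, 1/2 and 2/3
```

A score close to 1 means the subject visits nearly the same set of places on that weekday every week, while a score close to 0 means the visited places rarely overlap.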
The regularity score varied according to the subject, as well as the day of the week. The Pearson correlation coefficient for the points in Figure 9 was 0.267, indicating a weak positive relationship between the regularity and the prediction accuracy of GSTP. This suggests that the regularity alone cannot fully explain the prediction performance, since GSTP can accommodate irregularities through the smoothing and gapped sequence mining. It is still interesting to note that more dots appear in the upper right corner of Figure 9 for Subjects 1 to 3, who reported the highest scores in their subjective regularity assessment, than for the other subjects.
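For reference, the Pearson correlation coefficient between regularity scores and accuracies can be computed as in the sketch below. The input values are made up purely for illustration and are not the values plotted in Figure 9.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Made-up (regularity, accuracy) pairs, one per subject and weekday.
regularity_scores = [0.2, 0.4, 0.6, 0.8]
accuracies = [0.5, 0.3, 0.7, 0.6]
r = pearson(regularity_scores, accuracies)
print(round(r, 3))
```

The coefficient ranges from -1 to 1; a value such as the 0.267 reported above is positive but far from 1, which is why the relationship is described as weak.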

Conclusions
In this paper, we exploited time, location and periodicity information to effectively predict the user's next place by introducing the notion of the STP pattern and applying gapped sequence mining. Frequently and periodically observed visiting behaviors were recognized as STP patterns for a user, and the patterns were then used for representing the user's past visits as STP trajectories. Subsequently, the extracted STP trajectories were further generalized to GSTP trajectories to accommodate irregularities of visits, as well as to deal with exceptional stays.
Through the experimentation based on a real-world dataset collected from eight people, it was found that the proposed methods outperform the conventional methods based on the Markov chain in terms of prediction accuracy.
As future work, we plan to apply our work to larger and more complex environments than a university campus, such as urban areas or travel sites with more participants, and to further enhance the proposed spatiotemporal-periodic patterns through developing more sophisticated similarity measures that can effectively accommodate diverse types of irregularities and the semantic meaning of places.