Movement Recommendation System Based on Multi-Spot Congestion Analytics

Abstract: A method is proposed for resolving human congestion at specific times at key spots in an area. Sensing data on real-world human flows are analyzed, and information that encourages people to change their movement behavior is provided accordingly. This was difficult with conventional approaches; in the proposed approach, the targets and timing of information provision for congestion mitigation are determined based on spot importance. A congestion transition model is constructed from actual data and the results of a questionnaire survey. Finally, congestion mitigation at key spots is simulated after a movement recommendation has been provided.


Introduction
Cyber-physical systems (CPSs) [1] are increasingly used to promote sustainability. CPSs collect data in the real (physical) world, analyze it using digital technologies in the cyber world, and feed the results back to the physical side with additional value. For example, a comfortable society can be built by improving the efficiency of public services using prediction and optimization technologies. No matter how the social environment changes, a sustainable society can be realized by repeating this feedback cycle. The collected data will be used in various fields to revitalize industries and solve social problems.
One such problem is congestion in cities. It is important to create a city where residents can live safely and comfortably. When a part of a city is upgraded, human congestion at that location can become problematic. CPSs have the potential to solve this problem. The problem dealt with in this study is a small one in a limited area, but it may be possible to solve urban problems by expanding this system. In addition, congestion mitigation has the potential to solve energy problems by managing power demand [2]. Hori et al. [2] show that facility power demand is highly correlated with surrounding human congestion. Their results show that, by controlling congestion in the surrounding area, it is possible to cut the power peak of the facility and shift the peak time. This indicates that a new energy management system can be realized with congestion control, which is one of the effects that CPSs bring.
Data on human congestion and people flows can be acquired in real time using wireless technologies such as Bluetooth and Wi-Fi [3,4], so that congestion prediction [5] and people flow analysis [4] can be carried out. In large commercial facilities and on university campuses, congestion at a specific facility is observed using Wi-Fi probe requests, and real-time congestion status and predictions are provided to facility users [6]. However, as these services are accessed only when users deliberately look them up, they cannot be used effectively for congestion mitigation. It is therefore conceivable that congestion can be mitigated by providing recommendation information through a social networking service (SNS), which users access frequently.
In this study, the congestion problem of a specific facility is addressed. Information, including real-time congestion status and prediction as well as behavior recommendation, is provided via SNS based on sensing data obtained from Wi-Fi probe requests. The proposed approach aims at mitigating congestion by urging users to change their behavior.

Congestion Mitigation
Several studies dealing with congestion issues have been conducted in other research areas [7,8]. Hoogendoorn et al. [7] review what is known about the influence of automation on traffic flow efficiency and the behavior of road users. However, most studies of the influence of automation on traffic flow efficiency are conducted with microscopic simulations. Tsiropoulou et al. [8] model museum visitors' quality of experience (QoE). The visitors' QoE function is formulated by weighting several parameters, one of which is crowd density. As these studies show, congestion is an important indicator when considering human satisfaction.
Congestion is also an important factor in evacuation planning [9,10]. In [9], a distributed and autonomous evacuation process based on the key principles of reinforcement learning and game theory is proposed to support distributed and efficient evacuation planning in public safety systems. The numerical results presented there demonstrate its superior performance in an evacuation process. Liu et al. [10] propose an improved quantum ant colony algorithm for exhaustive optimization of the evacuation path. A few decisions are made in a very short period. However, the usefulness of these methods [9,10] has been verified only by simulation; verification using actual data has not been performed.
In several studies, the effect of providing congestion information on congestion mitigation has been verified by simulation. Suzuki et al. [11] aimed at congestion mitigation by presenting congestion information at event sites such as amusement parks and discussed the influence of information provision using simulation. Because the simulation performed for verification used virtual data, the results may not match reality, depending on the setting. Nie et al. [12] and Shimizu et al. [13] proposed congestion mitigation based on simulations that use data actually measured at amusement parks. However, as mentioned in those studies, the discussion was limited to specific conditions and did not take into consideration the change of the congestion status in each time period. To mitigate congestion, it is important to present information according to the congestion status, and this must also be considered in the simulation.
In this study, a congestion transition model for a specific real-world facility is constructed based on observed congestion data. Moreover, the influence of the content and timing of the recommendation information on congestion mitigation is examined by conducting several simulations.

People Flow Data Analysis
Owing to the popularization of devices such as smartphones and tablets, it has become possible to obtain data on individual behaviors by using applications and sensors in user terminals. For example, there are several studies using Global Positioning System (GPS) data for people flow analysis [14]. A large amount of location information can be acquired by GPS without new infrastructure development. However, user consent is required to collect and use GPS data, and it is difficult to acquire position information of indoor user terminals. Some studies have used Bluetooth for people flow analysis [15]. In that case, it is necessary to install a specific application in the user terminal. Additionally, it can be used only when the user permits data acquisition. Therefore, it is difficult to collect a large amount of data.
People flow analysis using Wi-Fi probe requests has also been performed [4]. Media Access Control (MAC) addresses observed continuously during the night, which is outside the usage hours of the facility, are excluded from the analysis so that only data from portable terminals such as smartphones are used. That study shows that it is possible to grasp the rough trend of people flow from the number of movements calculated based on the schedule of an event. However, when experiments are conducted in a real environment, no data processing is performed for user terminals with a long probe request transmission interval. The probe request transmission interval of a user terminal varies greatly depending on, for instance, the type of operating system (OS) and whether the screen is on or data communication is being performed. Therefore, it is difficult to correctly estimate the movement of user terminals with long probe request transmission intervals using only the collected probe request data.
There are some studies on congestion prediction [16,17]. Tseng et al. [16] propose a support vector machine based real-time highway traffic congestion prediction model. Du et al. [17] propose a hybrid multimodal deep learning method for short-term traffic flow prediction. These systems [16,17] use multi-modal data to achieve high-precision prediction. However, in order to achieve this, it is necessary to collect various big data. Moreover, there remains a problem that the generated model cannot be applied if the location is changed. In contrast, our research does not require big data for learning and can find key spots for congestion mitigation and change the behavior of people there.

Movement Recommendation System Based on Multi-Spot Congestion Analytics
The purpose of this study is to resolve congestion at specific times at key spots. To this end, it is necessary to change individual behaviors by providing information to persons who plan to visit a spot at its congestion peak time. A system is proposed that analyzes real-world sensing data on human flows and automatically provides effective information for changing the movement of people accordingly. The targets and timing of information provision for congestion mitigation are determined according to spot importance. As trial and error is difficult in a real environment, a congestion transition model is constructed from actual data, and congestion mitigation at key spots is simulated after a movement recommendation has been provided. The proposed method can be summarized as follows.

1. Estimation of human congestion in a facility and the movement of people between spots by sensing Wi-Fi probe requests.
2. Calculation of the spot importance using the estimated people flow.
3. Determination of the targets and timing of providing the information.
4. Simulation of congestion mitigation at key spots after movement recommendation provision.
The details will be provided in the following sections.

Wi-Fi Based Real-World Sensing
In this study, Wi-Fi probe requests periodically transmitted by terminals are collected using the Wi-Fi packet sensor [3] shown in Figure 1. When a terminal is detected by the Wi-Fi packet sensor, information such as the universally unique identifier (UUID), the received signal strength indicator (RSSI), and the observation time is stored in the database. The UUID is generated from the MAC address of the terminal.
The number of UUIDs is used as an indicator of human congestion. As not all users of the target facilities possess terminals with a Wi-Fi function, the number of UUIDs does not represent the exact number of users. However, it is known that the number of observed UUIDs and the actual number of users are highly correlated [18]. To use the number of UUIDs as a degree of congestion, only terminals whose RSSI value is larger than a fixed value are counted, so that terminals around the facility are detected accurately; it is known that the RSSI value and the distance from the Wi-Fi packet sensor to the terminal are highly correlated, and that the RSSI value is large when the distance to the terminal is short [19]. Counting UUIDs under this condition also solves the problem that one person may be counted simultaneously at several points. In addition, depending on the device, the MAC address may be randomized, which causes one device to be recognized as several. Therefore, devices that randomize their MAC addresses are excluded when counting UUIDs.
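The counting rule described above can be illustrated with a minimal sketch. It is not the sensor's actual implementation: the `Observation` record, the RSSI threshold of -70 dBm, and the use of the raw MAC address as the identifier are assumptions made for illustration. Randomized addresses are detected here through the locally administered bit of the first octet, which is one common heuristic and may differ from the filter used in the study.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Observation(NamedTuple):
    mac: str      # hypothetical record field: MAC address "aa:bb:cc:dd:ee:ff"
    rssi: int     # received signal strength in dBm
    minute: int   # observation time in minutes since midnight


def is_randomized(mac: str) -> bool:
    """Heuristic: a set locally administered bit suggests a randomized MAC."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def congestion_per_slot(
    observations: Iterable[Observation],
    rssi_threshold: int = -70,   # assumed value, not taken from the paper
    slot_minutes: int = 10,
) -> Dict[int, int]:
    """Count distinct nearby, non-randomized terminals in each time slot."""
    seen: Dict[int, Set[str]] = {}
    for obs in observations:
        # Keep only terminals close to the sensor (large RSSI) with stable MACs.
        if obs.rssi <= rssi_threshold or is_randomized(obs.mac):
            continue
        slot = obs.minute // slot_minutes
        seen.setdefault(slot, set()).add(obs.mac)
    return {slot: len(ids) for slot, ids in seen.items()}
```

A terminal observed several times within one slot is counted once, because identifiers are collected in a set per slot.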

Determination of the Targets and Timing of Providing Information by SpotRank
By analyzing the data acquired in Section 3.1, the targets and timing of providing information for congestion mitigation are determined according to spot importance. To detect key spots, a new concept called SpotRank is proposed, which ensures that the appropriate timing is considered for providing recommendations.
The origin-destination (OD) table is created using the data acquired in Section 3.1. As a UUID does not change in a short period, information on the movement of people is obtained from the same UUID observed at each spot. The OD table is a $k \times k$ table $M_t$, in which rows indicate starting spots and columns indicate arrival spots. Thus, the $(i, j)$ entry $m_{ij}$ of $M_t$ represents the number of users that moved from spot $i$ to spot $j$ at time $t$. Each spot in the OD table is taken as a node, and an edge is established between spots between which users moved. Through these processes, a people flow network composed of multiple spots and information on the movement of people can be constructed.
To identify the spots at which recommendations are effective, a metric called SpotRank is proposed for determining the importance of each spot at each time. It is based on the PageRank approach [20], which yields a centrality measure for evaluating the importance of each node in a network. By replacing web pages with spots, the importance of each spot can be determined.
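As an illustration of how such an OD table might be assembled, the following sketch counts UUIDs that appear at different spots in two consecutive time steps. The input format (a mapping from UUID to spot index per time step) and the choice to ignore UUIDs that stay at the same spot are assumptions, not details given in the text.

```python
from typing import Dict, List


def build_od_table(
    prev_spot: Dict[str, int],
    next_spot: Dict[str, int],
    k: int,
) -> List[List[int]]:
    """Return a k x k table whose (i, j) entry counts moves from spot i to j.

    prev_spot and next_spot map each UUID to the index (0..k-1) of the spot
    where it was observed at two consecutive time steps.
    """
    od = [[0] * k for _ in range(k)]
    for uuid, origin in prev_spot.items():
        destination = next_spot.get(uuid)
        # Only UUIDs seen at both time steps at different spots count as moves.
        if destination is not None and destination != origin:
            od[origin][destination] += 1
    return od
```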
SpotRank is composed of the following three factors.
• The number of paths into the spot.
• Whether the path is from a key spot.
• The number of outflow paths from inflow source spots.
The SpotRank of a spot into which people flow from a large number of other spots is high. SpotRank is also high at a spot that receives inflow from high-SpotRank spots.
A method for obtaining SpotRank will now be described in detail. The components $a_{ij}$ of the adjacency matrix indicate the presence or absence of movement from spot $i$ to spot $j$: $a_{ij} = 1$ if $m_{ij} > 0$ and $a_{ij} = 0$ otherwise (Equation (1)). Here, $m_{ij}$ is a component of the OD table $M_t$. Page et al. [20] did not weight the edges. In this study, however, edges are weighted because edges along which many users moved are considered more important. The components of the probability transition matrix $H$ of SpotRank are defined by Equation (2). Here, $w_{ij}$ is the weight of the edge from spot $i$ to spot $j$ and is given by Equation (3).
Page et al. [20] defined the PageRank vector $R$, which holds the rank of each page, by Equation (4). Here, $c$ is a coefficient used to ensure that $\|R\|_1 = 1$, and $e$ is a vector, called the rank source vector, that gives a constant PageRank value to all pages, with $\|e\|_1 = 1$. Iterative calculations are performed until convergence is attained and a solution is obtained. When the matrix is the probability matrix of a Markov chain, all components are non-negative and the sum of each row is 1. Similarly, the proposed SpotRank solves this equation by normalizing, so that the sum of the components is 1.
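The iterative calculation can be sketched as a weighted PageRank-style power iteration. This is an illustrative reading, not the exact formulation of Equations (2) to (4): the edge weight is assumed to be the share of users leaving spot i who go to spot j, the rank source is assumed to be uniform, and the damping factor of 0.85 is the conventional PageRank value rather than a value taken from this study.

```python
from typing import List


def spotrank(
    od: List[List[int]],
    damping: float = 0.85,   # assumed, conventional PageRank value
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Weighted PageRank-style importance of each spot from an OD table."""
    k = len(od)
    # Assumed weighting: row-normalize move counts into transition probabilities.
    transition = []
    for row in od:
        total = sum(row)
        if total > 0:
            transition.append([m / total for m in row])
        else:
            # A spot with no observed outflow spreads its rank uniformly.
            transition.append([1.0 / k] * k)

    rank = [1.0 / k] * k
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / k
            + damping * sum(rank[i] * transition[i][j] for i in range(k))
            for j in range(k)
        ]
        # Normalize so that the components sum to 1.
        total = sum(new_rank)
        new_rank = [r / total for r in new_rank]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank
```

In a table where two spots both send most of their users to a third spot, the third spot receives the largest value, which matches the qualitative behavior described for SpotRank.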

Movement of People between Spots
The congestion degree $C_t$ at each time $t$ is obtained from the data observed by the Wi-Fi packet sensor. As shown in Figure 2, movement information is obtained by comparing the UUIDs detected at time $t$ and time $t + 1$. The number $V_t$ of new visits is estimated from the number of UUIDs newly detected at time $t + 1$. The number $L_t$ of departures from the facility is estimated from the number of UUIDs that are no longer detected. The number $S_t$ of persons staying at the facility is estimated from the UUIDs detected at both times $t$ and $t + 1$. Additionally, the number of persons flowing into the facility after time $t + 1$ is denoted by $U_t$ and is given by Equation (5). These relationships are summarized in Figure 3. The number $C_t$ of terminals within the facility is the sum of $S_{t-1}$ and $V_{t-1}$, or equivalently the sum of $S_t$ and $L_t$ (Equation (6)):
$$C_t = S_{t-1} + V_{t-1} = S_t + L_t$$
The number of terminals outside the facility is given by Equation (7). From these, the behavior selection probability $P(X)$ when no recommendation is provided can be obtained. The event $X$ represents the behavior selected by users, namely $A_{i,i}$ (staying in the facility), $A_{i,o}$ (leaving the facility), $A_{o,i}$ (entering the facility), or $A_{o,o}$ (remaining outside the facility). The behavior selection probability $P(X)$ of each event is calculated from Equations (6) and (7) and is given by Equations (8) to (11).
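The comparison of UUID sets at consecutive times can be sketched as follows. The set-based counts follow the description in the text; the probabilities for users inside the facility are an assumed reading in which the share of stayers and leavers among the terminals present at time t is used, and the function names are hypothetical.

```python
from typing import Set, Tuple


def movement_counts(uuids_t: Set[str], uuids_t1: Set[str]) -> Tuple[int, int, int]:
    """Return (visits, departures, stays) between time t and time t + 1."""
    visits = len(uuids_t1 - uuids_t)        # newly detected at t + 1
    departures = len(uuids_t - uuids_t1)    # no longer detected at t + 1
    stays = len(uuids_t & uuids_t1)         # detected at both times
    return visits, departures, stays


def inside_behavior_probabilities(stays: int, departures: int) -> Tuple[float, float]:
    """Assumed estimate of P(stay) and P(leave) for users inside the facility.

    Uses the identity that the terminals inside at time t are the stayers
    plus the leavers.
    """
    inside = stays + departures
    if inside == 0:
        return 0.0, 0.0
    return stays / inside, departures / inside
```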

Movement Recommendation
In this study, it is assumed that a social networking service (SNS) is used as a means of providing recommendations, and the information to be provided includes the following:
• Current congestion status.
• Congestion prediction in the near future.
• Future action recommendation.
The current congestion status of the facility displayed here is the degree to which the facility is crowded compared with past congestion data. It is represented by four levels: "less crowded", "ordinary", "crowded", and "heavily crowded". The congestion status is determined by thresholds on the number of detected terminals. Preliminary experiments showed that presenting congestion as a state, rather than as the specific number of UUIDs, conveys it intuitively to users. For prediction, the pattern in the past observed data that is most similar to the preceding congestion transition is identified and used as the predicted value, as in [5]. The congestion prediction is also displayed in four levels. Future action recommendations are selected according to the congestion status from a pre-prepared comment set.
In the next section, the change in the number of persons in the facility will be estimated when the information composed of these three elements is provided to all users.

Simulation of Congestion Mitigation
The change in the congestion transition in the facility after a recommendation is provided is modeled using congestion transition data recorded every 10 min. In the modeling process, the change in the congestion degree of the facility at time $t + 1$ when the recommendation is provided at time $t$ is estimated first. The degree of congestion at time $t + 1$, calculated from Equations (6), (8), and (9), is given by Equation (12). It is expressed as the sum of the number of persons in the facility who select the behavior $A_{i,i}$ and the number of persons outside the facility who select the behavior $A_{o,i}$.
After a recommendation has been provided, the behavior selection probabilities $P(X = A_{i,i})$ and $P(X = A_{o,i})$ change. The behavior selection probability after recommendation is defined as $P(X' | S, I)$. Here, $X'$, $S$, and $I$ represent the behavior selected after recommendation, the state of the users at the time of recommendation, and the provided information, respectively. Various cases can be considered for $S$. In this study, it is assumed that users may have some time restriction. The degree of congestion after providing a recommendation is then given by Equation (13). In Equation (13), it is assumed that all users were informed. This is not the case, however, in real-world applications.
The congestion degree $C_{t+1}$ at time $t + 1$ when the information is provided at time $t$ is expressed by Equation (14), which combines Equations (12) and (13); there, $P(\alpha)$ is the proportion of users who were informed. In this study, $P(X' | S, I)$ was estimated by conducting a questionnaire survey among the facility users. It is difficult to obtain $P(X' | S, I)$ directly from the questionnaire results. Therefore, $P(X' | S, I)$ is calculated by subdividing it. The probability $P(X' = A_{i,i} | S, I)$ that the user chooses to stay in the facility after the information has been provided is considered first. There are two types of users who stay in the facility after the information $I$ has been provided:
• Users who intended to remain in the facility before the information was provided and who remain in the facility after it has been provided.
• Users who intended to leave the facility before the information was provided but who remain in the facility after it has been provided.
Thus, $P(X' = A_{i,i} | S, I)$ can be obtained by summing over these two types (Equation (15)):
$$P(X' = A_{i,i} | S, I) = P(X' = A_{i,i} | X = A_{i,i}, S, I) P(X = A_{i,i}) + P(X' = A_{i,i} | X = A_{i,o}, S, I) P(X = A_{i,o})$$
$P(X = A_{i,i})$ and $P(X = A_{i,o})$ can be obtained by Equations (8) and (9). If $P(X' = A_{i,i} | X = A_{i,i}, S, I)$ and $P(X' = A_{i,i} | X = A_{i,o}, S, I)$ in Equation (15) can be obtained, $P(X' = A_{i,i} | S, I)$ can be estimated. Similarly, $P(X' = A_{o,i} | S, I)$ can be estimated from $P(X' = A_{o,i} | X = A_{o,i}, S, I)$ and $P(X' = A_{o,i} | X = A_{o,o}, S, I)$. The conditional probabilities $P(X' | X, S, I)$ are obtained from the questionnaire survey. In the questionnaire, four cases, (A) to (D), are set as the behavior $X$ before the information is provided. When various types of information $I$ are presented for each case, the selected behavior $X'$ is determined from the questionnaire. The degree of congestion in the facility at time $t + 1$ if the information is provided at time $t$ can thereby be obtained.
The congestion transition after time $t + 2$ does not represent a realistic situation and thus cannot be estimated from the questionnaires. Therefore, to make the model realistic, the congestion transition after the information has been provided follows the day with the most similar trend in the database: the transition data from $t - 1$ to $t + 1$ are compared with the database, and the day with the smallest squared difference is selected. The data after time $t + 2$ are estimated from the selected day.
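The subdivision of the post-recommendation probability and the mixing of informed and uninformed users can be sketched numerically. The first function applies the law of total probability over the two user types listed above. The second is an assumed form of the combined congestion estimate, in which informed users follow the post-recommendation probabilities and the rest follow the baseline; the parameter names and all numbers in the usage note are hypothetical.

```python
def prob_after_info(
    p_same_given_same: float,
    p_same_given_other: float,
    p_plan_same: float,
    p_plan_other: float,
) -> float:
    """Probability of a behavior after information, by total probability.

    p_plan_same / p_plan_other are the baseline probabilities of planning the
    behavior or the alternative; the conditional terms come from a survey.
    """
    return p_same_given_same * p_plan_same + p_same_given_other * p_plan_other


def congestion_next(
    inside: float,
    outside: float,
    p_stay: float,
    p_enter: float,
    p_stay_info: float,
    p_enter_info: float,
    informed_share: float,
) -> float:
    """Assumed mix of baseline and informed behavior for congestion at t + 1."""
    baseline = inside * p_stay + outside * p_enter
    informed = inside * p_stay_info + outside * p_enter_info
    return (1.0 - informed_share) * baseline + informed_share * informed
```

For example, with hypothetical values of 100 terminals inside, 200 outside, baseline probabilities 0.7 (stay) and 0.1 (enter), post-recommendation probabilities 0.69 and 0.05, and an informed share of 0.1, the estimate is 0.9 × 90 + 0.1 × 79 = 88.9.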
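The selection of the most similar day can be illustrated with a short sketch. The data layout (one list of congestion values per day, indexed by 10-min slot) and the function names are assumptions; the matching criterion is the sum of squared differences over the three slots from t - 1 to t + 1, as described above.

```python
from typing import Dict, List, Optional


def most_similar_day(
    recent: List[float],
    history: Dict[str, List[float]],
    t: int,
) -> Optional[str]:
    """Return the day whose values at slots t-1..t+1 best match `recent`.

    `recent` holds the modeled congestion at slots t-1, t, and t+1 (t >= 1).
    """
    best_day, best_error = None, float("inf")
    for day, series in history.items():
        window = series[t - 1 : t + 2]
        if len(window) != len(recent):
            continue  # this day has no data for the full window
        error = sum((a - b) ** 2 for a, b in zip(recent, window))
        if error < best_error:
            best_day, best_error = day, error
    return best_day


def continuation(history: Dict[str, List[float]], day: str, t: int) -> List[float]:
    """Values of the selected day from slot t + 2 onward."""
    return history[day][t + 2 :]
```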
The change in the congestion transition in the target facility by information provision can be modeled by the above approach.

Experiment
In this experiment, to verify the validity of the proposed method, congestion mitigation in key spots was simulated after movement recommendation was provided. First, SpotRank was calculated using real-world data. Based on these results, candidates of the targets and timing of providing information for congestion mitigation were selected. Finally, congestion mitigation was simulated using a congestion transition model based on the results of the questionnaire.
The data to be analyzed was collected by Wi-Fi packet sensors installed at 20 locations within the Ito campus of Kyushu University during the period from 10 May to 17 September 2016. Figure 4 shows the installation locations of the Wi-Fi packet sensors. The total number of UUIDs collected during this period was 165,893. Figure 5 shows visualization results of the people flows.

Targets and Timing of Providing Information for Congestion Mitigation Are Determined Based on SpotRank
As shown in Figure 6, the SpotRank of each spot was visualized by a directed graph. The circle of each spot is larger if the value of SpotRank is larger. By visualizing the time series change of SpotRank, it is possible to intuitively grasp the transition of the spot that plays a central role.
Furthermore, the transition of the time series of SpotRank was considered. In Figure 7, the results regarding Spot No. 1, whose SpotRank is high throughout the day, are shown. The blue line represents original data of congestion degree, the red dashed line represents congestion difference, and the green dashed line represents SpotRank. In Figure 7, congestion refers to the number of UUIDs. The range of SpotRank is 0 to 1. The total of SpotRank in all spots is 1. As can be seen from the figure, the degree of the SpotRank shows a high value slightly before the congestion degree becomes a high value. This is because SpotRank considers the surrounding spots.
Additionally, it was investigated how early SpotRank could detect congestion occurrence. Figure 8 shows the result of the comparison between the congestion difference and SpotRank. It is a histogram of the difference in the time at which each value is high. SpotRank frequently becomes large earlier (60 min on average) than the congestion occurrence. By contrast, the congestion difference often grows only just before congestion occurrence. The time during which SpotRank is large is therefore considered a good indicator of when to provide recommendations. The above processing was operable in real time. For example, in Figure 7, it is assumed that the time from 11:20 a.m. to 12:00 p.m. is appropriate: a peak of SpotRank is observed at 11:20 a.m., and a peak of the congestion degree is observed at 12:00 p.m. This assumption will be verified in the next section.
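The lead time between the SpotRank peak and the congestion peak can be computed with a few lines. This is only an illustrative sketch: it assumes both series are sampled on the same 10-min slots and takes the first maximum of each series, which may differ from how the histogram in Figure 8 was produced.

```python
from typing import Sequence


def peak_lead_minutes(
    spotrank_series: Sequence[float],
    congestion_series: Sequence[float],
    slot_minutes: int = 10,
) -> int:
    """Minutes by which the SpotRank peak precedes the congestion peak."""
    if len(spotrank_series) != len(congestion_series) or not spotrank_series:
        raise ValueError("series must be non-empty and of equal length")
    # Index of the first maximum in each series.
    sr_peak = max(range(len(spotrank_series)), key=spotrank_series.__getitem__)
    cg_peak = max(range(len(congestion_series)), key=congestion_series.__getitem__)
    return (cg_peak - sr_peak) * slot_minutes
```

With a SpotRank peak at 11:20 a.m. and a congestion peak at 12:00 p.m., as in the Figure 7 example, this returns 40.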

Simulation of Congestion Mitigation in Key Spots after Provision of Movement Recommendation
Congestion mitigation was simulated for the case in which a recommendation was provided, using the results obtained in Section 4.1. First, a behavioral change model was constructed. In this experiment, a questionnaire survey was conducted among the facility users regarding the extent to which they change their behavior after a movement recommendation, in order to obtain $P(X' = A_{i,i} | S, I)$ and $P(X' = A_{o,i} | S, I)$, which are described in Section 3.4.2. In the questionnaire, it was assumed that in the state $S = S_1$, there is no time restriction for any user. Moreover, the choice that users made when $I$ was presented in each case in Section 3.4 was investigated. The following four types of behavior recommendation $R$ were selected as constituents of $I$:
• Congestion prediction and recommendation to go to the facility as soon as possible.
• Current congestion status and recommendation to go to the facility after a while.
• Congestion prediction and recommendation to leave the facility as soon as possible.
• Only the information that users can spend comfortably by going to the facility right now.
An example is shown below. The results of the questionnaire when the following information is provided are shown in Figure 9.
• The current congestion status is "heavy congestion".
• Congestion prediction after 30 min is "ordinary".
• Behavior recommendation "If you go to the facility now, you will get caught in congestion".
The correspondence between the results of Figure 9 and the conditional probabilities $P(X' | X, S, I_1)$ is shown in Table 1.
Figure 9. Results of questionnaire when information I was presented in each case set in Section 3.4.
Table 1. Correspondence between the results of Figure 9 and the conditional probabilities $P(X' | X, S, I_1)$. Columns: Don't Change Behavior; Change Behavior.
In the following experiments, the predicted values after 30 min were used as congestion prediction. This is because most users did not need a prediction exceeding 30 min in this facility, as shown from the results of a previous questionnaire.
Here, the congestion status was determined by thresholds on the number of UUIDs. The congestion status is defined as "less crowded" when the number of UUIDs is 0 or more and less than 125, "ordinary" when it is 125 or more and less than 190, "crowded" when it is 190 or more and less than 250, and "heavily crowded" when it is 250 or more. Considering "case (A)" in Figure 9, it is confirmed that there is a high possibility that several users who planned to go to the facility will not go after all. The result of "case (C)" shows that the staying time of a user who planned to leave the facility may increase. The result of "case (B)" shows that the recommendation may increase the number of visitors, because users planning to visit the facility in the future may choose to visit as soon as possible. In "case (D)", the same reason may be considered for choosing to leave the facility. Thus, when the same information is provided to the entire user group rather than individually, the influence on congestion varies depending on the situation and the interpretation of each user. Therefore, it is important to provide information according to the situation.
In this study, Twitter was assumed as the means of providing information, considering its widespread use [21]. The proportion of users viewing the provided information, $P(\alpha)$ in Equation (14), was set to 0.1. Based on the above conditions, changes in congestion trends in the facility caused by providing recommendations were modeled.
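The four-level mapping defined by these thresholds can be written as a small function. The thresholds (125, 190, 250) are those stated in the text for this facility; the function name is hypothetical.

```python
def congestion_level(uuid_count: int) -> str:
    """Map a UUID count to the four-level congestion status."""
    if uuid_count < 0:
        raise ValueError("uuid_count must be non-negative")
    if uuid_count < 125:
        return "less crowded"
    if uuid_count < 190:
        return "ordinary"
    if uuid_count < 250:
        return "crowded"
    return "heavily crowded"
```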
The results of the simulation using the created model are shown below. Figure 10 shows the congestion transition when the following information is given at 11:00 a.m. This is a time when SpotRank is relatively small in the result of Section 4.1.
• Congestion prediction at 11:30 a.m.
• Behavior recommendation that "If you do not go to the facility at the moment you will get caught in congestion".
Figure 11 shows the congestion transition when the following information is given at 11:40 a.m. This is a time when the SpotRank is large in the result of Section 4.1.
• Congestion prediction at 12:10 p.m.
• Behavior recommendation that "If you do not go to the facility at the moment you will get caught in congestion".
In Figures 10 and 11, the blue line shows the congestion transition for the original data and the red line shows the congestion transition when information is provided. As shown in Figure 10, there is a small change immediately after the recommendation, but the peak of the congestion hardly changes. As shown in Figure 11, although the degree of congestion immediately after information provision has increased, the peak of congestion degree greatly decreases.
The leveling rate expressed by Equation (16) is used as a measure of congestion mitigation.
$$\text{Leveling rate (\%)} = \frac{\text{average congestion}}{\text{peak congestion}} \times 100$$
A larger index implies better mitigation. The leveling rate of the original data is 46.1%, whereas the leveling rate when information is provided at 11:00 a.m. is 46.5%, and that when information is provided at 11:40 a.m. is 54.4%. These results indicate that effective recommendation is possible by using the proposed SpotRank, which supports the assumption made in Section 4.1.
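The leveling rate is straightforward to compute from a congestion time series. The sketch below assumes the series covers the evaluation period of interest; the sample values in the usage note are hypothetical and are not the data behind the reported percentages.

```python
from typing import Sequence


def leveling_rate(congestion: Sequence[float]) -> float:
    """Average congestion divided by peak congestion, as a percentage."""
    if not congestion:
        raise ValueError("congestion series must not be empty")
    peak = max(congestion)
    if peak <= 0:
        raise ValueError("peak congestion must be positive")
    average = sum(congestion) / len(congestion)
    return average / peak * 100.0
```

For a series such as 50, 100, 150, 100 the average is 100 and the peak is 150, so the rate is about 66.7%; a perfectly flat series gives 100%.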

Conclusions
A movement recommendation system based on real-world sensing was proposed. A congestion transition model of a real-world facility was constructed based on observed congestion data and the results of a questionnaire survey. The proposed SpotRank was used for selecting the targets and timing of providing information for congestion mitigation. This was experimentally verified. Congestion mitigation at key spots was simulated after provision of a movement recommendation. The system can operate in real time without the need for big data. The problem dealt with in this study is a small one in a limited area, but it may be possible to solve urban congestion problems by expanding this system.
Future work will verify how well the constructed congestion transition model represents real environments. In this study, only one facility was targeted, but an adjacent facility may also influence the behavior selection after information provision. It is therefore also necessary to consider cases where there are several adjacent facilities.