4.3.1. Detection of Arrival and Departure Times
To detect an event and estimate its start and end times, we analyse the general pattern during the period between 6:00 and 23:00 on different days. To this end, we split the period into 1-h intervals and, for each interval, compute the stoppings balance (V), the normal stoppings balance (W), the number of stoppings near the venue (P), and the normal number of stoppings near the venue (Q), following the procedure explained in Section 3.2.2. From these values, we derive the upper and lower bounds, also following the procedure explained in Section 3.2.2, with a scale factor (α) of 1. Since we need both the highest and the lowest values of V, we compute both its upper bound (UV) and its lower bound (LV). For P, on the other hand, we compute only the upper bound (UP), because we need only its highest values.
Figure 5 shows the variation of P and its related variables Q and UP on a sample day without an event, while Figure 6 shows the variation of V and its related variables W, UV, and LV on the same day.
In the next step, we analyse the temporal variation of these variables. The variation of P with respect to its related variables yields candidate event indicators, which must then be confirmed using the variation of V with respect to its related variables. That is, the peaks of P that exceed the corresponding upper bound values are candidate event delimiters in time (arrival and departure). Note that the peak of P is considered relative to the upper bound: it corresponds to the largest shift above the upper bound (for example, in Figure 8 the peak is at 9:00 and not at 10:00). In the example shown in Figure 5, we have four candidate event delimiters: 10:00, 12:00, 15:00, and 19:00. For each candidate, we take the interval from the last hour before the candidate at which P was below the upper bound to the first hour after the candidate at which P was below the upper bound. This interval accounts for the uncertainty introduced by aggregating data into 1-h intervals; hence we call it the “uncertainty interval”.
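The extraction of candidates and their uncertainty intervals can be sketched as follows. This is a minimal illustration and not the implementation used in the study. The function name, the list-based inputs, and the sample values are hypothetical. It assumes that each uninterrupted run of hours with P above UP yields one candidate, that an hour with P equal to UP counts as not exceeding the bound, and that an interval is clamped to the analysed period when a run touches its edge.

```python
from typing import List, Tuple


def candidate_delimiters(
    hours: List[int], p: List[float], up: List[float]
) -> List[Tuple[int, Tuple[int, int]]]:
    """Return (candidate_hour, (interval_start, interval_end)) per run of P > UP.

    Assumption: one candidate per uninterrupted run of hours above the bound.
    """
    candidates = []
    n = len(hours)
    i = 0
    while i < n:
        if p[i] > up[i]:
            # Extend the run while P stays above its upper bound.
            j = i
            while j + 1 < n and p[j + 1] > up[j + 1]:
                j += 1
            # The peak is the hour with the largest shift above the bound,
            # not necessarily the hour with the largest raw value of P.
            peak = max(range(i, j + 1), key=lambda k: p[k] - up[k])
            # Last hour before the run and first hour after it at which P is
            # not above the bound (clamped to the analysed period).
            start = hours[i - 1] if i > 0 else hours[0]
            end = hours[j + 1] if j + 1 < n else hours[-1]
            candidates.append((hours[peak], (start, end)))
            i = j + 1
        else:
            i += 1
    return candidates


# Invented values, chosen only to illustrate the mechanics.
hours = [8, 9, 10, 11, 12, 13, 14]
up = [10, 10, 10, 10, 10, 10, 10]
p = [5, 18, 14, 6, 12, 20, 7]
print(candidate_delimiters(hours, p, up))  # [(9, (8, 11)), (13, (11, 14))]
```

In the invented data, P exceeds the bound at both 9:00 and 10:00, but the shift is largest at 9:00, so 9:00 is the candidate and the interval runs from 8:00 to 11:00.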
We search for a peak in the variation of V (see Figure 6) within the uncertainty interval. We distinguish two types of peaks. If the value at a certain time in the interval is higher than all preceding and following values in the interval, we have a “positive peak”. If the value at a certain time in the interval is lower than all preceding and following values in the interval, we have a “negative peak”. The time corresponding to a positive peak is a candidate arrival time, because it corresponds to an exceptionally high number of approaching stoppings. On the other hand, the time corresponding to a negative peak is a candidate departure time, because it corresponds to an exceptionally high number of moving-away stoppings.
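One possible reading of this peak search is sketched below. The function name and the sample values are hypothetical. Two assumptions go beyond the text: a peak must lie strictly inside the uncertainty interval (so that it has both preceding and following values), and when an interval contains both a positive and a negative peak, the earlier one is returned, which is an arbitrary choice.

```python
from typing import List, Optional, Tuple


def find_peak(
    hours: List[int], v: List[float], interval: Tuple[int, int]
) -> Tuple[Optional[int], int]:
    """Return (peak_hour, peak_type) for V inside the uncertainty interval.

    peak_type is 1 (positive peak), -1 (negative peak) or 0 (no peak).
    Assumption: only interior points of the interval can be peaks.
    """
    start, end = interval
    idx = [k for k, h in enumerate(hours) if start <= h <= end]
    values = [v[k] for k in idx]
    for pos in range(1, len(values) - 1):
        others = values[:pos] + values[pos + 1:]
        if all(values[pos] > x for x in others):
            return hours[idx[pos]], 1   # higher than all other values
        if all(values[pos] < x for x in others):
            return hours[idx[pos]], -1  # lower than all other values
    return None, 0


# Invented values of the stoppings balance V for 8:00 to 14:00.
hours = [8, 9, 10, 11, 12, 13, 14]
v = [0, 2, 1, 3, 9, 2, -1]
print(find_peak(hours, v, (11, 14)))  # (12, 1)
```

Under these assumptions, a monotonically increasing or decreasing interval has no interior extremum and therefore yields peak type 0.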
While searching for peaks within an uncertainty interval, the peak found is labelled with its peak level and type. These values are used to determine the type of candidate (arrival time, departure time) and to confirm or reject the candidate. The search for a peak within an uncertainty interval has one of the following seven possible results:
1. No peak is found (see Figure 7d). The peak type is set to 0.
2. A “positive peak” is found. The peak type is set to 1.
   - (a) The value b at the peak is such that b > UV_i (see Figure 7a). The peak level is set to 1.
   - (b) The value b at the peak is such that W_i < b ≤ UV_i (see Figure 7b). The peak level is calculated proportionally, based on the relation between UV and W (see Equation (4)).
   - (c) The value b at the peak is such that b ≤ W_i (see Figure 7c). The peak level is set to 0.
3. A “negative peak” is found. The peak type is set to −1.
   - (a) The value b at the peak is such that b < LV_i (see Figure 7e). The peak level is set to 1.
   - (b) The value b at the peak is such that LV_i ≤ b < W_i (see Figure 7f). The peak level is calculated proportionally, based on the relation between LV and W (see Equation (4)).
   - (c) The value b at the peak is such that W_i ≤ b (see Figure 7g). The peak level is set to 0.
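The seven cases above can be condensed into a single function, sketched below. Equation (4) is not reproduced in this section, so the proportional level in cases 2b and 3b is assumed here to be a linear interpolation between the normal value W and the relevant bound; this is an assumption, not the paper's formula. The function name and the sample values are hypothetical.

```python
def peak_level(peak_type: int, b: float, w: float, uv: float, lv: float) -> float:
    """Return the peak level in [0, 1] for a peak value b at interval i.

    w, uv and lv are the values of W, UV and LV at the peak's interval.
    ASSUMPTION: cases 2b and 3b use linear interpolation between W and
    the bound; the paper's Equation (4) may differ.
    """
    if peak_type == 0:        # case 1: no peak
        return 0.0
    if peak_type == 1:        # positive peak
        if b > uv:            # case 2a
            return 1.0
        if b <= w:            # case 2c
            return 0.0
        return (b - w) / (uv - w)   # case 2b (assumed form)
    # negative peak
    if b < lv:                # case 3a
        return 1.0
    if b >= w:                # case 3c
        return 0.0
    return (w - b) / (w - lv)       # case 3b (assumed form)


# Invented values: a negative peak two thirds of the way from W to LV.
print(round(peak_level(-1, b=-4.0, w=0.0, uv=6.0, lv=-6.0), 3))  # 0.667
```

The divisions are safe because case 2b implies UV > W and case 3b implies W > LV.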
Any candidate for which the verification results in peak type = 0 (case 1) or peak level = 0 (cases 2c and 3c) is immediately rejected. The remaining candidates are ordered chronologically.
We assume that, if an event has occurred, there is a peak corresponding to the arrival of attendees followed by a peak corresponding to their departure. Therefore, if such a sequence exists, we confirm the event occurrence on this day and take the original peaks corresponding to the two candidates as the arrival and departure times, respectively. The remaining candidates (for which the verification produces peak level > 0, but the peak is not part of a correct Arrival-Departure sequence) are unknown cases. Unknown cases correspond to abnormal mobility in the vicinity of the stadium, whose cause requires further analysis.
We explain the above procedure using the example shown in Figure 5 and Figure 6. Consider the candidate event delimiter 10:00, whose uncertainty interval is “9:00 to 11:00” (see Figure 5). We search for a peak within this interval in the variation of the differences between approaching and moving-away stoppings (see Figure 6). The search finds no peak in this interval and therefore sets the peak type to zero, an indication that the candidate must be rejected. The reasoning behind this decision is that the interval cannot contain an event delimiter if it does not contain a peak in approaching stoppings or moving-away stoppings. The process continues until all the candidates have been verified.
While Figure 5 and Figure 6 show the application of this analysis method to a day without an event, Figure 8 and Figure 9 show its application to the data of a day on which an event occurred. As seen in Figure 8, there are four candidate event delimiters: 9:00, 13:00, 16:00, and 20:00, located in uncertainty intervals of 8:00 to 11:00, 11:00 to 14:00, 15:00 to 18:00, and 19:00 to 21:00, respectively. The search for peaks in these intervals in the data presented in Figure 9 found a peak for each of the first three candidates, while no peak was found for the last candidate. The candidate peaks were evaluated as follows:
Peak at 9:00 in the interval 8:00–11:00, peak type: −1, peak level: 0.667
Peak at 13:00 in the interval 11:00–14:00, peak type: 1, peak level: 1
Peak at 16:00 in the interval 15:00–18:00, peak type: −1, peak level: 1
Peak at 20:00 in the interval 19:00–21:00, peak type: 0
Applying the confirmation method explained previously, the last candidate (20:00) is immediately rejected because its peak type is 0. The remaining three candidates form the sequence Departure-Arrival-Departure. The last two peaks in the sequence (corresponding to 13:00 and 16:00) are confirmed as the Arrival and Departure times, respectively. The intervals around these two peaks are taken as inputs to the next step for further analysis. The peak at 9:00 is an unknown case that does not indicate a big event at the stadium.
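The confirmation rule can be sketched in a few lines and checked against the four evaluated candidates listed above. The function name and tuple layout are hypothetical. The sketch assumes that an Arrival-Departure sequence means an arrival candidate immediately followed, in chronological order, by a departure candidate. The level 0.0 given to the 20:00 candidate is a placeholder, since no level is reported for peak type 0.

```python
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (hour, peak_type, peak_level)


def confirm_event(candidates: List[Candidate]):
    """Return (confirmed, unknown) from the evaluated candidates.

    confirmed: list of (arrival_hour, departure_hour) pairs.
    unknown:   hours of surviving candidates outside any such pair.
    """
    # Reject peak type 0 (case 1) and peak level 0 (cases 2c and 3c),
    # then order the survivors chronologically.
    kept = sorted(
        (c for c in candidates if c[1] != 0 and c[2] > 0),
        key=lambda c: c[0],
    )
    confirmed, used = [], set()
    # Assumption: arrival (type 1) directly followed by departure (type -1).
    for first, second in zip(kept, kept[1:]):
        if first[1] == 1 and second[1] == -1:
            confirmed.append((first[0], second[0]))
            used.update({first[0], second[0]})
    unknown = [c[0] for c in kept if c[0] not in used]
    return confirmed, unknown


# Values taken from the evaluated candidates of the event day.
evaluated = [(9, -1, 0.667), (13, 1, 1.0), (16, -1, 1.0), (20, 0, 0.0)]
print(confirm_event(evaluated))  # ([(13, 16)], [9])
```

The output reproduces the result described in the text: arrival at 13:00, departure at 16:00, and the 9:00 candidate left as an unknown case.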
4.3.2. Analysis of Temporal Patterns of Arrival and Departure
We then conducted a local analysis at a finer temporal granularity. To this end, we performed the same analysis on the number of stoppings near the venue (i.e., the stadium in this case). The analysis focuses on the time intervals containing the confirmed arrival and departure times. We subdivided each 1-h time interval into four sub-intervals of 15 min each. Figure 10 shows the temporal variation of the variables during these smaller intervals around the arrival and departure times. This analysis allows us to refine the estimates of the start and end times of the event, assuming that the highest peak corresponds to the start or end of the event. Figure 10a shows that the start time, estimated in the previous step to be 13:00, is refined to around 13:30. Similarly, Figure 10b shows a refinement of the end time from 16:00 to around 16:15.
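The refinement step can be illustrated with a small sketch that bins stopping timestamps into 15-min sub-intervals and returns the start of the busiest one. The function name and the timestamps are invented for illustration. As a simplification, the sketch ranks sub-intervals by raw counts; the analysis described above also compares the counts with their normal values and upper bounds.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional


def refine_time(
    stop_times: List[datetime],
    start: datetime,
    end: datetime,
    bin_minutes: int = 15,
) -> Optional[datetime]:
    """Return the start of the sub-interval with the most stoppings.

    Only stoppings with start <= t < end are counted. Ties are resolved
    in favour of the earliest sub-interval (an arbitrary choice).
    """
    counts = Counter()
    for t in stop_times:
        if start <= t < end:
            minutes = int((t - start).total_seconds() // 60)
            counts[minutes // bin_minutes] += 1
    if not counts:
        return None
    best = max(counts, key=lambda k: (counts[k], -k))
    return start + timedelta(minutes=best * bin_minutes)


# Invented stopping times within a 13:00 to 14:00 interval.
day = datetime(2012, 11, 24)
stops = [day.replace(hour=13, minute=m) for m in (5, 31, 35, 40, 50)]
print(refine_time(stops, day.replace(hour=13), day.replace(hour=14)))
# 2012-11-24 13:30:00
```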
The analysis further shows the temporal patterns of the arrival and departure of event attendees. The temporal pattern of arrival presented in Figure 10a shows that some attendees arrived before the event start time, as shown by the shorter peaks that exceed the upper bound between 11:00 and 13:00. After the start of the event (approximately 15 min later), the number of stoppings at the venue dropped sharply below the upper bound, becoming almost normal. This suggests that, in general, event attendees arrived on time. The temporal pattern of departure shown in Figure 10b suggests that it took less than 30 min after the end of the event for the stoppings near the venue to return to normal, meaning that attendees did not spend much time at the venue after the end of the event.
Different big events may show different temporal patterns of arrival and departure of attendees. For example, unlike the attendees of the event on 24 November 2012, who arrived on time and departed as soon as the event ended (see Figure 10), the attendees of the event on 14 November 2012 kept arriving after the start of the event, as shown by the number of stoppings near the venue, which remained above the upper bound for some time after the peak (see Figure 11a). Attendees of the latter event also departed progressively, as shown by the number of stoppings near the venue, which remained above the upper bound for a long time after the peak (see Figure 11b).
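The contrast between the two events is read from the figures. One hypothetical way to express it as a number is to measure how long P stays above its upper bound after the peak. The sketch below is not part of the described method; the function name and the sample series are invented, and it assumes equally spaced 15-min sub-intervals.

```python
from typing import List


def minutes_above_bound_after_peak(
    p: List[float], up: List[float], peak_index: int, bin_minutes: int = 15
) -> int:
    """Return how long P stays above UP directly after the peak.

    Counts consecutive sub-intervals after peak_index with P > UP and
    converts the count to minutes.
    """
    count = 0
    for k in range(peak_index + 1, len(p)):
        if p[k] > up[k]:
            count += 1
        else:
            break  # P has returned to (or below) its upper bound
    return count * bin_minutes


# Invented 15-min series: a sharp drop versus a progressive decline.
sharp = minutes_above_bound_after_peak([3, 20, 12, 4, 2], [8] * 5, peak_index=1)
progressive = minutes_above_bound_after_peak(
    [3, 20, 15, 12, 11, 9, 4], [8] * 7, peak_index=1
)
print(sharp, progressive)  # 15 60
```

A short duration would correspond to attendees arriving on time or leaving promptly, and a long one to progressive arrival or departure.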