This section presents the results obtained from applying the proposed method, which combines supervised classification and unsupervised clustering techniques for the analysis of urban incidents. The experiments were conducted using different datasets in order to evaluate the model’s performance in both controlled scenarios and real-world contexts.
4.5.1. Results with the Guayaquil Dataset
The first set of experiments was conducted using a synthetic hybrid dataset corresponding to the city of Guayaquil, generated to simulate urban incident reports with associated textual and geographic information. This dataset incorporates controlled variability, allowing the model’s performance to be evaluated in a scenario designed to emulate the city’s information dynamics. The unstructured nature of the data introduces specific challenges, such as the use of colloquial language, abbreviations, and ambiguous descriptions, as well as a high density of messages that must be effectively filtered by the classifier.
Incidents were classified using the ARF model. The results of this stage are presented in the confusion matrix in Figure 4, which shows the distribution of the model’s predictions against the actual classes of incidents generated for the Guayaquil environment. This multi-class classification phase is essential for identifying specific mobility patterns, differentiating between accidents, obstacles, traffic, and other road events.
Figure 4 shows that the classifier correctly identifies 626 reports in the “no incidents” category, the class with the highest number of correct predictions. This result indicates that the model is able to filter irrelevant posts out of the information flow. Similarly, there are 473 correct classifications in the “obstacle” category, followed by 215 in the “traffic” category and 203 in the “accident” category.
However, the matrix reveals a considerable level of confusion between categories, due to the semantic complexity of the evaluated dataset. In particular, it is observed that 313 actual reports of “accident” and 339 of “obstacle” were misclassified as “no incidents.” This behavior suggests that, when faced with descriptions with a low level of detail or intentional ambiguity in the processed texts, the model tends to make conservative predictions toward the filtering class.
Likewise, cross-confusion is evident, especially in the “traffic” category, where 317 actual reports were assigned to “no incidents” and 149 to “obstacle.” This pattern is consistent with the nature of the language used on social media, even in simulated scenarios, where congestion, partial blockages, or delays are often described using generic terms that make it difficult to accurately distinguish between routine traffic and physical events on the road. Despite these confusions, the model maintains sufficient accuracy to feed into the subsequent stages of spatial analysis.
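The pull toward the filtering class can be quantified directly from the counts quoted above. The following minimal sketch assumes that the four figures read from the “no incidents” column (626, 313, 339, and 317) make up the complete column of the matrix; the variable names are illustrative and not part of the original experiments.

```python
# Counts read from the "no incidents" column of the confusion matrix (Figure 4).
# Assumption: these four cells are the whole column.
correct_no_incident = 626          # true "no incidents" predicted as such
misassigned = {                    # true class -> reports predicted "no incidents"
    "accident": 313,
    "obstacle": 339,
    "traffic": 317,
}

predicted_no_incident = correct_no_incident + sum(misassigned.values())
precision_no_incident = correct_no_incident / predicted_no_incident

print(predicted_no_incident)             # 1595
print(round(precision_no_incident, 3))   # 0.392
```

Under that assumption, roughly six out of every ten reports labeled “no incidents” are in fact incidents, which is the displacement effect described in the text.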
To complement the confusion matrix analysis, Table 5 presents the performance metrics of the ARF classifier for the Guayaquil hybrid dataset: precision, recall, F1-score, and support per category.
As shown in Table 5, the model achieves an overall accuracy of 50.6% over a total of 3000 processed instances. At the class level, the traffic category shows perfect precision (1.000), indicating that all instances classified under this label correspond to congestion events; however, its low recall (0.314) reflects the model’s limited ability to retrieve all simulated traffic reports, evidencing the classifier’s restrictive behavior.
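The overall accuracy follows from the correct predictions on the diagonal of the confusion matrix. The short check below uses only the counts quoted for Figure 4 and the total of 3000 instances; the support estimate for the traffic class is an approximation derived from the rounded recall, not a value taken from Table 5.

```python
# Correct predictions per class, as quoted from the confusion matrix (Figure 4).
correct = {"no incidents": 626, "obstacle": 473, "traffic": 215, "accident": 203}
total_instances = 3000

accuracy = sum(correct.values()) / total_instances
print(sum(correct.values()))   # 1517
print(f"{accuracy:.1%}")       # 50.6%

# Approximate support implied by recall = correct / support for the traffic class.
implied_traffic_support = round(correct["traffic"] / 0.314)
print(implied_traffic_support)  # 685
```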
A similar pattern is observed in the accident class, which achieves high precision (0.927) but low recall (0.284), suggesting that the model prioritizes minimizing false positives for critical events at the expense of omitting a significant fraction of these incidents. On the other hand, the “no incidents” category has the highest recall (0.797), confirming the effectiveness of the model in the initial filtering phase, although with lower precision (0.392) because reports from other categories are displaced toward this class. The obstacle class presents the most balanced performance, with an F1-score of 0.529, reflecting a moderate ability to identify this type of event in a complex evaluation environment. Overall, the macro averages (0.702 in precision) highlight the difficulties inherent in classification in highly variable urban contexts, where semantic similarity between categories limits the balanced performance of the ARF model.
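The gap between precision and recall can be condensed into the F1-score, the harmonic mean of the two. The sketch below recomputes it from the rounded precision and recall values quoted in the text, so the last digit may differ from the figures in Table 5; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs quoted in the text for the Guayaquil dataset.
reported = {
    "traffic": (1.000, 0.314),
    "accident": (0.927, 0.284),
    "no incidents": (0.392, 0.797),
}

for name, (p, r) in reported.items():
    print(name, round(f1_score(p, r), 3))
# traffic 0.478
# accident 0.435
# no incidents 0.526
```

All three recomputed values fall below the 0.529 reported for the obstacle class, which is consistent with describing that class as the most balanced.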
It is important to note that the accuracy value observed in this controlled scenario should not be interpreted as a limitation of the proposed approach, but rather as a direct consequence of the incremental nature of the data flow and the conservative strategy adopted by the model. In highly dynamic urban contexts characterized by semantic ambiguity, the classifier prioritizes the reduction of false positives and the stability of the filtering process, even at the expense of sacrificing overall accuracy. This behavior is consistent with the objective of the method, whose purpose is not to maximize the individual classification of messages, but to ensure that the events used in the later stages of spatial analysis are relevant, reliable, and stable over time.
For the spatial analysis of the Guayaquil dataset, the geographic coordinates associated with reports classified as incidents by the ARF model were used. To analyze the temporal evolution of the reported events, the data were processed as an incremental input stream in which each 10 min interval incorporates approximately 500 new messages, simulating a continuous real-time monitoring scenario. In the first observation interval (0–10 min), for example, the flow consisted of 85 accident reports, 44 obstacle reports, and 316 traffic reports, which gives an initial view of the operational load of the approach and the early distribution of the types of incidents detected. Figure 5 shows the evolution of this flow.
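The per-interval counts described above can be obtained by bucketing reports into fixed 10 min windows. The following is a minimal sketch, assuming each report is available as a pair of elapsed minutes and predicted category; the function name and the sample records are hypothetical and are not data from the experiments.

```python
from collections import Counter, defaultdict

WINDOW_MIN = 10  # window length in minutes, as in the experiments

def count_by_window(reports, window_min=WINDOW_MIN):
    """Group (minute, category) pairs into fixed windows and count categories."""
    windows = defaultdict(Counter)
    for minute, category in reports:
        start = int(minute // window_min) * window_min  # window start: 0, 10, 20, ...
        windows[start][category] += 1
    return dict(windows)

# Hypothetical sample: elapsed minutes since the start of monitoring.
sample = [
    (0.5, "traffic"), (3.0, "accident"), (9.9, "traffic"),
    (10.0, "obstacle"), (17.2, "traffic"),
]
counts = count_by_window(sample)
print(counts[0])   # Counter({'traffic': 2, 'accident': 1})
print(counts[10])
```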
During the initial phase, shown in Figure 5a,b, the reports are dispersed and only begin to concentrate. In these first 20 min, the method begins to outline the location of events along road corridors, where traffic reports (blue) clearly predominate, followed by a few isolated accidents (red) and a minimal presence of obstacles (orange), which mark the beginning of detected activity on the road network.
Subsequently, in the central intervals of Figure 5c,d, the accumulation of incidents reaches its maximum saturation point. In this phase, the density of points clearly outlines the city’s main avenues and roadways, where the overlap of accident and congestion reports suggests that the initial events have escalated, severely affecting the flow of traffic on the busiest arteries.
Finally, Figure 5e,f shows a notable reduction in point density. This decrease in the arrival of new messages in the last 20 min indicates a transition in the intensity of the information flow within the monitoring window. However, the persistence of certain nodes on the main roads reveals areas of recurrence that remain active, showing that certain critical incidents have a prolonged impact on the road network, which the algorithm continues to track despite the decrease in the volume of new data.
These visualizations show how incremental report management makes it possible to identify areas with a higher recurrence of incidents in real time, reflecting significant variations in both type and density throughout the observation hour.
To complement the spatial analysis based on the point distribution of reports, the evolution of the clusters generated by the DenStream algorithm is analyzed below. While Figure 5 shows the dispersion and progressive concentration of individual incidents, Figure 6 summarizes this behavior by identifying dense clusters that represent areas of persistent traffic conflict. These clusters emerge dynamically as the density of events in the flow increases, and they abstract individual reports into more stable and representative spatial structures for real-time monitoring.
Figure 6 presents a global visualization of the spatial clustering status after 60 min of continuous processing; the scale bar indicates the distance in kilometers. The figure simultaneously shows the clusters detected from the first 10 min to the final snapshot, differentiated by their internal status within the model. The clusters in purple correspond to dense centers in the decay phase, while the cluster in green is the center detected in the previous snapshot (50 min). The cluster highlighted in red is the current dense center, generated after 60 min of processing, which concentrates 40 reports within the neighborhood radius and has the weight listed in Table 6. This behavior demonstrates DenStream’s ability to maintain memory of past events and progressively update spatial concentrations as new reports are incorporated. Although the visualization mainly emphasizes the state of the most recent cluster, the model incrementally identifies multiple dense centers throughout the flow, approximately one every 10 min, reflecting changes in both the geographic location and semantic composition of the incidents.
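The memory of past events mentioned above comes from the way DenStream weights each point. In the standard formulation of the algorithm, a point that arrived at time T contributes 2^(-λ(t - T)) to the weight of its cluster at time t, so older reports fade gradually instead of disappearing. The sketch below illustrates that fading function; the decay rate `lam = 0.1` per minute and the arrival times are assumptions chosen for illustration, since the parameter values used in the experiments are not given here.

```python
def fading(age_min: float, lam: float) -> float:
    """Standard DenStream fading function: 2 ** (-lam * age)."""
    return 2.0 ** (-lam * age_min)

def cluster_weight(arrival_times, now: float, lam: float) -> float:
    """Sum of faded contributions of all points that arrived up to `now`."""
    return sum(fading(now - t, lam) for t in arrival_times if t <= now)

# Assumed decay rate: with lam = 0.1 per minute, a report loses half its
# influence every 10 minutes.
lam = 0.1
arrivals = [0.0, 10.0, 20.0]          # hypothetical arrival times in minutes
weight = cluster_weight(arrivals, now=20.0, lam=lam)
print(round(weight, 2))               # 1.75  (0.25 + 0.5 + 1.0)
```

With this kind of decay, a cluster that stops receiving reports sees its weight shrink over successive snapshots, which matches the decay phase shown for the older centers.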
The values mentioned above are summarized quantitatively in Table 6, which shows, for each temporal snapshot of the flow (in minutes), the dense center detected, the status of the cluster within the model, the number of cluster groups formed, the number of reports contained within the neighborhood radius, the weight associated with the cluster, the distribution of reports by category, and the dominant category of each cluster. This table shows the temporal evolution of the dense centers, complementing the information presented in Figure 6 and facilitating the interpretation of incident concentration patterns in the city.
Table 6 allows for a detailed analysis of the temporal evolution of the dense centers detected by DenStream throughout the data flow for the city of Guayaquil, including the weight of each center. In the initial 10 min snapshot, a dense cluster with 87 reports within the neighborhood radius is identified, dominated by the traffic category (72 reports), reflecting an early phase characterized by recurring traffic congestion. At 20 min, the cluster increases its density to 113 reports within the neighborhood radius and reaches the highest weight observed, with a clear transition to the obstacle category as dominant, a trend that consolidates at 30 min with a dense cluster composed exclusively of obstacle reports. In the 40 and 50 min snapshots, both the number of reports and the weight of the cluster decrease progressively, indicating a temporary dissipation of events. Finally, after 60 min, the model detects a dense center in its current state with 40 reports within the neighborhood radius, again dominated by the obstacle category, evidencing the consolidation of a persistent incident that continues to affect urban mobility. Taken together, these results indicate that DenStream not only identifies relevant spatial concentrations, but also captures the temporal transition in the nature of traffic incidents within a continuous flow.
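The dominant category of a cluster is simply the most frequent label among the reports it contains. A minimal sketch follows; only the 72 traffic reports out of 87 come from the text, while the split of the remaining 15 reports between accident and obstacle is a hypothetical placeholder used to make the example runnable.

```python
from collections import Counter

def dominant_category(composition):
    """Return (category, share) for the most frequent category in a cluster."""
    counts = Counter(composition)
    category, n = counts.most_common(1)[0]
    return category, n / sum(counts.values())

# 10 min snapshot: 72 of 87 reports are traffic (from the text).
# The accident/obstacle split below is a hypothetical placeholder.
first_snapshot = {"traffic": 72, "accident": 9, "obstacle": 6}

category, share = dominant_category(first_snapshot)
print(category, round(share, 3))   # traffic 0.828
```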
4.5.2. Results with the Panama Dataset
This subsection presents the results obtained from the validation dataset corresponding to Panama City, which was designed to simulate georeferenced traffic reports published on social media under controlled conditions. Thanks to its design based on historical seeds, the dataset incorporates semantic variability, category diversity, and realistic spatial distribution, allowing the behavior of the proposal to be evaluated in a representative and reproducible urban scenario.
The classification process was carried out using the ARF model, which operated incrementally to simulate a continuous flow of incoming messages. The performance of the classifier was evaluated using the confusion matrix presented in Figure 7, which relates the predicted classes to the actual labels of the Panama dataset.
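Incremental operation of this kind is commonly evaluated with a test-then-train (prequential) loop: each message is first predicted and only afterwards used to update the model. The sketch below illustrates that loop with a deliberately trivial stand-in learner that predicts the most frequent label seen so far; it is not the ARF model. The `predict_one`/`learn_one` method names follow the convention of online-learning libraries such as river, and whether the experiments used this exact protocol is an assumption. The toy messages are invented.

```python
from collections import Counter

class MajorityClassLearner:
    """Stand-in online learner (NOT ARF): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(stream, model):
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)   # test first ...
        correct += int(y_pred == y)
        model.learn_one(x, y)           # ... then train
        total += 1
    return correct / total if total else 0.0

# Invented toy stream of (message, label) pairs.
toy_stream = [
    ("jam on the bridge", "traffic"),
    ("slow traffic downtown", "traffic"),
    ("crash at the junction", "accident"),
    ("long queue on the avenue", "traffic"),
]
acc = prequential_accuracy(toy_stream, MajorityClassLearner())
print(acc)   # 0.5
```

Because every prediction is made before the label is seen, the resulting metrics reflect how the classifier would behave on a live stream rather than on a held-out batch.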
Figure 7 shows that the model correctly classified 917 reports in the “traffic” category, the class with the highest number of correct predictions in this scenario. This is followed by the “obstacle” category with 895 correct predictions, while the classifier correctly identified 531 instances of “accident” and 250 of “no incidents.” These results support the robustness of the approach in simulated urban environments with high information density.
However, the confusion matrix reveals ambiguities arising from the operational correlation of incidents. In 90 cases, actual “traffic” reports were classified as “obstacle,” and 54 “no incidents” instances were mistakenly assigned to this same category. Similarly, 40 “accident” reports were predicted as “traffic” and 45 as “obstacle,” reflecting the close conceptual relationship between physical events on the road and their immediate effects on traffic congestion. Despite these specific confusions, the low cross-error rate between critical categories supports the effectiveness of the model in identifying the general nature of incidents in a continuous data stream.
To complement the confusion matrix analysis, additional performance metrics were calculated, including precision, recall, and F1-score for each category, as well as the corresponding support. Table 7 summarizes these results, providing a quantitative view of the classifier’s performance by class.
In the case of the Panama dataset, the model’s performance shows greater stability in the classification metrics, reflecting less semantic variability in the validation data flow. However, as in the Guayaquil scenario, the main objective of the classification process is to serve as a reliable filtering stage for incremental spatial analysis. From this perspective, the model’s usefulness is evaluated based on its ability to consistently feed the clustering process, rather than on the isolated optimization of traditional classification metrics.
As shown in Table 7, the model achieved an overall accuracy of 86.4%, accompanied by a weighted F1-score of 0.864, calculated over a total of 3000 instances. These values indicate that the ARF classifier maintains solid and consistent performance under controlled conditions, handling continuous data streams with multiple categories and high event density.
The analysis by category shows balanced performance. The “no incidents” class has perfect precision (1.000), indicating that no actual incident was discarded as irrelevant, that is, there are no false positives in this category. Meanwhile, the “obstacle” category achieves the highest recall (0.912), demonstrating the model’s ability to capture the vast majority of events related to road obstructions. The “traffic” and “accident” classes have F1-scores of 0.879 and 0.856, respectively, supporting reliable discrimination between routine traffic congestion and events of greater operational criticality.
For the spatial analysis of the Panama dataset, the geographic coordinates associated with reports classified as incidents by the ARF model within a 60 min time window were processed. The data were analyzed incrementally in 10 min intervals, with each window incorporating approximately 500 messages. In the first observation interval (0 to 10 min), for example, the flow consisted of 37 accident reports, 18 obstacle reports, and 405 traffic-related reports, which illustrates the operational load of the algorithm and the predominance of events associated with traffic congestion from the early stages. Figure 8 shows the spatial distribution of individual incidents in each time interval, showing how the progressive accumulation of events reflects the urban mobility dynamics modeled in Panama City.
Figure 8a shows a dispersion that aligns with the main road corridors and avenues, where traffic reports (blue) predominate massively, with a very limited presence of accidents and obstacles. However, just 10 min later, in Figure 8b, the flow dynamics change dramatically: there is a sudden increase in orange and red dots, which gives greater visibility to critical events. This trend is accentuated in the central intervals in Figure 8c,d. In these phases, the flow reaches its highest saturation density, showing a strong prevalence of accidents (red) and obstacles (orange) that almost completely overshadow traffic reports. This saturation of high-priority incidents clearly delineates the arteries with the highest flow, suggesting that physical events are conditioning mobility throughout the monitored area.
As the observation window closes, the composition of the flow changes again. In Figure 8e, obstacles (orange) become the almost exclusive category on the map, while in the final interval, in Figure 8f, the algorithm records a return to the predominance of traffic reports (blue). This final evolution is key, as it shows the model’s ability to track how a situation of multiple accidents and obstacles leads back to a condition of widespread traffic congestion before the monitoring hour is complete.
While the spatial distribution of individual incidents allows for the identification of general mobility patterns and areas with high operational load, an analysis based solely on points limits the ability to distinguish persistent concentrations of events in space. To complement this spatial analysis and capture road impact areas in a more structured way, an incremental clustering approach using the DenStream algorithm is incorporated below, which identifies incident clusters and tracks their temporal evolution as the data flow progresses.
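Density-based clustering of this kind relies on counting how many reports fall within a neighborhood radius of a candidate center. The sketch below shows one way to do that for geographic coordinates using the haversine great-circle distance. The radius of 1 km, the coordinates (chosen near Panama City), and the function names are illustrative assumptions, and the distance measure actually used in the experiments is not stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def reports_within(center, points, eps_km):
    """Return the points lying within eps_km of the center."""
    return [p for p in points if haversine_km(*center, *p) <= eps_km]

# Illustrative coordinates only; not taken from the dataset.
center = (8.980, -79.520)
points = [(8.980, -79.520), (8.985, -79.520), (9.100, -79.400)]
nearby = reports_within(center, points, eps_km=1.0)
print(len(nearby))   # 2
```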
Figure 9 shows the overall status of spatial clustering after 60 min of continuous processing. The visualization integrates the clusters identified from the first 10 min to the final snapshot, showing their temporal evolution and their status within the model. The clusters in purple correspond to dense centers in a decay phase, while the cluster in green is the center detected in the previous snapshot (50 min). The cluster highlighted in red is the current dense center, generated at 60 min, which groups 162 reports within the neighborhood radius and has the weight listed in Table 8. This result shows the ability of the DenStream algorithm to preserve relevant historical information and dynamically update spatial concentrations as new data is incorporated, facilitating the detection of persistent clusters of high relevance in complex urban scenarios.
The values mentioned above are detailed in Table 8, which shows, for each temporal snapshot of the flow (in minutes), the dense center detected, the status of the cluster within the model, the number of cluster groups formed, the number of reports contained within the neighborhood radius, the weight associated with the cluster, the distribution of reports by category, and the dominant category in each observation interval.
Table 8 allows for a detailed analysis of the temporal evolution of the dense centers detected by DenStream throughout the data flow for Panama City, including the weight of each center. In the snapshot of the first 10 min, a dense center with 234 reports within the neighborhood radius is identified, composed exclusively of the traffic category, reflecting an initial phase characterized by massive and highly concentrated traffic congestion. In the 20 and 30 min snapshots, a geographical shift of the dense center is observed, accompanied by a marked semantic transition: the clusters become dominated by accident reports, concentrating 129 and 135 events respectively, which points to a phase of high criticality associated with physical incidents that go beyond routine traffic. Then, in the 40 and 50 min snapshots, the model detects dense centers dominated by the obstacle category, with 102 and 146 reports within the neighborhood radius, suggesting the persistence of prolonged blockages or interruptions in the road network. Finally, in the 60 min snapshot, DenStream identifies a dense center in its current state with 162 reports, where traffic is once again the dominant category, indicating a reconfiguration of urban dynamics toward sustained vehicle saturation as a result of previous incidents. Taken together, these results show the algorithm’s ability to incrementally capture not only the location of critical areas, but also the temporal transformation in the nature of road events within a simulated urban environment.
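The sequence of dominant categories described above can be collapsed into phases by grouping consecutive snapshots that share the same label. The sketch below uses the dominant categories as stated in the text for the six Panama snapshots; the function name is illustrative.

```python
from itertools import groupby

# Dominant category per snapshot (minutes), as described for Table 8.
dominant_by_snapshot = [
    (10, "traffic"), (20, "accident"), (30, "accident"),
    (40, "obstacle"), (50, "obstacle"), (60, "traffic"),
]

def phases(sequence):
    """Collapse consecutive snapshots with the same dominant category into phases."""
    result = []
    for category, group in groupby(sequence, key=lambda item: item[1]):
        minutes = [minute for minute, _ in group]
        result.append((category, minutes[0], minutes[-1]))
    return result

for category, start, end in phases(dominant_by_snapshot):
    print(f"{category}: {start}-{end} min")
# traffic: 10-10 min
# accident: 20-30 min
# obstacle: 40-50 min
# traffic: 60-60 min
```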