3.2. Results and Discussions
(1) Validating the accuracy and precision. The validation consists of computing the background on the pair of datasets described previously.
The first dataset [38] includes three test scenarios: Test 1 focuses on image sequences captured in outdoor conditions, characterized by high compression levels and low quality; Tests 2 and 3 examine indoor conditions, where changes in overall luminance tend to be more stable.
The second dataset [40] consists of ten scenarios divided into two categories: base scenarios (sequences 1 to 4) and dynamic scenarios (sequences 5 to 10). The base scenarios illustrate standard applications of background subtraction techniques, while the dynamic scenarios present more complex situations for detecting moving objects.
The computed results are divided into two groups of measures: confusion-matrix measures, which include precision, recall, F1 score, and IoU (intersection over union) [40]; and error measures, which quantify the model error in computing the motion map using the bias error (Bias), the standard deviation of the bias (std Bias), the mean absolute error (MAE), and the root mean squared error (RMSE) [25].
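For concreteness, the four confusion-matrix measures and the four error measures named above can be sketched in a few lines of numpy. This is not the authors' evaluation code, only a minimal illustration of the standard definitions, applied to a toy ground-truth/predicted mask pair:

```python
import numpy as np

def confusion_metrics(gt, pred):
    """Precision, recall, F1, and IoU from binary ground-truth/predicted motion masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.logical_and(gt, pred).sum()
    fp = np.logical_and(~gt, pred).sum()
    fn = np.logical_and(gt, ~pred).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

def error_metrics(gt, pred):
    """Bias, std of bias, MAE, and RMSE, treating the masks as 0/1 maps."""
    diff = pred.astype(float) - gt.astype(float)
    return diff.mean(), diff.std(), np.abs(diff).mean(), np.sqrt((diff ** 2).mean())

# Toy 4x4 example: the predicted mask slightly oversizes the true object,
# which keeps recall perfect but penalizes precision and IoU.
gt = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
p, r, f1, iou = confusion_metrics(gt, pred)
bias, std_bias, mae, rmse = error_metrics(gt, pred)
```

The toy example also previews the trade-off discussed below: an oversized detection yields recall 1.0 but precision and IoU of only 2/3.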
The results for the dataset of [38] are summarized in Table 2. Upon examining the data, it is evident that the proposed algorithm achieves greater precision than the reference algorithm. Performance is notably robust in Test 1, which represents outdoor scenarios; this is a positive outcome because the proposal leverages texture information rather than relying solely on intensity. In indoor scenarios, the performance is closer to that of the reference algorithm [19].
The results for both methods in terms of recall are quite similar. However, the proposed algorithm generally generates bounding boxes that are slightly larger than the moving objects. This results in a trade-off that slightly penalizes precision and recall but has the advantage of fully detecting the moving object. In contrast, the reference algorithm emphasizes object edges and produces motion maps with holes when objects are slightly oversized. In outdoor scenarios, the behavior of MMBS becomes superior to MOG.
Additionally, the proposed method handles shadows and reflections more effectively in indoor environments, with the limitation that the structuring elements are slightly smaller than the shadows and artifacts generated by lights; MOG, meanwhile, is more sensitive to these factors. The F1 score and IoU results for both approaches are comparable, with the MMBS approach showing better overall performance in outdoor scenarios.
The model errors are presented graphically in Figure 7 for the three sequences. The results for the outdoor scenario, shown in Figure 7a, demonstrate that MMBS is more stable, adding less noise during motion. In indoor scenarios, the MOG approach shows greater accuracy in stable conditions and in the absence of reflections (as indicated by the red peak in Figure 7b). This indicates that while our proposal is more stable, the MOG approach performs better in those circumstances, which is reflected in the probability density function (pdf) of the error. The proposal exhibits a slightly higher bias error in the indoor scenario than MOG because the fine silhouette of the walking person is smaller than the structuring elements used; this limitation is one of the practical challenges of MMBS in real situations. However, the pdf for our proposal is evenly centered around zero, whereas MOG shows a less favorable error distribution centered at a non-zero position. Finally, for the outdoor scenario, the RMSE shown in Figure 8a indicates that the motion detection accuracy of MMBS significantly outperforms that of the reference method, demonstrating greater precision and stability. In indoor scenarios, the performance of both methods is similar; however, when shadows and reflections are introduced, the RMSE for MMBS remains nearly zero, while the RMSE for MOG tends to increase.
The second dataset [40] encompasses more extensive scenarios, including dynamic, indoor, and outdoor conditions, providing a more realistic testing environment. For this study, the enhanced MOG approach referenced in [37] was implemented to facilitate comparison with our proposal. The results concerning the confusion measures are summarized in Table 3, highlighting the superior precision of our proposal compared to the enhanced MOG.
In the baseline scenarios (1 to 4), our proposal generally outperformed the reference, achieving a clear improvement in the best cases; in the worst case, our performance was only slightly behind the reference. These baseline scenarios predominantly depict ideal conditions in an indoor environment, where our proposal demonstrates better overall efficiency. In scenarios with low resolution or small moving objects, recall and precision decrease because the structuring elements do not align perfectly with the motion zone. This issue is exacerbated in environments without texture, such as indoor settings.
When analyzing the dynamic scenarios, particularly the more challenging cases 5 to 10, precision drops significantly for both algorithms, but our proposal remains advantageous in every case, from the best-performing scenario down to the worst. Overall, the proposal improves on the enhanced MOG. In terms of recall, the overall result also favors MMBS over the enhanced MOG, indicating a better ability to detect moving objects without introducing noise into the motion map.
Regarding the error measures, Table 4 shows that the biases in both scenarios are similar, with our proposal demonstrating a slight advantage. This suggests that it consistently adapts well to varying motion conditions. A smaller standard deviation than that of the reference algorithm further supports this adaptability. The bias distribution for our proposal is centered around 0, indicating general accuracy in modeling motion. In contrast, the reference method exhibits multiple modes in the bias, reflecting its inability to adapt effectively. Furthermore, the MAE and RMSE indicate that our proposal significantly reduces errors in motion detection.
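The per-frame bias and its distribution, as compared above, can be estimated in the usual way: compute the mean signed difference per frame and approximate the pdf with a normalized histogram. The sketch below uses synthetic data (not the paper's results) solely to show how a distribution "centered around 0" would be checked:

```python
import numpy as np

def per_frame_bias(gt_seq, pred_seq):
    """Per-frame bias: mean signed difference between predicted and ground-truth masks."""
    return np.array([(p.astype(float) - g.astype(float)).mean()
                     for g, p in zip(gt_seq, pred_seq)])

def bias_pdf(biases, bins=21):
    """Normalized histogram approximating the pdf of the per-frame bias."""
    hist, edges = np.histogram(biases, bins=bins, range=(-1.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# Synthetic example: a well-adapted detector has its bias tightly centered at 0,
# so the mode of the estimated pdf falls in the bin containing zero.
rng = np.random.default_rng(0)
biases = rng.normal(loc=0.0, scale=0.02, size=500)
centers, pdf = bias_pdf(biases)
mode_center = centers[np.argmax(pdf)]
```

A multi-modal reference method, as described above, would instead show several local maxima away from zero in `pdf`.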
To illustrate, Figure 9 and Figure 10 display the best and worst cases of our proposal in terms of RMSE across the tested frames.
Figure 9a shows that the Bias error approaches 0 for our proposal in the best scenario, with the bias distribution centered around 0. The small maxima on the left side of the probability density function represent instances where our proposal struggles to detect motion efficiently; still, it does so with considerably lower errors than the enhanced MOG. Although the results for the enhanced MOG are similar, its expected value shows less bias error, while its deviation is larger. This observation is confirmed by Figure 10a, where the RMSE graph indicates that our proposal generally has lower errors than the enhanced MOG. In the worst-case scenario, shown in Figure 9b, our proposal categorizes motion objects based on the dynamics of the scenario; however, it still maintains a lower bias than the enhanced MOG. The bias distribution for our proposal remains centered at 0, confirming its clear advantage over the reference algorithm. This behavior is validated in Figure 10b, where the proposal consistently exhibits a lower RMSE from frame to frame.
(2) Vehicle analysis from a roundabout. To test the MMBS approach, we contextualize the scenario. In recent years, Mexico has experienced significant migration from rural towns to urban areas, rapidly increasing urban population density. This growth necessitates improved resilience planning, and one such initiative involves optimizing the use of vital resources. Towns have adopted various measures to ensure the quality of these essential resources; one key initiative is replacing traffic signals with roundabouts in suburban areas to keep vehicle traffic flowing quickly. As a result, monitoring roundabouts and intersections where high vehicle density leads to traffic congestion has become imperative.
To conduct our analysis, we focused on the roundabouts in the suburban areas of Querétaro, a city experiencing one of the fastest growth rates in Mexico. Figure 11 illustrates the following: (a) the location of Querétaro; (b) the position of Querétaro City; (c) the structure of the town, highlighting the town center in red and the suburban areas in blue; (d) the suburban areas where roundabouts have replaced traffic lights, shown in purple, with red circles indicating the locations of the roundabouts examined in this study. As can be seen, the test scenario corresponds to the primary access points to major avenues, representing some of the most critical traffic resources in the area.
The morphological and MOG approaches were used for a qualitative comparison. Figure 12 illustrates the morphological approach, showing several sample frames over time together with the corresponding motion map and texture information. Similarly, Figure 13 displays various motion maps and background estimations from the reference algorithm. Both approaches create a probabilistic model of the scene; however, using direct pixel intensity values, as MOG does, has the disadvantage of treating luminance changes as different values or of compensating dynamically for the camera's white balance, which can introduce significant noise into the motion map. In contrast, the morphological approach does not rely on pixel intensity; instead, it focuses on local texture information. By analyzing local information, this method extracts the structures of pixel intensities rather than just the intensity values themselves, making it more reliable under varying lighting conditions and effectively reducing motion noise in the motion map. In Figure 12 and Figure 13, the background layer illustrates the convergence velocity. The estimated foreground layer is shown in grayscale: for the proposed method it reflects the learned information about local texture, while for the reference algorithm it represents the average intensity. It is evident that the proposed method focuses on the local texture structure, whereas the reference algorithm emphasizes pixel intensity. In the motion layer of Figure 12 and Figure 13, the morphological approach demonstrates that texture information is effectively learned. This leads to increased robustness against local changes in lighting within the same video frame, resulting in improved classification of foreground and background planes based on texture. In contrast, the MOG approach relies on intensity sensing, which makes it more vulnerable to inconsistent lighting conditions.
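The robustness of texture features against global luminance changes can be demonstrated with a generic morphological feature. The sketch below uses the morphological gradient (dilation minus erosion with a 3x3 structuring element) purely as an illustrative texture descriptor; it is not claimed to be the exact feature MMBS learns. Adding a constant brightness offset, such as a white-balance shift, changes every raw intensity but leaves the feature untouched:

```python
import numpy as np

def dilate(img):
    """Grey-scale dilation with a 3x3 structuring element (max over the neighborhood)."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]] for i in range(3) for j in range(3)]
    return np.max(stack, axis=0)

def erode(img):
    """Grey-scale erosion with a 3x3 structuring element (min over the neighborhood)."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]] for i in range(3) for j in range(3)]
    return np.min(stack, axis=0)

def morph_gradient(img):
    """Morphological gradient: a local-texture feature (dilation minus erosion)."""
    return dilate(img) - erode(img)

rng = np.random.default_rng(1)
frame = rng.integers(0, 200, size=(8, 8)).astype(np.int64)
brighter = frame + 40  # global luminance shift, e.g., auto white balance
# Raw intensities differ everywhere, yet the texture feature is identical,
# because dilation and erosion both commute with an additive constant.
```

An intensity-based model would see `brighter` as a frame full of change; a texture-based one sees nothing new.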
As noted, there are frames where objects appear fused, which is problematic when analyzing vehicles. In this regard, the morphological approach provides a more precise differentiation of vehicles while minimizing motion noise.
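Conversely, the sensitivity of intensity-based modeling can be shown with a minimal per-pixel running Gaussian model. This is a one-Gaussian simplification used only as a stand-in for MOG, not the reference implementation; the update rate `alpha` and threshold `k` are assumed values:

```python
import numpy as np

class RunningGaussianBackground:
    """Simplified per-pixel intensity background model (one-Gaussian MOG stand-in)."""
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 100.0)
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        d = frame - self.mean
        # Pixels far from the model (in squared-std units) are classified as foreground.
        foreground = d ** 2 > (self.k ** 2) * self.var
        # Running update of the per-pixel mean and variance.
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d ** 2
        return foreground

bg = RunningGaussianBackground(np.full((4, 4), 100.0))
static = bg.apply(np.full((4, 4), 100.0))             # unchanged scene: no motion
moved = np.full((4, 4), 100.0); moved[1, 1] = 220.0   # one bright moving pixel
mask = bg.apply(moved)
shifted = bg.apply(np.full((4, 4), 140.0))            # global luminance jump
```

The last call shows the weakness discussed above: a uniform brightness change makes almost every pixel look like foreground to an intensity model, flooding the motion map with noise.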
In motion detection, a colored map represents motion densities, which helps identify areas with heavy infrastructure use. This application is illustrated in Figure 14, where a superimposed colored map shows vehicle density over time. The data were collected over a half-hour period starting at 3:00 p.m. (Mexico City time). Vehicle patterns can be observed in this sample, highlighting different density zones. Notably, several areas in the lower section of the roundabout exhibit varying densities; this variation is due to the continuous flow of vehicles in that region.
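A density map of this kind is simply the per-pixel fraction of frames classified as moving. The sketch below (synthetic masks, not the roundabout data) shows the accumulation, with a lane of continuous flow standing out against a rarely used area:

```python
import numpy as np

def motion_density(masks):
    """Fraction of frames in which each pixel was classified as moving."""
    acc = np.zeros(masks[0].shape, dtype=np.float64)
    for m in masks:
        acc += m.astype(bool)
    return acc / len(masks)

# Synthetic 10-frame sequence over a 4x4 scene.
masks = []
for t in range(10):
    m = np.zeros((4, 4), dtype=np.uint8)
    m[:, 0] = 1        # a lane with continuous vehicle flow in every frame
    if t == 0:
        m[0, 3] = 1    # sporadic motion elsewhere
    masks.append(m)
density = motion_density(masks)
```

Rendering `density` through a colormap and superimposing it on a reference frame yields the kind of colored map shown for the roundabout.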
Finally, the criteria for detecting motion are effective in outdoor environments. The main contribution of this work is that the system is fully modeled as a discrete task using mathematical morphology; in computational terms, this makes the resulting discrete algorithms more efficient than MOG approaches based on continuous spaces. A clear advantage is that the algorithm is a discrete, integer-only model that is easy to implement on a computer with relatively low complexity. These criteria facilitate quick convergence and establish a stable background subtraction model. The figure depicts scenarios where a drone is in motion, leading to an inconsistent background; in these cases, the proportion of foreground suddenly increases, prompting the model to restart, and within a few frames a consistent model for motion detection is re-established.
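The restart behavior described above can be sketched as a simple guard on the foreground proportion. The threshold `RESET_RATIO` below is an assumed value for illustration; the paper does not specify the ratio that triggers a restart:

```python
import numpy as np

# Hypothetical restart heuristic: when most pixels suddenly look like foreground
# (e.g., the drone moves and the whole background becomes inconsistent), the
# background model is discarded and rebuilt. RESET_RATIO is an assumed threshold.
RESET_RATIO = 0.5

def should_reset(motion_mask):
    """Trigger a background-model restart when most pixels are foreground."""
    return motion_mask.astype(bool).mean() > RESET_RATIO

calm = np.zeros((6, 6), dtype=np.uint8)
calm[2:4, 2:4] = 1                       # small moving object: keep the model
shake = np.ones((6, 6), dtype=np.uint8)  # camera/drone motion: restart the model
```

With such a guard, a few frames after the restart the model converges again, matching the behavior reported for the drone sequences.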