Article

Morphological Background-Subtraction Modeling for Analyzing Traffic Flow

by
Erik-Josué Moreno-Mejía
,
Daniel Canton-Enriquez
,
Ana-Marcela Herrera-Navarro
and
Hugo Jiménez-Hernández
*
Faculty Informatics, Universidad Autónoma de Querétaro, Av. de las Ciencias S/N, Juriquilla, Santiago de Querétaro 76230, Mexico
*
Author to whom correspondence should be addressed.
Modelling 2025, 6(2), 38; https://doi.org/10.3390/modelling6020038
Submission received: 16 March 2025 / Revised: 18 April 2025 / Accepted: 6 May 2025 / Published: 9 May 2025

Abstract:
Automatic surveillance systems have become essential tools for urban centers. These technologies enable intelligent monitoring that is both versatile and non-intrusive. Today, these systems can analyze various aspects, such as urban traffic, citizen behavior, and the detection of unusual activities. Most intelligent systems utilize photo sensors to gather information and assess situations. They analyze data sequences from these photo sensors over time to detect moving objects or other relevant information. In this context, background modeling approaches are crucial for efficiently detecting moving objects by differentiating between the foreground and background, which serves as the basis for further analysis. Although current methods are effective, the dynamic nature of outdoor environments can limit their performance due to numerous external variables that affect the collected information. This paper introduces a novel algorithm that uses mathematical morphology to create a background model by analyzing texture information in discrete spaces, leading to an efficient solution for the background subtraction task. The algorithm dynamically adjusts to global luminance conditions and effectively distinguishes texture information to label the foreground and background using morphological filters. A key advantage of this approach is its use of discrete working spaces, which enables faster implementation on standard hardware, making it suitable for a variety of devices. Finally, our proposal is tested against reference datasets of surveillance and common background subtraction algorithms, demonstrating that our method adapts better to outdoor conditions, making it more robust in detecting different moving objects.

1. Introduction

The dynamics of human populations have undergone significant changes in recent decades [1]. The shift from rural to urban living has increased dramatically, presenting substantial challenges in ensuring a high quality of life for residents in emerging urban centers [2]. A United Nations document outlines important considerations for local plans regarding growth, development, and the planning of additional urban areas [3]. The complexities of life and social interactions in urban cities create intricate challenges due to various factors and considerations [4]. In urban centers, effectively monitoring and understanding the needs of inhabitants poses both technical and theoretical challenges, as the intrinsic dynamics can be complex [5]. Many circumstances are influenced by the specific context of each city and external variables [6].
This situation has been addressed through the implementation of intelligent monitoring systems aided by Artificial Intelligence (AI) [7]. This technology draws transversally on other well-known fields, such as Computer Vision, Image Analysis, Data Analysis, and Smart Sensing, to mention a few [8]. In this context, Computer Vision aided by photo-sensor technology allows the implementation of algorithms to analyze object dynamics [9]. These algorithms analyze the matrix of intensity data produced by the sensor, exploiting the spatial and temporal sampling intrinsic to the raw data [10]. Primarily, the approaches rely on vision and Image Analysis to segment the information into primitive data, which allows the study to be carried out at different levels of detail [8]. This is the case for motion-detection approaches. Several well-known approaches are used for this task, and one of the most widely used is background subtraction [8]. The background subtraction approach involves creating a model that enables the detection of the foreground and background in a video sequence [11]. Different methodologies establish specific conditions under which the algorithm operates effectively. Generally, this approach assumes that the camera (photo sensor) is fixed while moving objects are displaced within the camera's field of view [12]. The velocity, the complexity of the movement, and the objects to be detected depend on the contextual scenario [13].
One of the most challenging situations to model arises from changes in luminance [14], which can occur due to reflections or varying light levels. These changes can create overexposed or underexposed areas, resulting in a lack of information for analysis [15]. Such conditions are common in outdoor scenarios, where various factors can affect the image [16]. These challenges have led to the development of specialized approaches tailored to specific situations, ensuring satisfactory performance [17].
In most techniques, the camera is fixed long enough (subject to the sampling rate) to accurately compute the foreground and background. Notable approaches include those described in [18,19,20,21], which define a background model based on color intensity. This model is represented as a mixture of Gaussians, where the labeling process involves determining which Gaussian models apply to the computed evidence. Some adaptations have been proposed to continuously update the Gaussian model to better adapt to changing scenario conditions. In [22], a novel method is introduced that analyzes local information and its temporal changes, detecting regions of high-intensity change that are classified as moving.
More recently, Refs. [23,24] introduced a neural classifier for segmenting foreground and background. These approaches are particularly useful when comprehensive historical information is available, and in some instances, a ground truth dataset is utilized to train the neural models [25,26,27]. Recent advancements in neural approaches include the work of [28], which trains a hybrid Spiking Neural Network model by adding layers to ensure stability and precision in object detection.
To improve performance, several criteria have been developed to reduce the computational complexity of background algorithms. One practical approach is outlined in [29], which enhances the Mixture of Gaussians (MoG) method by resizing the imaging source. This adjustment reduces complexity and increases the frame rate. Similarly, the work of [30] aims to reduce complexity by implementing a parallel version of MoG in a digital signal processor (DSP), allowing for the development of embedded systems that utilize this approach.
One of the newest advancements involves the combination and enhancement of models, as proposed by [31]. This algorithm is based on the positioning of local features in the CIE L*a*b* color space and aims to ensure the accurate location of information in low-luminance scenarios, particularly for nighttime applications. Additionally, Ref. [32] presents optimizations for MoG to detect features like borders, while Ref. [33] introduces an approach for use in heterogeneous reflective scenarios. Labeling the foreground and background becomes essential in several applications to distinguish moving objects for further analysis. The tasks are typically performed in three main steps [13,34,35]: (1) building or initializing the background model, (2) operating the model, and, optionally, (3) adapting and correcting the model. These steps are detailed as follows:
In the step of building the background model, the common sub-steps are as follows: (a) Image stream acquisition involves acquiring and sampling temporal information from the photo sensors. This process includes temporal-consistency sampling, spatial sampling, data correction, and prefiltering. (b) Data coding and representation involve mapping the raw data to a format that distinguishes foreground from background. The coding process includes transformations and data reduction while preserving the intrinsic data structures that allow comparison. (c) Establishing a metric space based on the data representation is an essential final step. This involves creating a metric from the coded data that remains invariant under changes in physical conditions, which is crucial for robust labeling of the foreground and background planes. The metric should be robust enough to handle fixed or dynamic backgrounds, several physical luminance affectations, compression artifacts, and process noise. (d) Developing a model that utilizes an automatic and adaptive learning process integrates the last two steps. This model aims to determine an algorithmic method for estimating the background and foreground processes. The algorithm employs prior information to make an initial estimation; it is considered automatic and adaptive because it typically relies on outdoor data.
The operating step utilizes an existing model and operates on information from the photo-sensor stream. Compared with building the background model, operating it is asymmetric in cost: it exhibits low complexity, enhancing its efficiency for online applications. Adapting and correcting the model involves a method that continuously adjusts the model parameters in response to data variability, ensuring consistency with the background model. Traditionally, these approaches employ numerical methods to address the variability of key parameters while maintaining low computational complexity.
In this context, the literature presents advancements in building background models supported by various theoretical frameworks [14,19,24,25,28]. These studies indicate that robust methods with high accuracy are often associated with high complexity, which renders them unsuitable for online applications. Conversely, low-complexity methods are more prone to errors but benefit from reduced computational demands. Additionally, many techniques rely on numerical methods that, when operated consistently, may degrade over time, leading to operational inconsistencies that are challenging to detect in real-time situations. This presents an opportunity to propose adaptable methods that balance computational complexity with accurate outdoor detection.
This paper introduces a background/foreground model based on mathematical morphology to develop a practical algorithm named MMBS (Mathematical Morphology Background Subtraction). The discrete properties of morphological operators facilitate the development of a method focusing on (a) a texture characterization of the photo-sensor data that remains invariant to changes in global luminance, (b) the characterization of texture intensity using a discrete probability density function (PDF) for each pixel, and (c) an updating process that allows for motion compensation, transforming moving objects into static ones and vice versa as static zones change. This approach was tested against other established background methods during the experimental process.

2. Materials and Methods

This section outlines the morphological properties essential for developing the morphological approach.

2.1. Morphological Foundations

Mathematical morphology (MM) is a theoretical framework for analyzing graph structures. One of its most common applications involves analyzing grid surfaces that result from the Cartesian product of discrete sets. MM rests on two fundamental operators: dilation and erosion. These operators define a lattice order over all connected sets within the working space, allowing structures to be manipulated by stepping forward or backward according to a criterion λ, the structural element.
The erosion and dilation operators are mathematically defined as follows. The erosion operator is given by ξ_λ(I)(x) = min_{b ∈ D_λ} { I(x + b) − λ(b) }, and the dilation operator is expressed as δ_λ(I)(x) = max_{b ∈ D_λ} { I(x + b) + λ(b) }.
Here, λ denotes the structural element, while D λ represents the λ structure displaced to the reference position x for both erosion and dilation operations.
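These two operators can be exercised numerically. The sketch below uses SciPy's grayscale morphology with a flat 3 × 3 structural element; the library and element shape are illustrative choices, not prescribed by this work:

```python
import numpy as np
from scipy.ndimage import grey_erosion, grey_dilation

# Toy 8-bit surface: a flat field with one bright peak.
I = np.full((5, 5), 10, dtype=np.int32)
I[2, 2] = 200

# Flat 3x3 structural element (lambda): erosion takes the local
# minimum, dilation the local maximum, over each neighborhood.
eroded = grey_erosion(I, size=(3, 3))
dilated = grey_dilation(I, size=(3, 3))

# Lattice order: erosion <= I <= dilation at every pixel.
assert np.all(eroded <= I) and np.all(I <= dilated)
```

Erosion removes the isolated peak (the local minimum is the background value), while dilation spreads it over the neighborhood, which is the step backward/forward behavior in the lattice described above.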
A couple of basic morphological filters must be defined, starting from the above operators. These filters are known as opening and closing, denoted by γ_λ and φ_λ, with the structural element defined by λ, which denotes an adjacency matrix of the local neighborhood. These operators are defined as follows:
γ_λ(I) = δ_λ(ξ_λ(I))
φ_λ(I) = ξ_λ(δ_λ(I))
where I denotes the surface and λ the structural element used. Both operators are denominated filters because they obey the following properties: (a) Both transformations are increasing: for all pairs of functions I and J with I ≤ J, T(I) ≤ T(J), where T is either γ_λ or φ_λ; i.e., they preserve the order relation under transformation. (b) Both transformations are idempotent: for both transformations, T(T(I)) = T(I).
Finally, two derived operations are introduced: the topHat and botHat transformations. These transformations are defined as follows:
Th_λ(I) = I − γ_λ(I)
Bh_λ(I) = φ_λ(I) − I
where I denotes the input surface and λ the structural element used. These transformations return the residual of approximating a given surface with an opening or a closing filter. The residual represents the detail lost when approximating the data I with discrete surfaces shaped by λ.
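The filter properties above (ordering and idempotence) and the residual transformations of Equations (3) and (4) can be checked directly. The following sketch again assumes SciPy and a flat structural element:

```python
import numpy as np
from scipy.ndimage import (grey_opening, grey_closing,
                           white_tophat, black_tophat)

I = np.full((5, 5), 50, dtype=np.int32)
I[2, 2] = 200   # bright detail (removed by opening)
I[0, 0] = 5     # dark detail (removed by closing)

size = (3, 3)   # flat structural element lambda
gamma = grey_opening(I, size=size)   # gamma_lambda(I) = dilation(erosion(I))
phi = grey_closing(I, size=size)     # phi_lambda(I) = erosion(dilation(I))

# Filter ordering: gamma_lambda(I) <= I <= phi_lambda(I).
assert np.all(gamma <= I) and np.all(I <= phi)

# Idempotence: applying the filter twice changes nothing.
assert np.array_equal(grey_opening(gamma, size=size), gamma)
assert np.array_equal(grey_closing(phi, size=size), phi)

# Residuals of Equations (3) and (4): Th = I - opening, Bh = closing - I.
Th = white_tophat(I, size=size)
Bh = black_tophat(I, size=size)
assert np.array_equal(Th, I - gamma)
assert np.array_equal(Bh, phi - I)
```

The topHat residual is non-zero only at the bright detail and the botHat residual only at the dark detail, which is exactly the "detail lost" interpretation given above.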

2.2. Image Representation and Noise Eliminating

In a generalized space of n independent dimensions, an extra dimension n + 1 represents the lattice where the morphological operators work. The morphological operators step forward/backward through this lattice, and the graph nodes represent a uniform discrete working space.
An image is a graph representing a grid formed by the Cartesian product of discrete sets. The dimensions of this image create an enumeration that corresponds to a topological grid of n × m dimensions, defining the height and width of the image. Each pixel can be considered a node, and its neighborhood consists of all connected nodes (traditionally in 8-connected or 4-connected configurations). The lattice associated with these nodes relates to the intensity values, with the maximum intensity coded in 8 bits represented by 255. In this context, any image is a workspace where pixel intensities reflect topological structures.
Figure 1 illustrates this image representation: Part (a) shows how a pixel in a 2D plane connects with neighboring pixels, demonstrating adjacency through 4- and 8-connected configurations. The resulting lattice relates to the smallest blob until it covers the entire image surface. In part (b), the image is represented as a 3D surface featuring valleys and peaks, where the orthogonal axis (height) denotes the lattice created by the lowest elevation to the highest peak, corresponding to the maximum pixel intensity.
Texture analysis for characterizing objects is one of the most valuable approaches in image processing. In practice, texture analysis involves examining the contrast in raw data sampled by a photo sensor at a specific timestamp. The texture associated with a material is determined by the spatial contrast captured within the camera’s field of view. Additionally, contrast over time lets us detect when a pixel becomes occluded. To characterize the texture scenario effectively, spatial sampling must be thoroughly analyzed.
This example can be illustrated using a 3D intensity image in Figure 1b. The variations in local intensities determine the roughness or texture of the image. These intensity variations typically represent the different materials present in the scene. The spatial axis (height and width) represents the uniform sampled space, and as natural mapping, intensities represent a lattice over which the morphological operators work.
Noise generated during acquisition can be seen as spurious variations in intensity caused by the spatial sampling process. This noise may stem from the acquisition hardware or uncontrollable local variations, such as small reflections, temperature fluctuations, rain, or interference. These factors can lead to distortions in image intensity.
In this context, texture information is associated with structures related to mid- and low-frequency structures, while noise typically corresponds to high frequencies. Our approach adapts to the size and shape of structural elements: morphological operators that use small structural elements treat high frequencies as noise. In contrast, larger sizes of structural elements represent data structures that reflect shape and local contrast.
To mitigate the effects of noise in each snapshot, several authors apply a smoothing filter to the sequence of image streams. This filter removes aberrant pixel intensities that distort the surrounding neighborhood. This work takes a similar approach by applying a modified morphological median operator [36]. This operator was initially introduced to model the time dynamics of a decay function from discrete measurements using a weighted average. For our purposes, the morphological expression is modified as follows:
M̃ed_λ(I) = median(γ_λ(I), φ_λ(I))
In this context, λ represents a structural element that approximates an n-dimensional surface. The operators γ_λ and φ_λ correspond to the morphological filters known as opening and closing, respectively. These filters provide surface approximations that bound the data from below and above, following the property γ_λ(I) ≤ I ≤ φ_λ(I). The median operator applied to this discrete set selects the middle value among these surfaces, disregarding minor variations. The result is an approximation of the surface with respect to the λ structure. The filter acts as a low-to-mid-pass filter; it smooths pixel variations smaller than λ, thus eliminating outlier intensity values.
Figure 2 illustrates the noise filtering process and its effects in high detail. The size λ defines the threshold for discarding high-frequency variations, meaning elements smaller than λ are removed, resulting in a smoother surface.
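A minimal sketch of the filter of Equation (5) as written, taking the pointwise median of the opening and closing surfaces (for two surfaces this reduces to their midpoint); the SciPy operators and the flat structural element are our assumptions:

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing

def morph_median(I, size=(3, 3)):
    """Pointwise median of the opening and closing surfaces.

    The opening bounds the data from below and the closing from
    above, so their median attenuates spike-like variations
    smaller than the structural element in either direction.
    """
    gamma = grey_opening(I, size=size)
    phi = grey_closing(I, size=size)
    return np.median(np.stack([gamma, phi]), axis=0)

I = np.full((7, 7), 100, dtype=np.int32)
I[2, 2] = 250   # additive (bright) acquisition noise
I[5, 5] = 0     # subtractive (dark) acquisition noise

smooth = morph_median(I)
```

Unaffected background pixels keep their value, while both the bright and the dark single-pixel spikes are pulled back toward the background level, consistent with the role of λ as a threshold on high-frequency variations.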

2.3. Morphological Texture Characterization

Approximating a surface, supported by a structural element, can help differentiate the shapes and textures of objects from raw data. The morphological filters known as opening and closing (as defined in Equations (1) and (2)) can be used to create a discrete surface with a structural element denoted by λ. By the increasing property of the filters, we can establish that γ_λ(I) ≤ I ≤ φ_λ(I). Here, γ_λ(I) and φ_λ(I) represent the surface approximations of I by the lower and upper bounds of the data, respectively. These approximations are limited by the structural element λ. The data I represents the photo-sensor intensities.
The global morphological structures characterize the data energy captured by the photo sensor. These structures are sensitive to energy changes, which produce under-/over-energy situations (under-/over-saturated light conditions) whenever a snapshot is sensed. The texture characterization uses the residual information between the data and the filter approximation. These residuals represent small local changes of data variation (contrast) over the information. This information remains stable whenever λ is greater than the acquisition noise and smaller than the global data structures, so the residual can be interpreted as a mid-pass filter of the photo-sensor data. In this work, the filtering residuals represent the texture information. The residuals carry negative/positive texture information depending on whether the lower or upper morphological approximation (opening or closing) is used. Both residuals are computed by the TopHat and BotHat transformations and are expressed as follows:
∂_{n,λ}(I) = Th_λ(I)
∂_{p,λ}(I) = Bh_λ(I)
where I denotes the surface to be analyzed and λ the structural element used. A consequence of the closing and opening morphological filters is that the negative or positive residual vanishes (reaches the lattice infimum) at those positions x where the approximation becomes exact. The intersection of both residuals gives the positions x without texture information (no contrast residual data), which represent flat zones. When interpreting data, a photo sensor may encounter one of the following situations: (a) insufficient energy is detected to differentiate textures, (b) specific areas of the photo sensor may become saturated, resulting in a loss of data structure, or (c) there may be flat areas with no variations in texture intensity. These issues can complicate reference-based approaches; however, they do not affect the ability to detect zones occluded by objects with distinguishable texture information.
The texture information map is built by mixing both residuals (Equations (6) and (7)) and is expressed as follows:
τ_λ(I) = median(ς − ∂_{n,λ}(I), ς + ∂_{p,λ}(I))
Hence, ς represents the central value of the lattice induced by inf I and sup I under the morphological operators, I is the surface to be processed, and λ is the structural element used. The texture analysis is illustrated in Figure 3, using an image section focusing on a bush. The shape and distribution of the leaves in this section provide valuable texture information. Before the texture analysis, noise reduction was applied to enhance the image quality. After computing Equation (8), two outputs can be observed: (a) the upper image, which represents the morphological approximation of the overall structure of the image, including the effects of the environment, and (b) the lower image, which displays the residual from the morphological approximation. This residual captures the texture information, highlighting the finest contrasts in intensity.
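Equation (8) can be exercised numerically. The sketch below uses SciPy's top-hat transforms and assumes an 8-bit lattice with central value ς = 127; these implementation details are our reading, not fixed by the text:

```python
import numpy as np
from scipy.ndimage import white_tophat, black_tophat

def texture_map(I, size=(3, 3), levels=256):
    """Texture map of Equation (8): residuals of the morphological
    approximations, recentered around the mid lattice value."""
    sigma = (levels - 1) // 2           # central lattice value (127 for 8 bits)
    d_neg = white_tophat(I, size=size)  # bright-side residual Th_lambda(I)
    d_pos = black_tophat(I, size=size)  # dark-side residual Bh_lambda(I)
    # Pointwise median of the two recentered residual surfaces.
    return np.median(np.stack([sigma - d_neg, sigma + d_pos]), axis=0)

# A perfectly flat surface carries no texture: both residuals vanish
# and the map collapses to the central value.
flat = np.full((6, 6), 80, dtype=np.int32)
assert np.all(texture_map(flat) == 127)

# A textured (checkerboard) patch produces non-central responses.
textured = np.tile(np.array([[40, 160], [160, 40]], dtype=np.int32), (3, 3))
tau = texture_map(textured)
```

Note that a uniform shift of the input (a global luminance change) leaves both residuals unchanged, which is the invariance property the texture map is built for.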

2.4. Probabilistic Model Based on Texture

The data from a photo sensor represent changes in intensity over time. The texture stream Γ_λ = {τ(I_1, λ), τ(I_2, λ), …} is computed from the image sequence Γ = {I_1, …} sampled by the photo sensor. In this notation, Γ_{t,λ}(x) represents the snapshot taken at time instant t, and x represents the position expressed as a two-dimensional tuple. The Γ_λ sequence expresses the texture information. The next task involves using this information to identify when an object starts moving within the matrix data. The solution is to implement an efficient classifier that labels the data when it represents textured foreground.
The proposal involves creating a probabilistic classifier that models texture information over time. This model operates at the pixel level and implicitly incorporates spatial locality when characterizing textures. By considering the distribution over time, the model captures the temporal evolution of texture. The background model integrates temporal and spatial information, providing insights into any pixel’s local dynamics.
Background and foreground layers are classified as fixed and moving zones. Moving zones can include any moving objects or scenario effects classified as dynamic. The background zones are labeled using a probabilistic classifier, which assumes pixel information over time is less likely to overlap and thus represents fixed zones. The instances of overlapping occur only during very brief periods. The texture information from overlapping objects varies considerably. This classifier is designed by treating each pixel position as the center and considering the surrounding information, which includes both spatial and temporal components. The spatial aspect is represented by the intensity values of the pixel’s neighboring matrix, while the temporal component accounts for the number of samples taken over time. Figure 4 illustrates the background model, with the texture distribution in spatial and temporal components used to label foreground/background zones.
Then, a probabilistic classifier for a specific position x can be described as follows:
C_{η,λ}(x) = { 1 if Γ_{η,λ}(x) ∉ F(x; p); 0 otherwise }
In this context, the parameters η and λ represent the time window and the structural element, respectively. The function F(x; p) denotes the probability density function (PDF) of the texture distribution; its approximation or calculation and the associated estimation process are defined by the data or the application context. To simplify the notation Γ_{η,λ}(x) ∉ F(x; p), we write M(x) = F(x; p) to refer to the indexed model for each available position x.
The binary labeled matrix resulting from motion detection might be corrupted by noise, which can be removed with a morphological filter: a closing filter whenever moving-object data are under-detected by (9), or an opening filter in cases of over-detected motion objects.
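This mask cleanup step can be sketched with SciPy's binary morphology; the mask contents and the 3 × 3 structural element are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing

# Hypothetical motion map: one genuine moving blob plus salt noise.
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True   # genuine moving object
mask[7, 0] = True       # spurious isolated detection

structure = np.ones((3, 3), dtype=bool)

# Opening prunes detections smaller than the structural element...
opened = binary_opening(mask, structure=structure)
assert not opened[7, 0] and opened[3, 3]

# ...while closing fills small holes inside under-detected objects.
holey = mask.copy()
holey[3, 3] = False
closed = binary_closing(holey, structure=structure)
assert closed[3, 3]
```

The choice between opening and closing follows the text above: closing compensates under-detection, opening trims over-detection.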

2.5. Algorithm Implementation

The algorithm analyzes texture information across the temporal and spatial dimensions based on the color components. The method uses spatial variability to capture texture information and temporality to describe the occlusion dynamics of a pixel. The algorithm assumes that the texture information for a pixel stabilizes over time and that, in general, each pixel does not overlap with others. This characteristic makes the approach suitable for classifying whether or not a pixel is overlapped, and the analysis generalizes to all other pixels.
Spatial filtering is subsequently applied to enhance consistency in object detection. The measures for texture information, denoising, and spatial-connectivity consistency are all built from morphological operators. The proposal treats noise as the high-frequency component of the raw data, corresponding to aberrant areas, while the contrast present in the raw data, represented by local minima and maxima, is normalized to obtain the texture representation.
Characterizing the foreground and background planes employs a probabilistic classifier. This classifier fits the probability density function (PDF) of the texture values, modeling it as a discrete non-parametric function. The method is designed for use with morphological operators and allows a complete implementation in integer logic, which facilitates efficient programming for online applications.
The proposed algorithm is based on two steps: (a) background model creation and (b) background operation. Background model creation consists of estimating the background model parameters that make it practical to discern foreground from background. This process involves three basic sub-steps:
(i) Denoising the raw data consists of eliminating random high frequencies, approximating the surface via unitary morphological filtering and eliminating variations resulting from atypical information. This step is performed with a mid-morphological filter over each frame used in the estimation.
(ii) Estimating texture information consists of remarking zones with high-contrast information, which should correspond to texture, while discarding global behaviors referring to global image and luminance affectations. The texture information is invariant to light conditions (whenever they do not saturate the sensor or fall below the sensor sensitivity). This process is performed by the approximating morphological transformations, TopHat and BotHat, each providing information on the maxima/minima zones corresponding to the most contrasted zones.
(iii) Updating/adjusting the background model consists of adjusting it using the information from the acquired frame. This process updates the model parameters to keep the probabilistic classifier efficient. In practical operation, the complexity of the classifier might affect the algorithm's performance, as might the complexity of the update.
Algorithm 1 summarizes the background model creation.
The background operation analyzes a sequence of frames in the same way as the MMBC algorithm. The process involves several steps: first, the current frame is acquired from the video stream; next, denoising is performed using the morphological mid-filter; and then the morphological texture analysis is applied.
Algorithm 1 Mathematical Morphological Background Creation (MMBC)
Require: I ▹ A video stream source.
  M_{t−1} ▹ Matrix of models for the analyzed temporospatial window.
  λ_n ▹ Structural element for noise elimination.
  λ_t ▹ Structural element for texture analysis.
  t ▹ Training time to estimate the initial models.
Ensure: M_t ▹ Matrix of models for the analyzed temporospatial window.
  M_{t−1} ← ModelsMatrix(m × n × 2η)
  f ← 0
  while f ≤ t do
    I_η ← {I_{t−η}, …, I_{t+η}} ▹ Read the temporal window to analyze
    I_{η,f} ← {M̃ed_{λn}(I_{t−η}), …, M̃ed_{λn}(I_{t+η})} ▹ De-noise the data
    Γ_{η,λt} ← {τ_{λt}(I_{t−η}), …, τ_{λt}(I_{t+η})} ▹ Compute the texture information map
    M_t, Γ_{η,λt} ← update(M_{t−1}, Γ_{η,λt−1})
    f ← f + 1
  end while
Once both of these steps are completed, the resulting information is matched against the matrix model M to create a motion map I_M. Finally, the motion maps are refined by removing additive or subtractive noise relative to each pixel's spatial neighborhood. An enhancement process, tailored to scenario-specific criteria, is applied to improve the connectivity of the detected motion zones.
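The creation loop of Algorithm 1, under the simplifications later adopted in the implementation (η = 0, one frame at a time), might be sketched as follows. All function names and the SciPy-based operators here are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.ndimage import (grey_opening, grey_closing,
                           white_tophat, black_tophat)

LEVELS = 256                 # 8-bit lattice
SIGMA = (LEVELS - 1) // 2    # central lattice value

def denoise(I, size=(3, 3)):
    # Mid-filter: midpoint of the opening and closing surfaces.
    g, c = grey_opening(I, size=size), grey_closing(I, size=size)
    return (g.astype(np.int32) + c) // 2

def texture(I, size=(3, 3)):
    # Equation (8): recentered topHat / botHat residuals.
    d_neg = white_tophat(I, size=size)
    d_pos = black_tophat(I, size=size)
    tau = np.median(np.stack([SIGMA - d_neg, SIGMA + d_pos]), axis=0)
    return np.clip(tau, 0, LEVELS - 1).astype(np.intp)

def mmbc(frames, size=(3, 3)):
    """Background model creation: one discrete PDF (counter array) per pixel."""
    h, w = frames[0].shape
    M = np.zeros((h, w, LEVELS), dtype=np.int64)
    rows, cols = np.indices((h, w))
    for I in frames:
        tau = texture(denoise(I, size), size)
        M[rows, cols, tau] += 1   # update each pixel's texture histogram
    return M

# Synthetic training stream: a static scene with mild sensor noise.
rng = np.random.default_rng(0)
frames = [np.full((8, 8), 90, dtype=np.int32) + rng.integers(-2, 3, (8, 8))
          for _ in range(20)]
M = mmbc(frames)
```

Each training frame contributes exactly one count per pixel position, so the counters accumulate the empirical texture distribution described in Section 2.6.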

2.6. Implementation Issues

This section addresses implementation issues of the proposal, with technical annotations to enhance performance. The implementation of Algorithms 1 and 2 requires clarifying the model M for each pixel. The algorithm requires the PDF characterization, the parameters to model the texture variations, and a criterion for deciding whether a sampled pixel intensity belongs to the observable background texture or to foreground information. Commonly, intensity sensors have resolutions of 8 or 10 bits of depth, and the computational resources needed to store each pixel's data are tractable on current hardware. A greater bit depth reduces the resolution lost.
An array of counters for each pixel represents the discrete model M. These counters record the frequency of each intensity, reflecting the texture and the temporal relations of the vicinity. These parameters are defined by λ_t and η for the spatial and temporal vicinity. The counter arrays are feasible because they represent discrete sampling at a fixed resolution; the resolution establishes the buffer length of each model M. The counter array approximates the PDF without assuming a parameterized form. Consequently, Equation (9) needs a testing function to verify whether an intensity sampled at a specific pixel belongs to the distribution. The membership criterion is defined by the change of concavity around the expected value, understood as max p(M(x)). The change in concavity of the PDF is expressed by a lower limit, Δ_l, and an upper limit, Δ_r, representing a sparseness criterion used to test whether the data belong to this distribution. Figure 5 illustrates the process, starting from the empirical probability density function: the global maximum is used as a reference, and the positions of the left and right concavity changes are selected, representing a dispersion criterion.
Figure 6 illustrates the pixel and matrix representation of the texture information. This cube of counters is defined by the image dimensions multiplied by 256 in the 8-bit case. Once each sampled image is analyzed, the frequencies for each pixel position are updated. Computationally, this process is cheap at the cost of a slight memory consumption to maintain the buffer. Table 1 shows, for different frame sizes, the number of elements (in millions) at 8 bits per pixel of depth; note that the true size in bytes is obtained by multiplying by the number of bytes used by each counter.
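One plausible reading of the counter model and the Δ_l/Δ_r membership criterion, approximating the concavity change by the first local minimum on each side of the mode, is sketched below; the helper names are hypothetical:

```python
import numpy as np

def membership_limits(hist):
    """Interval [dl, dr] around the mode of a discrete pixel PDF.

    One plausible reading of the sparseness criterion: expand from
    the global maximum to the first concavity change on each side,
    approximated here by the first local minimum of the counters.
    """
    mode = int(np.argmax(hist))
    dl = mode
    while dl > 0 and 0 < hist[dl - 1] <= hist[dl]:
        dl -= 1
    dr = mode
    while dr < len(hist) - 1 and 0 < hist[dr + 1] <= hist[dr]:
        dr += 1
    return dl, dr

def is_background(hist, value):
    dl, dr = membership_limits(hist)
    return dl <= value <= dr

# A unimodal pixel model peaked at texture value 120.
hist = np.zeros(256, dtype=np.int64)
for v, n in [(118, 5), (119, 40), (120, 90), (121, 35), (122, 4)]:
    hist[v] = n

assert is_background(hist, 120)
assert not is_background(hist, 200)
```

Because the test only indexes a small counter array, it fits the integer-logic, low-complexity goal stated for the classifier.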
Algorithm 2 Mathematical Morphological Background Subtraction (MMBS)
Require: I▹ A video stream source.
M t − 1 ▹ Matrix of models for the analyzed spatio-temporal window.
Ψ t − 1 ▹ Matrix of parameters of matrix M .
Ensure:  I M ▹ Output Stream of motion objects.
M t ▹ Matrix of models updated.
while h a s   f r a m e ( I ) do▹ Compute while frames are available.
  Create an empty I M
for  x pixel positions do▹ For each pixel position, match the individual pixel model.
    I η ← { I t − η , , I t + η } ▹ Reads the temporal window to analyze.
    I η , f ← { M e d ˜ λ n ( I t η ) , , M e d ˜ λ n ( I t + η ) }       ▹ De-noise data.
    Γ η , λ t ← { τ λ t ( I t η ) τ λ t ( I t + η ) } ▹ Computes texture information from the current frame.
    I M ( x ) ← C η , λ ( Γ ( x ) ) ▹ Verify whether the pixel is overlapped.
  end for
 I M ( x ) ← g e t E n h a n c e d C o n n e c t i o n ( I M ( x ) ) ▹ Improve the motion map by analyzing the surrounding pixels.
 M t , Ψ t ← u p d a t e ( M t 1 , Ψ t 1 ) ▹ Update the pixel models with current motion information.
end while
The algorithm outlined in Algorithm 3 processes all pixel positions. In the current implementation, we set η = 0 , meaning that only the current frame is processed, while λ t represents the radius of the structural element. These simplifications make the proposal easier to test. Future work will consider additional structural shapes and different time windows for η .
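Under these simplifications, one pass of Algorithm 2 can be sketched in a few lines. This is a loose illustration, not the paper's code: the median filter stands in for the de-noising step, a morphological top-hat (frame minus grey opening) stands in for the texture operator τ, and the `min_count` acceptance rule is a simplified stand-in for the concavity-based criterion C; all names and thresholds are assumptions.

```python
import numpy as np
from scipy import ndimage

def mmbs_frame(frame, counters, radius=2, min_count=10):
    """One iteration of Algorithm 2 with eta = 0 (single-frame window).
    `counters` is the per-pixel histogram cube of Figure 6 and is
    updated in place (Algorithm 3). Returns a boolean motion map."""
    # square structural element of the given radius (lambda_t)
    se = np.ones((2 * radius + 1, 2 * radius + 1))
    denoised = ndimage.median_filter(frame, footprint=se)              # de-noise
    # top-hat as an assumed texture operator tau: structure, not raw intensity
    texture = denoised - ndimage.grey_opening(denoised, footprint=se)
    texture = texture.astype(np.intp)
    rows, cols = np.indices(texture.shape)
    # foreground where the observed texture level is still rare in the model
    motion = counters[rows, cols, texture] < min_count
    counters[rows, cols, texture] += 1            # learn the observation
    return motion
```

On a static scene, every pixel starts as foreground and converges to background once its texture level has been observed often enough, which mirrors the quick-convergence behavior discussed in the results.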
Algorithm 3 Update Model (Update)
Require:  M t − 1 ▹ Matrix array of models for each pixel.
Γ ▹ Texture Information to be learned.
Ensure:  M t ▹ Updated matrix model.
for x pixel position do
   M t ( x ) update probabilities with M t 1 ( x ) and Γ ( x ) .
end for
The update increments a counter by one for each pixel position, with the model M ( x ) functioning as an indexing task. This is expressed as M ( x ) [ Γ ( x ) ] ← M ( x ) [ Γ ( x ) ] + 1 . Care must be taken over whether array indexing starts at position 0 or 1 in the implementation. The continuous updating of the counters maintains a consistent, and hence robust, background model. This model represents the pdf of the local texture information, which remains invariant to global changes. Additionally, the ongoing updates allow the dominant probability to be determined, yielding an expected value without relying on a previously parameterized pdf form.
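The indexing update above can be written as a single vectorized operation. This sketch assumes 0-indexed arrays (as in NumPy), so an 8-bit texture value maps directly onto bins 0..255; the function name is illustrative.

```python
import numpy as np

def update_model(model, texture):
    """Algorithm 3 as an indexing task: for every pixel x, increment
    M(x)[Gamma(x)] by one. `model` has shape (H, W, levels) and
    `texture` holds the integer texture level Gamma(x) per pixel."""
    rows, cols = np.indices(texture.shape)
    model[rows, cols, texture] += 1   # one increment per pixel position
    return model
```

Because each pixel contributes exactly one increment per frame, the update cost is linear in the number of pixels, independent of the number of intensity levels.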
The proposal introduces a criterion to improve the connectivity of the computed motion map M : (a) if the motion map shows mostly fused objects, indicating over-modeled motion, an opening filter separates the fused objects; conversely, (b) if the motion map shows mostly separated objects or contains gaps, a closing filter is applied. Both scenarios are handled in Algorithm 4, where one of the filters is selected according to this criterion.
Algorithm 4 Enhancement Connectivity (EnhCon)
Require: I▹ Motion Map.
Λ   ▹ Set of parameters and Structural elements parameters for Connectivity process.
Ensure:  I ▹ Enhancement Map.
if Motion Map I is over modeled under Λ ( 1 ) constraint then
 I ← γ Λ ( 2 ) ( I ) ▹ Over-modeled motion map.
else
 I ← φ Λ ( 2 ) ( I ) ▹ Under-modeled motion map.
end if
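Algorithm 4 can be sketched with standard binary morphology. The foreground-fraction threshold used here as the Λ(1) over-modeling constraint is an assumed criterion, as are the function name and the structural element shape; the opening/closing selection itself follows the text above.

```python
import numpy as np
from scipy import ndimage

def enhance_connectivity(motion_map, fused_threshold=0.25, se_radius=1):
    """Algorithm 4 sketch: if the foreground fraction exceeds a
    threshold (assumed Lambda(1) constraint), the map likely contains
    fused objects and is opened (gamma) to separate them; otherwise it
    is closed (phi) to fill gaps between fragmented detections."""
    se = np.ones((2 * se_radius + 1, 2 * se_radius + 1), dtype=bool)
    if motion_map.mean() > fused_threshold:                 # over-modeled
        return ndimage.binary_opening(motion_map, structure=se)
    return ndimage.binary_closing(motion_map, structure=se)  # under-modeled
```

Opening removes components smaller than the structural element (separating thin bridges and noise), while closing fills gaps narrower than it, which is exactly the duality the criterion exploits.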

3. Results

This section introduces an experimental process to test and evaluate the proposal against the MOG mixture [19] and the enhanced MOG [37]. The validation focuses on the MOG family of algorithms because it is a generalized family of background approaches in which a probabilistic model is approximated through the EM algorithm; despite several modifications, in essence all of these methods share the same limitations and advantages: low-complexity, adaptable algorithms that detect motion with a tolerable degree of accuracy.

3.1. Validation Process

The proposal has been tested in two stages: (1) validating the accuracy and precision of motion detection through previously labeled image sequences; and (2) analyzing vehicles at a roundabout to measure traffic density over time.
(1) Validating the accuracy and precision. The validation process uses a pair of reference surveillance video datasets. The first dataset is PETS [25,38,39], which includes short videos featuring common surveillance scenarios. The second dataset [40] has become a reference point, as it contains various situations that are more challenging for background algorithms to characterize. Both datasets include motion ground truth, which is used to evaluate the performance of the algorithms.
The MOG approach [19] and the enhanced MOG [37] were used to contrast the results obtained by the proposal; the former was compared on the first dataset and the latter on the second dataset.
Next, we conduct a thorough analysis by evaluating the frame-to-frame errors in the motion labeling process: the computed motion maps are compared against the ground truth. This comparison allows us to assess the performance of motion modeling on a frame-by-frame basis and to identify which frames did not accurately model motion. The error measures include the confusion matrix measures [40] together with the Bias error (Bias), the standard deviation of the Bias (std Bias), the Mean Absolute Error (MAE), and the Root Mean Squared Error (RMSE) [25].
(2) Vehicle analysis from a roundabout. Finally, an application is introduced to calculate vehicle density throughout the day in roundabout scenarios. The application entails the creation of a probabilistic map indicating denser areas, intended for use in urban roadway resource management. This implementation underscores the robustness of the underlying methodology, notwithstanding several variations in luminance and situational factors. The evaluation is perceptual, assessing the quality of the motion maps computed by the algorithms in terms of introduced artifacts or mislabeled object detections. This test is designed to assess performance in real outdoor surveillance scenarios.

3.2. Results and Discussions

(1) Validating the accuracy and precision. The validation process involves computing the background over the previously described pair of datasets.
The first dataset [38] includes three scenarios for testing: Test 1 focuses on image sequences captured in outdoor conditions, characterized by high compression levels and low quality; Tests 2 and 3 examine indoor conditions, where changes in overall luminance tend to be more stable.
The second dataset [40] consists of ten scenarios divided into two categories: base scenarios (sequences 1 to 4) and dynamic scenarios (sequences 5 to 10). The base scenarios illustrate standard applications of background subtraction techniques, while the dynamic scenarios present more complex situations for detecting moving objects.
The computed results are divided into two stages: first, the confusion matrix measures, which include precision, recall, F1 score, and IoU (intersection over union) [40]; second, measures that quantify the model error in computing the motion object maps, namely the Bias error (Bias), the standard deviation of the Bias (std Bias), the Mean Absolute Error (MAE), and the Root Mean Squared Error (RMSE) [25].
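The measures above follow their standard definitions; they can be computed per frame from the binary motion map and the ground truth as in the sketch below (the function name and the dictionary layout are illustrative).

```python
import numpy as np

def motion_metrics(pred, gt):
    """Confusion-matrix and model-error measures between a binary
    motion map `pred` and the ground truth `gt`, per frame."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)          # foreground correctly detected
    fp = np.sum(pred & ~gt)         # background labeled as motion
    fn = np.sum(~pred & gt)         # missed motion
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    err = pred.astype(float) - gt.astype(float)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou,
            "bias": err.mean(),                         # Bias error
            "mae": np.abs(err).mean(),                  # Mean Absolute Error
            "rmse": float(np.sqrt((err ** 2).mean()))}  # Root Mean Squared Error
```

A zero-centered Bias with small std indicates that over- and under-detection balance out, which is the behavior the following figures examine.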
The results for the dataset in [38] are summarized in Table 2. Upon examining the data, it is evident that the proposed algorithm achieves greater precision than the reference algorithm. Notably, performance is robust in Test 1, which represents outdoor scenarios. This is a positive outcome, because the proposal leverages texture information rather than relying solely on intensity. In indoor scenarios, the performance is closer to that of the reference algorithm [19].
The results for both methods in terms of recall are quite similar. However, the proposed algorithm generally generates bounding boxes that are slightly larger than the moving objects. This yields a trade-off: precision and recall are slightly penalized, but the moving object is fully detected. In contrast, the reference algorithm emphasizes object edges and produces motion maps with holes when objects are slightly oversized. In outdoor scenarios, the behavior of MMBS is superior to that of MOG.
Additionally, the proposed method handles shadows and reflections more effectively in indoor environments, with the limitation that the structural elements are slightly smaller than the shadows and artifacts generated by lights, whereas MOG is more sensitive to these factors. The F1 score and IoU results for both approaches are comparable, with the MMBS approach showing better overall performance in outdoor scenarios.
The model errors are presented graphically in Figure 7 for the three sequences. The results from the outdoor scenario in Figure 7a show that MMBS is more stable, adding less noise during motion. In the indoor scenarios, the MOG approach shows greater accuracy under stable conditions and in the absence of reflections (as indicated by the red peak in Figure 7b). This indicates that while our proposal is more stable, the MOG approach performs better in those circumstances, which is reflected in the probability density function (pdf) of the error. The proposal exhibits a slightly higher bias error in the indoor scenario than MOG, because the fine shape of the walking person is smaller than the structural elements used; this limitation is one of the practical challenges of MMBS in real situations. However, the error pdf for our proposal is more evenly centered around zero, whereas MOG shows a less favorable error distribution centered at a non-zero position. Finally, for the outdoor scenario, the RMSE shown in Figure 8a demonstrates that the motion detection accuracy of MMBS significantly outperforms that of the reference method, with greater precision and stability. In the indoor scenarios, the performance of both methods is similar; however, when shadows and reflections are introduced, the RMSE for MMBS remains nearly zero, while the RMSE for MOG tends to increase.
The second dataset [40] encompasses more extensive scenarios, including dynamic, indoor, and outdoor conditions, providing a more realistic testing environment. In this study, the enhanced MoG approach referenced in [37] was implemented to facilitate comparison with our proposal. The results concerning confusion measures are summarized in Table 3, highlighting the superior precision of our proposal compared to the enhanced MoG.
In baseline scenarios, represented by scenarios 1 to 4, our proposal generally outperformed the baseline, achieving up to a 22 % improvement in the best cases. In the worst-case scenario, our performance was only slightly inferior, approximately 6 % behind the reference. These baseline scenarios predominantly depict ideal conditions in an indoor environment, where our proposal demonstrates better overall efficiency. In scenarios with low resolutions or small moving objects, recall and precision decrease due to the size of the structural elements, which do not align perfectly with the motion zone. This issue is exacerbated in environments without texture, such as indoor settings.
When analyzing the dynamic scenarios, particularly the more challenging cases 5 to 10, precision drops significantly for both algorithms, but our proposal remains advantageous in every case. The best-performing scenario shows a difference of 25 % in favor of our proposal, while the worst case shows a 0.3 % advantage. Overall, the proposal improves on the enhanced MOG by 11 % . In terms of recall, the overall value is approximately 60 % for MMBS versus 44 % for the enhanced MOG, indicating a better capacity to detect moving objects without introducing noise into the motion map.
Regarding error measures, Table 4 shows that the biases in both scenarios are similar, with our proposal demonstrating a slight advantage. This suggests that it consistently adapts well to varying motion conditions. A smaller standard deviation than the reference algorithm further supports this adaptability. The bias distribution for our proposal is centered around 0, indicating general accuracy in modeling motion. In contrast, the reference method exhibits multiple modes in the bias, reflecting its inability to adapt effectively. Furthermore, the MAE and RMSE indicate that our proposal significantly reduces errors in motion detection.
To illustrate, Figure 9 and Figure 10 display the best and worst cases of our proposal in terms of RMSE across the tested frames. Figure 9a shows that the Bias error approaches 0 for our proposal in the best scenario, with the Bias distribution centered around 0. The small maximum on the left side of the probability density function represents instances where our proposal struggles to detect motion efficiently, yet it still does so with considerably lower errors than the enhanced MoG. Although the results for the enhanced MoG are similar, its expected value shows less Bias error while its deviation is larger. This observation is confirmed by Figure 10a, where the RMSE graph indicates that our proposal generally has lower errors than the enhanced MoG. In the worst-case scenario, seen in Figure 9b, our proposal miscategorizes motion objects owing to the dynamics of the scenario; however, it still maintains a lower bias than the enhanced MoG. The bias distributions for our proposal remain centered at 0, confirming its clear advantage over the reference algorithm. This behavior is validated in Figure 10b, where the proposal consistently exhibits lower RMSE from frame to frame.
(2) Vehicle analysis from a roundabout. To test the MMBS approach, we contextualize the scenario: in recent years, Mexico has experienced significant emigration from rural towns to urban areas, rapidly increasing urban population density. This growth necessitates improved resilience planning, and one such initiative involves optimizing the use of vital resources. The city has adopted various measures to ensure the quality of these essential resources; a key initiative is replacing traffic signals with roundabouts in suburban areas to keep vehicle traffic flowing quickly. As a result, monitoring roundabouts and intersections where high vehicle density leads to traffic congestion has become imperative.
To conduct our analysis, we focused on the roundabouts in the suburban areas of Querétaro, a city experiencing one of the fastest growth rates in Mexico. Figure 11 illustrates the following: (a) the location of Querétaro, (b) the position of Querétaro City, (c) the structure of the town, highlighting the town center in red and the suburban areas in blue; (d) the suburban areas where roundabouts have replaced traffic lights are shown in purple, while the red circles indicate the locations of the roundabouts examined in this study. As can be seen, the test scenario corresponds to the primary access points to significant avenues, representing some of the most critical traffic resources in the area.
The morphological and MOG approaches have been used for a qualitative comparison. Figure 12 illustrates the morphological approach with several frames over time, showing the motion map and the texture information. Similarly, Figure 13 displays various motion maps and background estimations from the contrast algorithm. Both approaches create a probabilistic model; the reference model is based directly on pixel intensity. However, using direct intensity values has the disadvantage of treating luminance changes as different values, or of dynamically compensating for the camera's white balance, which can introduce significant noise into the motion map. In contrast, the morphological approach does not rely on pixel intensity; it focuses on local texture information. By analyzing local information, this method extracts the structures formed by pixel intensities rather than the intensity values themselves, making it more reliable under varying lighting conditions and effectively reducing noise in the motion map. In the background layers of Figure 12 and Figure 13, the convergence of the background estimate is illustrated; the estimated foreground layer is shown in grayscale. For the proposed method, this reflects the learned local texture information, while for the reference algorithm it represents the average intensity. It is evident that the proposed method focuses on the local texture structure, whereas the reference algorithm emphasizes pixel intensity. In the motion layers of Figure 12 and Figure 13, the morphological approach demonstrates that texture information is effectively learned. This increases robustness against local lighting changes within the same video frame, improving the classification of foreground and background planes based on texture. In contrast, the MOG approach relies on intensity sensing, which makes it more vulnerable to inconsistent lighting conditions.
As noted, there are frames where objects appear fused, which is problematic when analyzing vehicles. In this regard, the morphological approach provides a more precise differentiation of vehicles while minimizing motion noise.
In motion detection, a colored map represents motion densities, which helps identify areas with heavy infrastructure use. This application is illustrated in Figure 14, where a superimposed colored map shows vehicle density over time. The data were collected over a half-hour period at 3:00 PM (Mexico City time). Vehicle patterns can be observed in this sample, highlighting different density zones. Notably, several areas in the lower section of the roundabout exhibit varying densities, owing to the continuous flow of vehicles in that region.
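The density map of Figure 14 can be built by accumulating per-frame motion maps. The sketch below normalizes by the number of frames so each value is the fraction of time a pixel was in motion; this normalization choice and the function name are assumptions, not the paper's exact procedure.

```python
import numpy as np

def density_map(motion_maps):
    """Accumulate per-frame binary motion maps into a normalized
    density map; higher values mark zones of heavier vehicle flow.
    The result can be rendered as a superimposed colored overlay."""
    acc = np.zeros(motion_maps[0].shape, dtype=np.float64)
    for m in motion_maps:
        acc += m.astype(np.float64)
    return acc / len(motion_maps)   # fraction of frames each pixel was in motion
```

Applied over a half-hour of frames, high-density values concentrate along the lanes with continuous flow, reproducing the kind of zoning the figure describes.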
Finally, the criteria for detecting motion are effective in outdoor environments. The main contribution of this work represents a significant advance in modeling systems, because the task is fully modeled as a discrete one using mathematical morphology, which, computationally, makes the resulting discrete algorithms even more efficient than MOG approaches based on continuous spaces. The clear advantage is that the algorithm is a discrete model that is easy to implement on a computer with relatively low complexity, since it is implemented as an integer algorithm. These criteria facilitate quick convergence and establish a stable background subtraction model. The figure depicts scenarios where a drone is in motion, leading to an inconsistent background; in these cases, the proportion of foreground suddenly increases, prompting the model to restart, and within a few frames a consistent model for motion detection is re-established.

3.3. Further Works

This paper presents a morphological background approach. However, some aspects are reserved for future work. The following points are considered for further publications:
  • An analysis of sensitivity regarding the use of various structural elements to determine which properties are most suitable for different scenarios.
  • An examination of which color component from different color spaces is most effective for detecting the motion map based on specific scenario conditions.
  • An analysis of improved algorithm performance by implementing a parallelized version, aiming to increase the frame-per-second processing rate.
  • An analysis of different temporal patterns of vehicle traffic to detect the most common flow patterns, helping to determine better criteria for utilizing avenue resources.

4. Conclusions

In this document, we address the segmentation of complex scenes, labeling the foreground (moving objects) and background (static objects) despite the number of external variables involved. We introduce the Mathematical Morphological Background Subtraction (MMBS) method. This approach utilizes various properties of mathematical morphology to model texture information from an image sequence as a discrete problem. The morphological operators enable the development of an algorithm well suited for outdoor environments, making it applicable to smart city initiatives. We compare this approach's robustness, accuracy, and precision against well-established methods on ground-truth datasets. The results demonstrate the feasibility of operating in dynamic scenarios, which will be crucial for future efforts in segmenting the dynamics of moving objects.

Author Contributions

Conceptualization, E.-J.M.-M. and H.J.-H.; formal analysis, D.C.-E., H.J.-H. and A.-M.H.-N.; investigation, E.-J.M.-M. and H.J.-H.; methodology, D.C.-E., H.J.-H. and A.-M.H.-N.; software, E.-J.M.-M. and H.J.-H.; supervision, H.J.-H.; validation, E.-J.M.-M. and A.-M.H.-N.; writing—original draft, E.-J.M.-M., D.C.-E., H.J.-H. and A.-M.H.-N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We wish to thank the CIICCTE (Centro de Investigación e Innovación en Ciencias de la Computación y Tecnología Educativa) laboratory belonging to the FIF-UAQ, which provided technical and infrastructure support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. United Nations Department of Economic and Social Affairs (UN DESA). Urbanization and Development: An Overview; United Nations: New York, NY, USA, 2022. [Google Scholar]
  2. United Nations Department of Economic and Social Affairs (UN DESA). 68% of the World Population Projected to Live in Urban Areas by 2050, Says UN; United Nations: New York, NY, USA, 2018. [Google Scholar]
  3. United Nations Development Programme (UNDP). Sustainable Urbanization Strategy; UNDP: New York, NY, USA, 2021. [Google Scholar]
  4. Pumain, D. Les Villes Sont des Objets Complexes; Le Monde: Paris, France, 2024. [Google Scholar]
  5. United Nations Human Settlements Programme (UN-Habitat). Urbanization and Its Challenges in the 21st Century; PMC: Nairobi, Kenya, 2023. [Google Scholar]
  6. Ritchie, H.; Roser, M. Urbanization. Our World in Data. 2018. Available online: https://ourworldindata.org/urbanization (accessed on 25 January 2025).
  7. Nerella, S.; Guan, Z.; Siegel, S.; Zhang, J.; Khezeli, K.; Bihorac, A.; Rashidi, P. AI-Enhanced Intensive Care Unit: Revolutionizing Patient Care with Pervasive Sensing. arXiv 2023, arXiv:2303.06252. [Google Scholar]
  8. Bahri, F.; Ray, N. Dynamic Background Subtraction by Generative Neural Networks. arXiv 2022, arXiv:2202.05336. [Google Scholar]
  9. Mandal, V.; Mussah, A.R.; Jin, P.; Adu-Gyamfi, Y. Artificial Intelligence Enabled Traffic Monitoring System. arXiv 2020, arXiv:2010.01217. [Google Scholar] [CrossRef]
  10. Giraldo, J.H.; Bouwmans, T. GraphBGS: Background Subtraction via Recovery of Graph Signals. arXiv 2020, arXiv:2001.06404. [Google Scholar]
  11. Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104. [Google Scholar] [CrossRef]
  12. Sheikh, Y.; Javed, O.; Kanade, T. Background Subtraction for Freely Moving Cameras. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 1219–1225. [Google Scholar] [CrossRef]
  13. Bouwmans, T.; Baf, F.E.; Vachon, B. Background Modeling using Mixture of Gaussians for Foreground Detection: A Survey. Recent Patents Comput. Sci. 2014, 1, 219–237. [Google Scholar] [CrossRef]
  14. Sakkos, D.; Shum, H.P.H.; Ho, E.S.L. Illumination-Based Data Augmentation for Robust Background Subtraction. arXiv 2019, arXiv:1910.08470. [Google Scholar]
  15. Pilet, J.; Lepetit, V.; Fua, P. Making Background Subtraction Robust to Sudden Illumination Changes. In Proceedings of the Computer Vision–ECCV 2008, Marseille, France, 12–18 October 2008; pp. 57–70. [Google Scholar] [CrossRef]
  16. Lin, L.; Xu, Y.; Liang, X.; Lai, J. Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models. arXiv 2015, arXiv:1502.00344. [Google Scholar]
  17. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep Neural Network Concepts for Background Subtraction: A Systematic Review and Comparative Evaluation. arXiv 2018, arXiv:1811.05255. [Google Scholar]
  18. Wren, C.R.; Azarbayejani, A.; Darrell, T.; Pentland, A.P. Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 780–785. [Google Scholar] [CrossRef]
  19. Stauffer, C.; Grimson, W. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Fort Collins, CO, USA, 23–25 June 1999; Volume 2, pp. 246–252. [Google Scholar]
  20. Zivkovic, Z.; van der Heijden, F. Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit. Lett. 2006, 27, 773–780. [Google Scholar] [CrossRef]
  21. Tezcan, M.O.; Ishwar, P.; Konrad, J. BSUV-Net 2.0: Spatio-Temporal Data Augmentations for Video-Agnostic Supervised Background Subtraction. arXiv 2021, arXiv:2101.09585. [Google Scholar] [CrossRef]
  22. Barnich, O.; Van Droogenbroeck, M. ViBe: A universal background subtraction algorithm for video sequences. IEEE Trans. Image Process. 2011, 20, 1709–1724. [Google Scholar] [CrossRef]
  23. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep learning for background subtraction: A comprehensive review and comparative evaluation. Neurocomputing 2020, 388, 77–105. [Google Scholar]
  24. Braham, M.; Van Droogenbroeck, M. Deep semantic background subtraction. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2335–2339. [Google Scholar]
  25. Jiménez-Hernández, H. Background Subtraction Approach Based on Independent Component Analysis. Sensors 2010, 10, 6092–6114. [Google Scholar] [CrossRef]
  26. Houhou, I.; Zitouni, A.; Ruichek, Y.; Benslimane, D. RGBD deep multi-scale network for background subtraction. Int. J. Multimed. Inf. Retr. 2022, 11, 395–407. [Google Scholar] [CrossRef]
  27. Cioppa, A.; Van Droogenbroeck, M.; Braham, M. Real-Time Semantic Background Subtraction. arXiv 2020, arXiv:2002.04993. [Google Scholar]
  28. Machado, P.; Oikonomou, A.; Ferreira, J.F.; Mcginnity, T.M. HSMD: An Object Motion Detection Algorithm Using a Hybrid Spiking Neural Network Architecture. IEEE Access 2021, 9, 125258–125268. [Google Scholar] [CrossRef]
  29. Song, S.; Kim, J. SFMOG: Super Fast MOG based Background Subtraction Algorithm. J. IKEEE 2019, 23, 1415–1422. [Google Scholar]
  30. Liu, J.; Chen, Y. Efficient parallel implementation of Gaussian Mixture Model background subtraction algorithm on an embedded multi-core Digital Signal Processor. Comput. Electr. Eng. 2023, 105, 108827. [Google Scholar] [CrossRef]
  31. Martins, I.; Carvalho, P.; Corte-Real, L.; Alba-Castro, J. Texture collinearity foreground segmentation for night videos. Comput. Vis. Image Underst. 2020, 23, 1415–1422. [Google Scholar] [CrossRef]
  32. Sun, L. Global vision object detection using an improved Gaussian Mixture model based on contour. PeerJ Comput. Sci. 2024, 10, e1812. [Google Scholar] [CrossRef] [PubMed]
  33. Xu, J.; Li, J.; Liu, X. Background removal using Gaussian mixture model for optical camera communications. Opt. Lasers Eng. 2025, 168, 107211. [Google Scholar] [CrossRef]
  34. Maddalena, L.; Petrosino, A. Background Subtraction for Moving Object Detection in Video. ACM Comput. Surv. 2018, 51, 1–45. [Google Scholar]
  35. An, Y.; Zhao, X.; Yu, T.; Guo, H.; Zhao, C.; Tang, M.; Wang, J. ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection. arXiv 2023, arXiv:2303.14679. [Google Scholar]
  36. Herrera-Navarro, A.; Terol-Villalobos, I.; Jiménez-Hernández, H.; Peregrina-Barreto, H.; González Barbosa, J. Intracellular calcium variation analysis of follicular cells. Rev. Mex. Ing. Bioméd. 2013, 34, 71–87. [Google Scholar]
  37. Trejo-Morales, A.; Córdova-Esparza, D.; Rosas-Raya, C.; Herrera-Navarro, A.; Jiménez-Hernández, H. Motion detection using MoG: A parametric analysis. Rev. Int. Investig. Innovación Tecnológica (RIIIT) 2022, 56, 9. [Google Scholar]
  38. Ferryman, J.M. Performance Evaluation of Tracking and Surveillance (PETS), Proceedings of the 2nd IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Kauai, HI, USA, 9 December 2001; IEEE: Piscataway, NJ, USA, 2001. [Google Scholar]
  39. Thirde, D.; Ferryman, J.; Crowley, J. Performance Evaluation of Tracking and Surveillance (PETS), Proceedings of the 2nd IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, New York, NY, USA, 18 June 2006; IEEE: Piscataway, NJ, USA, 2006. [Google Scholar]
  40. Wang, Y.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An Expanded Change Detection Benchmark Dataset. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; CVPRW ’14. pp. 393–400. [Google Scholar] [CrossRef]
Figure 1. Image representation as a topological grid.
Figure 2. Noise reduction: (a) before applying filter and (b) after morphological median filter.
Figure 3. Morphological texture process.
Figure 4. Schematic diagram illustrating the modeling background approach, where the highlighted pixel box indicates the reference position, while the red line represents a distribution approximation.
Figure 5. Expected value and concavity change criterion.
Figure 6. Matrix counter representation of pdf.
Figure 7. Bias error on the PETS dataset [38] for MMBS and MoG; the graphs on the right side represent the error bias distribution for both approaches, where a zero-centered symmetrical distribution indicates better motion detection performance; (a) denotes the outdoor scenario; (b,c) represent indoor scenes with artificial light conditions.
Figure 8. RMSE error on the PETS dataset [38] for MMBS and MoG; in this context, smaller values indicate greater accuracy. In (a), the outdoor conditions cause MOG to be less accurate than the proposal; in the indoor conditions of (b,c), both approaches perform similarly.
Figure 9. Bias error computed on the CDnet 2014 dataset [40] for the MMBS and enhanced MoG approaches, where (a) shows the scene with the best motion-detection performance and (b) the scene with the worst.
Figure 10. RMSE error computed on the CDnet 2014 dataset [40] using the MMBS and enhanced MoG approaches, where (a) shows the scene with the best motion-detection performance and (b) the scene with the worst.
Figure 11. Real testing scenario for analyzing traffic flow: (a) the state location; (b) the city location; (c) the delegation location; (d) the roundabout location.
Figure 12. Frames taken from MMBS approach in the test scenario.
Figure 13. Frames computed from MoG approach [19] in the test scenario.
Figure 14. Vehicle density on a roundabout.
Table 1. Buffer size at different image resolutions.
| Resolution | Width | Height | 8 bits ¹ | 10 bits ¹ | 12 bits ¹ |
|---|---|---|---|---|---|
| 360p | 640 | 360 | 56.3 | 225.0 | 900.0 |
| 480p | 854 | 480 | 100.1 | 400.3 | 1601.3 |
| 720p | 1280 | 720 | 225.0 | 900.0 | 3600.0 |
| 1080p | 1920 | 1080 | 506.3 | 2025.0 | 8100.0 |
| 1440p | 2560 | 1440 | 900.0 | 3600.0 | 14,400.0 |
| 4K | 3840 | 2160 | 2025.0 | 8100.0 | 32,400.0 |

¹ Measured in mega counters.
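The buffer sizes in Table 1 can be reproduced with a short calculation, under two assumptions inferred from the tabulated values rather than stated explicitly: the model keeps one counter per intensity level per pixel, and "mega" denotes 2²⁰ counters. The helper name below is hypothetical:

```python
def buffer_megacounters(width: int, height: int, bits: int) -> float:
    # One counter per intensity level per pixel gives
    # width * height * 2**bits counters in total;
    # "mega" is taken as 2**20 (inferred from the table).
    return width * height * 2 ** bits / 2 ** 20

# e.g. 720p at 8 bits of intensity depth:
print(buffer_megacounters(1280, 720, 8))  # 225.0
```

Each extra 2 bits of intensity depth quadruples the number of levels, which is why every column in Table 1 is four times the previous one.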
Table 2. Confusion measures computed with (a) the MMBS approach and (b) the MoG approach on the dataset of [38].
(a)

| No | Scenario | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| 1 | Test 1 | 0.38966 | 0.75231 | 0.51341 | 0.34536 |
| 2 | Test 2 | 0.51876 | 0.57041 | 0.54336 | 0.37302 |
| 3 | Test 3 | 0.53064 | 0.49542 | 0.51242 | 0.34447 |

(b)

| No | Scenario | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| 1 | Test 1 | 0.02790 | 0.54002 | 0.05306 | 0.02725 |
| 2 | Test 2 | 0.56424 | 0.77703 | 0.65376 | 0.48562 |
| 3 | Test 3 | 0.57203 | 0.41362 | 0.48009 | 0.31587 |
Note: highlighting in the published table marks the maximum and minimum values.
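For reference, the confusion measures reported in Tables 2 and 3 follow the standard per-pixel definitions over binary foreground masks. The sketch below is a generic illustration of those definitions, not the authors' evaluation code, and the helper name is hypothetical:

```python
def confusion_metrics(pred, gt):
    """Precision, recall, F1 score and IoU for binary foreground masks.
    pred, gt: flat sequences of 0/1 pixel labels (prediction, ground truth)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)          # true positives
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)      # false positives
    fn = sum(1 for p, g in zip(pred, gt) if g and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Toy example: one TP, one FP, one FN
p, r, f1, iou = confusion_metrics([1, 1, 0, 0], [1, 0, 1, 0])
print(p, r, f1)  # 0.5 0.5 0.5
```

Note that F1 and IoU are monotonically related (IoU = F1 / (2 − F1)), so the two columns always rank scenarios identically.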
Table 3. Statistical measures computed with (a) the MMBS approach and (b) the approach of [37] on the dataset of [40].
(a)

| No | Scenario | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| 1 | HighWay | 0.46359 | 0.75820 | 0.57537 | 0.40388 |
| 2 | Office | 0.54534 | 0.44720 | 0.49142 | 0.32575 |
| 3 | Pedestrian | 0.41343 | 0.87766 | 0.56208 | 0.39090 |
| 4 | Pets2006 | 0.32404 | 0.42469 | 0.36760 | 0.22519 |
| 5 | Boats | 0.10741 | 0.60489 | 0.18242 | 0.10036 |
| 6 | Canoe | 0.29241 | 0.52898 | 0.37663 | 0.23200 |
| 7 | Fall | 0.10654 | 0.43008 | 0.17078 | 0.09336 |
| 8 | Fountain 1 | 0.017934 | 0.65089 | 0.03491 | 0.01776 |
| 9 | Fountain 2 | 0.17645 | 0.59594 | 0.27228 | 0.15759 |
| 10 | Overpass | 0.23281 | 0.57132 | 0.33082 | 0.19819 |

(b)

| No | Scenario | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|
| 1 | HighWay | 0.26669 | 0.33107 | 0.29541 | 0.17330 |
| 2 | Office | 0.31667 | 0.12260 | 0.17676 | 0.09695 |
| 3 | Pedestrian | 0.47295 | 0.88376 | 0.61616 | 0.44525 |
| 4 | Pets2006 | 0.32034 | 0.55064 | 0.40504 | 0.25395 |
| 5 | Boats | 0.00901 | 0.25911 | 0.01741 | 0.00878 |
| 6 | Canoe | 0.03778 | 0.24702 | 0.06553 | 0.03388 |
| 7 | Fall | 0.07405 | 0.48349 | 0.12843 | 0.06862 |
| 8 | Fountain 1 | 0.00869 | 0.66048 | 0.01715 | 0.00865 |
| 9 | Fountain 2 | 0.02033 | 0.60599 | 0.03933 | 0.02006 |
| 10 | Overpass | 0.06016 | 0.31101 | 0.10081 | 0.05308 |
Note: highlighting in the published table marks the maximum and minimum values.
Table 4. Statistical measures computed with (a) the MMBS approach and (b) the approach of [37] on the dataset of [40].
(a)

| No | Scenario | Bias | std Bias | MAE | RMSE |
|---|---|---|---|---|---|
| 1 | HighWay | 0.00735 | 0.03708 | 0.02640 | 0.00142 |
| 2 | Office | 0.02496 | 0.01988 | 0.02498 | 0.00101 |
| 3 | Pedestrian | 0.00224 | 0.00352 | 0.00263 | 0.000017 |
| 4 | Pets2006 | 0.00208 | 0.01633 | 0.01222 | 0.00027 |
| 5 | Boats | 0.00988 | 0.00982 | 0.01197 | 0.00019 |
| 6 | Canoe | 0.00305 | 0.01933 | 0.01460 | 0.00038 |
| 7 | Fall | 0.03854 | 0.04751 | 0.04691 | 0.00374 |
| 8 | Fountain 1 | 0.01729 | 0.00795 | 0.01729 | 0.00036 |
| 9 | Fountain 2 | 0.00174 | 0.00411 | 0.00331 | 0.000019 |
| 10 | Overpass | 0.00867 | 0.02139 | 0.01578 | 0.00053 |

(b)

| No | Scenario | Bias | std Bias | MAE | RMSE |
|---|---|---|---|---|---|
| 1 | HighWay | 0.00850 | 0.05667 | 0.04043 | 0.00328 |
| 2 | Office | 0.04560 | 0.05105 | 0.05356 | 0.00468 |
| 3 | Pedestrian | 0.00313 | 0.00453 | 0.00350 | 0.00003 |
| 4 | Pets2006 | 0.00271 | 0.01756 | 0.01121 | 0.00031 |
| 5 | Boats | 0.06837 | 0.01524 | 0.06837 | 0.00490 |
| 6 | Canoe | 0.05799 | 0.04239 | 0.06454 | 0.00515 |
| 7 | Fall | 0.07183 | 0.04861 | 0.07880 | 0.00752 |
| 8 | Fountain 1 | 0.03902 | 0.01455 | 0.03902 | 0.00173 |
| 9 | Fountain 2 | 0.04031 | 0.01619 | 0.04031 | 0.00188 |
| 10 | Overpass | 0.03232 | 0.02950 | 0.03964 | 0.00191 |
Note: highlighting in the published table marks the maximum and minimum values.
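The error statistics in Table 4 (bias, standard deviation of the bias, MAE, RMSE) follow the standard definitions over a sequence of per-frame detection errors. The sketch below is an illustrative implementation of those definitions, not the authors' code:

```python
import math

def error_stats(errors):
    """Bias, std of the bias, MAE and RMSE for a sequence of
    per-frame detection errors (e.g. predicted minus ground-truth
    foreground fraction)."""
    n = len(errors)
    bias = sum(errors) / n                                    # mean error
    std = math.sqrt(sum((e - bias) ** 2 for e in errors) / n) # spread around the bias
    mae = sum(abs(e) for e in errors) / n                     # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)          # root mean squared error
    return bias, std, mae, rmse

# By construction, rmse**2 == bias**2 + std**2.
print(error_stats([1.0, -1.0]))  # (0.0, 1.0, 1.0, 1.0)
```

One caveat worth noting: under these definitions RMSE² = bias² + std², and the tabulated "RMSE" values appear numerically consistent with bias² + std² itself (e.g. Table 4a, HighWay: 0.00735² + 0.03708² ≈ 0.00142), i.e. with the mean squared error rather than its square root.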
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Moreno-Mejía, E.-J.; Canton-Enriquez, D.; Herrera-Navarro, A.-M.; Jiménez-Hernández, H. Morphological Background-Subtraction Modeling for Analyzing Traffic Flow. Modelling 2025, 6, 38. https://doi.org/10.3390/modelling6020038