To improve the extraction of vehicle contour features and reduce the interference of complex backgrounds with vehicle targets, the FFD-ViBe algorithm is combined with the MD-SILBP operator to extract vehicle image contours, enabling better feature extraction and analysis of the vehicle images.
3.2. Improvement of FFD-ViBe Algorithm
The frame difference method and the ViBe algorithm are commonly used for motion target detection, but they have limitations in extracting complete vehicle outlines and eliminating ghost shadows. To address these issues, the FFD-ViBe algorithm with a hash value is proposed. This algorithm combines frame difference and ViBe, using a hash value to fill the voids in the vehicle foreground and effectively eliminate ghosting. It has shown improved performance in vehicle target detection compared to the individual methods.
The perceptual hash algorithm [24] is an image search algorithm that generates a hash value for an image and finds similar images based on this value.
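The hash comparison above can be sketched as follows. This is a minimal average-hash variant with a Hamming-distance test; the exact perceptual hash of [24] may differ, and the threshold value is a placeholder, not a value from the paper.

```python
# Minimal average-hash sketch (an assumption; [24] may use a different hash).
import numpy as np

def average_hash(image, size=8):
    """Downsample to size x size by block means, threshold at the mean."""
    h, w = image.shape
    # Crude block-mean resize (assumes h and w are multiples of `size`).
    small = image.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two boolean hash vectors."""
    return int(np.count_nonzero(hash_a != hash_b))

# Two synthetic 64x64 "frames": a bright block in two different positions.
frame1 = np.zeros((64, 64)); frame1[10:30, 10:30] = 255
frame2 = np.zeros((64, 64)); frame2[30:50, 30:50] = 255

T = 4  # hypothetical Hamming-distance threshold
d = hamming_distance(average_hash(frame1), average_hash(frame2))
# d > T marks frame2 as sufficiently different to become a new standard frame.
```

When the distance exceeds the threshold, the scene has changed enough for the frame to serve as the next standard frame.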
First, we select the $f_{k_1}$ frame as the standard frame and calculate its hash value. Then, frame by frame, we compute the hash value of each subsequent image and its Hamming distance from the standard frame, and set a threshold value $T$; when the Hamming distance is greater than $T$, the corresponding $f_{k_2}$ frame is set as the new standard frame. The $f_{k_3}$, $f_{k_4}$, and $f_{k_5}$ frames are found by the same operation. The five frames are then differenced two by two:
To address the issue of uneven illumination in images, this paper adopts an image binarization algorithm based on image chunking. Specifically, the algorithm partitions the image into smaller regions and determines the optimal threshold for each region using the Otsu method, yielding a per-region optimal threshold and the corresponding binary image. This approach effectively mitigates the effects of uneven illumination.
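The block-wise Otsu binarization described above can be sketched as follows; the 4×4 grid size is an assumption for illustration, not a value from the paper.

```python
# Block-wise Otsu binarization sketch (grid size is a hypothetical choice).
import numpy as np

def otsu_threshold(block):
    """Exhaustively search the threshold maximizing between-class variance."""
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    total = block.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                        # mean of the "dark" class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the "bright" class
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def blockwise_otsu(image, grid=4):
    """Binarize each sub-block with its own locally computed Otsu threshold."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=np.uint8)
    bh, bw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            blk = image[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            out[i*bh:(i+1)*bh, j*bw:(j+1)*bw] = (blk > otsu_threshold(blk)) * 255
    return out

# Simulate uneven illumination: a horizontal gradient plus a bright object.
img = np.tile(np.linspace(0, 120, 64), (64, 1)).astype(np.uint8)
img[20:40, 20:40] += 100
binary = blockwise_otsu(img)
```

Because each block gets its own threshold, the bright object is separated correctly even where the global illumination gradient would defeat a single global threshold.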
$G_1$ is obtained by combining the first difference image with the third, and $G_2$ by combining the second difference image with the fourth. Finally, the obtained $G_1$ and $G_2$ are merged to obtain the motion region image $G$.
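The pairwise-difference combination above can be sketched as follows, assuming (as is common for five-frame differencing) that consecutive frames are differenced, alternating pairs are AND-ed, and the two results are OR-ed; the paper's exact operations may differ.

```python
# Five-frame difference sketch (AND/OR combination is an assumption).
import numpy as np

def binary_diff(a, b, thresh=25):
    """Thresholded absolute difference of two grayscale frames."""
    return (np.abs(a.astype(np.int16) - b.astype(np.int16)) > thresh).astype(np.uint8)

def five_frame_motion(f1, f2, f3, f4, f5):
    d1 = binary_diff(f1, f2)
    d2 = binary_diff(f2, f3)
    d3 = binary_diff(f3, f4)
    d4 = binary_diff(f4, f5)
    g1 = d1 & d3          # combine first and third difference images
    g2 = d2 & d4          # combine second and fourth difference images
    return g1 | g2        # merge into the motion region image G

# Synthetic frames: a bright block moving right by 8 px per frame.
frames = []
for k in range(5):
    f = np.zeros((32, 64), dtype=np.uint8)
    f[8:24, 8 + 8*k : 24 + 8*k] = 200
    frames.append(f)
motion = five_frame_motion(*frames)
```

The AND step suppresses pixels that changed in only one pair (noise), while the OR step recovers a fuller motion region than any single pair provides.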
The morphology of image $G$ is processed to obtain relatively complete background pixels. Then, the motion region in image $G$ is filled with the corresponding background pixels from the other four frames, and any region that remains unfilled is filled using its eight neighboring pixels, yielding the filled background image. Finally, the background sample library is built from this background image using the same modeling method as in the original ViBe algorithm. The resulting background model is shown in Figure 8. Data taken from CDnet2014.
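The final modeling step above follows the original ViBe initialization, in which each pixel's background samples are drawn from its 8-neighborhood. A minimal sketch, with the sample count and the per-plane (rather than per-pixel) neighbor choice as simplifying assumptions:

```python
# ViBe-style sample-library initialization sketch from one background image.
import numpy as np

rng = np.random.default_rng(0)

def init_sample_library(background, n_samples=20):
    """Build n_samples background planes, each a random-neighbor copy."""
    h, w = background.shape
    padded = np.pad(background, 1, mode='edge')
    samples = np.empty((n_samples, h, w), dtype=background.dtype)
    # Offsets of the 8 neighbors around each pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for s in range(n_samples):
        dy, dx = offsets[rng.integers(len(offsets))]
        # Simplification: one neighbor direction per plane; the original
        # ViBe draws a fresh random neighbor independently per pixel.
        samples[s] = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return samples

bg = rng.integers(0, 256, size=(48, 64), dtype=np.uint8)
library = init_sample_library(bg)
```

At detection time, a pixel is classified as background when enough of its library samples lie within a radius of its current value.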
3.3. MD-SILBP Shadow Removal Algorithm
Shadows in images are caused by objects blocking direct sunlight, and their brightness drops only slightly relative to the occluding object. Since video is usually a sequence of RGB images, the high correlation and redundancy among the R, G, and B components make it challenging to eliminate shadows from the foreground image. To overcome this, the image is converted to the YCbCr color space, which effectively reduces shadow interference.
where $Y$ denotes the pixel luminance, $C_b$ denotes the blue chromaticity component of the pixel, and $C_r$ denotes the red chromaticity component of the pixel. In the YCbCr color space, a change in luminance does not change the values of the two chromaticity components. Using the relationship among the $Y$, $C_b$, and $C_r$ components, the real foreground can be roughly distinguished from the shadows, so the effect of shadows can be eliminated. If a pixel satisfies the following formula, it is judged to be part of the shadow.
In the formula, $Y_F$, $C_{bF}$, and $C_{rF}$ are the three chromaticity components of the current pixel in the YCbCr color space; $Y_B$, $C_{bB}$, and $C_{rB}$ are the three chromaticity components of the corresponding background pixel in the YCbCr color space; $T_Y$ is the luminance threshold, $T_{C_b}$ is the blue-component chrominance threshold, and $T_{C_r}$ is the red-component chrominance threshold. When all three component chromaticities satisfy Equation (6), the shadow part can be roughly determined.
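The YCbCr shadow test can be sketched as follows. The BT.601 conversion is standard, but the threshold values and the exact comparison form are assumptions, since Equation (6) is not reproduced here: a pixel whose luminance drops while both chromaticity components stay close to the background is flagged as shadow.

```python
# YCbCr shadow-mask sketch (thresholds and comparison form are assumptions).
import numpy as np

def rgb_to_ycbcr(img):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def shadow_mask(frame, background, t_y=30, t_cb=10, t_cr=10):
    yf, cbf, crf = rgb_to_ycbcr(frame.astype(np.float64))
    yb, cbb, crb = rgb_to_ycbcr(background.astype(np.float64))
    darker  = (yb - yf) > t_y             # luminance clearly drops
    same_cb = np.abs(cbf - cbb) < t_cb    # blue chromaticity preserved
    same_cr = np.abs(crf - crb) < t_cr    # red chromaticity preserved
    return darker & same_cb & same_cr

bg = np.full((16, 16, 3), 180, dtype=np.uint8)
frame = bg.copy()
frame[4:12, 4:12] = 100                   # uniform darkening ~ a cast shadow
mask = shadow_mask(frame, bg)
```

A real foreground object would typically shift the chromaticity components as well, so it fails the `same_cb`/`same_cr` tests and is kept as foreground.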
The identified shadow regions are subsequently processed using the local binary pattern (LBP) operator, which describes local texture features. However, LBP is susceptible to image noise. To address this, the scale-invariant LBP (SILBP) operator is utilized, which is robust against scale changes but still sensitive to noise. The MD-SILBP operator is therefore applied to handle both scale transformation and image noise. The operator arranges the central pixel with its neighbors in ascending order, finds the median, and compares it with the remaining pixel values to produce the encoding result.
The three operators, LBP, LTP, and SILTP, are susceptible to random noise and equal-scale variation interference, leading to different coding results. The MD-SILTP coding method [25] can effectively mitigate these two interferences, indicating that the operator is robust to image noise and scale variation.
Here, $g_c$ is the pixel value of the central pixel in the pixel region and $g_i$ is the pixel value of the $i$-th neighborhood pixel in the pixel region. The following equation calculates the MD-SILBP operator.
where $s_{\tau}$ is the threshold function with scale factor $\tau$, $g_m$ is the median of all pixels in each pixel block, $\oplus$ is the splicing operator of the binary strings, and $N$ is the number of neighborhood pixels in each sub-block.
The threshold function is expressed as:
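The median-based encoding described above can be sketched as follows. An SILTP-style two-bit threshold function is assumed here; the exact function used in [25] may differ.

```python
# MD-SILBP encoding sketch (the two-bit threshold function is an assumed
# SILTP-style form; tau is the scale factor).
import numpy as np

def threshold_code(median, neighbor, tau):
    """Two-bit scale-invariant code against the block median."""
    if neighbor > (1 + tau) * median:
        return "01"
    if neighbor < (1 - tau) * median:
        return "10"
    return "00"

def md_silbp(block, tau=0.05):
    """Encode a 3x3 block: sort all 9 pixels, take the median, compare each
    of the 8 neighbors to it, and splice the two-bit codes into one string."""
    flat = np.sort(block.flatten())
    median = flat[len(flat) // 2]            # median of the sorted pixels
    codes = []
    for i in range(3):
        for j in range(3):
            if (i, j) != (1, 1):             # skip the central pixel
                codes.append(threshold_code(median, block[i, j], tau))
    return "".join(codes)                    # splicing (concatenation)

block = np.array([[90, 92, 91],
                  [89, 90, 250],             # one noisy/bright pixel
                  [91, 90, 88]])
code = md_silbp(block)
# Scale invariance: scaling all pixels by a constant leaves the code unchanged.
assert md_silbp(block * 2) == code
```

Using the median rather than the center pixel means a single corrupted pixel (like the 250 above) changes at most its own two-bit code, not the reference value for the whole block.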
The texture histogram $H$ is constructed by computing the MD-SILBP operator over each 8-neighborhood and counting the probability of occurrence of each operator value across the whole image. The MD-SILBP operator is applied to the image background frames, and the resulting texture features are used for background MD-SILBP modeling. The distance between the current-frame texture histogram $H_t$ and the background texture histogram $H_b$ in the $k$-th candidate region is as follows:
In the above equation, $N_b$ is the number of bins in the candidate region. Then, the relationship between the two histograms is further expressed by calculating the texture correlation between the candidate region in the current frame and the image background frame.
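The distance-then-correlation step can be sketched as follows. Since the paper's distance equation is not reproduced here, a simple bin-wise L1 distance is assumed, and the correlation is computed as the fraction of distances falling below the threshold via the unit step function.

```python
# Histogram-distance and texture-correlation sketch (L1 distance assumed).
import numpy as np

def histogram_distance(h_t, h_b):
    """Assumed bin-wise L1 distance between normalized texture histograms."""
    return float(np.abs(h_t - h_b).sum())

def texture_correlation(distances, t_s):
    """(1/M) * sum_i u(t_s - d_i): fraction of candidate-region pixels whose
    histogram distance falls below the threshold t_s (u is the unit step)."""
    distances = np.asarray(distances, dtype=np.float64)
    return float((distances < t_s).mean())

h_bg      = np.array([0.25, 0.25, 0.25, 0.25])   # background texture histogram
h_similar = np.array([0.24, 0.26, 0.25, 0.25])   # texture preserved (shadow-like)
h_object  = np.array([0.70, 0.10, 0.10, 0.10])   # texture changed (real object)

d_sim = histogram_distance(h_similar, h_bg)
d_obj = histogram_distance(h_object, h_bg)
rho = texture_correlation([d_sim, d_obj], t_s=0.1)
```

A shadow tends to preserve the underlying surface texture, so its histogram stays close to the background's, while a genuine vehicle region produces a clearly larger distance.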
In the above formula, $\rho$ represents the texture correlation of the candidate region between the current frame and the image background frame; $M$ is the number of pixels in the candidate region; $u(\cdot)$ is the unit step function; and $T_s$ is the threshold. When the histogram distance is less than $T_s$, the value of $u(\cdot)$ is 1, indicating a smaller correlation between the current frame and the background frame, so the region is more likely to be judged as shadow. Conversely, when the value of $u(\cdot)$ is 0, the correlation between the current frame and the background frame is more significant, and the region is more likely to be judged as background. The effect of the improved moving-vehicle target extraction algorithm is shown in Figure 9, which demonstrates its ability to remove shadow effects on the foreground of moving vehicles.