Fractional B-Spline Wavelets and U-Net Architecture for Robust and Reliable Vehicle Detection in Snowy Conditions

This paper addresses the critical need for advanced real-time vehicle detection methodologies in Vehicle Intelligence Systems (VIS), especially in the context of using Unmanned Aerial Vehicles (UAVs) for data acquisition in severe weather conditions, such as the heavy snowfall typical of the Nordic region. Traditional vehicle detection techniques, which often rely on custom-engineered features and deterministic algorithms, fall short in adapting to diverse environmental challenges, creating a demand for more precise and sophisticated methods. The limitations of current architectures, particularly when deployed in real time on edge devices with restricted computational capabilities, remain significant hurdles in the development of efficient vehicle detection systems. To bridge this gap, our research formulates an approach that combines the fractional B-spline wavelet transform with a tailored U-Net architecture, operational on a Raspberry Pi 4. This method aims to enhance vehicle detection and localization by leveraging the unique attributes of the NVD dataset, which comprises drone-captured imagery under the harsh winter conditions of northern Sweden. The dataset, featuring 8450 annotated frames with 26,313 vehicles, serves as the foundation for evaluating the proposed technique. A comparative analysis of the proposed method against state-of-the-art detectors, such as YOLO and Faster R-CNN, in terms of both accuracy and efficiency on constrained devices, emphasizes the capability of our method to balance the trade-off between speed and accuracy, thereby broadening its utility across various domains.


Introduction
Vehicle detection stands as a crucial element within most Vehicle Intelligence Systems (VIS), which are typically employed to ensure safety, optimize traffic flow, and enable autonomous driving. Traditional approaches to vehicle detection often relied on custom-engineered features and rule-based algorithms, resulting in a constrained capacity to adapt to different environmental scenarios. Furthermore, as the complexity of real-life situations escalated, a clear need emerged for detection methodologies that were both more precise and more advanced [1]. While complicated architectures may enhance detection precision, they introduce additional hurdles, particularly in the context of real-time applications operating on devices with limited capabilities. In many critical transportation solutions where drones are used for data acquisition and processing, a range of difficulties arises. Among these, issues concerning images captured by drones are notable, including oblique angles, non-uniform illumination, degradation, blurring, occlusion, and reduced visibility [2]. Concurrently, the necessity for on-the-fly processing as the drone captures data imposes constraints on the available computational resources [3]. The lack of sufficient data and research on vehicle detection using drones in snowy conditions is the major driver of this work [4]. Many researchers have concentrated on improving the detection of various objects, such as lanes and traffic lights, in different environments. However, it remains uncertain how these advancements will perform in the context of our specific vehicle detection challenges [5,6]. The goal is to shed light on vehicle detection challenges in snowy conditions using drones and to offer valuable insights to improve the accuracy and reliability of object detection systems in adverse weather scenarios. Furthermore, this investigation seeks to assess the effectiveness of the proposed methodologies on edge computing devices and to conduct comparative
analyses of both the performance and the accuracy against state-of-the-art detection frameworks such as YOLO and Faster R-CNN. At the inception of this study, a thorough review of recent research was conducted, focusing on work that addresses vehicle detection on edge devices under severe weather conditions. This search aimed to establish a solid foundation for the current work by identifying gaps in the existing body of knowledge and confirming the necessity of further exploration in this area. The investigation revealed a significant lack of studies that specifically tackle the challenges associated with real-time vehicle detection using edge computing in adverse weather scenarios, such as heavy snow. This finding not only underscored the relevance and urgency of the present research but also highlighted the potential for contributing novel insights and methodologies to the fields of intelligent transportation systems and computer vision, particularly in enhancing the robustness and efficiency of vehicle detection technologies in less-than-ideal environmental conditions. Table 1 summarizes the findings of our inquiry: the previously implemented models, the specific customizations applied, the specifications of the edge devices employed, and the weather conditions under which each system was tested. These studies collectively push the boundaries of object detection, providing tailored solutions to meet distinct needs within various application domains. However, these advancements are not without their challenges. Common obstacles across these studies include navigating the trade-off between detection speed and accuracy, ensuring consistent performance across different environmental conditions, and managing the computational demands of sophisticated models without sacrificing their effectiveness. Such challenges underscore the inherent complexities in object detection and the continuous need for innovative solutions and optimizations [16].

Proposed Method
The advancement of vehicle detection methods, especially with UAV images, highlights a domain where deep learning has driven significant improvements. Nevertheless, the complexity and diversity of features in aerial images captured in severe weather conditions demand further enhancements. In response to this requirement, this study proposes a technique that combines the fractional B-spline wavelet transform with a customized U-Net architecture, implemented on a Raspberry Pi 4 Model B. This strategy is specifically designed to enhance vehicle detection and localization, with an emphasis on evaluating its efficacy using the NVD dataset. The NVD dataset [4] is accessible at https://nvd.ltu-ai.dev/ (accessed on 12 June 2024), and a full explanation of the data extraction process can be found in [4]. NVD was compiled amidst the harsh snowy winter conditions of northern Sweden. It encompasses a collection of images taken from heights varying between 120 and 250 m, including 8450 frames annotated to highlight 26,313 vehicles, alongside approximately 10 h of video content that awaits annotation. The dataset is characterized by its variation in video resolutions and frame rates, as well as differences in Ground Sample Distance (GSD) measurements, providing a comprehensive portrayal of vehicles under the demanding winter weather typical of the Nordic region.

Fractional B-Spline Wavelet Transform
A two-dimensional fractional B-spline wavelet transform was applied to extract relevant features from the aerial images. The fractional B-spline wavelet transform is an advanced mathematical tool that extends the traditional B-spline wavelet transform by incorporating the concept of fractional calculus. Fractional calculus allows operations to be applied at any real or complex order, providing a more flexible and precise analysis of data, especially when dealing with complex patterns or irregularities. This extension allows for a more flexible manipulation of the wavelet functions, enabling the extraction of features with varying degrees of smoothness and detail from an image. The "fractional" aspect refers to the use of non-integer derivatives, which provide a richer set of parameters to adjust the wavelet functions, thereby offering more control over the feature extraction process. This fractional nature allows for better detection of edges and boundaries in snowy conditions, where the edges of vehicles can become blurred or indistinct. The transform can also analyze images at multiple scales, which is particularly useful for drone imagery, where vehicles can appear at various sizes and orientations. The implementation involves applying wavelet filters separately to the vertical and horizontal dimensions. In general, the wavelet transform decomposes a signal into shifted and scaled versions of a base wavelet function; in the fractional setting, this is extended by employing fractional B-spline functions as the base wavelets. The transform coefficients at a scale a and shift b for a signal f(t), using a fractional wavelet ψ_α(t) derived from the fractional B-spline functions, can be defined as in Equation (1):

W_f(a, b) = (1/√a) ∫ f(t) ψ_α((t − b)/a) dt    (1)

The resulting High-Low (HL) and Low-High (LH) channels were found to contain valuable information for car detection, as shown in Figure 1.
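The separable row/column filtering described above can be sketched as follows. This is a minimal illustration of a one-level 2D wavelet decomposition producing the four sub-bands (LL, LH, HL, HH); the paper does not publish its filter coefficients, so a simple Haar analysis pair stands in here for the fractional B-spline filters, and the function names are hypothetical.

```python
import numpy as np

def wavelet_decompose_2d(image, lo, hi):
    """One-level separable 2D wavelet decomposition (sketch).

    Filters rows then columns with a low-pass (lo) / high-pass (hi)
    analysis pair, downsampling by 2, yielding the four sub-bands
    LL, LH, HL, HH.
    """
    def analyze(x, h):
        # Filter along the last axis, then keep every second sample.
        y = np.apply_along_axis(
            lambda r: np.convolve(r, h, mode="full")[: len(r)], -1, x)
        return y[..., ::2]

    rows_lo = analyze(image, lo)      # low-pass along rows
    rows_hi = analyze(image, hi)      # high-pass along rows
    ll = analyze(rows_lo.T, lo).T     # low rows, low cols
    lh = analyze(rows_lo.T, hi).T     # low rows, high cols (horizontal detail)
    hl = analyze(rows_hi.T, lo).T     # high rows, low cols (vertical detail)
    hh = analyze(rows_hi.T, hi).T     # diagonal detail
    return ll, lh, hl, hh

# Haar analysis pair as a placeholder for the fractional B-spline filters.
lo = np.array([1.0, 1.0]) / np.sqrt(2)
hi = np.array([1.0, -1.0]) / np.sqrt(2)

img = np.random.rand(64, 64)
ll, lh, hl, hh = wavelet_decompose_2d(img, lo, hi)
print(ll.shape, lh.shape, hl.shape, hh.shape)  # four 32x32 sub-bands
```

In the paper's pipeline only the LH and HL sub-bands (horizontal and vertical detail) are kept, since they carry the edge information most useful for car detection.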

Integration with U-Net
The fractional B-spline wavelet transform was applied to each channel of the input image, producing four sub-band channels. Only the LH and HL channels were then resized and concatenated with the output of the first convolutional layer before being input into the second convolutional layer of the U-Net architecture. The U-Net architecture, named CarLocalizationCNN, was employed for its effectiveness in semantic segmentation tasks. Notably, we modified the U-Net by incorporating the transformed channels into its second convolutional layer: the output channels of the first layer and the fractional B-spline transformed channels were concatenated and used as input to the second convolutional layer of the network. This approach aims to enhance the network's ability to discern subtle features related to car presence. Skip connections, present in the original U-Net structure, were intentionally omitted to streamline the architecture for heatmap generation, since accurate car boundary delineation is not required. The output of the CarLocalizationCNN model is designed as a heatmap: the final layer of the network produces a heatmap highlighting potential car locations in the input image. This heatmap serves as a valuable tool for visualizing and interpreting the network's car detection predictions. The target heatmaps are generated through the application of a Gaussian elliptical function. The Gaussian function was selected to facilitate a gradual increase in pixel intensity (a proxy for the likelihood of the presence of a vehicle) toward the central region of the car; this ensures smoothness for gradient descent during the training process [17]. Considering the rectangular shape of cars, we opted for an elliptical function, which entails setting one dimension of the Gaussian function with a higher sigma value than the other. This approach allows us to better represent the vehicles' shape in the analysis. Furthermore, we rotated this elliptical Gaussian function using values derived from the original annotations to ensure an optimal fit to each vehicle's orientation, as shown in Figures 2 and 3. The CarLocalizationCNN model consists of convolutional and up-sampling layers. The convolutional layers capture hierarchical features, while the up-sampling layers restore spatial information, as shown in Figure 4. The integration of the fractional B-spline transformed channels is handled seamlessly within the architecture, culminating in a heatmap highlighting potential car locations.
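The rotated elliptical Gaussian described above can be sketched as follows. This is a minimal illustration under our own assumptions (the function name and parameter values are hypothetical, not taken from the paper's code): one blob per vehicle, with the larger sigma along the vehicle's length and the coordinates rotated by the annotated orientation angle.

```python
import numpy as np

def elliptical_gaussian_heatmap(shape, center, sigma_long, sigma_short, angle_rad):
    """Render one rotated elliptical Gaussian blob into a heatmap.

    The long axis (sigma_long) follows the vehicle's length, and the
    blob is rotated by angle_rad to match the annotated orientation.
    Intensity peaks at 1.0 at the vehicle center and decays smoothly
    toward its edges, which keeps the training target differentiable.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - center[0], ys - center[1]
    # Rotate pixel offsets into the ellipse's own frame.
    u = dx * np.cos(angle_rad) + dy * np.sin(angle_rad)
    v = -dx * np.sin(angle_rad) + dy * np.cos(angle_rad)
    return np.exp(-(u**2 / (2 * sigma_long**2) + v**2 / (2 * sigma_short**2)))

# One vehicle at (x=40, y=25), roughly twice as long as wide, rotated 30 degrees.
heatmap = elliptical_gaussian_heatmap((64, 80), center=(40, 25),
                                      sigma_long=8.0, sigma_short=4.0,
                                      angle_rad=np.deg2rad(30))
print(heatmap.shape, float(heatmap[25, 40]))  # peak of 1.0 at the center
```

For a full training target, one such blob would be rendered per annotated vehicle and the results combined (e.g., with an element-wise maximum).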

Dataset
Models have been trained and assessed using frames taken from videos that vary in several aspects, including height, snow coverage, cloud coverage, and Ground Sample Distance (GSD) pixel dimensions. The video information and samples of the extracted frames are shown in Tables 2 and 3 and Figure 5.

Testing and Evaluation
The outcomes of the proposed model were compared against three detectors that are widely used in both academic research and industrial applications.

Evaluation Metrics and Benchmarking
The main evaluation metrics used to compare the models are mean Average Precision (mAP) and inference time. Since the output of the proposed model is a heatmap, the heatmap was converted to bounding boxes by thresholding and grouping the heatmap pixels, as shown in Figure 6. After extracting these bounding boxes representing car predictions, we compare them with the annotated ground-truth bounding boxes. For this comparison, we employ an adapted version of the Intersection over Union (IoU) metric, which calculates the proportion of the overlapping area between two bounding boxes relative to their combined area, ensuring the overlapped section is counted only once (the union).
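The thresholding-and-grouping step and the IoU computation can be sketched as follows. This is a minimal illustration, not the paper's implementation: above-threshold pixels are grouped into 4-connected blobs with a simple flood fill, each blob becomes an axis-aligned box, and a standard inclusive-pixel IoU compares two boxes. The threshold value and function names are assumptions.

```python
import numpy as np

def heatmap_to_boxes(heatmap, thresh=0.5):
    """Threshold a heatmap and group connected pixels into boxes.

    Returns axis-aligned boxes (x1, y1, x2, y2), one per 4-connected
    blob of above-threshold pixels, found via an iterative flood fill.
    """
    mask = heatmap >= thresh
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not visited[sy, sx]:
                stack, xs, ys = [(sy, sx)], [], []
                visited[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    xs.append(x); ys.append(y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

hm = np.zeros((20, 20))
hm[3:8, 4:10] = 0.9            # one hot blob
boxes = heatmap_to_boxes(hm)
print(boxes, iou(boxes[0], (4, 3, 9, 7)))  # [(4, 3, 9, 7)] 1.0
```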

• When a pair of bounding boxes (one from the predictions and one from the ground truth) achieves an IoU exceeding a predefined threshold, we classify the prediction as accurate, i.e., a True Positive.
• Should a predicted bounding box fail to meet the IoU threshold with any ground-truth bounding box, we categorize the prediction as a False Positive.
• Conversely, if a ground-truth bounding box does not reach the IoU threshold with any predicted bounding box, we count it as a False Negative.
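The three rules above amount to a matching procedure between predicted and ground-truth boxes. A minimal greedy sketch is shown below (the paper does not specify its matching strategy, so the one-prediction-per-ground-truth greedy rule and all names here are assumptions):

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

def match_detections(pred_boxes, gt_boxes, thresh=0.5):
    """Count TP/FP/FN at a given IoU threshold.

    Each ground-truth box can absorb at most one prediction; matched
    pairs are True Positives, unmatched predictions are False
    Positives, and unmatched ground truths are False Negatives.
    """
    matched_gt = set()
    tp = fp = 0
    for p in pred_boxes:
        best, best_iou = None, thresh
        for gi, g in enumerate(gt_boxes):
            ov = iou(p, g)
            if gi not in matched_gt and ov >= best_iou:
                best, best_iou = gi, ov
        if best is not None:
            matched_gt.add(best)
            tp += 1
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched_gt)
    return tp, fp, fn

gt = [(0, 0, 9, 9), (20, 20, 29, 29)]
pred = [(1, 1, 10, 10), (50, 50, 59, 59)]
print(match_detections(pred, gt))  # (1, 1, 1): one TP, one FP, one FN
```

Precision and recall then follow directly as tp / (tp + fp) and tp / (tp + fn), which is what the mAP computation aggregates over thresholds.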

Experimental Results
The training of the mentioned models was carried out on a PC specified in Table 4, with the training/validation loss of the proposed model during training shown in Figure 7. Meanwhile, the evaluation was conducted on a Raspberry Pi 4 Model B.

Part           Specification
Processor      Intel i9-9900K @ 3.6 GHz
RAM            64 GB (3600 MHz) DDR4 CL16
Graphic Card   Nvidia GeForce RTX 3080 Ti 12 GB (CUDA 11.1)

The core aim of the experimental outcomes is to assess the model's performance through two essential metrics: mean average precision (mAP50) for accuracy, and inference time for efficiency. mAP50 measures how well the model predicts and localizes objects, indicating its accuracy, while inference time assesses the model's speed in processing images, reflecting its practical utility in real-time applications. These metrics collectively provide a concise evaluation of the model's overall effectiveness and applicability, as shown in Tables 5 and 6 and Figure 8. The proposed CarLocalizationCNN model demonstrates an enhancement in recall, mAP50, and mAP50-90 metrics when contrasted with YOLOv8s, YOLOv5s, YOLOv5s_aug*, YOLOv8s_aug*, and Faster R-CNN (FRCNN). These improvements highlight the model's enhanced ability to correctly identify and localize vehicles across a range of conditions and overlaps, thus offering a more robust solution for vehicle detection tasks. However, it is worth mentioning that YOLOv8s_aug outperforms the proposed model in terms of precision. This indicates that while the CarLocalizationCNN model is adept at reducing false negatives and improving detection coverage, YOLOv8s maintains higher accuracy in predicting true positive detections, minimizing false positives within its identifications.
Regarding efficiency, the developed model demonstrates enhanced inference capabilities when compared with several established models. Specifically, it processes images approximately 1.83 times faster than YOLOv5s, a notable speed improvement that can be particularly beneficial in scenarios demanding rapid data processing. Compared to YOLOv8s, the proposed model exhibits a more modest speed increment of 1.25%, which, while smaller, still reflects an advancement in processing efficiency. The model also outperforms Faster R-CNN (FRCNN) by a factor of 6.7, indicating a significant leap in inference speed. This acceleration in processing times could enhance the applicability of the model in real-time applications, such as the drone-based decision-making case discussed here. These improvements underscore the model's potential in balancing the trade-off between speed and accuracy, thereby broadening its utility across various domains.
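Inference-time comparisons like those above are sensitive to one-off startup costs, so a benchmark harness typically discards a few warm-up passes before timing. A minimal sketch (the helper name, the stand-in model, and the run counts are our own assumptions, not the paper's benchmarking code):

```python
import time
import numpy as np

def mean_inference_time_ms(infer, inputs, warmup=3, runs=20):
    """Average wall-clock inference time in milliseconds.

    Runs a few warm-up passes first so one-off setup costs (cache
    warming, lazy allocation) do not skew the measurement, then
    averages over runs * len(inputs) timed calls.
    """
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for _ in range(runs):
        for x in inputs:
            infer(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (runs * len(inputs))

# Stand-in "model": a cheap blur-like operation on a small frame.
frames = [np.random.rand(128, 128) for _ in range(4)]
fake_model = lambda x: x * 0.5 + np.roll(x, 1, axis=0) * 0.5
ms = mean_inference_time_ms(fake_model, frames)
print(ms > 0)
```

On an edge device such as the Raspberry Pi 4, the same harness would wrap the actual model's forward pass, and the per-image averages would feed a table like Table 6.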

Conclusions and Future Work
In conclusion, this study represents a step forward in the domain of Vehicle Intelligence Systems (VIS), especially for real-time vehicle detection in adverse weather conditions using Unmanned Aerial Vehicles (UAVs). The proposed method addresses some of the challenges presented by heavy snowfall, common in Nordic regions. The CarLocalizationCNN model developed here enhances traditional detection methods such as YOLO and Faster R-CNN in key metrics, including recall, mAP50, and mAP50-90. Moreover, the model exhibits improvements in inference speed, making it highly suitable for time-sensitive applications such as UAV-based surveillance in snowy environments. The research also draws attention to the existing research gap highlighted by the NVD dataset: there remains a substantial need for further investigation to enhance vehicle detection capabilities in such harsh weather conditions. The extensive evaluation using the NVD dataset lays the groundwork for future research in VIS, particularly in optimizing performance for edge devices operating in challenging environments.
Future work should focus on developing detection algorithms specifically tailored for snowy conditions, accounting for unique visual challenges such as reduced contrast, varying snow textures, and occlusions caused by snow accumulation. Techniques like multi-scale feature extraction and context-aware detection can be explored to enhance robustness. Additionally, the use of more advanced signal processing techniques can offer improved flexibility in modeling the irregular and complex shapes of snow-covered vehicles, potentially leading to more accurate detections. Research can also explore optimizing the parameters of the fractional splines to balance computational efficiency and detection accuracy.

Figure 1. Sample of implementing the Fractional B-Spline Wavelet Transform over the NVD dataset.

Figure 3. From left to right: original image, produced heatmaps, overlay of both.

Figure 6. Conversion of the heatmap to bounding boxes.

Figure 7. Loss of the proposed model.

Table 1. Applied search criteria over available vehicle datasets.

Table 4. Specification of the PC used for training.

Table 5. Accuracy comparison between SOTA detectors and the proposed model.

Table 6. Efficiency comparison between SOTA detectors and the proposed model.
• Accuracy: Emphasis was placed on the high-quality annotations of the dataset, which is specific to snowy conditions in the Nordic region, acknowledging potential limitations in generalizability and the importance of accurate data for training algorithms.
• Transparency: The methodology for data collection and model training is thoroughly documented, promoting scrutiny and validation by the scientific community. The use of deep learning for vehicle detection is well-explained, with intentions to share findings and ensure the algorithms perform as expected without unintended behaviors.