Article

Object Detection Method Based on Polarimetric Features and PFOD-Net Under Adverse Weather Conditions

by Xingtao Li 1,2, Wenjuan Li 1,2, Xiaoyao Yan 1,2, Weifeng Wang 1,* and Fan Bu 1

1 Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, No. 17 Xinxi Avenue, New Industrial Park, Xi’an High-Tech Zone, Xi’an 710049, China
2 University of Chinese Academy of Sciences, Yanqihu East Road, Huairou District, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(4), 1698; https://doi.org/10.3390/app16041698
Submission received: 6 January 2026 / Revised: 2 February 2026 / Accepted: 5 February 2026 / Published: 8 February 2026

Abstract

To address the insufficient real-time detection accuracy of standard YOLO models under adverse weather, we propose PFOD-Net, a multi-scale detector based on polarimetric features and an improved YOLOv8n. Compared with traditional intensity imaging, polarimetric imaging extracts optical information from target scenes more effectively; when coupled with an appropriate feature selection mechanism, it significantly enhances detection and recognition performance in complex environments. Experimental results on the Polar LITIS dataset demonstrate that PFOD-Net substantially outperforms YOLOv8 in accuracy while preserving real-time speed: mAP@0.5 (mean Average Precision) increased from 33.5% to 81.7%, an improvement of 48.2 percentage points. Notably, the detection performance of PFOD-Net is substantially reinforced in complex conditions such as fog and haze. By innovatively integrating polarimetric information, lightweight architectures, and the fusion of max-pooling and average-pooling, this method provides an effective solution for object detection and recognition in adverse weather.

1. Introduction

Object detection is a pivotal research direction in computer vision, with optical polarimetric imaging detection serving as one of its key branches [1]. In recent years, object detection research has achieved significant progress, benefiting from the rapid development of deep learning methods. As a cutting-edge algorithm that balances high execution speed with low hardware costs, YOLOv8 [2] has opened new possibilities for the application of object detection technologies.
With the increasing demand for deploying object detection systems on resource-constrained edge devices such as autonomous vehicles, unmanned aerial vehicles (UAVs), and mobile robots, the need for computationally efficient models has become paramount. Recent studies have demonstrated the feasibility and necessity of lightweight architectures for hardware implementation. For instance, research on FPGA-accelerated computing methods for YOLO models [3] has shown that optimized lightweight networks can achieve real-time performance while maintaining acceptable accuracy on embedded platforms with limited computational resources and power budgets. These findings highlight the importance of developing streamlined detection architectures that balance performance and efficiency for practical deployment scenarios.
YOLO-based detectors have been successfully applied across diverse real-world scenarios. For example, YOLOv8 has been deployed for real-time detection of traffic signs and roadway objects in autonomous driving systems, demonstrating high accuracy and speed for environment perception and safety-critical tasks [4]. Similarly, in remote sensing and UAV imagery, enhanced YOLO-based detectors have been effectively used for vehicle detection in urban aerial scenes, addressing challenges such as small object sizes and cluttered backgrounds [5].
However, under adverse weather conditions such as fog, haze, rain, and snow, images captured by traditional RGB imaging methods are often degraded by substantial scattered light, leading to reduced image contrast and poor object representation, which significantly compromise detection performance [6]. Polarimetric imaging technology offers a promising solution to this challenge by providing not only intensity information but also additional optical polarimetric data. Processing this information yields unique contrast features that are absent in conventional intensity images [7,8]. By exploiting the polarimetric features of light waves to decouple object reflections from atmospheric scattering, polarimetric imaging effectively suppresses background noise and enhances target details, thereby providing critical support for robust object detection under adverse weather conditions [9,10,11,12,13,14,15].
Building upon these foundations, this study introduces the optical characteristics of polarimetric images into the traditional YOLOv8n and proposes three major improvements:
  • Polarimetric Transformation and Feature Enhancement (PTFE) module: In response to the variations in light intensity across different polarization angles, an input processing module incorporating prior knowledge from the polarimetric domain was designed. Through multi-scale polarization channel fusion and weight-sharing mechanisms, this module not only accelerates network convergence but also achieves efficient image feature enhancement and extraction with minimal parametric overhead.
  • Dynamic Coordinate Attention Spatial Pyramid Pooling (DCASPP) module: To address feature redundancy in the original SPPF (Spatial Pyramid Pooling—Fast) during the later stages of the network, we integrated a dual-branch dynamic pooling structure with a Coordinate Attention mechanism, implemented in parallel with the traditional max-pooling layer. This refinement maintains the robustness of the original max-pooling layer while significantly enhancing the spatial pyramid’s capacity for multi-scale information fusion and effective feature extraction through average pooling and Coordinate Attention recalibration.
  • Furthermore, GhostNet was introduced into the network to achieve further lightweighting. Comparative experiments across various lightweight architectures validated the effectiveness of GhostNet within PFOD-Net (Polarization Feature Object Detection Network). By replacing feature extraction modules in local or global network components with GhostNet, the results demonstrate that utilizing GhostNet in the backbone effectively balances detection accuracy and inference speed.
The remainder of this paper is organized as follows: Section 2 reviews related works on YOLOv8 object detection, polarimetric imaging methods, attention mechanisms, and polarimetric object detection datasets. Section 3 presents the proposed PFOD-Net architecture and details the PTFE and DCASPP modules. Section 4 provides experimental results and comparative analysis to validate the effectiveness of our method. Finally, Section 5 concludes the paper and discusses future research directions.

2. Related Works

2.1. YOLOv8 Object Detection Method

YOLOv8 [2], developed by the Ultralytics team as an enhanced upgrade of the YOLOv5 [16] series, represents a significant advancement in the YOLO family, featuring an improved architecture that enhances both detection accuracy and computational efficiency. The network architecture of YOLOv8 is illustrated in Figure 1, which shows its backbone, neck, and head components. Compared to YOLOv5, YOLOv8 replaces the C3 module with the C2f module in the feature layers and introduces an anchor-free design in the detection head.
However, despite these improvements, neural network-based object detection methods, exemplified by YOLO, still face several typical challenges in practical applications. First, under adverse weather conditions such as fog, rain, or snow, the performance of these models significantly degrades due to reduced image contrast and obscured target features. Second, the detection accuracy for small objects and distant targets remains insufficient, as shallow network layers often fail to capture fine-grained features effectively. Third, there exists an inherent trade-off between model complexity and real-time performance—deeper networks achieve higher accuracy but require more computational resources, making deployment on edge devices difficult. Additionally, these methods are highly sensitive to illumination variations and complex backgrounds, often resulting in false positives and missed detections.
In this study, we utilize YOLOv8n as the baseline to primarily investigate the performance of object detection algorithms in adverse weather environments. Furthermore, building upon this baseline, we have implemented our object detection method by integrating polarimetric features and dynamic coordinate attention. The effectiveness of these improvements is validated through comprehensive comparative experiments and ablation studies.

2.2. Polarimetric Imaging Methods

Early polarimetric imaging mostly acquired images of different polarization states by using rotating polarizers and multi-frame acquisition; equipment from this period was bulky and lacked real-time performance [8]. After years of development, technologies such as division-of-time, division-of-amplitude, division-of-aperture, and division-of-focal-plane (DoFP) polarimetric imaging have moved from experimental stages to practical applications. Among them, DoFP technology directly integrates micro-polarizers onto the optical sensor, enabling the acquisition of full polarimetric information in a single exposure [17]. This technology is characterized by its compact size and strong real-time performance, making it particularly suitable for dynamic scenes. Currently, polarimetric imaging has been applied in fields such as remote sensing, autonomous driving, medical imaging, target recognition, and military surveillance [2,4,10].
This study focuses on the optical characteristics of polarimetric imaging and introduces polarimetric physical priors into the YOLOv8n object detection network. We designed a feature selection enhancement module based on polarimetric image inputs. Experimental results demonstrate that this method significantly enhances the detection performance of the YOLO model in adverse weather.

2.3. Attention Mechanisms

In recent years, attention mechanisms have been widely applied in computer vision tasks such as object detection and image recognition, becoming an indispensable part of the deep learning field. Attention mechanisms originate from the “focusing” ability of human vision. In deep learning, the mechanism calculates “correlation” on input feature maps to obtain weights, which are then used for weighted summation of features to highlight important information. Representative examples include the following:
  • SE Module [18]—Hu et al. proposed the SE attention module. The basic idea is to obtain channel-level statistics through global average pooling and use fully connected layers for feature compression and recalibration, thereby achieving adaptive adjustment of weights for each channel.
  • ECA Module [19]—Wang et al. further reduced parameter redundancy based on SE and proposed the ECA module. ECA uses 1D convolution instead of fully connected layers, effectively avoiding the introduction of excessive parameters and significantly reducing computational complexity while maintaining performance improvements.
  • CBAM [20]—Woo et al. extended the attention mechanism to the joint modeling of channel and spatial dimensions and proposed CBAM. CBAM first weights the channel features and then generates a spatial attention map, focusing on the importance of both features and spatial positions simultaneously, making the network more precise in feature selection and target localization.
  • CA (Coordinate Attention) [21]—Hou et al. proposed the CA module. It encodes positional information along the horizontal and vertical directions, respectively, through direction-decomposed global pooling operations and embeds this information into the generation process of channel attention. This captures long-range dependencies while preserving precise positional information. The CA module significantly enhances the network’s representation of target structure and spatial layout while remaining lightweight, making it especially suitable for tasks that require simultaneous attention to global and local features.
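Because Coordinate Attention is the mechanism later adopted in our DCASPP module (Section 3.3), a minimal PyTorch sketch of the CA block is given below. It follows the structure described in [21]; the reduction ratio, the Hardswish activation, and the class name are illustrative assumptions rather than the exact configuration used in PFOD-Net.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal Coordinate Attention block (after Hou et al. [21]).
    Reduction ratio and activation are illustrative defaults."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along W -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along H -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Direction-decomposed global pooling preserves positional information.
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # attention broadcast over both spatial axes
```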

2.4. Polarimetric Object Detection Datasets

Deep learning-based object detection methods fundamentally depend on training datasets. Extensive general-purpose datasets already exist in the field of computer vision, such as MS-COCO and PASCAL-VOC. However, image datasets constructed based on polarimetric imaging remain relatively scarce.
Blin et al. constructed the first object detection dataset [7] with a one-to-one correspondence between RGB and polarimetric images by using a combination of GoPro and polarimetric cameras, providing new material for target detection in road scenes. To enrich the diversity of the dataset, images were collected under various weather conditions, including sunny, cloudy, and hazy weather. Wang et al. constructed the PCOD dataset [22] using Lucid Triton polarimetric cameras. The dataset consists of 1200 pairs of RGB intensity images and their corresponding DoLP images, sourced from nature, military backgrounds, industrial environments, and daily life, providing effective support for polarimetric imaging-based object detection technology.

3. Methods

3.1. Overall Model Architecture

To address the issue of insufficient accuracy in real-time object detection methods under complex backgrounds, we propose PFOD-Net, a polarimetric input detection network based on improved YOLOv8. By integrating polarimetric features, Coordinate Attention, dynamic multi-scale feature fusion, and GhostNet, the proposed network significantly enhances object detection and recognition performance while maintaining the real-time and high-speed characteristics of the original architecture. The architecture of the PFOD-Net proposed in this paper is illustrated in Figure 2.
Compared with the YOLOv8n architecture, we have introduced several key modifications:
  • PTFE—We introduced the PTFE module to enhance the model’s feature representation capability in the shallow layers.
  • DCASPP—We redesigned the receptive field module by developing the DCASPP.
  • Lightweight Backbone Integration—By replacing specific standard convolutions in the backbone network with Ghost convolutions, which require fewer parameters, we achieved a 22.4% reduction in model size at the cost of only a 0.7% decrease in precision.
In summary, these strategic enhancements in PFOD-Net collectively improve detection accuracy and efficiency in challenging scenarios, ensuring real-time applicability with reduced computational demands.

3.2. PTFE Module

As a transverse wave, light possesses unique polarization properties, fundamentally characterized by the anisotropic distribution of electromagnetic field oscillation directions during propagation. The polarization state, an intrinsic property of vector waves, reflects the vibration pattern of the electric field vector in space and time. In polarization imaging, the Stokes vector serves as the core mathematical tool for describing light’s polarization characteristics, expressed as Equation (1):
$$S = [S_0, S_1, S_2, S_3]^T$$
$S_0$ represents total light intensity, $S_1$ reflects the intensity difference between horizontal and vertical linear polarization components, $S_2$ captures the intensity difference between the 45° and 135° linear polarization components, and $S_3$ characterizes circular polarization differences. Compared to traditional intensity imaging, polarization imaging resolves these parameters to extract deep features such as surface material properties, microstructures, and observation geometries. For example, artificial objects with smooth surfaces generate partially polarized reflections, while natural rough surfaces (e.g., vegetation, soil) typically produce unpolarized reflected light. This physical distinction grants polarization imaging significant advantages in camouflage recognition, material classification, and object detection in complex environments [23,24].
Conventional polarization acquisition requires capturing images at different polarization angles (e.g., 0°, 45°, 90°, 135°), then converting them to Stokes parameters via Equation (2):
$$S = \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix} = \begin{bmatrix} I_0 + I_{90} \\ I_0 - I_{90} \\ I_{45} - I_{135} \\ I_L - I_R \end{bmatrix}$$
where $I_0$, $I_{45}$, $I_{90}$, and $I_{135}$ represent the light intensities measured at polarization angles of 0°, 45°, 90°, and 135°, respectively; $I_L$ and $I_R$ denote the intensities of left-handed and right-handed circularly polarized light, respectively; and $S_3$ represents the intensity difference between left-handed and right-handed circularly polarized light, which necessitates additional modulation using a quarter-wave plate. However, since circular polarization components are negligible in natural scenes, conventional systems primarily utilize $S_0$, $S_1$, and $S_2$ for feature analysis. Based on total intensity conservation, orthogonal polarization pairs satisfy Equation (3):
$$I_0 + I_{90} = I_{45} + I_{135} = S_0$$
This physical law holds regardless of the incident light’s polarization, enabling the derivation of an equivalent expression for $S_2$, Equation (4):
$$S_2 = 2I_{45} - I_0 - I_{90}$$
This simplification reduces the required inputs from four-directional images ($I_0$, $I_{45}$, $I_{90}$, $I_{135}$) to three ($I_0$, $I_{45}$, $I_{90}$), offering dual advantages:
a. Hardware simplification—Eliminates the need for additional 135° polarizers, reducing system complexity.
b. Framework compatibility—Three-channel data structurally aligns with conventional RGB imaging, enabling seamless integration with mainstream vision frameworks.
The Stokes parameters further yield the degree of linear polarization (DoLP, the ratio of the fully linearly polarized intensity to the total intensity, Equation (5)) and the angle of polarization (AoP, directly indicating the polarization direction, Equation (6)):
$$\mathrm{DoLP} = \frac{\sqrt{S_1^2 + S_2^2}}{S_0}$$
$$\mathrm{AoP} = \frac{1}{2}\arctan\left(\frac{S_2}{S_1}\right)$$
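To make Equations (2)–(6) concrete, the following NumPy sketch derives the Stokes parameters, DoLP, and AoP from the three-angle input discussed above. The function name and the small epsilon guard are our additions, and arctan2 is used as a numerically safer stand-in for the arctangent in Equation (6).

```python
import numpy as np

def stokes_from_three_angles(i0, i45, i90, eps=1e-8):
    """Compute S0, S1, S2, DoLP, and AoP from intensities at 0/45/90 degrees
    (Equations (2)-(6)); inputs are float arrays of identical shape."""
    s0 = i0 + i90                  # total intensity, Equation (3)
    s1 = i0 - i90                  # horizontal vs. vertical component
    s2 = 2.0 * i45 - i0 - i90      # Equation (4): no 135-degree image needed
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # Equation (5)
    aop = 0.5 * np.arctan2(s2, s1)              # Equation (6)
    return s0, s1, s2, dolp, aop
```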
DoLP characterizes the proportion of the completely linearly polarized component in light, while AoP reflects the polarization direction; both are core parameters in polarimetric analysis. Experimental results indicate that fusing multiple polarimetric images can effectively enhance object detection and recognition performance. Based on this, this study proposes a lightweight PTFE module, the structure of which is shown in Figure 2. The input polarized angle images first pass through a feature generation layer, where the inputs $I_0$, $I_{45}$, and $I_{90}$ are converted into the Stokes parameters ($S_0$, $S_1$, $S_2$), DoLP, and AoP (with $S_2$ obtained via Equation (4), so that $I_{135}$ is not required). Subsequently, in the feature fusion layer, the original inputs and the generated features are concatenated into an 8-channel tensor. Average pooling is then applied to the fused feature maps to obtain recalibration weights. The weighted feature maps are then fed into a feature compression layer, where a 3 × 3 convolution reduces the tensor channel count to 3, completing feature dimensionality reduction and information refinement. This process can be expressed by Equations (7) and (8):
$$\mathrm{PTFE}(X) = \mathrm{SiLU}\big(\mathrm{BN}(W \cdot \mathrm{Weight}(P(X)) + b)\big)$$
$$\mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$
where $X$ represents the input feature map, $P(\cdot)$ denotes the polarimetric transformation operation applied to $X$, $\mathrm{Weight}(\cdot)$ is a learnable weighting function that adaptively adjusts feature importance, $W$ represents the convolutional weight matrix, $b$ is the bias term, $\mathrm{BN}$ denotes batch normalization, and $\mathrm{SiLU}$ is the activation function defined in Equation (8). Compared to purely data-driven models, this module transforms polarimetric imaging laws into learnable feature representations by explicitly embedding polarimetric physical priors. This spares the model from having to learn complex physical relationships implicitly from raw data, thereby accelerating convergence and enhancing generalization capability. Furthermore, the module integrates polarimetric data preprocessing into the network while introducing only 614 learnable parameters, achieving efficient computation while maintaining model compatibility.
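A hedged PyTorch sketch of the PTFE pipeline described above follows. The three processing stages (feature generation, 8-channel fusion with average-pooling recalibration, and 3 × 3 compression) follow the text; the exact form of the learnable weighting function $\mathrm{Weight}(\cdot)$ is our assumption, so the parameter count of this sketch will not match the 614 parameters reported for the original module.

```python
import torch
import torch.nn as nn

class PTFE(nn.Module):
    """Sketch of the Polarimetric Transformation and Feature Enhancement module.
    NOTE: the weighting layer below is an assumption; the paper's Weight(.)
    may differ in form and parameter count."""
    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        # Learnable per-channel recalibration driven by average pooling.
        self.weight_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(8, 8, 1), nn.Sigmoid()
        )
        self.compress = nn.Conv2d(8, 3, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(3)
        self.act = nn.SiLU()  # Equation (8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        i0, i45, i90 = x[:, 0:1], x[:, 1:2], x[:, 2:3]
        # Feature generation layer: polarimetric physical priors, Equations (2)-(6).
        s0 = i0 + i90
        s1 = i0 - i90
        s2 = 2.0 * i45 - i0 - i90
        dolp = torch.sqrt(s1**2 + s2**2 + self.eps) / (s0 + self.eps)
        aop = 0.5 * torch.atan2(s2, s1 + self.eps)
        # Feature fusion layer: 8-channel tensor of raw and derived features.
        feats = torch.cat([i0, i45, i90, s0, s1, s2, dolp, aop], dim=1)
        feats = feats * self.weight_fc(feats)  # average-pooling recalibration
        # Feature compression layer: Equation (7).
        return self.act(self.bn(self.compress(feats)))
```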

3.3. DCASPP Module

In object detection and recognition tasks, target scale diversity and background interference significantly affect model robustness. YOLOv8 inherits the SPPF module from YOLOv5, which achieves rapid multi-scale feature extraction through three cascaded 5 × 5 max-pooling operations. However, the fixed pooling kernel size and static fusion method of SPPF struggle to adapt to scale variations in complex scenes and can easily lead to the loss of small object features in deep networks, thereby limiting detection precision.
To alleviate these issues, the DCASPP module was designed (structure shown in Figure 2). This module introduces a global context-driven dynamic weight allocation mechanism: features first pass in parallel through 3 × 3, 5 × 5, and 7 × 7 average pooling layers to extract multi-scale information, followed by global average pooling to compress spatial dimensions. Subsequently, a two-layer fully connected network generates normalized weights, which are constrained to (0, 1) via a Sigmoid function. These weights are used for the weighted fusion of multi-scale features, achieving adaptive “feature integration on demand.” For instance, in scenes containing numerous small objects, the module automatically increases the weight of the 3 × 3 pooling branch to reduce detail loss caused by larger kernels.
Furthermore, to balance feature diversity and computational efficiency, the 5 × 5 max-pooling path from the original SPPF is retained alongside the dynamic pooling branches, forming a dual-branch structure. The outputs of these two branches are concatenated along the channel dimension and fed into a lightweight CA module (structure shown in Figure 3). CA models direction-sensitive spatial dependencies through decomposed pooling in the horizontal and vertical directions, highlighting key regions such as target edges and centers while suppressing background noise.
Finally, 1 × 1 convolutions are still employed for dimensionality reduction and expansion to avoid additional parameter overhead. Under the synergistic effect of dual-branch concatenation and attention filtering, the subsequent feature processing volume of the module is reduced to approximately 2/3 of the original SPPF. In a nano-scale model with an input of 640 × 640, the 1 × 1 convolutions of the original SPPF account for 164,608 parameters; however, through dynamic weight sharing and attention optimization, the total parameter count of DCASPP is reduced to 109,979, a decrease of approximately 33.1%.
Through the synergistic design of dynamic multi-scale pooling and spatial attention mechanisms, the DCASPP module addresses the rigid feature selection issues inherent in the traditional SPPF module. Its dual-branch structure not only reduces the number of parameters but also enhances the feature retention capability for small objects, providing an efficient and robust feature representation solution for object detection and recognition in complex scenarios.
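The following PyTorch sketch assembles the DCASPP components described above: the dynamic average-pooling branch with its two-layer FC weight generator, the retained 5 × 5 max-pooling path, CA recalibration, and the 1 × 1 reduction and expansion convolutions. The channel widths, the FC reduction ratio, and the reuse of the CoordinateAttention class from the sketch in Section 2.3 are illustrative assumptions, not a released reference implementation.

```python
import torch
import torch.nn as nn

class DCASPP(nn.Module):
    """Sketch of the DCASPP module under stated assumptions; channel widths
    follow common YOLOv8 conventions rather than the exact paper settings."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)  # 1x1 reduction
        # Dynamic branch: parallel average pooling at three scales.
        self.avg_pools = nn.ModuleList(
            nn.AvgPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 7)
        )
        # Two-layer FC producing per-scale weights constrained to (0, 1).
        self.weight_fc = nn.Sequential(
            nn.Linear(c_mid, c_mid // 4), nn.ReLU(),
            nn.Linear(c_mid // 4, 3), nn.Sigmoid()
        )
        # Retained SPPF-style 5x5 max-pooling path.
        self.max_pool = nn.MaxPool2d(5, stride=1, padding=2)
        self.ca = CoordinateAttention(c_mid * 2)  # sketched in Section 2.3
        self.cv2 = nn.Conv2d(c_mid * 2, c_out, 1)  # 1x1 expansion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        # Global context drives the dynamic per-scale weights.
        ctx = x.mean(dim=(2, 3))  # global average pooling -> (B, c_mid)
        w = self.weight_fc(ctx)   # (B, 3), each weight in (0, 1)
        dyn = sum(w[:, i, None, None, None] * p(x)
                  for i, p in enumerate(self.avg_pools))
        y = torch.cat([dyn, self.max_pool(x)], dim=1)  # dual-branch concat
        return self.cv2(self.ca(y))  # CA recalibration, then 1x1 conv
```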

3.4. Lightweight Convolutions

Neural network models are typically characterized by a high number of parameters and heavy computational loads. To enable the deployment of these models on embedded real-time platforms, numerous studies have explored model lightweighting, such as MobileNet [25], ShuffleNet [26], GhostNet [27], and FasterNet [28]. Among them, GhostNet utilizes cheap depthwise convolutions to perform feature extraction on a portion of the feature maps, generating “ghost” feature maps, which are then concatenated with the intrinsic feature maps produced by the primary convolution to form the output feature maps. The network architecture of GhostNet is illustrated in Figure 4; a reference sketch of the mechanism follows below.
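The sketch below follows Han et al. [27] and the GhostConv convention used in YOLO-style networks: half the output channels come from a standard convolution, and the other half are generated by a cheap depthwise operation. The kernel sizes and the half/half channel split are common defaults, not necessarily the exact settings used in PFOD-Net.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution sketch (after Han et al. [27]); the 5x5 depthwise
    kernel of the cheap operation is a typical default."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_mid = c_out // 2
        # Primary convolution produces the intrinsic feature maps.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_mid, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU()
        )
        # Cheap operation: depthwise conv generating the "ghost" maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_mid, c_mid, 5, 1, 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # intrinsic + ghost maps
```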
This study conducted experimental simulations on the aforementioned lightweight structures. Based on the experimental results, GhostNet was selected as the lightweighting method; detailed results are provided in Section 4.4.3.

4. Results

4.1. Implementation Details

To ensure reproducibility, this section details the experimental environment, training configuration, and key implementation strategies employed in this study.

4.1.1. Experimental Setup

All experiments were conducted using PyTorch 2.0.1 with Python 3.9.21 on a workstation equipped with an NVIDIA RTX 4060 GPU (8 GB; NVIDIA Corporation, Santa Clara, CA, USA) and an Intel Core i5-12490F CPU (Intel Corporation, Santa Clara, CA, USA). CUDA version 11.8 was utilized for GPU acceleration. The training employed the SGD optimizer with an initial learning rate of 0.01, momentum of 0.9, and weight decay of 0.0005. The batch size was set to 32, and all input images were resized to 640 × 640 pixels with 3-channel inputs. A cosine annealing scheduler was adopted to gradually reduce the learning rate during training.
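For clarity, the optimization setup above corresponds to the following PyTorch sketch; the model stand-in and the T_max value (matched here to the 100-epoch fine-tuning phase of Section 4.1.2) are placeholders.

```python
import torch

# Minimal sketch of the reported optimization setup; `model` is a stand-in.
model = torch.nn.Conv2d(3, 16, 3)  # placeholder for PFOD-Net
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005
)
# Cosine annealing gradually decays the learning rate over training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```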

4.1.2. Training Strategy

The training procedure consisted of two stages. First, all models were initialized using Kaiming initialization and pre-trained on the MS-COCO dataset for 50 epochs to obtain general object detection capabilities. Subsequently, the model was fine-tuned on polarimetric datasets for an initial training phase of 100 epochs, followed by continued training until performance stabilized (no improvement within 50 consecutive epochs). The pre-training of PFOD-Net required approximately 125 min per epoch, which is slightly longer than YOLOv5n (115 min per epoch) but significantly shorter than the other detection frameworks compared in Section 4.4.2. This demonstrates that PFOD-Net maintains reasonable training efficiency, making it suitable for practical applications where retraining or domain adaptation may be required.

4.1.3. Loss Function and Post-Processing

The loss function combines Binary Cross-Entropy (BCE) loss for classification and prediction, and Complete Intersection over Union (CIoU) loss for bounding box regression. For post-processing, Non-Maximum Suppression (NMS) with an IoU threshold of 0.45 and a confidence threshold of 0.25 was applied to eliminate redundant detections.
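A minimal sketch of this post-processing step, using torchvision’s NMS with the thresholds stated above; the function name and box format are our conventions.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                conf_thres: float = 0.25, iou_thres: float = 0.45) -> torch.Tensor:
    """Confidence filtering followed by NMS with the thresholds stated above.
    `boxes` is (N, 4) in (x1, y1, x2, y2) format; `scores` is (N,)."""
    keep = scores >= conf_thres          # drop low-confidence predictions first
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thres)  # suppress overlapping duplicates
    return boxes[idx]
```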

4.1.4. Data Augmentation

To enhance model robustness, Mosaic augmentation was employed to combine four training images into a single composite. Additionally, random horizontal flipping (probability 0.5), HSV color space augmentation (hue gain 0.015, saturation gain 0.7, value gain 0.4), and random scaling (0.5–1.5×) with translation (±0.1 image size) were applied during training.
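Expressed as Ultralytics-style hyperparameters, the augmentation settings above correspond roughly to the following sketch; the key names follow the YOLOv8 convention, and the mapping of the stated gains onto them is our assumption.

```python
# Hedged sketch of the augmentation settings above, written as
# Ultralytics-style hyperparameters.
augment_cfg = {
    "mosaic": 1.0,     # combine four training images into one composite
    "fliplr": 0.5,     # random horizontal flip probability
    "hsv_h": 0.015,    # hue gain
    "hsv_s": 0.7,      # saturation gain
    "hsv_v": 0.4,      # value gain
    "scale": 0.5,      # random scaling in roughly the 0.5-1.5x range
    "translate": 0.1,  # translation of +/-0.1 image size
}
```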

4.2. Experiment Datasets

Regarding the experimental datasets, the publicly available Polar LITIS and PCOD datasets [22] were employed to verify the effectiveness of the proposed detection and recognition algorithm.
Polar LITIS is the first multimodal road scene dataset containing paired RGB and polarimetric images [7]. The sampling scenarios encompass various road types, including highways, urban roads, small village roads, parking lots, and campus areas, as well as three weather conditions: sunny, cloudy, and foggy. The dataset is partitioned into a training set of 1640 images collected on sunny days, a validation set of 420 images collected on cloudy days, and a test set of 509 images collected on foggy days. This setup investigates the generalization performance of the model by training under good visibility conditions and performing final evaluations under low visibility. Furthermore, benefiting from the one-to-one correspondence between RGB and polarimetric images in the dataset, the contribution of polarimetric imaging to object detection and recognition can be easily evaluated. The annotated objects in this dataset consist of four categories: vehicles, pedestrians, bicycles, and motorcycles.
The PCOD dataset was acquired using a Lucid Triton camera and consists of a total of 1200 camouflaged object detection scene instances. The annotated categories include insects, animals, fruits, camouflage nets, and plants. The image data types include polarimetric angle images, DoLP images, Stokes parameter images, and RGB images. Due to the lack of categorical annotation files in the original PCOD dataset, we performed a re-annotation process. The comprehensive annotation information is summarized in Table 1.
To verify the effectiveness of the proposed polarimetric image detection and recognition method, we conducted comparative experiments using the two aforementioned datasets to evaluate the performance of various detection and recognition models. Furthermore, ablation studies were performed on the Polar LITIS dataset to validate the individual contributions of PTFE, DCASPP, and GhostConv.
The structure of the polarized angle images in the Polar LITIS dataset is illustrated in Figure 5. Under favorable weather conditions, intensity images can effectively represent targets; however, under adverse weather, visibility is reduced, and color and shape information are confounded by noise or scattering interference. In such cases, while intensity imaging performs poorly, polarimetric images maintain stable feature representation. Additionally, the polarized angle images in the PCOD dataset are stored as independent grayscale images. We performed fundamental processing on these images to ensure their format remains consistent with that of the Polar LITIS dataset.
It is worth noting that to ensure dataset compatibility with various mainstream object detection and recognition algorithms, Polar LITIS employs five different three-channel encoding formats for polarimetric information. Each format maps the light intensities of the four original polarization directions, or their transformations, into three channels, making them compatible with existing deep learning frameworks. However, this approach conversely limits the effective utilization of polarimetric information. The proposed PTFE module fully accounts for the information provided by different polarimetric parameters. It performs parametric transformations on the original polarized images to derive additional polarimetric parameters, which are then integrated and fed into the network simultaneously. This allows the network to automatically learn the relative importance of each polarimetric parameter.

4.3. Evaluation Indicators

The evaluation metrics employed in the experiments of this study include IoU (Intersection over Union), Precision, Recall, mAP (mean Average Precision), FPS (Frames Per Second), and Params (number of model parameters). The relevant calculation formulas are as follows:
$$\mathrm{IoU} = \frac{\mathrm{Area}(B_{pred} \cap B_{gt})}{\mathrm{Area}(B_{pred} \cup B_{gt})}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
where $B_{pred}$ denotes the predicted bounding box and $B_{gt}$ denotes the ground-truth box. $TP$ represents correctly detected targets, $FP$ represents incorrectly detected targets, and $FN$ represents targets that were not detected. $AP$ is the average precision for a single category, corresponding to the area under the Precision–Recall curve, and mAP is the mean of the $AP$ values across all categories. FPS indicates the inference speed of the model, and Params refers to the total number of model parameters.
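As a worked example of the IoU definition above, a small NumPy implementation follows; the (x1, y1, x2, y2) box format and the epsilon term are our conventions.

```python
import numpy as np

def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two boxes in (x1, y1, x2, y2) format, as defined above."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)      # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)    # union in denominator
```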

4.4. Validation of the Proposed Algorithm

4.4.1. Comparison of Different Image Sources

Compared to other polarimetric datasets, the Polar LITIS dataset is the first to provide paired RGB and polarimetric images with a one-to-one correspondence. This enables a performance comparison between intensity imaging and polarimetric imaging for object detection and recognition under adverse weather conditions. Table 2 presents the performance of several general-purpose models using different image sources as inputs.
On the validation set, the detection and recognition results obtained using RGB images as model inputs are generally superior to those using polarimetric images, which is consistent with expectations. Under sunny or cloudy conditions, the imaging environment is ideal; RGB images as an input source provide excellent texture details, color contrast, and edge contour representation. While polarimetric images provide unique information reflecting surface material, smoothness, and reflection characteristics—which are difficult to capture in RGB images—they lack substantial color information and high intensity contrast. Consequently, the overall information content of polarimetric images is lower than that of RGB images, leading to better performance for the latter on the validation set. Nevertheless, polarimetric imaging still maintains respectable detection and recognition accuracy in these scenarios.
However, when focusing on the validation set under adverse weather conditions, the relative performance of RGB and polarimetric imaging is reversed. In this context, the mAP@0.5 of polarimetric imaging is significantly higher than that of RGB imaging. Traditional RGB imaging is highly susceptible to lighting variations; under low-light or high-reflection conditions, the texture details provided by RGB images diminish, color contrast weakens substantially, and edge contours become blurred. In contrast, polarimetric imaging can attenuate background scattered light and exhibits strong robustness to high-reflection surfaces and weak-light conditions, while also enhancing target edges. These advantages enable polarimetric imaging to achieve superior results for object detection and recognition in adverse weather.
In summary, the experimental results in Table 2 demonstrate that polarimetric imaging yields performance far exceeding that of traditional imaging methods for object detection and recognition in adverse weather. Moreover, it maintains robust performance across various weather conditions, significantly enhancing generalization capability compared to traditional imaging methods.
In addition, a small set of polarimetric images was independently collected by us, covering remote-sensing road scenes under sunny, cloudy, rainy/snowy, and foggy/hazy conditions. Eighteen participants evaluated the visual quality of the collected images using a subjective image assessment criterion. The results indicate that images captured under sunny and cloudy conditions received the highest scores, followed by those acquired in foggy and hazy environments, while images obtained under rainy and snowy conditions achieved the lowest scores. This subjective evaluation is consistent with the experimental observations. Polarimetric imaging exhibits relatively stronger performance under foggy and hazy conditions, benefiting from the ability of polarimetric imaging to suppress scattered light and enhance target contours. In contrast, rainy and snowy scenes introduce severe dynamic interference, which limits the performance gains.
According to the detection and recognition performance presented in Table 2, the YOLOv8n and YOLOv11n [29] models achieve the highest inference speeds and detection accuracies on the test set when using polarimetric images as inputs, reaching (222 FPS, 227 FPS) and (71.4%, 72.5% mAP@0.5), respectively. These results already outperform many existing object detection algorithms designed for adverse weather, such as GCAnet (results shown in Table 3). While Faster R-CNN achieves the highest mAP@0.5 of 78.1% on the validation set, its parameter count is more than ten times that of the YOLO series, and its detection speed is less than one-third of the YOLO models. Given that its precision improvement is only between 1.5% and 4.6%, it is difficult to ensure real-time and efficient detection and recognition on most embedded devices. Building upon YOLOv8n, this study further addresses the issues of insufficient feature transformation capability in shallow networks and feature redundancy in deep networks. As a result, the proposed method achieves detection and recognition performance that surpasses current SOTA (State-of-the-art) methods.

4.4.2. Comparison Between the Proposed Algorithm and SOTA Methods

To evaluate the detection and recognition precision of the proposed method, we compared the PFOD-Net with the YOLOv5n, YOLOv8n (baseline), YOLOv11n, and Faster R-CNN models based on the Polar LITIS and PCOD datasets. Furthermore, to demonstrate the superiority of PFOD-Net, it was also compared with GCAnet [12], an algorithm specifically designed for object detection and recognition in adverse weather. The IoU threshold for mAP calculation was set to 0.5, following the standard MS-COCO evaluation protocol where detections with IoU ≥ 0.5 are regarded as true positives.
Table 3 presents the detailed detection and recognition results of different models on the Polar LITIS dataset. The mAP@0.5 of PFOD-Net outperforms mainstream detection algorithms as well as those specialized for adverse weather, reaching a maximum of 81.7%. Compared to the 71.4% mAP@0.5 of the YOLOv8n baseline and the 76.4% maximum mAP@0.5 of GCAnet, our method achieves improvements of 10.3 and 5.3 percentage points, respectively.
Regarding real-time performance, except for the two-stage algorithm Faster R-CNN, which exhibits poor real-time capability, all other single-stage algorithms achieve rapid processing speeds. Specifically, PFOD-Net completes a single inference within 3.9 ms, with a total processing time of 5.3 ms per frame, meeting the real-time requirements for various scenarios. Additionally, the results in Table 4 compare the performance of different detection algorithms on the PCOD dataset, where PFOD-Net consistently outperforms other methods. In summary, the proposed PFOD-Net algorithm surpasses current SOTA methods.

4.4.3. Comparison of Different Lightweight Models

For practical engineering applications, the size and detection speed of neural network models are of critical importance. We further replaced the backbone of PFOD-Net with MobileNet, ShuffleNet, GhostNet, and FasterNet to conduct a performance comparison. The results in Table 5 indicate that after model lightweighting, the detection precision of the networks decreased to varying degrees. Among them, GhostNet maintained the best detection precision, while FasterNet achieved the most significant model compression.
Considering the trade-off between model size, detection speed, and precision, the PFOD-Net + GhostNet model reduced the parameter count from 2.96M to 2.25M—a reduction of 22.4%. Simultaneously, the detection speed increased from 192 FPS to 210 FPS, representing an 8.5% improvement. Although there was a marginal decrease in network precision, the substantial reduction in parameters makes it more suitable for deployment on embedded real-time platforms.

4.5. Ablation Experiments

In this study, polarimetric images were used to replace RGB images, and the PTFE and DCASPP modules were designed. To further demonstrate the effectiveness of these two modules in enhancing detection and recognition performance under adverse weather conditions, we conducted ablation studies on the Polar LITIS and PCOD datasets. As shown in Table 6, on the Polar LITIS dataset, the YOLOv8n baseline model achieved an mAP@0.5 of 33.5%. After switching the input to polarimetric images, the value increased to 71.4%. The integration of the PTFE module further raised the mAP@0.5 to 79.3%, and the subsequent addition of the DCASPP module brought it to 81.7%. Compared to the baseline model, the mean average precision improved by 48.2 percentage points, the model size decreased by 1.3%, and the detection speed decreased by 13.5%. The results for each modification on the PCOD dataset are similar to those on the Polar LITIS dataset; details are provided in Table 7.
Furthermore, in Experiment 5 on the Polar LITIS dataset, we further incorporated the lightweight GhostNet. This allowed the model to maintain an mAP@0.5 of 81.0%, while the parameter count was further compressed to 2.25M, and the detection speed reached 210 FPS.
To optimize the role of GhostNet within the architecture, we conducted further comparative experiments on the lightweight network while keeping all other configurations constant, as presented in Table 8. Experiment 1 represents the configuration without GhostNet; Experiment 2 introduces GhostNet solely into the neck network; and Experiment 3 incorporates GhostNet into both the backbone and the neck network.
The comparison reveals that the detection accuracy of Experiment 2 outperforms both Experiment 1 and Experiment 3, while maintaining an appropriate level of model compression. In summary, we finalized the strategy of introducing GhostNet into the middle and lower sections of the network architecture.

4.6. Grad-CAM Visualization

To further illustrate the interpretability of the proposed PFOD-Net and its ability to focus on relevant regions compared to baseline models, we apply Grad-CAM [30] to visualize the class activation maps for YOLOv5n, YOLOv8n, and PFOD-Net on representative foggy test samples. Figure 6 shows these visualization results, where warmer colors indicate higher activation and stronger focus on the target regions.
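For reference, Grad-CAM [30] can be reproduced with a short hook-based PyTorch sketch such as the following; the choice of target layer and the score function that selects the detection to explain are placeholders.

```python
import torch

def grad_cam(model: torch.nn.Module, layer: torch.nn.Module,
             image: torch.Tensor, score_fn) -> torch.Tensor:
    """Minimal hook-based Grad-CAM sketch [30]. `score_fn` reduces the model
    output to the scalar being explained; it is a placeholder here."""
    feats, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = score_fn(model(image))
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # channel importance
    cam = torch.relu((weights * feats[0]).sum(dim=1))  # weighted activation map
    return cam / (cam.max() + 1e-8)                    # normalize to [0, 1]
```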
Compared with the ground-truth annotations in Figure 7, it can be observed that while YOLOv8n exhibits a certain degree of response around the target areas, it also shows concentrated irrelevant responses in the background, indicating limited capability in suppressing complex background noise. In contrast, although YOLOv5n effectively weakens the impact of background interference, its response within the actual target regions is considerably weak.
The proposed PFOD-Net demonstrates the most stable and concentrated activation characteristics in the visualization results. Its high-response regions align closely with the target areas while significantly suppressing irrelevant activations in the background. This suggests that PFOD-Net effectively guides the network to focus on key regions, possessing superior discriminative power in feature extraction and attention allocation, which further validates the effectiveness of the proposed method.

4.7. Comparison of Detection Performance

In object detection tasks, the performance of a detection model primarily depends on the network’s feature extraction capability, its ability to distinguish targets from the background, and the efficiency of feature transmission across different hierarchical levels. In complex scenarios, compared to the model proposed in this study, other algorithms rely solely on fixed local feature extraction convolutions. These methods often suffer from insufficient feature extraction of the raw input and an inability to capture spatial positional relationships, making it difficult to accurately identify and localize targets in complex backgrounds. Consequently, this leads to lower confidence scores and higher missed detection rates.
To intuitively demonstrate the detection and recognition efficacy of the proposed algorithm, five frames were randomly selected from the Polar LITIS test set for a performance comparison, as illustrated in Figure 7. Compared to YOLOv8n, the detection results of PFOD-Net are closer to the ground-truth annotations. YOLOv8n failed to detect several pedestrians and vehicles and exhibited a significant number of missed detections for small objects. In contrast, PFOD-Net successfully detected the majority of annotated targets in the scenes, with a substantial improvement in the detection and recognition capability for small targets. The experimental results indicate that the proposed PFOD-Net algorithm possesses superior performance in detecting targets of various scales, particularly excelling in low-visibility complex scenarios such as foggy weather, and it outperforms the other comparative algorithms on all metrics, including the missed detection rate.

5. Conclusions

Polarimetric imaging offers significant advantages in object detection and recognition tasks under adverse weather conditions. This study proposes PFOD-Net, a novel object detection and recognition method that incorporates a PTFE network and a DCASPP module. The proposed method first reconstructs polarimetric features and performs enhanced feature extraction through the PTFE network, followed by the subsequent detection and recognition process. Our contributions include the following:
Utilization of polarimetric imaging information for object detection—Under foggy conditions, our method increased mAP@0.5 by 48.2 percentage points (from 33.5% to 81.7%).
Integration of physical priors and multi-scale robustness—In the shallow feature extraction stage, the PTFE module incorporates physical priors into the network with only one additional primary convolutional layer and a marginal increase in parameters. This enhances feature extraction capability and accelerates network convergence. In the feature fusion stage, the DCASPP module—featuring dual-branch pooling and a nearly parameter-free coordinate attention mechanism—further strengthens the model’s robustness to target scale variations while reducing parameter counts. Compared to other networks, our model achieves higher precision with almost no increase in parameters, improving the mAP@0.5 by 10.3% over the YOLOv8n baseline.
Model lightweighting—By replacing the backbone with GhostNet, the model size was successfully compressed by 22.4%, and the detection speed increased from 192 FPS to 210 FPS (an 8.5% improvement), with a negligible precision loss of only 0.7%.
The proposed method was validated on the Polar LITIS and PCOD datasets. The Polar LITIS dataset was used for training on sunny day data and testing on foggy day data, while the PCOD dataset was partitioned into training and validation sets accordingly. Our model achieved detection accuracies of 81.7% and 79.0% on these two datasets, respectively, with a detection speed of 210 FPS. The consistency of the comparative and ablation results across different datasets demonstrates the strong generalization capability of our model. However, existing polarimetric datasets are generally small in scale, and polarimetric images acquired in practical application scenarios may be more complex; thus, the performance of the proposed model requires further validation. Future work will focus on optimizing polarimetric feature extraction and establishing real-world adverse-weather polarimetric datasets to further verify the effectiveness of our method.

Author Contributions

Conceptualization, X.L. and W.W.; methodology, X.L.; software, X.L. and W.L.; validation, X.L. and X.Y.; formal analysis, X.L.; investigation, X.L. and X.Y.; resources, W.W.; data curation, X.L. and W.L.; writing—original draft preparation, X.L.; writing—review and editing, W.W. and X.L.; visualization, X.L.; supervision, W.W.; project administration, X.L. and W.W.; funding acquisition, F.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Major Project (2022ZD0117301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Source code is available at: https://github.com/Qaz1362820076/PFOD-Net, accessed on 6 January 2026.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AoP: Angle of Polarization
DCASPP: Dynamic Coordinate Attention Spatial Pyramid Pooling
DoLP: Degree of Linear Polarization
FPS: Frames Per Second
IoU: Intersection over Union
mAP: mean Average Precision
PFOD-Net: Polarization Feature Object Detection Network
PTFE: Polarimetric Transformation and Feature Enhancement
RGB: Red, Green, Blue
SOTA: State-of-the-art
SPPF: Spatial Pyramid Pooling—Fast

References

  1. Yang, K.; Liu, F.; Liang, S.; Xiang, M.; Han, P.; Liu, J.; Dong, X.; Wei, Y.; Wang, B.; Shimizu, K.; et al. Data-Driven Polarimetric Imaging: A Review. Opto-Electron. Sci. 2024, 3, 230042. [Google Scholar] [CrossRef]
  2. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
  3. Bian, Y.F.; Luo, D.X.; Zhang, M.L. A Review of FPGA Accelerated Computing Methods for YOLO Models. In Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning (IoTML 2024), Nanchang, China, 9–11 August 2024; p. 9. [Google Scholar] [CrossRef]
  4. Nafaa, S.; Ashour, K.; Mohamed, R.; Essam, H.; Emad, D.; Elhenawy, M.; Ashqar, H.I.; Hassan, A.A.; Alhadidi, T.I. Advancing Roadway Sign Detection with YOLO Models and Transfer Learning. In Proceedings of the 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), Mt Pleasant, MI, USA, 13–14 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4. [Google Scholar] [CrossRef]
  5. Wang, H.; Shang, J.; Wang, X.; Zhang, Q.; Wang, X.; Li, J.; Wang, Y. RSW-YOLO: A Vehicle Detection Model for Urban UAV Remote Sensing Images. Sensors 2025, 25, 4335. [Google Scholar] [CrossRef] [PubMed]
  6. Blin, R.; Ainouz, S.; Canu, S.; Meriaudeau, F. Road Scenes Analysis in Adverse Weather Conditions by Polarization-Encoded Images and Adapted Deep Learning. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 27–32. [Google Scholar] [CrossRef]
  7. Blin, R.; Ainouz, S.; Canu, S.; Meriaudeau, F. A New Multimodal RGB and Polarimetric Image Dataset for Road Scenes Analysis. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 867–876. [Google Scholar] [CrossRef]
  8. Li, Z.; Zhang, A.; Jiang, Y.; Li, G.; Wang, D.; Wang, W.; Shi, L.; Ji, T.; Liu, F.; Chen, Y. Research, Application, and Progress of Optical Polarization Imaging Technology. Infrared Laser Eng. 2023, 52, 20220808. [Google Scholar] [CrossRef]
  9. Liu, J.Y.; Li, S.T.; Dian, R.; Song, Z. DT-F Transformer: Dual transpose fusion transformer for polarization image fusion. Inf. Fusion 2024, 106, 102274. [Google Scholar] [CrossRef]
  10. Shen, Y.; Liu, X.C.; Wang, S.; Huang, F. Real-Time Detection of Low-Altitude Camouflaged Targets Based on Polarization Encoded Images. Acta Armamentarii 2024, 45, 1374–1383. [Google Scholar] [CrossRef]
  11. Hu, H.F.; Fei, X.T.; Shen, L.H.; Li, X.B. Underwater Image Recovery under Non-Uniform Illumination Based on Polarimetric Imaging. Acta Opt. Sin. 2025, 45, 0629001. [Google Scholar] [CrossRef]
  12. Tan, A.; Guo, T.; Zhao, Y.; Wang, Y.; Li, X. Object Detection Based on Polarization Image Fusion and Grouped Convolutional Attention Network. Vis. Comput. 2024, 40, 3199–3215. [Google Scholar] [CrossRef]
  13. Sun, R.; Sun, X.; Chen, F.; Song, Q.; Pan, H. Polarimetric Imaging Detection Using a Convolutional Neural Network with Three-Dimensional and Two-Dimensional Convolutional Layers. Appl. Opt. 2020, 59, 151. [Google Scholar] [CrossRef] [PubMed]
  14. Huang, F.; Zheng, J.; Liu, X.; Shen, Y.; Chen, J. Polarization of Road Target Detection under Complex Weather Conditions. Sci. Rep. 2024, 14, 30348. [Google Scholar] [CrossRef] [PubMed]
  15. Dey, J.; Anandan, P.; Rajagopal, S.; Mani, M. Improved Target Detection with YOLOv8 for GAN Augmented Polarimetric Images Using MIRNet Denoising Model. IEEE Access 2024, 12, 166885–166910. [Google Scholar] [CrossRef]
  16. Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012; ChristopherSTAN; Liu, C.Y.; Laughing; Hogan, A.; lorenzomammana; tkianai; et al. ultralytics/yolov5, v3.0; Zenodo: Geneva, Switzerland, 2020. [CrossRef]
  17. Zhang, J.C.; Wu, C.Y.; Luo, Y.D.; Li, C.G.; Jiang, N.; Song, Y.C. Research Status and Prospects on Super-Resolution Imaging Technology for Division-of-Focal-Plane Polarimeters (Invited). Infrared Laser Eng. 2025, 54, 20240165. [Google Scholar] [CrossRef]
  18. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
  19. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11531–11539. [Google Scholar] [CrossRef]
  20. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. [Google Scholar] [CrossRef]
  21. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 13708–13717. [Google Scholar] [CrossRef]
  22. Wang, X.; Ding, J.; Zhang, Z.; Xu, J.; Gao, J. IPNet: Polarization-Based Camouflaged Object Detection via Dual-Flow Network. Eng. Appl. Artif. Intell. 2024, 127, 107303. [Google Scholar] [CrossRef]
  23. Zhao, Y.Q.; Li, N.; Zhang, P.; Yao, J.X.; Pan, Q. Infrared Polarization Perception and Intelligent Processing. Infrared Laser Eng. 2018, 47, 1102001. [Google Scholar] [CrossRef]
  24. Liu, Y.; Shi, H.D.; Jiang, H.L.; Li, Y.C.; Wang, C.; Liu, Z.; Li, G.L. Infrared Polarization Properties of Targets with Rough Surface. Chin. Opt. 2020, 13, 459–471. Available online: https://kns.cnki.net/kcms2/article/abstract?v=dKcr_PZ1zcuZNhwQkbLvOM2LvJh5iFdkDrzcSt5lbNd5FpagAWWyV30apjvhtGd4D_8aYinswsVmrsQrMVzle9VorHlVdlaEXL2Gi8keUSkPLv7oGQecMFGtyekQI5NziVGv82f5pGGKTlr45gEzUte1KfmUCOqC33Zy906dsU2Y3qIxtSI6Cg6ML1lFzPWX&uniplatform=NZKPT&language=CHS (accessed on 6 January 2026).
  25. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  26. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
  27. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1577–1586. [Google Scholar] [CrossRef]
  28. Chen, J.; Kao, S.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 12021–12031. [Google Scholar] [CrossRef]
  29. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725v1. [Google Scholar] [CrossRef]
  30. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
Figure 1. Network structure of YOLOv8.
Figure 2. Network structure of PFOD-Net.
Figure 3. Network structure of Coordinate Attention.
Figure 4. Network structure of GhostNet (DSConv consists of a depthwise conv and a pointwise conv).
Figure 5. Composition of polarization images in Polar LITIS.
Figure 6. Grad-CAM visualization results of YOLOv5n, YOLOv8n, and PFOD-Net on the test set (warmer colors indicate higher response intensity), corresponding to Figure 7.
Figure 7. Example detection results on the Polar LITIS test set (detections shown as colored boxes; confidence threshold = 0.35; NMS IoU = 0.5), corresponding to Figure 6.
Table 1. Instance classification in the PCOD dataset.

| Class | Train | Val |
|---|---|---|
| Insects | 453 | 107 |
| Animals | 276 | 70 |
| Products | 230 | 55 |
| Camouflage Net | 39 | 10 |
Table 2. Performance comparison of object detection algorithms on RGB and polarization images.

| Detection Method | Imaging Method | mAP@0.5 on the Validation Set (%) | mAP@0.5 on the Test Set (%) | Params (M) | FPS |
|---|---|---|---|---|---|
| YOLOv5n | RGB | 75.2 | 41.5 | 2.51 | 232 |
| | Polarization | 70.4 | 66.8 | | |
| YOLOv8n | RGB | 75.7 | 33.5 | 3.01 | 222 |
| | Polarization | 72.4 | 71.4 | | |
| YOLOv11n | RGB | 76.0 | 39.6 | 2.62 | 227 |
| | Polarization | 70.2 | 72.5 | | |
| Faster R-CNN | RGB | 77.1 | 42.1 | 41 | 65 |
| | Polarization | 69.8 | 70.2 | | |
Table 3. Comparison of different object detection algorithms on the Polar LITIS dataset.

| Method | Class | mAP@0.5 (%) | Params (M) | FPS | GFlops |
|---|---|---|---|---|---|
| YOLOv5n | Car | 73.5 | 2.51 | 232 | 4.2 |
| | Person | 60.1 | | | |
| | All | 66.8 | | | |
| YOLOv8n | Car | 72.9 | 3.01 | 222 | 8.1 |
| | Person | 70.0 | | | |
| | All | 71.4 | | | |
| YOLOv11n | Car | 74.6 | 2.62 | 227 | 6.3 |
| | Person | 70.4 | | | |
| | All | 72.5 | | | |
| Faster R-CNN | Car | 70.5 | 41 | 65 | 208 |
| | Person | 69.9 | | | |
| | All | 70.2 | | | |
| GCAnet | Car | 90.0 | 29.8 | 145 | 12.8 |
| | Person | 63.0 | | | |
| | All | 76.4 | | | |
| PFOD-Net (ours) | Car | 77.4 | 2.96 | 192 | 8.0 |
| | Person | 86.0 | | | |
| | All | 81.7 | | | |
Table 4. Comparison of different object detection algorithms on the PCOD dataset.

| Method | mAP@0.5 (%) | Params (M) | FPS |
|---|---|---|---|
| YOLOv5n | 64.4 | 2.51 | 232 |
| YOLOv8n | 68.8 | 3.01 | 222 |
| YOLOv11n | 70.1 | 2.62 | 227 |
| GCAnet | 73.8 | 29.8 | 145 |
| PFOD-Net (ours) | 79.0 | 2.96 | 192 |
Table 5. Comparison of detection performance across lightweight network architectures.

| Method | mAP@0.5 (%) | Params (M) | FPS |
|---|---|---|---|
| PFOD-Net | 81.7 | 2.96 | 192 |
| PFOD-Net + MobileNet | 71.7 | 2.11 | 227 |
| PFOD-Net + ShuffleNet | 70.1 | 2.06 | 238 |
| PFOD-Net + GhostNet | 81.0 | 2.25 | 210 |
| PFOD-Net + FasterNet | 66.8 | 1.41 | 256 |
Table 6. Results of ablation experiments on the Polar LITIS dataset.

| Model Number | Polarized Image | PTFE | DCASPP | GhostNet | mAP@0.5 (%) | Params (M) | FPS |
|---|---|---|---|---|---|---|---|
| 1 | × | × | × | × | 33.5 | 3.01 | 222 |
| 2 | √ | × | × | × | 71.4 | 3.01 | 222 |
| 3 | √ | √ | × | × | 79.3 | 3.01 | 192 |
| 4 | √ | √ | √ | × | 81.7 | 2.96 | 192 |
| 5 | √ | √ | √ | √ | 81.0 | 2.25 | 210 |
Table 7. Results of ablation experiments on the PCOD dataset.

| Model Number | Polarized Image | PTFE | DCASPP | GhostNet | mAP@0.5 (%) | Params (M) | FPS |
|---|---|---|---|---|---|---|---|
| 1 | √ | × | × | × | 68.8 | 3.01 | 222 |
| 2 | √ | √ | × | × | 72.0 | 3.01 | 192 |
| 3 | √ | √ | √ | × | 79.0 | 2.96 | 192 |
| 4 | √ | √ | √ | √ | 77.6 | 2.25 | 210 |

“√” indicates that the corresponding method is adopted and “×” indicates not adopted.
Table 8. The result of lightweighting at different levels of the network.

| Model Number | mAP@0.5 (%) | Params (M) | FPS | GFlops |
|---|---|---|---|---|
| 1 | 81.7 | 3.01 | 192 | 8.0 |
| 2 | 81.0 | 2.25 | 210 | 6.3 |
| 3 | 65.1 | 1.66 | 277 | 5.6 |