5.1. Improvement of Network Model
- (1) AOD-Net model analysis
Considering the small size and good real-time performance of the lightweight AOD-Net model, it is embedded as an image preprocessing stage to achieve effective dehazing. The AOD-Net dehazing model is obtained by fusing the transmittance t(a, b) and the atmospheric light value A into a single variable. Dividing both sides of Equation (14) by t(a, b) at the same time gives:

J(a, b) = I(a, b)/t(a, b) − A/t(a, b) + A    (15)

Define o as a deviation constant with a value of 1 and the parameter K(a, b) as:

K(a, b) = [(I(a, b) − A)/t(a, b) + (A − o)] / (I(a, b) − 1)

Equation (15) can then be rewritten as:

J(a, b) = K(a, b) · I(a, b) − K(a, b) + o
The AOD-Net model integrates the atmospheric light value and the transmittance through the above equations, simplifying the transformation between foggy and fog-free images and reducing error. As long as K(a, b) is accurately estimated, a clear image can be generated. Five groups of convolution kernels of different sizes, listed in Table 8, are used to extract multi-scale features, and the image dehazing effect is shown in Figure 15. As the figure shows, after the reformulation of the atmospheric scattering model, the AOD-Net model fuses the atmospheric light value and the transmittance into one parameter, which avoids the error caused by estimating the two parameters separately and improves the accuracy of the results. In addition, the AOD-Net network is simple and has few parameters, making it easy to embed in object detection tasks as a lightweight component. However, the feature maps obtained by the different convolutions are merely stacked along the channel dimension by the Concat layer, without considering how different features influence dehazing performance. The oversimplified structure of AOD-Net also distorts image colors after dehazing. Furthermore, because AOD-Net assigns the same weight to all pixel-wise features during dehazing, it cannot reflect the uneven distribution of real haze across an image.
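The fused-parameter reconstruction described above can be illustrated with a short numpy sketch. The formula for K(a, b) and the reconstruction J = K·I − K + o follow the equations in this subsection; note that in AOD-Net itself K(a, b) is predicted by the convolutional network, whereas here it is computed from known t and A purely to show the algebra closes:

```python
import numpy as np

def fused_parameter(I, t, A, o=1.0):
    """Fused parameter K(a, b) = ((I - A)/t + (A - o)) / (I - 1).
    In AOD-Net this map is estimated by the CNN rather than computed."""
    return ((I - A) / t + (A - o)) / (I - 1.0)

def reconstruct_clear(I, K, o=1.0):
    """Recover the clear image with J = K * I - K + o, the single
    formula AOD-Net applies once K(a, b) is known."""
    return K * I - K + o
```

Substituting the exact K recovers J = (I − A)/t + A, the inverse of the scattering model, which is why joint estimation of the fused parameter avoids the error of estimating t and A separately.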
- (2) Dual attention mechanism
To address these problems, both pixel and channel attention mechanisms are introduced into the AOD-Net model [34], as shown in Figure 16. In the pixel attention branch, the average pooling and max pooling operations shown in Figure 16 are applied in parallel to the same input feature map along the channel dimension, rather than sequentially. The two pooling operations capture complementary statistics: average pooling reflects global contextual responses, while max pooling highlights salient local activations. The resulting maps are concatenated and transformed by a convolutional layer to generate the spatial attention map. This design is consistent with the original CBAM formulation and has been widely validated in previous studies. The attention mechanism works much like the human eye, which selectively focuses on part of a cluttered scene. It assigns different weights to different pixel and channel features, so the network can self-regulate and concentrate on the key information in sensitive features, enhancing its learning ability. At the same time, the mechanism handles pixels in thin-haze and thick-haze regions differently, solving the problem of dehazing unevenly hazed images while improving network performance and expressive ability.
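The pixel-attention branch just described can be sketched in plain numpy. This is a minimal rendering of the CBAM-style computation: channel-wise average and max pooling in parallel, concatenation, a small convolution, and a sigmoid. The convolution weights are learned in the real network; here a fixed kernel stands in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, conv_kernel):
    """CBAM-style pixel (spatial) attention.

    feat:        (C, H, W) input feature map.
    conv_kernel: (2, k, k) kernel fusing the two pooled maps
                 (learned in the real network; fixed here).
    Returns the attention-weighted feature map, shape (C, H, W).
    """
    avg_map = feat.mean(axis=0)            # (H, W) global context
    max_map = feat.max(axis=0)             # (H, W) salient activations
    pooled = np.stack([avg_map, max_map])  # (2, H, W), applied in parallel

    k = conv_kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = avg_map.shape
    attn = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * conv_kernel)
    weights = sigmoid(attn)                # per-pixel weights in (0, 1)
    return feat * weights                  # broadcast over all channels
```

Because the sigmoid weights lie in (0, 1) and vary per pixel, thin-haze and thick-haze regions receive different emphasis, which is the mechanism the text describes for handling unevenly distributed haze.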
- (3) Loss function optimization
After dehazing with the AOD-Net model, a large aperture artifact appears in the center of the image, as shown in Figure 17. To solve this problem, a smoothness loss is introduced as a penalty term alongside the original mean square error loss. The Sobel operator is used to compute the image gradients for the smoothness loss, whose expression is:

loss_gra = (1/N) Σ (|gradient_x| + |gradient_y|)

where gradient_x and gradient_y are the gradients of the image along the horizontal and vertical directions, respectively, S is the dehazed image, and N is the number of pixels in the image. Fusing the smoothness loss and the mean square error loss gives the final loss function:

loss = loss_mse + φ · loss_gra

where loss_mse and loss_gra are the mean square error loss and the smoothness loss, respectively, and φ is the weight of the smoothness loss. The formula for loss_mse is:

loss_mse = (1/(m · n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (S(i, j) − I(i, j))²

where m is the number of pixels along the image width, n is the number of pixels along the image height, and I is the fog-free image.
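The combined objective can be sketched in numpy as follows. The Sobel kernels are the standard 3×3 pair; the mean-absolute-gradient reduction and the example value of φ are assumptions for illustration, since the paper's exact reduction is not reproduced here:

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _conv2d(img, kernel):
    """'Same' 2-D correlation with zero padding (enough for a sketch)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(img, pad)
    H, W = img.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def smoothness_loss(S):
    """Mean absolute Sobel gradient of the dehazed image S:
    loss_gra = (1/N) * sum(|grad_x| + |grad_y|)."""
    gx = _conv2d(S, SOBEL_X)
    gy = _conv2d(S, SOBEL_Y)
    return np.mean(np.abs(gx) + np.abs(gy))

def total_loss(S, I, phi=0.1):
    """loss = loss_mse + phi * loss_gra; loss_mse is the per-pixel MSE
    between the dehazed image S and the fog-free reference I.
    The default phi is illustrative, not the paper's tuned value."""
    loss_mse = np.mean((S - I) ** 2)
    return loss_mse + phi * smoothness_loss(S)
```

Penalizing the gradient magnitude of S discourages the abrupt intensity transitions that show up as the aperture artifact, while the MSE term keeps the output close to the reference.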
5.2. Dehazing Effect Analysis
To address the difficulty of simultaneously collecting clear and foggy images of the same scene, three fog-free remote sensing datasets were selected as clear images, and the atmospheric scattering model was used to synthesize foggy remote sensing images, as shown in Figure 18.
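This synthesis step can be sketched directly from the scattering model. The depth-based transmission and the β parameter below are illustrative modeling choices, since the paper does not specify how the transmission maps were generated:

```python
import numpy as np

def transmission_from_depth(depth, beta=1.0):
    """Beer-Lambert transmission t = exp(-beta * d); larger beta means
    denser haze (an illustrative choice, not specified in the paper)."""
    return np.exp(-beta * depth)

def synthesize_haze(clear, t, A):
    """Render a foggy image from a clear one via the atmospheric
    scattering model I = J * t + A * (1 - t).

    clear: (H, W, 3) haze-free image in [0, 1]
    t:     (H, W) transmission map in (0, 1]
    A:     scalar atmospheric light in [0, 1]
    """
    t = t[..., None]                      # broadcast over RGB channels
    return np.clip(clear * t + A * (1.0 - t), 0.0, 1.0)
```

With t = 1 the output equals the clear image, and with t = 0 every pixel collapses to the atmospheric light A, matching the two limits of the scattering model.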
Table 9 lists the experimental environment configuration and network training parameters.
To verify the effectiveness of the improved dehazing method, it was compared with four classical methods spanning traditional and deep learning approaches. The traditional methods are the dark channel prior (DCP) method, based on prior knowledge, and histogram equalization, based on statistical characteristics. The deep learning methods are AOD-Net and Dehaze-Net, both based on the atmospheric scattering model [14]. As shown in Table 10, the improved model reaches a peak signal-to-noise ratio of 21.095 and a structural similarity of 0.8418, a dehazing performance well above that of the other methods. As Figure 19 shows, the traditional dark channel prior and histogram equalization methods introduce color bias into the dehazed image.
Taking the dark channel prior and Dehaze-Net algorithms as examples, the causes of color distortion after dehazing are analyzed. According to dark channel prior theory, the image dehazing process is shown in Figure 20; the core of the algorithm is obtaining the atmospheric light value and the transmittance. The atmospheric light value is solved as follows. First, the dark channel pixels of the foggy image are sorted in descending order of gray value, the brightest 0.1% of pixels are selected, and their coordinates are recorded. The recorded coordinates are then used to locate the corresponding pixels in the original foggy image. Finally, the maximum brightness of these pixels across the RGB channels is taken as the estimate of the global atmospheric light value A. By screening and processing the key pixels of the foggy image, this method estimates the atmospheric light accurately and efficiently, providing a basis for the subsequent dehazing operation.
Dividing both sides of the atmospheric scattering model equation by the atmospheric light value A gives:

I^C(a, b)/A = t(a, b) · J^C(a, b)/A + 1 − t(a, b)    (23)

Applying two minimum filtering operations (over a local patch Ω and over the three RGB channels) to both sides of Equation (23) yields:

min_Ω min_C [I^C/A] = t(a, b) · min_Ω min_C [J^C/A] + 1 − t(a, b)    (24)

Since J in Equation (24) is the fog-free image to be solved, dark channel theory gives:

min_Ω min_C [J^C/A] → 0

so the transmittance can be expressed as:

t(a, b) = 1 − min_Ω min_C [I^C(a, b)/A]

where J^C is the pixel value of any RGB channel of the fog-free image and I^C is the pixel value of any RGB channel of the foggy image.
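The two estimation steps just derived, atmospheric light from the brightest dark-channel pixels and then transmittance, can be sketched as follows. The patch size is the conventional DCP setting and the 0.001 fraction corresponds to the brightest 0.1% of dark-channel pixels described above; both are illustrative defaults:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the RGB channels followed by a minimum
    filter over a local patch (the two minimum operations in the text)."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    H, W = mins.shape
    out = np.empty_like(mins)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_atmospheric_light(img, patch=15, frac=0.001):
    """Sort dark-channel pixels by gray value, keep the brightest 0.1%,
    and take the maximum RGB intensity at those locations as A."""
    dc = dark_channel(img, patch)
    n = max(1, int(frac * dc.size))
    flat_idx = np.argsort(dc.ravel())[-n:]      # brightest 0.1% indices
    ys, xs = np.unravel_index(flat_idx, dc.shape)
    return img[ys, xs].max()

def estimate_transmission(img, A, patch=15):
    """t = 1 - dark_channel(I / A), as derived from the prior."""
    return 1.0 - dark_channel(img / A, patch)
```

A bright white object will dominate the top-0.1% selection in exactly the way the following analysis describes, biasing A upward and pushing the estimated transmittance below its true value.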
From the above theory, when the atmospheric light value is obtained via dark channel theory, the brightest point in the image may lie on a white object. Since pixel values at such locations are much larger than the atmospheric pixel values in the distant sky region, the estimation errors of the atmospheric light and the transmittance increase, distorting the color of the dehazed image. Moreover, in the transmittance calculation, the dark channel prior sets the twice-min-filtered pixel value of the fog-free image to 0, which drives the estimated transmittance below its true value and likewise causes color deviation after dehazing. Image quality degrades further as the errors in atmospheric light and transmittance accumulate and can magnify each other. As can be seen from Figure 21, the image dehazed by the Dehaze-Net method also shows color distortion. Dehaze-Net first uses a convolutional neural network to obtain the transmittance map of the foggy image and then estimates the atmospheric light value from it. Like the dark channel prior method, Dehaze-Net suffers from biased atmospheric light estimation because bright areas such as sky and white objects are mistaken for low-transmittance regions.
To verify the effect of the improvements on dehazing performance, ablation experiments on the attention module and the optimized loss function were carried out on the test datasets. According to the results in Table 11, the original AOD-Net has the lowest objective scores, PSNR and SSIM. Introducing the smoothness loss alone raises PSNR by nearly 1 dB and SSIM from 0.755 to 0.836. Introducing the dual attention mechanism alone raises PSNR by nearly 2 dB and SSIM from 0.755 to 0.835. Introducing both improvements together yields the largest gains: PSNR increases by nearly 3 dB and SSIM rises from 0.755 to 0.842. Clearly, improving the loss function and introducing the dual attention mechanism significantly improves image dehazing performance.
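The PSNR figures in this comparison can be computed as below, for images scaled to [0, 1]. SSIM, which additionally involves local luminance, contrast, and structure statistics, is omitted from this sketch for brevity:

```python
import numpy as np

def psnr(dehazed, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).
    Higher is better; identical images give +inf."""
    mse = np.mean((dehazed - reference) ** 2)
    if mse == 0.0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)
```

As a sanity check, a uniform error of 0.1 over a [0, 1] image corresponds to an MSE of 0.01 and therefore a PSNR of exactly 20 dB.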
Although the current ablation experiments have validated the improvement in overall performance contributed by the dual-attention module, they have not separately analyzed the independent effects of the pixel-attention module and the channel-attention module. More detailed module-level ablation studies will help further reveal the specific contributions of each component and their potential redundancy. This paper regards this as a limitation of the current work and will treat systematic sub-module ablation analysis as an important direction for future research to support further optimization of the model architecture and enhancement of performance.
To evaluate the generalization performance of the proposed dehazing method under conditions closer to real-world scenarios, this study further conducted experiments on the public remote sensing image dehazing benchmark dataset Haze1k [36]. This dataset provides real cloud- and fog-degraded images along with their corresponding haze-free references, making it suitable for fair comparisons of restoration methods. A representative qualitative comparison of the improved AOD-Net, the original AOD-Net, and Dehaze-Net on this dataset is shown in Figure 22.
The experimental results indicate that, for real thin haze and non-uniform fog scenarios, the images restored by the proposed method are visually clearer and exhibit better detail preservation, verifying the effectiveness of the network improvements described in Section 5.1. Specifically, the dual-attention mechanism enhances the model's ability to focus on regions with non-uniform haze distribution, thereby facilitating more accurate image reconstruction. Meanwhile, the smoothness loss function effectively suppresses the edge halos and artifacts common in restored images, improving the visual naturalness of the results. It is worth noting that the restoration performance of the proposed method is particularly strong in the central regions of images, while some degradation is observed near the image boundaries. This can be attributed mainly to the combined effects of inherent edge optical distortions and reduced signal-to-noise ratios in remote sensing imaging systems, learning bias caused by the concentration of effective targets near image centers in the training data, and the relatively limited contextual information available to attention mechanisms at image boundaries. However, in practical ship detection scenarios, key targets are typically located in the central or main field-of-view regions of images, so the method's advantage in these core areas is of greater practical significance.
In addition, for regions in the dataset where signals are almost completely lost due to dense cloud cover, any single-image-based method finds it difficult to recover fully occluded information, and the proposed method also exhibits performance limitations under such conditions. Nevertheless, considering its overall performance on both synthetic data and real-world datasets, the proposed method demonstrates stable and significant enhancement effects for cloud and fog degradations that are physically recoverable. The restored results provide higher-quality inputs for subsequent target detection modules, thereby confirming the practical value of the proposed improvements.
In practical engineering deployments, vessel monitoring systems often need to operate stably over long periods under complex weather conditions; their primary objective is to ensure the reliability and robustness of detection results rather than merely pursuing very high processing frame rates. Unlike conventional single-stage object detection models, the method proposed in this paper adopts a cascaded processing pipeline comprising an initial image dehazing enhancement stage followed by object detection. While this design substantially improves detection accuracy under adverse meteorological conditions, it also inevitably increases the overall inference time.
Although this work introduces a lightweight backbone network (FasterNet) and efficient attention mechanisms to control model size and computational complexity to some extent, the overall FPS on resource-constrained embedded or edge computing platforms still exhibits a certain decline compared with the original YOLOv8s model. Therefore, in the absence of a unified target hardware platform and concrete application scenario constraints, we did not directly report fixed FPS metrics; instead, we focused on reporting model parameter counts and computational complexity (FLOPs) to provide more generalizable references for system design across different hardware environments.
To further narrow the gap between algorithmic research and engineering applications, it is necessary to discuss how the proposed joint dehazing and lightweight ship detection method can be integrated into operational maritime monitoring and information control systems. In real-world application scenarios, such systems typically comprise distributed sensing nodes, edge computing units, and centralized processing centers, and must operate stably under limited communication bandwidth, stringent real-time requirements, and fault-tolerance constraints.
Benefiting from the lightweight characteristics of the proposed model, the method is well suited for deployment on edge-side devices, such as shipborne platforms, shore-based monitoring stations, or UAV-mounted systems, thereby enabling local real-time target detection and reducing the need to transmit raw image data. Detection results or compressed intermediate features need only be uploaded to upper-level nodes, which helps alleviate communication bandwidth pressure. Meanwhile, a central server can undertake tasks such as multi-source information fusion, long-term data storage, system-level decision making, and periodic model updates, realizing a hierarchical edge-center collaborative processing paradigm.
From a system-design perspective, the algorithm's integration can follow design principles commonly used in information and control systems for critical infrastructure, including modular architecture, hierarchical task allocation, and redundancy and robustness provisions for communication outages. Similar methodologies have been validated in safety-critical domains such as power transmission line monitoring (Afanaseva and Tulyakov, 2025) [37]. Although the concrete implementation of large-scale real-time monitoring systems lies beyond the scope of this study, the above discussion indicates that the proposed method has the potential to be engineered as a core algorithmic module within maritime monitoring systems.