5.1. Improvement of Network Model
- (1) AOD-Net model analysis
Considering the small size and good real-time performance of the lightweight AOD-Net model, it is embedded as an image preprocessing stage to achieve effective dehazing. The AOD-Net dehazing model is obtained by fusing the transmittance t(a, b) and the atmospheric light value A into a single variable. Dividing both sides of Equation (14) by t(a, b) at the same time gives:

J(a, b) = I(a, b)/t(a, b) − A/t(a, b) + A    (15)

Define o as a deviation constant with a value of 1 and the parameter K(a, b) as:

K(a, b) = [(I(a, b) − A)/t(a, b) + (A − o)] / (I(a, b) − 1)

Equation (15) can then be rewritten as:

J(a, b) = K(a, b) · I(a, b) − K(a, b) + o
The AOD-Net model integrates the atmospheric light value and the transmittance through the above equations, simplifying the transformation between foggy and fog-free images and reducing error. As long as K(a, b) is accurately estimated, a clear image can be generated. Five groups of convolution kernels of different sizes, listed in Table 8, are used to extract multi-scale features, and the image dehazing effect is shown in Figure 15. As the figure shows, after the reformulation of the atmospheric scattering model, the AOD-Net model fuses the atmospheric light value and the transmittance into one parameter, which avoids the error caused by estimating the two parameters separately and improves the accuracy of the results. In addition, the AOD-Net network is simple and has few parameters, making it easy to embed in object detection tasks as a lightweight component. However, the feature maps obtained by the different convolutions are merely stacked along the channel dimension by the Concat layer, without considering how different features influence dehazing performance. The oversimplified structure of AOD-Net also distorts image colors after dehazing. Furthermore, because AOD-Net assigns the same weight to all pixel-wise features during dehazing, it cannot reflect the uneven distribution of real haze across an image.
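The fused-parameter reconstruction described above can be illustrated with a short numpy sketch. The formula for K(a, b) and the reconstruction J = K·I − K + o follow the equations in this subsection; note that in AOD-Net itself K(a, b) is predicted by the convolutional network, whereas here it is computed from known t and A purely to show the algebra closes:

```python
import numpy as np

def fused_parameter(I, t, A, o=1.0):
    """Fused parameter K(a, b) = ((I - A)/t + (A - o)) / (I - 1).
    In AOD-Net this map is estimated by the CNN rather than computed."""
    return ((I - A) / t + (A - o)) / (I - 1.0)

def reconstruct_clear(I, K, o=1.0):
    """Recover the clear image with J = K * I - K + o, the single
    formula AOD-Net applies once K(a, b) is known."""
    return K * I - K + o
```

Substituting the exact K recovers J = (I − A)/t + A, the inverse of the scattering model, which is why joint estimation of the fused parameter avoids the error of estimating t and A separately.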
- (2) Dual attention mechanism
To address these problems, both pixel and channel attention mechanisms are introduced into the AOD-Net model [34], as shown in Figure 16. In the pixel attention branch, the average pooling and max pooling operations shown in Figure 16 are applied in parallel to the same input feature map along the channel dimension, rather than sequentially. The two pooling operations capture complementary statistics: average pooling reflects global contextual responses, while max pooling highlights salient local activations. The resulting maps are concatenated and transformed by a convolutional layer to generate the spatial attention map. This design is consistent with the original CBAM formulation and has been widely validated in previous studies. The attention mechanism works much like the human eye, which selectively focuses on part of a cluttered scene. It assigns different weights to different pixel and channel features, so the network can self-regulate and concentrate on the key information in sensitive features, enhancing its learning ability. At the same time, the mechanism handles pixels in thin-haze and thick-haze regions differently, solving the problem of dehazing unevenly hazed images while improving network performance and expressive ability.
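The pixel-attention branch just described can be sketched in plain numpy. This is a minimal rendering of the CBAM-style computation: channel-wise average and max pooling in parallel, concatenation, a small convolution, and a sigmoid. The convolution weights are learned in the real network; here a fixed kernel stands in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, conv_kernel):
    """CBAM-style pixel (spatial) attention.

    feat:        (C, H, W) input feature map.
    conv_kernel: (2, k, k) kernel fusing the two pooled maps
                 (learned in the real network; fixed here).
    Returns the attention-weighted feature map, shape (C, H, W).
    """
    avg_map = feat.mean(axis=0)            # (H, W) global context
    max_map = feat.max(axis=0)             # (H, W) salient activations
    pooled = np.stack([avg_map, max_map])  # (2, H, W), applied in parallel

    k = conv_kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = avg_map.shape
    attn = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * conv_kernel)
    weights = sigmoid(attn)                # per-pixel weights in (0, 1)
    return feat * weights                  # broadcast over all channels
```

Because the sigmoid weights lie in (0, 1) and vary per pixel, thin-haze and thick-haze regions receive different emphasis, which is the mechanism the text describes for handling unevenly distributed haze.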
- (3) Loss function optimization
After dehazing with the AOD-Net model, a large aperture artifact appears in the center of the image, as shown in Figure 17. To solve this problem, a smoothness loss is introduced as a penalty term alongside the original mean square error loss. The Sobel operator is used to compute the image gradients for the smoothness loss, whose expression is:

loss_gra = (1/N) Σ (|gradient_x| + |gradient_y|)

where gradient_x and gradient_y are the gradients of the image along the horizontal and vertical directions, respectively, S is the dehazed image, and N is the number of pixels in the image. Fusing the smoothness loss and the mean square error loss gives the final loss function:

loss = loss_mse + φ · loss_gra

where loss_mse and loss_gra are the mean square error loss and the smoothness loss, respectively, and φ is the weight of the smoothness loss. The formula for loss_mse is:

loss_mse = (1/(m · n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (S(i, j) − I(i, j))²

where m is the number of pixels along the image width, n is the number of pixels along the image height, and I is the fog-free image.
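The combined objective can be sketched in numpy as follows. The Sobel kernels are the standard 3×3 pair; the mean-absolute-gradient reduction and the example value of φ are assumptions for illustration, since the paper's exact reduction is not reproduced here:

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _conv2d(img, kernel):
    """'Same' 2-D correlation with zero padding (enough for a sketch)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(img, pad)
    H, W = img.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def smoothness_loss(S):
    """Mean absolute Sobel gradient of the dehazed image S:
    loss_gra = (1/N) * sum(|grad_x| + |grad_y|)."""
    gx = _conv2d(S, SOBEL_X)
    gy = _conv2d(S, SOBEL_Y)
    return np.mean(np.abs(gx) + np.abs(gy))

def total_loss(S, I, phi=0.1):
    """loss = loss_mse + phi * loss_gra; loss_mse is the per-pixel MSE
    between the dehazed image S and the fog-free reference I.
    The default phi is illustrative, not the paper's tuned value."""
    loss_mse = np.mean((S - I) ** 2)
    return loss_mse + phi * smoothness_loss(S)
```

Penalizing the gradient magnitude of S discourages the abrupt intensity transitions that show up as the aperture artifact, while the MSE term keeps the output close to the reference.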
5.2. Dehazing Effect Analysis
To address the difficulty of simultaneously collecting clear and foggy images of the same scene, three fog-free remote sensing datasets were selected as clear images, and the atmospheric scattering model was used to synthesize foggy remote sensing images, as shown in Figure 18.
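This synthesis step can be sketched directly from the scattering model. The depth-based transmission and the β parameter below are illustrative modeling choices, since the paper does not specify how the transmission maps were generated:

```python
import numpy as np

def transmission_from_depth(depth, beta=1.0):
    """Beer-Lambert transmission t = exp(-beta * d); larger beta means
    denser haze (an illustrative choice, not specified in the paper)."""
    return np.exp(-beta * depth)

def synthesize_haze(clear, t, A):
    """Render a foggy image from a clear one via the atmospheric
    scattering model I = J * t + A * (1 - t).

    clear: (H, W, 3) haze-free image in [0, 1]
    t:     (H, W) transmission map in (0, 1]
    A:     scalar atmospheric light in [0, 1]
    """
    t = t[..., None]                      # broadcast over RGB channels
    return np.clip(clear * t + A * (1.0 - t), 0.0, 1.0)
```

With t = 1 the output equals the clear image, and with t = 0 every pixel collapses to the atmospheric light A, matching the two limits of the scattering model.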
Table 9 lists the experimental environment configuration and network training parameters.
To verify the effectiveness of the improved dehazing method, it was compared with four classical methods spanning traditional and deep learning approaches. The traditional methods are the dark channel prior (DCP) method, based on prior knowledge, and histogram equalization, based on statistical characteristics. The deep learning methods are AOD-Net and Dehaze-Net, both based on the atmospheric scattering model [14]. As shown in Table 10, the improved model reaches a peak signal-to-noise ratio of 21.095 and a structural similarity of 0.8418, a dehazing performance well above that of the other methods. As Figure 19 shows, the traditional dark channel prior and histogram equalization methods introduce color bias into the dehazed image.
Taking the dark channel prior and Dehaze-Net algorithms as examples, the causes of color distortion after dehazing are analyzed. According to dark channel prior theory, the image dehazing process is shown in Figure 20; the core of the algorithm is obtaining the atmospheric light value and the transmittance. The atmospheric light value is solved as follows. First, the dark channel pixels of the foggy image are sorted in descending order of gray value, the brightest 0.1% of pixels are selected, and their coordinates are recorded. The recorded coordinates are then used to locate the corresponding pixels in the original foggy image. Finally, the maximum brightness of these pixels across the RGB channels is taken as the estimate of the global atmospheric light value A. By screening and processing the key pixels of the foggy image, this method estimates the atmospheric light accurately and efficiently, providing a basis for the subsequent dehazing operation.
Dividing both sides of the atmospheric scattering model equation by the atmospheric light value A gives:

I^C(a, b)/A = t(a, b) · J^C(a, b)/A + 1 − t(a, b)    (23)

Applying two minimum filtering operations (over a local patch Ω and over the three RGB channels) to both sides of Equation (23) yields:

min_Ω min_C [I^C/A] = t(a, b) · min_Ω min_C [J^C/A] + 1 − t(a, b)    (24)

Since J in Equation (24) is the fog-free image to be solved, dark channel theory gives:

min_Ω min_C [J^C/A] → 0

so the transmittance can be expressed as:

t(a, b) = 1 − min_Ω min_C [I^C(a, b)/A]

where J^C is the pixel value of any RGB channel of the fog-free image and I^C is the pixel value of any RGB channel of the foggy image.
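The two estimation steps just derived, atmospheric light from the brightest dark-channel pixels and then transmittance, can be sketched as follows. The patch size is the conventional DCP setting and the 0.001 fraction corresponds to the brightest 0.1% of dark-channel pixels described above; both are illustrative defaults:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the RGB channels followed by a minimum
    filter over a local patch (the two minimum operations in the text)."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    H, W = mins.shape
    out = np.empty_like(mins)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_atmospheric_light(img, patch=15, frac=0.001):
    """Sort dark-channel pixels by gray value, keep the brightest 0.1%,
    and take the maximum RGB intensity at those locations as A."""
    dc = dark_channel(img, patch)
    n = max(1, int(frac * dc.size))
    flat_idx = np.argsort(dc.ravel())[-n:]      # brightest 0.1% indices
    ys, xs = np.unravel_index(flat_idx, dc.shape)
    return img[ys, xs].max()

def estimate_transmission(img, A, patch=15):
    """t = 1 - dark_channel(I / A), as derived from the prior."""
    return 1.0 - dark_channel(img / A, patch)
```

A bright white object will dominate the top-0.1% selection in exactly the way the following analysis describes, biasing A upward and pushing the estimated transmittance below its true value.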
From the above theory, when the atmospheric light value is obtained via dark channel theory, the brightest point in the image may lie on a white object. Since pixel values at such locations are much larger than the atmospheric pixel values in the distant sky region, the estimation errors of the atmospheric light and the transmittance increase, distorting the color of the dehazed image. Moreover, in the transmittance calculation, the dark channel prior sets the twice-min-filtered pixel value of the fog-free image to 0, which drives the estimated transmittance below its true value and likewise causes color deviation after dehazing. Image quality degrades further as the errors in atmospheric light and transmittance accumulate and can magnify each other. As can be seen from Figure 21, the image dehazed by the Dehaze-Net method also shows color distortion. Dehaze-Net first uses a convolutional neural network to obtain the transmittance map of the foggy image and then estimates the atmospheric light value from it. Like the dark channel prior method, Dehaze-Net suffers from biased atmospheric light estimation because bright areas such as sky and white objects are mistaken for low-transmittance regions.
To verify the effect of the improvements on dehazing performance, ablation experiments on the attention module and the optimized loss function were carried out on the test datasets. According to the results in Table 11, the original AOD-Net has the lowest objective scores, PSNR and SSIM. Introducing the smoothness loss alone raises PSNR by nearly 1 dB and SSIM from 0.755 to 0.836. Introducing the dual attention mechanism alone raises PSNR by nearly 2 dB and SSIM from 0.755 to 0.835. Introducing both improvements together yields the largest gains: PSNR increases by nearly 3 dB and SSIM rises from 0.755 to 0.842. Clearly, improving the loss function and introducing the dual attention mechanism significantly improves image dehazing performance.
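The PSNR figures in this comparison can be computed as below, for images scaled to [0, 1]. SSIM, which additionally involves local luminance, contrast, and structure statistics, is omitted from this sketch for brevity:

```python
import numpy as np

def psnr(dehazed, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).
    Higher is better; identical images give +inf."""
    mse = np.mean((dehazed - reference) ** 2)
    if mse == 0.0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)
```

As a sanity check, a uniform error of 0.1 over a [0, 1] image corresponds to an MSE of 0.01 and therefore a PSNR of exactly 20 dB.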
Although the current ablation experiments have validated the improvement in overall performance contributed by the dual-attention module, they have not separately analyzed the independent effects of the pixel-attention module and the channel-attention module. More detailed module-level ablation studies will help further reveal the specific contributions of each component and their potential redundancy. This paper regards this as a limitation of the current work and will treat systematic sub-module ablation analysis as an important direction for future research to support further optimization of the model architecture and enhancement of performance.
To evaluate the generalization performance of the proposed dehazing method under conditions closer to real-world scenarios, this study further conducted experiments on the public remote sensing image dehazing benchmark dataset Haze1k [36]. This dataset provides real cloud- and fog-degraded images along with their corresponding haze-free references, making it suitable for fair comparisons of restoration methods. A representative qualitative comparison of the improved AOD-Net, the original AOD-Net, and Dehaze-Net on this dataset is shown in Figure 22.
The experimental results indicate that, for real thin haze and non-uniform fog scenarios, the images restored by the proposed method are visually clearer and exhibit better detail preservation, verifying the effectiveness of the network improvements described in Section 5.1. Specifically, the dual-attention mechanism enhances the model's ability to focus on regions with non-uniform haze distribution, thereby facilitating more accurate image reconstruction. Meanwhile, the smoothness loss function effectively suppresses the edge halos and artifacts common in restored images, improving the visual naturalness of the results. It is worth noting that the restoration performance of the proposed method is particularly strong in the central regions of images, while some degradation is observed near the image boundaries. This can be attributed mainly to the combined effects of inherent edge optical distortions and reduced signal-to-noise ratios in remote sensing imaging systems, learning bias caused by the concentration of effective targets near image centers in the training data, and the relatively limited contextual information available to attention mechanisms at image boundaries. However, in practical ship detection scenarios, key targets are typically located in the central or main field-of-view regions of images, so the method's advantage in these core areas is of greater practical significance.
In addition, for regions in the dataset where signals are almost completely lost due to dense cloud cover, any single-image-based method finds it difficult to recover fully occluded information, and the proposed method also exhibits performance limitations under such conditions. Nevertheless, considering its overall performance on both synthetic data and real-world datasets, the proposed method demonstrates stable and significant enhancement effects for cloud and fog degradations that are physically recoverable. The restored results provide higher-quality inputs for subsequent target detection modules, thereby confirming the practical value of the proposed improvements.
In practical engineering deployments, vessel monitoring systems often need to operate stably over long periods under complex weather conditions; their primary objective is to ensure the reliability and robustness of detection results rather than merely pursuing very high processing frame rates. Unlike conventional single-stage object detection models, the method proposed in this paper adopts a cascaded processing pipeline comprising an initial image dehazing enhancement stage followed by object detection. While this design substantially improves detection accuracy under adverse meteorological conditions, it also inevitably increases the overall inference time.
Although this work introduces a lightweight backbone network (FasterNet) and efficient attention mechanisms to control model size and computational complexity to some extent, the overall FPS on resource-constrained embedded or edge computing platforms still exhibits a certain decline compared with the original YOLOv8s model. Therefore, in the absence of a unified target hardware platform and concrete application scenario constraints, we did not directly report fixed FPS metrics; instead, we focused on reporting model parameter counts and computational complexity (FLOPs) to provide more generalizable references for system design across different hardware environments.
To further narrow the gap between algorithmic research and engineering applications, it is necessary to discuss how the proposed joint dehazing and lightweight ship detection method can be integrated into operational maritime monitoring and information control systems. In real-world application scenarios, such systems typically comprise distributed sensing nodes, edge computing units, and centralized processing centers, and must operate stably under limited communication bandwidth, stringent real-time requirements, and fault-tolerance constraints.
Benefiting from the lightweight characteristics of the proposed model, the method is well suited for deployment on edge-side devices, such as shipborne platforms, shore-based monitoring stations, or UAV-mounted systems, thereby enabling local real-time target detection and reducing the need to transmit raw image data. Detection results or compressed intermediate features need only be uploaded to upper-level nodes, which helps alleviate communication bandwidth pressure. Meanwhile, a central server can undertake tasks such as multi-source information fusion, long-term data storage, system-level decision making, and periodic model updates, realizing a hierarchical edge-center collaborative processing paradigm.
From a system-design perspective, the algorithm's integration can follow design principles commonly used in information and control systems for critical infrastructure, including modular architecture, hierarchical task allocation, and redundancy and robustness provisions for communication outages. Similar methodologies have been validated in safety-critical domains such as power transmission line monitoring (Afanaseva and Tulyakov, 2025) [37]. Although the concrete implementation of large-scale real-time monitoring systems lies beyond the scope of this study, the above discussion indicates that the proposed method has the potential to be engineered as a core algorithmic module within maritime monitoring systems.