Article

A Multi-Feature Fusion Approach for Sea Fog Detection Under Complex Background

Shuyuan Yang, Yuzhu Tang, Zeming Zhou, Xiaofeng Zhao, Pinglv Yang, Yangfan Hu and Ran Bo

1 College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
2 College of Intelligent Science and Control Engineering, Jinling Institute of Technology, Nanjing 211169, China
3 High Impact Weather Key Laboratory of CMA, National University of Defense Technology, Changsha 410073, China
4 Center for Applied Mathematics of Jiangsu Province, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(14), 2409; https://doi.org/10.3390/rs17142409
Submission received: 19 June 2025 / Revised: 9 July 2025 / Accepted: 10 July 2025 / Published: 12 July 2025
(This article belongs to the Special Issue Observations of Atmospheric and Oceanic Processes by Remote Sensing)

Abstract

Sea fog is a natural phenomenon that significantly reduces visibility, posing navigational hazards for ships and impacting coastal activities. Geostationary meteorological satellite data have proven indispensable for sea fog monitoring due to their large spatial coverage and spatiotemporal consistency. However, the spectral similarities between sea fog and low clouds result in omissions and misclassifications. Furthermore, high clouds obscure certain sea fog regions, leading to under-detection and high false alarm rates. In this paper, we present a novel sea fog detection method to alleviate these challenges. Specifically, the approach fuses spectral, motion, and spatiotemporal texture consistency features to effectively differentiate sea fog from low clouds. Additionally, a multi-scale self-attention module is incorporated to recover sea fog regions obscured by clouds. Based on the spatial distribution characteristics of sea fog and clouds, we redesigned the loss function to integrate total variation loss, focal loss, and dice loss. Experimental results validate the effectiveness of the proposed method, and the detection results are compared with the vertical feature mask produced by CALIOP and exhibit a high level of consistency.

1. Introduction

Sea fog is a meteorological phenomenon that forms in the lower atmosphere [1], significantly reducing visibility and posing increased risks to maritime navigation and fishing operations. Therefore, improving the accuracy and timeliness of sea fog detection holds substantial practical significance. Visibility sensors at meteorological stations and buoys can provide valuable observational data on sea fog events. However, the limited deployment of sensors—primarily located along coastlines with sparse offshore stations [2]—is insufficient to capture the spatial distribution of sea fog across open ocean areas.
Geostationary meteorological satellites provide essential data for real-time observation, early warning, and forecasting of sea fog over extensive regions due to their high temporal resolution. These satellites acquire data in multiple spectral bands, enabling discrimination between the sea surface, sea fog, and clouds based on their unique spectral signatures. Early sea fog detection methods leveraged these signatures by applying thresholding techniques to combinations of reflectance [3,4] or brightness temperature [5,6]. Despite their intuitive and straightforward nature, fixed-threshold methods are limited in adapting to variations in time, location, and atmospheric conditions. Thus, Cermak and Bendix [7,8] sequentially developed dynamic thresholds based on emissivity differences at different satellite viewing zenith angles and ground-based measurements of cloud height. Subsequently, additional auxiliary information reflecting variations in environmental conditions, such as the monthly mean sea surface temperature, has been incorporated to dynamically adjust the thresholds [9]. Despite their improvements over fixed-threshold methods, these dynamic approaches are influenced by the inherent complexity of the histogram (e.g., number of peaks or distribution shape), which can limit their scalability and robustness in complex scenarios.
In recent years, machine learning methods have been widely applied to sea fog detection by constructing classification models trained on labeled data [10]. For instance, Kim et al. [11] developed a model by integrating Geostationary Ocean Color Imager (GOCI) observations with Himawari-8 satellite data through a decision tree (DT). Shin and Kim [12] applied the expectation–maximization (EM) algorithm under nighttime conditions to distinguish sea fog from stratus clouds, which share similar particle sizes and altitudes. Wang et al. [13] constructed a 13-dimensional feature vector based on FY-3D multispectral data and assessed the performance of classifiers such as DT and support vector machines (SVMs). However, traditional machine learning methods rely on handcrafted features, which have limitations when dealing with high-dimensional nonlinear data, making it difficult to fully characterize the differences between sea fog and complex backgrounds.
Currently, deep learning has made significant advances in the environmental remote sensing community [14,15], gradually becoming a mainstream tool for sea fog detection. Specifically, convolutional neural networks (CNNs) [14], a core deep learning technology, capture multi-scale features through stacked convolution and pooling layers and are widely applied in this field. For instance, Tang et al. [16] introduced a two-stage deep learning strategy, initially using a multi-layer perceptron (MLP) to exclude cloud-free sea surfaces from multispectral images, followed by a CNN for spatial feature extraction, which effectively distinguishes clouds from sea fog. Yi et al. [17] proposed a sea fog detection algorithm combining fully convolutional networks (FCNs) and conditional random fields (CRFs) under low-contrast conditions at dawn. Furthermore, U-Net, known for its robust feature fusion and pixel-level segmentation capability, enables end-to-end sea fog detection [18,19]. By leveraging data dimensionality reduction and augmentation [20], U-Net has been redesigned to improve detection performance and generalization ability. For large-scale sea fog regions, the self-attention mechanism in the transformer model [21] captures intrinsic features and identifies long-range dependencies. Lu et al. [19] integrated transformers and CNNs for daytime sea fog monitoring, capturing global context information through a dual-branch feed-forward network (FFN) along with efficient channel attention (ECA) modules. Yan et al. [22] built the SeaMAE model on a vision transformer encoder and a convolutional hierarchical decoder, achieving better results with very-high-resolution satellite imagery. Noting that sea fog and clouds move concurrently, Yang et al. [23] attempted to exploit the temporal continuity of meteorological satellite image sequences and extracted their motion features through a sparse optical flow method. However, their results indicate inconsistent movement trajectories for individual sea fog pixels, which contradicts actual behavior, as the overall movement path of sea fog should be consistent.
Existing sea fog detection methods primarily rely on spectral and texture features. In intricate scenarios where clouds and sea fog coexist, the spectral similarity between sea fog and low clouds can considerably increase false alarms. Furthermore, sea fog is frequently obscured by high clouds, leading to a high rate of missed detections. Meanwhile, because its dynamic characteristics are difficult to model accurately, effectively representing the motion of sea fog remains a challenge. To overcome these limitations, we propose a novel method for sea fog detection under complex backgrounds. The main contributions are as follows:
(1)
We construct a sea fog dataset covering the Yellow Sea and Bohai Sea, comprehensively considering the spectral, motion, and spatiotemporal texture features. Sea fog pixels occluded by clouds are labeled as a new category, named “cloud–fog mixed”.
(2)
A dual-branch encoder network incorporating a multi-scale self-attention mechanism is designed to effectively extract spatial correlations of sea fog regions in the case of cloud coexistence or occlusion.
(3)
The loss function integrates total variation loss, focal loss, and dice loss to address the class imbalance problem and ensures the spatial consistency of sea fog regions.
The rest of this paper is organized as follows: Section 2 introduces the study area and dataset; Section 3 provides the details of the proposed method. Section 4 reports the results, and Section 5 discusses the main factors contributing to the model’s effectiveness. Section 6 concludes the study and outlines future research directions.

2. Study Area and Data

2.1. Study Area

The study area covers parts of the Yellow Sea and Bohai Sea in China (Figure 1), with latitude and longitude ranging from 30°N to 42°N and 117°E to 129°E, respectively. The region, adjacent to the Northwestern Pacific Ocean and characterized by cold, dry winters and warm, humid summers, is considered one of the most fog-prone areas worldwide. Under the influence of warm, moist air flows, advection fog predominantly forms from March to August each year [24]. Observations from coastal meteorological stations indicate that the average annual number of fog days in this region ranges from 50 to 80, with a significant increase in frequency during spring and autumn [25].

2.2. Himawari-8 Standard Data

The dataset in this paper is derived from the Himawari-8, a geostationary meteorological satellite launched by the Japan Meteorological Agency, which is equipped with the Advanced Himawari Imager (AHI). The Himawari-8 Standard Data (HSD8) dataset [26] comprises 16 observational channels, including three visible light channels, three near-infrared channels, and 10 infrared channels. AHI captures images every 10 min, enabling high-frequency monitoring of sea fog formation and dissipation. Detailed information about each band is provided in Table 1.

2.3. Multiple Feature Representation

2.3.1. Spectral Feature

To analyze the radiative characteristics of clouds, sea fog, and the sea surface across the HSD8 channels (excluding the water vapor channels), we randomly selected samples of different types and statistically analyzed their reflectance and brightness temperature distributions. As shown in Figure 2, the reflectance distributions of the first four channels are similar, while the 0.86 µm channel (Band 04) exhibits high sensitivity to the reflectance of the sea surface and clouds, thereby facilitating differentiation between them. The 1.6 µm channel (Band 05) shows enhanced responsiveness to the absorption characteristics of ice crystal particles in clouds, which proves effective in distinguishing sea fog from ice clouds. Meanwhile, the 3.9 µm channel (Band 07) responds well to the scattering properties of sea fog during the day. Located within the longwave infrared range, the 10.4 µm channel (Band 13) is highly sensitive to the radiative temperature of sea fog and clouds. Therefore, the 0.86 µm, 1.6 µm, 3.9 µm, and 10.4 µm channels are chosen to form the spectral feature.
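As a minimal sketch of how the selected channels could be assembled into the spectral feature, the snippet below stacks the four bands into a single array; the min-max normalization step and the variable names are our assumptions rather than details given in the paper.

```python
import numpy as np

def build_spectral_feature(refl_086, refl_16, bt_39, bt_104):
    """Stack the four selected AHI channels into an H x W x 4 spectral feature.

    refl_086, refl_16 : reflectance of the 0.86 and 1.6 um channels (Bands 04, 05)
    bt_39, bt_104     : brightness temperature of the 3.9 and 10.4 um channels (Bands 07, 13)
    All inputs are assumed to be 2-D arrays on the same grid.
    """
    def norm(x):
        # Min-max normalize each channel so reflectance and brightness
        # temperature share a comparable value range before fusion (assumption).
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    return np.stack([norm(refl_086), norm(refl_16), norm(bt_39), norm(bt_104)], axis=-1)
```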

2.3.2. Motion Feature

Sea fog is influenced by low-altitude winds and the pressure field. It typically occurs under weak winds and relatively stable surface meteorological conditions, resulting in slow and steady movement. In contrast, cloud formation is primarily associated with the condensation of water vapor in the atmosphere, as well as rising air currents at higher altitudes and atmospheric stability. Thus, we employ the dense optical flow algorithm [27] to estimate the motion of sea fog and clouds over consecutive 0.64 µm channel images. The algorithm constructs a multi-scale pyramid for two adjacent frames, using the low-resolution levels to efficiently identify large-scale motion trends and the high-resolution levels to accurately depict finer motion details. As depicted in Figure 3a, the movements of sea fog and clouds exhibit distinct directional patterns: sea fog moves predominantly from south to north, while clouds flow from west to east.
To incorporate the motion feature into the self-attention network, the motion vector is decomposed into direction and magnitude components, which are normalized as the hue and saturation components in the HSI color space, respectively, while the intensity component is fixed to 1. In Figure 3b, the magenta color indicates the northward movement of sea fog, and the red color indicates the eastward movement of clouds.
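A possible implementation of this step, assuming OpenCV's Farnebäck dense optical flow and an HSV (HSI-like) color encoding, is sketched below; the specific pyramid and window parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def motion_feature(prev_img, next_img):
    """Dense Farneback optical flow between two consecutive 0.64 um images,
    encoded as a color image: hue = direction, saturation = magnitude, value fixed."""
    # prev_img / next_img: 2-D float arrays scaled to [0, 1]
    prev_u8 = (prev_img * 255).astype(np.uint8)
    next_u8 = (next_img * 255).astype(np.uint8)

    flow = cv2.calcOpticalFlowFarneback(
        prev_u8, next_u8, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Convert the (u, v) motion vectors into magnitude and direction
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)

    hsv = np.zeros((*prev_img.shape, 3), dtype=np.uint8)
    hsv[..., 0] = (ang / 2).astype(np.uint8)   # hue: direction (OpenCV hue range is 0-179)
    hsv[..., 1] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)  # saturation: magnitude
    hsv[..., 2] = 255                          # intensity component fixed to 1 (255 in 8-bit)

    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
```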

2.3.3. Spatiotemporal Texture Consistency Feature

Compared to clouds, the sea fog region is more spectrally homogeneous, and its spectral changes are slower in adjacent time-series images. We propose the spatiotemporal texture consistency feature (STCF) to describe the variation of the texture pattern in the time domain and the homogeneity of the texture in the spatial domain. Texture is commonly represented with the Gibbs random field [28], the Markov model [29], the wavelet transform [30], the gray level co-occurrence matrix (GLCM) [31], and others. In this paper, the GLCM is employed to calculate the joint probability distribution of pairs of pixel gray levels between two consecutive 0.64 µm channel images within a 15 × 15 window. The STCF is then obtained by calculating the angular second moment (ASM) of the GLCM at each pixel as follows:
$$\mathrm{ASM} = \sum_{i,j} P(i,j)^{2} \quad (1)$$
where $P(i,j)$ represents the joint probability at position $(i,j)$ in the GLCM.
The STCF characterizes the regularity of the image texture. A higher STCF value indicates that the texture is more homogeneous, with smoother transitions in grayscale values, while a lower STCF value suggests a more complex texture with abrupt grayscale changes. As illustrated in Figure 4c, the black region represents the cloud area, characterized by significant texture variations, and the white region corresponds to the sea surface, exhibiting minimal grayscale fluctuations. Between the frames in Figure 4a,b, the homogeneity of sea fog changes more significantly than that of the sea surface but more gradually than that of clouds. Thus, in Figure 4c, the grayscale of the sea fog region lies between black and white, and the boundary changes are also depicted.
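The sketch below illustrates one way the STCF could be computed: a joint gray-level histogram between corresponding 15 × 15 windows of two consecutive frames, followed by the ASM of Equation (1). The quantization depth and the brute-force per-pixel loop are simplifying assumptions.

```python
import numpy as np

def stcf(img_t, img_t1, levels=16, win=15):
    """Spatiotemporal texture consistency feature (simplified sketch).

    For each pixel, a joint gray-level co-occurrence histogram is built from
    corresponding pixels of two consecutive frames inside a win x win window,
    and the angular second moment (ASM) of that histogram is returned.
    img_t, img_t1 : 2-D arrays in [0, 1]; the 15 x 15 window follows the paper,
    while the number of gray levels is our assumption.
    """
    q_t = np.clip((img_t * levels).astype(int), 0, levels - 1)
    q_t1 = np.clip((img_t1 * levels).astype(int), 0, levels - 1)

    h, w = q_t.shape
    half = win // 2
    asm = np.zeros((h, w), dtype=np.float64)

    for i in range(half, h - half):
        for j in range(half, w - half):
            a = q_t[i - half:i + half + 1, j - half:j + half + 1].ravel()
            b = q_t1[i - half:i + half + 1, j - half:j + half + 1].ravel()
            # joint probability of gray-level pairs between the two frames
            glcm = np.zeros((levels, levels), dtype=np.float64)
            np.add.at(glcm, (a, b), 1.0)
            glcm /= glcm.sum()
            asm[i, j] = np.sum(glcm ** 2)   # ASM = sum_ij P(i, j)^2
    return asm
```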

2.4. Construction of Dataset

To construct a sea fog dataset covering the Bohai Sea and the Yellow Sea, we integrate spectral, motion, and spatiotemporal texture features, which are concatenated as multi-feature inputs to form the dataset. It is noteworthy that previous sea fog detection studies primarily categorized pixels into three types (sea surface, sea fog, and clouds) and overlooked sea fog pixels obscured by clouds [32,33], resulting in misclassification or reduced detection accuracy. To better reflect actual meteorological phenomena and enhance the model's ability to distinguish sea fog in complex scenarios, we label sea fog pixels covered by clouds as the "cloud–fog mixed" type based on the spatial–temporal consistency of consecutive AHI frames.
The constructed dataset consists of 383 sea fog images collected from March 2018 to December 2020, each with a size of 512 × 512 pixels. These cases cover multiple sea fog seasons in the Bohai and Yellow Seas, with the sample distribution concentrated in spring and early summer, which aligns with the climatological peak of sea fog occurrence. To ensure temporal independence, we divide the dataset by individual sea fog events, ensuring that satellite images from the same event appear only in either the training or the testing set. The data are split using a 4:1 ratio between training and testing sets. Detailed temporal coverage is summarized in Table 2.
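The following sketch shows one way such an event-level split could be performed; the sample structure and the event_id key are hypothetical, and splitting events at an 80/20 ratio only approximates the 4:1 image-level ratio reported above.

```python
import random
from collections import defaultdict

def split_by_event(samples, train_ratio=0.8, seed=0):
    """Split samples into train/test sets at the sea fog *event* level, so that
    images from the same event never appear in both sets.

    samples : list of dicts like {"path": ..., "event_id": ...}
    event_id is a hypothetical key identifying the sea fog event each image belongs to.
    """
    events = defaultdict(list)
    for s in samples:
        events[s["event_id"]].append(s)

    event_ids = sorted(events)
    random.Random(seed).shuffle(event_ids)

    n_train_events = int(round(train_ratio * len(event_ids)))
    train_ids = set(event_ids[:n_train_events])

    train = [s for eid in event_ids if eid in train_ids for s in events[eid]]
    test = [s for eid in event_ids if eid not in train_ids for s in events[eid]]
    return train, test
```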

3. Methodology

The framework of the proposed method is provided in Figure 5. The input $F \in \mathbb{R}^{H \times W \times C}$ is a concatenation of multiple features, along with a sea–land mask to distinguish the pixels of the oceanic region, where $H$ and $W$ represent the height and width of the feature map and $C$ denotes the number of channels. The multiple features are processed through four stages of multi-head self-attention (MSA) blocks, with each stage focusing on features at a different scale. Furthermore, the feature fusion (FF) block integrates the output of each stage through upsampling, normalization, and concatenation. Finally, the segmentation (SGM) block is employed to detect the sea fog and output the result $R \in \mathbb{R}^{H \times W \times N}$, where $N$ is the number of categories. These components are explained in the following sections.

3.1. Encoder Component

We obtain multiple feature representations of sea fog, and how to integrate these features effectively is the key issue in the multi-feature fusion model. Additionally, considering that sea fog typically exhibits a coherent distribution over large areas, handling complex scenes, such as fog areas partially covered by clouds, challenges convolution operations in capturing long-range dependencies in the global context. We therefore adopt self-attention mechanisms and construct a four-stage MSA with different scales, where each stage focuses on information at a different scale and serves as the input for the next stage. The multi-scale MSA utilizes small-scale features to extract finer details while also attending to the entire region from a large-scale perspective. Even in the presence of sea fog beneath the clouds, the fusion of multi-scale information across stages enables the model to capture the correlation between the occluded and the surrounding sea fog, thereby achieving accurate recognition of the sea fog beneath the clouds. The structure of the MSA block is illustrated in Figure 6. Every stage of the MSA learns multiple global dependencies from different subspaces by computing multiple attention heads in parallel. Unlike ViT, which splits an image into multiple linearly embedded patches, we perform 2 × 2 max pooling in each stage to obtain $Q$, $K$, and $V$, which are subsequently mapped into the $i$-th subspace through distinct linear transformations, yielding $Q_i$, $K_i$, and $V_i$. MSA adaptively adjusts the weights to emphasize key features, ensuring the effective extraction of sea fog characteristics from complex backgrounds. The MSA and attention formulas are as follows:
$$\mathrm{MSA}(Q, K, V) = \mathrm{Concat}(\mathrm{Attention}_1, \ldots, \mathrm{Attention}_h) \quad (2)$$
$$\mathrm{Attention}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i \quad (3)$$
where $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation and $d_k$ denotes the dimension of $K$. The output is then processed by a feedforward module through linear mapping and activation.
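A simplified PyTorch sketch of one MSA stage is given below; the residual connections, normalization placement, and feed-forward width are our assumptions, since the paper specifies only the 2 × 2 max pooling and the multi-head attention of Equations (2) and (3).

```python
import torch
import torch.nn as nn

class MSABlock(nn.Module):
    """Sketch of one multi-head self-attention (MSA) stage, assuming the
    2 x 2 max-pooled feature map is flattened into tokens before attention."""

    def __init__(self, channels, num_heads):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2)          # downsample to form Q, K, V
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ffn = nn.Sequential(                        # feedforward module: linear mapping + activation
            nn.Linear(channels, channels * 2),
            nn.GELU(),
            nn.Linear(channels * 2, channels),
        )

    def forward(self, x):                                # x: (B, C, H, W)
        x = self.pool(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        t = self.norm(tokens)
        attn_out, _ = self.attn(t, t, t)                 # Q, K, V share the pooled tokens
        tokens = tokens + attn_out                       # residual connection (assumption)
        tokens = tokens + self.ffn(self.norm(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```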

3.2. Decoder Component

Due to the max pooling operation in the four-stage MSA, directly decoding the MSA output may result in the loss of detailed information. Therefore, the MSA output is upsampled and normalized at each stage, while the original multi-feature input containing the details is downsampled by average pooling to the same scale. The outputs of the two branches, together with the output of the FF block from the preceding stage, are integrated through the FF block, as illustrated in Figure 7a.
In the SGM block, as shown in Figure 7b, the fused feature map undergoes processing through four convolutional layers in sequence, progressively compressing the feature dimensions to extract highly discriminative feature representations. The block ultimately yields a sea fog detection result with the same dimensions as the input.
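The following PyTorch sketch outlines how the FF and SGM blocks could be organized; the 1 × 1 fusion convolution, normalization choice, and kernel sizes are assumptions not specified in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFBlock(nn.Module):
    """Sketch of the feature fusion (FF) block: the MSA output is upsampled and
    normalized, concatenated with the average-pooled input features and the
    previous stage's fused map, then mixed by a 1 x 1 convolution (assumption)."""

    def __init__(self, msa_ch, feat_ch, prev_ch, out_ch):
        super().__init__()
        self.norm = nn.BatchNorm2d(msa_ch)
        self.fuse = nn.Conv2d(msa_ch + feat_ch + prev_ch, out_ch, kernel_size=1)

    def forward(self, msa_out, features, prev_fused, out_size):
        msa_up = self.norm(F.interpolate(msa_out, size=out_size, mode="bilinear", align_corners=False))
        feat_ds = F.adaptive_avg_pool2d(features, out_size)       # detail branch
        prev = F.interpolate(prev_fused, size=out_size, mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([msa_up, feat_ds, prev], dim=1))

class SGMBlock(nn.Module):
    """Segmentation head: four convolutions progressively compressing the fused
    features to N class channels (kernel sizes are our assumption)."""

    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, in_ch // 4, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 4, in_ch // 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 8, num_classes, 1),
        )

    def forward(self, x):
        return self.head(x)
```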

3.3. Loss Function

Sea fog and clouds exhibit similar reflectance and irregular spatial distributions, which undermine the accuracy of traditional image detection methods. In this study, a loss function that combines total variation loss, focal loss, and dice loss is employed, as follows:
$$L_{total} = \lambda_1 L_{TV} + \lambda_2 L_{Focal} + (1 - \lambda_1 - \lambda_2) L_{Dice} \quad (4)$$
where the hyperparameters $\lambda_1$ and $\lambda_2$ represent the weights of the total variation loss and the focal loss, respectively.
The total variation loss is designed to suppress noise and prevent overfitting by restricting gradient variations in the predicted result. It is particularly effective in preventing pixels inside the sea fog area from being misclassified as clouds, thereby maintaining the spatial consistency of sea fog. The total variation loss $L_{TV}$ is defined as follows:
$$L_{TV} = \sum_{i,j} \left| y_{pred}(i,j) - y_{pred}(i+1,j) \right| + \left| y_{pred}(i,j) - y_{pred}(i,j+1) \right| \quad (5)$$
where $y_{pred}$ represents the predicted result and $(i, j)$ denotes the pixel location.
As defined in Equation (6), focal loss focuses on the hard-to-detect sea fog areas by reducing the weight of pixels that are easily classified.
$$L_{Focal} = -(1 - p_t)^{\gamma} \log(p_t) \quad (6)$$
where $\gamma$ is the focusing parameter and $p_t$ represents the predicted probability of the correct class, derived from the cross-entropy calculation:
$$p_t = e^{-CE(y_{true},\, y_{pred})} \quad (7)$$
where $CE$ denotes the cross-entropy loss function, and $y_{true}$ and $y_{pred}$ represent the true labels and the predicted results, respectively.
The dice loss aims to optimize detection accuracy, particularly for small sea fog regions. It measures the overlap between the predicted result and the ground truth as follows:
$$L_{Dice} = 1 - \frac{2\,(y_{true} \cap y_{pred}) + \varepsilon}{y_{true} + y_{pred} + \varepsilon} \quad (8)$$
where $\varepsilon$ is a small constant to avoid a zero denominator.
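A compact PyTorch sketch of the combined loss in Equations (4)–(8) is shown below; the focusing parameter γ = 2 and the per-class dice averaging are assumptions, as the paper does not state them.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, lambda1=0.1, lambda2=0.6, gamma=2.0, eps=1.0):
    """Total variation + focal + dice loss, following Eqs. (4)-(8).
    logits: (B, N, H, W) raw scores; target: (B, H, W) integer class labels."""
    prob = logits.softmax(dim=1)

    # Total variation loss on the predicted probability maps (Eq. 5)
    tv = (prob[:, :, 1:, :] - prob[:, :, :-1, :]).abs().mean() + \
         (prob[:, :, :, 1:] - prob[:, :, :, :-1]).abs().mean()

    # Focal loss (Eqs. 6-7): p_t = exp(-CE), so (1 - p_t)^gamma * CE = -(1 - p_t)^gamma * log(p_t)
    ce = F.cross_entropy(logits, target, reduction="none")
    p_t = torch.exp(-ce)
    focal = ((1.0 - p_t) ** gamma * ce).mean()

    # Dice loss (Eq. 8), averaged over one-hot encoded classes
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (prob * one_hot).sum(dim=(2, 3))
    dice = 1.0 - ((2.0 * inter + eps) / (prob.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3)) + eps)).mean()

    return lambda1 * tv + lambda2 * focal + (1.0 - lambda1 - lambda2) * dice
```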

4. Results

4.1. Implementation Details and Evaluation Metric

Among the 383 sea fog images, we randomly select 291 samples for model training, which corresponds to 76 million pixels, and the remaining 92 samples (totaling 24 million pixels) are used for testing.
In our experiments, the numbers of heads in the MSA blocks across the four stages are 2, 2, 4, and 8, respectively. During the training stage, the hyperparameters $\lambda_1$ and $\lambda_2$ in the loss function are empirically set to 0.1 and 0.6, and $\varepsilon$ is set to 1. The Adam optimizer [34] is used throughout all the experiments. The training process is conducted over 100 epochs with a batch size of 32.
All experiments are conducted on an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz (Intel Corporation, Santa Clara, CA, USA) with an NVIDIA GeForce RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
To evaluate the experimental results quantitatively, eight metrics are employed: false alarm rate (FAR), recall, Critical Success Index (CSI), precision, accuracy, Kuiper skill score (KSS), mean intersection over union (mIoU), and F1 score:
$$FAR = \frac{FP}{FP + TN} \quad (9)$$
$$Recall = \frac{TP}{TP + FN} \quad (10)$$
$$CSI = \frac{TP}{TP + FP + FN} \quad (11)$$
$$Precision = \frac{TP}{TP + FP} \quad (12)$$
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (13)$$
$$KSS = Recall - FAR \quad (14)$$
$$mIoU = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i + FN_i} \quad (15)$$
$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \quad (16)$$
where $TP$ denotes the number of pixels correctly identified as sea fog, $FP$ the number of pixels of other categories mistakenly identified as sea fog, $TN$ the number of pixels correctly identified as non-sea fog, $FN$ the number of sea fog pixels mistakenly identified as other categories, and $N$ the number of categories.
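For reference, the following NumPy sketch computes the metrics of Equations (9)–(16) from predicted and ground-truth label maps; the class indices are hypothetical.

```python
import numpy as np

def sea_fog_metrics(pred, truth, fog_class=1, num_classes=5):
    """Compute the evaluation metrics of Eqs. (9)-(16) from label maps.
    pred, truth: integer class maps; fog_class marks sea fog (index is hypothetical);
    num_classes covers sea surface, sea fog, cloud-fog mixed, cloud, and land."""
    p_fog, t_fog = (pred == fog_class), (truth == fog_class)
    tp = np.sum(p_fog & t_fog); fp = np.sum(p_fog & ~t_fog)
    fn = np.sum(~p_fog & t_fog); tn = np.sum(~p_fog & ~t_fog)

    far = fp / (fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    csi = tp / (tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    kss = recall - far
    f1 = 2 * precision * recall / (precision + recall)

    # mIoU averages the per-class intersection over union
    ious = []
    for c in range(num_classes):
        pc, tc = (pred == c), (truth == c)
        tp_c = np.sum(pc & tc)
        fp_c = np.sum(pc & ~tc)
        fn_c = np.sum(~pc & tc)
        ious.append(tp_c / (tp_c + fp_c + fn_c + 1e-8))
    miou = float(np.mean(ious))

    return dict(FAR=far, Recall=recall, CSI=csi, Precision=precision,
                Accuracy=accuracy, KSS=kss, mIoU=miou, F1=f1)
```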

4.2. Ablation Study on Feature Components and Temporal Intervals

To investigate the contribution of each input feature and the impact of different temporal interval settings, we conduct an ablation study in this section. Table 3 presents the results of the ablation experiments. In the table, Spectral refers to the spectral features extracted from satellite imagery, while Motion denotes the motion features derived from optical flow estimation between adjacent images, and STCF represents the proposed spatiotemporal texture consistency features from neighboring frames. T = 1, T = 3, and T = 6 correspond to temporal intervals of 10 min, 30 min, and 60 min, respectively.
When only spectral features are used as input, the model achieves the lowest mIoU. Incorporating motion features improves the mIoU to 0.853, and introducing STCF yields an mIoU of 0.841. The highest mIoU of 0.953 is achieved when the temporal interval is 10 min (T = 1). However, when motion features are computed from images with longer temporal intervals, the mIoU on the test set does not improve but instead decreases. We attribute this gradual performance degradation to the displacement of clouds and sea fog over longer intervals, which leads to pixel-level misalignment between consecutive images, thereby affecting the accuracy of motion estimation and the consistency of feature alignment.

4.3. Comparison with Other Methods

In this section, the proposed method is validated on the sea fog dataset and is compared with five representative methods, including the traditional threshold method [9] and the effective segmentation networks U-Net [35], FCN [36], MLP + CNN [16], and FCN + CRF [17].
As presented in Table 4, deep-learning-based methods generally yield better results, and the proposed method demonstrates the best performance across all evaluation metrics. Our method achieves the lowest FAR of 0.008, indicating its ability to reduce the number of pixels incorrectly classified as sea fog. Additionally, the precision, accuracy, and other related metrics of the proposed method exceed 0.93, outperforming the other methods on every indicator.

4.4. Cases of Low Cloud Interference and High Cloud Occlusion

Figure 8 presents the detection results of the different methods under the condition of low cloud interference. In Figure 8a, the original image is synthesized from Band 03, Band 04, and Band 05. Sea fog is observed over most of the Yellow Sea, and the southeastern sea surface is covered by multi-layer clouds, with high clouds appearing as cyan pixels, while low clouds have reflectance similar to that of sea fog and appear almost the same color in the image.
As shown in Table 5, the threshold method relies solely on spectral features, and the reflectance of the sea fog boundary regions is relatively blurred, which leads to the misclassification of large cloud areas as sea fog and of the boundary area as cloud. Therefore, when low clouds are present in the image, the threshold method yields the lowest precision (0.614) and the highest FAR (0.097). The three deep-learning-based methods MLP + CNN, U-Net, and FCN achieve better results; however, some clouds are still misclassified as sea fog, and pixels near the sea fog boundary are incorrectly identified as clouds, with recall and precision values all below 0.94. Although FCN + CRF and our method achieve the same FAR of 0.007, indicating comparable performance in reducing false alarms, our method outperforms it on the other metrics such as recall, precision, and F1 score. In contrast, the detection results of our method closely align with the ground truth, with no instances of misclassifying clouds as sea fog.
Sea fog detection based on passive remote sensing is often plagued by high cloud occlusion. Figure 9 presents a comparison of the detection results of our method and the other methods under high cloud occlusion, and the evaluation metrics are reported in Table 6, with the optimal value of each metric marked in bold. The threshold method and MLP + CNN are ineffective for sea fog beneath clouds, resulting in significant missed detections, reflected by FAR values of 0.045 and 0.028, respectively. U-Net and FCN, by leveraging deep learning architectures, capture more spatial features but are able to detect only portions of the cloud–fog mixed area; the CSI values of these two methods are both below 0.78, and their recall values are both below 0.89. FCN + CRF enhances the detection of cloud–fog mixed areas by refining spatial coherence and boundary integrity. However, owing to the limitations of CRF in recovering fine details and extracting features from small areas, its FAR remains 0.011 and its mIoU does not exceed 0.94. In contrast, the proposed method reduces the FAR to 0.004, the lowest among all the methods, and its recall reaches 0.959. By combining multi-feature fusion with optimized segmentation strategies, our method outperforms existing methods in both overall detection accuracy and precision in identifying cloud–fog boundaries.

4.5. Sea Fog Detection in Continuous Observations

Figure 10 illustrates the results of the proposed method for the sea fog time series on 8 June 2018, from 03:20 to 04:50 UTC, at 10 min intervals. The first, second, and third rows show the synthesized images, the ground truth, and the sea fog detection results of the proposed algorithm, respectively. The performance of the evaluation indicators on the continuous time series data is illustrated in Figure 11; the vertical axis for the FAR value ranges from 0 to 0.10, and the vertical axis for the other indicators ranges from 0.85 to 1, which allows the position and fluctuation of each indicator over time to be visualized.
FAR values remain consistently low, ranging between 0.009 and 0.013, indicating a minimal incidence of false positives. Recall, precision, KSS, mIoU, and F1 scores consistently range between 0.90 and 0.95, demonstrating high confidence and reliability in detection results. Furthermore, the CSI values, ranging from 0.876 to 0.908, along with accuracy values exceeding 0.979, further validate the effectiveness of the proposed method in minimizing both omissions and misclassifications. Over time, every indicator value remains at a consistently high level with minimal fluctuation, which attests to the method’s strong adaptability to varying temporal conditions, sea fog patterns, and environmental dynamics. This stability not only ensures reliable performance in individual detection tasks but also highlights the model’s potential for continuous detection of entire sea fog events, thereby enabling dynamic tracking and precise identification of sea fog.
Although AHI offers high spatial resolution and continuous observation, it lacks key parameters such as cloud base height, making it challenging to distinguish low-level stratus clouds from fog [37]. By contrast, CALIOP operates at two wavelengths, 532 nm and 1064 nm, which are capable of penetrating clouds and aerosols, providing detailed information on the vertical structure of the atmosphere [38]. The level 2 product of CALIOP, the CALIPSO vertical feature mask (VFM), provides detailed information on the location and type of clouds and aerosols. To evaluate the effectiveness of the proposed method, VFM products are used to assess the accuracy of the algorithm's fog detection.
The sea fog event depicted in Figure 12a occurred over the Yellow Sea and Bohai Sea at 04:40 UTC on 8 June 2018, when the sea fog exhibited a large-scale, continuous distribution. At nearly the same time (04:41:30 UTC on 8 June 2018), the CALIPSO satellite acquired high-resolution vertical sounding data, with its trajectory represented by the colored line. In Figure 12b, the sea fog is classified as cloud at altitudes close to the sea surface. Within the latitude and longitude ranges corresponding to the A–B and C–D segments, the sea fog detection results at 04:40 UTC in Figure 10 are highly consistent with the VFM data, further validating the accuracy of the proposed method. These findings demonstrate that the method effectively characterizes the spatial distribution of sea fog, contributing to improved detection accuracy.

5. Discussion

The experimental results demonstrate that the proposed method delivers outstanding and stable performance in sea fog detection across a range of meteorological conditions. This section provides a comprehensive discussion of the experimental results and the main factors contributing to the model’s effectiveness.
The method shows significant improvements across all evaluation metrics, particularly in reducing false alarms and preserving boundary integrity. These enhancements are primarily attributed to the synergistic integration of motion features, spatiotemporal texture consistency features, and a self-attention mechanism. Motion features enable the model to track the dynamic evolution of sea fog, while the spatiotemporal texture consistency features characterize the spatial and temporal variation of clouds and sea fog texture. The attention mechanism enhances contextual awareness, enabling the model to better capture spatial patterns across extensive regions. The low FAR value reflects the model’s ability to suppress misclassification of visually similar phenomena such as low-level clouds and haze. Unlike traditional methods that rely solely on spectral features, the proposed approach incorporates a richer set of representations, thereby improving discriminative power.
A major advantage of the proposed method lies in its spatial stability. Under conditions of low cloud interference, the method demonstrates a superior capacity to distinguish sea fog from low clouds with similar reflectance characteristics. By integrating multiple complementary features, it maintains both high precision and recall. In scenarios involving high cloud occlusion, the method also exhibits strong robustness, whereas conventional algorithms often struggle due to their limited ability to detect sea fog beneath cloud layers. Through hierarchical feature fusion and a well-designed loss function, the model is able to infer sea fog boundaries even when they are partially obscured, thereby outperforming existing methods.
The proposed method also exhibits strong temporal stability. When applied to continuous time-series imagery, performance metrics such as FAR, F1 score, and mIoU remain consistent over time, indicating robust generalization to dynamically changing environments. This temporal consistency highlights the model’s potential for reliable real-time and long-duration monitoring applications.
Nonetheless, the method has certain limitations. It does not explicitly differentiate between sea fog and subsiding stratus clouds, which can appear nearly identical in satellite imagery. While this generalization enhances detection capability, it may obscure physically meaningful distinctions. In addition, while adjacent-frame motion features are utilized, the model still performs detection on individual frames and fails to capture long-term temporal evolution.

6. Conclusions

In this study, a novel multi-feature fused network for sea fog detection under a complex background is proposed and validated on the HSD8 dataset. The proposed approach exploits the inherent characteristics of sea fog, including spatial coherence and continuous temporal variation. To capture these properties, motion features are introduced to represent the spatiotemporal trajectory of sea fog, complemented by a newly designed spatiotemporal texture consistency feature that quantifies temporal variation coupled with spatial stability. Furthermore, a self-attention mechanism is integrated to enhance contextual modeling, enabling the capture of long-range dependencies across the scene. In addition, a comprehensive loss function is devised, effectively addressing fuzzy boundaries, spatial continuity, and class imbalance. Experimental results demonstrate state-of-the-art detection accuracy, surpassing traditional methods. Independent validation using CALIPSO VFM data further corroborates the method’s robustness.
In the current work, low clouds are treated as dynamically coherent systems. Notably, subsiding stratus clouds and sea fog exhibit nearly indistinguishable characteristics in satellite imagery, and our model does not explicitly differentiate between them. This simplification may obscure physically meaningful distinctions between low clouds and sea fog. Future work will rigorously address this limitation by incorporating domain-specific classification criteria to enhance both accuracy and physical interpretability.
Critically, sea fog constitutes a continuously evolving phenomenon. While existing methods extract sea fog from individual frames, they neglect interframe correlations. Although the proposed approach partially mitigates this by incorporating motion features between two consecutive frames, it remains constrained by frame-by-frame detection and fails to model the complete temporal dynamics. This limitation motivates future research aimed at capturing the spatiotemporal evolution of sea fog across an entire event using time-series modeling techniques.

Author Contributions

Conceptualization, P.Y.; methodology, S.Y., Y.T. and P.Y.; software, S.Y. and P.Y.; validation, S.Y. and Y.T.; formal analysis, Z.Z.; investigation, P.Y., X.Z., Y.H. and R.B.; resources, Z.Z.; writing—original draft preparation, S.Y.; writing—review and editing, P.Y., Z.Z. and Y.T.; visualization, S.Y. and P.Y.; supervision, Z.Z.; project administration, Z.Z.; funding acquisition, Z.Z., P.Y. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grants 42305159, 61473310, 41174164, and 41775027 and by the Open Project of Center for Applied Mathematics of Jiangsu Province (Nanjing University of Information Science and Technology).

Data Availability Statement

The code and processed data used for this study are publicly available and can be accessed from https://github.com/Yangsy30/MoTexFormer_SFD (accessed on 10 June 2025). The Himawari-8 Standard Data used for model construction in the study are available via https://www.eorc.jaxa.jp/ptree/ (accessed on 10 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Koračin, D.; Dorman, C.E.; Lewis, J.M.; Hudson, J.G.; Wilcox, E.M.; Torregrosa, A. Marine Fog: A Review. Atmos. Res. 2014, 143, 142–175. [Google Scholar] [CrossRef]
  2. Xu, M.; Wu, M.; Guo, J.; Zhang, C.; Wang, Y.; Ma, Z. Sea Fog Detection Based on Unsupervised Domain Adaptation. Chin. J. Aeronaut. 2022, 35, 415–425. [Google Scholar] [CrossRef]
  3. Kim, S.-H.; Suh, M.-S.; Han, J.-H. Development of Fog Detection Algorithm during Nighttime Using Himawari-8/AHI Satellite and Ground Observation Data. Asia-Pac. J. Atmos. Sci. 2019, 55, 337–350. [Google Scholar] [CrossRef]
  4. Wu, D.; Lu, B.; Zhang, T.; Yan, F. A Method of Detecting Sea Fogs Using CALIOP Data and Its Application to Improve MODIS-Based Sea Fog Detection. J. Quant. Spectrosc. Radiat. Transf. 2015, 153, 88–94. [Google Scholar] [CrossRef]
  5. Hunt, G.E. Radiative Properties of Terrestrial Clouds at Visible and Infra-red Thermal Window Wavelengths. Q. J. R. Meteorol. Soc. 1973, 99, 346–369. [Google Scholar] [CrossRef]
  6. Zhang, Y.; Pulliainen, J.; Koponen, S.; Hallikainen, M. Detection of Sea Surface Temperature (SST) Using Infrared Band Data of Advanced Very High Resolution Radiometer (AVHRR) in the Gulf of Finland. Int. J. Infrared Millim. Waves 2002, 23, 195–9271. [Google Scholar] [CrossRef]
  7. Cermak, J.; Bendix, J. A Novel Approach to Fog/Low Stratus Detection Using Meteosat 8 Data. Atmos. Res. 2008, 87, 279–292. [Google Scholar] [CrossRef]
  8. Cermak, J.; Bendix, J. Dynamical Nighttime Fog/Low Stratus Detection Based on Meteosat SEVIRI Data: A Feasibility Study. Pure Appl. Geophys. 2007, 164, 1179–1192. [Google Scholar] [CrossRef]
  9. Zhang, S.; Yi, L. A Comprehensive Dynamic Threshold Algorithm for Daytime Sea Fog Retrieval over the Chinese Adjacent Seas. Pure Appl. Geophys. 2013, 170, 1931–1944. [Google Scholar] [CrossRef]
  10. Lee, H.-B.; Heo, J.-H.; Sohn, E.-H. Korean Fog Probability Retrieval Using Remote Sensing Combined with Machine-Learning. Remote Sens. 2021, 58, 1434–1457. [Google Scholar] [CrossRef]
  11. Kim, D.; Park, M.-S.; Park, Y.-J.; Kim, W. Geostationary Ocean Color Imager (GOCI) Marine Fog Detection in Combination with Himawari-8 Based on the Decision Tree. Remote Sens. 2020, 12, 149. [Google Scholar] [CrossRef]
  12. Shin, D.; Kim, J.-H. A New Application of Unsupervised Learning to Nighttime Sea Fog Detection. Asia-Pac. J. Atmos. Sci. 2018, 54, 527–544. [Google Scholar] [CrossRef]
  13. Wang, Y.; Qiu, Z.; Zhao, D.; Ali, M.A.; Hu, C.; Zhang, Y.; Liao, K. Automatic Detection of Daytime Sea Fog Based on Supervised Classification Techniques for FY-3D Satellite. Remote Sens. 2023, 15, 2283. [Google Scholar] [CrossRef]
  14. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  15. Huang, Y.; Wu, M.; Guo, J.; Zhang, C.; Xu, M. A Correlation Context-Driven Method for Sea Fog Detection in Meteorological Satellite Imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1003105. [Google Scholar] [CrossRef]
  16. Tang, Y.; Yang, P.; Zhou, Z.; Zhao, X. Daytime Sea Fog Detection Based on a Two-Stage Neural Network. Remote Sens. 2022, 14, 5570. [Google Scholar] [CrossRef]
  17. Yi, L.; Li, M.; Liu, S.; Shi, X.; Li, K.-F.; Bendix, J. Detection of Dawn Sea Fog/Low Stratus Using Geostationary Satellite Imagery. Remote Sens. Environ. 2023, 294, 113622. [Google Scholar] [CrossRef]
  18. Guo, X.; Wan, J.; Liu, S.; Xu, M.; Sheng, H.; Yasir, M. A scSE-LinkNet Deep Learning Model for Daytime Sea Fog Detection. Remote Sens. 2021, 13, 5163. [Google Scholar] [CrossRef]
  19. Lu, H.; Ma, Y.; Zhang, S.; Yu, X.; Zhang, J. Daytime Sea Fog Identification Based on Multi-Satellite Information and the ECA-TransUnet Model. Remote Sens. 2023, 15, 3949. [Google Scholar] [CrossRef]
  20. Zhu, C.; Wang, J.; Liu, S.; Sheng, S.; Xiao, Y. Sea Fog Detection Using U-Net Deep Learning Model Based on Modis Data. In Proceedings of the 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, The Netherlands, 24–26 September 2019; IEEE: Amsterdam, The Netherlands, 2019; pp. 1–5. [Google Scholar]
  21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 5998–6008. [Google Scholar]
  22. Yan, H.; Su, S.; Wu, M.; Xu, M.; Zuo, Y.; Zhang, C.; Huang, B. SeaMAE: Masked Pre-Training with Meteorological Satellite Imagery for Sea Fog Detection. Remote Sens. 2023, 15, 4102. [Google Scholar] [CrossRef]
  23. Yang, Z.; Wu, M.; Xu, M.; Zhu, X.; Zhang, C.; Zhang, B. MoANet: A Motion Attention Network for Sea Fog Detection in Time Series Meteorological Satellite Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1976–1987. [Google Scholar] [CrossRef]
  24. Wang, Y.; Gao, S.; Fu, G.; Sun, J.; Zhang, S. Assimilating MTSAT-Derived Humidity in Nowcasting Sea Fog over the Yellow Sea. Weather Forecast. 2014, 29, 205–225. [Google Scholar] [CrossRef]
  25. Fu, G.; Song, Y. Climatic Characteristics of the Occurrence Frequency of Sea Fog in the North Pacific. J. Ocean Univ. China (Nat. Sci. Ed.) 2014, 44, 35–41. (In Chinese) [Google Scholar]
  26. Bessho, K.; Date, K.; Hayashi, M.; Ikeda, A.; Imai, T.; Inoue, H.; Kumagai, Y.; Miyakawa, T.; Murata, H.; Ohno, T.; et al. An Introduction to Himawari-8/9— Japan’s New-Generation Geostationary Meteorological Satellites. J. Meteorol. Soc. Jpn. 2016, 94, 151–183. [Google Scholar] [CrossRef]
  27. Farnebäck, G. Two-Frame Motion Estimation Based on Polynomial Expansion. In Image Analysis; Bigun, J., Gustavsson, T., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2749, pp. 363–370. ISBN 978-3-540-40601-3. [Google Scholar]
  28. Derin, H.; Elliott, H. Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields. IEEE Trans. Pattern Anal. Mach. Intell. 1987, PAMI-9, 39–55. [Google Scholar] [CrossRef] [PubMed]
  29. Rellier, G.; Descombes, X.; Falzon, F.; Zerubia, J. Texture Feature Analysis Using a Gauss-Markov Model in Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1543–1551. [Google Scholar] [CrossRef]
  30. Qiao, Y.; Zhao, Y.; Song, C.; Zhang, K.; Xiang, X. Graph Wavelet Transform for Image Texture Classification. IET Image Process. 2021, 15, 2372–2383. [Google Scholar] [CrossRef]
  31. Iqbal, N.; Mumtaz, R.; Shafi, U.; Zaidi, S.M.H. Gray Level Co-Occurrence Matrix (GLCM) Texture Based Crop Classification Using Low Altitude Remote Sensing Platforms. PeerJ Comput. Sci. 2021, 7, e536. [Google Scholar] [CrossRef]
  32. Huang, Y.; Wu, M.; Jiang, X.; Li, J.; Xu, M.; Zhang, C.; Guo, J. Weakly Supervised Sea Fog Detection in Remote Sensing Images via Prototype Learning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4106713. [Google Scholar] [CrossRef]
  33. Zhou, Y.; Chen, K.; Li, X. Dual-Branch Neural Network for Sea Fog Detection in Geostationary Ocean Color Imager. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4208617. [Google Scholar] [CrossRef]
  34. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar] [CrossRef]
  36. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  37. Yi, L.; Zhang, S.P.; Thies, B.; Shi, X.M.; Trachte, K.; Bendix, J. Spatio-Temporal Detection of Fog and Low Stratus Top Heights over the Yellow Sea with Geostationary Satellite Data as a Precondition for Ground Fog Detection—A Feasibility Study. Atmos. Res. 2015, 151, 212–223. [Google Scholar] [CrossRef]
  38. Winker, D.M.; Pelon, J.; Coakley, J.A., Jr.; Ackerman, S.A.; Charlson, R.J.; Colarco, P.R.; Flamant, P.; Fu, Q.; Hoff, R.M.; Kittaka, C.; et al. A Global 3D View of Aerosols and Clouds. Bull. Am. Meteorol. Soc. 2010, 91, 1211–1230. [Google Scholar] [CrossRef]
Figure 1. Overview of the study area.
Figure 2. Reflectance and brightness temperature distributions of different objects in HSD8 (excluding water vapor channels).
Figure 3. Motion diagram from 03:50 UTC on 12 March 2018 to 04:00 UTC on 12 March 2018. (a) Motion vector between the two consecutive frames. The red box shows the movement of sea fog, which generally shows a trend from south to north; the yellow box shows the movement of clouds, which generally shows a trend from west to east. (b) Motion features between the two consecutive frames. The movement of sea fog typically appears with a red hue, while the movement of clouds is characterized by a magenta hue. Different colors describe the specific movement direction and displacement of each pixel.
Figure 4. Spatiotemporal texture consistency feature map. (a) shows the 0.64 µm reflectance for 4 June 2019, at 00:00 (UTC). (b) shows the 0.64 µm reflectance for 4 June 2019, at 01:00 (UTC). (c) displays the STCF derived from (a,b).
Figure 5. Framework of the proposed method. The encoder consists of two branches: one extracts multi-scale contextual information via MSA, while the other emphasizes key local features through average pooling. The outputs from both branches are then fused through the FF block to integrate multi-scale features from each stage. Finally, the SGM block generates the detection result.
Figure 6. The structure of MSA block.
Figure 7. The structure of FF block (a) and SGM block (b).
Figure 8. Detection results of six different algorithms for sea fog events at 05:10 UTC on 13 May 2018. (a) Original image synthesized from Band 03, Band 04, and Band 05; (b) ground truth; (c) results of the threshold method; (d) results of MLP + CNN; (e) results of U-Net; (f) results of FCN; (g) results of FCN + CRF; (h) our result. The sea fog, cloud–fog mixed areas, cloud, sea surface, and land are depicted in yellow, red, blue, pink, and black, respectively.
Figure 9. Detection results of six different algorithms for sea fog events at 03:50 UTC on 29 April 2018. (a) Original image synthesized from Band 03, Band 04, and Band 05; (b) ground truth; (c) results of the threshold method; (d) results of MLP + CNN; (e) results of U-Net; (f) results of FCN; (g) results of FCN + CRF; (h) our result. The sea fog, cloud–fog mixed areas, cloud, sea surface, and land are depicted in yellow, red, blue, pink, and black, respectively.
Figure 10. Time-series evolution of sea fog detection results from 03:20 to 04:50 UTC on 8 June 2018, at 10 min intervals, showing ground truth, sea fog detection results, and a comparison between the ground truth and the detection results.
Figure 11. The performance of various evaluation indicators on continuous time series data.
Figure 12. The fog detection results and VFM validation of the proposed method are shown. (a) The original image, synthesized from Band 03, Band 04, and Band 05, includes a colored line indicating the CALIOP orbit track, where the A–B and C–D segments correspond to sea fog regions; (b) visualization of the VFM derived from the CALIPSO backscatter data along the trajectory on 8 June 2018. Dark blue, yellow, and green indicate regions of “clouds,” “aerosols,” and “surface,” with high confidence, respectively.
Table 1. Introduction to the observation bands of the Himawari-8 Standard Data.
Band | Band Type | Wavelength (μm) | Spatial Resolution (km) | Detection Category
01 | VIS | 0.47 | 1 | Vegetation, aerosol
02 | VIS | 0.51 | 1 | Vegetation, aerosol
03 | VIS | 0.64 | 0.5 | Low cloud (fog)
04 | NIR | 0.86 | 1 | Vegetation, aerosol
05 | NIR | 1.6 | 2 | Cloud phase recognition
06 | NIR | 2.3 | 2 | Cloud droplet effective radius
07 | IR | 3.9 | 2 | Low cloud (fog), natural disaster
08 | IR | 6.2 | 2 | Water vapor density from troposphere to mesosphere
09 | IR | 6.9 | 2 | Water vapor density in the mesosphere
10 | IR | 7.3 | 2 | Water vapor density in the mesosphere
11 | IR | 8.6 | 2 | Cloud phase discrimination, sulfur dioxide
12 | IR | 9.6 | 2 | Ozone content
13 | IR | 10.4 | 2 | Cloud image, cloud top
14 | IR | 11.2 | 2 | Cloud image, sea surface temperature
15 | IR | 12.4 | 2 | Cloud image, sea surface temperature
16 | IR | 13.3 | 2 | Cloud height
Table 2. Statistical distribution of sea fog samples.
Year | Time Period | Samples | Train/Test Partition
2018 | 12 March–8 June | 64 | 51/13
2019 | 22 February–4 June | 103 | 82/21
2020 | 23 January–2 July, 28 December | 216 | 173/43
Table 3. Comparison of evaluation results under different input features and time interval settings, with the best results highlighted in bold.
Spectral | Motion | STCF | T = 1 | T = 3 | T = 6 | mIoU
✓ |  |  |  |  |  | 0.838
✓ | ✓ |  | ✓ |  |  | 0.853
✓ |  | ✓ | ✓ |  |  | 0.841
✓ | ✓ | ✓ | ✓ |  |  | 0.953
✓ | ✓ | ✓ |  | ✓ |  | 0.94
✓ | ✓ | ✓ |  |  | ✓ | 0.938
Table 4. A comparison of the experimental results of different methods on the testing set, with the best results highlighted in bold.
Method | FAR | Recall | CSI | Precision | Accuracy | KSS | mIoU | F1
Threshold | 0.14 | 0.725 | 0.427 | 0.509 | 0.838 | 0.585 | 0.621 | 0.598
MLP + CNN | 0.049 | 0.822 | 0.455 | 0.504 | 0.943 | 0.773 | 0.698 | 0.625
U-Net | 0.011 | 0.946 | 0.904 | 0.947 | 0.982 | 0.935 | 0.941 | 0.946
FCN | 0.012 | 0.938 | 0.893 | 0.939 | 0.98 | 0.926 | 0.934 | 0.938
FCN + CRF | 0.012 | 0.949 | 0.909 | 0.949 | 0.981 | 0.938 | 0.943 | 0.949
Ours | 0.008 | 0.964 | 0.934 | 0.965 | 0.987 | 0.957 | 0.959 | 0.964
Table 5. A comparison of the experimental evaluation metrics of different methods in cases of low cloud interference, with the best results highlighted in bold.
Method | FAR | Recall | CSI | Precision | Accuracy | KSS | mIoU | F1
Threshold | 0.097 | 0.571 | 0.42 | 0.614 | 0.832 | 0.474 | 0.615 | 0.592
MLP + CNN | 0.020 | 0.905 | 0.842 | 0.923 | 0.964 | 0.885 | 0.899 | 0.914
U-Net | 0.013 | 0.931 | 0.885 | 0.933 | 0.979 | 0.918 | 0.930 | 0.932
FCN | 0.032 | 0.917 | 0.82 | 0.886 | 0.957 | 0.885 | 0.884 | 0.901
FCN + CRF | 0.007 | 0.969 | 0.943 | 0.971 | 0.990 | 0.962 | 0.965 | 0.969
Ours | 0.007 | 0.973 | 0.949 | 0.973 | 0.991 | 0.966 | 0.969 | 0.973
Table 6. A comparison of the experimental evaluation metrics of different methods in cases of high cloud occlusion, with the best results highlighted in bold.
Method | FAR | Recall | CSI | Precision | Accuracy | KSS | mIoU | F1
Threshold | 0.045 | 0.891 | 0.6 | 0.648 | 0.949 | 0.845 | 0.773 | 0.75
MLP + CNN | 0.028 | 0.681 | 0.470 | 0.603 | 0.955 | 0.653 | 0.712 | 0.640
U-Net | 0.013 | 0.883 | 0.776 | 0.865 | 0.988 | 0.870 | 0.876 | 0.874
FCN | 0.011 | 0.831 | 0.742 | 0.874 | 0.975 | 0.820 | 0.858 | 0.852
FCN + CRF | 0.011 | 0.940 | 0.898 | 0.942 | 0.981 | 0.929 | 0.936 | 0.941
Ours | 0.004 | 0.959 | 0.913 | 0.950 | 0.992 | 0.955 | 0.952 | 0.954