1. Introduction
Sea fog is a significant cause of maritime and aviation accidents due to reduced visibility, making its accurate detection and prediction essential for operational safety [1,2]. Some studies report that the loss of life and property caused by sea fog can be as severe as that resulting from tornadoes or hurricanes [3]. Recently, the increasing frequency of fog occurrence due to climate change has further exacerbated the risk of safety incidents, as factors such as rising sea surface temperatures, enhanced air–sea temperature contrasts, and increased low-level atmospheric humidity create more favorable conditions for sea fog formation in coastal regions [3,4].
Traditionally, sea fog detection technologies have relied primarily on in situ point observations. However, these methods suffer from limitations in extensive monitoring, and detection is particularly challenging in marine environments due to the scarcity of observation stations and the limited nature of measurement data [5,6]. Satellites can cover vast areas and operate across multiple spectral channels, enabling precise detection and monitoring of sea fog [7].
Research on sea fog detection using satellite imagery has employed diverse approaches, including (1) rule-based detection, (2) classical neural networks and machine learning, and (3) deep learning-based image recognition.
Rule-based detection algorithms classify sea fog using pre-defined thresholds and decision rules derived from the physical spectral characteristics of satellite images, such as brightness temperature, reflectance, and RGB composites [5,6,7,8,9,10,11,12,13,14,15,16,17,18]. Operational products from agencies such as EUMETSAT, NOAA, and JMA, as well as GOCI-based algorithms, have demonstrated the feasibility of daytime and nighttime sea fog detection over various regions [8,9,10,11,15]. However, these approaches require region- and sensor-specific tuning and often struggle to generalize across different backgrounds, seasons, and cloud conditions because their thresholds are fixed in space and time [5,6,7,12,13,14,15,16,17].
To alleviate the limitations of purely rule-based schemes, machine learning methods (e.g., KNN, RF, SVM, and ERT) have been applied to satellite-derived variables for sea fog classification and dissipation prediction, and even combined with domain adaptation strategies to compensate for the lack of marine labels and to support fog and low-stratus nowcasting from geostationary imagery [19,20,21,22,23,24]. Nevertheless, these models still tend to suffer from overfitting and insufficient use of spatial context, and their performance often degrades when applied beyond the training conditions [21,22,23].
More recently, deep learning-based image recognition techniques, particularly CNN and Transformer architectures, have substantially advanced sea fog detection by learning global and local features directly from multispectral satellite imagery [25,26,27,28,29,30,31,32]. U-Net variants, CNN transfer learning models, hybrid CNN–Transformer networks, and dual-branch architectures have been successfully applied to sensors such as MODIS, GOCI, AHI, and GOCI-II, demonstrating improved accuracy in distinguishing sea fog, low clouds, and the sea surface at the pixel level [24,25,26,27,28,29,30]. These studies have also explored multi-satellite inputs to mitigate temporal and spatial constraints, showing the potential of deep learning for near-real-time monitoring [20,28,29,30]. However, most existing approaches still rely on a single satellite or simple combinations of sensors without fully exploiting complementary spectral characteristics, and they are often limited by geometric misalignment between platforms [19,20,29,30]. As a result, there remains a clear need for a deep learning framework that (i) robustly co-registers multi-satellite imagery at high precision and (ii) systematically evaluates state-of-the-art semantic segmentation models for multi-sensor sea fog detection.
Despite the active use of deep learning in sea fog detection, further improvements in accuracy and reliability require the simultaneous use of multiple satellite datasets rather than a single dataset. Data provided by a single satellite are restricted to specific spectral bands; in addition to the mid- and thermal-infrared bands, a combination of diverse spectral bands from multiple satellite images is necessary to capture the complex meteorological characteristics of sea fog in greater detail.
In the Korean geostationary satellite constellation, the GK2A AMI mainly provides thermal infrared–oriented fog products, whereas the GK2B GOCI-II offers visible and near-infrared marine fog products optimized for ocean color and atmospheric optical characteristics. In practice, GK2A-based fog outputs tend to miss optically thin or low-level sea fog under shallow cloud layers, while GOCI-II Marine Fog (MF) products frequently overestimate fog by confusing bright high-level clouds or sea-surface reflection patterns with fog (see also Section 4.2) [33,34,35].
These complementary strengths and weaknesses indicate that using only GK2A or only GK2B is insufficient for robust sea fog monitoring and that their combined use is expected to provide more balanced detection performance. However, studies combining deep learning models with multi-satellite imagery for sea fog detection, particularly those jointly exploiting GK2A AMI and GK2B GOCI-II, are rarely found. One reason is that, even after geometric correction, spatial misalignment inevitably occurs due to inaccurate co-registration when overlaying images acquired from different satellites.
In this study, we aim to achieve higher accuracy and reliability in sea fog detection by employing a deep learning-based advanced co-registration technique for multi-satellite image combination and the autotuning-based optimization of State-of-the-Art (SOTA) semantic segmentation models. We used the Advanced Meteorological Imager (AMI) sensor of the Geostationary Korea Multi-Purpose Satellite 2A (GK2A) and the GOCI-II sensor of the Geostationary Korea Multi-Purpose Satellite 2B (GK2B). AMI offers diverse spectral bands, including visible, near-infrared, and thermal infrared bands, while GOCI-II provides several detailed visible bands. Combining the complementary band information from these two satellites can yield richer information for sea fog detection than using single-satellite data alone. Hence, we employed Robust Dense Feature Matching (RoMa), a deep learning-based image co-registration model, to prevent inter-image misalignment between the two types of imagery caused by different acquisition times, fields of view, or other environmental factors. In addition, a more sophisticated sea fog detection algorithm was constructed by evaluating multiple deep learning segmentation models with hyperparameter autotuning based on the Optuna library. We focus on the Korean Peninsula and surrounding seas observed by GOCI-II and evaluate sea fog detection performance while accounting for regional characteristics. This approach is expected to consolidate and advance the results demonstrated in previous research, further improving the accuracy and real-time monitoring capabilities of sea fog detection and thereby contributing to the safety of maritime and aviation operations.
2. Materials and Methods
We used geostationary satellite imagery from GK2A AMI and GK2B GOCI-II, together with GK2A cloud-top-height products and in situ observations from ASOS and sea fog observation stations, to construct labeled sea fog datasets and train deep learning segmentation models.
Figure 1 illustrates the overall research flow, which is divided into image labeling, image co-registration, and model construction. In the image labeling step, fog annotations were created from AMI and GOCI-II imagery and in situ observation data to construct labeled training data. In the image co-registration part, the RoMa model was used to precisely correct positional discrepancies between the two satellite images, thereby generating 6-channel input data. Finally, during the model construction step, the dataset was split into training, validation, and test sets. Various segmentation models were trained, ultimately producing the final sea fog detection results.
2.1. Overview of GK2A and GK2B Satellites
Our sea fog detection is conducted using both GK2A and GK2B satellite imagery. These two geostationary satellites, developed by the Korea Aerospace Research Institute (KARI), share common systems and body designs but feature different payloads tailored to their respective missions [36]. GK2A AMI employs a multispectral scanning mirror system that captures 16 spectral channels. A dual-axis scanning mirror rotates in both azimuth and elevation, traversing the entire Earth disk every 10 min. Each scan line is collected at a nadir-based spatial resolution of approximately 2 km [37]. In contrast, GK2B GOCI-II employs a push-broom design with 12 independent slot Charge-Coupled Device (CCD) arrays to repeatedly observe Northeast Asian waters at high resolution. By dividing the observation area into 12 segments, the satellite sequentially scans the ground pixels in each slot. This enables 10 daily acquisitions at 250 m spatial resolution and one daily acquisition at 1000 m resolution covering the entire hemisphere. A key feature is the application of Time-Delay Integration (TDI) techniques for each slot, which enhance the signal-to-noise ratio (SNR) while enabling precise analysis of complex ocean and atmospheric color characteristics [38]. GK2A was designed for meteorological disaster monitoring and improved forecasting, while GK2B was designed to support marine environmental change monitoring and coastal management. As such, the two satellites produce specialized information for meteorological and ocean observations through their distinct payloads. Jointly using AMI and GOCI-II imagery effectively combines their complementary spectral information, significantly enhancing the performance of sea fog detection. Our study area is the Korean Peninsula and adjacent Northeast Asian waters within the common observation zone of both satellites (Figure 2). For spatial colocation, we restricted GK2A AMI local-area scenes to the footprint of the GOCI-II slots covering this region. The spectral band configurations of the two satellites are summarized in Table 1 [33,34].
2.2. Labeling Annotation Data for Sea Fog Detection
Using Closed-Circuit Television (CCTV) footage from the Korea Hydrographic and Oceanographic Agency (KHOA) and other sources, we identified the dates and locations where sea fog occurred from January 2023 to July 2024. We collected AMI L1B and GOCI-II L1B satellite imagery captured between 8:00 AM and 12:00 PM on the confirmed sea fog occurrence dates. AMI data is provided at 10 min intervals, while GOCI-II data acquisition completes around 28 min past the hour for the 7th slot. To use both satellite images, we selected image pairs whose acquisition times differed by less than 30 min and whose footprints overlapped the common study area shown in Figure 2.
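As an illustration of this pairing rule, the following minimal Python sketch matches each GOCI-II slot acquisition to the closest AMI scene within the 30 min window; the timestamp lists are hypothetical and simply stand in for acquisition times parsed from the actual file names.

from datetime import datetime, timedelta

def pair_scenes(ami_times, goci_times, max_diff=timedelta(minutes=30)):
    # For each GOCI-II acquisition, pick the closest AMI scene in time
    # (AMI is available every 10 min) and keep the pair only if the
    # difference is below the 30 min threshold used in this study.
    pairs = []
    for g in goci_times:
        closest = min(ami_times, key=lambda a: abs(a - g))
        if abs(closest - g) < max_diff:
            pairs.append((closest, g))
    return pairs

# Hypothetical example: AMI scenes every 10 min, GOCI-II slot 7 completing around hh:28
ami = [datetime(2023, 5, 10, 8, 0) + timedelta(minutes=10 * k) for k in range(25)]
goci = [datetime(2023, 5, 10, h, 28) for h in range(8, 12)]
print(pair_scenes(ami, goci))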
Reviewing the literature to select effective bands for sea fog identification revealed that a false-color composite (FCC) of the 0.64 μm, 1.6 μm, and 11.2 μm bands was reported to be effective for Himawari-8 AHI, while an FCC of the 0.865 μm, 0.443 μm, and 0.412 μm bands proved effective for GOCI images [26,30]. These FCCs are also applicable to GK2A AMI and GK2B GOCI-II because AMI has the same band configuration as AHI and GOCI-II carries all three GOCI bands. Sea fog, composed of fine water droplets, exhibits strong reflectance in the 0.64 μm (visible) band due to scattering of sunlight by these droplets. Specifically, within the 0.4–0.7 μm range, sea fog generates a clear reflectance signal due to uniform scattering. In the 1.6 μm (shortwave infrared) channel, however, absorption by water becomes prominent. Crucially, clouds tend to exhibit more absorption and less reflectance than sea fog in this shortwave infrared channel because of their larger droplet size and longer absorption path, consistent with Mie scattering theory. This contrast in spectral properties allows sea fog to be effectively differentiated from clouds through spectral analysis of satellite imagery [38]. In addition, the 11.2 μm (thermal infrared) brightness temperature provides the information that sea fog is colder than the surrounding sea surface. Reflecting these spectral characteristics, we selected the AMI 0.64 μm, 1.6 μm, and 11.2 μm bands, along with the GOCI-II 0.412 μm, 0.443 μm, and 0.865 μm bands, as the input channels for the sea fog detection model. GOCI-II FCC images provide detailed spectral and texture information: sea fog typically displays a smooth pink surface texture, while clouds exhibit a relatively coarse white texture in GOCI-II FCC images [26].
We performed annotation work for sea fog detection using FCC images from AMI and GOCI-II, AMI cloud top height (CTH) data, and in situ data from ASOS and sea fog observation stations (Figure 3). To create the FCC images for effective identification of sea fog, we used the three selected AMI bands as the RGB channels of the AMI FCC imagery and the three selected GOCI-II bands as the RGB channels of the GOCI-II FCC imagery. Cross-analysis of the FCC images, CTH data, and field measurements on fog-occurrence days was conducted to derive common fog characteristics. The AMI FCC images showed fog areas in light green, while the GOCI-II FCC images displayed them in a color close to pink. Drawing on the concept of surface homogeneity for low clouds and fog presented in [18], we identified and extracted fog areas using color and texture in the two FCC images. Furthermore, AMI CTH data were also used to reflect the fact that sea fog lies close to the surface and therefore has a low cloud top height.
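A minimal sketch of how such an FCC can be assembled from the three selected bands is given below; the percentile stretch and the band-to-RGB ordering (listed here in the order the bands appear in the text) are illustrative assumptions rather than the exact rendering used for the operational imagery.

import numpy as np

def false_color_composite(band_r, band_g, band_b):
    # Linearly stretch each band to [0, 1] using its 2nd-98th percentiles
    # and stack the three bands as an RGB image (H x W x 3).
    def stretch(b):
        lo, hi = np.nanpercentile(b, [2, 98])
        return np.clip((b - lo) / (hi - lo + 1e-9), 0.0, 1.0)
    return np.dstack([stretch(band_r), stretch(band_g), stretch(band_b)])

# Assumed channel ordering:
#   AMI FCC:     R = 0.64 um reflectance, G = 1.6 um reflectance, B = 11.2 um brightness temperature
#   GOCI-II FCC: R = 0.865 um, G = 0.443 um, B = 0.412 um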
Based on these data, the observation-based labeling followed a three-step procedure; a simplified sketch of the seed-and-grow logic is given after the three steps.
First, for coastal regions near the ASOS and sea fog observation stations, we delineated initial fog seed areas around stations reporting sea fog by selecting pixels whose FCC color/texture and CTH values were consistent with the typical fog signatures described above.
Second, these seed areas were extended offshore by region-growing along contiguous pixels that preserved homogeneous fog-like FCC textures and low CTH values, allowing fog labels to be assigned over maritime areas without direct in situ coverage.
Third, in offshore regions far from any ground-based station, candidate fog patches were identified solely from FCC and CTH patterns; only pixels forming clearly homogeneous and persistent fog-like structures were labeled as sea fog, whereas ambiguous areas were conservatively assigned as non-sea-fog.
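The following sketch illustrates the seed-and-grow logic of the first two steps under simplifying assumptions: the fog-like FCC mask, the 1 km CTH threshold, and the connectivity choice are illustrative stand-ins for the manual criteria described above, not the exact rules applied by the annotators.

import numpy as np
from skimage.measure import label

def grow_fog_labels(seed_mask, fcc_fog_like, cth_m, cth_max_m=1000.0):
    # seed_mask     : bool array, fog seeds around stations reporting sea fog
    # fcc_fog_like  : bool array, pixels whose FCC colour/texture matches fog signatures
    # cth_m         : float array, AMI cloud-top height in metres (NaN where cloud-free)
    candidate = fcc_fog_like & (np.nan_to_num(cth_m, nan=0.0) <= cth_max_m)
    # Connected components of fog-like, low-CTH pixels (8-connectivity)
    components = label(candidate, connectivity=2)
    # Keep only the components that touch at least one station-based seed pixel
    seeded_ids = np.unique(components[seed_mask & (components > 0)])
    return np.isin(components, seeded_ids)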
The operational GK2A AMI Fog and GK2B GOCI-II Marine Fog (MF) products were intentionally not used during the labeling stage so that our reference labels would be independent of any existing algorithm and could later serve as an unbiased benchmark for comparing the proposed deep learning model with the operational products.
Given reduced visibility in sea fog conditions, a comparative analysis was conducted using in situ visibility measurement data from ASOS and sea fog observation stations. Specifically, we used visibility observations from 95 land-based ASOS stations with an hourly reporting interval and from 11 coastal sea fog observation stations with a 1 min resolution. Both ASOS and sea fog stations provide visibility measurements, which were jointly used as reference information for sea fog labeling. Since such features of sea fog were difficult to extract automatically [18], annotation tools were used to perform precise manual annotation (Table 2).
Overall, 86 annotated scenes were selected from confirmed sea-fog days, each corresponding to a daytime case in which sea fog was clearly present in the FCC imagery. Across all annotated pixels in these 86 fog-containing scenes, sea fog and non-sea-fog pixels account for 4% and 96%, respectively; this class imbalance may cause minority class bias during training, necessitating imbalance correction.
Figure 4 summarizes the monthly sampling distribution of the labeled dataset for transparency. Long-term observational studies report that sea fog around the Korean Peninsula exhibits strong seasonality, with higher occurrence typically during the warm season from late spring to summer and higher frequencies over the Yellow/West Sea than over the East Sea [39,40]. Accordingly, our labeled dataset was constructed by selecting confirmed fog days rather than by uniform temporal sampling. Almost 80% of the data come from May and June, early spring (March and April) accounts for almost 20%, and late winter (February) contributes less than 5%.
2.3. Co-Registration for AMI and GOCI-II Satellite Images
The GK2A and GK2B satellites acquire imagery using different sensor characteristics and observation methods. Therefore, ensuring spatial alignment is essential for simultaneously utilizing both images. We performed a co-registration procedure to achieve precise fusion of AMI and GOCI-II imagery. Image co-registration is the process of overlaying two or more images of the same scene, captured at different times, from different perspectives, or by other sensors. First, unique control points, such as closed boundaries, edges, contours, and line intersections, are automatically or manually detected in both images. Then, corresponding control point pairs are identified using various feature descriptors, similarity measures, and spatial relationships. Subsequently, based on these correspondences, a mapping function such as a homography is estimated. Finally, the estimated mapping function is applied to align the coordinate systems of the detected and reference images. Non-integer pixel coordinates are interpolated to obtain the final matched image [41].
We employed RoMa, a deep learning-based feature detection and matching model (Figure 5). RoMa provides powerful capabilities for estimating pixel-level dense warps and their uncertainties, demonstrating high robustness by leveraging the pre-trained self-DIstillation with NO labels version 2 (DINOv2) model. DINOv2 is a Vision Transformer trained with self-supervised learning; on top of its coarse features, RoMa uses a match decoder that predicts anchor probabilities to represent multimodal match distributions, enabling more sophisticated matching than traditional local features. Furthermore, RoMa introduces a regression-by-classification loss function to enhance matching performance, achieving results that substantially surpass existing methods [42]. Traditional Scale-Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) algorithms are manually designed feature descriptors. While SIFT offers scale and rotation invariance and ORB provides fast computation and efficiency, their performance is limited under lighting changes, distortions, or a lack of texture. RoMa overcomes these limitations by combining deep features from DINOv2 with fine details extracted from a ConvNet, achieving more reliable matching.
The RoMa model can automatically detect and match features in AMI and GOCI-II images. Based on the matched features, we computed the homography matrix between the GOCI-II and AMI images. The computed homography matrix was applied to the GOCI-II image to register the two images. A comparison before and after registration revealed a significant spatial mismatch between the AMI and GOCI-II images, attributed to coordinate differences before co-registration. However, after co-registration, the two images were found to be noticeably aligned. We used the original RoMa implementation with publicly released pre-trained weights and did not perform additional fine-tuning on our dataset, because RoMa is designed as a general-purpose dense matcher and our primary objective was to generate geometrically consistent fusion inputs. The registration performance was evaluated qualitatively by visually inspecting the alignment of coastlines and major geographical features with the KHOA shoreline data, and we confirmed that the residual misalignment was negligible at the native resolution for all selected scenes.
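A minimal sketch of this matching-to-warping step follows. The romatch package name, the roma_outdoor loader, and the match/sample calls reflect the publicly released RoMa implementation but should be treated as assumptions here, and the file names are hypothetical.

import cv2
import numpy as np
import torch
from romatch import roma_outdoor  # assumed entry point of the released RoMa code

device = "cuda" if torch.cuda.is_available() else "cpu"
matcher = roma_outdoor(device=device)

ami_path, goci_path = "ami_fcc.png", "goci2_fcc.png"   # hypothetical scene pair
h_a, w_a = cv2.imread(ami_path).shape[:2]
h_b, w_b = cv2.imread(goci_path).shape[:2]

# Dense warp with per-pixel certainty, then sparse correspondences sampled from it
warp, certainty = matcher.match(ami_path, goci_path, device=device)
matches, match_certainty = matcher.sample(warp, certainty)
kpts_ami, kpts_goci = matcher.to_pixel_coordinates(matches, h_a, w_a, h_b, w_b)

# Homography GOCI-II -> AMI estimated with RANSAC, then applied to the GOCI-II image
H, inliers = cv2.findHomography(kpts_goci.cpu().numpy(), kpts_ami.cpu().numpy(),
                                cv2.RANSAC, ransacReprojThreshold=3.0)
goci_registered = cv2.warpPerspective(cv2.imread(goci_path), H, (w_a, h_a))
cv2.imwrite("goci2_registered.png", goci_registered)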
Figure 6 shows the comparisons of the reference AMI FCC image and the GOCI-II FCC image before and after co-registration, providing a visually consistent basis for multi-sensor comparison. The AMI FCC image represents land surfaces in dark green and sea areas in dark blue, while the GOCI-II FCC image depicts land surfaces in dark red and sea areas in dark cyan. Additionally, the white solid line represents the coastline data provided by KHOA. Before the registration, the shoreline in the GOCI-II FCC image was slightly misaligned with the KHOA shoreline. After co-registration, both images achieved precise spatial alignment, accurately matching the shorelines.
2.4. Training Deep Learning Segmentation Models
We adopted recent SOTA semantic segmentation models, both CNN-based (OCRNet, ConvNeXt-L, and SegNeXt) and Transformer-based (SegFormer, Swin Transformer, and Mask2Former) [43,44,45,46,47,48], which demonstrate strong performance in satellite remote sensing and can effectively process the complex image features required for sea fog monitoring. While CNN-based models leverage the characteristics of convolutional neural networks to effectively integrate local features and multi-resolution information, Transformer-based models excel at capturing global as well as local contextual information, enabling a more precise understanding of complex image patterns. By training both types of models and comparing their results, we aim to identify the most suitable modeling approach for sea fog detection.
Spatially aligned AMI and GOCI-II images were cropped into identical regions and resampled to dimensions of 1024 × 1024 pixels, resulting in the six-channel fusion input composed of the three AMI and three GOCI-II bands described in Section 2.2. The entire training dataset consists of 1002 patches of input and label data, obtained from 86 scenes captured on 24 foggy days during seven months in 2023 and 2024. To facilitate evaluation, 10 scenes (120 patches) from two fog days were assigned to the validation set, and another 10 scenes (120 patches) from two different fog days were allocated to the test set. The remaining 66 scenes (762 patches) from 20 foggy days were used as the training set. These splits were performed on a scene-by-scene basis, meaning that all patches from a given scene were assigned to the same subset rather than being randomly mixed at the patch level. This ensures separation between the training, validation, and test sets, which is necessary to avoid overfitting of the deep learning model. In addition, the fog/non-fog pixel ratios in the training, validation, and test sets were kept close to the overall 4% vs. 96% distribution shown in Table 2. Although the number of training scenes is limited, they cover multiple sea fog events over different days and regions around the Korean Peninsula, providing a diverse set of fog patterns for model learning. Transfer learning was conducted for efficient learning and rapid optimization: the weights of the first input layer of a 3-channel model pre-trained on the ADE20K dataset were imported and scaled to fit the 6-channel input. Input images were normalized using channel-wise means and standard deviations before being fed to the model during training. Subsequently, data augmentation was applied using geometric transformations such as random resizing, cropping, and flipping. This procedure exposed the model to diverse data variations, enhancing its generalization performance. In combination with transfer learning, these augmentation strategies were adopted to effectively expand the variability of the training data and compensate for the limited number of original scenes.
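A minimal sketch of the 3-to-6-channel adaptation of the pretrained input layer is shown below; duplicating the RGB kernels for the three extra channels and halving all kernels is one common heuristic, and the stem attribute name is hypothetical, since the exact scaling used for each backbone is not detailed here.

import torch
import torch.nn as nn

def adapt_first_conv_to_6ch(conv3: nn.Conv2d) -> nn.Conv2d:
    # Build a 6-channel copy of the pretrained 3-channel stem convolution.
    conv6 = nn.Conv2d(6, conv3.out_channels, conv3.kernel_size,
                      stride=conv3.stride, padding=conv3.padding,
                      bias=conv3.bias is not None)
    with torch.no_grad():
        w = conv3.weight                                     # shape: (out_ch, 3, kH, kW)
        conv6.weight.copy_(torch.cat([w, w], dim=1) * 0.5)   # duplicate and rescale kernels
        if conv3.bias is not None:
            conv6.bias.copy_(conv3.bias)
    return conv6

# Usage sketch (hypothetical attribute name for the backbone stem):
# model.backbone.stem_conv = adapt_first_conv_to_6ch(model.backbone.stem_conv)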
The hyperparameters used during model training were autotuned with Optuna to optimize training efficiency and performance. Optuna is a Bayesian optimization framework that samples candidate hyperparameter combinations within the search space at each trial, repeatedly training and evaluating the model for each combination to derive optimal hyperparameter values. Unlike grid search or random search, it offers the advantage of probabilistically exploring optimal combinations in the N-dimensional hyperparameter space, even when new datasets are added or experimental conditions change. We defined search ranges for the learning rate, batch size, dropout ratio, weight decay, and background class weight (Table 3). Among the five trials, the combination that achieved the highest validation performance was adopted as the optimal hyperparameter set. Moreover, background class weighting was applied to suppress background overconfidence and mitigate undetected fog.
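A compact sketch of the Optuna search loop is given below; the value ranges are illustrative placeholders for the bounds listed in Table 3, and build_segmentation_model and train_and_validate are hypothetical helpers standing in for the actual training pipeline.

import optuna

def objective(trial: optuna.Trial) -> float:
    cfg = {
        "lr":              trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "batch_size":      trial.suggest_categorical("batch_size", [2, 4, 8]),
        "dropout":         trial.suggest_float("dropout", 0.0, 0.5),
        "weight_decay":    trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True),
        "bg_class_weight": trial.suggest_float("bg_class_weight", 0.1, 1.0),
    }
    model = build_segmentation_model(cfg)      # hypothetical helper
    return train_and_validate(model, cfg)      # hypothetical helper returning validation fog IoU

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=5)          # five trials, as used in this study
best_cfg = study.best_params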
All models were trained on an NVIDIA GeForce RTX 3090 Ti (24 GB) with an Intel(R) Core(TM) i9-12900K and 64 GB of RAM. The average training time was approximately 15 h per model until convergence, including five Optuna trials, using the configuration described above. For inference, we report the runtime using a representative Transformer-based segmentation model, which required approximately 0.2 s per 1024 × 1024 patch and approximately 2.4 s per full scene when processing the scene as multiple 1024 × 1024 patches (on average ~12 patches per scene in our dataset). These runtimes indicate that near-real-time daytime sea fog monitoring is feasible when considering the operational update cycles of GK2A AMI (10 min) and GK2B GOCI-II.
To quantitatively evaluate the fog detection model, several commonly used pixel-level classification metrics were employed. By comparing the model’s inference results with the labeled image pixel-by-pixel, true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) were calculated for the segmented images. Based on this, Intersection over Union (IoU), Accuracy, Precision, Recall, and F1-score were calculated to evaluate the model’s segmentation performance.
Intersection over Union (IoU) is defined as the ratio of the intersected area between the predicted and actual fog area to the union area of both. A high IoU value indicates that the model closely matches the actual sea fog area.
Accuracy is the ratio of correctly classified pixels to the total number of pixels. While Accuracy provides an intuitive overview of overall performance, it is used as a supplementary metric in scenarios with class imbalance, such as sea fog.
Precision is the proportion of correctly predicted fog pixels among all pixels predicted as fog by the model. High precision indicates that the model does not tend to overestimate fog.
False Alarm Ratio (FAR) is defined as the proportion of false positives among all predicted fog pixels.
Recall is the ratio of correctly detected fog pixels to the actual fog pixels. This value indicates the model’s sensitivity; a higher value indicates a lower tendency to underestimate.
The F1-score is the harmonic mean of precision and recall, balancing the two metrics. It reflects overall performance degradation when one metric is extremely low.
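Assuming the usual pixel-level definitions implied by the descriptions above, these metrics can be written as:

\[
\begin{aligned}
\mathrm{IoU} &= \frac{TP}{TP + FP + FN}, &
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN},\\
\mathrm{Precision} &= \frac{TP}{TP + FP}, &
\mathrm{FAR} &= \frac{FP}{TP + FP} = 1 - \mathrm{Precision},\\
\mathrm{Recall} &= \frac{TP}{TP + FN}, &
\mathrm{F1} &= \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\end{aligned}
\]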
Thus, each metric contributes to evaluating the model’s fog detection performance and prediction accuracy from multiple criteria and was used to analyze the overall segmentation performance quantitatively.
3. Results
The hyperparameters optimized by Optuna, listed in Table 4, were selected as the final configuration for each model. The test dataset was evaluated with the optimized models, and the evaluation metrics for sea fog detection are summarized in Table 5. Although all models achieved over 98% Accuracy, this is largely because background pixels overwhelmingly dominate the images; metrics such as IoU, Precision, and Recall are therefore better suited for comparing model performance. Swin Transformer achieved the highest performance with an IoU of 77.242, followed closely by Mask2Former (76.118) and SegNeXt (76.110). In contrast, SegFormer (72.049) and OCRNet (71.194) had relatively low IoU values. In terms of Precision, Swin Transformer (82.796) achieved the highest score, indicating high confidence in the predicted fog areas. For Recall, SegNeXt (93.616) was the highest, demonstrating strength in not missing actual fog pixels. The F1-score, which balances the two metrics, was highest for Swin Transformer (87.160), with Mask2Former (86.440) and SegNeXt (86.435) showing very close performance, confirming these as particularly effective models for sea fog detection.
Figure 7 provides an overview of the FCC and labeled images for the test dataset, along with the inference results from each detection model, thereby complementing the quantitative comparison in Table 5 with qualitative FP/FN patterns across architectures. The segmentation maps for the model results are black for TN, white for TP, orange for FP, and red for FN. Overall, all models showed high rates of TP and TN. Notably, the Transformer-based Swin Transformer and Mask2Former, along with the CNN-based SegNeXt, recorded the highest TP and TN rates and the lowest FP and FN rates, suggesting excellent fog detection performance. In contrast, OCRNet exhibited lower FP but somewhat higher FN, while SegFormer showed lower FN but slightly higher FP. Overall, Transformer-based models, particularly Swin Transformer, demonstrated superior performance compared to CNN-based models in precisely detecting complex sea fog boundaries, achieving high IoU and F1-scores.
Table 6 shows the evaluation metrics for the Swin Transformer model across the test dataset slots. Slots containing no fog pixels or extremely rare fog pixels were excluded from evaluation. S007 (Korean Peninsula) and S010 (Southern Bohai Sea and Eastern China) showed the highest performance, with IoUs of 92.114 and 89.658 and F1-scores of 95.895 and 94.547, respectively; Precisions of 94.612 and 92.970 and Recalls of 97.214 and 96.179 also indicate balanced performance. S004 (Central-Southern East Sea and East Coast) also showed excellent results, with an IoU of 81.899 and an F1-score of 90.049, while S005 (Northeast East Sea and Northwest Japan) demonstrated strength in minimizing fog omission, with a Recall of 95.681. Conversely, S001 (Southern Japanese Waters) and S002 (Eastern Japanese Waters) had low Precisions of 49.439 and 58.331, resulting in IoU levels of 49.050 and 51.746. Although Recall was high at 98.423 and 82.090 for these two regions, false detections increased due to the combination of complex backgrounds and low-contrast features such as clouds, sea surface reflections, and haze. In addition, these slots contained relatively fewer fog cases in the training data, which may have limited the model's ability to learn region-specific fog–background characteristics and contributed to the reduced precision. S008 (Northern East Sea and Southern Primorsky Krai) showed a high Precision of 94.658 but a low Recall of 68.812, indicating a tendency toward increased omissions. In summary, the model operated stably in areas with frequent and distinct fog patterns, such as S007, S010, and S004, whereas in areas with complex backgrounds and relatively fewer fog cases, such as S001 and S002, an increase in false positives or false negatives was observed. These waters are strongly influenced by warm currents, such as the Kuroshio Current, resulting in relatively high sea surface temperatures. Consequently, the fog there may consist of a mixture of radiation fog or steam fog, which arises from the condensation of warm, moist air itself, rather than the typical advection fog formed by warm air cooling over a cold sea surface. Radiation and steam fogs often have a small optical thickness, making them difficult for satellite sensors to capture. Furthermore, the high frequency of mid- and high-level clouds in these areas, due to the passage of low-pressure systems or the influence of seasonal rain fronts, makes it challenging to distinguish fog from clouds.
4. Discussion
4.1. Comparisons with Currently Operational Products
Figure 8 compares the officially operational AMI Fog and GOCI-II Marine Fog (MF) products with the results predicted by the Swin Transformer model. Overall, the AMI Fog product showed a tendency toward underestimation, particularly a significant tendency to exclude sea fog in areas associated with shallow or optically thin cloud. In contrast, our deep learning model reduced omission errors by detecting sea fog signals in such conditions when fog features remained distinguishable in the FCC imagery and the low-level CTH, leveraging the spatial–spectral context of surrounding pixels. It should be noted that fog fully obscured by optically thick cloud layers remains undetectable using optical imagery alone. Meanwhile, the GOCI-II MF product exhibited a strong tendency toward overestimation, with instances in which it misidentified areas of high-level clouds (shown in pink on the AMI FCC) as sea fog. In contrast, our model demonstrated relatively stable and balanced detection performance compared to the existing operational outputs, suggesting that it effectively mitigated both under- and overestimation by comprehensively learning diverse spectral and spatial patterns.
4.2. Advantages from Multi-Sensor Image Fusion
Table 7 and Figure 9 show the results comparing the single-satellite and multi-satellite models. The multi-satellite model using AMI and GOCI-II showed slightly improved performance compared to the single-satellite model, with an IoU of 77.242 and an F1-score of 87.160. However, while the multi-satellite model had a somewhat lower Precision of 82.796 compared to the standalone AMI model's 84.130, Recall significantly improved from 88.782 to 92.009. This indicates that the multi-satellite model experienced an increase in false positives but a reduction in missed fog pixels. In the fog detection problem, fog pixels are not only a minority class but also areas that pose potential safety risks across various fields, such as transportation and navigation. Therefore, the importance of Recall, which ensures the detection of all actual fog areas without omission, is particularly significant. Securing high Recall over Accuracy or Precision is crucial for reliably detecting sea fog from the perspective of preventing risks before they occur. These results demonstrate that complementary information from different sensors, provided by AMI and GOCI-II, effectively captures the overall pattern, improving not only IoU and F1-score but also Recall, a core metric for fog detection. Future research should refine the model by enhancing multi-sensor fusion strategies and expanding the dataset to minimize false detections while reducing fog detection omissions.
4.3. An In-Depth Case Study
To validate the actual detection performance, two distinct fog-inflow cases were selected for an in-depth analysis of the Swin Transformer model's inference results (Figure 10). The first was a widespread advection fog case observed on 5 July 2024, along the West Coast, Jeju Island, and Busan. The second was a localized fog case declared near Daesan Port on 9 August.
In the 5 July case, the Swin Transformer model successfully detected widespread fog over the West Coast and waters off Jeju Island. Marine weather observation buoy data from multiple points showed air temperatures higher than sea surface temperatures and humidity exceeding 90%, precisely matching the typical formation conditions for advection fog: warm, moist air cooling as it passes over the cold sea surface. This pattern of dense, uniformly developed advection fog across such a broad area was also clearly visible in the AMI and GOCI-II FCC imagery, as indicated by the location labels in Figure 10. This uniformity was a key factor enabling the model to extract and detect its features reliably. However, sea fog near Busan on the same day was not detected. Analysis of the satellite imagery revealed that the sea fog around Busan was very sparse and embedded in a complex cloud structure, with low-level and high-level clouds mixed. This low contrast and complex background likely made it difficult for the model to clearly distinguish the unique signal of sea fog. This case demonstrates that the distribution of sea fog and its interaction with surrounding clouds can be critical variables affecting the detection performance of deep learning models.
For the 9 August Daesan Port case, the Swin Transformer model showed highly impressive performance. Although the sea fog in that area was too faint to identify in a true-color image, the model successfully detected it. This demonstrates the model's high sensitivity, capable of detecting subtle spectral changes beyond human visual perception, because we used six channels, including visible, shortwave infrared, and thermal infrared bands from AMI and GOCI-II. Buoy data at the time suggested potential mixing of air masses with locally distinct characteristics, which could be associated with conditions for fog or light vapor fog that can occur in bay terrain. The model appears to have effectively detected this weak-signature sea fog by utilizing specific spectral band information.
Overall, the Swin Transformer model demonstrated strong reliability in detecting widespread, dense advection fog patterns. Conversely, performance could be limited when thin advection fog is mixed with complex cloud structures, as seen in the Busan case. The Daesan Port case, in turn, demonstrated high sensitivity, exceeding human observational capabilities, for faint, localized sea fog that is difficult to identify visually. These results suggest that deep learning-based sea fog detection models may exhibit varying performance depending on the physical causes of fog formation and its visual characteristics (concentration, morphology, and surrounding environment). Specifically, the key factor determining performance is the model's ability to learn the unique spectral and morphological features of each fog type. Therefore, future research should focus on introducing data augmentation strategies tailored to the characteristics of each fog type and on constructing detailed datasets that cover diverse cases. This will be crucial for improving generalization, enabling the model to ensure robust performance even in light fog or complex meteorological conditions.
4.4. Limitations and Future Directions
The results according to slots showed that not only model architecture but also regional and meteorological factors significantly influence detection performance. Slots S007 (Korean Peninsula) and S010 (Southern Bohai Sea/Eastern China) exhibited frequent fog occurrences with distinct patterns, resulting in a high IoU and F1-score exceeding 90. Conversely, slots S001, S002, and S008 showed significantly degraded detection performance due to complex background factors like clouds, low-level clouds, and sea surface reflections. These results suggest that environmental conditions in the observation areas directly influence detection performance differences, extending beyond mere limitations of the model architecture. Particularly in some slots, the insufficient number of sea fog cases may have prevented regional characteristics from being adequately reflected during training and validation. Therefore, future research should construct datasets with sufficient sea fog cases per slot and establish environments that enable balanced learning and evaluation of regional meteorological and oceanic characteristics. In particular, sampling strategies informed by climatology or fog distribution should be explored to reduce regional sampling bias, including stratified sampling by slot and targeted data collection for underrepresented regions such as S001 and S002. In addition, an independent labeling review by human experts, such as operational forecasters or ocean and meteorology specialists, and a quantitative inter-annotator agreement analysis should be conducted to further validate the reliability of the reference labels.
Several directions can be proposed to address these limitations in future research. First, dataset augmentation and expansion are needed to mitigate the spatiotemporal bias and imbalance in fog data. Acquiring additional fog cases across various seasons and regions, combined with the application of image augmentation techniques, could enhance the model’s generalization performance. Additionally, using generative models like Generative Adversarial Network (GAN) or diffusion models to artificially generate fog patterns that are difficult to obtain from actual observations could be considered to ensure diversity in the training data. Furthermore, designing a hybrid network architecture that combines CNNs’ local feature extraction with Transformers’ global representation learning is expected to enable more sophisticated detection of complex, ambiguously bounded fog areas. Finally, adding a post-processing module that uses auxiliary meteorological variables, such as Cloud Top Height (CTH) and relative humidity, could account for physical constraints, reduce false detections, and enhance detection reliability.
Figure 11 shows the input image, label image, model prediction result, and AMI CTH image for the case in which the Swin Transformer recorded the lowest IoU across the entire test set. The AMI CTH image visualizes cloud top height: pixels at or above 3 km altitude, or without clouds, are rendered completely black, while values between 0 and 3 km are gradient-shaded from white to black. In this case, FPs were the predominant error type: areas predicted by the model as sea fog were labeled as non-sea fog. Examining the AMI CTH for this region reveals cloud top heights below 3 km, with a mixture of dark and light gray tones. Furthermore, the AMI FCC imagery also lacks a homogeneous texture, making it difficult to attribute this to annotation errors and suggesting a genuine mixture of low-level clouds and sea fog. Therefore, in such complex areas, the model is interpreted as making a slight misclassification. In addition, the GOCI-II FCC imagery contained numerous no-data regions produced during registration, resulting in data discontinuities. This data incompleteness is also considered to have contributed to the model's misclassification. To address these issues, appropriate preprocessing and correction techniques for no-data regions must be introduced. Furthermore, model improvements that leverage additional feature information, such as cloud height and texture, are required to better distinguish subtle differences between low-level clouds and sea fog. Future efforts should include re-examining annotations and conducting additional case studies to develop strategies for more precisely enhancing model performance in complex areas.
5. Conclusions
This study combined complementary spectral information from the geostationary satellite sensors GK2A AMI and GK2B GOCI-II for daytime sea fog detection. The spatial misalignment between the two types of imagery was precisely corrected through deep learning-based co-registration (RoMa), and state-of-the-art semantic segmentation models were then trained and evaluated using a 6-channel fusion input. Swin Transformer, Mask2Former, and SegNeXt demonstrated balanced, excellent performance across overall metrics such as IoU and F1-score. Notably, multi-satellite fusion significantly improved Recall compared to AMI alone, mitigating the risk of missed detections of hazardous fog. These results indicate that combining sensors with differing spectral and spatiotemporal characteristics contributes to enhanced sensitivity in sea fog segmentation.
Additional analysis revealed performance variations across regions and time slots. Areas adjacent to the Korean Peninsula and the East Sea supported the model's effectiveness, with high IoU and F1-scores, while performance in some regions was limited by insufficient data and complex backgrounds. Furthermore, factors such as low-level cloud overlap, low contrast, and no-data regions introduced during the registration process could cause false positives and false negatives, while class imbalance limited the interpretability of the Accuracy metric. This indicates that environmental context and data distribution directly affect model performance, and that the quality of integration and preprocessing determines actual operational capability.
Future work should focus on hybrid designs that combine the Transformer's global context learning with the CNN's local detail extraction; on constructing datasets with expanded sea fog cases; on correcting registration errors and no-data regions; on integrating auxiliary features such as CTH and texture; and on incorporating GAN-based augmentation for different fog types (e.g., advection and steam fog). The methodology presented in this study, encompassing the entire process of multi-satellite fusion, alignment, and segmentation, is expected to substantially enhance fog detection systems and improve maritime and aviation operational safety.