Article

Deep Learning-Based Contrail Segmentation in Thermal Infrared Satellite Cloud Images via Frequency-Domain Enhancement

by Shenhao Shi, Juncheng Wu, Kaixuan Yao and Qingxiang Meng *
School of Remote Sensing Information Engineering, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(18), 3145; https://doi.org/10.3390/rs17183145
Submission received: 29 July 2025 / Revised: 3 September 2025 / Accepted: 9 September 2025 / Published: 10 September 2025

Highlights

What are the main findings?
  • MFcontrail is proposed, a deep learning model integrating multi-axis attention and frequency-domain enhancement for contrail segmentation in thermal infrared satellite images.
  • MFcontrail outperforms state-of-the-art methods (e.g., DeepLabV3+), achieving +5.03% F1-score on OpenContrails and +3.43% F1-score on Landsat-8 datasets.
  • Frequency-domain enhancement contributes 69.4% of IoU improvement, effectively preserving fine contrail edge details.
What is the implication of the main finding?
  • Provides a high-precision tool for contrail segmentation, supporting aviation climate research that relies on accurate thermal infrared satellite image analysis.
  • Validates the effectiveness of frequency-domain strategies in satellite cloud analysis, offering insights for similar fine-grained remote sensing segmentation tasks.

Abstract

Aviation contrails significantly impact climate via radiative forcing, but their segmentation in thermal infrared satellite images is challenged by thin-layer structures, blurry edges, and cirrus cloud interference. We propose MFcontrail, a deep learning model integrating multi-axis attention and frequency-domain enhancement for precise contrail segmentation. It uses a MaxViT encoder to capture long-range spatial features, a FreqFusion decoder to preserve high-frequency edge details, and an edge-aware loss to refine boundary accuracy. Evaluations on OpenContrails and Landsat-8 datasets show that MFcontrail outperforms state-of-the-art methods: compared with DeepLabV3+, it achieves a 5.03% higher F1-score and 5.91% higher IoU on OpenContrails, with 3.43% F1-score and 4.07% IoU gains on Landsat-8. Ablation studies confirm the effectiveness of frequency-domain enhancement (contributing 69.4% of IoU improvement) and other key components. This work provides a high-precision tool for aviation climate research, highlighting frequency-domain strategies’ value in satellite cloud image analysis.

1. Introduction

Contrails, formed by the condensation of water vapor emitted by aircraft in ice-supersaturated regions (ISSRs), have significant climatic impacts due to their radiative forcing effects [1,2]. Persistent contrails can evolve into cirrus-like clouds that trap long-wave radiation, especially at night, making them a non-CO2 factor in aviation climate impact studies [3]. Recent studies indicate that the global annual mean radiative forcing (RF) from contrail cirrus in 2018 and 2019, estimated between 61 and 72 mW m−2 [4,5,6], is approximately twice the RF from cumulative aviation CO2 emissions (34.3 [31, 38] mW m−2, 95% confidence interval) [7]. Because the contrail lifecycle and spatial distribution offer high potential for intervention, optimizing flight trajectories to avoid ISSRs can reduce the climate impact of contrails by nearly 50% with only a 4.5% increase in fuel cost, making it one of the most cost-effective climate mitigation strategies [8]. Additionally, contrails can reveal aircraft positions, quantities, and motions, which is crucial for aircraft tracking [9,10]. Therefore, accurate observation of contrails is vital for environmental monitoring, aviation safety, national security, and air traffic management.
Contrail detection employs various techniques. Ground-based observatories use radiometers or cameras for localized monitoring and provide highly accurate data [11,12] but have limited coverage, making global monitoring difficult. Aerial surveys acquire detailed features through aircraft-mounted sensors but are costly and similarly limited in coverage. Visible (VIS) or near-infrared (NIR) sensors in satellite remote sensing can be used for contrail detection, but they perform poorly at night or under cloud-covered conditions [13]. Thermal infrared (TIR) remote sensing, with its high sensitivity to thermal radiation and all-weather monitoring capability, has become an effective tool for contrail detection [14]. Unlike visible light remote sensing, TIR technology can operate continuously in low-light conditions by measuring temperature differences based on the thermal radiation characteristics of objects, making it suitable for capturing transient atmospheric phenomena like contrails.
In the late 20th century, researchers began exploring methods for detecting contrails using satellite data, with early approaches relying on manual analysis of satellite cloud imagery and identifying contrails through visual inspection [15,16]. However, manual identification was time-consuming and constrained by weather conditions, being effective only under sparse cloud cover or clear skies. To overcome these limitations, researchers shifted toward automated detection techniques, with initial methods primarily using brightness temperature difference analysis and the Hough transform to detect contrails by identifying their linear features [17,18,19]. Although these methods improved efficiency, they often resulted in high false-detection rates, as cirrus clouds exhibit similar linear features.
However, the spectral similarity between contrails and natural cirrus clouds, along with the pixel-level thin-layer characteristics and the dynamic diffusion behavior of contrails, poses inherent challenges for satellite remote sensing in balancing spatial resolution and radiometric sensitivity. Thermal infrared sensors, such as the Advanced Baseline Imager (ABI) on GOES-16, face multiple challenges in contrail detection. The ABI’s 2 km spatial resolution in infrared bands limits its ability to resolve small or newly formed contrails, particularly when these are close to other cloud structures [20]. Additionally, non-nadir viewing angles introduce parallax effects, potentially distorting the positions and shapes of contrails, especially over mountainous regions [21]. Precise calibration is critical for distinguishing contrails from background clouds and surfaces, yet the ABI’s radiometric calibration accuracy is ±0.05 K for most infrared bands, with larger deviations of up to ±0.13 K in certain bands (e.g., Ch16, 13.3 μm) [22]. Furthermore, spectral mismatches between the ABI and other sensors may lead to inconsistencies in brightness temperature measurements, thereby impacting the reliability of contrail detection algorithms [23]. These challenges underscore the need for advanced computational methods that mitigate these limitations through sophisticated feature extraction and segmentation techniques.
Deep learning methods, particularly convolutional neural networks (CNNs), enabled end-to-end segmentation. Zhang et al. applied CNNs for contrail detection [24], while Ng et al. developed the OpenContrails dataset and a CNN-based model using GOES-16 ABI data [20]. Yu et al. proposed a Multi-axis Vision Transformer for contrail classification using SDGSAT-1 data [25]. Beyond contrail-specific applications, transformer-based models, such as the Swin Transformer, have shown promise in remote sensing tasks like cloud segmentation and land cover classification due to their ability to capture multi-scale contextual features [26,27,28].
However, existing frameworks face bottlenecks: single-scale convolutional kernels struggle to couple high-frequency edge features (areas of concentrated wavelet energy) with low-frequency texture information; standard cross-entropy loss functions respond inadequately to pixel-level edge gradients, causing blurred segmentation masks and boundary softening. Inspired by Chen et al.’s theory on frequency-domain feature decoupling [29], we propose a hybrid segmentation framework, MFcontrail, that combines multi-axis visual transformers with frequency attention to address the issues in contrail segmentation tasks. The key contributions are as follows:
  • A MaxViT/U-Net hybrid segmentation architecture: the MaxViT backbone combines multi-axis block/grid attention with convolution for synergistic feature extraction, capturing global context while preserving local receptive fields, while U-Net skip connections fuse features across scales.
  • An edge-aware composite loss function: This joint boundary-sensitive loss enhances gradient direction consistency and local phase alignment, improving pixel-level edge localization accuracy.
  • A frequency analysis-based feature fusion framework, FreqFusion, is introduced, which addresses the loss of high-frequency details in boundary regions caused by traditional upsampling techniques (e.g., bilinear interpolation) and enhances pixel-level edge localization accuracy.

2. Datasets

Two open-source datasets were used to evaluate the proposed MFcontrail model: the Landsat-8 contrails dataset [30] and the OpenContrails dataset [20]. Both datasets provide human-labeled annotations of contrails in satellite imagery, enabling rigorous validation of segmentation performance across different spatial scales.

2.1. Landsat-8 Contrails Dataset

The Landsat-8 contrails dataset comprises 4289 human-labeled scenes (47% of which contain contrails), with annotations requiring over 950 person-hours. Scenes were primarily selected from 2018 imagery within the GOES-16 satellite’s view, prioritizing high contrail likelihood (e.g., via Mannstein algorithm screening or advected flight tracks) and supplemented with 20% random samples to mitigate selection bias.
Acquired by the Landsat-8 satellite—a sun-synchronous orbiter with a 16-day revisit cycle—the imagery offers 30 m resolution for visible/near-infrared bands and 100 m resolution for thermal infrared/cirrus bands, enabling detailed detection of young contrails. Labelers used false-color composites (integrating a 12 µm–11 µm brightness temperature difference, 1.37 µm cirrus reflectance, and 12 µm brightness temperature) with 2–4 annotators per scene; final labels reflect majority agreement.
Landsat-8’s operational strategy prioritizes daytime acquisitions, focusing on capturing day-lit terrestrial imagery. When operating at its daily limit of 725 images, it successfully acquires nearly all day-lit descending land scenes—particularly mid-latitude continental areas, where acquisition success rates exceed 99% [31]. While the satellite retains limited nighttime imaging capability (e.g., for monitoring active volcanoes), such observations constitute a minor portion of its overall data collection, leaving it unable to provide consistent nighttime monitoring.

2.2. OpenContrails Dataset

The OpenContrails dataset includes 20,544 training and 1866 validation samples (9283 with contrails, 1.2% of training pixels labeled), spanning April 2019–April 2020 between −50° and 50° latitude and −135° and −30° longitude. Sampling prioritized contrail-rich scenes (e.g., retaining 5% without flight tracks and 20% with <10 tracks) and included Street View-derived training scenes.
From GOES-16 (geostationary orbit, 10-min revisit), imagery has a 2 km thermal infrared resolution, facilitating temporal tracking. Labelers annotated 256 × 256 UTM-projected patches using an “ash” false-color scheme (12 μm, 12 μm–11 μm, and 11 μm–8 μm) with four annotators per patch; pixels are labeled as contrails with ≥3 agreements. Temporal context (five prior/two subsequent frames) and advected flight density aided labeling.
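The ≥3-of-4 agreement rule maps directly to a per-pixel majority vote. The following minimal NumPy sketch illustrates it on randomly generated stand-in masks (the array names are hypothetical):

```python
import numpy as np

# Four hypothetical annotator masks for one 256 x 256 patch (1 = contrail).
annotator_masks = np.random.randint(0, 2, size=(4, 256, 256), dtype=np.uint8)

# A pixel is labeled as contrail only when at least 3 of 4 annotators agree.
label = (annotator_masks.sum(axis=0) >= 3).astype(np.uint8)
```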

2.3. Model Validation Method

In this study, validation denotes the assessment of consistency between the model’s segmentation outputs and image annotations derived from satellite imagery ground truth, as annotated by human experts (utilizing the Landsat-8 and OpenContrails datasets). This process corresponds to ‘image-level relative validation’. Verification of the absolute physical accuracy of contrail boundaries necessitates cross-modal investigations employing non-imaging sensors, such as lidar, which represents a potential avenue for future research.

3. Methods

The MFcontrail segmentation model, as illustrated in Figure 1, adopts a U-Net-like structure. The skip connections characteristic of U-Net effectively mitigate edge detail loss caused by pooling operations by concatenating shallow high-resolution features with deep semantic features [32]. In this work, MaxViT is used as the encoder, and the decoder employs FreqFusion instead of simple upsampling, following the skip connection mechanism of U-Net. The decoder’s input includes features from the previous decoding unit and high-resolution shallow features from the encoder’s skip connections, which are crucial for precise segmentation of pixel-level contrail thin-layer structures. Finally, we also designed an edge-aware composite loss function.

3.1. Color Projection

For the OpenContrails dataset, we adopted the “ash-style” color projection method tailored for thermal infrared (TIR) imagery, as originally used by Kulik [33]. The calculation is given by Equation (1).
$$\mathrm{RGB} = 255\left(\frac{T_{\mathrm{RGB}}\ \mathrm{or}\ \Delta T_{\mathrm{RGB}} - T_{\min}}{T_{\max} - T_{\min}}\right)^{1/\gamma} \quad (1)$$
The brightness temperature values for each color channel are detailed in Table 1. As shown in Figure 2, the ash-style color projection not only preserves the brightness temperature difference between cloud surfaces and other surfaces, but also highlights ice clouds, which are one of the characteristic features of contrails.
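As a concrete reading of Equation (1), the following NumPy sketch builds an ash-style composite from brightness temperatures. The helper names are ours, and the per-channel bounds shown are illustrative placeholders standing in for the Table 1 values:

```python
import numpy as np

def normalize(data, t_min, t_max, gamma=1.0):
    # Equation (1): clip to [t_min, t_max], scale to [0, 255], optional gamma stretch.
    scaled = np.clip((data - t_min) / (t_max - t_min), 0.0, 1.0)
    return 255.0 * scaled ** (1.0 / gamma)

def ash_rgb(t12, t11, t08):
    # Ash-style composite from 12 um, 11 um, and 8 um brightness temperatures (K).
    # The bounds below are illustrative placeholders for the Table 1 values.
    r = normalize(t12 - t11, -4.0, 2.0)   # split-window BTD highlights ice clouds
    g = normalize(t11 - t08, -4.0, 5.0)   # cloud-phase BTD
    b = normalize(t11, 243.0, 303.0)      # 11 um brightness temperature
    return np.stack([r, g, b], axis=-1).astype(np.uint8)
```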
The Landsat-8 false-color projection enhances contrail visibility by integrating thermal infrared (TIR) and cirrus bands. The method processes calibrated radiances from Bands 10 (11 μm), 11 (12 μm), and 9 (1.37 μm) to construct an RGB composite optimized for ice cloud detection (Table 2).
Each channel emphasizes distinct contrail characteristics as follows:
1. Red Channel: Encodes the negative 11 μm–12 μm brightness temperature difference ($-\Delta T$), normalized to $[-5.5, 1]$. Cold contrails (negative $\Delta T$) appear dark.
2. Green Channel: Leverages the 1.37 μm cirrus band reflectance ($\rho_{1.37\,\mu\mathrm{m}}$) for daytime scenes, using $1 - \rho_{1.37\,\mu\mathrm{m}}$ (normalized to $[0.8, 1]$) to highlight ice particles. For nighttime, where the 1.37 μm band lacks solar illumination, this channel is replaced with zeros to avoid noise artifacts.
3. Blue Channel: Represents the 12 μm brightness temperature ($T_{12\,\mu\mathrm{m}}$), normalized to $[283, 303]$ K (day) or $[243, 303]$ K (night), with contrails appearing as cold (dark) features.
Channels are linearly scaled to $[0, 1]$ and stacked into an RGB image, with missing values masked. This approach effectively differentiates contrails from background clouds through thermal and reflectance contrasts while adapting dynamically to diurnal variations in data availability.
This projection aligns with Landsat-8’s 30 m spatial resolution, leveraging band-specific sensitivities to accentuate contrail signatures. The dynamic handling of nighttime data—replacing non-viable cirrus reflectance with zeros—ensures consistent performance across diurnal cycles, complementing the OpenContrails methodology while adapting to distinct sensor characteristics.
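Under the channel definitions above, the projection can be sketched as follows; the function and argument names are ours, and mask handling is omitted for brevity:

```python
import numpy as np

def scale(x, lo, hi):
    # Linear scaling to [0, 1] with clipping.
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

def landsat_false_color(t11, t12, rho_cirrus=None, daytime=True):
    # t11, t12: 11 um / 12 um brightness temperatures (K);
    # rho_cirrus: 1.37 um reflectance (daytime scenes only).
    r = scale(-(t11 - t12), -5.5, 1.0)         # negative split-window BTD
    if daytime and rho_cirrus is not None:
        g = scale(1.0 - rho_cirrus, 0.8, 1.0)  # ice particles appear bright
        b = scale(t12, 283.0, 303.0)           # daytime temperature window
    else:
        g = np.zeros_like(t12)                 # no solar signal at night
        b = scale(t12, 243.0, 303.0)           # nighttime temperature window
    return np.stack([r, g, b], axis=-1)
```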

3.2. MaxViT Encoder

Contrails often exhibit elongated, continuous, and large-span image characteristics, which are difficult to model with single convolutional kernels. Therefore, we replace traditional convolutions with the Multi-axis Vision Transformer (MaxViT) to enhance the modeling of long-range spatial dependencies of contrails. MaxViT combines local window attention and global grid attention, enabling the multi-scale feature extraction critical for delicate contrails. The model first uses convolutional layers for feature extraction, followed by MaxViT blocks that refine feature representation using depthwise separable convolutions and self-attention [34]. The encoder outputs features from the stem convolution and MaxViT stages 1–3, with channel counts increasing and spatial dimensions halving at each stage; these multi-scale features feed the skip connections.
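For illustration, the multi-scale outputs described above can be obtained from a MaxViT backbone via the timm library as sketched below; the exact model variant and extraction settings are our assumptions, not necessarily the configuration used in this work:

```python
import timm
import torch

# Pyramid feature extraction with a MaxViT backbone (illustrative variant).
encoder = timm.create_model(
    "maxvit_small_tf_224",
    pretrained=False,      # set True to load ImageNet weights
    features_only=True,    # return multi-scale features for skip connections
)

x = torch.randn(1, 3, 224, 224)
for feat in encoder(x):
    # Channel count grows while spatial resolution halves from stage to stage.
    print(feat.shape)
```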

3.3. FreqFusion Decoder

As shown in Figure 1b, FreqFusion is used instead of upsampling for feature fusion between low- and high-resolution features. For low-resolution feature inputs, a 1 × 1 convolution–batchnorm–ReLU module (CBR) is first used to align the feature dimensions, squeezing the low-resolution feature dimensions to match those of the high-resolution features. Then, through the FreqFusion module based on frequency analysis, high-frequency edge-preserving upsampling is achieved to obtain a resolution consistent with the high-resolution features. Consistent with U-Net, these two features are concatenated, passed through two CBR modules, and Spatial and Channel Squeeze and Excitation (SCSE) is added to implement attention mechanisms in the decoding stage, enhancing meaningful features and suppressing irrelevant ones [35].
1. FreqFusion Module: As shown in Figure 1c, we use the FreqFusion module proposed in Chen et al.’s work [29]. Siamese 1 × 1 convolutions are applied to the two resolution inputs. The adaptive low-pass filter generator dynamically generates low-pass filters, avoiding boundary blurring caused by simple interpolation and better preserving high-frequency information, thereby improving feature consistency and boundary clarity. The adaptive high-pass filter generator enhances high-frequency components to recover boundary details lost in low-level features, combining low-level features with high-level semantics to significantly improve resolution in boundary regions. The offset generator calculates pixel similarity to generate offsets for resampling inconsistent regions, effectively reducing boundary blurriness and enhancing the model’s sensitivity to boundary information.
2. SCSE Module: As shown in Figure 1d, SCSE simultaneously recalibrates input features in the spatial and channel dimensions. Element-wise addition of the channel and spatial excitations yields concurrent Spatial and Channel Squeeze and Excitation (SE). When input feature maps gain high importance from both channel rescaling and spatial rescaling, they are assigned higher activation values. This recalibration encourages the network to learn feature maps that are meaningful in both the spatial and channel dimensions (see the sketch below).
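The SCSE recalibration is compact enough to show directly. A minimal PyTorch sketch following Roy et al. [35] is given below; the reduction ratio of 16 is a common default rather than a value reported here:

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    # Concurrent spatial and channel squeeze and excitation.
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel excitation: global pooling -> bottleneck MLP -> sigmoid gate.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial excitation: 1x1 convolution -> per-pixel sigmoid gate.
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        # Element-wise addition of the two recalibrated maps, as described above.
        return x * self.cse(x) + x * self.sse(x)
```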

3.4. Edge-Aware Loss

Accurate edge detection in contrail segmentation plays a pivotal role in remote sensing applications, particularly for assessing the environmental impact of aviation. Contrails, formed by water vapor condensation from aircraft engines, are elongated linear structures that often overlap with natural cirrus clouds, complicating their identification in satellite imagery. Misidentified edges can lead to significant errors in estimating contrail coverage, which is critical for understanding their contribution to radiative forcing and global warming. Traditional segmentation methods relying solely on pixel-wise losses, such as cross-entropy, often fail to capture the fine linear features of contrails due to their emphasis on global accuracy rather than local edge preservation. To address this challenge, we propose an edge-aware loss function that leverages gradient-domain matching, aiming to enhance the model’s ability to delineate contrail boundaries with high precision while distinguishing them from surrounding cloud formations.

3.4.1. Edge Loss Formulation

The core of our approach lies in the design of the edge-aware loss L edge , which prioritizes the preservation of contrail edge details, as evident in Figure 3. This loss is defined as follows:
$$L_{\text{edge}} = \frac{1}{B}\sum_{b=1}^{B}\frac{\left\|\nabla Y^{(b)} - \nabla X^{(b)}\right\|_F^2}{\left\|\nabla X^{(b)}\right\|_F + \epsilon} \quad (2)$$
Here, $\nabla: \mathbb{R}^{W\times H} \to \mathbb{R}^{W\times H\times 2}$ represents the Sobel operator, a widely used first-order gradient operator that computes both horizontal and vertical gradients to capture edge orientations effectively. The prediction output $Y^{(b)} = Y_{b,1,:,:} \in [0,1]^{W\times H}$ denotes the contrail probability map for the $b$-th sample in a batch, while $X^{(b)} \in \{0,1\}^{W\times H}$ is the corresponding binary ground truth annotation. The Frobenius norm $\|\cdot\|_F$, defined as the square root of the sum of squared matrix elements, measures the magnitude of the gradient difference between the prediction and the ground truth. To ensure numerical stability, especially in regions where $\|\nabla X^{(b)}\|_F$ approaches zero, we introduce a small constant $\epsilon = 10^{-8}$. This normalization step prevents division by zero and stabilizes the loss computation during training. The choice of the Sobel operator over alternatives like the Canny edge detector stems from its computational efficiency and robustness in handling the linear structures typical of contrails, although future work could explore more advanced gradient operators for noisy datasets.
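A minimal PyTorch sketch of this loss is given below, using the standard 3 × 3 Sobel kernels; the batch-first tensor layouts are assumptions:

```python
import torch
import torch.nn.functional as F

# Standard 3 x 3 Sobel kernels (the gradient operator in Equation (2)).
_SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
_SOBEL_Y = _SOBEL_X.t()

def sobel(x):
    # x: (B, 1, H, W) -> (B, 2, H, W) horizontal/vertical gradient maps.
    kernel = torch.stack([_SOBEL_X, _SOBEL_Y]).unsqueeze(1).to(x)
    return F.conv2d(x, kernel, padding=1)

def edge_loss(pred, target, eps=1e-8):
    # pred: (B, 1, H, W) contrail probabilities; target: (B, 1, H, W) binary mask.
    grad_pred, grad_true = sobel(pred), sobel(target.float())
    num = (grad_pred - grad_true).flatten(1).pow(2).sum(dim=1)  # squared Frobenius norm
    den = grad_true.flatten(1).pow(2).sum(dim=1).sqrt() + eps   # Frobenius norm + eps
    return (num / den).mean()
```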

3.4.2. Combined Loss Function

To maintain a balance between global segmentation accuracy and local edge enhancement, we integrate the edge-aware loss with a standard cross-entropy loss L ce , which ensures the model does not overfit to edge regions at the expense of overall classification performance. The cross-entropy loss is formulated as follows:
$$L_{\text{ce}} = -\frac{1}{BWH}\sum_{b=1}^{B}\sum_{i=1}^{W}\sum_{j=1}^{H} \mathrm{onehot}(x_{i,j}) \cdot \log\big(\mathrm{softmax}(y_{i,j})\big) \quad (3)$$
where $\mathrm{onehot}(x_{i,j})$ converts the ground truth label at position $(i,j)$ into a one-hot-encoded vector and $\mathrm{softmax}(y_{i,j})$ normalizes the predicted probabilities. The final loss function combines these two components:
$$L = \alpha \cdot L_{\text{ce}} + \beta \cdot L_{\text{edge}} \quad (4)$$
The weighting parameters α and β regulate the relative contribution of each loss component to the overall optimization objective. To determine their optimal values, a systematic grid search was performed on a validation subset of the OpenContrails dataset, which comprises 500 annotated remote sensing images. Specifically, the candidate values for α were sampled within the range [0.5, 0.9] at intervals of 0.05, while β was evaluated across [0.8, 1.2] with the same step size. The parameter tuning process adopted F1-score and IoU as primary evaluation metrics to quantify the model’s contrail segmentation performance, ensuring the selected combination balances classification robustness and edge localization accuracy. Through this empirical optimization, the optimal parameters were determined as α = 0.75 and β = 1 .
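Reusing the edge_loss sketch from Section 3.4.1, Equation (4) with the selected weights might be assembled as follows; the two-class logit layout is an assumption:

```python
import torch.nn.functional as F

alpha, beta = 0.75, 1.0  # grid-searched weights

def combined_loss(logits, target):
    # logits: (B, 2, H, W) raw class scores; target: (B, H, W) integer labels.
    ce = F.cross_entropy(logits, target)            # Equation (3)
    contrail_prob = logits.softmax(dim=1)[:, 1:2]   # keep channel dim: (B, 1, H, W)
    return alpha * ce + beta * edge_loss(contrail_prob, target.unsqueeze(1))
```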
To evaluate the proposed MFcontrail model for contrail segmentation, we conducted experiments on the OpenContrails and Landsat-8 datasets. The following sections describe the experimental conditions, data preprocessing, comparisons with classic segmentation networks, and ablation studies to assess the contributions of each component.

4. Experimental Conditions and Settings

Experiments were conducted on a workstation with an NVIDIA RTX 3090 GPU (24 GB memory) and an Intel i5-14600KF CPU, using the PyTorch 2.6 framework. Due to memory constraints, a batch size of 16 was used for all models. The Adam optimizer was employed with betas of [0.9, 0.999] and a weight decay of 0.0001. Training lasted 60 epochs, with a learning rate warmup from 0 to 0.001 over the first 5 epochs, followed by cosine decay to 0.0001 by epoch 40 and 0.00001 by epoch 60. The MFcontrail model was trained with a combined loss function, integrating cross-entropy and edge-aware loss (weights α = 0.75 , β = 1 ), optimized via grid search on a validation subset. These settings ensure stable training and consistent evaluation across models.
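One way to express this schedule in PyTorch is sketched below; the piecewise-cosine form between the quoted anchor points is our reading of the description, and the Linear module is a stand-in for the network:

```python
import math
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 2)  # stand-in for the MFcontrail network

def lr_factor(epoch, base=1e-3):
    # Warmup to 1e-3 over 5 epochs, cosine to 1e-4 by epoch 40, then 1e-5 by epoch 60.
    def cosine(e, e0, e1, lr0, lr1):
        t = (e - e0) / (e1 - e0)
        return lr1 + 0.5 * (lr0 - lr1) * (1 + math.cos(math.pi * t))
    if epoch < 5:
        lr = base * epoch / 5
    elif epoch < 40:
        lr = cosine(epoch, 5, 40, base, 1e-4)
    else:
        lr = cosine(epoch, 40, 60, 1e-4, 1e-5)
    return lr / base  # LambdaLR expects a multiplier of the base learning rate

optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)  # call scheduler.step() each epoch
```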

4.1. Data Preprocessing

We utilized 20,000 images from the OpenContrails dataset [20], randomly selecting 4000 as the validation set and the remaining 16,000 for training to ensure balanced representation. Each image has a resolution of 256 × 256 pixels, capturing contrails under varied atmospheric conditions. Images were preprocessed using the ash-style color projection method [33] to enhance ice cloud features, followed by normalization (mean: [0.485, 0.456, 0.406]; std: [0.229, 0.224, 0.225]). Data augmentation, including random flips, rotations (±15°), and color jitter, was applied to improve robustness.
For the Landsat-8 dataset, we focused on imagery containing at least one contrail, selecting 2166 scenes with an original resolution of 7801 × 7661 pixels. To facilitate model training, these large-scale images were first scaled down by a factor of 10 and resized to 784 × 784 pixels, ensuring the total data volume was comparable to that of the OpenContrails dataset. Preprocessing steps included a Landsat-specific false-color projection tailored to highlight contrail features (combining the 12 μm–11 μm brightness temperature difference, 1.37 μm cirrus reflectance, and 12 μm brightness temperature), followed by the geometric transformations used for OpenContrails (random flips, ±15° rotations, and color jitter). Additionally, random 256 × 256 pixel cropping was introduced during training to enhance the model’s adaptability to local contrail patterns. The dataset was split into training and validation sets at a 4:1 ratio, consistent with the partitioning strategy of the OpenContrails dataset.
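For reference, an image-side torchvision pipeline matching this description might look as follows. The paper does not name its augmentation library or jitter strength, so those details are assumptions; in practice, the geometric transforms must be applied jointly to the image and its mask:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(256),             # local contrail patches (Landsat-8)
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=15),  # rotations within +/- 15 degrees
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```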

4.2. Evaluation Metrics

To ensure a rigorous and comprehensive assessment of the proposed model, we employ four key performance metrics: Intersection over Union (IoU), F1-score, precision, and recall. Let $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively. The metrics are formally defined as follows:
  • Precision measures the model’s ability to avoid false alarms:
    $\mathrm{Precision} = \frac{TP}{TP + FP}$
  • Recall (sensitivity) quantifies the model’s capability to detect all relevant instances:
    $\mathrm{Recall} = \frac{TP}{TP + FN}$
  • F1-score balances precision and recall as their harmonic mean:
    $\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
  • Intersection over Union (IoU) evaluates spatial overlap between predicted and ground truth regions:
    $\mathrm{IoU} = \frac{TP}{TP + FP + FN}$
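These definitions translate directly into code. A minimal NumPy sketch over binary masks (contrail = 1) follows, without guards for empty masks:

```python
import numpy as np

def contrail_metrics(pred, gt):
    # pred, gt: binary masks of equal shape (contrail pixels = 1).
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```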

4.3. Model Comparison

To comprehensively validate the superiority of MFcontrail, we selected six representative segmentation models for comparison, covering diverse architectural paradigms to ensure the generality of the conclusions.
Classical CNN-based models: PSPNet [36] (with pyramid pooling for global context), FPN [37] and UPerNet [38] (feature pyramid networks for multi-scale fusion), and U-Net (skip connections for detail preservation)—these models are widely used in remote sensing image segmentation and serve as baselines for low-level feature extraction.
Transformer-based model: SegFormer [39] (with hierarchical vision transformers)—representative of state-of-the-art methods in general semantic segmentation, used to verify the advantages of the frequency-domain enhancement over pure transformer architectures.
Domain-specific baseline: DeepLabV3+ (ResNet152d-SE) [20]—the current state of the art for contrail segmentation on the OpenContrails dataset, ensuring direct relevance to the task.
MFcontrail was evaluated in two configurations to isolate the impact of key components. MFcontrail (ResNet50) adopts a ResNet50 backbone and cross-entropy loss, strictly aligning with the backbone and loss settings of most comparison models (PSPNet, FPN, UPerNet, and U-Net) to eliminate confounding variables. MFcontrail (Full) integrates the MaxViT-Small backbone (multi-axis attention), FreqFusion decoder (frequency-domain enhancement), and edge-aware loss; this configuration demonstrates the full performance gain from our proposed innovations.
Fairness guarantees: All experiments were strictly controlled for variables that could affect performance. Backbone consistency: except for DeepLabV3+ (ResNet152d-SE) and SegFormer (MiT-B5, per its original design), all models use ResNet50 initialized with ImageNet-pretrained weights. Training protocol uniformity: an identical batch size (16), optimizer (Adam, $\beta_1 = 0.9$, $\beta_2 = 0.999$, weight decay = 0.0001), learning rate schedule (warmup to 0.001 over the first 5 epochs, cosine decay thereafter), and data augmentation (random flips, ±15° rotations, and color jitter) were applied. Evaluation metric consistency: all models were assessed using recall, precision, IoU, and F1-score, computed exclusively on contrail regions (pixels labeled as contrails in the ground truth) to ensure task relevance. Performance comparisons on the OpenContrails and Landsat-8 datasets are summarized in Table 3.

4.4. Ablation Experiments

Ablation studies were conducted to evaluate the contributions of MFcontrail’s key components: the FreqFusion decoder, MaxViT-Small backbone, and edge-aware loss. Starting from the DeepLabV3+ baseline (ResNet152d-SE backbone and cross-entropy loss), we incrementally added each component:
  • Stage 1. Baseline: DeepLabV3+ with a ResNet152d-SE backbone, trained with cross-entropy loss.
  • Stage 2. +FreqFusion Decoder: Replaced standard upsampling with the FreqFusion decoder, incorporating adaptive low- and high-pass filters for frequency-domain feature fusion.
  • Stage 3. +MaxViT-Small Backbone: Substituted the ResNet152d-SE backbone with MaxViT-Small to enhance long-range dependency modeling.
  • Stage 4. +Edge-aware Loss: Added the edge-aware loss, combining cross-entropy with gradient-domain matching (Equation (4)).
Each configuration was trained for 60 epochs using the same hyperparameters as the main experiment. Performance metrics (recall, precision, IoU, and F1-score) were computed on the validation set, and segmentation visualizations were generated to highlight improvements in boundary precision and thin contrail detection.

5. Result Analysis

The validation set metrics for contrail segmentation, presented in Table 3 and Table 4, demonstrate the effectiveness of the MFcontrail model. We analyze four metrics—recall, precision, Intersection over Union (IoU), and F1-score—to assess performance in detecting thin contrails and preserving edge details. The analysis is divided into model comparison, an ablation study, and interpretation of results for aviation climate monitoring.

5.1. Model Comparison Analysis

To comprehensively evaluate the performance and generalizability of the proposed method, we conducted assessments on both the OpenContrails and Landsat-8 contrail datasets. These two datasets exhibit significant differences in spatial resolution and imaging characteristics: the OpenContrails dataset, derived from the GOES-16 satellite, features a thermal infrared resolution of 2 km; in contrast, the Landsat-8 dataset, despite being downsampled by a factor of 10 in our experiments, retains a higher spatial resolution (300 m for visible/near-infrared bands and 1000 m for thermal infrared/cirrus bands after downsampling) than GOES-16.
On the OpenContrails dataset, MFcontrail (ResNet50) outperforms all classical CNN models, achieving 64.34% recall, 76.08% precision, 53.51% IoU, and 69.72% F1-score. The full configuration (MFcontrail (Full)) further elevates performance to 69.26% recall (+4.92% vs. MFcontrail (ResNet50)), 75.04% precision, 56.29% IoU (+2.78%), and 72.03% F1-score (+2.31%), confirming the effectiveness of our innovations. Performance gaps can be attributed to architectural limitations of comparison models: PSPNet (36.79% recall and 30.52% IoU) performs poorly due to its pyramid pooling module, which prioritizes global context over local details. This design is ill-suited for contrails—elongated, sparse structures where high-frequency edge information (e.g., thin trailing edges) is critical for detection. FPN and UPerNet (IoU 48.93% and 49.12%) leverage feature pyramids for multi-scale fusion but lack mechanisms to preserve high-frequency details during upsampling. Their reliance on bilinear interpolation causes boundary blurring, reducing precision in contrail–cirrus overlapping regions. U-Net (IoU 52.46%) benefits from skip connections that retain low-level details, but its decoder uses simple concatenation without frequency-aware fusion. This results in suboptimal integration of high-resolution edges and high-level semantics, explaining its lower IoU compared to MFcontrail (ResNet50). DeepLabV3+ (63.52% recall and 50.38% IoU), while a strong domain baseline, is limited by its atrous spatial pyramid pooling (ASPP) module, which focuses on multi-scale context but fails to enhance high-frequency edge gradients—critical for detecting thin contrails, where MFcontrail (Full) shows a 5.74% recall gain. As shown in Figure 4, on the OpenContrails dataset, MFcontrail demonstrates superior performance over all classical CNN models, with fewer missed detections and false detections.
As shown in Figure 5, the MFcontrail model still maintains significant advantages over the comparison models on the Landsat-8 dataset, exhibiting fewer missed detections of thin contrails and fewer false detections. On the Landsat-8 contrail dataset, MFcontrail demonstrates significant advancements over the domain baseline DeepLabV3+ (ResNet152d-SE), with performance gains primarily attributed to its frequency-domain enhancement and edge-aware design—critical for high-resolution thermal infrared imagery.
DeepLabV3+ achieves 61.94% recall, 77.16% precision, 51.87% IoU, and 68.31% F1-score on this dataset. Its limitations stem from two architectural constraints: (1) the atrous spatial pyramid pooling (ASPP) module, while effective for multi-scale context modeling, exhibits weak responsiveness to high-frequency edge features of thin contrails, which account for 32% of annotated regions in Landsat8 imagery; (2) bilinear interpolation in the decoder stage causes boundary blurring, particularly in overlapping regions of contrails and cirrus clouds (28% of test samples), reducing spatial overlap accuracy (IoU).
MFcontrail (ResNet50) addresses these issues through its FreqFusion decoder, achieving 66.19% recall (+4.25% relative to DeepLabV3+), 75.28% precision, 54.38% IoU (+2.51%), and 70.45% F1-score (+2.14%). The recall improvement is driven by FreqFusion’s adaptive high-pass filters, which preserve edge gradients of thin contrails—reducing missed detections in regions where DeepLabV3+ typically underperforms (e.g., thin contrails, which account for 35% of total contrail length). Although precision decreases slightly (−1.88%), this trade-off is justified by the substantial gain in recall, leading to a more balanced F1-score.
The full MFcontrail configuration further widens the performance gap with DeepLabV3+: recall reaches 67.56% (+5.62% vs. DeepLabV3+), benefiting from the MaxViT-Small backbone’s multi-axis attention (block/grid attention synergy). This design enhances the capture of elongated contrail structures across large Landsat8 scenes (784 × 784 pixels), outperforming DeepLabV3+’s ResNet152d-SE, which lacks fine-grained long-range dependency modeling. IoU increases to 55.94% (+4.07% vs. DeepLabV3+), attributed to the edge-aware loss function. Quantitative analysis shows that the boundary localization error is reduced by 1.2 pixels on average, with misclassification rates in cirrus-overlapping regions decreasing by 11%. F1-score reaches 71.74% (+3.43% vs. DeepLabV3+), confirming the superiority of MFcontrail’s integrated design.
Other comparative models (e.g., PSPNet and SegFormer) exhibit lower performance (IoU ≤ 52.48%), reinforcing that MFcontrail’s gains are not merely incremental but stem from targeted innovations addressing thermal infrared remote sensing characteristics—high resolution, low contrast, and prevalent thin structures. This underscores its competitive advantage over DeepLabV3+ in contrail segmentation tasks.

5.2. Ablation Study Analysis

Table 4 quantifies the contributions of MFcontrail’s components—FreqFusion decoder, MaxViT-Small backbone, and edge-aware loss—by progressively adding them to the DeepLabV3+ baseline (63.52% recall, 70.89% precision, 50.38% IoU, and 67.00% F1-score). Each stage was trained under identical conditions (Section 4), ensuring fair assessment. The total improvements (Stage 4 vs. Stage 1) are 5.74% in recall, 4.15% in precision, 5.91% in IoU, and 5.03% in F1-score, with each component’s contribution analyzed below.
Stage 2 (+FreqFusion Decoder): Adding the FreqFusion decoder increases performance to 64.56% recall, 77.73% precision, 54.48% IoU, and 70.53% F1-score. The 4.10% IoU gain (69.4% of the total 5.91% IoU improvement) and 6.84% precision increase stem from frequency-domain feature fusion, which mitigates boundary blurring during upsampling. Unlike standard U-Net decoders, FreqFusion uses adaptive low- and high-pass filters to enhance high-frequency edge details. This results in more accurate contrail boundaries, as seen in Figure 6d, compared to the baseline (column c). The significant precision gain indicates fewer false positives, particularly in cirrus cloud regions, surpassing the 2.08% IoU improvement of U-Net’s skip connections (Table 3).
Stage 3 (+MaxViT-Small Backbone): Replacing the ResNet152d-SE backbone with MaxViT-Small yields 65.82% recall, 77.55% precision, 55.29% IoU, and 71.20% F1-score. The 0.81% IoU increment (13.7% of total IoU improvement) reflects MaxViT’s multi-axis self-attention, which enhances long-range spatial dependencies critical for detecting elongated contrails. Visualizations in Figure 6e show improved detection of sparse contrails across large image regions compared to Stage 2 (column d). The modest gain suggests that FreqFusion’s feature fusion partially constrains the backbone’s impact, yet the 1.26% recall increase indicates better coverage of contrails.
Stage 4 (+Edge-aware Loss): Incorporating the edge-aware loss achieves the best performance: 69.26% recall, 75.04% precision, 56.29% IoU, and 72.03% F1-score. The 1.00% IoU increment (16.9% of total IoU improvement) and 3.44% recall gain result from gradient-domain matching (Equation (4)), which prioritizes boundary accuracy. Figure 6f highlights enhanced delineation of thin contrails and precise boundaries in complex cloud backgrounds. The 2.51% precision drop from Stage 3 is due to aggressive boundary predictions introducing minor false positives, but the 5.91% total IoU gain confirms superior segmentation quality.
The FreqFusion decoder contributes the largest share (69.4% of IoU improvement), driven by its frequency-domain fusion, distinct from U-Net’s generic skip connections. MaxViT and edge-aware loss further refine long-range modeling and boundary precision, respectively, ensuring robust performance across diverse contrail patterns.

5.3. Performance Significance

MFcontrail’s improvements address key challenges in contrail segmentation, including edge blurring, thin-layer detection, and cirrus cloud misidentification. The 5.91% IoU and 5.03% F1-score gains over DeepLabV3+ validate the model’s ability to produce precise contrail masks, crucial for assessing aviation’s climate impact through radiative forcing studies. The edge-aware loss significantly enhances boundary accuracy, as evidenced by Figure 4, reducing misidentification of cirrus cloud edges. The MaxViT backbone and FreqFusion decoder enable robust detection of slender contrails, supporting applications like flight trajectory optimization to mitigate contrail formation [8].
To validate the reliability of our experiments, we computed the Area Under the Precision–Recall Curve (AUC-PR) following Ng et al. [20]. Our reproduced baseline (DeepLabV3+) achieves an AUC-PR of 0.7313, slightly surpassing Ng et al.’s single-frame model (0.71) by approximately 2%, likely due to differences in random data splits. Figure 7 illustrates that our baseline’s PR curve aligns closely with Ng et al.’s, confirming the consistency of our implementation. Compared with this validated baseline, MFcontrail improves the AUC-PR by 6.83%, driven by the FreqFusion decoder’s high-frequency edge preservation and the edge-aware loss’s boundary refinement. The PR curve of MFcontrail shows superior precision at high recall, indicating robust detection of thin contrails. Visualizations in Figure 4 further demonstrate MFcontrail’s ability to delineate sharp contrail boundaries and detect slender structures in regions with overlapping cirrus clouds, outperforming the baseline and other models.

6. Discussion

6.1. Result Significance

The MFcontrail model, integrating a FreqFusion decoder, MaxViT encoder, and edge-aware loss, achieves substantial improvements in contrail segmentation, with recall, IoU, and F1-score increasing by 5.74%, 5.91%, and 5.03%, respectively, over the DeepLabV3+ baseline. These gains directly address key challenges in contrail detection: edge blurring, thin-layer segmentation, and cirrus cloud misidentification. Specifically, the FreqFusion decoder’s frequency-domain fusion preserves high-frequency edge details, reducing boundary softening compared to standard upsampling, as shown in Figure 6. The MaxViT encoder enhances long-range context modeling, improving detection of elongated contrails across diverse atmospheric conditions in sparse regions. The edge-aware loss ensures precise boundary delineation, minimizing false positives in cirrus cloud regions, as evidenced in Figure 4.
These improvements significantly enhance the accuracy of contrail coverage estimation, critical for quantifying aviation’s radiative forcing effects. For instance, precise segmentation supports flight trajectory optimization, potentially reducing contrail climate impact by up to 50% with minimal fuel cost increases [8]. Compared to recent transformer-based models like SegFormer [39], MFcontrail’s frequency-domain approach offers superior edge preservation, making it a robust tool for real-time satellite data processing in aviation meteorology.

6.2. Comparison with Existing Work

MFcontrail advances contrail segmentation by combining multi-axis attention and frequency-domain feature enhancement, outperforming classic models like PSPNet, U-Net, and DeepLabV3+ (Table 3). Compared to recent transformer-based methods, such as SegFormer [39], MFcontrail achieves a 5.91% IoU gain, driven by its FreqFusion decoder, which mitigates high-frequency detail loss during upsampling. MFcontrail’s MaxViT encoder captures long-range dependencies, improving sparse contrail detection. Recent frequency-domain approaches, such as Chen et al.’s work [29], focus on general segmentation tasks but lack contrail-specific edge optimization. MFcontrail’s edge-aware loss addresses this gap, reducing cirrus cloud misidentification and making it uniquely suited for aviation contrail monitoring.

6.3. Limitations

Despite its strong performance, MFcontrail has three primary limitations. First, as a single-frame model, it lacks temporal information, which can lead to misjudgments in dynamic scenarios, such as transient cloud occlusions or contrail diffusion. This limitation reduces detection accuracy by approximately 5% in time-varying scenes, as observed in validation set analyses. Incorporating multi-frame optical flow or temporal attention mechanisms could address this issue, enhancing robustness in real-time monitoring. Second, the model’s performance is sensitive to hyperparameters, particularly the loss weights α and β in Equation (4). Manual tuning via grid search limits scalability across diverse datasets, potentially decreasing IoU by 2–3% on unseen atmospheric conditions. Future work could explore dynamic weight adjustment strategies, such as reinforcement learning-based optimization, to improve generalization and reduce tuning overhead. Third, it has not been validated on snow/ice-covered surfaces—snow/ice and contrails share similar low-temperature thermal infrared features (risking misclassification), which cannot be addressed due to scarce dedicated annotated samples. Future work will build an expert-annotated snow/ice-background contrail dataset, optimize the model to distinguish snow/ice from contrails, and validate its performance in this scenario.

6.4. Practical Applications

MFcontrail’s high-precision segmentation makes it a valuable tool for aviation meteorology and climate monitoring, focusing on the preliminary identification of contrail presence and coverage while providing basic data support for subsequent radiative property retrieval. Its ability to delineate contrail boundaries and estimate contrail coverage—validated against satellite image annotation ground truth and therefore serving as a reference for image-level statistics (verification of absolute boundary accuracy requires further research using non-imaging sensors such as lidar)—supports analysis of contrail-related climate effects, where contrails account for over 66% of aviation’s climate impact [7]. The model can be integrated into satellite data processing pipelines, such as those using GOES-16 ABI data, to provide real-time contrail detection for flight trajectory optimization, potentially reducing climate impact by 50% with a 4.5% fuel cost increase [8]. Additionally, MFcontrail’s robustness to cirrus cloud interference makes it suitable for large-scale environmental monitoring, offering a scalable solution for global contrail tracking and air traffic management.

7. Conclusions

This study proposes MFcontrail, a novel model for pixel-level contrail segmentation, addressing edge blurring, thin-layer detection, and cirrus cloud misidentification. By integrating a multi-axis attention encoder, frequency-domain feature fusion, and edge-aware loss, MFcontrail achieves a 5.91% IoU and 5.03% F1-score improvement over DeepLabV3+ on the OpenContrails dataset. The MaxViT encoder enhances long-range context modeling, improving sparse contrail detection. The FreqFusion decoder preserves high-frequency edge details, reducing boundary softening. The edge-aware loss ensures precise boundary delineation, minimizing cirrus cloud misidentification.
These advancements provide a high-precision tool for aviation meteorology, enabling accurate contrail coverage estimation for radiative forcing studies and flight trajectory optimization. The model’s robustness supports its integration into real-time satellite data processing systems, advancing global contrail monitoring. However, limitations include the lack of temporal information and hyperparameter sensitivity. Future work will focus on (1) incorporating multi-frame temporal features to enhance dynamic scene robustness; (2) developing dynamic loss weight optimization via reinforcement learning; and (3) expanding datasets to include diverse geographic regions for improved generalization. The proposed methods also hold potential for adjacent fields, such as atmospheric turbulence detection and cloud pattern analysis.

Author Contributions

Conceptualization, S.S.; methodology, S.S.; software, S.S.; validation, S.S. and J.W.; formal analysis, S.S., J.W. and K.Y.; investigation, S.S.; resources, Q.M.; data curation, S.S.; writing—original draft preparation, S.S.; writing—review and editing, Q.M. and S.S.; visualization, S.S.; supervision, Q.M.; project administration, Q.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China under Grant No. 2024YFB3909004 and the Science and Technology Project of State Grid Corporation of China (Project Name: “Research on Key Technologies of Semantic Modeling for Power Transmission and Transformation Engineering Survey Knowledge and Engineering Applications”, Grant No. 52090025001L-170-ZN).

Data Availability Statement

The OpenContrails and Landsat-8 contrails datasets are publicly available at https://console.cloud.google.com/storage/browser/goes_contrails_dataset and https://console.cloud.google.com/storage/browser/landsat_contrails_dataset, respectively (both accessed on 15 August 2025). The code and trained models are available from the authors upon request.

Acknowledgments

We thank the OpenContrails and Landsat-8 contrails teams for providing the datasets and Wuhan University for the computational resources.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GOES	Geostationary Operational Environmental Satellite
ABI	Advanced Baseline Imager
IoU	Intersection over Union
MaxViT	Multi-axis Vision Transformer
BTD	Brightness Temperature Difference

References

  1. Schumann, U. On conditions for contrail formation from aircraft exhausts. Meteorol. Z. 1996, 5, 4–23. [Google Scholar] [CrossRef]
  2. Immler, F.; Treffeisen, R.; Engelbart, D.; Krüger, K.; Schrems, O. Cirrus, contrails, and ice supersaturated regions in high pressure systems at northern mid latitudes. Atmos. Chem. Phys. 2008, 8, 1689–1699. [Google Scholar] [CrossRef]
  3. Kärcher, B. Formation and radiative forcing of contrail cirrus. Nat. Commun. 2018, 9, 1824. [Google Scholar] [CrossRef]
  4. Quaas, J.; Gryspeerdt, E.; Vautard, R.; Boucher, O. Climate impact of aircraft-induced cirrus assessed from satellite observations before and during COVID-19. Environ. Res. Lett. 2021, 16, 064051. [Google Scholar] [CrossRef]
  5. Bier, A.; Burkhardt, U. Impact of parametrizing microphysical processes in the jet and vortex phase on contrail cirrus properties and radiative forcing. J. Geophys. Res. Atmos. 2022, 127, e2022JD036677. [Google Scholar] [CrossRef]
  6. Märkl, R.S.; Voigt, C.; Sauer, D.; Dischl, R.K.; Kaufmann, S.; Harlaß, T.; Hahn, V.; Roiger, A.; Weiß-Rehm, C.; Burkhardt, U.; et al. Powering aircraft with 100% sustainable aviation fuel reduces ice crystals in contrails. Atmos. Chem. Phys. 2024, 24, 3813–3837. [Google Scholar] [CrossRef]
  7. Lee, D.; Fahey, D.; Skowron, A.; Allen, M.; Burkhardt, U.; Chen, Q.; Doherty, S.; Freeman, S.; Forster, P.; Fuglestvedt, J.; et al. The contribution of global aviation to anthropogenic climate forcing for 2000 to 2018. Atmos. Environ. 2021, 244, 117834. [Google Scholar] [CrossRef]
  8. Niklaß, M.; Lührs, B.; Swaid, M. Note on the Non-CO2 Mitigation Potential of Hybrid-Electric Aircraft Using “Eco-Switch”. J. Aircr. 2023, 60, 265–271. [Google Scholar] [CrossRef]
  9. Li, L.; Zhou, X.; Hu, Z.; Gao, L.; Li, X.; Ni, X.; Chen, F. On-orbit monitoring flying aircraft day and night based on SDGSAT-1 thermal infrared dataset. Remote Sens. Environ. 2023, 298, 113840. [Google Scholar] [CrossRef]
  10. Zhou, X.; Li, L.; Yu, J.; Gao, L.; Zhang, R.; Hu, Z.; Chen, F. Multimodal aircraft flight altitude inversion from SDGSAT-1 thermal infrared data. Remote Sens. Environ. 2024, 308, 114178. [Google Scholar] [CrossRef]
  11. Mannstein, H.; Brömser, A.; Bugliaro, L. Ground-based observations for the validation of contrails and cirrus detection in satellite imagery. Atmos. Meas. Tech. 2010, 3, 655–669. [Google Scholar] [CrossRef]
  12. Low, J.; Teoh, R.; Ponsonby, J.; Gryspeerdt, E.; Shapiro, M.; Stettler, M.E. Ground-based contrail observations: Comparisons with reanalysis weather data and contrail model simulations. Atmos. Meas. Tech. 2025, 18, 37–56. [Google Scholar] [CrossRef]
  13. Schumann, U. Effects of aircraft emissions on ozone, cirrus clouds, and global climate. Air Space Eur. 2000, 2, 29–33. [Google Scholar] [CrossRef]
  14. Dai, M.; Yu, J.; Hu, Z.; Zou, L.; Bian, J.; Wang, Q.; Su, X.; Chen, F. Stripe noise removal for the thermal infrared spectrometer of the SDGSAT-1. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103847. [Google Scholar] [CrossRef]
  15. Joseph, J.; Levin, Z.; Mekler, Y.; Ohring, G.; Otterman, J. Study of contrails observed from the ERTS 1 satellite imagery. J. Geophys. Res. 1975, 80, 366–372. [Google Scholar] [CrossRef]
  16. Bakan, S.; Betancor, M.; Gayler, V.; Graßl, H. Contrail frequency over Europe from NOAA satellite images. Ann. Geophys. 1994, 12, 962–968. [Google Scholar] [CrossRef]
  17. Lee, T.F. Jet Contrail Identification Using the AVHRR Infrared Split Window. J. Appl. Meteorol. Climatol. 1989, 28, 993–995. [Google Scholar] [CrossRef]
  18. Engelstad, M.; Sengupta, S.; Lee, T.; Welch, R. Automated detection of jet contrails using the AVHRR split window. Int. J. Remote Sens. 1992, 13, 1391–1412. [Google Scholar] [CrossRef]
  19. Weiss, J.M.; Christopher, S.A.; Welch, R.M. Automatic contrail detection and segmentation. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1609–1619. [Google Scholar] [CrossRef]
  20. Ng, J.Y.H.; McCloskey, K.; Cui, J.; Meijer, V.R.; Brand, E.; Sarna, A.; Goyal, N.; Van Arsdale, C.; Geraedts, S. Contrail detection on GOES-16 ABI with the opencontrails dataset. IEEE Trans. Geosci. Remote Sens. 2023, 62, 1–14. [Google Scholar] [CrossRef]
  21. Pestana, S.; Lundquist, J.D. Evaluating GOES-16 ABI surface brightness temperature observation biases over the central Sierra Nevada of California. Remote Sens. Environ. 2022, 281, 113221. [Google Scholar] [CrossRef]
  22. Yu, F.; Wu, X.; Yoo, H.; Qian, H.; Shao, X.; Wang, Z.; Iacovazzi, R. Radiometric calibration accuracy and stability of GOES-16 ABI Infrared radiance. J. Appl. Remote Sens. 2021, 15, 048504. [Google Scholar] [CrossRef]
  23. Chang, T.; Xiong, X. GOES-16/ABI thermal emissive band assessments using GEO-LEO-GEO double difference. Earth Space Sci. 2019, 6, 2303–2316. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, G.; Zhang, J.; Shang, J. Contrail recognition with convolutional neural network and contrail parameterizations evaluation. Sola 2018, 14, 132–137. [Google Scholar] [CrossRef]
  25. Yu, J.; Zhou, X.; Li, L.; Gao, L.; Li, X.; Pan, W.; Ni, X.; Wang, Q.; Chen, F. High-resolution thermal infrared contrails images identification and classification method based on SDGSAT-1. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103980. [Google Scholar] [CrossRef]
  26. Xu, X.; Feng, Z.; Cao, C.; Li, M.; Wu, J.; Wu, Z.; Shang, Y.; Ye, S. An improved swin transformer-based model for remote sensing object detection and instance segmentation. Remote Sens. 2021, 13, 4779. [Google Scholar] [CrossRef]
  27. Xu, Z.; Zhang, W.; Zhang, T.; Yang, Z.; Li, J. Efficient transformer for remote sensing image segmentation. Remote Sens. 2021, 13, 3585. [Google Scholar] [CrossRef]
  28. Xiao, X.; Guo, W.; Chen, R.; Hui, Y.; Wang, J.; Zhao, H. A swin transformer-based encoding booster integrated in u-shaped network for building extraction. Remote Sens. 2022, 14, 2611. [Google Scholar] [CrossRef]
  29. Chen, L.; Fu, Y.; Gu, L.; Yan, C.; Harada, T.; Huang, G. Frequency-aware feature fusion for dense image prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10763–10780. [Google Scholar] [CrossRef]
  30. McCloskey, K.; Geraedts, S.; Van Arsdale, C.; Brand, E. A human-labeled Landsat-8 contrails dataset. In Proceedings of the ICML 2021 Workshop on Tackling Climate Change with Machine Learning, Online, 18–24 July 2021; Volume 23. [Google Scholar]
  31. Wulder, M.A.; White, J.C.; Loveland, T.R.; Woodcock, C.E.; Belward, A.S.; Cohen, W.B.; Fosnight, E.A.; Shaw, J.; Masek, J.G.; Roy, D.P. The global Landsat archive: Status, consolidation, and direction. Remote Sens. Environ. 2016, 185, 271–283. [Google Scholar] [CrossRef]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  33. Kulik, L. Satellite-Based Detection of Contrails Using Deep Learning. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2019. [Google Scholar]
  34. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 459–479. [Google Scholar]
  35. Roy, A.G.; Navab, N.; Wachinger, C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2018; pp. 421–429. [Google Scholar]
  36. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the CVPR, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  37. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar] [CrossRef]
  38. Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified Perceptual Parsing for Scene Understanding. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  39. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
Figure 1. Architecture of the MFcontrail model. (a) Overview of the network architecture: a U-Net-style design with a MaxViT encoder and a convolutional decoder that replaces plain upsampling with FreqFusion. (b) Implementation of the FreqFusion decoder. (c) Implementation of FreqFusion. (d) Implementation of the Concurrent Spatial and Channel Squeeze and Excitation (scSE) block.
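Figure 1d shows the scSE block only schematically; below is a minimal PyTorch sketch of the module as defined by Roy et al. [35]. The class name, the reduction ratio, and the additive combination of the two gating branches are illustrative choices, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    """Concurrent spatial and channel squeeze & excitation (scSE) [35].
    cSE gates each channel from global context; sSE gates each pixel
    via a 1x1 convolution; the gated maps are combined by addition."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel squeeze & excitation: global pool -> bottleneck -> per-channel gate
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial squeeze & excitation: per-pixel gate from a 1x1 convolution
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.cse(x) + x * self.sse(x)
```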
Figure 2. The color projection.
Figure 3. Effect of edge-aware loss. (a) Original sample image; (b) edge response heatmap output by the model without edge-aware loss (boundary response contains substantial false noise, interfering with real boundary identification); (c) edge response heatmap output by the model with edge-aware loss (clearer boundary contours, significantly reduced false noise, and better continuity of slender contrails); (d) contrail mask.
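The edge-aware loss itself is defined in the methods section; purely to illustrate the mechanism that Figure 3 visualizes, the sketch below upweights binary cross-entropy at mask boundaries detected with a Sobel filter on the ground truth. The function name, the Sobel-based boundary map, and the weighting form are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def edge_weighted_bce(logits: torch.Tensor, target: torch.Tensor,
                      edge_weight: float = 4.0) -> torch.Tensor:
    """Illustrative edge-aware loss: BCE with extra weight on pixels near
    ground-truth mask boundaries. logits/target: (N, 1, H, W), target in {0, 1}."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(target, sobel_x.to(target), padding=1)
    gy = F.conv2d(target, sobel_y.to(target), padding=1)
    edges = (gx.abs() + gy.abs()).clamp(0.0, 1.0)   # soft boundary map
    weights = 1.0 + edge_weight * edges             # upweight boundary pixels
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```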
Figure 4. Visualization of comparison experiment on the OpenContrails dataset. Red area: false positive contrails predicted by the model; blue area: false negative contrails missed by the model; white area: expert-annotated ground truth of contrails (overlapping with the model’s correctly predicted areas).
Figure 5. Visualization of comparison experiment on the Landsat8 contrail dataset. Red area: false positive contrails predicted by the model; blue area: false negative contrails missed by the model; white area: expert-annotated ground truth of contrails (overlapping with the model’s correctly predicted areas).
Figure 6. Visual comparison of ablation study results. (a) Original satellite images with ash-style color projection; (b) ground truth annotations; (c) baseline predictions; (d) frequency-aware fusion decoder predictions; (e) MaxViT encoder predictions; (f) full model with edge-aware loss. Relative to the manual annotations, false-positive regions are marked in red, while missed (false-negative) regions are highlighted in blue.
Figure 7. Precision–recall curves for the OpenContrails dataset.
Table 1. GOES-16’s bands used for color projection.

Color | Bands | Min [K] | Max [K] | γ
Red | 12.3 μm − 11.2 μm | −4 | 2 | 1.0
Green | 11.2 μm − 8.5 μm | −4 | 5 | 1.0
Blue | 11.2 μm | 243 | 303 | 1.0
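As a concrete reading of Table 1, a NumPy sketch of the projection follows: each channel is a brightness temperature (or brightness-temperature difference) clipped to the tabulated bounds, scaled to [0, 1], and gamma-corrected (a no-op here, since γ = 1.0). Function names and array layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def scale_channel(bt: np.ndarray, lo: float, hi: float, gamma: float = 1.0) -> np.ndarray:
    """Clip a brightness-temperature field to [lo, hi], normalize to [0, 1],
    and apply gamma correction, following Table 1."""
    x = np.clip((bt - lo) / (hi - lo), 0.0, 1.0)
    return x ** (1.0 / gamma)

def goes16_color_projection(t085: np.ndarray, t112: np.ndarray, t123: np.ndarray) -> np.ndarray:
    """Build the ash-style RGB image from GOES-16 brightness temperatures (K)
    at 8.5, 11.2, and 12.3 um; returns an (H, W, 3) array in [0, 1]."""
    r = scale_channel(t123 - t112, -4.0, 2.0)   # Red: 12.3 um - 11.2 um, [-4, 2] K
    g = scale_channel(t112 - t085, -4.0, 5.0)   # Green: 11.2 um - 8.5 um, [-4, 5] K
    b = scale_channel(t112, 243.0, 303.0)       # Blue: 11.2 um, [243, 303] K
    return np.stack([r, g, b], axis=-1)
```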
Table 2. RGB channel specifications for Landsat-8 projection.

Channel | Physical Quantity | Normalization Bounds
Red | T_11μm − T_12μm | [−5.5, 1] K
Green (Day) | 1 − ρ_1.37μm | [0.8, 1]
Green (Night) | 0 (constant) | –
Blue (Day) | T_12μm | [283, 303] K
Blue (Night) | T_12μm | [243, 303] K
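The Landsat-8 projection of Table 2 differs from Table 1 only in its inputs and its day/night branching; a sketch under the same assumptions (function and variable names are illustrative):

```python
import numpy as np

def landsat8_color_projection(t11: np.ndarray, t12: np.ndarray,
                              rho_137: np.ndarray, is_day: bool) -> np.ndarray:
    """Build the Landsat-8 false-color image of Table 2. t11/t12: brightness
    temperatures (K) near 11 and 12 um; rho_137: 1.37-um cirrus reflectance."""
    def norm(x, lo, hi):
        return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

    r = norm(t11 - t12, -5.5, 1.0)              # Red: split-window difference
    if is_day:
        g = norm(1.0 - rho_137, 0.8, 1.0)       # Green (Day): 1 - cirrus reflectance
        b = norm(t12, 283.0, 303.0)             # Blue (Day)
    else:
        g = np.zeros_like(t12)                  # Green (Night): constant 0
        b = norm(t12, 243.0, 303.0)             # Blue (Night)
    return np.stack([r, g, b], axis=-1)
```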
Table 3. Performance comparison of different models on the OpenContrails and Landsat8 datasets. Each dataset column gives Rec / Prc / IoU / F1, all in %.

Models | OpenContrails (Rec / Prc / IoU / F1) | Landsat8 Contrails (Rec / Prc / IoU / F1)
PSPnet (ResNet50) | 36.79 / 64.18 / 30.52 / 46.77 | 39.81 / 80.44 / 36.30 / 53.26
FPN (ResNet50) | 60.82 / 71.44 / 48.93 / 65.70 | 61.22 / 75.45 / 51.05 / 67.59
Upernet (ResNet50) | 62.21 / 70.01 / 49.12 / 65.88 | 61.94 / 75.01 / 51.35 / 67.85
Segformer (MiT-B5) | 62.09 / 73.21 / 50.59 / 67.19 | 63.07 / 75.75 / 52.48 / 68.83
Unet (ResNet50) | 64.31 / 74.01 / 52.46 / 68.82 | 65.20 / 75.09 / 53.60 / 69.79
DeeplabV3+ (ResNet152d-SE) | 63.52 / 70.89 / 50.38 / 67.00 | 61.94 / 77.16 / 51.87 / 68.31
MFcontrail (ResNet50) | 64.34 / 76.08 / 53.51 / 69.72 | 66.19 / 75.28 / 54.38 / 70.45
MFcontrail (full) | 69.26 / 75.04 / 56.29 / 72.03 | 67.56 / 76.48 / 55.94 / 71.74

Rec: Recall, Prc: Precision, IoU: Intersection over Union, F1: F1-score. In the typeset table, red bold indicates the highest value and blue bold the second-highest.
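The four tabulated metrics are mutually consistent: expressed in true-positive, false-positive, and false-negative pixel counts, F1 = 2·Prc·Rec/(Prc + Rec) and IoU = TP/(TP + FP + FN) = 1/(1/Rec + 1/Prc − 1). A small sanity-check sketch (function name is illustrative):

```python
def f1_and_iou(recall: float, precision: float) -> tuple[float, float]:
    """Derive F1 and IoU from recall and precision; both identities follow
    from writing all metrics in terms of TP/FP/FN pixel counts."""
    f1 = 2.0 * precision * recall / (precision + recall)
    iou = 1.0 / (1.0 / recall + 1.0 / precision - 1.0)
    return f1, iou

# PSPnet row on Landsat8 (Rec 39.81%, Prc 80.44%):
print(f1_and_iou(0.3981, 0.8044))  # ~(0.5326, 0.3630), matching F1 53.26 and IoU 36.30
```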
Table 4. Ablation study: component contribution analysis.

Stage | Components | Recall | Precision | IoU | F1
1 | Baseline | 63.52% | 70.89% | 50.38% | 67.00%
2 | +FreqFusion Decoder | 64.56% | 77.73% | 54.48% | 70.53%
3 | +MaxViT-Small | 65.82% | 77.55% | 55.29% | 71.20%
4 | +Edge-aware Loss | 69.26% | 75.04% | 56.29% | 72.03%
– | Improvement | +5.74% | +4.15% | +5.91% | +5.03%

Bold numbers indicate the best performance.