This section presents the experimental results of the proposed spectral-driven wildfire segmentation framework. The evaluation is conducted in two stages.
First, the performance of the RGB-to-hyperspectral reconstruction model (MST++) is quantitatively assessed using spectral reconstruction metrics, including MRAE, RMSE, PSNR, and SAM.
Second, the effectiveness of the reconstructed spectral representations for downstream segmentation tasks is evaluated across flame segmentation, smoke segmentation, and fire–smoke overlapping scenarios. Quantitative comparisons using IoU, F1 score, Precision, Recall, and the Kappa coefficient are provided to demonstrate the robustness and discriminative capability of the proposed approach.
3.1. Performance Evaluation of MST++ Hyperspectral Image Reconstruction
To quantitatively evaluate RGB-to-hyperspectral reconstruction performance, the MST++ model was assessed on the validation subset of the NTIRE 2022 dataset using the metrics defined in Section 2.7. The quantitative results are summarized in Table 2.
As shown in Table 2, the reconstruction achieved low relative and absolute spectral errors, a high peak signal-to-noise ratio, and minimal spectral angular deviation with respect to the ground-truth hyperspectral data. These results indicate that the reconstructed spectral representations preserve essential wavelength-dependent characteristics and maintain strong spectral fidelity within the visible range.
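The four reconstruction metrics reported in Table 2 follow their standard definitions; the following NumPy sketch is illustrative only (it is not the evaluation code used in this work) and assumes reconstructed and ground-truth cubes of shape (H, W, B) with reflectance values in [0, 1]:

```python
import numpy as np

def mrae(gt, rec, eps=1e-8):
    """Mean relative absolute error over all pixels and bands."""
    return np.mean(np.abs(gt - rec) / (np.abs(gt) + eps))

def rmse(gt, rec):
    """Root-mean-square error over the whole cube."""
    return np.sqrt(np.mean((gt - rec) ** 2))

def psnr(gt, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming values in [0, peak]."""
    return 10.0 * np.log10(peak ** 2 / np.mean((gt - rec) ** 2))

def sam(gt, rec, eps=1e-8):
    """Spectral angle mapper: mean angle (radians) between per-pixel spectra."""
    dot = np.sum(gt * rec, axis=-1)
    norms = np.linalg.norm(gt, axis=-1) * np.linalg.norm(rec, axis=-1)
    return np.mean(np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0)))
```

Lower MRAE, RMSE, and SAM and higher PSNR indicate closer agreement between the reconstructed and ground-truth spectra.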
Representative reconstructed spectral bands spanning 400–700 nm at 10 nm intervals are illustrated in Figure 3. For visualization clarity, selected bands at wider intervals are displayed. The reconstructed images retain spatial structures and exhibit consistent spectral transitions across wavelengths, further confirming their reliability for subsequent spectral-domain analysis.
To investigate the practical utility of reconstructed hyperspectral representations in wildfire scenarios, additional images containing both flames and smoke were analyzed. Rather than relying on predefined physically dominant wavelengths, an empirical spectral separability analysis was conducted to evaluate the discriminative capability of individual reconstructed bands. As illustrated in Figure 4, certain wavelength regions demonstrated enhanced target–background contrast under specific imaging conditions. These bands were subsequently selected for threshold-based segmentation to validate their effectiveness in isolating fire and smoke pixels.
Importantly, band selection is driven by empirical separability performance rather than fixed emission assumptions, ensuring adaptability across diverse environmental and illumination conditions. The above results provide quantitative and qualitative evidence that the reconstructed hyperspectral data form a reliable foundation for subsequent spectral-driven segmentation experiments.
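The empirical band-selection and thresholding procedure described above can be sketched as follows; the Fisher-style contrast score and the fixed global threshold are illustrative assumptions, not the exact criterion used in this work:

```python
import numpy as np

def band_separability(cube, target_mask):
    """Score each band by target-vs-background contrast.
    Illustrative criterion: absolute difference of class means
    normalized by the pooled standard deviation.
    cube: (H, W, B) reconstructed cube; target_mask: (H, W) bool."""
    scores = []
    for b in range(cube.shape[-1]):
        band = cube[..., b]
        t, bg = band[target_mask], band[~target_mask]
        pooled = np.sqrt(0.5 * (t.var() + bg.var())) + 1e-8
        scores.append(abs(t.mean() - bg.mean()) / pooled)
    return np.asarray(scores)

def threshold_segment(band, thresh):
    """Simple global threshold on a single reconstructed band."""
    return band > thresh
```

The band with the highest separability score is then passed to `threshold_segment` to produce the binary fire or smoke mask.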
3.2. Contribution of Hyperspectral Reconstruction (Ablation Study)
To clarify the necessity of hyperspectral reconstruction within the proposed framework, an ablation study was conducted to compare segmentation performance obtained from reconstructed spectral bands and raw RGB inputs under identical threshold-based segmentation settings. All evaluation metrics and parameters were kept unchanged to ensure a fair comparison.
Specifically, flame segmentation using RGB inputs was performed on the red channel, which provides the strongest response to fire emissions in standard RGB imagery, while grayscale intensity was adopted for smoke segmentation because smoke is characterized mainly by luminance variations. For visualization consistency, the original RGB images are presented as scene references, while these channel-specific representations were used internally for RGB-based segmentation.
Quantitative results are summarized in Table 3, which evaluates the influence of input representation on segmentation performance. Overall, reconstructed hyperspectral bands consistently outperform RGB-based representations across all evaluation metrics, indicating improved intrinsic separability between fire-related targets and background regions.
For flame segmentation, the RGB baseline achieved an IoU of 9.28% and an F1 score of 16.99%, whereas the reconstructed spectral band improved performance to an IoU of 69.58% and an F1 score of 82.06%. Similarly, for smoke segmentation, hyperspectral reconstruction increased the IoU from 56.11% to 86.45% and improved the F1 score from 71.89% to 92.73%. The Kappa coefficient also showed substantial improvement in both tasks, demonstrating stronger agreement beyond chance with ground-truth annotations and confirming the statistical reliability of the reconstructed spectral representations.
These results indicate that performance improvements arise primarily from enhanced spectral separability rather than increased algorithmic complexity. By expanding the feature space from RGB intensity information to wavelength-dependent representations, hyperspectral reconstruction enables accurate segmentation using simple threshold-based operations.
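The evaluation metrics used here and throughout Tables 3–7 follow their standard pixel-level definitions; a minimal sketch (assuming boolean prediction and ground-truth masks containing both classes) is:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Standard pixel-level segmentation metrics from a confusion matrix.
    pred, gt: boolean masks of identical shape."""
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    tn = np.sum(~pred & ~gt)  # true negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                       # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return dict(iou=iou, f1=f1, precision=precision, recall=recall, kappa=kappa)
```

Unlike accuracy, the Kappa coefficient discounts agreement expected by chance, which is why it is reported alongside IoU and F1 in the tables.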
Qualitative comparisons illustrating these improvements are presented in Figure 5, where reconstructed spectral bands exhibit clearer target–background contrast and more accurate boundary delineation than RGB-based results, corroborating the quantitative findings in Table 3.
These observations suggest that the performance improvement observed in this study is strongly associated with enhanced input representation and improved target–background separability. Notably, substantial gains are achieved even when segmentation is performed using simple threshold-based operations without introducing more complex deep architectures. Therefore, within the current experimental framework, hyperspectral reconstruction can be understood as a representation-enhancement step prior to downstream decision making. However, the extent to which this benefit transfers consistently across different segmentation backbones has not been established as a general conclusion in the present work and requires more systematic investigation in future studies.
3.3. Evaluation of Hyperspectral Segmentation Performance for Fire Targets
This section further evaluates the segmentation performance of the MST++ model in typical flame scenes. Due to the limited size of the test set, three representative scenarios were deliberately selected for performance analysis.
Figure 6a–c present the raw images, where Figure 6a represents a nighttime scene, Figure 6b depicts a large open flame, and Figure 6c shows a small open fire.
As shown in Figure 6d–f, the MST++ model was applied to flame detection and achieved promising results. It effectively captured flame-sensitive spectral bands and their distinctive features, demonstrating accurate recognition and strong edge delineation capability.
Figure 6g–i show that although the U-Net model correctly detected the general flame regions, partial merging of targets occurred under low-light conditions.
These observations indicate that the MST++ model exhibited superior segmentation performance in typical fire scenes.
As shown in Table 4, quantitative comparisons show that the U-Net model achieved an IoU of 44.42%, an F1 score of 58.15%, a Precision of 74.67%, a Recall of 68.72%, and a Kappa coefficient of 0.5625, whereas MST++ achieved an IoU of 76.90%, an F1 score of 86.81%, a Precision of 93.35%, a Recall of 82.04%, and a Kappa coefficient of 0.8603.
The experimental results demonstrate that the MST++ model provides improved performance in hyperspectral segmentation of fire targets and enhances accurate detection of fire regions in practical scenarios.
3.4. Evaluation of Smoke Detection Capability
This section evaluates the segmentation performance of the MST++ model for smoke targets. Considering the need for manual annotation of smoke datasets and the associated risk of human error, only images with clear and well-defined smoke contours were selected for testing. Moreover, to enhance the diversity of testing scenarios, images from different viewpoints were deliberately chosen: Figure 7a represents a ground-level perspective, Figure 7b an aerial view captured by a drone, and Figure 7c a satellite view, thereby covering various real-world monitoring conditions.
As shown in Figure 7d–f, the MST++ model was applied to smoke detection. Compared with the manually annotated smoke regions shown in Figure 7j–l, the results demonstrate accurate detection and clear edge delineation. The MST++ model effectively captured smoke-sensitive spectral bands and their distinctive features.
Conversely, Figure 7g–i show that the performance of the U-Net model deteriorated as the detection area increased.
These observations indicate that the MST++ model provided more reliable segmentation performance across typical smoke scenarios.
As shown in Table 5, quantitative analysis shows that the U-Net model achieved an IoU of 74.68%, an F1 score of 85.34%, a Precision of 99.81%, a Recall of 74.77%, and a Kappa coefficient of 0.7507, whereas the MST++ model achieved an IoU of 91.76%, an F1 score of 95.66%, a Precision of 97.51%, a Recall of 93.90%, and a Kappa coefficient of 0.9192.
These results indicate that the MST++ model improves hyperspectral segmentation performance for smoke targets, enabling more accurate and stable smoke detection while reducing the impact of annotation-related uncertainties and demonstrating strong generalization capability.
3.5. Evaluation of Detection Ability in Fire-Smoke Overlapping Scenarios
In practical forest fire monitoring tasks, fire and smoke frequently appear simultaneously, posing a significant challenge for accurate detection. To further evaluate the recognition capability of the MST++ model under mixed fire–smoke conditions, various representative scenes were selected for testing. Fire and smoke segmentation performances were analyzed separately under these mixed scenarios.
Figure 8a shows a small fire scene where smoke and fire are spatially separated, while Figure 8b presents a large fire scene with clear separation. Figure 8c illustrates a large fire scene with overlapping smoke and fire, and Figure 8d depicts a small fire scene with significant fire–smoke overlap. These scenarios collectively provide a comprehensive evaluation of the model’s generalization capability.
As shown in Figure 8e–h, the MST++ model was applied to flame detection in mixed fire–smoke environments. Compared with the manually annotated flame regions in Figure 8m–p, the results demonstrate strong consistency. MST++ effectively captures flame-sensitive spectral bands and distinctive flame features, achieving accurate detection along with clear and precise boundary delineation.
In contrast, Figure 8i–l show the results produced by U-Net. Although U-Net performs reasonably well in separated scenarios, its performance degrades significantly when smoke overlaps with flames. When flames are heavily obscured by smoke, U-Net frequently fails to correctly detect fire targets, resulting in false detections. In small-fire scenarios with dense smoke, U-Net even completely misses flame regions.
These observations indicate that MST++ achieves improved segmentation performance for fire targets in mixed fire–smoke scenarios compared with U-Net.
As shown in Table 6, according to the averaged quantitative results for flame detection, the U-Net model achieves an IoU of 48.25%, an F1 score of 58.46%, a Precision of 71.44%, a Recall of 50.27%, and a Kappa coefficient of 0.5677. In comparison, the MST++ model shows clear improvement, achieving an IoU of 55.55%, an F1 score of 70.42%, a Precision of 99.13%, a Recall of 55.91%, and a Kappa coefficient of 0.6846.
For smoke region recognition, Figure 9a–d present the original images corresponding to the same scenes used in the fire detection experiments. Specifically, Figure 9a represents a small fire scene with separation, Figure 9b a large fire scene with separation, Figure 9c a large fire scene with overlapping smoke and fire, and Figure 9d a small fire scene with overlapping conditions.
As illustrated in Figure 9e–h, the MST++ model was applied to smoke detection under mixed scenarios. Compared with the manually annotated smoke regions in Figure 9m–p, MST++ again demonstrates accurate detection and precise edge delineation.
By comparison, Figure 9i–l reveal that although U-Net performs satisfactorily in separated conditions, it suffers from substantial misclassification and omission errors under overlapping scenarios. Consequently, MST++ consistently outperforms U-Net in smoke segmentation tasks involving mixed fire–smoke scenes.
As shown in Table 7, quantitative evaluation further supports these findings. The U-Net model achieves an IoU of 60.76%, an F1 score of 72.91%, a Precision of 99.35%, a Recall of 60.94%, and a Kappa coefficient of 0.5584. In contrast, the MST++ model achieves superior performance, with an IoU of 89.64%, an F1 score of 94.52%, a Precision of 97.58%, a Recall of 91.67%, and a Kappa coefficient of 0.8658.
Overall, the results demonstrate that the MST++ model maintains robust segmentation and detection performance even under complex fire–smoke overlapping conditions, highlighting its strong generalization capability and practical applicability.