This section presents the experimental results of the proposed spectral-driven wildfire segmentation framework. The evaluation is conducted in two stages.
First, the performance of the RGB-to-hyperspectral reconstruction model (MST++) is quantitatively assessed using spectral reconstruction metrics, including MRAE, RMSE, PSNR, and SAM.
Second, the effectiveness of the reconstructed spectral representations for downstream segmentation tasks is evaluated across flame segmentation, smoke segmentation, and fire–smoke overlapping scenarios. Quantitative comparisons using IoU, F1 score, Precision, Recall, and the Kappa coefficient are provided to demonstrate the robustness and discriminative capability of the proposed approach.
3.1. Performance Evaluation of MST++ Hyperspectral Image Reconstruction
To quantitatively evaluate RGB-to-hyperspectral reconstruction performance, the MST++ model was assessed on the validation subset of the NTIRE 2022 dataset using the metrics defined in Section 2.7. The quantitative results are summarized in Table 2.
As shown in Table 2, the reconstruction achieved low relative and absolute spectral errors, a high peak signal-to-noise ratio, and minimal spectral angular deviation with respect to the ground-truth hyperspectral data. These results indicate that the reconstructed spectral representations preserve essential wavelength-dependent characteristics and maintain strong spectral fidelity within the visible range.
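The four reconstruction metrics reported in Table 2 follow their standard definitions; the following NumPy sketch is illustrative only (it is not the evaluation code used in this work) and assumes reconstructed and ground-truth cubes of shape (H, W, B) with reflectance values in [0, 1]:

```python
import numpy as np

def mrae(gt, rec, eps=1e-8):
    """Mean relative absolute error over all pixels and bands."""
    return np.mean(np.abs(gt - rec) / (np.abs(gt) + eps))

def rmse(gt, rec):
    """Root-mean-square error over the whole cube."""
    return np.sqrt(np.mean((gt - rec) ** 2))

def psnr(gt, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming values in [0, peak]."""
    return 10.0 * np.log10(peak ** 2 / np.mean((gt - rec) ** 2))

def sam(gt, rec, eps=1e-8):
    """Spectral angle mapper: mean angle (radians) between per-pixel spectra."""
    dot = np.sum(gt * rec, axis=-1)
    norms = np.linalg.norm(gt, axis=-1) * np.linalg.norm(rec, axis=-1)
    return np.mean(np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0)))
```

Lower MRAE, RMSE, and SAM and higher PSNR indicate closer agreement between the reconstructed and ground-truth spectra.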
Representative reconstructed spectral bands spanning 400–700 nm at 10 nm intervals are illustrated in Figure 3. For visualization clarity, selected bands at wider intervals are displayed. The reconstructed images retain spatial structures and exhibit consistent spectral transitions across wavelengths, further confirming their reliability for subsequent spectral-domain analysis.
To investigate the practical utility of reconstructed hyperspectral representations in wildfire scenarios, additional images containing both flames and smoke were analyzed. Rather than relying on predefined physically dominant wavelengths, an empirical spectral separability analysis was conducted to evaluate the discriminative capability of individual reconstructed bands. As illustrated in Figure 4, certain wavelength regions demonstrated enhanced target–background contrast under specific imaging conditions. These bands were subsequently selected for threshold-based segmentation to validate their effectiveness in isolating fire and smoke pixels.
Importantly, band selection is driven by empirical separability performance rather than fixed emission assumptions, ensuring adaptability across diverse environmental and illumination conditions. The above results provide quantitative and qualitative evidence that the reconstructed hyperspectral data form a reliable foundation for subsequent spectral-driven segmentation experiments.
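The empirical band-selection and thresholding procedure described above can be sketched as follows; the Fisher-style contrast score and the fixed global threshold are illustrative assumptions, not the exact criterion used in this work:

```python
import numpy as np

def band_separability(cube, target_mask):
    """Score each band by target-vs-background contrast.
    Illustrative criterion: absolute difference of class means
    normalized by the pooled standard deviation.
    cube: (H, W, B) reconstructed cube; target_mask: (H, W) bool."""
    scores = []
    for b in range(cube.shape[-1]):
        band = cube[..., b]
        t, bg = band[target_mask], band[~target_mask]
        pooled = np.sqrt(0.5 * (t.var() + bg.var())) + 1e-8
        scores.append(abs(t.mean() - bg.mean()) / pooled)
    return np.asarray(scores)

def threshold_segment(band, thresh):
    """Simple global threshold on a single reconstructed band."""
    return band > thresh
```

The band with the highest separability score is then passed to `threshold_segment` to produce the binary fire or smoke mask.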
3.2. Contribution of Hyperspectral Reconstruction (Ablation Study)
To clarify the necessity of hyperspectral reconstruction within the proposed framework, an ablation study was conducted to compare segmentation performance obtained from reconstructed spectral bands and raw RGB inputs under identical threshold-based segmentation settings. All evaluation metrics and parameters were kept unchanged to ensure a fair comparison.
Specifically, flame segmentation using RGB inputs was performed on the red channel, which provides the strongest response to fire emissions in standard RGB imagery, while grayscale intensity was adopted for smoke segmentation because smoke is characterized mainly by luminance variations. For visualization consistency, the original RGB images are presented as scene references, while these channel-specific representations were used internally for RGB-based segmentation.
Quantitative results are summarized in Table 3, which evaluates the influence of input representation on segmentation performance. Overall, reconstructed hyperspectral bands consistently outperform RGB-based representations across all evaluation metrics, indicating improved intrinsic separability between fire-related targets and background regions.
For flame segmentation, the RGB baseline achieved an IoU of 9.28% and an F1 score of 16.99%, whereas the reconstructed spectral band improved performance to an IoU of 69.58% and an F1 score of 82.06%. Similarly, for smoke segmentation, hyperspectral reconstruction increased the IoU from 56.11% to 86.45% and improved the F1 score from 71.89% to 92.73%. The Kappa coefficient also showed substantial improvement in both tasks, demonstrating stronger agreement beyond chance with ground-truth annotations and confirming the statistical reliability of the reconstructed spectral representations.
These results indicate that performance improvements arise primarily from enhanced spectral separability rather than increased algorithmic complexity. By expanding the feature space from RGB intensity information to wavelength-dependent representations, hyperspectral reconstruction enables accurate segmentation using simple threshold-based operations.
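The evaluation metrics used here and throughout Tables 3–7 follow their standard pixel-level definitions; a minimal sketch (assuming boolean prediction and ground-truth masks containing both classes) is:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Standard pixel-level segmentation metrics from a confusion matrix.
    pred, gt: boolean masks of identical shape."""
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    tn = np.sum(~pred & ~gt)  # true negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                       # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return dict(iou=iou, f1=f1, precision=precision, recall=recall, kappa=kappa)
```

Unlike accuracy, the Kappa coefficient discounts agreement expected by chance, which is why it is reported alongside IoU and F1 in the tables.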
Qualitative comparisons illustrating these improvements are presented in Figure 5, where reconstructed spectral bands exhibit clearer target–background contrast and more accurate boundary delineation than RGB-based results, corroborating the quantitative findings in Table 3.
These observations suggest that the performance improvement observed in this study is strongly associated with enhanced input representation and improved target–background separability. Notably, substantial gains are achieved even when segmentation is performed using simple threshold-based operations without introducing more complex deep architectures. Therefore, within the current experimental framework, hyperspectral reconstruction can be understood as a representation-enhancement step prior to downstream decision making. However, the extent to which this benefit transfers consistently across different segmentation backbones has not been established as a general conclusion in the present work and requires more systematic investigation in future studies.
3.3. Evaluation of Hyperspectral Segmentation Performance for Fire Targets
This section further evaluates the segmentation performance of the MST++ model in typical flame scenes. Due to the limited size of the test set, three representative scenarios were deliberately selected for performance analysis.
Figure 6a–c present the raw images, where Figure 6a represents a nighttime scene, Figure 6b depicts a large open flame, and Figure 6c shows a small open fire.
As shown in Figure 6d–f, the MST++ model was applied to flame detection and achieved promising results. It effectively captured flame-sensitive spectral bands and their distinctive features, demonstrating accurate recognition and strong edge delineation capability.
Figure 6g–i show that although the U-Net model correctly detected the general flame regions, partial merging of targets occurred under low-light conditions.
These observations indicate that the MST++ model exhibited superior segmentation performance in typical fire scenes.
As shown in Table 4, quantitative comparisons show that the U-Net model achieved an IoU of 44.42%, an F1 score of 58.15%, a Precision of 74.67%, a Recall of 68.72%, and a Kappa coefficient of 0.5625, whereas MST++ achieved an IoU of 76.90%, an F1 score of 86.81%, a Precision of 93.35%, a Recall of 82.04%, and a Kappa coefficient of 0.8603.
The experimental results demonstrate that the MST++ model provides improved performance in hyperspectral segmentation of fire targets and enhances accurate detection of fire regions in practical scenarios.
3.4. Evaluation of Smoke Detection Capability
This section evaluates the segmentation performance of the MST++ model for smoke targets. Considering the need for manual annotation of smoke datasets and the associated risk of human error, only images with clear and well-defined smoke contours were selected for testing. Moreover, to enhance the diversity of testing scenarios, images from different viewpoints were deliberately chosen: Figure 7a represents a ground-level perspective, Figure 7b an aerial view captured by a drone, and Figure 7c a satellite view, thereby covering various real-world monitoring conditions.
As shown in Figure 7d–f, the MST++ model was applied to smoke detection. Compared with the manually annotated smoke regions shown in Figure 7j–l, the results demonstrate accurate detection and clear edge delineation. The MST++ model effectively captured smoke-sensitive spectral bands and their distinctive features.
Conversely, Figure 7g–i show that the performance of the U-Net model deteriorated as the detection area increased.
These observations indicate that the MST++ model provided more reliable segmentation performance across typical smoke scenarios.
As shown in Table 5, quantitative analysis shows that the U-Net model achieved an IoU of 74.68%, an F1 score of 85.34%, a Precision of 99.81%, a Recall of 74.77%, and a Kappa coefficient of 0.7507, whereas the MST++ model achieved an IoU of 91.76%, an F1 score of 95.66%, a Precision of 97.51%, a Recall of 93.90%, and a Kappa coefficient of 0.9192.
These results indicate that the MST++ model improves hyperspectral segmentation performance for smoke targets, enabling more accurate and stable smoke detection while reducing the impact of annotation-related uncertainties and demonstrating strong generalization capability.
3.5. Evaluation of Detection Ability in Fire-Smoke Overlapping Scenarios
In practical forest fire monitoring tasks, fire and smoke frequently appear simultaneously, posing a significant challenge for accurate detection. To further evaluate the recognition capability of the MST++ model under mixed fire–smoke conditions, various representative scenes were selected for testing. Fire and smoke segmentation performances were analyzed separately under these mixed scenarios.
Figure 8a shows a small fire scene where smoke and fire are spatially separated, while Figure 8b presents a large fire scene with clear separation. Figure 8c illustrates a large fire scene with overlapping smoke and fire, and Figure 8d depicts a small fire scene with significant fire–smoke overlap. These scenarios collectively provide a comprehensive evaluation of the model’s generalization capability.
As shown in Figure 8e–h, the MST++ model was applied to flame detection in mixed fire–smoke environments. Compared with the manually annotated flame regions in Figure 8m–p, the results demonstrate strong consistency. MST++ effectively captures flame-sensitive spectral bands and distinctive flame features, achieving accurate detection along with clear and precise boundary delineation.
In contrast, Figure 8i–l show the results produced by U-Net. Although U-Net performs reasonably well in separated scenarios, its performance degrades significantly when smoke overlaps with flames. When flames are heavily obscured by smoke, U-Net frequently fails to correctly detect fire targets, resulting in false detections. In small-fire scenarios with dense smoke, U-Net even completely misses flame regions.
These observations indicate that MST++ achieves improved segmentation performance for fire targets in mixed fire–smoke scenarios compared with U-Net.
As shown in Table 6, according to the averaged quantitative results for flame detection, the U-Net model achieves an IoU of 48.25%, an F1 score of 58.46%, a Precision of 71.44%, a Recall of 50.27%, and a Kappa coefficient of 0.5677. In comparison, the MST++ model shows clear improvement, achieving an IoU of 55.55%, an F1 score of 70.42%, a Precision of 99.13%, a Recall of 55.91%, and a Kappa coefficient of 0.6846.
For smoke region recognition, Figure 9a–d present the original images corresponding to the same scenes used in the fire detection experiments. Specifically, Figure 9a represents a small fire scene with separation, Figure 9b a large fire scene with separation, Figure 9c a large fire scene with overlapping smoke and fire, and Figure 9d a small fire scene with overlapping conditions.
As illustrated in Figure 9e–h, the MST++ model was applied to smoke detection under mixed scenarios. Compared with the manually annotated smoke regions in Figure 9m–p, MST++ again demonstrates accurate detection and precise edge delineation.
By comparison, Figure 9i–l reveal that although U-Net performs satisfactorily in separated conditions, it suffers from substantial misclassification and omission errors under overlapping scenarios. Consequently, MST++ consistently outperforms U-Net in smoke segmentation tasks involving mixed fire–smoke scenes.
As shown in Table 7, quantitative evaluation further supports these findings. The U-Net model achieves an IoU of 60.76%, an F1 score of 72.91%, a Precision of 99.35%, a Recall of 60.94%, and a Kappa coefficient of 0.5584. In contrast, the MST++ model achieves superior performance, with an IoU of 89.64%, an F1 score of 94.52%, a Precision of 97.58%, a Recall of 91.67%, and a Kappa coefficient of 0.8658.
Overall, the results demonstrate that the MST++ model maintains robust segmentation and detection performance even under complex fire–smoke overlapping conditions, highlighting its strong generalization capability and practical applicability.