1. Introduction
Remote sensing technology utilizes platforms such as satellites, aircraft, or drones equipped with sensors to receive electromagnetic wave reflections or radiation signals from the Earth’s surface, enabling monitoring and analysis of the Earth and its atmosphere. With the rapid advancement of remote sensing, hyperspectral imaging (HSI) technology has emerged prominently. Hyperspectral images capture continuous spectral information, allowing for precise identification of subtle features of objects, and are widely applied in agricultural monitoring [
1], environmental monitoring [
2], and resource exploration [
3]. Image classification is a critical component of remote sensing data processing, as it assigns unique labels to different land cover types. However, hyperspectral image classification faces challenges, including “same spectrum, different objects” [
4], “same object, different spectra” [
5], spectral feature complexity, and noise interference. Fortunately, LiDAR data can provide the three-dimensional spatial information lacking in hyperspectral images, enhancing object recognition. This elevation information also complements the spectral information, effectively alleviating the issues of “same spectrum, different objects” and “same object, different spectra”. Therefore, combining HSI and LiDAR data leverages the spectral advantages of hyperspectral data and the spatial advantages of LiDAR data, achieving complementary benefits and improving the accuracy of land cover classification [
6,
7]. LiDAR data alone, however, provide only limited spectral information for distinguishing different land covers.
In recent years, deep neural networks (DNNs) have demonstrated significant achievements in various domains, including image recognition, speech recognition, and natural language processing, and have been extensively employed in the joint classification of hyperspectral and LiDAR data. Prominent deep learning frameworks include convolutional neural networks (CNNs), graph convolutional networks (GCNs), and transformer networks. For instance, Hang et al. [
8] proposed a framework utilizing two coupled CNNs for the fusion of hyperspectral and LiDAR data, which notably enhanced classification accuracy by leveraging the complementary information from both data sources through feature-level and decision-level fusion methods. Chen et al. [
9] introduced a feature fusion framework based on deep CNNs, extracting spectral–spatial and spatial–elevation features and integrating them using a fully connected DNN, leading to substantial improvements in classification accuracy. Cai et al. [
10] developed a graph attention-based multimodal fusion network (GAMF) that employed parameter sharing and Gaussian tokenization for feature extraction, utilizing a graph attention mechanism to fuse the semantic relationships of multi-source data, resulting in significant advancements in classification performance. Zhang et al. [
11] proposed a transformer-based LIIT model that dynamically integrated HSI and LiDAR features through multi-branch feature embedding, a local multi-source feature interactor (L-MSFI), and a multi-source feature selection module (MSFSM), effectively addressing the challenges associated with collaborative feature extraction and fusion in multi-source data. Xue et al. [
12] presented a novel architecture known as the deep hierarchical vision transformer (DHViT), which extracted features using spectral sequence transformer and spatial hierarchical transformer and employed a cross-attention mechanism to fuse heterogeneous features, thereby enhancing classification performance effectively. Furthermore, to capitalize on the combined strengths of CNNs and GCNs, Wang et al. [
13] introduced an innovative deep learning model termed S3F2Net, which effectively extracted multimodal data features from multiple angles by integrating the properties of both CNNs and GCNs. In an effort to unify the advantages of CNNs and transformers, Zhao et al. [
14] proposed a dual-branch approach that integrated a hierarchical CNN and a transformer, extracting spectral–spatial and elevation features through the CNN and subsequently employing the self-attention mechanism of the transformer for feature fusion, significantly improving classification accuracy. Additionally, the recently proposed Mamba network has also been utilized for the joint classification of hyperspectral and LiDAR data. For example, Li et al. [
15] proposed the AFA-Mamba model, which addresses the challenges of complex information capture and effective fusion of multi-source data in the joint classification of hyperspectral and LiDAR data through adaptive feature alignment and a global–local Mamba design. He et al. [
16] developed a multi-source remote sensing data classification method grounded in the Mamba architecture, utilizing the LatSS and LonSS mechanisms to extract spatial–spectral features from hyperspectral and LiDAR data, followed by the CIF module for heterogeneous feature fusion and classification. In essence, these studies [
17,
18,
19,
20,
21] predominantly focused on feature extraction from hyperspectral and LiDAR data using deep neural networks (DNNs), followed by feature fusion to achieve improved feature representation, thereby enhancing classification accuracy. However, due to the significant scale differences of various objects in remote sensing images, single-scale feature extraction often proves inadequate in comprehensively capturing the spatial and spectral features of different target types [
22,
23].
To overcome this limitation, researchers have explored multi-scale feature extraction, aiming to effectively address the issue of significant scale differences among object types in remote sensing images by extracting features at multiple scales [
24]. Specifically, Liu et al. [
25] proposed a multi-scale and multi-directional feature extraction network (MSMD-Net) that integrated multi-scale spatial features, multi-directional spatial features, and spectral feature modules to address the challenges of insufficient utilization of multi-source information. Ni et al. [
26] introduced a multi-scale head selection transformer (MHST) network that extracted spectral–spatial features from HSI and elevation features from LiDAR using multi-scale convolutional layers, reducing redundant information with a head selection pooling transformer, thus significantly enhancing classification performance. Ge et al. [
27] proposed a cross-attention-based multi-scale convolution fusion network (CMCN), which extracted spatial–spectral-elevation features and integrated semantic information from multi-source data to achieve high-accuracy land cover classification. Feng et al. [
28] proposed a dynamic scale hierarchical fusion network (DSHFNet) that used a dynamic scale feature extraction module (DSFE) to select appropriate scale features and reduce dimensionality, employing a multi-attention mechanism for hierarchical fusion to significantly enhance classification performance. Similar works [
29,
30,
31,
32,
33] share the core idea of extracting local spectral information and global spatial context from HSI and LiDAR data at multiple scales and then fusing them, which effectively addresses the large scale differences among land-cover types and thereby improves classification accuracy.
In addition to multi-scale feature extraction, data fusion strategies also play a crucial role in the accurate classification of multi-source remote sensing images [
34]. Traditional fusion methodologies primarily concentrate on strategies at the feature, decision, or pixel levels. For instance, Zhu et al. [
35] introduced the hierarchical multi-attribute transformer (HMAT), which facilitated feature-level fusion via the hierarchical multi-feature aggregation (HMFA) module, thereby significantly enhancing the joint classification performance of hyperspectral and LiDAR data. Similarly, Li et al. [
36] developed a fusion classification approach for hyperspectral and LiDAR data utilizing a superpixel-segmentation-based local pixel neighborhood preserving embedding (SSLPNPE) method. That approach effectively improved classification performance by extracting spatial and spectral features while optimizing spatial neighborhoods through superpixel segmentation. Additionally, Jia et al. [
37] proposed a collaborative contrastive learning (CCL) method that enhanced classification performance in scenarios with limited sample sizes through collaborative feature extraction and multi-level fusion during both the pre-training and fine-tuning stages. Despite the advancements offered by these methods in classification accuracy, their effectiveness remains constrained due to an inadequate exploration of the spatial–spectral relationships and global contextual information inherent in the data. Furthermore, traditional methods exhibit sensitivity to noise and uncertainty within the data, which can lead to inaccurate classification outcomes. Consequently, deep neural networks have been employed for the fusion classification of hyperspectral and LiDAR data. Specifically, Sun et al. [
38] introduced a spectral–spatial feature tokenization transformer (SSFTT) that integrated CNNs with a transformer architecture, effectively extracting shallow spectral–spatial features and modeling high-level semantic features, thus significantly enhancing classification accuracy. Li et al. [
39] proposed a depth feature fusion technique for hyperspectral image classification utilizing a double-stream CNN, which simultaneously extracted spectral, local, and global spatial features while incorporating channel correlation to identify the most informative features, resulting in a marked improvement in classification accuracy. Wang et al. [
40] developed a multi-scale spatial–spectral cross-modal attention network (MS2CANet), which achieved significant enhancements in classification accuracy through the implementation of multi-scale pyramid convolution and an effective feature recalibration module. Despite the significant successes achieved by the aforementioned methods in the joint classification of hyperspectral and LiDAR data, several limitations and challenges remain: (1) Existing joint classification methods for hyperspectral and LiDAR data inadequately address the modeling of data uncertainty, particularly in handling the phenomena of “same spectrum, different objects” and “same object, different spectra,” which adversely affects classification accuracy. (2) Existing multimodal fusion methods often fail to effectively integrate the complementary information of spatial and spectral features across modalities, which limits the recognition capability and accuracy of classification models in complex scenes.
To address these challenges, this paper proposes a Fuzzy-Enhanced Multi-scale Cross-modal Fusion Network (FE-MCFN) for the joint classification of hyperspectral image (HSI) and LiDAR data. We innovatively incorporate fuzzy logic to enhance the feature representation capabilities of multimodal data, enabling the model to robustly handle uncertainties and redundancies within the data. Specifically, a fuzzy learning module (FLM) is constructed, which utilizes fuzzy membership functions to weight the input features. This approach captures subtle differences and uncertainty information in the data, enhancing the model’s ability to manage uncertainty effectively. Subsequently, a fuzzy fusion module (FFM) is developed, which employs fuzzy rules to eliminate redundancy and interference, thereby optimizing feature representation and ensuring that the fused features focus on regions relevant to the classification task. The proposed method effectively addresses the limitations of existing multimodal fusion techniques in handling fuzzy modality boundaries and feature uncertainty, demonstrating greater robustness and accuracy in complex scenarios.
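To make the fuzzy-weighting idea concrete, the following is a minimal PyTorch sketch of a Gaussian-membership weighting layer in the spirit of the FLM described above. The module name, tensor shapes, number of fuzzy sets, and aggregation choice are illustrative assumptions, not the exact implementation used in FE-MCFN.

```python
import torch
import torch.nn as nn

class GaussianFuzzyWeighting(nn.Module):
    """Illustrative fuzzy-learning block: weights features by Gaussian membership degrees.

    A simplified sketch of the idea in the text, not the exact FLM; the number of
    fuzzy sets and all shapes are assumptions.
    """
    def __init__(self, channels: int, num_fuzzy_sets: int = 30):
        super().__init__()
        # Learnable centers (mu) and widths (sigma) of the Gaussian membership functions.
        self.mu = nn.Parameter(torch.randn(num_fuzzy_sets, channels))
        self.log_sigma = nn.Parameter(torch.zeros(num_fuzzy_sets, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from a CNN backbone.
        b, c, h, w = x.shape
        feats = x.permute(0, 2, 3, 1).reshape(-1, c)        # (b*h*w, c)
        sigma = self.log_sigma.exp()
        # Gaussian membership of every pixel vector to every fuzzy set.
        diff = feats.unsqueeze(1) - self.mu.unsqueeze(0)     # (b*h*w, sets, c)
        membership = torch.exp(-0.5 * (diff / sigma) ** 2)   # values in (0, 1]
        # Aggregate memberships into a per-channel weight and re-weight the input.
        weight = membership.mean(dim=1)                      # (b*h*w, c)
        out = feats * weight
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

# Example: re-weight a batch of hyperspectral feature maps.
flm = GaussianFuzzyWeighting(channels=64, num_fuzzy_sets=30)
print(flm(torch.randn(2, 64, 11, 11)).shape)  # torch.Size([2, 64, 11, 11])
```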
The primary contributions of this study are outlined as follows:
- (1)
We propose a fuzzy-enhanced multi-scale cross-modal fusion network that integrates global contextual information through fuzzy logic and CNN. This approach effectively addresses the inherent uncertainties in hyperspectral and LiDAR data while leveraging their complementary nature, thereby significantly enhancing the efficacy of feature extraction and data fusion processes.
- (2)
To address the uncertainty between classes in HSI, we propose an FLM. This module employs Gaussian fuzzy membership functions to weight the features, effectively addressing issues of spectral mixing and noise interference in hyperspectral data.
- (3)
To address the limitations of existing networks in feature fusion strategies, we propose a fuzzy fusion module (FFM). This module applies fuzzy rules to compute the membership degrees of features, enabling more effective weighted fusion and focusing on regions critical for classification.
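Similarly, the weighted-fusion idea behind the FFM can be illustrated with a small sketch in which per-pixel membership degrees gate the contribution of each modality. The gating design below is an assumption for illustration only, not the exact fuzzy-rule formulation of the FFM.

```python
import torch
import torch.nn as nn

class FuzzyWeightedFusion(nn.Module):
    """Illustrative fuzzy-fusion block: fuses HSI and LiDAR features by membership weights."""
    def __init__(self, channels: int):
        super().__init__()
        # A small gate maps the concatenated features to a membership degree per modality.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),  # normalized memberships for the two modalities
        )

    def forward(self, hsi_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        # hsi_feat, lidar_feat: (batch, channels, height, width)
        memberships = self.gate(torch.cat([hsi_feat, lidar_feat], dim=1))  # (b, 2, h, w)
        w_hsi, w_lidar = memberships[:, 0:1], memberships[:, 1:2]
        # Weighted fusion: pixels where one modality is judged more reliable dominate.
        return w_hsi * hsi_feat + w_lidar * lidar_feat

# Example usage with dummy multimodal features.
ffm = FuzzyWeightedFusion(channels=64)
print(ffm(torch.randn(2, 64, 11, 11), torch.randn(2, 64, 11, 11)).shape)
```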
The structure of the subsequent sections of this paper is organized as follows:
Section 2 provides a comprehensive overview of the proposed model’s framework and explains the operational principles of each component module in detail.
Section 3 systematically presents the experimental results and offers a comprehensive analysis, including ablation studies, quantitative metric comparisons, and visual effect evaluations. Finally,
Section 4 summarizes the paper.
3. Experiments
3.1. Dataset Description
In our experiments, we evaluated the performance of the proposed method on three widely used hyperspectral–LiDAR benchmark datasets, namely Houston2013, Trento, and MUUFL:
- (1)
Houston2013 Dataset: The Houston2013 dataset was acquired in 2013 in the agricultural and urban areas in the northwest of Houston, USA. The hyperspectral image contains 144 spectral bands covering a spectral range from 0.4 to 2.5 µm, with a spatial resolution of 2.5 m. The ground truth contains 15,029 labeled samples from 15 land-cover classes, mainly representing various crops, grasslands, forests, residential areas, commercial areas, roads, and water bodies. In the experiment, 20 samples from each class were selected for training, and the rest were used as test samples.
- (2)
Trento Dataset: The Trento dataset was acquired over a rural area south of Trento, Italy, in 2007. The hyperspectral image contains 63 spectral bands covering a spectral range from 0.42 to 0.99 µm, with a spatial resolution of 1 m. The ground truth contains 30,214 labeled samples from six land-cover classes, mainly representing apple trees, buildings, ground, woods, vineyards, and roads. In the experiment, five samples from each class were selected for training, and the rest were used as test samples.
- (3)
MUUFL Dataset: The MUUFL dataset was acquired in November 2010 at the Gulf Park campus of the University of Southern Mississippi in Long Beach, Mississippi. The hyperspectral image contains 64 spectral bands covering a spectral range from 0.38 to 1.05 µm, with a spatial resolution of 0.54 m × 1.0 m. The ground truth includes 53,687 labeled samples from 11 land-cover classes, mainly representing trees, grasslands, mixed ground, and sand. In the experiment, 20 samples from each class were selected for training, and the rest were used as test samples, as illustrated in the sampling sketch below.
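The fixed per-class training splits used above (e.g., 20 samples per class for Houston2013 and MUUFL, five for Trento) can be generated as in the following minimal NumPy sketch; the function name and label encoding are illustrative assumptions rather than the exact preprocessing used in this study.

```python
import numpy as np

def split_per_class(labels: np.ndarray, n_train: int, seed: int = 0):
    """Randomly pick `n_train` labeled pixels per class for training; the rest form the test set.

    `labels` is a 1-D array of class indices for all labeled pixels (unlabeled pixels excluded).
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# Example: 20 training samples per class, as used for Houston2013 and MUUFL.
labels = np.random.randint(1, 16, size=15029)  # placeholder label vector
train_idx, test_idx = split_per_class(labels, n_train=20)
```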
The datasets are available at:
https://github.com/AnkurDeria/MFT (accessed on 9 January 2025). The names of land categories, along with the numbers of training and testing samples used in the experiments for the three datasets mentioned above, are presented in
Table 1.
Figure 4,
Figure 5 and
Figure 6 illustrate, for each of the three datasets, the hyperspectral image in false color, the LiDAR intensity image in grayscale, and the corresponding ground-truth classification map.
3.2. Experimental Setup
(1) Evaluation Metrics: To evaluate the performance of the proposed method and the comparative algorithms, three widely utilized metrics in the hyperspectral image classification domain were employed: overall accuracy (OA), average accuracy (AA), and Kappa coefficient. By comprehensively analyzing OA, AA, and the Kappa coefficient, this study offers a holistic and balanced assessment of classification performance. This approach not only emphasizes the overall classification effectiveness but also considers the performance across various classes and the statistical significance of the results.
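For reference, all three metrics follow directly from the confusion matrix; the sketch below is a minimal NumPy example of the standard definitions, not the evaluation code used in this study.

```python
import numpy as np

def classification_metrics(conf: np.ndarray):
    """Compute OA, AA, and the Kappa coefficient from a confusion matrix.

    conf[i, j] counts test samples of true class i predicted as class j.
    """
    total = conf.sum()
    # Overall accuracy: fraction of correctly classified samples.
    oa = np.trace(conf) / total
    # Average accuracy: mean of per-class recall.
    per_class = np.diag(conf) / conf.sum(axis=1)
    aa = per_class.mean()
    # Kappa: agreement corrected for chance agreement.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Example with a toy 3-class confusion matrix.
conf = np.array([[50, 2, 1], [3, 45, 2], [0, 4, 43]])
print(classification_metrics(conf))
```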
(2) Environment Configuration: The computational framework employed in this investigation was the PyTorch v2.3.0 deep learning library. The experimental hardware comprised an Intel Xeon Gold 5320 CPU (2.20 GHz; Intel Corporation, Santa Clara, CA, USA) and an NVIDIA A40 GPU (48 GB VRAM; NVIDIA Corporation, Santa Clara, CA, USA), with all processing accelerated via the CUDA 11.8 parallel computing architecture. Model training was conducted using the Adam optimization algorithm, with the learning rate selected as described in Section 3.5 and a maximum of 500 epochs. Multi-scale feature extraction was achieved through convolutional kernels of three different sizes. The batch size was maintained at 64, and the fuzzy set counts were configured at 30, 50, and 50 for the Houston2013, Trento, and MUUFL datasets, respectively. To ensure statistical robustness and result stability, each experimental procedure was repeated ten times, and the mean of the outcomes was reported as the definitive result.
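The multi-scale convolutional extraction and optimizer setup can be sketched as follows; the kernel sizes, channel counts, and learning rate shown are placeholder assumptions for illustration, not necessarily the exact values used in the experiments.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Illustrative multi-scale feature extractor with parallel convolutional branches."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Branches with different receptive fields capture objects of different spatial
        # scales; 'same' padding keeps the spatial size unchanged.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)  # assumed kernel sizes
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([branch(x) for branch in self.branches], dim=1)

model = MultiScaleConv(in_channels=30, out_channels=32)          # illustrative channel counts
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # assumed learning rate
```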
Additionally, to verify the effectiveness of the proposed method, we compared it against several state-of-the-art hyperspectral image classification algorithms, including MS2CANet [
40], CoupledCNNs [
41], MSA-GCN [
42], ExViT [
43], S3F2Net [
13], GLTNet [
44], CALC [
45], and DSHFNet [
28]. It is important to note that for all methods, the same training and test splits described in the data section were utilized to ensure a fair comparison. All experiments were conducted within an identical computing environment, and the hyperparameters for each method were fine-tuned following the recommendations provided in their respective publications.
3.3. Quantitative Results and Analysis
(1) Houston2013 Dataset: The classification results of each model on the Houston2013 dataset are presented in
Table 2, accompanied by the corresponding visual representations in
Figure 7. As indicated in
Table 2, our model outperformed all comparative methods across the three key metrics, OA, AA, and Kappa coefficient, on the Houston2013 dataset. Notably, in terms of OA, our model achieved improvements of 1.52%, 1.76%, 1.10%, 3.70%, 2.63%, 0.70%, 3.07%, and 0.90% over CALC, Coupled-CNNs, ExViT, MSA-GCN, S3F2Net, GLTNet, DSHFNet, and MS2CANet, respectively. This substantial margin illustrates that our method effectively captured both the spectral and spatial characteristics of the various categories.
Examining
Figure 7 reveals that the comparison models experienced significant challenges with edge classification and noise in their classification maps. For instance, while CALC performed adequately in the C5 category, it presented notable noise and erroneous boundary categorizations in the C6 category. ExViT demonstrated considerable misclassification when addressing the C6 and C9 categories, highlighting its limitations in processing local geographical information. MSA-GCN and DSHFNet excelled in classifying the C14 and C15 categories but struggled with accurate boundary classification in C9. Although Coupled-CNNs and S3F2Net were effective in most categories, they failed to optimally capture the local structure of the C8 category, resulting in decreased classification accuracy. GLTNet and MS2CANet displayed stability across several categories, yet issues of noise and blurred boundaries persisted in the challenging C6 category. In contrast, our model consistently outperformed the comparative models across the majority of categories, effectively reducing noise and accurately delineating the boundaries of complex categories, such as C5.
This series of results clearly indicates that our model can effectively capture the spectral and spatial features of various categories, achieving accurate classification of different types of ground objects when processing the Houston2013 dataset.
(2) Trento Dataset: On the Trento dataset, the classification results of each model are shown in
Table 3, and the corresponding visual effects are shown in
Figure 8.
Figure 8a shows the ground-truth map, and
Figure 8b–j show the classification results of different algorithms.
Table 3 shows that our model outperformed the other comparison models in the three key metrics, OA, AA, and Kappa, with respective values of 98.28%, 96.73%, and 98.28%. This outcome demonstrates that our approach captures the spatial and spectral information more precisely in the overall classification task. At the category level, our model also attained notable advantages in numerous classes. For instance, our model’s accuracy of 91.77% in the C6 category was much higher than MS2CANet’s 85.83%. This suggests that the model better balances fitting and generalization, fully utilizes spectral and spatial characteristics, and accurately distinguishes highly similar objects.
Other models clearly fell short when it came to handling boundary classification and noise issues, as seen by the classification results in
Figure 8b–j. For instance, ExViT and CALC performed well in the C4 and C5 categories, but produced more noise in the C3 and C6 categories, causing the boundaries to become blurred. S3F2Net and MS2CANet performed well in the majority of categories; however, they still misclassified samples from the C6 category, suggesting a limited capacity to capture global context. Coupled-CNNs and DSHFNet failed to handle the boundary of the C2 class, which reduced their classification accuracy. In contrast, our model’s classification results were better than those of the comparison models in most categories, which fully verifies the effectiveness of our method in local feature enhancement and spatial context processing and explains its excellent performance and robustness in complex scenes and boundary classification tasks.
(3) MUUFL Dataset: On the MUUFL dataset, the classification results of each model are shown in
Table 4, and the corresponding visual effects are shown in
Figure 9.
Figure 9a shows the ground-truth map, and
Figure 9b–j show the classification results of different algorithms. As illustrated in
Table 4, the OA, AA, and Kappa coefficient for our model were 85.32%, 84.81%, and 85.32%, respectively. These results underscore the model’s significant advantages in addressing complex land-cover classification tasks. Furthermore, our model demonstrated superior performance compared to the comparison models across the majority of categories. For instance, in the C1 category, our model effectively captured the characteristics of trees, achieving a classification accuracy of 90.25%, which markedly exceeded ExViT’s accuracy of 64.42%. This finding further corroborates the model’s capability in managing challenges associated with overlapping categories.
The classification results presented in
Figure 9b–j indicate that several models exhibited significant limitations in addressing the boundary regions of complex categories. Specifically, the CALC and GLTNet models demonstrated increased noise levels when classifying the C9 category, resulting in indistinct boundaries. Although Coupled-CNNs and MS2CANet showed commendable performance across most categories, their handling of the C3 category’s boundaries was suboptimal, which adversely affected classification accuracy. In contrast, the FE-MCFN model demonstrated superior accuracy in classifying the boundary regions of complex categories, with a notable reduction in noise, achieving a classification accuracy of 100% for the C6 category. The incorporation of fuzzy learning and fuzzy fusion within this model effectively captures the spectral–spatial complexities and mitigates boundary ambiguities between categories, thereby enhancing performance in the boundary regions of complex categories and improving overall classification consistency.
(4) Feature Distribution Analysis: To compare the feature representations learned by each model more intuitively,
Figure 10 presents the t-SNE feature distributions for S3F2Net, CALC, MS2CANet, and FE-MCFN across the three datasets. The experimental findings indicate that the boundaries between different categories in our proposed method are distinctly defined, with minimal overlap, thereby demonstrating a strong feature discrimination capability. In contrast, the other methods exhibit considerable variability in the distribution of within-class features, highlighting their limitations in addressing complex spectral mixing scenarios. Our approach dynamically adjusts feature weights through the membership functions inherent in fuzzy logic, which significantly mitigates the within-class feature variability and results in the formation of more compact clusters. Additionally, the fuzzy fusion strategy facilitates the establishment of clearer inter-class boundaries within the feature space. This methodology, which integrates fuzzy theory with deep learning, offers a novel perspective for addressing class ambiguity and mixed pixels in hyperspectral data, thereby enhancing the stability and generalization capacity of the classification system.
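Feature-distribution plots of this kind are typically obtained by running t-SNE on the penultimate-layer features of labeled test pixels. A minimal scikit-learn sketch is given below, using randomly generated placeholder features; it is not the authors' plotting code.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# features: (num_samples, feature_dim) penultimate-layer embeddings of test pixels;
# labels: (num_samples,) ground-truth class indices. Both are placeholder inputs here.
features = np.random.randn(500, 64)
labels = np.random.randint(0, 11, size=500)

# Project the high-dimensional features to 2-D for visualization.
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=4, cmap="tab20")
plt.title("t-SNE of learned features")
plt.show()
```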
3.4. Ablation Analysis
To validate the effectiveness of the various modules in the proposed FE-MCFN model, we conducted ablation experiments on the Houston2013, Trento, and MUUFL datasets. The FE-MCFN model comprises two key modules: the fuzzy learning module (FLM) and the fuzzy fusion module (FFM). By integrating individual modules and then combining both modules, we analyzed the specific contributions of each module to overall classification performance.
Table 5 presents the performance under different module combinations.
The experimental results indicate that integrating each module significantly enhanced the classification performance of the model. For instance, on the Houston2013 dataset, the overall accuracy (OA) of the baseline model was , demonstrating the CNN’s capability to capture local spectral and spatial relationships. Following the integration of the FLM, the OA improved to , indicating that the fuzzy learning module effectively enhances the model’s ability to process uncertainty information. With the addition of the FFM, the OA reached , highlighting the crucial role of the fuzzy fusion module in integrating multi-scale features and optimizing feature representation. Similar performance improvements were observed in the Trento and MUUFL datasets. On the Trento dataset, the OA of the baseline model was , while the OA of the complete model with all modules integrated increased to , demonstrating the collaborative effect of each module in enhancing classification accuracy. For the MUUFL dataset, the model’s performance improved from to , further validating the model’s robustness in handling complex data.
These experimental results indicate that the fuzzy learning module effectively addresses issues of spectral mixing and noise interference, thereby enhancing the model’s feature extraction capability. Meanwhile, the fuzzy fusion module significantly improves classification accuracy and robustness through dynamically weighted integration of multi-source features. The synergistic interaction of these modules greatly enhances the classification performance of the model.
3.5. Parameter Sensitivity Analysis
In this study, we analyzed the influence of learning rate and batch size on model performance, aiming to find the best parameter configuration to optimize model training. Different learning rates
and batch sizes
were evaluated to determine the best combination. The results are shown in
Figure 11. For the Houston2013 dataset, a higher learning rate improved the model performance. For the MUUFL dataset, when the learning rate was
or
, better accuracy was achieved with a larger batch size. On the Trento dataset, a higher learning rate combined with a larger batch size led to better classification performance and helped the model remain more stable. Therefore, a learning rate of
and a batch size of 64 were finally determined to be the best configuration for the model.
3.6. Fuzzy Set Quantity Impact Analysis
In this study, we discussed the influence of fuzzy membership set number on the performance of hyperspectral image classification model. By choosing different numbers of fuzzy sets
, we determined the best setting on the three datasets. The experimental results are shown in
Figure 12. On the Houston2013 dataset, when the number of fuzzy sets was set to 30, the model reached its best performance, with OA increased to
, AA to
, and Kappa to
, which was significantly higher than with 10 fuzzy sets. Compared with the suboptimal setting, the OA on the MUUFL dataset increased by
with 50 fuzzy sets. With 50 fuzzy sets on the Trento dataset, the OA likewise increased by
compared with the suboptimal setting. The results show that when the numbers of fuzzy sets were set to 30, 50, and 50 for the three datasets, respectively, all of them reached an optimal balance among the OA, AA, and Kappa coefficients, indicating that the number of fuzzy sets is very important to model performance.
3.7. Performance Analysis of Different Training Samples
To assess the robustness of the proposed method across varying training sample ratios, we conducted experiments utilizing three distinct datasets. The results of these experiments are illustrated in
Figure 13. On the Houston2013 and MUUFL datasets, the training samples for each category were set at
, while on the Trento dataset, the training samples were
. The findings indicate that the FE-MCFN method exhibits significant robustness and adaptability, particularly in scenarios characterized by limited sample sizes. Notably, as the number of training samples increases, the model’s classification performance consistently improves, reflecting its strong learning capabilities. This trend suggests that FE-MCFN effectively balances the extraction of both local and global features. Importantly, even with a reduced number of samples, the model maintains a high level of accuracy, demonstrating its adaptability in addressing data scarcity challenges. Across all three datasets, despite the limited training samples, FE-MCFN successfully captured essential feature information through its fuzzy learning and fusion mechanisms. This capability enhances the model’s comprehension of complex spectral and spatial relationships, thereby significantly improving classification accuracy and further substantiating its robust generalization ability.
3.8. Model Stability Analysis
To assess the classification robustness of the model across diverse scenarios, a stability analysis was performed on three datasets.
Figure 14 illustrates the OA, AA, and Kappa coefficients for each dataset. On the Houston2013 dataset, the OA, AA, and Kappa coefficients for FE-MCFN were
,
, and
, respectively, demonstrating high stability and consistency. Notably, the C14 and C15 classes achieved 100% classification accuracy, with standard deviations across 11 classes being ≤3%. On the MUUFL dataset, the model maintained a stable OA of
under complex ground object conditions. On the Trento dataset, OA reached
, with all class standard deviations being ≤0.21%, indicating excellent robustness. The experimental findings confirm that FE-MCFN attains synergistic improvements in accuracy and stability for cross-scene and multi-class classification tasks by integrating multi-scale feature extraction and adaptive optimization mechanisms, thereby demonstrating strong robustness and generalization capabilities.
3.9. Comparison of Computational Efficiency
To comprehensively evaluate the computational efficiency of the FE-MCFN model, we conducted a systematic comparison of various classification models across three datasets, focusing on metrics such as training time, testing time, number of parameters, and computational complexity (FLOPs). As shown in
Table 6, although FE-MCFN did not achieve the shortest training and testing times, it demonstrated excellent performance in terms of computational complexity, ranking third among the nine models compared, with a complexity of
G. Furthermore, FE-MCFN maintained a balance between training and testing times across different datasets. For instance, on the Houston2013 dataset, while the training time was slightly longer, the testing time was only
s, showcasing its efficient inference capability. The experimental results indicate that FE-MCFN optimizes feature extraction and fusion strategies, ensuring efficient inference speed while reducing computational complexity. This adaptability meets the demands of diverse datasets and achieves a favorable balance between accuracy and resource consumption.
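Parameter counts and inference times of the kind reported in Table 6 can be reproduced with a few lines of PyTorch, as in the generic sketch below; FLOPs are usually obtained with a separate profiling utility such as thop or fvcore. This is an illustrative measurement helper, not the benchmarking code used in this study.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters, as reported in efficiency comparisons."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def measure_inference_time(model: nn.Module, example: torch.Tensor, repeats: int = 100) -> float:
    """Average forward-pass time in seconds over `repeats` runs."""
    model.eval()
    start = time.time()
    for _ in range(repeats):
        model(example)
    return (time.time() - start) / repeats

# Example usage with an arbitrary placeholder model and input patch.
model = nn.Sequential(nn.Conv2d(30, 32, 3, padding=1), nn.ReLU(), nn.Flatten(), nn.LazyLinear(15))
print(count_parameters(model), measure_inference_time(model, torch.randn(1, 30, 11, 11)))
```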