BayesNet: Enhancing UAV-Based Remote Sensing Scene Understanding with Quantifiable Uncertainties

: Remote sensing stands as a fundamental technique in contemporary environmental monitoring, facilitating extensive data collection and offering invaluable insights into the dynamic nature of the Earth’s surface. The advent of deep learning, particularly convolutional neural networks (CNNs), has further revolutionized this domain by enhancing scene understanding. However, despite the advancements, traditional CNN methodologies face challenges such as overfitting in imbalanced datasets and a lack of precise uncertainty quantification, crucial for extracting meaningful insights and enhancing the precision of remote sensing techniques. Addressing these critical issues, this study introduces BayesNet, a Bayesian neural network (BNN)-driven CNN model designed to normalize and estimate uncertainties, particularly aleatoric and epistemic, in remote sensing datasets. BayesNet integrates a novel channel–spatial attention module to refine feature extraction processes in remote sensing imagery, thereby ensuring a robust analysis of complex scenes. BayesNet was trained on four widely recognized unmanned aerial vehicle (UAV)-based remote sensing datasets, UCM21, RSSCN7, AID, and NWPU, and demonstrated good performance, achieving accuracies of 99.99%, 97.30%, 97.57%, and 95.44%, respectively. Notably, it has showcased superior performance over existing models in the AID, NWPU, and UCM21 datasets, with enhancements of 0.03%, 0.54%, and 0.23%, respectively. This improvement is significant in the context of complex scene classification of remote sensing images, where even slight improvements mark substantial progress against complex and highly optimized benchmarks. Moreover, a self-prepared remote sensing testing dataset is also introduced to test BayesNet against unseen data, and it achieved an accuracy of 96.39%, which showcases the effectiveness of the BayesNet in scene classification tasks.


Introduction
Remote sensing imaging technology has advanced significantly over the years, enabling satellites and unmanned aerial vehicles (UAVs) equipped with sophisticated sensors to capture high-resolution images.These images are comprehensive, of high quality, and provide an excellent platform for object recognition and scene categorization.Remote sensing methods offer a range of techniques for presenting information about the Earth's surface, including classification, detection, and scene understanding, without the need for physical interaction.
However, remote sensing objects can be challenging to discern due to their varying sizes and locations [1][2][3].To address this challenge, annotated aerial imagery is utilized to construct modern machine learning and deep learning models.Moreover, the conventional classification of remote sensing objects is often a time-consuming process, requiring experts to examine and interpret each image individually [4,5].
To expedite this process, researchers are working on developing deep learning models to understand remote sensing scenes using classification models such as the convolutional neural network (CNN).Automatic feature extraction plays a crucial role in rapidly extracting essential information from the images, which saves processing time [6,7].
CNNs provide reliable and efficient tools for automatically predicting higher-level characteristics from input data.Hu et al. demonstrated the superiority of CNN-based strategies over classic machine learning algorithms in classification tasks using hyperspectral imagery [8].This finding is crucial as it underlines the evolving trend toward more sophisticated, data-driven approaches in remote sensing, a trend that our study builds upon by exploring Bayesian methods.Similarly, Grana et al. compared deep learning algorithms and Monte Carlo approaches for classifying facies from seismic data [9].This comparison is pivotal in highlighting the strengths and weaknesses of various computational approaches, informing our choice of methodology.However, they only consider machine learning methods which do not provide the measurement of the uncertainty.
Zhang et al. [10] employed deep learning to classify seismic facies within stratigraphic sequences.Their work contributes to the broader understanding of stratigraphic sequence classification but also highlights a limitation in addressing the complex patterns of seismic data, an aspect our study aims to tackle through improved feature extraction techniques.Li et al. introduced a unique pixel-pair technique for image classification [11], offering a novel perspective on spatial relationships in imagery.Their approach, though innovative, encounters limitations in processing efficiency, a gap that our study addresses by implementing a deep learning method.Zhao et al. proposed a spectral-spatial feature-based CNN classification framework [12].Their work is significant for incorporating both spectral and spatial features, yet it underscores the challenge of integrating these features without significant computational overhead, an issue our research seeks to ameliorate.
Neeta and Saroj presented a semi-supervised classification model using neural networks [13].However, their technique's reliance on semi-supervised learning highlights a need for more robust fully supervised methods, especially in scenarios with limited labeled data, a challenge that our study addresses.Saroj proposed a deep auto-encoder neural network architecture [14] with a focus on automating feature extraction and enhancing generalization capability; while this method is innovative in leveraging neighborhood rough sets, it do not fully capture the complex, multi-dimensional nature of remote sensing data, an area where our research contributes by implementing a more comprehensive feature analysis framework.Wu and Guo's introduction of a robust interval type-2 fuzzy clustering method [15] represents a significant stride in handling uncertainties in remote sensing image classification.Their approach to address category density and object spectra uncertainties is enlightening.Nevertheless, the method's complexity in handling overlapping categories exposes the need for a more robust yet equally effective approach, which our study aims to provide.
While traditional CNNs outperform conventional machine learning approaches regarding classification, their deterministic parameters do not allow for uncertainty calculation.Additionally, deterministic CNN-based predictions may provide inaccurate classification labeling, leading to unintended consequences if not accompanied by some measure of confidence.To address these issues, various methods have been developed to examine uncertainty in classification models.One of the most effective uncertainty estimation approaches is the Bayesian CNN model.
The Bayesian CNN is a probabilistic deep learning approach that quantifies uncertainty by utilizing stochastic weights and biases, as opposed to the deterministic counterparts in traditional CNNs.This stochastic parameterization allows the Bayesian CNN to cap-ture data variability and calculate uncertainty in its predictions, resembling standard backpropagation.Furthermore, regularization of weights through variational free energy minimization is possible, similar to dropout regularization.Shridhar et al. introduced this technique known as "Bayes via backprop", which has demonstrated superior performance, including improved uncertainty measures and normalization, compared to traditional CNNs [16,17].Kendall et al. proposed a Bayesian deep learning paradigm that addresses aleatoric and epistemic uncertainty [18], leading to a 1 to 3% performance boost over deterministic models, with the Monte Carlo dropout method by Gal et al. [19].The growing popularity of Bayesian CNNs is attributed to their ability to incorporate uncertainty into predictions, which is crucial for applications like remote sensing data analysis [20,21].This uncertainty assessment contributes to enhancing data quality, preventing overfitting, and ensuring precise models, though their accuracy is not yet on par with state-of-the-art models, especially in remote sensing scene classification.
This study introduces BayesNet as a means to enhance classification performance across four remote sensing datasets.The modified RegNet deep learning model to further improve performance and propose a novel Channel-Spatial Attention module (CSAM) to enhance feature extraction.The performance of the proposed model is evaluated using standard multiclass performance evaluation metrics, and aleatoric and epistemic uncertainty measures are calculated to infer the uncertainty caused by the model and dataset.A test dataset from a drone and publicly available images are utilized to test our proposed BayesNet.The main contributions of this paper can be summarized as follows: 1.
A CNN-based remote sensing scene understanding method called BayesNet is proposed to improve the performance of the classification of four state-of-the-art datasets.
To this end, this is the first time Bayesian CNN, specifically Bayes by backpropagation with variational inference, has been used in remote sensing applications.The previous Bayesian methods were only implemented to process hyperspectral remote sensing images.

2.
The standard convolution layers are then replaced with the Bayes by backpropagation layers to bayesify the neural network.

3.
A novel Channel-Spatial Attention module is proposed to improve the feature extraction of the proposed model.

4.
BayesNet shows very good performance compared to other conventional CNN models.Moreover, the epistemic and aleatoric uncertainty is calculated using the proposed model, which can further improve the robustness of BayesNet in complex scene classification tasks.

Preliminaries
This section presented the background of the core Bayesian CNN models: the Bayesian neural network and its uncertainty quantification method.

Bayesian Neural Network
Conventional neural networks treat weights as fixed, rather than random, variables, operating under the assumption that the precise values of these weights are indeterminable and that data should be approached as a probabilistic process.Bayesian neural networks, on the other hand, infer the unknown weights of the model using data that are already known or observed [22].Utilizing Bayes' Theorem, it is possible to assign a probability distribution to these weights based on the likelihood of observed data, leading to the determination of the posterior distribution of parameters.This approach allows for the specification of a joint probability distribution that reflects the prior knowledge incorporated into the neural networks, which can be defined as follows, where P(w|d) is the posterior probability of the weights (w) given the data (d), P(d|w) is the likelihood of the data given the weights, and P(w) is the prior distribution of the weights.The likelihood of a given dataset in a Bayesian neural network (BNN) can be defined as follows.
where N represents the individual term of a dataset, y n represents the training data label, and x n represents the training data.We can define a function y = f (x) to model the relationship between inputs {x 1 , . . ., x N } and their respective outputs {y 1 , . . ., y N }.By applying Bayesian inference, we can establish a prior probability distribution P( f ) over potential functions, reflecting our initial understanding of which functions might explain our data.To update our beliefs based on observed data, we compute the posterior distribution P( f |X, Y) using Bayes' theorem, enabling the prediction of outcomes for new input x * by considering all possible functions f as follows: where P(y * |x * , X, Y) is the predictive distribution.The predictions about any given set of values can be estimated using this posterior distribution.Predictions are expressed as a probability distribution with regard to the likelihood function, E q (Pd(y * |x * )) = q θ (w|d)P w (y|X)dw, where q θ (w|d) is the variational posterior distribution of the weights, and P w (y|X) is the posterior of the weights for a given dataset.

Bayes by Backpropagation
Bayes by backprop utilizes variational inference to learn about the posterior distribution of weights in a neural network, represented as w ∼ q θ (w|d), where q θ (w|d) is a variational posterior distribution of the weights given the data d, and θ is the parameter defining this distribution.The goal is to find the optimal parameters, θ opt , that minimize the KL divergence between the variational posterior distribution q θ (w|d) and the true posterior distribution P(w|d).This divergence measures the difference between the two distributions, guiding us towards a more accurate approximation of the true posterior.The optimization problem can be expressed as follows: where KL[q θ (w|d)||P(w)] = q θ (w|d) log q θ (w|d) P(w) dw.
The KL divergence, KL[q θ (w|d)||P(w)], is an integral that compares the variational posterior distribution q θ (w|d) to the prior P(w), acting as a complexity cost.The expectation E q(w|d) [log P(d|w)] is the likelihood cost, indicating how well the weights explain the observed data d, without needing to consider log P(d) in optimization as it is constant.
Given the intractability of computing KL divergence directly, we can use a stochastic approach [16] by sampling weights from the variational posterior distribution q θ (w|d) where w (i) is the weights sampled from q θ (w|d), and n is the number of samples drawn.It balances the model's fit to the data (through the likelihood term) against the complexity of the model (through the prior term), optimizing the parameters θ to achieve a model that not only fits the data well but also incorporates uncertainty effectively.

Uncertainty Estimation
Uncertainty estimations are becoming very important for life-critical applications such as autonomous driving, remote sensing imagery, medical images, etc. Bayesian deep learning provides methods to estimate a model's and dataset's uncertainty.Uncertainty estimations can be divided into epistemic uncertainty and aleatoric uncertainty.Both uncertainties account for the variance of the probability distribution over the weights.Aleatoric uncertainty quantifies the noise that accompanies the data.This sort of uncertainty is introduced by the data collection technique, such as measurement noise or mobility noise that is consistent throughout the dataset.This cannot be minimized by collecting more data.On the other side, epistemic uncertainty is a measure of model-induced uncertainty.Epistemic uncertainty can also be reduced by providing more good data to the model.
The main object of using BayesNet for remote sensing scene classification is to estimate predictive distribution P d (y * |x * ) which represents the probability of class, y * , given a new input, x * .The relevant equation can be expressed as follows: It sums over all possible weights w, weighted by their probability given the dataset d.In our scenario, we approximate the posterior distribution of weights using Gaussian variational posterior distribution q θ (w|d) ∼ N w|µ, σ 2 .The parameter θ = {µ, σ} is learned from the dataset d.The predictive distribution is then reformulated as below; Due to the integral complexity, we can estimate it by sampling from variational posterior q θ (w|d) as follows: E q [P d (y * |x * )] = q θ (w|d)P w (y|x)dw where p w t (y * |x * ) is the prediction of the model for the t-th weight sample.We then estimate the predictive variance of these predictions across different weight samples which can be represented as Var q [P(y Var q [P(y * |x * )] estimates the spread of the predictions and thus the uncertainty of the model's output.Therefore, a high variance indicates a wide spread of predictions which lead to higher uncertainty.These variances can be further decomposed into two uncertainties such as aleatoric and epistemic.The aleatoric uncertainty represents the uncertainty in the data such as noise, which can be represented as On the other hand, epistemic uncertainty reflects the uncertainty in the model's parameters which can be from limited dataset or model complexity.It can be represented as follows.

BayesNet for Remote Sensing
Figure 1 presents an overview of our proposed approach for classification and uncertainty estimation using BayesNet.The remote sensing data is divided into training, validation, and testing sets.To improve the quality of the data, a data augmentation method was applied.The augmented data is then used to train and validate the model, resulting in a classified output sample.The aleatoric and epistemic uncertainty using softplus normalization was used to estimate uncertainty.This method provides a measure of the model's reliability by estimating the uncertainty caused by the data and the model itself.Our approach enables the development of more accurate and reliable models for remote sensing data analysis, with the ability to quantify uncertainty to support decision-making in critical applications

Dataset Acquisition
Four remote sensing datasets are used to train our proposed model.These datasets include UCM21 [23], RSSCN7 [24], AID [25], and NWPU45 [26].A detailed description of the implemented dataset can be seen in Table 1.

Data Augmentation
The datasets used in this study are divided into training and testing sets based on the author's instructions.Specifically, the UCM21 [23], RSSCN7 [24], AID [25], and NWPU45 [26] datasets are split into training-testing ratios of 80:20, 50:50, 50:50, and 50:50, respectively.Data augmentation methods such as flip, clip, and perspective are employed to enhance the quantity and variety of samples within a dataset.These techniques not only broaden the scope of the dataset but also act as a form of regularization, aiding in the reduction of overfitting when training models are based on deep learning for classification purposes.The structure of each block includes Bayes by backprop convolution layers (BBConv) and a Channel-Spatial Attention module (CSAM) for efficient feature extraction.The original block is modified by introducing BBConv and a CSAM module.An additional BBConv is added prior to implementing the Softplus activation function.The input is first passed through a 1 × 1 BBConv, followed by group 3 × 3 BBConv.The output is then passed through another two 1 × 1 BBConv.The residual layer is then implemented to add all relevant features before the Softplus activation function is applied for further processing.

BayesNet Model
Overall, the BayeNet model is designed to extract features from remote sensing images effectively by incorporating Bayes by backprop convolution layers and a CSAM attention module into the structure.This modification enhances the model's ability to recognize small images with intricate backgrounds, making it a promising solution for effective feature extraction in remote sensing applications.
Figure 3 introduces the Channel-Spatial Attention Module (CSAM), a novel addition to each block of our model, comprising the Channel Attention Block and the Spatial Attention Block, while foundational models like the Squeeze-and-Excitation (SE) block [27], Frequency Channel Attention Networks (FcaNet) [28], and the Convolution Block Attention Module (CBAM) [29] have significantly influenced CNN architectures by focusing on critical input data areas.However, CSAM is designed to solve the intricate issues of remote sensing image scene understanding, addressing limitations of previous models by simultaneously extracting attention features from both channel and spatial dimensions.This concurrent extraction process ensures a balanced feature representation, avoiding the potential bias of one dimension overshadowing the other.As illustrated in Figure 4, the architectural distinction of CSAM from SE and CBAM is evident, providing an unbiased, comprehensive feature representation that significantly surpasses the traditional, dimensionally constrained approaches.Unlike previous methods, CSAM uniquely incorporates the Discrete Fourier Transform (DFT) within its attention mechanisms, allowing for a transition of input feature maps from the spatial to the frequency domain.This integration not only preserves crucial positional data but also enhances the model's ability to interpret complex remote sensing imagery by considering both low and high-frequency components of feature maps.The inclusion of high-frequency details, often overlooked in conventional methods, is particularly pivotal in discerning subtle, yet critical aspects of remote sensing scenes.As depicted in Figure 4, CSAM's innovative approach to feature extraction and its comprehensive frequency component analysis underscore its superiority over existing models like SE, CBAM, and FcaNet, especially in the field of remote sensing image scene understanding.This highlights the novelty and potential of CSAM in the application of attention mechanisms for remote sensing image processing.
The CSAM module consists of two parts including the Channel and Spatial Attention block.The Channel Attention block focuses on the channel-wise properties of remote sensing images.The input of the Channel Attention block first undergoes a DFT layer before passing through a fully connected (FC) layer, which comprises a 1 × 1 BBConv, batch normalization, and Softplus activation function.The output then passes through another FC layer followed by a 1 × 1 Bayes by backprop Group Convolution layer (BBGConv) with a group size of 4. The sigmoid function is used to send the feature information to the output of the channel attention block. where The Spatial Attention block is responsible for acquiring feature information from a spatial perspective in remote sensing images.Initially, the input undergoes processing via a 3 × 3 BBConv, which is subsequently accompanied by batch normalization and implementation of the Softplus function.The output proceeds through a pair of distinct 3 × 3 BBConv, accompanied by batch normalization and the implementation of a Softplus activation function.The final output passes through another 3 × 3 BBConv, followed by a sigmoid function to aggregate the output.Overall, the Spatial Attention block enhances the BayeNet model's ability to extract relevant features from remote sensing images by selectively highlighting spatially important information.By combining both the Channel Attention block and the Spatial Attention block, the proposed CSAM effectively enhances the feature extraction ability of the BayeNet model for classification in remote sensing images.
The relevant aggregation equation can be seen as follows: where FM output represents the output feature map from CSAM.s 1 is the out from the channel attention block, and s 2 is the output from the spatial attention block.
The Softplus activation function is used instead of the ReLU function because it does not set the model's variance to zero or negative, even if the variance is very close to zero.The relevant equation of the Softplus activation function is defined as follows: where the default value of β is set to 1, but it should be tuned according to the structure of the model to improve its performance of the model.Figure 5 presents a visual comparison of heatmap visualizations, clearly illustrating the superior performance of our proposed CSAM in remote sensing scene classification.In contrast to conventional models, our heatmaps exhibit a pronounced concentration of deep, intense colors precisely in the target feature areas, underscoring the method's precise focus and accuracy in feature recognition.This enhanced concentration is particularly noticeable against the complex backgrounds typically encountered in remote sensing imagery, highlighting the model's capability to filter out irrelevant information and zoom in on the most salient features.Furthermore, the depth and intensity of the colors in our heatmaps are indicative of a robust and discerning attention mechanism, one that confidently pinpoints the defining attributes of each class with remarkable precision.Unlike the broader, more diffuse patterns observed in the heatmaps of other methods, our CSAM effectively defined areas of attention, demonstrating not just a refined feature extraction but also an inherent ability to reduce ambiguity and potential false positives.

Experimental Results
Experiments were conducted on the remote sensing datasets to assess the BayesNet model performance.Several available deep learning models were trained on the same dataset, and their performance was compared based on accuracy, precision, F-1 score, and recall.In addition, we also estimate uncertainty using BayesNet to provide epistemic and aleatoric uncertainty data, which are crucial in remote sensing applications.

Classification Evaluation Metrics Experiments
In order to evaluate the impact of data augmentation on BayesNet's performance, different augmentation strategies were applied and assessed across four datasets, UCM-21, RSSCN7, AID, and NWPU, as detailed in Table 2.The baseline scenario, without augmentation, established initial accuracies for each dataset.The introduction of the Auto-Augment method yielded marginal accuracy improvements across all datasets, with the most notable increase observed in UCM-21.Further experimentation with BayesNet's custom augmentation (Flip, Clip, Perspective) either matched or slightly altered the performance compared to Auto-Augment, demonstrating a consistent accuracy of 99.99% for UCM-21, a slight decrease for RSSCN7, and marginal improvements for AID and NWPU.These results indicate that while both Auto-Augment and BayesNet's tailored augmentation methods offer some benefits over the non-augmented baseline, the overall impact on model performance across the datasets was relatively subtle, suggesting a robust baseline performance and a limited, though positive, influence of the augmentation techniques on the model's accuracy.Table 3 presents the training and testing accuracy of all the deep learning models evaluated in the experiment.The results show that our proposed BayesNet model outperforms commonly used deep learning models in terms of accuracy.Specifically, BayesNet achieved 99.99% accuracy for the UCM21 dataset, 97.30% for the RSSCN7 dataset, 97.57% accuracy for the AID dataset, and 95.44% accuracy for the NWPU dataset.These findings underscore the exceptional performance exhibited by BayesNet for image classification within remote sensing imagery.The proposed model's performance is evaluated against various deep learning methods on the AID dataset from 2019 to 2022 to demonstrate its effectiveness.The comparison is conducted with a 50% training dataset, and overall accuracy is used as the evaluation metric.The proposed BayeNet model achieves an overall accuracy of 97.57%, which is 1.59% higher than VGG-VD16 with SAFF Method and 0.3% higher than MGSNet.A detailed comparison of the overall accuracy of various CNN-based methods is presented in Table 4.The comparison reveals the effectiveness of our proposed model in remote sensing scene understanding tasks.These results demonstrate the superiority of the BayesNet in processing remote sensing scene images.The confusion matrix demonstrates that BayesNet accurately classified most of the images, but there were a few instances where images were incorrectly classified as a different class, which can be seen in Figure 6.This was due to the similar feature characteristics of those classes.For example, two images of churches were misclassified as city centers, likely because both classes contain buildings with similar features.We also observed that two images from the resort class were misclassified as residential due to the similar characteristics found in those images.Despite these misclassifications, the proposed model outperformed other available deep learning models by achieving higher accuracy.Figure 7 shows BayesNet's predictive visualization on the AID dataset, depicting its proficiency across ten distinct classes of images spanning various landforms and urban structures.BayesNet can accurately identify intricate details of an 'Airport,' the sparse features of 'Bareland,' and the organized patterns of 'Farmland.'It further demonstrates high efficiency in recognizing the congested layout of 'Dense Residential' areas, the uniformity of 'Desert' landscapes, and the intricate complexity of urban 'Center' regions.Additionally, BayesNet's capability extends to accurately classifying 'Commercial' areas with their distinct structural elements, precisely pinpointing 'Bridge' structures over water bodies, recognizing the unique architectural features of 'Churches,' and discerning the natural interface in 'Beach' images.These results collectively underline BayesNet's remarkable versatility and accuracy in interpreting a diverse spectrum of natural and urban imagery within the AID dataset, affirming its robust applicability in complex scene classification tasks.

Performance Evaluation on the RSSCN7 Dataset
The performance of the BayesNet model was compared to other CNN-based methods using the RSSCN7 dataset from 2019 to 2022.The evaluation of the models was carried out based on overall accuracy, and the results showed that our proposed model achieved an overall accuracy of 97.30%, which was comparable to other state-of-the-art models.As detailed in Table 5, while BayesNet did not surpass the Channel Multi-Group Fusion method, it notably outperformed others like Branch Feature Fusion.This competitive edge highlights the efficacy of BayesNet's distinct features, including the Channel-Spatial Attention Module and Bayes by backprop convolution layers, in enhancing CNN-based models.The results not only attest to the model's substantial potential in remote sensing image classification but also highlight its stature among state-of-the-art counterparts, confirming the significant impact of the proposed architectural enhancements.Figure 8 presents the confusion matrix for BayesNet applied to the RSSCN7 dataset, effectively capturing the model's classification performance across diverse scene classes.The confusion matrix reveals a high accuracy rate, with notable exceptions in the grass and river-lake categories, where minor misclassifications occur.Specifically, similarities in visual characteristics led to the mislabeling of two grass images as fields and two river-lake images as industry.Despite these isolated instances of confusion, the performance of BayesNet remains commendable, showcasing a marked improvement over preceding deep learning models and reinforcing its proficiency and reliability in remote sensing scene classification tasks.Figure 9 illustrates BayesNet's robust predictive performance on the RSSCN7 dataset, highlighting its robustness across a spectrum of seven distinct image classes: Grass, Field, River Lake, Forest, Parking, Industry, and Resident.The model's precision is evident in its accurate identification of Grass and Field images, demonstrating its ability to differentiate between natural landscapes.Similarly, its accurate categorization of River Lake and Forest images showcases an effective understanding of aquatic features and complex vegetative patterns.In urban and industrial contexts, BayesNet's proficiency is equally evident, correctly recognizing Parking and Industry images by determining structured urban designs and complex industrial settings, and accurately predicting Resident areas, highlighting its capability to interpret urban residential layouts.

Performance Evalaution on the NWPU45 Dataset
Bayesnet efficacy in remote sensing scene understanding is evaluated through a comparative analysis on the NWPU dataset, covering developments from 2019 to 2022.Demonstrating a high accuracy of 95.44%, BayesNet not only showcases superiority over established CNN-based methods like Channel Multi-Group Fusion and Multi-Level Fusion Network but also quantifiably exceeds their performance by 1.26% and 0.54%, respectively.These comparative results, detailed in Table 6, highlight BayesNet's notable ability in classifying complex remote sensing scenes, marking a significant advancement in the field and underscoring its potential as a leading solution for intricate image analysis tasks.[63] 2020 93.55 Coutourlet CNN [58] 2021 89.57Channel Multi-Group Fusion [49] 2022 94.18 Multi-Level Fusion Network [53] 2022 94.90 MGSNet [54] 2023 94.57BayesNet -95.44 Figure 10 presents the confusion matrix of BayesNet when applied to the NWPU dataset, illustrating its performance in accurately classifying the majority of remote sensing scenes into their respective categories.Despite its generally strong performance, BayesNet indicates certain limitations, particularly in distinguishing between classes with shared attributes.A notable instance of this is the misclassification of church images as palaces, a mix-up likely stemming from their similar architectural styles and visual semblances.This specific confusion underscores a potential area for enhancement, signaling an avenue for future investigations to refine the model's discriminative capabilities, especially for classes with closely resembling features.Addressing these nuances could further elevate the precision of BayesNet, fortifying its application in complex remote sensing scene classification tasks.
Figure 11 presents the application of the BayesNet method to the NWPU dataset, focusing on ten distinct image classes: Thermal Power Station, Basketball Court, Chaparral, Airplane, Basketball Court, Airport, Bridge, Baseball Diamond, Beach, and Church.BayesNet distinctly identifies the complex industrial layout of the Thermal Power Station and the unique court markings of the Basketball Court, reflecting its acute sensitivity to specific features.Similarly, the method adeptly distinguishes the dense vegetation of the Chaparral and the distinct aerodynamic structure of the Airplane, showcasing its balanced proficiency in recognizing both natural terrains and engineered artifacts.The precise categorization extends to the systematic expanse of the Airport, the architectural marvel of the Bridge, the simplicity of the Baseball Diamond, and the natural confluence depicted in the Beach image, emphasizing BayesNet's comprehensive understanding of diverse scenes.Additionally, BayesNet's ability to accurately identify the distinctive architectural style of the Church further attests to its robust recognition capabilities.

Performance Evalaution on the UCM21 Dataset
The proposed model's performance is thoroughly evaluated by comparing it with a range of deep learning techniques on the UCM21 dataset, covering the period from 2019 to 2022.This comprehensive comparison detailed in Table 7 showcases the model's exceptional accuracy of 99.99%, which indicates its superior performance over other CNN methods.This result not only highlights the model's precision in classifying remote sensing scenes but also cements its status as a benchmark in the field.The in-depth comparison articulated in the table provides crucial insights into the relative performance of competing methods, further accentuating the BayesNet model's dominance.Its unparalleled accuracy on the UCM21 dataset validates the model's robustness and underscores its substantial potential for practical deployment in diverse remote sensing scenarios.

Method Year Overall Accuracy
Skip connected covariance network [39] 2019 97.98 Feature aggregation CNN [37] 2019 98.81 Aggregated Deep Fisher Feature [38] 2019 98.81 Scale-Free Network [64] 2019 99.05 SE-MDPMNet [55] 2019 99.09Multiple resolution BlockFeature method [65] 2019 94.19 Branch Feature Fusion [42] 2020 99.29 Gated Bidirectional Network with global feature [43] 2020 98.57Positional Context Aggregation [56] 2020 99.21 Feature Variable Significance Learning [57] 2020 98.56 Deep Discriminative Representation Learning [44] 2020 99.05 ResNet50 with transfer learning [41] 2020 98.76 VGG-VD16 with SAFF [46] 2021 97.02 Coutourlet CNN [58] 2021 99.25 EfficientNetB3 [47] 2021 99.21 Channel Multi-Group Fusion [49] 2022 99.52 Inception-ResNet-v2 [66] 2023 99.05 MGSNet [54] 2023 99.76 BayesNet -99.99 Figure 12 shows the confusion matrix for the UCM21 dataset when utilizing the proposed model.The matrix clearly illustrates the model's exceptional performance in classifying all classes accurately.This outcome emphasizes the effectiveness of the proposed model when applied to remote sensing images, highlighting its ability to correctly identify various scene categories within the dataset.By successfully classifying all classes within the UCM21 dataset, the model establishes itself as a powerful tool for remote sensing scene understanding tasks, showcasing its reliability and precision when handling complex image data.Figure 13 showcases the BayesNet method's application to the UCM21 dataset, effectively distinguishing between ten varied image classes: Beach, Intersection, Forest, Buildings, Golf Course, Agriculture, Freeway, Baseball Diamond, Dense Residential, and Airplane.BayesNet effectively identifies the Beach, capturing the distinctive features of coastal landscapes, and accurately recognizes the complex urban layout in the Intersection image.Its ability to differentiate diverse natural landscapes is evident in its precise classification of both the densely vegetated Forest and the structured farmlands in the Agriculture image.BayesNet's robustness in urban scene interpretation is further highlighted by its successful identification of Buildings, Freeway, and Dense Residential areas.The method's precision extends to specialized human-made structures, accurately recognizing the Golf Course and Baseball Diamond.Additionally, its capability to discern individual objects within their environments is underscored by the accurate prediction of the Airplane image.

Model Evaluation for Constraint Environment
The performance of the Bayesnet was assessed under various real-world noise conditions, such as rotation and cropping.The prediction probabilities generated by BayesNet for images subjected to these constraints are illustrated in Figure 14 (cropped images) and Figure 15 (rotated images).The model's robustness and adaptability under these challenging conditions are demonstrated by its ability to maintain high prediction accuracy.
For cropped images, as depicted in Figure 14, the predicted accuracy of BayesNet exceeds 99%, showcasing its resilience against this particular constraint.On the other hand, the prediction accuracy for rotated images is even higher, as evidenced in Figure 15.Despite these alterations, our proposed model effectively classifies constrained remote sensing imagery, highlighting its applicability across various constraint environments.This adaptability confirms the model's potential for detecting and classifying remote scenes under a wide range of conditions, making it a valuable tool for real-world remote sensing applications.Figure 16 illustrates the results of the proposed model's predictions for partially overlapping images.The model accurately predicted all instances of randomly overlapped images, with the lowest accuracy acquired in Figure 16c at a score of 0.999, showcasing its robustness and adaptability in dealing with partially overlapping scenes.This capability highlights the model's potential for practical applications in remote sensing and surveillance, where such scenarios are common.Its consistent high prediction accuracy in the presence of partial overlaps underscores its suitability for a wide range of remote sensing tasks.The study evaluated the proposed model's prediction performance on noisy images, crucial for its real-world applicability.In Figure 17, the model's ability to maintain high accuracy in challenging conditions is demonstrated through predictions on various noisy images with different types of introduced noise.The model accurately classified all beachclass images, with the lowest accuracy score of 0.999 observed for Figure 17c.These results highlight the model's robustness in handling diverse noise types and constrained real-world environments, highlighting its suitability for practical applications.

Uncertainty Estimation Using BayesNet
Uncertainty estimation plays a vital role in evaluating a network's confidence in its predictions, particularly in the context of remote sensing image classification.Assessing uncertainty is an essential parameter that can be inherently measured by Bayesian methodologies alongside prediction performance, offering valuable insights into the reliability of the classification process.
Figure 18 presents a comparative analysis of the uncertainty estimation capabilities of BayesNet on the NWPU and MNIST [67] datasets, employing epistemic and aleatoric uncertainty measures through softmax and normalization approaches.Notably, the same BayesNet model was applied to the MNIST dataset without prior training, emphasizing its adaptability.The analysis revealed that the normalization method outperformed softmax in uncertainty quantification.Specifically, BayesNet exhibited minimal epistemic uncertainty for the NWPU dataset, approaching zero, while showing a significantly higher uncertainty for the untrained MNIST dataset, reflecting its sensitivity to unfamiliar data.Furthermore, the study introduced noise into the test images to mimic real-world data conditions, leading to a significant increase in aleatoric uncertainty for both datasets.This heightened uncertainty under noisy scenarios underscores the model's ability to recognize and quantify the impact of data corruption, showcasing the practical applicability of BayesNet in handling real-world, noisy remote sensing data.Figure 19 provides an in-depth analysis of the uncertainty estimation of BayesNet applied to two distinct datasets, namely, AID and MNIST datasets.Utilizing these datasets, both the epistemic and aleatoric uncertainties were computed via two distinct methodologies: the softmax approach and normalization.BayesNet model was applied to the MNIST dataset without prior training.The analysis underscored the normalization method's superior efficacy over softmax in estimating uncertainties.Specifically, BayesNet exhibited significantly low epistemic uncertainty for the AID dataset, nearly zero, indicating high model confidence, whereas it registered a notably higher uncertainty for the untrained MNIST dataset, reflecting the model's sensitivity to unfamiliar data structures.Additionally, the assessment of aleatoric uncertainty across both datasets revealed that the normalization approach comprehensively quantified uncertainty, highlighting its robustness in diverse classification scenarios.This thorough analysis demonstrates BayesNet's capability in effectively discerning and quantifying uncertainties, positioning it as a reliable tool for complex dataset analyses.Figure 20 presents a comprehensive analysis of uncertainty estimation conducted using BayesNet on the UCM21 dataset, in addition to the MNIST dataset.Both epistemic and aleatoric uncertainties were calculated via two methods: normalization and softmax.The same BayesNet model, with no prior training on MNIST data, was employed to perform this analysis.As depicted in Figure 20, the normalization method demonstrated a superior performance in estimating uncertainties compared to its softmax counterpart.BayesNet demonstrated remarkably low epistemic uncertainty for the UCM21 dataset, nearing zero, indicating high confidence, while the MNIST dataset presented a markedly higher level of epistemic uncertainty.Furthermore, the assessment of aleatoric uncertainty, conducted for both datasets using normalization and softmax approaches, revealed that normalization offered a more comprehensive uncertainty measure across different classes.This analysis not only affirms BayesNet's broad applicability across varied datasets but also emphasizes the normalization method's efficacy over softmax in delivering accurate uncertainty estimations, thereby enhancing the robustness and reliability of BayesNet in complex, real-world dataset applications like UCM21.
Figure 21 offers a detailed analysis of uncertainty estimation, applying BayesNet on two distinct datasets-the RSSCN7 dataset and the MNIST dataset.The analysis focuses on the computation of both epistemic and aleatoric uncertainties, utilizing two different techniques: normalization and softmax.The same BayesNet model, untrained on MNIST, was utilized, revealing the normalization method's superior capability in estimating uncertainties over softmax, as depicted in Figure 21.Notably, the model registered minimal epistemic uncertainty for the RSSCN7 dataset, almost zero, indicating robust model confidence, whereas the MNIST dataset exhibited considerably higher epistemic uncertainty.Additionally, the analysis of aleatoric uncertainty for both datasets, through normalization and softmax, demonstrated that the normalization method offers a more comprehensive and nuanced measure of aleatoric uncertainty across different classes, underscoring the effectiveness and adaptability of BayesNet in uncertainty estimation for diverse datasets.

Ablation Study
The primary objective of this section is to demonstrate the effectiveness of the proposed method on various datasets.Table 8 presents an ablation study of the proposed model, which examines the impact of removing different components of BayesNet.The study methodically evaluates the influence of distinct components: the original RegNet architecture, the Bayesian approach via Bayes by backpropagation, the enhanced Bayes Block, and the Channel-Spatial Attention Module (CSAM).The results reveal that BayesNet, with its integrated modifications, registers substantial performance enhancements, with accuracy gains of 2.07%, 2.97%, 2.96%, and 0.41% over the baseline RegNet model.Additionally, the integration of CSAM into RegNet also demonstrates significant improvements, outperforming both the original and Bayesian-enhanced models.These findings highlight BayesNet's superior capability in feature extraction and its robustness in remote sensing image classification, showcasing its improvement beyond the original model's performance and affirming its potential in complex image analysis tasks.

Discussion
BayesNet was further evaluated on a newly proposed remote sensing testing dataset, which included 400 high-resolution images across four distinct classes: Airport, Baseball Field, Port, and Railway Station.The dataset was prepared by combining drone images and publicly available images.The flight duration for the drone reached approximately 15 min, and it was outfitted with onboard cameras featuring CMOS size, lenses, and focal lengths.Moreover, it was equipped with a standard RGB CMOS sensor, capable of capturing images with an effective resolution of 12.4 million pixels.A few samples from the proposed test dataset can be found in Figure 22.This dataset was specifically designed to test the model with a diverse range of land cover and usage scenarios, thereby providing a robust platform for testing.Initially, BayesNet, along with several existing models, was trained on the well-established AID dataset, ensuring a comprehensive and rigorous learning phase.The quantitative results of BayesNet on the proposed test dataset are summarized in Table 9.The comparative analysis highlights the superior performance of our BayesNet model, which achieved a testing accuracy of 96.39%.This not only highlights the robustness and generalizability of BayesNet but also emphasizes the potential and efficacy of probabilistic deep learning frameworks within the domain of remote sensing image analysis.Furthermore, the visualization of predictions on the testing dataset, as depicted in Figure 23, provides convincing evidence of BayesNet's effectiveness on unseen data.These visual results and quantitative data demonstrate BayesNet's ability to handle diverse and complex imagery, reinforcing the model's suitability for complex remote sensing applications.9.The comparison of testing accuracy on our proposed dataset with AID dataset trained model.

Method Accuracy
AlexNet [30] 89.32 VGG16 [31] 92.79 GoogLeNet [32] 91.46 ResNet50 [33] 94.57 ViT [34] 95.35 RegNet [35] 94.88 BayesNet 96.39 Table 10 offers a comprehensive comparative analysis of seven deep learning models, including AlexNet, VGG16, GoogleNet, ResNet50, VIT, Bayesian RegNet, and BayesNet, focusing on their number of parameters (M) and computational complexity (G).The analysis reveals a range of computational demands and scalability, with AlexNet being the least complex at 0.715 G and 61.1 M parameters, and GoogleNet following with a moderate 1.51 G complexity and 13.0 M parameters.ResNet50 and VGG-VD-16 show a marked increase in complexity and parameter count, with ResNet50 at 4.12 G and 25.56 M parameters, and VGG-VD-16 at a substantial 15.5 G and 138.36 M parameters.The VIT model, while having a high parameter count of 306.54 M, maintains a complexity level similar to VGG-VD-16 at 15.39 G.The Bayesian RegNet and BayesNet models, however, stand out with the highest parameters and complexity among the models analyzed, with Bayesian RegNet at 92.3 G and 900.3 M parameters, and BayesNet peaking with 949.85 M parameters and 93.1 G complexity.Despite the noticeable computational intensity, BayesNet's performance, characterized by its superior accuracy and efficiency, validates its computational demands, positioning it as a high-performance model suitable for advanced applications where the trade-off for computational resources is justified by significant performance With ongoing in computational hardware, the initially daunting complexity of BayesNet becomes increasingly manageable, highlighting its potential as a formidable model in cutting-edge machine learning applications.

Conclusions
This study proposed an uncertainty-aware remote sensing image understanding method using the Bayesian CNN model.The background of the Bayesian CNN model using the Bayes by backpropagation method and uncertainty estimation are discussed to show the potential of Bayesian CNN in classification.We propose BayesNet by introducing a Bayes by backprop convolution block and CSAM to improve BayesNet performance.BayesNet is implemented on the UAV-based remote sensing datasets such as UCM21, NWPU, AID, and RSSCN7 to train the proposed model.BayesNet provides impressive performance with other deep learning models on all performance evaluation metrics.The uncertainty estimations using the normalized and SoftMax methods are also calculated to estimate the aleatoric and epistemic uncertainty.The results show that the normalized uncertainty estimation can project uncertainty better than the SoftMax method.
This study enhances our understanding of uncertainties associated with Bayesian-CNN-based classification, facilitating the utilization of these uncertainties to improve model performance.Although more research on different datasets is necessary, these results suggest Bayesian CNN can enhance the classifier for various remote sensing data.Moreover, the Bayesian CNN model has higher computing costs than the conventional CNN model.More research is needed on reducing computing costs and increasing the model performance before it can be used in real-world scenarios.

Figure 1 .
Figure 1.The overall architecture of the proposed remote sensing image scene understanding classification system based on BayesNet.

Figure 2
Figure2shows the overall architecture of the BayeNet model, which comprises three parts: stem, body, and block.The body consists of four stages, and each stage consists of i blocks, with each block containing several convolution layers and an attention network.

Figure 2 .
Figure 2. The overall structure of the BayesNet model, which consists of stem, body, and head parts.

Figure 3 .Figure 4 .
Figure 3.The architecture of the CSAM block from the individual block, showcasing its detailed structure and components.

Figure 5 .
Figure 5.The visual comparison of different methods using GradCam.

Figure 7 .
Figure 7. Visualization of the efficient classification results of the BayesNet on the AID dataset.

Figure 9 .
Figure 9. Illustration of the robust classification performance of the BayesNet using the RSSCN7 dataset.

Figure 11 .
Figure 11.Visualization of the efficient classification results of the BayesNet on the NWPU dataset.

Figure 13 .
Figure 13.Demonstration of the distinctive classification capabilities of the proposed BayesNet on UCM21 dataset.

Figure 14 .
Figure 14.The predictions of BayesNet on the cropped images.Cropped images were used to evaluate the performance of the BayesNet.(a) original image, (b) prediction on cropped top left of the original image, (c) prediction on cropped top right of the original image, (d) prediction on cropped bottom left of the image, (e) prediction on cropped bottom right of the original image.

Figure 15 .
Figure 15.The predictions of BayesNet on the rotated images.Different rotated angle images were used to evaluate the performance of the BayesNet.(a) original image, (b) prediction on 90 degrees rotated image, (c) prediction on 180 degrees rotated image, (d) prediction on 270 degrees rotated image, (e) prediction on 360 degrees rotated images.

Figure 16 .
Figure 16.The predictions of BayesNet on the partially overlapping images to evaluate the performance of the BayesNet in constrained environments.(a,b) prediction of beach class partially overlapped on park image; (c,d) prediction of beach class partially overlapped on sea image.

Figure 17 .
Figure 17.The predictions of BayesNet on the noisy images, where black screen and random line noise were utilized to evaluate the performance of the BayesNet.(a) prediction of beach class with black line noise; (b) prediction of beach class with left aligned black screen noise; (c) prediction of beach class with right aligned black screen noise; (d) prediction of beach class with blue line noise.

Figure 18 .
Figure 18.The normalized and SoftMax aleatoric and epistemic uncertainty estimation for NWPU dataset using BayesNet.

Figure 19 .
Figure 19.Normalized and SoftMax-processed estimation of aleatoric and epistemic uncertainty for the AID dataset, utilizing BayesNet.

Figure 22 .
Figure 22.Sample images from the testing dataset to evaluate the performance of BayesNet on unseen data.

Figure 23 .
Figure 23.Prediction visualization using BayesNet on our introduced remote sensing testing dataset which comprises complex backgrounds.

Table 1 .
The detailed description of four remote sensing scene image datasets used in this study.

Table 3 .
The training and testing accuracy of the BayesNet and the different deep learning models.

Table 4 .
Comparative analysis of proposed models' overall accuracy on AID dataset (50:50) over the last few years.

Table 5 .
Comparative analysis of proposed models' overall accuracy on RSSCN7 dataset (50:50) over the last few years.

Table 6 .
Comparative analysis of proposed models' overall accuracy on NWPU45 (20:80) dataset over the last few years.

Table 7 .
Comparative analysis of proposed models' overall accuracy on UCM21 (80:20) dataset over the last few years.

Table 8 .
The ablation study of the proposed model on four remote sensing datasets.

Table 10 .
Comparison of parameters and complexity of different deep learning models implemented in this study.