Land-Cover Classification Using Deep Learning with High-Resolution Remote-Sensing Imagery

Abstract: Land-area classification (LAC) supports urban planning, agricultural zoning, and environmental monitoring, and it is especially relevant to urban areas with complex land-use patterns. Progress in LAC is driven by advances in high-resolution satellite imagery and machine learning, particularly convolutional neural networks (CNNs). Accurate LAC is essential for informed urban development and effective land management, yet traditional remote-sensing methods struggle to classify dynamic and complex urban land areas precisely. In this study, we therefore investigate transfer learning with the Inception-v3 and DenseNet121 architectures to build a reliable LAC system for identifying urban land-use classes. Transfer learning lets the system reuse features pre-trained on large datasets, which improves generalization and performance compared with training from scratch, and it makes effective use of limited labeled data during fine-tuning. Fine-tuning adapts this pre-existing knowledge to the intricacies of land-cover (LC) classification. Our research thus contributes to the evolution of remote-sensing methodologies and underscores the value of fine-tuning and of specific network architectures for the continued improvement of LC classification systems.
Through experiments conducted on the UCMerced_LandUse dataset, we demonstrate the effectiveness of our approach, achieving 92% accuracy, 93% recall, 92% precision, and a 92% F1-score. Heatmap analysis further elucidates the decision-making process of the models, providing insight into the classification mechanism. The successful application of CNNs to LAC, coupled with heatmap analysis, opens promising avenues for enhanced urban planning, agricultural zoning, and environmental monitoring through more accurate and automated land-area classification.


Introduction
The use of remote-sensing (RS) imagery for land-cover (LC) classification is of paramount importance across various domains, including environmental protection, agriculture, urban planning, and land resource management [1]. The recent availability of high-resolution remote-sensing (HRRS) images and the ability to gather multi-temporal and multi-source RS images from diverse geographic regions [2] present new opportunities for LC classification at multiple time scales. Nevertheless, the complex features visible in HRRS images, such as geometrical and object structures, introduce new challenges for classification [3]. Photographic distortions, scale variations, and illumination changes in RS images make it difficult to apply existing models effectively to diverse HRRS images [4]. Spectral and spectral-spatial (SS) features have traditionally been employed to interpret RS images for land-cover classification [5]. However, these features struggle to capture the contextual information in HRRS images because of the increased spatial resolution [6]. To address these challenges, deep CNNs have gained attention for their ability to comprehend HRRS images [7] and to represent semantic and high-level image properties [8,9] in tasks such as scene classification [10], object detection [11], image retrieval [12], and LC classification [13]. However, two significant issues arise when applying deep models to LC classification with multi-source HRRS data: insufficient transferability, where models struggle to adapt to images from diverse datasets, and the lack of an extensive, well-annotated land-cover dataset.
To address these issues, this study introduces an efficient and robust LAC system that uses fine-tuned versions of the Inception-v3 and DenseNet networks through transfer learning. The primary objectives are to enhance LC classification performance through the contextual understanding of deep CNNs and to conduct a comparative analysis that assesses the system's advancements and competitive edge over existing models for satellite imagery interpretation. This approach addresses the limitations posed by the diversity and complexity of HRRS images, ultimately contributing to improved land-cover classification in remote-sensing applications. Various CNN models, including Inception-v3 [14], DenseNet201 [8], and ResNet-50 [15], have been trained on a labeled dataset.
Recent research highlights innovative applications of deep learning, especially CNNs, in the unmanned aerial vehicle (UAV) domain. For instance, Chen, Wang, and Zhang [16] combine CNNs and cooperative spectrum detection to identify unauthorized drones. Haq et al. [17] employ a stacked auto-encoder deep learning approach to assess forest area accurately from UAV-captured images, with applications in forest management. Kawaguchi, Nakamura, Hadama, et al. [18] use CNNs to identify diverse drone types, achieving over 90% recognition accuracy and showing the strength of CNNs in identifying drones of various models and shapes, including radio-controlled flying objects [19].
Recent studies have also showcased diverse applications of deep learning in image classification and object detection. Chehreh, Moutinho, and Viegas [20] introduced a classification method for rotary-wing unmanned aerial vehicles and birds, notably enhancing image classification accuracy with the CDNTS layer. Youme and colleagues [21] presented an automated approach for detecting hidden waste dumps in Senegal that employs single-shot detector techniques for feature extraction, although it faces challenges in regions with imprecise ground truths. Genze and team [22] explored the generalization capabilities of deep learning for early weed detection in sorghum fields; they created expert-curated datasets and achieved strong generalization, with an F1-score exceeding 89% on testing data, even under challenging conditions such as degraded captures with motion blur and occluded plants, surpassing existing research.
Several studies apply deep learning and machine learning techniques to tasks involving UAVs and remote sensing. Shahi and colleagues [23] focus on identifying crop diseases with UAV-based remote sensing, emphasizing the role of image processing methods, assessing the effectiveness of ML and DL techniques, and exploring future research directions in UAV-based crop disease detection and classification [24,25]. Behera, Bakshi, and Sa [26,27] present lightweight CNN architectures for real-time segmentation and object extraction on IoT edge devices, achieving high performance on datasets suitable for urban and agricultural mapping, as well as road damage detection using YOLO algorithms. Shanthi and team [28] discuss the use of face identification algorithms in drones for security, identity verification, disaster relief, and other purposes, aiming to help technologists develop hybrid algorithms for real-time face recognition across diverse scenarios. These studies demonstrate the versatility of deep learning and UAV-based applications in diverse domains.
Aydin and Singha [29] applied YOLOv5, a one-shot detector, trained with augmented data and pre-trained weights, achieving a mean average precision of 90.40%. This is a 21.57% improvement over the previous YOLOv4 model when tested on the same dataset. Yao et al. [30] investigated split learning in IoD networks via simulations. Their findings reveal that the separation level has minimal impact on accuracy, that increasing the number of client-side layers extends training time, that communication overhead is a major bottleneck, that the number of clients does not significantly affect accuracy, and that training time rises slightly with more clients.
The methods described in the literature advance deep learning for drone image classification, but they are subject to various limitations. These include challenges related to data diversity, labeling accuracy, real-time processing, adverse conditions, privacy, benchmarking against existing methods, real-world deployment, and model interpretability.
To mitigate these limitations, we reduce bias and the domain gap through fine-tuning and domain adaptation, consider a more diverse and representative target dataset, and develop model interpretability tools specific to the transfer learning context. The main contributions of this study are as follows:
• We develop an efficient and reliable land-cover (LC) classification system. To achieve this, we employ fine-tuned versions of the Inception-v3 and DenseNet networks, emphasizing the impact of transfer learning. Fine-tuning enables the model to reuse knowledge from extensive datasets and adapt it to the intricacies of LC classification, which increases both the efficiency and the reliability of the model. Our research thus contributes to the evolution of remote-sensing methodologies and underscores the importance of fine-tuning in the continued enhancement of LC classification systems.
• By enhancing classification accuracy through the precise characterization of contextual information in high-resolution remote-sensing (HRRS) images, we demonstrate the significance of the capabilities of deep convolutional neural networks (CNNs). The fine-tuned networks capture intricate features, contributing to the system's accuracy and reliability in LC classification.
• A detailed comparative analysis evaluates our land-cover classification system against state-of-the-art models for satellite imagery interpretation. This assessment examines the system's performance, accuracy, and efficiency against the most advanced solutions, highlighting its advancements and competitive edge in land-cover classification. Notably, the fine-tuned models match or exceed the accuracy of these baselines.
The remainder of this work is structured as follows. Section 3 explains the comprehensive methodology of the land-area classification system. Section 4 reports the experimental results and a comparative analysis with baselines, and Section 5 concludes the article.

Proposed Methodology
This section discusses the proposed methodology for land-cover classification, which mainly comprises data preprocessing, model training, and evaluation, as shown in Figure 1. In this study, we use different deep learning models for fine-tuning and transfer learning, optimizing critical hyper-parameters such as batch size, activation function, and learning rate. The optimizer adjusts the learning rate within the neural models to enhance performance. These steps are explained in the subsequent sections.

Preprocessing
In the initial phase of the study, data are collected, typically with mobile device cameras or other vision sensors, which record the information that drives the output of subsequent deep learning (DL) and machine learning (ML) models. The preprocessing stage that follows is paramount and involves techniques such as resizing, noise reduction, and data augmentation to enhance the quality of images or videos. As a foundational step, we apply data normalization, expressed mathematically in Equation (1), in which each pixel value is scaled to the range [−1, 1].
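A min-max normalization consistent with the stated range [−1, 1] and with the symbols I, I_min, I_max, and R_i can be written as follows. This is an assumed form, since the text gives only the description of the operation:

```latex
R_i = 2 \cdot \frac{I - I_{\min}}{I_{\max} - I_{\min}} - 1 \tag{1}
```

With this form, a pixel equal to I_min maps to −1 and a pixel equal to I_max maps to 1.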
In this context, the initial data are denoted as I, where I_min and I_max are the minimum and maximum values within the input data, respectively, and R_i denotes the normalized input. Before training, all images were resized to 224 × 224 × 3. Additionally, several transformations were applied to the input images, including flipping, 15° rotation, and zooming with a range of 0.2. Data augmentation, as depicted in Figure 2, enhanced dataset variability, mitigated overfitting, and improved the model's accuracy and generalization.
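The preprocessing steps described here can be illustrated with a minimal pure-Python sketch. It assumes min-max scaling to [−1, 1] and shows only one augmentation (horizontal flipping); the function names and the toy image are hypothetical and are not taken from the study's code.

```python
def normalize(image):
    """Min-max scale a nested list of pixel values to the range [-1, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast; map it to 0 to avoid dividing by zero.
        return [[0.0 for _ in row] for row in image]
    return [[2.0 * (v - lo) / (hi - lo) - 1.0 for v in row] for row in image]


def horizontal_flip(image):
    """Mirror each row left to right, one of the augmentations named above."""
    return [list(reversed(row)) for row in image]


# Toy 2 x 3 single-channel "image" with 8-bit intensities (hypothetical values).
image = [[0, 51, 102], [153, 204, 255]]
normalized = normalize(image)
flipped = horizontal_flip(normalized)
```

In practice these operations would be applied to 224 × 224 × 3 arrays by an image library; the nested lists here only keep the example self-contained.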

Convolutional Neural Networks
The convolutional neural network (CNN) was introduced in [31] in the late 1980s and contains three main types of layers: convolution, pooling, and fully connected. Convolutional layers extract features from the input by convolving a kernel with the input image. The feature map is created by sliding a small matrix, known as the kernel (for example, 5 × 5 × 3), over the image. The pooling layer reduces the size of the feature map and comes in average, maximum, and minimum variants. In the field of computer vision (CV), CNNs have recently achieved higher performance than conventional machine learning methods. Several CNN-based architectures have been developed for image classification, each with its own advantages and drawbacks. For instance, some of these models are effective but computationally expensive, while others are lightweight and resource friendly. After a thorough analysis of recent models, we chose Inception-v3 and DenseNet121 for LAC. These models balance effectiveness and computational efficiency, which makes them suitable for accurate and efficient land-cover classification. The proposed diagram of fully connected layers is given in Figure 3.
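The sliding-kernel and pooling operations can be sketched in a few lines of pure Python. The example below is a simplified illustration, assuming a single-channel image, a hypothetical 2 × 2 kernel, stride 1, and no padding; as in most deep learning frameworks, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum the products.
            acc = sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            row.append(acc)
        feature_map.append(row)
    return feature_map


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b] for a in range(size) for b in range(size)))
        pooled.append(row)
    return pooled


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # hypothetical 2 x 2 diagonal-difference kernel
features = conv2d_valid(image, kernel)  # 4 x 4 feature map
pooled = max_pool2d(features)           # 2 x 2 map after pooling
```

A 5 × 5 input and a 2 × 2 kernel give a 4 × 4 feature map, which 2 × 2 max pooling reduces to 2 × 2; average or minimum pooling would replace `max` with the corresponding reduction.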

Inception-V3 Model
The Inception-v3 model [31] is an upgraded variant of the Inception-v1 model [31]. Inception-v3 optimizes the network in several ways for greater adaptability, and its network is larger than those of the Inception-v1 and v2 models. Training a deep CNN model on a low-configuration computer can be difficult and time-consuming; it may even take several days. Transfer learning resolves this issue by reusing the pre-trained layers and retraining only the model's last layer for the new categories. The Inception-v3 model has 48 layers; we add new layers based on our dataset classes and freeze all of the model's upper layers, as shown in Figure 4. Complex DL problems benefit from a model such as Inception-v3. Other deep learning models, such as VGG16, VGG19, and AlexNet, employ a basic stack of convolutional, pooling, and fully connected layers. Inception-v3 instead applies 1 × 1 convolutions, also referred to as point-wise convolutions, and then applies convolutional layers with varying kernel sizes in parallel. This increases the number of hidden layers and enables the network to learn more complex features, so Inception-v3 performs better on complex tasks and on feature extraction.
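To make the role of the 1 × 1 (point-wise) convolution concrete, the sketch below counts the weights of a 5 × 5 convolutional branch with and without a preceding 1 × 1 reduction. The channel sizes are illustrative assumptions and are not values reported in this study.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical branch sizes chosen for illustration: 192 input channels,
# 32 output channels, and a 1 x 1 reduction to 16 channels.
in_ch, reduced_ch, out_ch = 192, 16, 32

# A 5 x 5 convolution applied directly to all input channels.
direct = conv_weights(5, in_ch, out_ch)

# A 1 x 1 convolution first shrinks the channels, then the 5 x 5 runs on fewer channels.
bottleneck = conv_weights(1, in_ch, reduced_ch) + conv_weights(5, reduced_ch, out_ch)

savings = 1 - bottleneck / direct
```

Under these assumed sizes, the direct branch needs 153,600 weights and the reduced branch 15,872, which is why point-wise convolutions make several parallel kernel sizes affordable.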

DenseNet121 Model
DenseNet121 [32] is a deep learning architecture introduced in 2017 for computer vision and image classification applications. It is distinguished by its dense connectivity: within a dense block, every layer receives the feature maps of all preceding layers and passes its own feature maps to all subsequent layers, which encourages feature reuse and gradient flow across the network. It employs bottleneck and transition layers to regulate the number of parameters and the spatial dimensions, and thereby to manage model complexity. This architecture, shown in Figure 5, has gained popularity in the computer vision field because of its efficiency and strong performance; it addresses important issues with deep neural networks and produces excellent results in image classification applications.
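The effect of dense connectivity and transition layers on the number of feature maps can be traced with a short calculation. The sketch below assumes the commonly published DenseNet121 configuration (dense blocks of 6, 12, 24, and 16 layers, growth rate 32, 64 initial channels, and transition layers that halve the channels); these values come from the standard architecture description rather than from this paper, and the function name is hypothetical.

```python
def densenet_channels(block_sizes, growth_rate=32, init_channels=64, compression=0.5):
    """Track the channel count at the end of each dense block."""
    channels = init_channels
    trace = []
    for index, num_layers in enumerate(block_sizes):
        # Each layer in a dense block concatenates `growth_rate` new feature maps.
        channels += num_layers * growth_rate
        trace.append(channels)
        if index < len(block_sizes) - 1:
            # A transition layer compresses the channels between dense blocks.
            channels = int(channels * compression)
    return trace


# Assumed standard DenseNet121 block configuration.
trace = densenet_channels([6, 12, 24, 16])
```

The concatenation makes channels grow linearly inside each block, and the transition layers keep that growth in check between blocks.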

ResNet-50 Model
ResNet-50 [33] is a residual network, a deep neural network architecture created to overcome the difficulties involved in training extremely deep networks; the architecture is shown in Figure 6. Since its introduction in a 2015 study, it has developed into a key model in the domains of computer vision and deep learning. ResNet-50 uses residual blocks with skip connections to achieve both performance and depth. These connections allow the gradient to move through the network more efficiently, which lessens the vanishing gradient issue that frequently impedes training in deep networks.
ResNet models can therefore become extremely deep, with hundreds of layers, which enhances their accuracy and efficiency in many computer vision uses, such as image classification, object identification, and segmentation. ResNet architectures are available at different depths, with the total number of layers indicated by the name, such as ResNet-18 and ResNet-50. Because of their novel approach to residual connections, which has been widely adopted and extended to domains beyond image recognition, these models have had a significant influence on the evolution of deep neural networks. The success of deep learning has been greatly aided by the ResNet concept, which has made it possible to create deep and highly accurate models that remain at the forefront of ML and AI research.
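The effect of a skip connection on gradient flow can be made explicit with a short derivation. Writing a residual block as the sum of its input x and a learned residual mapping F, with loss L (notation assumed here for illustration, not taken from the source), the chain rule gives:

```latex
y = x + F(x), \qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

The identity term I passes the gradient to earlier layers unchanged even when the Jacobian of F is small, which is how these connections lessen the vanishing gradient issue.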

Fine-Tuning and Transfer Learning
This section outlines the steps involved in training and refining our models. Initially, we use weights pre-trained on the extensive ImageNet dataset [34]; ImageNet comprises about 14 million images, and the standard pre-trained weights are learned on its 1000-class subset. The Keras library facilitates the import of these weights, which accelerates convergence by reusing previously learned features and improves image recognition performance. Because the ImageNet weights were learned for image classification, transfer learning requires less effort than training from randomly initialized weights. Fine-tuning is then performed on the UCMerced_LandUse dataset [35]: the final layers of the base model are adjusted while all other layers are frozen to retain the pre-trained ImageNet weights. This preserves the valuable features learned from ImageNet in the initial layers. Once the proposed layers and classifier have been trained on the UCMerced_LandUse dataset, the entire network is unfrozen. Model accuracy is then evaluated on test data, using weights that reflect both the ImageNet pre-training and the UCMerced_LandUse fine-tuning. All DL models are fine-tuned with the hyper-parameters outlined in Table 1. The image input dimensions are 224 × 224 × 3, and the batch size is 32. We use an SGD optimizer and categorical cross-entropy (CC) as the loss function. The SoftMax activation function is applied to the output layer to produce class probabilities.
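The output activation and loss named above can be illustrated with a minimal pure-Python sketch. It computes SoftMax probabilities and the categorical cross-entropy for a single sample; the logits and the three-class setup are hypothetical, and a framework such as Keras would compute the same quantities over batches.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_target, probabilities, eps=1e-12):
    """Loss for one sample: minus the log-probability assigned to the true class."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_target, probabilities))


# Hypothetical logits for a 3-class toy problem; the true class is index 0.
logits = [2.0, 1.0, 0.1]
target = [1, 0, 0]
probs = softmax(logits)
loss = categorical_cross_entropy(target, probs)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is the signal the optimizer follows during fine-tuning.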

Dataset
The UCMerced_LandUse dataset is widely used in computer vision and machine learning for land-use classification tasks. It was created by the University of California, Merced, and consists of high-resolution aerial images of various land-use and LC classes, as given in Table 2. It is commonly used for tasks such as image classification, object recognition, and segmentation. Researchers and practitioners often use it to train and evaluate ML algorithms, especially for image classification tasks related to land-use and LC mapping. It provides a valuable resource for testing the effectiveness of different models and techniques in remote-sensing and computer vision applications.

Evaluation Parameters
The evaluation parameters used in this study are F1-score, accuracy, precision, and recall.
Accuracy is the ratio of correctly predicted observations to all observations. It is assessed using Equation (2), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
Precision is the ratio of correctly predicted positive observations to all predicted positive observations, as shown in Equation (3).
Recall is the ratio of correctly predicted positive observations to all observations that actually belong to the class, as shown in Equation (4).
The F1-score is the harmonic mean of precision and recall. It is computed using Equation (5).
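For reference, the standard textbook definitions of these four metrics, in terms of the TP, TN, FP, and FN counts, can be written as follows. The equation numbers are matched to the references in the text as an assumption:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{2}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{3}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{4}\\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}
\end{align}
```

For a multi-class problem such as land-use classification, these quantities are typically computed per class and then averaged.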
The hyper-parameter configuration is used to fine-tune the CNN networks. All models are trained with a learning rate of 0.0001 and a batch size of 32 using the Adam optimizer. Using the UCMerced_LandUse dataset, we fine-tune the Inception-v3, DenseNet121, and ResNet-50 models by freezing all top layers and adding new layers according to the number of classes in the dataset. Each model is trained using Python 3.8, TensorFlow 2.13.0, sklearn 1.0, matplotlib 3.7.3, and NumPy 1.24.3 on an Intel(R) Core(TM) i7-8700 CPU with 16 GB of RAM (Seoul, South Korea).

Experimental Results
Evaluations and experiments are conducted on three pre-trained models: DenseNet121, Inception-v3, and ResNet-50. The assessment metrics are F1-score, recall, accuracy, and precision. Each model is trained for fifty epochs. Upon completion of training, the testing accuracies are 95% for Inception-v3, 94% for DenseNet121, and 93% for ResNet-50, and the validation accuracies are 92%, 91%, and 91%, respectively. Table 3 shows the precision, recall, and F1-score for each model on the land-area classes. These results indicate that the models generalize well to unseen data. Figure 7 displays the models' visual results on static images: actual labels are shown in the first row, followed by the labels predicted by Inception-v3 in the second row, by DenseNet121 in the third row, and by ResNet-50 in the last row. Correct predictions are displayed in black and incorrect predictions in red. Table 3 shows that Inception-v3 performs better than DenseNet121 and ResNet-50, which misclassify certain labels, such as "river", "runway", and "airplane". The confusion matrices and heatmaps for the models and classes are displayed in Figures 8 and 9, respectively. DenseNet121 and ResNet-50 are less effective than Inception-v3 because their accuracies on some classes are comparatively lower.
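Per-class precision, recall, and F1-score of the kind reported here can be derived directly from a confusion matrix. The sketch below is a minimal pure-Python illustration; the 3 × 3 matrix and its counts are invented for the example and are not results from this study, and the function names are hypothetical.

```python
def per_class_metrics(confusion):
    """Return {class_index: (precision, recall, f1)} from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    metrics = {}
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[k] = (precision, recall, f1)
    return metrics


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Hypothetical 3-class confusion matrix (for example river, runway, airplane);
# the counts are invented for illustration only.
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = per_class_metrics(confusion)
overall = accuracy(confusion)
```

Averaging the per-class values gives macro-averaged scores, which can differ slightly from the harmonic mean of the averaged precision and recall.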

Comparison with Contemporary Techniques
The comparative analysis presented in Table 4 assesses the performance of the proposed models against contemporary deep learning techniques. The first model, EfficientNet [36], achieves an accuracy of 83%, a precision of 83%, a recall of 71%, and an F1-score of 77%. The second model, a CNN [37], achieves an accuracy of 89%, a precision of 87%, a recall of 84%, and an F1-score of 85%, outperforming EfficientNet in accuracy and F1-score. The third model, ResNet34 [38], achieves an accuracy of 70%, a precision of 70%, a recall of 70%, and an F1-score of 71%. Although its performance is balanced across metrics, it falls behind the other models in accuracy. The proposed Inception-v3 model surpasses all others, with an accuracy of 92%, a precision of 93%, a recall of 92%, and an F1-score of 92%. The DenseNet121 model in the proposed system achieves an accuracy of 91%, a precision of 91%, a recall of 90%, and an F1-score of 89%. Although slightly lower than Inception-v3, it outperforms EfficientNet, CNN, and ResNet34. Lastly, the ResNet-50 model in the proposed system yields an accuracy of 91%, a precision of 90%, a recall of 90%, and an F1-score of 88%, which remains competitive with the other models, particularly in accuracy and F1-score. In short, the proposed Inception-v3 model stands out in terms of accuracy and F1-score, showing that the suggested system surpasses the state-of-the-art techniques it is compared with.

Ablation Study
In a comparison of the AID (Aerial Image Dataset) and UC Merced datasets, UC Merced comes out on top for certain applications. It offers a good balance of manageable dataset size and high-resolution aerial images, which makes it suitable for tasks such as land-use classification. Its moderate scale allows efficient model training and testing without compromising scene diversity, and its wide range of land-use categories provides a comprehensive basis for training robust models. The higher resolution of the dataset aids detailed analysis of aerial imagery, which is useful for tasks that require precision in land-area classification. While both datasets are useful for aerial image analysis, the UC Merced dataset's balance of size, diversity, and resolution makes it the better choice for applications that require both efficiency and detailed scene understanding. Additional experiments are performed with different models to assess their generalization abilities more fully. The detailed results of each model are given in Table 5.
ResNet-50 91% 85%

Conclusions
In this study, we introduced a robust approach to land-area scene classification (LASC) by implementing three distinct deep learning (DL) models (Inception-v3, DenseNet121, and ResNet-50) with transfer learning. Our LASC system achieves accuracy rates of 92%, 91%, and 91%, respectively, in classifying both static and real-time land-area scenes from images. While these results show promise, there remains potential for further enhancement. The system can evolve to classify multiple land-area scene categories in real time. Such advancements could contribute significantly to improved urban planning and land management across both urban and rural areas, benefiting a diverse range of stakeholders. Across the performance metrics of recall, accuracy, precision, and F1-score, Inception-v3 emerges as the top performer, ahead of DenseNet121 and ResNet-50. Inception-v3 classifies land-area scenes in images with 92% accuracy, 93% precision, 92% recall, and a 92% F1-score. DenseNet121 follows closely with scores of 91%, 91%, 90%, and 89%, while the ResNet-50 model achieves scores of 91%, 91%, 90%, and 88%, respectively.
Our future work on the LASC system will focus on the development of robotics and smartphone applications. Additionally, we aim to enhance the system's accuracy by creating an improved CNN that incorporates advanced data fusion techniques. This approach is expected to yield more precise land-area scene classification and, ultimately, more informed decision-making in land management and urban planning. The widespread applications of these advancements underscore the potential benefits for stakeholders in diverse domains.

Figure 1. The high-level diagram of the proposed work for LAC.

Figure 2. Data augmentation techniques, such as flipping, rotation, zooming, and cropping, employed to augment the training dataset and improve the model performance.

Figure 3. Proposed diagram of fully connected layers.

Figure 4. Architecture of the Inception-v3 model with the proposed fully connected layers.

Figure 5. Architecture of the DenseNet121 model with the proposed fully connected layers.

Figure 6. Architecture of the ResNet-50 model with the proposed fully connected layers.

Figure 7. Visual results of the three models. Red is used to mark incorrect predictions: where the other models are confused, the Inception model still predicts all classes correctly.

Table 4. Comparative analysis of the proposed system with SOTA methods.

Table 5. Comparative analysis of datasets.