Limiting the Collection of Ground Truth Data for Land Use and Land Cover Maps with Machine Learning Algorithms

Land use and land cover (LULC) classification maps help in understanding the state and trends of agricultural production and provide insights for applications in environmental monitoring. A major drawback of LULC mapping is its need for ground truth data to validate the maps. This paper aimed to evaluate the efficiency of machine learning (ML) in limiting the use of ground truth data for LULC maps. This was accomplished by (1) extracting reliable LULC information from Sentinel-2 and Landsat-8 images, (2) generating remote sensing indices used to train ML algorithms, and (3) comparing the results with ground truth data. The remote sensing indices tested were the difference vegetation index (DVI), the normalized difference vegetation index (NDVI), the normalized difference built-up index (NDBI), the urban index (UI), and the normalized bare land index (NBLI). The extracted indices were evaluated with three ML algorithms, namely, random forest (RF), k-nearest neighbour (K-NN), and k-dimensional tree (KD-Tree). The accuracy of these algorithms was assessed with standard statistical measures and ground truth data randomly collected in Prince Edward Island, Canada. Results showed that high kappa coefficient values were achieved by K-NN (82% and 74%), KD-Tree (80% and 78%), and RF (83% and 73%) for Sentinel-2A and Landsat-8 imagery, respectively. RF was a better classifier than K-NN and KD-Tree and had the highest overall accuracy with Sentinel-2A satellite images (92%). This approach provides a basis for limiting the collection of ground truth data, thereby reducing the labour cost, time, and resources needed to collect ground truth data for LULC maps.


Introduction
Land use and land cover (LULC) classification is one of the most widely researched topics in remote sensing, as it provides valuable information for urban planning, resource management, environmental monitoring, and agricultural mapping [1]. LULC classification can be used to highlight historical trends or to provide evidence-based tools for decision making in resource management [2]. For several years, satellite imagery has been used for LULC classification with a variety of statistical and empirical methods. Unfortunately, these methods have several limitations in terms of accuracy, as each satellite has different spectral, temporal, and radiometric resolutions [3]. Recently, the data science and remote sensing communities have achieved higher accuracy owing to the launch of new satellite constellations and advances in machine learning (ML) algorithms [4]. Furthermore, free access to data from Earth observation satellites, including Sentinel-2 and Landsat-8, has […] that supervised classification algorithms are likely to perform better than unsupervised classification algorithms.
LULC maps require an appropriate classification algorithm to solve real-world problems with high accuracy [25]. Several studies have shown the potential of ML and statistical algorithms in LULC classification. For example, Desai and Umrikar [25] tested two supervised classifiers, maximum likelihood and minimum distance, for LULC classification using Landsat imagery; the maximum likelihood classification achieved a higher accuracy than the minimum distance method. Nguyen et al. [26] used ground truth data to train ML algorithms for LULC classification. Jia et al. [27] tested support vector machine (SVM) and maximum likelihood algorithms on Landsat-8 imagery and found the maximum likelihood classifier more accurate than SVM. Over time, more advanced algorithms, including decision trees (DT) and random forest (RF), have been used in LULC classification. Thanh Noi and Kappas [28] tested RF, k-nearest neighbour (K-NN), and SVM with training samples of different sizes generated from Sentinel imagery. SVM had the highest accuracy and was the least sensitive to the size of the training sample; however, K-NN and RF attained a higher accuracy than SVM with large training samples.
The literature also reveals that each classification method performs differently depending on the satellite used to capture the images. For example, Jia et al. [27] compared Landsat-7 and Landsat-8 using similar algorithms and found a higher accuracy with Landsat-8. Similarly, Ali et al. [29] recorded a higher accuracy with ALOS-2 dual-polarization bands than with Landsat-8 optical imagery using a maximum likelihood classifier. Clerici et al. [30] tested Sentinel-1 and Sentinel-2 imagery to enhance mapping accuracy and found a higher accuracy for Sentinel-2 than for Sentinel-1 data with the SVM algorithm. These results show that the accuracy of LULC maps depends on both the choice of satellite and the classification algorithm.
Given the high cost of collecting ground truth points and the growing demand for efficient natural resource management, the objective of this study was to evaluate the efficiency of ML algorithms in limiting the use of ground truth data for LULC maps. This was accomplished by extracting LULC information from Sentinel-2 and Landsat-8 satellite images and by generating remote sensing indices used to train ML algorithms. The results are presented in three parts. First, the outputs of the ML algorithms were evaluated against ground truth data. Second, standard statistical measures were used to evaluate the performance of each ML algorithm. Third, the algorithms were compared with each other to better understand their performance.
The paper is divided into two main sections: materials and methods, and results and discussion. The materials used in this investigation and their processing are described in the materials and methods section. The accuracy of the ML algorithms is examined in the results and discussion section and compared with the findings of prior studies.

The Study Area
The study area consists of Prince Edward Island (PEI), one of Canada's smallest provinces, with a land area of approximately 5669 square kilometres (Figure 1). In 2019, the province had a total population of 157,262, less than 0.5% of Canada's total population [31]. The Island's climate is mild and strongly influenced by the warm waters of the Gulf of St. Lawrence [32]. PEI has a wide variety of land uses, including forests, agriculture, meadows, water, wetlands, and urban areas.

Data Acquisition
Two types of satellite images were evaluated because the literature reveals that classification methods perform differently on different types of satellite imagery [33]. A total of seven satellite scenes, acquired between 7 July and 28 July 2019, were obtained from the USGS website (Table 1). The Sentinel-2 mission is a constellation of two twin satellites, Sentinel-2A and Sentinel-2B. Operating simultaneously in the same orbit, phased at 180° to each other, they can monitor variability in land surface conditions every 5 days [34]. Sentinel-2 satellites acquire optical imagery at resolutions ranging from 10 to 60 m, depending on the spectral band. Coverage extends from 56° S to 84° N latitude, with a swath width of 290 km.
Landsat-8 is also an Earth observation satellite, equipped with two payloads that collect 11 spectral bands at spatial resolutions ranging from 15 to 100 m. Landsat-8 was selected because of its improved imaging in the visible bands compared with other Landsat satellites [27]. Landsat-8 has improved capabilities over the previous generation owing to the addition of new spectral bands in the blue spectrum, two new thermal bands, and an enhanced duty cycle that has increased the satellite's daily image collection capacity [35].
The Landsat-8 scenes with the lowest available cloud cover were selected to minimize the scattering and absorption of light in the atmosphere (Table 1). In addition, the scenes were taken from the Collection 1 Level-1 products, which are already geometrically and radiometrically corrected.

Data Preparation
The Sentinel-2A and Landsat-8 images were processed in the Sentinel Application Platform (SNAP) version 8.0.0. All Sentinel-2A and Landsat-8 bands were resampled in SNAP with the nearest neighbour method to 20 and 30 m resolution, respectively. The resampled images were mosaicked to cover PEI's provincial boundary using SNAP's built-in raster mosaicking tool. Three Landsat-8 scenes were mosaicked to cover the entire Island: two were acquired on 26 July 2019 and one on 19 July 2019. The satellite images were then reprojected to a local coordinate system, imported into ArcGIS Pro, and used to create training data for the LULC maps.
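Nearest-neighbour resampling assigns each output pixel the value of the input pixel whose footprint contains the output pixel's centre, so no new reflectance values are created. The Python/NumPy sketch below illustrates the idea for a single band on an axis-aligned grid; it is not SNAP's implementation, and the function name `nearest_resample` is hypothetical.

```python
import numpy as np


def nearest_resample(band: np.ndarray, src_res: float, dst_res: float) -> np.ndarray:
    """Resample a 2-D band from src_res to dst_res (metres) by nearest neighbour.

    Each output pixel copies the source pixel containing the output pixel's
    centre, so original values are preserved (no interpolation).
    """
    rows, cols = band.shape
    scale = dst_res / src_res  # > 1 coarsens (10 m -> 20 m), < 1 refines (60 m -> 20 m)
    out_rows = int(round(rows / scale))
    out_cols = int(round(cols / scale))
    # Centre of each output pixel in source-pixel units, truncated to an index.
    row_idx = np.minimum(((np.arange(out_rows) + 0.5) * scale).astype(int), rows - 1)
    col_idx = np.minimum(((np.arange(out_cols) + 0.5) * scale).astype(int), cols - 1)
    return band[np.ix_(row_idx, col_idx)]


# Hypothetical bands: a 10 m band (4 x 4 pixels) and a 60 m band (2 x 2 pixels), both taken to 20 m.
b10 = np.arange(16).reshape(4, 4)
b60 = np.array([[1, 2], [3, 4]])
print(nearest_resample(b10, 10, 20))        # 2 x 2 result
print(nearest_resample(b60, 60, 20).shape)  # each 60 m pixel becomes a 3 x 3 block
```

Coarsening keeps one source value per output pixel, and refining simply replicates values, which is why nearest neighbour is a common choice when the resampled reflectances feed later index calculations.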

Remote Sensing Indices and LULC Classes
Vegetation indices such as the normalized difference vegetation index (NDVI) or the soil adjusted vegetation index (SAVI) can be derived from remotely sensed data. Vegetation indices are simple to generate from multispectral satellite imagery and are effective tools for evaluating vegetation cover quantitatively and qualitatively. Similarly, an urban index such as the normalized difference built-up index (NDBI) can be used to identify urban features on satellite images. In the hands of trained geospatial analysts, remote sensing indices can highlight different types of land cover and are particularly useful for training the classifiers used in LULC mapping.
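The indices discussed here are simple band arithmetic. The sketch below uses commonly published definitions, which are assumptions here (the band combinations actually used are listed in Table 2): DVI = NIR - Red, NDVI = (NIR - Red)/(NIR + Red), NDBI = (SWIR1 - NIR)/(SWIR1 + NIR), UI = (SWIR2 - NIR)/(SWIR2 + NIR), and NBLI = (Red - TIR)/(Red + TIR). That NBLI form needs a thermal band, which Landsat-8 provides but Sentinel-2 does not. Inputs are assumed to be reflectance arrays already on a common grid.

```python
import numpy as np


def _normalized_difference(a, b):
    """(a - b) / (a + b), returning 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.zeros_like(denom), where=denom != 0)


def dvi(nir, red):
    """Difference vegetation index (assumed definition: NIR - Red)."""
    return np.asarray(nir, dtype=float) - np.asarray(red, dtype=float)


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return _normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Normalized difference built-up index (assumed: SWIR1 vs NIR)."""
    return _normalized_difference(swir1, nir)


def ui(swir2, nir):
    """Urban index (assumed: SWIR2 vs NIR)."""
    return _normalized_difference(swir2, nir)


def nbli(red, tir):
    """Normalized bare land index (assumed: red vs thermal infrared)."""
    return _normalized_difference(red, tir)


# Two hypothetical pixels: a vegetated one and a built-up one.
nir = np.array([0.45, 0.20])
red = np.array([0.05, 0.15])
swir1 = np.array([0.20, 0.30])
print(ndvi(nir, red), dvi(nir, red), ndbi(swir1, nir))
```

The vegetated pixel gets a high NDVI and a negative NDBI, while the built-up pixel shows the reverse, which is the contrast the indices are meant to expose when delineating classes.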
In this study, the difference vegetation index (DVI) and NDVI were used to identify vegetation cover [7]. Because agriculture and forest have similar values in both indices, barren land was identified with the normalized bare land index (NBLI) to overcome this issue (Figure 2A1-A5,B1-B5). The NBLI highlights soil composition and helps differentiate agriculture from forested areas (Figure 2A5,B5) [11]. Urban features were identified with the NDBI and the urban index (UI), because these indices distinguish barren land from urban features [11]. The NDBI results (Figure 2A3,B3) showed that some pixels representing urban areas on the Landsat-8 and Sentinel images were mixed with bare land features. This issue was resolved with the UI, which identifies urban features more precisely (Figure 2).
The remote sensing indices presented in Table 2 were used to delineate the LULC classes. Four LULC classes, namely agriculture, urban, barren land, and forest, were identified in the study area (Table 3). These indices were used to extract the training samples for LULC classification in ArcGIS Pro. A total of 2000 training samples, 500 for each class, were created to train the classifiers. This sample size was considered large enough because it adequately covers the entire study area without exhausting the classifiers' computing power.
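A balanced training set like the one described above (500 samples for each of four classes) can be drawn from any per-pixel label map. The sketch below assumes a hypothetical integer raster in which index-based rules have already marked candidate pixels with class codes 1-4 (0 = unlabelled); the class codes, the `balanced_sample` helper, and the random demo raster are illustrative only, and the study itself created its samples in ArcGIS Pro.

```python
import numpy as np

# Hypothetical class codes for the four LULC classes.
CLASSES = {1: "agriculture", 2: "urban", 3: "barren land", 4: "forest"}


def balanced_sample(label_raster, per_class=500, seed=0):
    """Draw `per_class` distinct random pixel locations (row, col) for every class code."""
    rng = np.random.default_rng(seed)
    samples = {}
    for code in CLASSES:
        rows, cols = np.nonzero(label_raster == code)
        if rows.size < per_class:
            raise ValueError(f"class {code} has only {rows.size} candidate pixels")
        pick = rng.choice(rows.size, size=per_class, replace=False)  # no duplicate pixels
        samples[code] = np.column_stack((rows[pick], cols[pick]))
    return samples


# Random stand-in for an index-derived candidate map (values 0-4).
demo = np.random.default_rng(1).integers(0, 5, size=(200, 200))
training = balanced_sample(demo)
print(sum(len(v) for v in training.values()))  # total number of training samples
```

Sampling without replacement within each class keeps the classes equally weighted, so no single land cover type dominates the classifier's training data.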

Random Forest
RF is an ensemble of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees (Figure 3) [40]. Boosting and bagging are two ensemble methods capable of squeezing additional predictive accuracy out of classification algorithms. Bagging is used to reduce the variance of models that overfit the training data, while boosting increases the complexity of the models it combines. Training samples that are not drawn to build a given tree are used to evaluate it and are referred to as 'out-of-bag' samples [4]. In addition, the RF classifier is easy to use, since it has only two main parameters (the number of variables tried at each node and the number of trees) and is not very sensitive to their values [41]. The number of trees and the number of predictors are nonetheless vital to achieving the highest possible accuracy. In this study, the number of trees was set to 50, the maximum tree depth to 30, and the number of samples per class to 500.
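The RF settings above can be mirrored with scikit-learn's `RandomForestClassifier`, used here only as a stand-in for the software used in the study (ArcGIS Pro). In this sketch, the five features are synthetic values standing in for the index layers, 500 samples are drawn per class, and `oob_score=True` reports the out-of-bag accuracy described above; the data and random seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_per_class, n_classes, n_features = 500, 4, 5  # 4 classes x 500 samples, 5 index layers

# Synthetic stand-in for index values (e.g. DVI, NDVI, NDBI, UI, NBLI) at training pixels:
# each class is a Gaussian cloud around its own mean.
means = 3.0 * np.eye(n_classes, n_features)
X = np.vstack([rng.normal(means[k], 0.5, size=(n_per_class, n_features)) for k in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

rf = RandomForestClassifier(
    n_estimators=50,  # number of trees, as in the text
    max_depth=30,     # maximum tree depth, as in the text
    bootstrap=True,   # bagging: each tree is fitted to a bootstrap sample
    oob_score=True,   # score each sample with the trees that did not see it
    random_state=0,
)
rf.fit(X, y)
print(f"out-of-bag accuracy: {rf.oob_score_:.3f}")
```

Because each tree is fitted to a bootstrap sample, roughly a third of the training samples are left out of any given tree, which is what makes the out-of-bag estimate possible without a separate validation set.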

K-Nearest Neighbour
K-NN is a supervised ML algorithm that can be used to solve classification and regression problems. It was first described in an unpublished report [42], and more detailed K-NN rules were later published [43]. It assigns each object to the class most common among its nearest neighbours, so the main deciding factor in the classification is the number of neighbours (k) used to classify an object (Figure 4). Small k values tend to give relatively inaccurate results, whereas larger k values give more reliable results [44]. Through trial and error, the optimal k value was found to be 20.
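The voting rule described above can be written in a few lines. The NumPy sketch below is a brute-force illustration (every training sample is compared with every query point using Euclidean distance), with `k = 20` as the default to match the text; the function name `knn_predict` and the use of Euclidean distance are assumptions, not details reported for the study.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=20):
    """Classify each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training samples
    votes = y_train[nearest]                 # their class labels
    n_classes = int(y_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)             # ties go to the lowest class code


# Toy example: two well-separated clusters, k = 3.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_toy, y_toy, [[0.5, 0.5], [10.5, 10.5]], k=3)
print(pred)
```

A larger k smooths the decision over more neighbours, which is consistent with the observation above that very small k values tend to be less reliable.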

K Dimensional-Tree
The KD-Tree is the most common binary tree structure used in the nearest neighbour family of algorithms. In a KD-Tree, the data are recursively partitioned at the median along the x and y axes in turn (Figure 5), and points are categorized based on their projections in lower dimensions [45]. For low-dimensional datasets, the KD-Tree is designed to perform better than other structures such as the ball tree [46]. To compare the accuracy of KD-Tree on both satellites, the number of training samples was set to 2000 and the number of neighbours to 20. As with the k value in the K-NN algorithm, the optimal number of neighbours was determined through trial and error.
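A KD-Tree does not change the nearest neighbour voting rule; it only speeds up the search for the neighbours. The sketch below assumes SciPy 1.6 or later, whose `scipy.spatial.KDTree` with `balanced_tree=True` splits nodes at the median, and then applies a majority vote over the `k = 20` neighbours returned by `KDTree.query`; the helper name `kdtree_knn_predict` is hypothetical.

```python
import numpy as np
from scipy.spatial import KDTree


def kdtree_knn_predict(X_train, y_train, X_query, k=20):
    """Majority vote over the k nearest training samples, found with a KD-Tree index."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    y_train = np.asarray(y_train)
    k = min(k, len(X_train))  # cannot request more neighbours than training samples
    # balanced_tree=True splits each node at the median coordinate.
    tree = KDTree(X_train, balanced_tree=True)
    _, idx = tree.query(X_query, k=k)               # shape (n_query, k) when k > 1
    idx = np.asarray(idx).reshape(len(X_query), k)  # also covers k == 1 (k axis dropped)
    votes = y_train[idx]
    n_classes = int(y_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)


# Toy example: two well-separated clusters, k = 3.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
pred = kdtree_knn_predict(X_toy, y_toy, [[0.5, 0.5], [10.5, 10.5]], k=3)
print(pred)
```

Building the tree costs extra time up front, but each query then visits only a few branches instead of every training sample, which is where the advantage for low-dimensional data comes from.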

Ground Truth Data for Validation and Model Evaluation Criteria
Five sites on PEI were selected for ground truth data collection. A total of 200 validation points were collected at each site using a Real-Time Kinematic (RTK) GPS with sub-meter accuracy. The points were distributed equally among the four classes, so that 50 points were collected per class at each site. The same ground truth data were used to validate the LULC maps generated with Sentinel-2A and Landsat-8 imagery.
Several statistical indicators were used to assess the accuracy of the models. Overall accuracy describes the proportion of correctly mapped pixels; an overall accuracy of 100% means that all reference sites were classified correctly [47]. The overall accuracy was calculated using the following formula:
$$\text{Overall accuracy}\ (\%) = \frac{\text{Number of correctly classified pixels}}{\text{Total number of reference site pixels}} \times 100 \tag{1}$$
Similarly, the accuracy of each LULC class was determined using producer's and user's accuracies. Producer's accuracy measures how often real features on the ground are correctly shown on the classified map, whereas user's accuracy measures how often a pixel assigned to a class on the map actually belongs to that class on the ground [47]. They were calculated using the following formulas:
$$\text{Producer's accuracy}_i\ (\%) = \frac{\text{Correctly classified pixels in category } i}{\text{Total reference pixels in category } i} \times 100 \tag{2a}$$
$$\text{User's accuracy}_i\ (\%) = \frac{\text{Correctly classified pixels in category } i}{\text{Total pixels classified as category } i} \times 100 \tag{2b}$$
The kappa coefficient is another statistical indicator of classification accuracy. Kappa evaluates how well the classification performs compared with a random assignment of classes. Its values range from -1 to 1: a value of 0 or less indicates that the classification is no better than a random classification, while a value close to 1 indicates that the classification is significantly better than a random classification [48]. The kappa coefficient was calculated using the following formula:
$$\kappa = \frac{TS \times TCS - \sum (\text{column sum} \times \text{row sum})}{TS^{2} - \sum (\text{column sum} \times \text{row sum})} \tag{3}$$
where TS is the total number of samples, TCS is the total number of correctly classified samples, and column sum and row sum represent the total number of classified pixels for each class in each column and row, respectively.
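The measures above can all be computed from a single error matrix. The sketch below assumes that rows hold the classified (map) classes and columns the reference (ground truth) classes, a convention that varies between software packages; the small two-class matrix in the usage line is made up for illustration, and `accuracy_report` is a hypothetical helper name.

```python
import numpy as np


def accuracy_report(cm):
    """Overall, producer's, and user's accuracy (%) and kappa from an error matrix.

    Assumes rows = classified (map) classes and columns = reference classes.
    """
    cm = np.asarray(cm, dtype=float)
    ts = cm.sum()                 # TS: total number of samples
    tcs = np.trace(cm)            # TCS: total correctly classified samples
    row_sum = cm.sum(axis=1)      # pixels classified as each class
    col_sum = cm.sum(axis=0)      # reference pixels of each class
    chance = (row_sum * col_sum).sum()
    return {
        "overall": 100 * tcs / ts,                # overall accuracy
        "producer": 100 * np.diag(cm) / col_sum,  # per class, against reference totals
        "user": 100 * np.diag(cm) / row_sum,      # per class, against classified totals
        "kappa": (ts * tcs - chance) / (ts**2 - chance),
    }


report = accuracy_report([[45, 5], [10, 40]])
print(report["overall"], report["kappa"])  # 85.0 0.7
```

Here 85 of 100 samples are correct, but the row and column totals imply that 50% agreement would be expected by chance, so kappa (0.7) is noticeably lower than the raw overall accuracy.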

Land Use and Land Cover Mapping Results
In the prepared LULC maps (Figure 6), the yellow colour represents the agricultural area, the green colour represents the forest area, battleship grey represents barren land, and red represents the urban area.
From the Landsat-8 imagery, the KD-Tree classifier detected 45 of 50 true positives for the agriculture class, with a user accuracy of 90% and a producer accuracy of 79% (Figure 7A and Table 4). For Sentinel-2A imagery, the K-NN algorithm classified the most true positives for the agriculture class, i.e., 47 of 50. RF and K-NN with Landsat-8 and KD-Tree and RF with Sentinel-2A recorded between 38 and 45 true positives out of 50 for the agriculture class (Figure 7A,D). Interestingly, both the highest and the lowest numbers of true positives for the agriculture class were recorded by the K-NN algorithm, with Sentinel-2A and Landsat-8 imagery, respectively. Because the same algorithm performed best on the finer-resolution imagery, this implies that classification performance may be improved by using finer-resolution and more refined imagery [49,50].

Figure 7. Error matrices of the classifiers; panels (A-C) show Landsat-8 data and panels (D-F) show Sentinel-2A data. Table 4 shows the producer and user accuracy calculated based on the error matrices.
For the barren land class, the KD-Tree classifier with Sentinel-2A imagery recorded the most true positives, i.e., 49 of 50 [51], with a user accuracy of 98% and a producer accuracy of 80% (Figure 7D and Table 4). However, the KD-Tree classifier with Landsat-8 imagery recorded fewer true positives for the barren land class, i.e., 42 of 50. Similarly, the random forest classifier with Sentinel-2A imagery recorded the most true positives for the forest class, whereas fewer true positives were recorded for the forest class with Landsat-8 imagery, i.e., 32, 39, and 36 for the KD-Tree, RF, and K-NN classifiers, respectively (Figure 7A-C). The urban class had the highest average number of true positives with both satellite images, and the RF algorithm with Sentinel-2A achieved the highest possible user accuracy for this class (100%) of all satellite-algorithm combinations (Table 4). These results agree with the literature, which reports that resolution, image characteristics, classification algorithms, and user needs all affect the classification accuracy of LULC mapping [49,51].

Satellite Accuracy Comparison
For Landsat-8 imagery, the kappa coefficients were 78, 80, and 74% for KD-Tree, RF, and K-NN, respectively (Figure 8). For Sentinel-2A imagery, the same algorithms recorded considerably higher kappa coefficients, by 2.5, 10, and 10.8% for KD-Tree, RF, and K-NN, respectively. Accordingly, the average kappa coefficient was 83.3% for Sentinel-2A and 77.3% for Landsat-8.
The overall accuracy of the random forest classifier was 92 and 85% for Sentinel-2A and Landsat-8, respectively (Figure 9). The average overall accuracy of the KD-Tree classifier across both satellites was 84.5%. K-NN achieved 86 and 81% overall accuracy for Sentinel-2A and Landsat-8, respectively, giving a slightly lower average overall accuracy (83.5%) than the KD-Tree classifier (Figure 9).
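The averages quoted in this section can be reproduced from the per-satellite values reported in this paper. In the sketch below, the KD-Tree overall accuracies of 85 and 84% are taken from the Conclusions, since only their average is given here; the dictionary layout is simply a convenient assumption.

```python
# Overall accuracy (%) per classifier: (Sentinel-2A, Landsat-8).
overall_accuracy = {
    "RF": (92, 85),
    "KD-Tree": (85, 84),  # per-satellite values as given in the Conclusions
    "K-NN": (86, 81),
}
for name, (s2, l8) in overall_accuracy.items():
    print(f"{name}: mean overall accuracy {(s2 + l8) / 2:.1f}%")

# Landsat-8 kappa coefficients (%) reported above.
landsat8_kappa = {"KD-Tree": 78, "RF": 80, "K-NN": 74}
print(f"Landsat-8 mean kappa: {sum(landsat8_kappa.values()) / len(landsat8_kappa):.1f}%")
```

The means (88.5, 84.5, and 83.5% overall accuracy, and 77.3% kappa for Landsat-8) match the figures in the text.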

Discussion
Sentinel-2A and Landsat-8 both provide medium-resolution imagery at 10, 20, and 30 m, but the resolutions of their bands differ. Before further processing, all Landsat-8 bands were therefore resampled to 30 m resolution and all Sentinel-2A bands to 20 m resolution. This study presented the potential of different remote sensing indices for creating training samples for LULC mapping in PEI in conjunction with three ML algorithms. The Island's population is increasing, which will affect major land cover classes such as forest, agriculture, barren land, and urban areas. These rapid changes demand more effective methods for mapping land cover change and conducting resource management analyses.
The remote sensing indices DVI, NDVI, NDBI, UI, and NBLI were selected to highlight agriculture, forest, barren land, and urban areas. This approach to preparing LULC maps is cheaper and faster than traditionally used classification methods, because the training samples are derived from the indices rather than from field surveys. However, suitable remote sensing indices are hard to find for some LULC classes; for example, forests and agriculture are difficult to distinguish using remote sensing indices. The NBLI was used to overcome this problem because [6] documented that the NBLI can highlight soil composition at the pixel level, which helps distinguish agriculture from forest (Figure 2). The experimental results support the validity of this approach.
Finally, the same processing conditions (the same training and validation data sets) were used for all algorithms to compare the Landsat-8 and Sentinel-2A data sets for LULC mapping. The comparison indicated that the overall accuracy of each algorithm depends strongly on the input data. For example, RF achieved the highest overall accuracy for Sentinel-2A (92%), while the overall accuracies of KD-Tree and K-NN were slightly lower. RF also achieved the highest overall accuracy for Landsat-8 (85%), with KD-Tree and K-NN again slightly lower. These results show that the outcome of each classifier depends on the input data set, and that RF outperformed KD-Tree and K-NN for land cover classification regardless of which input data set was used. It is nonetheless necessary to compare these results with the literature, which offers a more realistic view of their significance.
For example, Lowe and Kulkarni [52] used RF, SVM, and maximum likelihood classifiers to prepare an LULC map and achieved overall accuracies of 87, 83, and 77%, respectively. Franco-Lopez et al. [53] prepared LULC maps with 13 classes using the K-NN algorithm and achieved an overall accuracy of 63%. These differing results indicate that there are no clear rules on acceptable accuracy for any land cover type; it depends on the user and the adopted methodology. In any LULC classification, errors are present in the form of estimation and prediction [54]. So far, no clear rules have been defined for the acceptable accuracy range, because different users have different concerns about classification accuracy [55]. In addition, several factors influence classification accuracy, such as image quality, the classifier, the number of classes, and the sample size [51]. One study [51] used Sentinel-2 data for LULC mapping in Vietnam with RF, K-NN, and SVM algorithms and reported the highest accuracy for RF when the training sample size was adequate to cover the study area. RF also achieved a higher accuracy than SVM with Sentinel-1 data of the Brazilian Amazon [56]. These results indicate that Sentinel-2A and Landsat-8 data perform satisfactorily in LULC mapping. RF was recommended for LULC classification in [51] because its parameters are easy to select, and the results of this study concur with these findings.

Conclusions
This study proposed a methodology to produce LULC maps at lower cost and more quickly by using three ML algorithms (KD-Tree, RF, and K-NN) and two satellites (Landsat-8 and Sentinel-2A). Regularly updated maps can help local authorities make better resource management and land-use policy decisions. Researchers can use the proposed methodology, which builds LULC maps from remote sensing indices, to determine spatiotemporal changes in LULC due to human activities. The results of this study demonstrated the potential of remote sensing indices to limit the need for ground truth data in LULC mapping, which would lower the labour cost, time, and resources required to generate LULC maps.
In this study, training samples for four classes (forest, agriculture, urban, and barren land) were created based on the indices and used with the ML algorithms for LULC mapping. The LULC maps prepared with this methodology showed promising results when validated with ground truth data. The six LULC maps, produced by running the three ML algorithms on the same training data for both sources of imagery, were subjected to an accuracy assessment to determine the effectiveness of the ML algorithms.
Results from the study demonstrated that K-NN achieved average kappa coefficients of 82 and 74% and overall accuracies of 86 and 81% for Sentinel-2A and Landsat-8, respectively. In comparison, KD-Tree had average kappa coefficients of 80 and 78% and overall accuracies of 85 and 84% for Sentinel-2A and Landsat-8, respectively. Random forest achieved average kappa coefficients of 83.3 and 73.3% and the highest overall accuracies, at 92 and 85%, for Sentinel-2A and Landsat-8 data, respectively, compared with K-NN and KD-Tree.
Further research should address two tasks: (1) evaluating this methodology on higher-resolution satellite images and refining the data with training samples for crop subclasses such as potato, wheat, rice, maize, and grasses, and (2) assessing how the quantity and quality of training samples affect land cover classification. Ensuring sample quality and increasing the training sample size can enhance classification accuracy, and the optimal training sample size will also be investigated in future work.