Integration of Sentinel-1 and Sentinel-2 Data with the G-SMOTE Technique for Boosting Land Cover Classification Accuracy

Ebrahimy, Hamid; Naboureh, Amin; Feizizadeh, Bakhtiar; Aryal, Jagannath; Ghorbanzadeh, Omid

doi:10.3390/app112110309

by Hamid Ebrahimy ¹, Amin Naboureh ²,³, Bakhtiar Feizizadeh ⁴,⁵, Jagannath Aryal ⁶ and Omid Ghorbanzadeh ⁷,*

¹ Remote Sensing and GIS Research Centre, Faculty of Earth Sciences, Shahid Beheshti University, Tehran 653641255, Iran
² Research Center for Digital Mountain and Remote Sensing Application, Institute of Mountain Hazards and Environment, Chinese Academy of Sciences, Chengdu 610041, China
³ University of Chinese Academy of Sciences, Beijing 100049, China
⁴ Department of Remote Sensing and GIS, University of Tabriz, Tabriz 51666, Iran
⁵ Department of Geography, Humboldt University of Berlin, 12489 Berlin, Germany
⁶ Department of Infrastructure Engineering, Faculty of Engineering and IT, The University of Melbourne, Melbourne, VIC 3010, Australia
⁷ Institute of Advanced Research in Artificial Intelligence (IARAI), Landstraßer Hauptstraße 5, 1030 Vienna, Austria
* Author to whom correspondence should be addressed.

Appl. Sci. 2021, 11(21), 10309; https://doi.org/10.3390/app112110309

Submission received: 15 September 2021 / Revised: 27 October 2021 / Accepted: 29 October 2021 / Published: 3 November 2021


Abstract:

The importance of Land Cover (LC) classification is recognized by an increasing number of scholars who employ LC information in various applications (e.g., addressing global climate change and achieving sustainable development). However, the roles of data balancing, image integration, and the performance of different machine learning algorithms in various landscapes have received comparatively little attention. Therefore, the present study investigates the performance of three frequently used Machine Learning (ML) algorithms, namely Extreme Learning Machines (ELM), Support Vector Machines (SVM), and Random Forest (RF), in LC mapping at six different landscapes. The Geometric Synthetic Minority Over-sampling Technique (G-SMOTE) was adopted to deal with the class imbalance problem. Time series of Sentinel-1 and Sentinel-2 data were integrated to improve LC mapping accuracy by taking advantage of both data sources. Moreover, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) was implemented to identify the most informative features. Based on the results, RF integrated with G-SMOTE showed the best result for four landscapes (coastal, cropland, desert, and semi-arid), while SVM integrated with G-SMOTE had the highest accuracy in the remaining two landscapes (plain and mountain). The applied ML algorithms performed well in all landscapes, with Overall Accuracy (OA) ranging from 85% to 93% for RF, 83% to 94% for SVM, and 84% to 92% for ELM. The outcomes show that although applying G-SMOTE may slightly decrease OA values, it generally improves LC classification accuracy in various landscapes, particularly for minority classes.

Keywords:

Machine Learning (ML); Geometric Synthetic Minority Over-Sampling Technique (G-SMOTE); land cover mapping; European Space Agency (ESA); class imbalance problem

1. Introduction

Land cover (LC) data are of great importance for different disciplines, such as biodiversity patterns [1], natural hazard studies (e.g., landslides [2] and wildfire [3]), and CO₂ emissions [4]. Additionally, there is a considerable need for current and precise information on LC and its changes for sustainable development and global warming studies [5]. The importance of these issues and the progress of Remote Sensing (RS) technologies toward providing data with better temporal and spatial resolutions have motivated scholars and scientists to study LC mapping widely. Despite the considerable effort devoted to LC mapping, the roles of data balancing, image integration, and the performance of different machine learning algorithms in various landscapes have not yet received much attention from scholars.

The advent of Sentinel-1 and Sentinel-2, which provide freely accessible images with high spatial resolution and global coverage, brings excellent opportunities for LC mapping. As a result, many studies have been conducted using these images. For example, Abdi [6] integrated these images for LC mapping in complex boreal landscapes. In another study, those images were integrated for LC mapping in Colombia [7]. Integration of radar and optical RS data can deliver complementary information to improve LC mapping accuracy, taking advantage of both data sources [8]. More precisely, the geometrical characteristics of the classes are mainly captured by Sentinel-1, which provides C-band SAR data, whereas the Sentinel-2 Multi-Spectral Instrument (MSI) is sensitive to the manifest content of the LC classes [9]. It has been reported that incorporating time series of these images can lead to more accurate and reliable LC maps compared to using them individually [10]. However, integrating these datasets in different landscapes for LC mapping has not yet been well documented.

To boost LC mapping accuracy, adding supplementary information (e.g., textural information and spectral indices) to the classification procedure has been endorsed as an efficient and practical approach [11,12]. For example, the impact of complementary information (e.g., topographic data and spectral indices) has been investigated for LC mapping in mountainous areas [12]. Texture information provides a continuous measure of the distribution of digital numbers of a satellite image within predefined local windows [13]. Using texture information together with spectral bands can create high separation capability among different LC types, particularly in heterogeneous landscapes [13,14]. Moreover, it has been reported that spectral indices can also improve LC mapping accuracy [15]. Since using a large set of features has some disadvantages, such as being time-consuming and computationally complex [16], selecting the most critical features with an appropriate feature selection method can lead to a more efficient and reliable LC classification procedure [17]. In this regard, the RS community has widely employed Feature Selection (FS) methods to select the most appropriate features from a pool of available features. Among the different FS methods, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is a powerful method that has been successfully applied in different RS studies to eliminate redundant features [16].

It is generally accepted that Machine Learning (ML) algorithms can effectively improve LC classification accuracy. Although standard ML algorithms can, in most cases, obtain reasonable accuracies for majority classes, they usually show poor accuracies for rare (minority) classes, mainly owing to the class imbalance problem [18]. Since LC classes vary in distribution and extent, collecting equal numbers of samples for all LC classes is very difficult [19], which leads to the class imbalance problem and unacceptable accuracies for minority classes. To address this issue, several data balancing methods have been proposed. However, the proposed methodologies have primarily been examined in specific landscapes, and their performance in different landscapes has not been investigated. For example, Naboureh et al. [20] proposed a hybrid data balancing method for mountainous regions. In another study, Waldner et al. [21] investigated the impact of different data balancing techniques for mapping crops. More recently, the Geometric Synthetic Minority Over-sampling Technique (G-SMOTE), proposed by Douzas and Bacao (2019) [22], has been introduced as a robust method to address the class imbalance problem. However, there is still a lack of research that thoroughly assesses the performance of G-SMOTE in different landscapes with different ML algorithms.

Given the importance of the issues mentioned above, the present study investigates the performance of G-SMOTE in handling the class imbalance problem in LC classification at six different landscapes using three frequently used ML algorithms, namely RF, SVM, and ELM. Furthermore, the SVM-RFE method was applied for each landscape to select the most informative features from radar and optical bands, spectral indices, and texture information, and the resulting feature subset was used as the classification input. Specifically, this study addresses the following questions:

(1): What are the most informative features from Sentinel-1, Sentinel-2, spectral indices, and textural information for LC mapping using three well-known ML algorithms in different landscapes?
(2): What is the performance of the G-SMOTE algorithm in LC classification in different circumstances?
(3): Which ML classifier has the highest accuracy for LC mapping in diverse landscapes?

2. Materials

2.1. Overview of the Experiment Sites

In this study, six different landscapes (Figure 1) with different numbers of samples, LC types, elevation ranges, climate conditions, and areas were selected to assess the roles of G-SMOTE, integration of radar and optical data, and different ML algorithms in improving LC classification accuracy. Site-1, a coastal landscape, covers an area of about 2266 km² located in Istanbul province, Turkey. Forest is the dominant LC type in this study area. Site-2, a plain landscape, covers an area of about 2509 km² located in East Azerbaijan province, Iran. Bare land classes mainly cover this study area. Site-3, a semi-arid landscape, covers an area of about 1309 km² located in Tehran province, Iran. Cropland is the dominant class in this study area. Site-4, a desert landscape, covers an area of about 1966 km² located in South Turkmenistan. Bare land and artificial surfaces cover most of this study area. Site-5, a cropland landscape, covers an area of about 2966 km² located in Western Uzbekistan. The LC of Site-5 is largely cropland. Site-6, a mountainous landscape, covers an area of about 3255 km² located in Urumqi province, China. Forest and snow classes mainly cover this study area.

2.2. Image and Reference Data

In the present study, time series of Sentinel-1 C-band SAR products with 10 m spatial resolution and Sentinel-2A MSI Level-2A products (Image Collection ID: COPERNICUS/S2_SR) with less than 15% cloud coverage between January 2019 and January 2020 were utilized (Table 1). Several preprocessing steps (i.e., orbit file correction, radiometric calibration, terrain correction, and speckle noise reduction) were initially applied to the SAR data. Then, the multi-look vertical-vertical (VV) and vertical-horizontal (VH) polarizations were obtained from the Sentinel-1A images. Moreover, ten spectral bands of Sentinel-2 (Band 2 to Band 8A, Band 11, and Band 12) were utilized in this study. Bands 5, 6, 7, 8A, 11, and 12 were resampled from 20 m to 10 m pixel size using the nearest neighbor method, so that they match the resolution of the other bands.
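The nearest neighbor resampling step can be illustrated with a minimal sketch. It assumes that the 20 m and 10 m grids are aligned and that the scale factor is exactly 2; the function name `upsample_nearest` and the pixel values are hypothetical and are not taken from the study.

```python
import numpy as np

def upsample_nearest(band_20m: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest neighbor upsampling: copy each coarse pixel into a
    factor x factor block of fine pixels (no new values are created)."""
    return np.repeat(np.repeat(band_20m, factor, axis=0), factor, axis=1)

# Hypothetical 2 x 2 tile of a 20 m band (made-up reflectance values).
b11_20m = np.array([[0.10, 0.20],
                    [0.30, 0.40]])

b11_10m = upsample_nearest(b11_20m)
print(b11_10m.shape)
print(b11_10m)
```

Because nearest neighbor resampling only copies values, the set of reflectance values is unchanged, which is one reason it is often preferred when the resampled band feeds a classifier.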

A reliable training dataset of sufficient size and accuracy is needed for training any supervised ML classifier [3]. In this study, very-high-resolution images available in Google Earth™ and ArcGIS and raw Sentinel-2 data were visually interpreted to produce the reference dataset. Based on the extent of LC types, 840, 759, 846, 818, 703, and 1077 samples were produced for Site-1, Site-2, Site-3, Site-4, Site-5, and Site-6, respectively (Table 2). The obtained datasets were then divided into two parts: one was used to train the ML classifiers (training dataset), and the other was used to assess the accuracy of the LC maps (validation dataset). More precisely, 60% of the original samples (GCPs) were used for training, and the remaining 40% were used for accuracy assessment.
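A 60/40 split of this kind could be sketched as follows. The text does not state whether the split was stratified by class; the sketch assumes a per-class (stratified) split so that rare classes appear in both parts. The function `split_samples` and the class counts are hypothetical.

```python
import numpy as np

def split_samples(labels, train_fraction=0.6, seed=0):
    """Return (train_idx, val_idx), splitting each class separately."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut at the train fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(val_idx)

# Hypothetical imbalanced reference set (counts are made up).
labels = np.array(["forest"] * 50 + ["water"] * 30 + ["cropland"] * 20)
train_idx, val_idx = split_samples(labels)
print(len(train_idx), len(val_idx))
```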

3. Methods

3.1. Methodology

The present research methodology comprises six main steps: (1) Obtaining and preprocessing Sentinel-1 and Sentinel-2 images for each landscape. (2) Calculating spectral indices and texture information. (3) Adopting the SVM-RFE method to find the most valuable features (among the spectral bands, spectral indices, and texture information). (4) Implementing the G-SMOTE method to rebalance the acquired reference datasets. (5) Employing ML methods to generate LC maps with the features selected in step 3. (6) Analyzing the results and recommending the most helpful method and features for every landscape.

3.2. Spectral and Textural Features

Spectral and textural features are two main features widely applied in image interpretation and classification [15,23]. Three frequently used spectral indices were employed to improve classification accuracy: Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and Normalized Difference Water Index (NDWI) (Table 3). On the other hand, the second-order statistics of the grey-level co-occurrence matrix (GLCM) obtained from the VV band of Sentinel-1 SAR data were also used to improve the LC accuracy. It has to be noted that we selected the VV band for GLCM calculation after some primary analyses and based on its high-value distribution. The texture measures of mean, contrast, variance, dissimilarity, homogeneity, second moment, energy, and entropy were derived from the VV band by applying a 3 × 3 window size filter.
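All three indices share the normalized difference form (a − b)/(a + b). Since Table 3 is not reproduced here, the band assignments in the sketch below are assumptions based on common Sentinel-2 usage (NDVI from B8 and B4, NDWI from B3 and B8, NDBI from B11 and B8); the reflectance values are made up.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b) element-wise; NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances for two pixels (vegetation-like, no-data).
green = np.array([0.06, 0.0])   # B3 (assumed)
red   = np.array([0.08, 0.0])   # B4 (assumed)
nir   = np.array([0.40, 0.0])   # B8 (assumed)
swir1 = np.array([0.20, 0.0])   # B11 (assumed)

ndvi = normalized_difference(nir, red)
ndwi = normalized_difference(green, nir)
ndbi = normalized_difference(swir1, nir)
print(ndvi, ndwi, ndbi)
```

Guarding the zero denominator keeps no-data pixels from raising warnings or producing infinities in the feature stack.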

3.3. Feature Selection

In this study, the SVM-RFE method was employed to select the most critical features for LC classification. SVM-RFE ranks the original features for subsequent analyses [24]. At each iteration, an SVM classifier is trained on the remaining features, the weight of each feature is calculated from the resulting change in the cost function, and the lowest-ranked feature is eliminated [16]. This process continues until no features remain to be removed, which yields the feature-ranking list. Using the caret and e1071 packages in the R environment, the SVM-RFE method was applied to identify the best possible combination of features for each of the six sites.
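The study ran SVM-RFE in R; as a rough illustration of the same idea, the sketch below uses scikit-learn's `RFE` wrapper around a linear-kernel `SVC`. The synthetic data, the ten-feature layout, and the value of `C` are assumptions for illustration and do not reproduce the authors' configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for a feature stack (bands, indices, textures).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

# A linear kernel exposes per-feature weights, which RFE uses to decide
# which feature to drop at each iteration.
svm = SVC(kernel="linear", C=1.0)

# Eliminating one feature per step down to a single feature yields a
# complete ranking: rank 1 is the last feature left standing.
selector = RFE(estimator=svm, n_features_to_select=1, step=1)
selector.fit(X, y)

ranking = selector.ranking_
order = np.argsort(ranking)  # feature indices, most important first
print(order)
```

In practice the features would usually be standardized first, since linear SVM weights are sensitive to feature scale.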

3.4. G-SMOTE

Imbalanced data can arise in a deliberate sampling process as well as in random sampling. As an extension of the SMOTE method, G-SMOTE forms convex combinations of neighboring samples and creates new synthetic instances of the minority classes instead of duplicating the available instances. Unlike SMOTE, which randomly generates synthetic samples along a line connecting a minority instance to one of its k nearest neighbors, G-SMOTE defines a flexible geometric region around each minority sample in which synthetic samples are generated (Figure 2). In general, G-SMOTE modifies the SMOTE algorithm to avoid generating noisy samples [22,25].
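The geometric difference between the two generators can be shown with a simplified sketch. It is not the full G-SMOTE algorithm of [22]: the truncation and deformation steps and the choice of selection strategy are omitted, so synthetic points are simply drawn uniformly inside a hypersphere whose radius is the distance to a minority neighbor. All function names and sample values are hypothetical.

```python
import numpy as np

def smote_point(center, neighbor, rng):
    """Classic SMOTE: random point on the segment between two samples."""
    return center + rng.uniform() * (neighbor - center)

def gsmote_point(center, neighbor, rng):
    """Simplified G-SMOTE: uniform random point inside the hypersphere
    centred on `center` with radius ||neighbor - center||."""
    radius = np.linalg.norm(neighbor - center)
    direction = rng.normal(size=center.shape)
    direction /= np.linalg.norm(direction)
    # u ** (1/d) spreads points uniformly over the volume.
    scale = rng.uniform() ** (1.0 / center.size)
    return center + radius * scale * direction

def oversample(minority, n_new, k=3, generator=gsmote_point, seed=0):
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        dists = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbors)
        new_points.append(generator(minority[i], minority[j], rng))
    return np.array(new_points)

# Hypothetical minority-class samples in a two-feature space.
minority = np.array([[0.0, 0.0], [1.0, 0.2], [0.8, 1.0],
                     [0.1, 0.9], [0.5, 0.5]])
synthetic = oversample(minority, n_new=20)
synthetic_smote = oversample(minority, n_new=20, generator=smote_point)
print(synthetic.shape, synthetic_smote.shape)
```

Swapping `generator` makes the contrast explicit: SMOTE confines new samples to line segments, while the geometric variant fills a region around each selected sample.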

3.5. ML Classifiers and Accuracy Assessment

ML classifiers, considered accurate and efficient approaches for LC mapping, have attracted many scholars. In this work, three frequently used and well-known ML algorithms, namely RF, ELM, and SVM, were assessed for LC classification in six different landscapes (for more detailed information about the ML classifiers, see [26,27,28,29,30,31]). In the RF model, the maximum number of trees was 1000, selected from a range of 500 to 3000 using cross-validation, while the number of variables for each split was set to the default value of 25. To get the best performance, the cutoff fraction in this model and the number of resampling repetitions were set to 0.01 and 500, respectively. For the SVM, we used a kernel width (γ) of 0.95 and a regularization parameter (C) of 0.8. Of note, we used a tenfold cross-validation strategy to find the most suitable values for the parameters of the RF, ELM, and SVM algorithms. Three accuracy evaluation metrics, namely User's Accuracy (UA), Producer's Accuracy (PA), and Overall Accuracy (OA), were computed with the validation datasets to evaluate the accuracy of the produced LC maps. We did not use the Kappa statistic because of increasing criticism in the literature [15,32,33].
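The three metrics follow directly from a confusion matrix. The sketch below uses the usual remote sensing definitions (UA is the fraction of pixels mapped as a class that are correct, PA is the fraction of reference pixels of a class that are mapped correctly); the function name and the small label vectors are hypothetical.

```python
import numpy as np

def accuracy_metrics(reference, predicted, n_classes):
    """Return per-class UA, per-class PA, and OA."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(reference, predicted):
        cm[r, p] += 1  # rows: reference class, columns: mapped class
    correct = np.diag(cm).astype(float)
    mapped_totals = cm.sum(axis=0)  # pixels mapped as each class
    ref_totals = cm.sum(axis=1)     # reference pixels of each class
    ua = np.divide(correct, mapped_totals,
                   out=np.full(n_classes, np.nan), where=mapped_totals > 0)
    pa = np.divide(correct, ref_totals,
                   out=np.full(n_classes, np.nan), where=ref_totals > 0)
    oa = correct.sum() / cm.sum()
    return ua, pa, oa

# Hypothetical validation labels for three classes (0, 1, 2).
reference = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
ua, pa, oa = accuracy_metrics(reference, predicted, n_classes=3)
print(ua, pa, oa)
```

In this toy case the rare class 1 has a perfect PA but a UA of only 2/3, which illustrates why both per-class metrics are reported alongside OA.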

4. Results

This study applied the SVM-RFE method to obtain a ranked list of the features before classification. The original features comprised the Sentinel-1 bands (VV, VH), the Sentinel-2 bands (B2, B3, B4, B8, B5, B6, B7, B8A, B11, and B12), the spectral indices (NDVI, NDWI, and NDBI), and eight texture measures derived from the VV band (mean, dissimilarity, homogeneity, second moment, contrast, variance, entropy, and correlation). Table 4 gives a summary of the most critical features after adopting SVM-RFE for the different landscapes.

The RF, SVM, and ELM approaches were applied to the selected features (Table 4) to generate LC maps in six different landscapes. To investigate the impact of class imbalance on LC classification accuracy, the LC maps were generated with two sets of reference datasets: the first without balancing samples, and the second after adopting G-SMOTE to balance the samples of the classes. As shown in Table 5 and Table 6, all three ML algorithms performed well in LC mapping. All generated maps obtained an OA of at least 0.83, ranging from 0.85 to 0.93 for RF, 0.83 to 0.94 for SVM, and 0.84 to 0.92 for ELM. Our analysis also showed that adopting the G-SMOTE method to rebalance the reference datasets substantially improved the UA and PA of minority classes. As illustrated in Figure 3, RF-G-SMOTE showed the best performance in four landscapes, namely coastal, cropland, desert, and semi-arid. In comparison, SVM-G-SMOTE obtained higher accuracies for the remaining two landscapes, namely plain and mountain.

Table 5. The result of accuracy assessment methods in six different landscapes (with original GCPs). Minority classes for each landscape are highlighted.

Methods		SVM			RF			ELM
Sites	Class	UA	PA	OA	UA	PA	OA	UA	PA	OA
Coastal	Barren	0.84	0.67	0.91	0.55	0.83	0.92	0.55	0.83	0.89
	Built-up	0.85	0.75		0.93	0.65		0.83	0.75
	Cropland	0.7	0.5		0.43	0.8		0.5	0.17
	Forest	0.94	0.96		0.96	0.86		0.91	0.96
	Water	1	0.97		1	1		1	1
Cropland	Barren	0.68	0.55	0.84	0.52	0.59	0.85	0.55	0.43	0.84
	Built-up	0.97	0.92		0.89	0.87		0.97	0.82
	Cropland	0.87	0.96		0.83	0.9		0.82	0.92
	Pasture	0.73	0.76		0.76	0.76		0.67	0.77
	Water	1	1		1	1		1	1
Desert	Barren	1	1	0.94	1	1	0.93	1	1	0.92
	Built-up	0.74	0.61		0.8	0.58		0.78	0.58
	Cropland	0.9	0.99		0.8	1		0.89	0.98
	Water	1	1		1	1		0.89	1
Mountain	Barren	0.89	0.96	0.90	0.83	0.92	0.91	0.87	0.74	0.84
	Cropland	0.62	0.73		0.75	0.5		0.57	0.65
	Pasture	0.96	0.84		0.91	0.87		0.81	0.91
	Snow	0.9	0.9		0.81	0.81		0.73	1
	Water	1	1		1	1		1	1
Plain	Barren	0.92	1	0.89	0.89	0.93	0.90	0.88	1	0.89
	Built-up	0.95	0.92		0.92	1		0.92	0.96
	Cropland	0.88	0.91		0.89	0.91		0.88	0.91
	Pasture	0.73	0.65		0.74	0.74		0.78	0.56
Semi-Arid	Barren	0.82	0.91	0.84	0.86	0.89	0.85	0.82	0.85	0.84
	Built-up	0.87	0.85		0.87	0.85		0.83	0.8
	Cropland	0.84	0.84		0.83	0.87		0.79	0.86
	Pasture	0.57	0.53		0.55	0.57		0.53	0.56

Table 6. The result of accuracy assessment methods in six different landscapes (after adopting G-SMOTE). Minority classes for each landscape are highlighted.

Methods		SVM			RF			ELM
Sites	Class	UA	PA	OA	UA	PA	OA	UA	PA	OA
Coastal	Barren	0.85	0.88	0.91	0.87	0.83	0.92	0.85	0.85	0.88
	Built-up	0.88	0.76		0.84	0.80		0.84	0.75
	Cropland	0.91	0.85		0.80	0.81		0.77	0.83
	Forest	0.93	0.9		0.90	0.89		0.90	0.94
	Water	1	1		1	1		1	1
Cropland	Barren	0.80	0.77	0.84	0.75	0.79	0.85	0.72	0.77	0.85
	Built-up	0.93	0.90		0.87	0.85		0.89	0.80
	Cropland	0.88	0.91		0.84	0.86		0.83	0.84
	Pasture	0.85	0.87		0.80	0.82		0.77	0.79
	Water	1	1		1	1		1	1
Desert	Barren	1	1	0.93	1	0.98	0.935	0.97	1	0.91
	Built-up	0.88	0.78		0.89	0.79		0.79	0.87
	Cropland	0.92	0.93		0.9	0.94		0.92	0.82
	Water	1	1		1	1		1	1
Mountain	Barren	0.9	0.96	0.91	0.83	0.95	0.90	0.85	0.81	0.84
	Cropland	0.85	0.82		0.85	0.88		0.80	0.78
	Pasture	0.96	0.9		0.86	0.94		0.84	0.94
	Snow	0.92	1		0.78	0.80		0.90	0.89
	Water	1	1		1	1		1	1
Plain	Barren	0.91	0.98	0.90	0.89	0.93	0.89	0.93	0.95	0.88
	Built-up	0.92	0.92		0.93	0.95		0.92	0.92
	Cropland	0.89	0.91		0.89	0.90		0.93	0.93
	Pasture	0.80	0.75		0.81	0.78		0.82	0.78
Semi-Arid	Barren	0.86	0.81	0.83	0.83	0.87	0.85	0.80	0.83	0.845
	Built-up	0.86	0.85		0.89	0.85		0.82	0.82
	Cropland	0.82	0.83		0.80	0.85		0.80	0.83
	Pasture	0.79	0.74		0.75	0.77		0.77	0.75

Figure 3. Impact of G-SMOTE on the overall accuracy of the generated LC maps.

5. Discussion

5.1. Most Informative Feature

After analyzing the results of SVM-RFE to choose the most critical features for each landscape, it was revealed that NDVI, VV, and B12 were selected as prominent features in all six landscapes, which confirms their importance in LC classification [16]. In contrast, NDBI, which is considered an essential index for extracting built-up areas [23], was identified as a critical feature in only three landscapes (namely plain, semi-arid, and coastal). Similarly, NDWI was selected as a main feature only for the coastal landscape. This selection is potentially related to the fact that separating water bodies is much simpler than separating the other classes, which makes NDWI unnecessary in most cases. The SAR data were selected as informative features in all of the landscapes, which is plausible given the excellent applicability of SAR data in LC mapping for broad land covers. Such data have already been applied to different broad land cover classes such as forest [34], cropland [9], and built-up areas [35]. Among the texture measures, only three (mean, variance, and homogeneity) were selected in all experiments except the coastal site. The results illustrated that the texture information was relatively unimportant in comparison to other features.

5.2. Comparison of ML Classifiers

In general, our experiments showed that the ML classifiers integrated with the G-SMOTE method yielded substantially better results in different landscapes. With balanced datasets, the coastal and desert landscapes were the two experiment sites that showed the highest accuracies; in contrast, the lowest classification accuracy belonged to the semi-arid landscape. Comparing the classifiers revealed that different classifiers perform differently in diverse landscapes. For example, although the highest accuracy (OA and G-mean) belonged to the SVM-G-SMOTE classifier in three landscapes (coastal, cropland, and mountain), RF-G-SMOTE provided better results in the three remaining landscapes (plain, semi-arid, and desert). On the other hand, ELM-G-SMOTE had the worst outcome in five sites (in common with SVM-G-SMOTE in two cases), whereas it shared the leading position with RF-G-SMOTE in the plain landscape.
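The text refers to G-mean without defining it. A common multi-class definition, assumed here, is the geometric mean of the per-class recall, which corresponds to PA in the tables. The sketch applies that assumed definition to the SVM PA values for the coastal site copied from Table 5 and Table 6; the function name is hypothetical.

```python
import math

def g_mean(per_class_pa):
    """Geometric mean of per-class PA (recall); assumed definition."""
    return math.prod(per_class_pa) ** (1.0 / len(per_class_pa))

# SVM, coastal site: PA for Barren, Built-up, Cropland, Forest, Water.
pa_original = [0.67, 0.75, 0.50, 0.96, 0.97]  # Table 5
pa_gsmote = [0.88, 0.76, 0.85, 0.90, 1.00]    # Table 6

print(round(g_mean(pa_original), 3), round(g_mean(pa_gsmote), 3))
```

Both maps have the same OA (0.91) in the tables, yet under this definition the G-mean is higher after balancing, because a single poorly mapped class pulls a geometric mean down far more than it affects OA.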

Since our methodological approach to image classification was the same in all experiments, this diversity in results might be attributed to the landscape circumstances and the LC class distribution of the reference datasets. The diversity in results among classifiers agrees with previous studies, in which scholars and scientists have tried to find the best classifier for LC classification but reached contradictory conclusions. For instance, Clerici et al. [7], fusing Sentinel-1 and Sentinel-2 data, identified the SVM algorithm as the most accurate one (OA = 88.75%), whereas Adam et al. [36] claimed that RF shows higher accuracy than SVM in a heterogeneous coastal landscape. Our results suggest a comparable performance by both the SVM-G-SMOTE and RF-G-SMOTE methods. However, the time and experimentation required to select the user-defined parameters of RF are quite small compared to SVM. The SVM implementation involves choosing a suitable kernel and some kernel-specific parameters such as cost and gamma [26]. On the other hand, RF only needs two parameters to be selected, namely ntree and mtry. A value of 500 for ntree and the square root of the number of input variables for mtry have already been shown to be suitable values [28].

5.3. Effect of G-SMOTE on LC Classification Accuracy

In addition to the quality and quantity of the training dataset, the class imbalance problem also has a significant impact on an ML classifier's performance [19]. According to the results (Table 5 and Table 6), applying the G-SMOTE method improved the LC classification accuracies. This finding agrees with previous research [22]. Moreover, analyzing the impact of G-SMOTE on OA showed that balancing data can slightly decrease OA values. Among the landscapes, coastal and semi-arid experienced the greatest improvement from integrating G-SMOTE with the ML classifiers, mainly because of the higher degree of complexity in these experiment sites. The desert landscape, which can be considered an experiment site with less complexity, showed the lowest impact; SVM even showed a better OA than SVM-G-SMOTE in this landscape.

6. Conclusions

This study analyzed the potential of RF, SVM, and ELM in LC classification at six different landscapes by integrating Sentinel-1 and Sentinel-2 images. We also used SVM-RFE to select the most informative features from Sentinel-1, Sentinel-2, spectral indices, and textural information. Furthermore, we discussed the impact of G-SMOTE on the classification accuracy of ML algorithms. The result showed that NDVI, VV, and B12 could contribute as main features to improve LC classification accuracy. Our findings also indicated that all three ML algorithms, especially RF and SVM, are robust approaches for LC classification in different landscapes.

Moreover, the results confirmed that applying G-SMOTE has a significant impact on the accuracy of minority classes. After applying G-SMOTE, the differences in the UA and PA metrics between minority and majority classes decreased, whereas these differences were substantial when the class imbalance problem was not addressed. Further studies could investigate the performance of other data balancing algorithms and the effect of sample size.

Author Contributions

Conceptualization, O.G.; Data curation, H.E.; Methodology, H.E. and A.N.; Software, H.E. and A.N.; Supervision, O.G.; Validation, A.N.; Writing—original draft, H.E.; Writing—review & editing, B.F., J.A. and O.G. All authors have read and agreed to the published version of the manuscript.

Funding

The open-access fee of this research was funded by the Institute of Advanced Research in Artificial Intelligence (IARAI) GmbH.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request.

Acknowledgments

The authors thank the Copernicus programs by the European Space Agency (ESA) for Sentinel-1 and Sentinel-2 datasets, and are also grateful to Google Earth Engine (GEE) for their services. The authors thank the three anonymous referees for their helpful and critical comments that helped us improve an earlier version of the manuscript. The open-access fee of this research was funded by the Institute of Advanced Research in Artificial Intelligence (IARAI).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Etter, A.; McAlpine, C.; Pullar, D.; Possingham, H. Modelling the conversion of colombian lowland ecosystems since 1940: Drivers, patterns and rates. J. Environ. Manag. 2006, 79, 74–87.
2. Moharrami, M.; Naboureh, A.; Gudiyangada Nachappa, T.; Ghorbanzadeh, O.; Guan, X.; Blaschke, T. National-scale landslide susceptibility mapping in austria using fuzzy best-worst multi-criteria decision-making. ISPRS Int. J. Geo-Inf. 2020, 9, 393.
3. Ghorbanzadeh, O.; Valizadeh Kamran, K.; Blaschke, T.; Aryal, J.; Naboureh, A.; Einali, J.; Bian, J. Spatial prediction of wildfire susceptibility using field survey gps data and machine learning approaches. Fire 2019, 2, 43.
4. Houghton, R.A.; House, J.I.; Pongratz, J.; Van Der Werf, G.R.; DeFries, R.S.; Hansen, M.C.; Quéré, C.L.; Ramankutty, N. Carbon emissions from land use and land-cover change. Biogeosciences 2012, 9, 5125–5142.
5. Naboureh, A.; Bian, J.; Lei, G.; Li, A. A review of land use/land cover change mapping in the china-central asia-west asia economic corridor countries. Big Earth Data 2020, 5, 237–257.
6. Abdi, A.M. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using sentinel-2 data. GISci. Remote Sens. 2020, 57, 1–20.
7. Clerici, N.; Calderón, C.A.V.; Posada, J.M. Fusion of sentinel-1a and sentinel-2a data for land cover mapping: A case study in the lower magdalena region, colombia. J. Maps 2017, 13, 718–726.
8. Ienco, D.; Gaetano, R.; Interdonato, R.; Ose, K.; Minh, D.H.T. Combining sentinel-1 and sentinel-2 time series via rnn for object-based land cover classification. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019.
9. Mercier, A.; Betbeder, J.; Rumiano, F.; Baudry, J.; Gond, V.; Blanc, L.; Bourgoin, C.; Cornu, G.; Marchamalo, M.; Poccard-Chapuis, R. Evaluation of sentinel-1 and 2 time series for land cover classification of forest–agriculture mosaics in temperate and tropical landscapes. Remote Sens. 2019, 11, 979.
10. Joshi, N.; Baumann, M.; Ehammer, A.; Fensholt, R.; Grogan, K.; Hostert, P.; Jepsen, M.R.; Kuemmerle, T.; Meyfroidt, P.; Mitchard, E.T. A review of the application of optical and radar remote sensing data fusion to land use mapping and monitoring. Remote Sens. 2016, 8, 70.
11. Rakwatin, P.; Longépé, N.; Isoguchi, O.; Shimada, M.; Uryu, Y.; Takeuchi, W. Using multiscale texture information from alos palsar to map tropical forest. Int. J. Remote Sens. 2012, 33, 7727–7746.
12. Feizizadeh, B. A novel approach of fuzzy dempster–shafer theory for spatial uncertainty analysis and accuracy assessment of object-based image classification. IEEE Geosci. Remote Sens. Lett. 2018, 15, 18–22.
13. Al-Fares, W. Historical Land Use/Land Cover Classification Using Remote Sensing; Springer: Amsterdam, The Netherlands, 2013.
14. Gómez, C.; White, J.C.; Wulder, M.A. Optical remotely sensed time series data for land cover classification: A review. ISPRS J. Photogramm. Remote Sens. 2016, 116, 55–72.
15. Naboureh, A.; Moghaddam, M.H.R.; Feizizadeh, B.; Blaschke, T. An integrated object-based image analysis and ca-markov model approach for modeling land use/land cover trends in the sarab plain. Arab. J. Geosci. 2017, 10, 259.
16. Ebrahimy, H.; Azadbakht, M. Downscaling modis land surface temperature over a heterogeneous area: An investigation of machine learning techniques, feature selection, and impacts of mixed pixels. Comput. Geosci. 2019, 124, 93–102.
17. Tao, Z.; Xin, H.; Wen, D.; Li, J. Urban building density estimation from high-resolution imagery using multiple features and support vector regression. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3265–3280.
18. Mellor, A.; Boukir, S.; Haywood, A.; Jones, S. Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin. ISPRS J. Photogramm. Remote Sens. 2015, 105, 155–168.
19. Azadbakht, M.; Fraser, C.S.; Khoshelham, K. Synergy of sampling techniques and ensemble classifiers for classification of urban environments using full-waveform lidar data. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 277–291.
20. Naboureh, A.; Li, A.; Bian, J.; Lei, G.; Amani, M. A hybrid data balancing method for classification of imbalanced training data within google earth engine: Case studies from mountainous regions. Remote Sens. 2020, 12, 3301.
21. Waldner, F.; Chen, Y.; Lawes, R.; Hochman, Z. Needle in a haystack: Mapping rare and infrequent crops using satellite imagery and data balancing methods. Remote Sens. Environ. 2019, 233, 111375.
22. Douzas, G.; Bacao, F. Geometric smote a geometrically enhanced drop-in replacement for smote. Inf. Sci. 2019, 501, 118–135.
23. Zha, Y.; Gao, J.; Ni, S. Use of normalized difference built-up index in automatically mapping urban areas from tm imagery. Int. J. Remote Sens. 2003, 24, 583–594.
24. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422.
25. Douzas, G.; Bacao, F.; Fonseca, J.; Khudinyan, M. Imbalanced learning in land cover classification: Improving minority classes’ prediction accuracy using the geometric smote algorithm. Remote Sens. 2019, 11, 3040.
26. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
27. Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66.
28. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
29. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Tiede, D.; Aryal, J. Evaluation of different machine learning methods and deep-learning convolutional neural networks for landslide detection. Remote Sens. 2019, 11, 196. [Google Scholar] [CrossRef] [Green Version]
Tavakkoli Piralilou, S.; Shahabi, H.; Jarihani, B.; Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Aryal, J. Landslide detection using multi-scale image segmentation and different machine learning models in the higher himalayas. Remote Sens. 2019, 11, 2575. [Google Scholar] [CrossRef] [Green Version]
Memarian, H.; Balasundram, S.K.; Talib, J.B.; Sung, C.T.B.; Sood, A.M.; Abbaspour, K. Validation of ca-markov for simulation of land use and cover change in the langat basin, malaysia. J. Geogr. Inf. Syst. 2012, 4, 542–554. [Google Scholar] [CrossRef] [Green Version]
Pontius, R.G., Jr.; Millones, M. Death to kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment. Int. J. Remote Sens. 2011, 32, 4407–4429. [Google Scholar] [CrossRef]
Dong, J.; Xiao, X.; Sheldon, S.; Biradar, C.; Duong, N.D.; Hazarika, M. A comparison of forest cover maps in mainland southeast asia from multiple sources: Palsar, meris, modis and fra. Remote Sens. Environ. 2012, 127, 60–73. [Google Scholar] [CrossRef]
Tavares, P.A.; Beltrão, N.E.S.; Guimarães, U.S.; Teodoro, A.C. Integration of sentinel-1 and sentinel-2 for classification and lulc mapping in the urban area of belém, eastern brazilian amazon. Sensors 2019, 19, 1140. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Adam, E.; Mutanga, O.; Odindi, J.; Abdel-Rahman, E.M. Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: Evaluating the performance of random forest and support vector machines classifiers. Int. J. Remote Sens. 2014, 35, 3440–3458. [Google Scholar] [CrossRef]

Figure 1. (A) Locations of the six landscapes. (B) RGB color composites of the study areas.

Figure 2. An example of oversampling by the G-SMOTE algorithm.
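
The idea behind the oversampling illustrated in Figure 2 can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' implementation: it assumes that new points are drawn uniformly inside a hypersphere centered on a minority sample, with the radius set to the distance to its single nearest minority neighbor. The full G-SMOTE algorithm also has neighbor-selection, truncation, and deformation settings that are left out here, and the function names (`nearest_neighbor`, `sample_in_hypersphere`, `oversample`) are hypothetical.

```python
import math
import random


def nearest_neighbor(center, candidates):
    """Return the candidate closest to `center`, excluding `center` itself."""
    others = [c for c in candidates if c is not center]
    return min(others, key=lambda c: math.dist(center, c))


def sample_in_hypersphere(center, radius, rng):
    """Draw one point uniformly from the ball of `radius` around `center`."""
    dim = len(center)
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by u**(1/dim) spreads points uniformly through the ball's volume.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [c + scale * d for c, d in zip(center, direction)]


def oversample(minority, n_new, seed=0):
    """Generate `n_new` synthetic minority samples (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        center = rng.choice(minority)
        # Assumption: radius = distance to the nearest minority neighbor.
        radius = math.dist(center, nearest_neighbor(center, minority))
        synthetic.append(sample_in_hypersphere(center, radius, rng))
    return synthetic


# Hypothetical two-feature minority samples, for illustration only.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
new_points = oversample(minority_samples, n_new=5, seed=1)
```

Unlike interpolation along the line segment between two samples, every generated point here lies somewhere inside the sphere around its center sample, which is the geometric region the algorithm's name refers to.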

Table 1. Number of Sentinel-1 and Sentinel-2 scenes used at each site.

Sites	Sentinel-1 (Number of Scenes)	Sentinel-2 (Number of Scenes)
Coastal	208	146
Cropland	89	145
Desert	92	102
Mountain	118	91
Plain	118	140
Semi-Arid	117	76

Table 2. Number of GCPs per LC class at each site (a dash marks a class with no GCPs at that site).

Site	Barren	Built-Up	Cropland	Forest	Pasture	Snow	Water
Coastal	196	158	103	218	-	-	165
Cropland	90	182	249	-	93	-	89
Desert	346	100	264	-	-	-	108
Mountain	355	-	97	-	321	203	101
Plain	234	182	227	-	116	-	-
Semi-Arid	265	234	268	-	79	-	-
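
The counts in Table 2 give a quick way to gauge how imbalanced each site is. The sketch below computes the ratio between the largest and smallest class at a site, and the number of synthetic samples each class would need to match the largest one. It assumes that balancing means raising every class to the majority count and that all listed GCPs are available for training; the paper may use a different target or a train/validation split, and the helper names are hypothetical.

```python
# GCP counts copied from Table 2; classes marked "-" are omitted.
GCP_COUNTS = {
    "Coastal": {"Barren": 196, "Built-Up": 158, "Cropland": 103, "Forest": 218, "Water": 165},
    "Cropland": {"Barren": 90, "Built-Up": 182, "Cropland": 249, "Pasture": 93, "Water": 89},
    "Desert": {"Barren": 346, "Built-Up": 100, "Cropland": 264, "Water": 108},
    "Mountain": {"Barren": 355, "Cropland": 97, "Pasture": 321, "Snow": 203, "Water": 101},
    "Plain": {"Barren": 234, "Built-Up": 182, "Cropland": 227, "Pasture": 116},
    "Semi-Arid": {"Barren": 265, "Built-Up": 234, "Cropland": 268, "Pasture": 79},
}


def imbalance_ratio(counts):
    """Ratio of the largest class count to the smallest class count."""
    return max(counts.values()) / min(counts.values())


def samples_to_balance(counts):
    """Synthetic samples each class would need to reach the majority count."""
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


for site, counts in GCP_COUNTS.items():
    print(f"{site}: imbalance ratio {imbalance_ratio(counts):.2f}")
    print(f"  samples needed: {samples_to_balance(counts)}")
```

Under these assumptions, the Mountain and Desert sites show the strongest imbalance (ratios of about 3.7 and 3.5), while the Plain and Coastal sites are the most balanced (ratios of about 2.0 and 2.1).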

Table 3. Formulas of the spectral indices (B3, B4, B8, and B11 are Sentinel-2 bands).

Spectral Index	Formula
NDBI	(B11 − B8)/(B11 + B8)
NDVI	(B8 − B4)/(B8 + B4)
NDWI	(B8 − B3)/(B8 + B3)
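
All three indices in Table 3 are normalized differences of two Sentinel-2 bands, so one helper covers them. The sketch below works on single pixel values and returns 0.0 when the denominator is zero, which is an assumption made here for illustration rather than something the paper specifies. The band order in each call follows Table 3 exactly. Some NDWI formulations order the bands the other way round, so the sign convention is worth checking against the intended definition. The input reflectances are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(b3, b4, b8, b11):
    """Compute the Table 3 indices for one pixel from Sentinel-2 band values."""
    return {
        "NDBI": normalized_difference(b11, b8),  # (B11 - B8) / (B11 + B8)
        "NDVI": normalized_difference(b8, b4),   # (B8 - B4) / (B8 + B4)
        "NDWI": normalized_difference(b8, b3),   # (B8 - B3) / (B8 + B3), as printed in Table 3
    }


# Hypothetical surface reflectance values for a single vegetated pixel.
indices = spectral_indices(b3=0.1, b4=0.1, b8=0.3, b11=0.2)
print(indices)
```

For image-sized inputs the same arithmetic would normally be applied to whole band arrays at once. The scalar form is used here only to keep the example dependency-free.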

Table 4. Most informative features of each experiment site.

Sites	Selected Features by SVM-RFE
Coastal	VH, VV, B3, B5, B8A, B12, NDVI, NDWI, NDBI
Cropland	VH, VV, B2, B4, B7, B8A, B11, NDVI, variance
Desert	VV, B8A, B11, B12, NDVI, mean
Mountain	VH, VV, B2, B4, B8A, B12, NDVI, variance
Plain	VV, B3, B4, B5, B12, NDVI, NDBI, homogeneity, variance
Semi-Arid	VV, B2, B4, B5, B12, NDVI, NDBI, mean

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

