Land Cover Mapping in Cloud-Prone Tropical Areas Using Sentinel-2 Data: Integrating Spectral Features with Ndvi Temporal Dynamics

Chong Huang; Chenchen Zhang; Yun He; Qingsheng Liu; He Li; Fenzhen Su; Gaohuan Liu; Arika Bridhikitti

doi:10.3390/rs12071163

,

and

¹

State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

²

CAS Engineering Laboratory for Yellow River Delta Modern Agriculture, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

³

University of Chinese Academy of Sciences, Beijing 100049, China

⁴

Environmental Engineering and Disaster Management Program, School of Multidisciplinary, Mahidol University, Kanchanaburi Campus, Sai Yok, Kanchanaburi 71150, Thailand

Remote Sens.2020, 12(7), 1163;https://doi.org/10.3390/rs12071163

Version Notes

Order Reprints

Abstract

Accurate remote sensing and mapping of land cover in the tropics remain difficult tasks since data gaps and a heterogenic landscape make it challenging to perform land cover classification. In this paper, we proposed a multi-feature classification method to integrate temporal statistical features with spectral and textural features. This method is designed to improve the accuracy of land cover classification in cloud-prone tropical regions. Sentinel-2 images were used to construct an NDVI stack for a time-series statistical analysis to characterize the temporal variance of land cover. Two statistical indices were calculated and used to represent the variation in annual vegetation. These indices included the mean (NDVI_mean) and coefficient of variation (NDVI_cv) for the NDVI time series. The temporal statistical features were then integrated with spectral and textural features extracted from high-quality Sentinel-2 imagery for Random Forest classification. The performance and contribution of different combinations were assessed based on their classification accuracies. Our results show that the time-series statistical analysis is an effective way to represent land cover category information contained in annual NDVI variance. The method uses clear pixels from dense low-quality images to obtain the NDVI statistical characteristics, thus, to reduce the influence of random factors such as weather conditions on single-date image. The addition of NDVI_mean and NDVI_cv can improve the separability among most types of land cover. The overall accuracy and the kappa coefficient reached values of 0.8913 and 0.8514 when NDVI_mean and NDVI_cv were integrated. Furthermore, the time-series statistical analysis has less stringent requirements regarding image quality and features a high computational efficiency, which shows its great potential to improve the overall accuracy of land cover classification at regional scales in cloud-prone tropical regions.

Keywords:

land cover; mapping; cloud-prone areas; Sentinel-2; time series; NDVI; statistical indices

1. Introduction

Land cover patterns denote fundamental land surface characteristics and underlying biophysical processes. Therefore, land cover has been regarded as an important climatic variable and environmental factor for modelling Earth surface processes related to anthropogenic activities and the natural environment [1,2,3]. Timely and accurate land cover information at different scales is also crucial for natural resource management and development strategies. Thus, characterizing and mapping land cover cannot be underestimated due to its societal implications [4].

Land cover maps are traditionally produced using remote sensing image classification. There are various studies that have used supervised or unsupervised algorithms to perform land cover mapping based on the spectral discrimination of different bands from a single date image [5,6]. The advantages of using single-date imagery for classification include high efficiency, easy operability, and low demand for data processing. However, single-date images only represent instantaneous spectral characteristics of the land surface at a single period in time. Moreover, land cover categories may show a similar spectral reflectance due to the limitations imposed by broad spectral bands, which can produce inadequate classification results [4].

Multi-temporal or time-series classification has been emphasized in recent studies [7,8,9,10,11] since single images cannot distinguish between all land cover categories. It has been demonstrated that maps generated using a temporal series approach can achieve higher accuracy compared to single-image methods. Furthermore, the temporal information contained in time-series images can be mined as an important feature space to assist with land cover classification [4,12]. There are two ways to utilize temporal information. The first is to select images from several key dates based on local vegetation or crop phenology to maximize the spectral or textural difference among specific land cover types. Multi-temporal data represent discriminative phenological stages that can be integrated for classification [13,14]. The second method is to incorporate as many dates as possible in remote sensing data to construct a time-series image stack for remote sensing classification [8,15]. Some recent studies have highlighted the advantages of using a time-series vegetation index (e.g., NDVI, EVI, RVI) to improve classification accuracy [4]. Within this method, dense time-series satellite images are needed to construct a reliable growth profile. Thus, land cover classification using time-series remote sensing data is usually focused on low to moderate resolution sensors with a short revisit cycle such as AVHRR, SPOT-VEGETATION, or MODIS [16,17,18,19]. However, low to moderate resolution sensors are not appropriate for high-precision land cover classification research at regional scales.

The Sentinel-2 MultiSpectral Instrument (MSI) platform offers a promising opportunity for land cover mapping at fine spatial and temporal scales. The practicality of Sentinel-2 has been tested over the past several years, and results, mostly in Europe, show that the platform has potential for land cover mapping. Immitzer et al. [20] used one scene of Sentinel-2 imagery to map six summer crop species in Lower Austria and seven deciduous and coniferous tree species in Germany. Studies have employed multi-date Sentinel-2 imagery in mapping land cover in Romania [21], floodplain grassland plant communities in France [22], crops in Lower Austria [11], and forests and tree species in southern Poland [23]. Belgiu and Csillik [24] evaluated a time-weighted dynamic time warping (TWDTW) method that used a Sentinel-2 time series for classification of various crop types in three study areas (Romania, Italy, and the United States).

Single-date and multi-date (time series) image classification and land cover mapping in the tropics remains a challenging task. Moreover, data gaps and a heterogenic landscape make this region a challenging site for performing land cover analysis. Firstly, landscapes in humid tropical regions are highly dynamic, with frequent and diverse land cover variations throughout the year [25]. It is difficult to discriminate among different types of land cover using cost-effective and single-date remote sensing imagery due to their similar spectral characteristics. A time-series vegetation index, on the other hand, represents the full growth profiles of vegetation. The typical phenological phases contained in these profiles can be used for vegetation and crop discrimination [16,26]. However, only a few high quality images can be acquired throughout the year in cloud-prone and rainy tropical areas [27]. Furthermore, there is still difficulty in effectively representing and integrating temporal information for land cover classification in these areas [8,28].

This study was aimed at assessing the potential for discriminating land cover categories in cloud-prone tropical areas using NDVI statistical indicators derived from time-series Sentinel-2 images, combined with spectral and textural features derived from single-date Sentinel-2 imagery. We hypothesized that temporal statistical features can assist in discriminating between land cover categories with similar spectral reflectance in single-date imagery. Multi-feature classification was implemented using ensemble learning algorithms. In this paper, we detail the method and discuss its strengths and weaknesses.

2. Materials and Methods

2.1. Study Area

The study area was located in the Southeastern Mun River Basin, Thailand (Figure 1). The area features a humid tropical monsoon climate with an annual temperature of 18 °C or greater and an average annual precipitation of 1,300~1,500 mm. The rainy season lasts from mid-May to the beginning of October, while the dry season lasts from November to April [29]. Natural and plantation forests are distributed in the southern mountainous area. Agricultural crops are cultivated on the northern plain. Crops can be cultivated at any time in the wet season due to the suitable hydrothermal conditions. They can also be cultivated in the dry season with the support of irrigation systems. The crops are harvested two or three times in a year in many areas. The spectral characteristics for crops can vary at different times and locations since the planting period is not limited by a fixed growing season [30]. The types of land cover were divided according to the natural conditions and land use in the study area. The first class was divided into forests (including natural and plantation forests as the second class), farmlands (including paddy fields and dry fields as the second class), water bodies, wetlands, and impervious surfaces.

Figure 1. Study area.

2.2. Data and Preprocessing

2.2.1. Sentinel-2 Single-Date Image Data

The Sentinel-2 program is a land monitoring mission that is a part of the European Copernicus Program and provides unprecedented perspectives on terrestrial observations [31]. The Sentinel-2 platform is a twin-orbit constellation (Sentinel-2A/B) with a revisit time of approximately five days at the equator. The main MSI sensor provides 13 spectral bands at various spatial resolutions. These spectral bands include: four bands in the blue (b2), green (b3), red (b4), and infrared (b8) spectrum with a 10 m spatial resolution; six bands in the red-edge (b5, b6, b7), near-infrared (NIR) (b8a), and short-wave infrared (SWIR) (b11, b12) spectrum with a 20 m spatial resolution; and three atmospheric bands (b1, b9, and b10) with a 60 m spatial resolution. In the study area, a few cloud-free images can be obtained in the dry season, while the rainy season lasts for a long period and features heavy cloud contamination. A cloud-free Sentinel-2A image (Figure 1) from Feb. 13, 2017, was downloaded from the European Space Agency’s (ESA) Sentinel Scientific Data Hub (https://scihub.copernicus.eu/). The image is available as a Level-1C product with radiometric and geometric corrections. In data pre-processing, atmospheric correction was performed using the Sen2cor plug-in [32] to convert the Top-of-Atmosphere (TOA) reflectance to corrected Bottom-of-Atmosphere (BOA) reflectance values. Six bands were then resampled from a 20 m to 10 m resolution using the nearest neighbor sampling method within SNAP software. These bands were then stacked using four bands with a 10 m resolution for classification. The three atmospheric bands were excluded from analysis.

2.2.2. Stacking Time-Series Sentinel-2 NDVI

A total of 40 scenes composed of Sentinel-2A and Sentinel-2B images from 2017 were downloaded from the European Space Agency’s (ESA) Sentinel Scientific Data Hub to calculate the NDVI temporal statistical characteristics. The imagery contained a cloud quality flag band, and the values were stored between 0–100. This value represented the probability of the pixel being covered by clouds from 0% to 100%. The pixels that were partially or fully covered by clouds were masked based on the cloud quality flag band of the image. The pixels with a cloud quality flag band value that was not 0 were then set as an invalid value to obtain all clear pixels in the 2017 period. The band math tool in ENVI software [33] was then used to calculate the NDVI value for the clear pixels in each image. Lastly, the final 40 NDVI image scenes were used to build the Sentinel-2 NDVI stack data for statistical analysis.

2.2.3. Ground Survey Data and Sample Datasets

The samples for training and validation were established using ground survey data and Google Earth [34] high-resolution remote sensing images. A total of 718 samples of typical land-use types were collected from ground surveys in February and August 2017. Moreover, 2598 samples were selected using visual interpretation within Google Earth. Stratified random sampling was carried out in ArcGIS 10.2 [35] based on the total number of samples. Additionally, 70% of samples (2,321) was used for model training. The remaining 995 samples, together with additional 3135 samples which were obtained by another researchers’ independent visual interpretation of Google Earth imagery, were used for validation and accuracy assessment.

2.3. Methods

2.3.1. Spectral and Textural Feature Analysis

Spectral reflectance provides the basis for land cover classification. In the single-date imagery, four bands with a 10 m spatial resolution and six bands with a 20 m spatial resolution (after resampling) were used for land cover classification. The gray-level co-occurrence matrix (GLCM) method [36] was used to extract textural features. Using each spectral band to calculate texture will increase the feature space, and may hamper the classifier’s performance. Before the texture calculation, a classical statistical method, the Principal Component Analysis (PCA) was applied to the cloud-free single-date imagery to decrease the high correlation between the spectral bands providing independent information [31]. After PCA, the original dataset was transformed into a new set of uncorrelated attributes called principal components (PCs). The cumulative contribution rate of the first PC reached 72.36% after transformation of ten spectral bands, and the first and second PCs contained over 95.58% of the original information. Therefore, the first and second PCs were used to extract textural features. Five texture indices were calculated for each PC in reference to Haralick [36] and Jensen et al. [37]. These indices included the angular second moment (ASM), contrast (CON), entropy (ENT), mean (MEAN), and correlation (COR). The window size was set as 5×5 and the step length was set as 1 in this study. A total of 10 textural features were obtained via this method.

2.3.2. Time-Series NDVI Statistical Indicator Analysis

In this study, the temporal features corresponded to annual NDVI statistical values and these were extracted from the time-series Sentinel-2 NDVI stack data. Two statistical indicators were employed, including the mean value (NDVI_mean) and coefficient of variation based on the NDVI (NDVI_cv).

(1): NDVI_mean
The mean value reflects the concentrated trend in time-series NDVI data. The time-series dataset is represented as $X = {x_{1}, x_{2}, \dots, x_{N}}$ . N is the length of X, and the mean value ( $u_{X}$ ) of X is calculated by

$m e a n_{X} = \frac{x_{1} + x_{2} + \dots + x_{N}}{N}$

(1)
(2): NDVI_cv
The coefficient of variation reflects the dispersion degree and concentrated tendency of time-series NDVI data. This value is influenced by both the dispersion degree and the average population level. It is calculated by

$c v_{X} = \frac{σ_{X}}{u_{X}}$

(2)

where $u_{X}$ is the mean value of the time-series data, and $σ_{X}$ represents the standard deviation and reflects the dispersion degree of time-series data. $σ_{X}$ is calculated as

$σ_{X} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - u_{X})}^{2}}$

(3)

Both NDVI_mean and NDVI_cv were calculated for each pixel based on NDVI time-series stack data within a Python development environment.

2.3.3. Random Forest Classification

Random forest (RF) is an ensemble learning algorithm proposed by Breiman [38] for supervised classification with decision trees as the basic classifier. It is a non-parametric classification and regression method. Previous studies have demonstrated that the RF algorithm can process massive high-dimensional data while maintaining high accuracy [39,40]. The algorithm also has a more stable and robust response to noise and outliers in training data. The RF algorithm is more advantageous for estimating the importance of predictor variables compared with other machine learning algorithms [39,41]. The feature importance is calculated using the Mean Decrease in Gini (MDG), which measures how much a feature reduces the Gini Impurity metric in a class. The Out of Bag (OOB) error is then used for model evaluation to determine the importance of model features. In this study, a Python script was used to import the GDAL library [42] to read training sample data and the random forest classifier was imported from the Scikit-learn library. A total of 22 features (e.g., 10 spectral bands, 10 textural features, and 2 temporal features) were incorporated to construct the RF model for land cover classification. The parameter n_tree (the number of trees created by randomly selecting samples from training samples) was set to 1000. The parameter m_try (the number of variables used for tree node splitting) was set to 4 as the square root of the number of available layers. This method was developed in reference to previous studies [39,43].

2.3.4. Accuracy Assessment

In previous studies, an error or confusion matrix has mostly been used to assess the accuracy of remote sensing classification [44]. This approach can assist in identifying misclassification between different classes and potential sources of error. Furthermore, several metrics can be computed based on the confusion matrix to quantitatively assess the land cover classification accuracy. An accuracy assessment was conducted using three indicators, including the overall accuracy (OA), user’s accuracy (UA), producer’s accuracy (PA) [45], and the kappa coefficient [46].

3. Results

3.1. Separability of NDVI_mean and NDVI_cv for Different Land Cover Types

Land cover samples were selected to calculate NDVI temporal statistics, and to assess the separability of NDVI_mean (Figure 2) and NDVI_cv (Figure 3) among all land cover categories (Figure 3). To ensure inclusion of as much land cover variation as possible, we selected the points with a simple random sampling method for each land cover type for statistical analysis. The land cover types in these samples included natural forests (84), plantation forests (68), paddy fields (84), dry fields (84), water bodies (60), wetlands (48), and impervious surfaces (84).

Figure 2. Mean of the annual time-series NDVI (NDVI_mean) for (a) natural forest, (b) plantation forest, (c) paddy field, (d) dry land, (e) impervious surface, (f) water body, and (g) wetland.

Figure 3. Coefficient of variation of the annual time-series NDVI (NDVI_cv) for (a) natural forest, (b) plantation forest, (c) paddy field, (d) dry land, (e) impervious surface, (f) water body, and (g) wetland.

3.1.1. NDVI_mean

In Figure 2, natural forests and plantation forests were observed to have similar NDVI_mean values, which fluctuated between 0.78 and 0.76, respectively. Both land cover types were significantly higher than other classes, indicating a good separability. The NDVI_mean values for paddy fields and dry fields were around 0.34 and 0.50, respectively. Overall, the NDVI_mean values for paddy fields were lower than the values for dry fields, with some overlap within the 0.35–0.45 range. The NDVI_mean values for both water bodies and wetlands varied significantly, with average values of −0.22 and 0.11, respectively. The values for these two land cover types could be partly used to differentiate them from other classes. However, wetlands were probably confused with water bodies in the −0.2–0.0 range and with paddy fields in the 0.2–0.4 range. The NDVI_mean value for impervious surfaces was around 0.19 and featured a better separability than most other land cover types except for wetlands and paddy fields.

3.1.2. NDVI_cv

The NDVI_cv statistics for all samples from each class are shown in Figure 3. Both natural and plantation forests featured low NDVI_cv values at around 0.25. This suggests that the annual NDVI value for these classes was more stable, and there was smaller spatiotemporal variation. Thus, it was beneficial to separate these classes from the other categories. Dry fields and paddy fields were characterized by the seasonal variation in NDVI values, with a larger NDVI_cv value of around 0.5. The NDVI_cv for wetlands was affected by the seasonal change in moisture and vegetation and varied significantly during the year. This led to higher NDVI_cv values compared to most other land cover types. The coefficient of variation was represented by the ratio of the standard deviation (σ) to the mean (μ) value. Some NDVI_cv outliers were found in wetlands and water bodies due to the small μ value. The NDVI_cv values for impervious surfaces were generally lower than 0.3, indicating a good separability from paddy fields and dry fields.

3.2. The Importance of Different Features

The results for the feature importance measure are listed in Table 1. The most important features included the NIR-2, Red edge-1, and Red bands, with values of 10.05%, 9.82%, and 7.82%, respectively. Green, SWIR-1 (B11), and Red edge-3 (B7) were also of high importance, ranking 5th, 7th, and 8th, respectively. MEAN_1 and MEAN_2 ranked 6th and ninth among all features, denoting their relatively high importance. However, other textural features were less important and were placed at the end of the list. For the statistical indicators, the NDVI_mean ranked 4th in the list with a high importance value of 7.0%, but NDVI_cv only ranked 13th with a value of 4.48%. In Table 1, OOB accuracy was observed to increase as the number of features involved in the classification increased from 1 to 6. The OOB accuracy was observed to increase and then stabilize.

Table 1. Results of the feature importance measure.

3.3. Comparison of Classification Results from Different Feature Combinations

All 22 features were selected as the input features for constructing an RF model to determine the contribution of spectral bands, textures features, and temporal indicators to land cover classification. Five combination schemes were designed for land cover classification, including:

Spectral bands only (SB)
Spectral bands and textural features (SBT)
Spectral bands, textural features, and NDVI_mean (SBTM)
Spectral bands, textural features, and NDVI_cv (SBTC)
Spectral bands, textural features, and NDVI_mean + NDVI_cv (SBTMC)

The classification maps are shown in Figure 4. The overall land cover types featured similar patterns when classified using the five combination models. Paddy fields accounted for most of the study area. Natural forests were distributed in the south with well demarcated boundaries. Plantation forests were distributed on the periphery of natural forests and mainly in the transition zone between mountains and plains. There were a few patches of water bodies in the form of ponds and reservoirs. Impervious surfaces included clustered towns, scattered villages, and line-shaped impervious roads.

Figure 4. Classification results for (a) SB, (b) SBT, (c) SBTM, (d) SBTC, and (e) SBTMC combinations.

The confusion matrix for each combination was calculated to further compare the performance of different schemes (Table 2, Table 3, Table 4, Table 5 and Table 6).

Table 2. Confusion matrix for SB classification.

Table 3. Confusion matrix for SBT classification.

Table 4. Confusion matrix for SBTM classification.

Table 5. Confusion matrix for SBTC classification.

Table 6. Confusion matrix for SBTMC classification.

3.3.1. Spectral Bands Only (SB)

The OA and kappa coefficient produced by the SB method reached 0.8051 and 0.7347, respectively. The PA value for the natural forests was 87.63%, and the UA value was 86.29%. Other land cover types such as plantation forests, paddy fields, and dry fields may have been wrongly classified as natural forests. The PA value for plantation forests was low (78.40%) compared to natural forests. The classification accuracy for dry fields was relatively low. The PA and UA for dry fields were 73.25% and 55.02%, respectively. Paddy fields were the most likely to be confused with impervious surfaces. The most confusion involved in dry field classification was caused by paddy fields, natural forests, and impervious surfaces. The UA value for water bodies was high (93.83%), but the PA was low (84.44%) due to the misclassification into paddy fields. Wetlands were prone to be misclassified as paddy fields and other types, resulting in a lower PA value (45.16%). Both the PA and UA for impervious surfaces were relatively low, with values of 59.12% and 56.97%, respectively. This was caused by the misclassification of impervious surfaces and farmlands.

3.3.2. Spectral Bands and Textural Features (SBT)

The addition of textural features improved the accuracy of most categories, and the OA and the kappa coefficient amounted to 0.8545 and 0.8010, respectively. The most obvious improvement occurred in impervious surfaces. PA increased from 59.12% to 81.13%, and UA increased from 56.97% to 79.38%. According to Table 3, textural features greatly reduced the confusion between impervious surfaces and farmlands. The proportion of impervious surfaces misclassified into paddy fields and dry fields decreased by 14.5% and 7.2%, respectively. Meanwhile, the proportion of paddy fields divided into impervious surfaces decreased by 3.71%. As a result, the PA and UA for paddy fields increased by 5.17% and 3.38%, respectively. Similarly, the UA for dry fields also increased from 55.02% to 60.24%. The addition of textural features also slightly reduced the confusion between natural and plantation forests. Moreover, the probability of misclassification between plantation forests and paddy fields also declined. It should be noted that not all categories showed improvement. The addition of textural features also caused some new confusion among specific land cover classes. For example, the PA for wetlands decreased from 45.16% to 29.03% since they were increasingly misclassified as paddy fields. Additionally, there was an increase in the number of paddy fields that were misclassified as natural forests.

3.3.3. SBT+NDVI_mean (SBTM) and SBT+NDVI_cv (SBTC)

The OA and kappa coefficient were further improved when temporal features (NDVI_mean or NDVI_cv) were added. This resulted in values of 0.8685 and 0.8204 for the SBTM combination, and 0.8714 and 0.8249 for the SBTC combination. According to Table 4 and Table 5, each statistical indicator was observed to have a different contribution towards improving the classification accuracy.

The high NDVI_mean and low NDVI_cv values for natural and plantation forests further improved their distinction from other categories. NDVI_mean contributed more to the improvement of the UA for natural forests, while NDVI_cv was slightly effective in increasing the PA for natural forests. NDVI_mean and NDVI_cv also reduced the classification error for paddy fields. NDVI_mean was more effective for differentiating paddy fields from natural forests and dry fields, resulting in an increase in the PA value (90.02%). Moreover, NDVI_cv was better for correcting the misclassification of impervious surfaces as paddy fields, resulting in a higher UA value (92.65%). In dry fields, there was a greater improvement in the UA value after the addition of NDVI_cv. Impervious surfaces were much confused with paddy fields in both the SB and SBT models. There was less confusion following the addition of NDVI_cv. As a result, both the PA and UA values for impervious surfaces increased to 92.45% and 81.89%, respectively. Compared with the SB model, the PA and UA values for wetlands were observed to decrease after the addition of NDVI_mean or NDVI_cv (Table 2). The UA values for water bodies also decreased in SBTC since wetland pixels were wrongly classified as water bodies.

3.3.4. SBT+NDVI_mean+NDVI_cv (SBTMC)

Both NDVI_mean and NDVI_cv were added to SBTMC. The OA and kappa coefficient was 0.8913 and 0.8514, or the highest among the five combinations (Table 6).

In most cases, SBTMC retained the advantages of both the SBTM and SBTC combinations. Additionally, NDVI_mean and NDVI_cv complemented each other, and improved the overall classification accuracy. For example, there was a reduced misclassification of paddy fields as natural forests or dry fields in SBTM. However, most paddy fields were misclassified as impervious surfaces (Table 4). In the SBTC model, most paddy fields were misclassified as natural forests or dry fields, but there were few misclassifications between paddy fields and impervious surfaces (Table 5). The advantages of NDVI_mean and NDVI_cv were preserved when both variables were added into the SBTMC model. As a result, there was a reduction in the misclassification of paddy fields into other categories, and the PA increased to 91.01%.

Similarly, in SBTMC, NDVI_mean reduced the misclassification of dry fields as paddy fields. Moreover, NDVI_cv reduced the misclassification of impervious surfaces as paddy fields. As a result, the UA value for paddy fields reached 93.56% in the SBTMC model. Similar improvements could also be found in impervious surfaces and other categories.

4. Discussion

4.1. The Advantages of Statistical Indicators for Land Cover Classification in Cloud-Prone Regions

Many studies have noted that the addition of time-series remote sensing data can improve the accuracy of land cover classification [17,40,47]. In this study, statistical indicators derived from time-series Sentinel-2 NDVI were proposed to represent the temporal variation of land cover, and were used to improve the capacity for discernment among different categories. These statistical indicators were observed to have several advantages.

First, the statistical indices have less stringent requirements regarding cloud contamination in time-series remote sensing images. Many studies have shown that high cloud cover in the tropics is the biggest obstacle to time-series analysis [8,28] In cloud-prone regions, few high-quality images are available throughout the year (Figure 5) even for satellite platforms with a high revisit period [8] Furthermore, these few high-quality images are mostly acquired during the dry season. Additionally, it is difficult to use only a few images to extract key phenological information, and an accurate and complete phenology profile cannot be constructed due to data limitations [48,49]. However, temporal statistical analysis has lower image quality requirements. Some clear pixels can still be used for statistical analysis in the rainy season with severe cloud cover, which greatly improves the utilization rate of pixels.

Figure 5. Cloud coverage of time-series Sentinel-2 imagery in the study area in 2017.

Second, the statistical indicators can effectively represent the NDVI’s annual variation characteristics. These indicators can reduce the influence of random factors on single-date imagery such as specific seasons or weather conditions. Natural vegetation grows vigorously throughout the year in the tropics due to the high moisture and heat conditions. Moreover, crop planting methods are flexible [16], and there is no fixed growing season for crops. This often leads to high variation in crop phenology across the region (Figure 6a), which results in different spectral characteristics for the same crop at different growing stages (Figure 6b). Moreover, the similar spectral characteristics among different land cover types makes it challenging for discriminating between classes using cost-effective and single-date remote sensing imagery data. For example, farmland can be planted or fallowed at any time of the year in tropical Southeast Asia. The spectral characteristics of fallowed farmland can easily be confused with impervious surfaces in clustered towns (Figure 6c). As a result, there is a large degree of misclassification between farmlands and impervious surfaces due to the spectral differences in imagery. However, the relatively low NDVI_cv values for impervious surfaces are different from the values for farmland, which is beneficial for reducing misclassification. Overall, temporal statistical analysis is more robust when participating in classification, especially for land cover types with a strong seasonal variation.

Figure 6. The complex spectral characteristics of land cover types in the study area. (a) Rice growing at different stages. The red, yellow and white color rectangles denote flooding, growing, and maturing phases. (b) Different spectral reflectance of rice at different stages. (c) Similar spectral reflectance between fallowed farmlands (red color rectangles) and impervious surfaces (yellow color rectangles).

Moreover, temporal statistical computation is simpler and more efficient compared with other time-series analysis methods. For example, the NDVI values for forests in tropical areas are relatively stable throughout the year. As a result, the statistical indicators for a small number of available clear forest pixels can characterize high NDVI_mean values and distinguish forests from other land cover types.

4.2. Difficulty in Land Cover Classification at Fine Scales

In this paper, the integration effect for temporal information contained in time-series imagery was tested using spectral and textural features from single-date imagery. As a result, the discrimination was maximized for different land cover categories. However, misclassification still occurred for land cover types in the second classification category.

4.2.1. Natural Forest vs. Plantation Forest

It is difficult to differentiate natural forests from plantation forests due to their similar spectral and phenological characteristics [28,50]. In Figure 3 and Figure 4, both NDVI_mean and NDVI_cv values were similar between natural and plantation forests, and the separability between them was insignificant. The main difference between natural and plantation forests was due to their textural features at the landscape scale. In the tropics, natural forests are characterized by rich species diversity and a stratified canopy. Conversely, plantation forests have fewer tree species with more apparent textural characteristics due to their regular row and plant spacing. Therefore, textural features have great potential for differentiating between the two types. However, the improvement in the misclassification of natural and plantation forests was observed to be limited when textural features were considered. According to the field surveys, there was a low canopy coverage in plantation forests during the early growth stages, which was greatly affected by understory vegetation. Moreover, the matured plantation forest had a dense canopy coverage that was quite similar to natural forests. The textural features extracted based on Sentinel-2 imagery with a 10 m resolution were not significant compared to natural forests. However, it should be noted that only a few textural features were tested in this study. As noted by other studies, textural characteristics are related to the size of the canopy, the window size for calculation, and the spatial resolution of remote sensing data [51,52]. The significance of textures and other features for classifying natural and plantation forests still requires further investigation.

4.2.2. Paddy Field vs. Dry Field

In this study, the addition of temporal statistical indicators greatly reduced the misclassification between farmlands and other land categories. However, it is still difficult to accurately classify paddy fields and dry fields within farmlands due to their similar spectral reflectance, textural features, and phenology patterns. Xiao et al. [53] used a time series of MODIS vegetation indices to map rice paddy fields in southern China based on the sensitivity of LSWI to increased surface moisture during periods of flooding. However, it is difficult to obtain images for these periods in cloud-prone areas. Moreover, there is a certain degree of saturated moisture both in paddy fields and dry fields during the rainy season. Some studies have confirmed the applicability of other data sources such as SAR for identifying paddy fields [54]. Radar signals are less affected by cloud cover and are sensitive to changes in moisture. The incorporation of radar backscatter information is expected to further improve the misclassification of paddy fields and dry fields.

4.2.3. Wetlands vs. Others

Wetlands in this study were defined as a shallow swamp with hydrophytic vegetation or forests that experienced seasonal flooding (e.g., mainly located in the flood plain). The highly dynamic variation in soil moisture and the seasonal variation in plant components can cause the NDVI_mean and NDVI_cv of wetlands to overlap with other land categories. In some cases, the addition of NDVI_mean and NDVI_cv may increase the difficulty of wetland identification. The difficulty in wetland discrimination is a result of both its highly dynamic nature and broad definition.

4.3. Uncertainty in Feature Importance Measures of the Random Forest Model

Past studies have demonstrated that the integration of multiple features is helpful for improving the detail and accuracy of land cover maps [55,56]. The contribution of each input feature is uncertain when working with remote sensing data at high spatiotemporal resolutions and with rich diverse features. In this study, the red-edge and shortwave infrared (SWIR) bands were found to be important for land cover mapping, which agrees with similar studies [57,58,59]. The near-infrared band (B8a) was also ranked as an important feature. Surprisingly, Immitzer et al. [20] found that Sentinel-2′s near infrared bands were among the least important channels. However, they also noted that only one set of images was analyzed. Other bands may be of high importance if the images are acquired in different seasons, or if the study area is characterized by different landscapes.

One of the common advantages of the random forest model is its ability to evaluate the importance of variables, which can be used as a basis for feature optimization or dimensionality reduction [56]. In this study, Mean Decrease in Gini (MDG) was used to calculate feature importance. However, the measure of variable importance should be considered with caution since it only represents the feature’s prediction ability for the target variable in the current model. Further study is required to assess the role of variable importance on feature selection and optimization. Useful information can be omitted if features regarded with low importance are excluded in the process of optimum feature selection. Suppose that multiple features are associated with each other and any one of them can be used as an indicator (an excellent feature). Once a feature is selected, the importance of other associated features can drop sharply because the impurity is reduced by the selected feature. It is then difficult to select other features due to the lower impurity. In this approach, only the first feature has a high importance, and the other features tend to have a low importance. This leads to the misunderstanding that the features that were selected first have greater importance, while the other features are not as important. In this study, the NDVI_cv features were ranked as having a low importance. However, the classification results suggest that NDVI_cv can effectively improve the overall classification accuracy, especially for paddy fields and impervious surfaces. Therefore, when applying the random forest model, features should be selected according to both the order of importance and the relationship between different features.

5. Conclusions

This study demonstrates the advantages of using time-series Sentinel-2 NDVI statistical indicators combined with spectral and textural features from Sentinel-2 single-date imagery. This method can be used to accurately map dynamic land cover in cloud-prone tropical areas. The result also revealed that time-series statistical analysis is an effective way to represent temporal information contained in annual land cover variance. A time-series analysis can make use of clear pixels from dense low-quality images and reduce the influence of random factors such as weather conditions on single-date images. The discriminant analysis revealed that the NDVI_mean can effectively differentiate forests from other land categories and reduce the misclassification between farmlands and other land cover types to a large extent. NDVI_cv can also discriminate between impervious surfaces and fallowed farmlands. The overall accuracy and kappa coefficient were respectively 0.8545 and 0.8010 for spectral and textural features. When NDVI_mean and NDVI_cv were integrated, the overall accuracy and kappa coefficient reached 0.8913 and 0.8514, respectively. The time-series statistical analysis has less stringent requirements for analysis. It is also more robust in accounting for seasonal variation and is simple to calculate. The method has great potential when integrated with other features for mapping land cover in tropical cloud-prone areas at large scales with high accuracy.

Author Contributions

Conceptualization, C.H.; methodology, Y.H. and C.Z.; validation, H.L. and C.Z.; writing—original draft preparation, C.H.; writing—review and editing, C.H., Q.L. and A.B.; supervision, F.S. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the CAS Earth Big Data Science Project, grant number. XDA19060302; the National Science Foundation of China, grant number 41561144012, 41661144030; and the Innovation Project of LREIS, grant number O88RA303YA.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bounoua, L.; DeFries, R.; Collatz, G.J.; Sellers, P.; Khan, H. Effects of land cover conversion on surface climate. Clim. Chang. 2002, 52, 29–64. [Google Scholar] [CrossRef]
Herold, M.; Mayaux, P.; Woodcock, C.; Baccini, A.; Schmullius, C. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ. 2008, 112, 2538–2556. [Google Scholar] [CrossRef]
Running, S.W. Ecosystem disturbance, carbon, and climate. Science 2008, 321, 652–653. [Google Scholar] [CrossRef]
Gómez, C.; White, J.C.; Wulder, M.A. Optical remotely sensed time series data for land cover classification: A review. ISPRS J. Photogramm. Remote Sens. 2016, 116, 55–72. [Google Scholar] [CrossRef]
Deng, C.; Wu, C. The use of single-date MODIS imagery for estimating large-scale urban impervious surface fraction with spectral mixture analysis and machine learning techniques. ISPRS J. Photogramm. Remote Sens. 2013, 86, 100–110. [Google Scholar] [CrossRef]
Shoko, C.; Mutanga, O. Examining the strength of the newly-launched Sentinel 2 MSI sensor in detecting and discriminating subtle differences between C3 and C4 grass species. ISPRS J. Photogramm. Remote Sens. 2017, 129, 32–40. [Google Scholar] [CrossRef]
Csillik, O.; Belgiu, M.; Asner, G.P.; Kelly, M. Object-based time-constrained dynamic time warping classification of crops using Sentinel-2. Remote Sens. 2019, 11, 1257. [Google Scholar] [CrossRef]
Griffiths, P.; Nendel, C.; Hostert, P. Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping. Remote Sens. Environ. 2019, 220, 135–151. [Google Scholar] [CrossRef]
Hansen, M.C.; Potapov, P.V.; Moore, R.; Hancher, M.; Turubanova, S.A.; Tyukavina, A.; Thau, D.; Stehman, S.; Goetz, S.J.; Loveland, T.R. High-resolution global maps of 21st-century forest cover change. Science 2013, 342, 850–853. [Google Scholar] [CrossRef]
Ludwig, C.; Walli, A.; Schleicher, C.; Weichselbaum, J.; Riffler, M. A highly automated algorithm for wetland detection using multi-temporal optical satellite data. Remote Sens. Environ. 2019, 224, 333–351. [Google Scholar] [CrossRef]
Vuolo, F.; Neuwirth, M.; Immitzer, M.; Atzberger, C.; Ng, W.-T. How much does multi-temporal Sentinel-2 data improve crop type classification? Int. J. Appl. Earth Obs. Geoinf. 2018, 72, 122–130. [Google Scholar] [CrossRef]
Khatami, R.; Mountrakis, G.; Stehman, S.V. A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research. Remote Sens. Environ. 2016, 177, 89–100. [Google Scholar] [CrossRef]
Parmentier, B.; Eastman, J.R. Land transitions from multivariate time series: Using seasonal trend analysis and segmentation to detect land-cover changes. Int. J. Remote Sens. 2014, 35, 671–692. [Google Scholar] [CrossRef]
Sexton, J.O.; Urban, D.L.; Donohue, M.J.; Song, C. Long-term land cover dynamics by multi-temporal classification across the Landsat-5 record. Remote Sens. Environ. 2013, 128, 246–258. [Google Scholar] [CrossRef]
Shao, Y.; Lunetta, R.S.; Wheeler, B.; Iiames, J.S.; Campbell, J.B. An evaluation of time-series smoothing algorithms for land-cover classifications using MODIS-NDVI multi-temporal data. Remote Sens. Environ. 2016, 174, 258–265. [Google Scholar] [CrossRef]
Guan, X.; Huang, C.; Liu, G.; Meng, X.; Liu, Q. Mapping rice cropping systems in Vietnam using an NDVI-based time-series similarity measurement based on DTW distance. Remote Sens. 2016, 8, 19. [Google Scholar] [CrossRef]
Jia, K.; Liang, S.; Zhang, N.; Wei, X.; Gu, X.; Zhao, X.; Yao, Y.; Xie, X. Land cover classification of finer resolution remote sensing data integrating temporal features from time series coarser resolution data. ISPRS J. Photogramm. Remote Sens. 2014, 93, 49–55. [Google Scholar] [CrossRef]
Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. 2000, 21, 1303–1330. [Google Scholar] [CrossRef]
Xiao, X.; Boles, S.; Liu, J.; Zhuang, D.; Liu, M. Characterization of forest types in Northeastern China, using multi-temporal SPOT-4 VEGETATION sensor data. Remote Sens. Environ. 2002, 82, 335–348. [Google Scholar] [CrossRef]
Immitzer, M.; Vuolo, F.; Atzberger, C. First experience with Sentinel-2 data for crop and tree species classifications in central Europe. Remote Sens. 2016, 8, 166. [Google Scholar] [CrossRef]
Rujoiu-Mare, M.-R.; Olariu, B.; Mihai, B.-A.; Nistor, C.; Săvulescu, I. Land cover classification in Romanian Carpathians and Subcarpathians using multi-date Sentinel-2 remote sensing imagery. Eur. J. Remote Sens. 2017, 50, 496–508. [Google Scholar] [CrossRef]
Rapinel, S.; Mony, C.; Lecoq, L.; Clement, B.; Thomas, A.; Hubert-Moy, L. Evaluation of Sentinel-2 time-series for mapping floodplain grassland plant communities. Remote Sens. Environ. 2019, 223, 115–129. [Google Scholar] [CrossRef]
Hościło, A.; Lewandowska, A. Mapping forest type and tree species on a regional scale using multi-temporal Sentinel-2 data. Remote Sens. 2019, 11, 929. [Google Scholar] [CrossRef]
Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523. [Google Scholar] [CrossRef]
Leinenkugel, P.; Wolters, M.L.; Kuenzer, C.; Oppelt, N.; Dech, S. Sensitivity analysis for predicting continuous fields of tree-cover and fractional land-cover distributions in cloud-prone areas. Int. J. Remote Sens. 2014, 35, 2799–2821. [Google Scholar] [CrossRef]
Petitjean, F.; Inglada, J.; Gançarski, P. Satellite image time series analysis under time warping. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3081–3095. [Google Scholar] [CrossRef]
Leinenkugel, P.; Kuenzer, C.; Oppelt, N.; Dech, S. Characterisation of land surface phenology and land cover based on moderate resolution satellite data in cloud prone areas—A novel product for the Mekong Basin. Remote Sens. Environ. 2013, 136, 180–198. [Google Scholar] [CrossRef]
Torbick, N.; Ledoux, L.; Salas, W.; Zhao, M. Regional mapping of plantation extent using multisensor imagery. Remote Sens. 2016, 8, 236. [Google Scholar] [CrossRef]
Zhao, Z.; Liu, G.; Liu, Q.; Huang, C.; Li, H.; Wu, C. Distribution characteristics and seasonal variation of soil nutrients in the Mun River Basin, Thailand. Int. J. Environ. Res. Public Health 2018, 15, 1818. [Google Scholar] [CrossRef]
Guan, X.; Liu, G.; Huang, C.; Meng, X.; Liu, Q.; Wu, C.; Ablat, X.; Chen, Z.; Wang, Q. An Open-Boundary Locally Weighted Dynamic Time Warping Method for Cropland Mapping. ISPRS Int. J. Geo-Inf. 2018, 7, 75. [Google Scholar] [CrossRef]
Chatziantoniou, A.; Psomiadis, E.; Petropoulos, G.P. Co-Orbital Sentinel 1 and 2 for LULC mapping with emphasis on wetlands in a mediterranean setting based on machine learning. Remote Sens. 2017, 9, 1259. [Google Scholar] [CrossRef]
Louis, J.; Debaecker, V.; Pflug, B.; Main-Knorn, M.; Bieniarz, J.; Mueller-Wilm, U.; Cadau, E.; Gascon, F. Sentinel-2 Sen2Cor: L2A processor for users. In Proceedings of the Living Planet Symposium 2016, Prague, Czech, 9–13 May 2016. Spacebooks Online. [Google Scholar]
Exelis, V. ENVI 5.3; Exelis VIS: Boulder, CO, USA, 2015. [Google Scholar]
Yu, L.; Gong, P. Google Earth as a virtual globe tool for Earth science applications at the global scale: Progress and perspectives. Int. J. Remote Sens. 2012, 33, 3966–3986. [Google Scholar] [CrossRef]
Price, M.H. Mastering ArcGIS; McGraw-Hill: New York, NY, USA, 2010. [Google Scholar]
Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. ManCybern. 1973, 610–621. [Google Scholar] [CrossRef]
Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective, 4th ed.; Pearson Education: Glenview, IL, USA, 2016. [Google Scholar]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
Li, M.; Ma, L.; Blaschke, T.; Cheng, L.; Tiede, D. A systematic comparison of different object-based classification techniques using high spatial resolution imagery in agricultural environments. Int. J. Appl. Earth Obs. Geoinf. 2016, 49, 87–98. [Google Scholar] [CrossRef]
Warmerdam, F. The Geospatial Data Abstraction Library, in Open Source Approaches in Spatial Data Handling; Springer: Berlin/Heidelberg, Germany, 2008; pp. 87–104. [Google Scholar]
Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random forests for land cover classification. Pattern Recognit. Lett. 2006, 27, 294–300. [Google Scholar] [CrossRef]
Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201. [Google Scholar] [CrossRef]
Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46. [Google Scholar] [CrossRef]
Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
Brown, J.C.; Kastens, J.H.; Coutinho, A.C.; de Castro Victoria, D.; Bishop, C.R. Classifying multiyear agricultural land use data from Mato Grosso using time-series MODIS vegetation index data. Remote Sens. Environ. 2013, 130, 39–50. [Google Scholar] [CrossRef]
Basnet, B.; Vodacek, A. Tracking land use/land cover dynamics in cloud prone areas using moderate resolution satellite data: A case study in Central Africa. Remote Sens. 2015, 7, 6683–6709. [Google Scholar] [CrossRef]
Ju, J.; Roy, D.P. The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally. Remote Sens. Environ. 2008, 112, 1196–1211. [Google Scholar] [CrossRef]
Miettinen, J.; Liew, S.C. Separability of insular Southeast Asian woody plantation species in the 50 m resolution ALOS PALSAR mosaic product. Remote Sens. Lett. 2011, 2, 299–307. [Google Scholar] [CrossRef]
Culbert, P.D.; Pidgeon, A.M.; Louis, V.S.; Bash, D.; Radeloff, V.C. The impact of phenological variation on texture measures of remotely sensed imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2009, 2, 299–309. [Google Scholar] [CrossRef]
Dian, Y.; Li, Z.; Pang, Y. Spectral and texture features combined for forest tree species classification with airborne hyperspectral imagery. J. Indian Soc. Remote Sens. 2015, 43, 101–107. [Google Scholar] [CrossRef]
Xiao, X.; Boles, S.; Liu, J.; Zhuang, D.; Frolking, S.; Li, C.; Salas, W.; Moore, B., III. Mapping paddy rice agriculture in southern China using multi-temporal MODIS images. Remote Sens. Environ. 2005, 95, 480–492. [Google Scholar] [CrossRef]
Park, S.; Im, J.; Park, S.; Yoo, C.; Han, H.; Rhee, J. Classification and mapping of paddy rice by combining Landsat and SAR time series data. Remote Sens. 2018, 10, 447. [Google Scholar] [CrossRef]
Wang, D.; Wan, B.; Qiu, P.; Su, Y.; Guo, Q.; Wang, R.; Sun, F.; Wu, X. Evaluating the performance of sentinel-2, landsat 8 and pléiades-1 in mapping mangrove extent and species. Remote Sens. 2018, 10, 1468. [Google Scholar] [CrossRef]
Zhu, J.; Pan, Z.; Wang, H.; Huang, P.; Sun, J.; Qin, F.; Liu, Z. An Improved Multi-temporal and Multi-feature Tea Plantation Identification Method Using Sentinel-2 Imagery. Sensors 2019, 19, 2087. [Google Scholar] [CrossRef] [PubMed]
Ramoelo, A.; Cho, M.; Mathieu, R.; Skidmore, A.K. Potential of Sentinel-2 spectral configuration to assess rangeland quality. J. Appl. Remote Sens. 2015, 9, 094096. [Google Scholar] [CrossRef]
Schultz, B.; Immitzer, M.; Formaggio, A.R.; Sanches, I.D.A.; Luiz, A.J.B.; Atzberger, C. Self-guided segmentation and classification of multi-temporal Landsat 8 images for crop type mapping in Southeastern Brazil. Remote Sens. 2015, 7, 14482–14508. [Google Scholar] [CrossRef]
Schuster, C.; Förster, M.; Kleinschmit, B. Testing the red edge channel for improving land-use classifications based on high-resolution multi-spectral satellite data. Int. J. Remote Sens. 2012, 33, 5583–5599. [Google Scholar] [CrossRef]

Figure 1. Study area.

Figure 2. Mean of the annual time-series NDVI (NDVI_mean) for (a) natural forest, (b) plantation forest, (c) paddy field, (d) dry land, (e) impervious surface, (f) water body, and (g) wetland.

Figure 3. Coefficient of variation of the annual time-series NDVI (NDVI_cv) for (a) natural forest, (b) plantation forest, (c) paddy field, (d) dry land, (e) impervious surface, (f) water body, and (g) wetland.

Figure 4. Classification results for (a) SB, (b) SBT, (c) SBTM, (d) SBTC, and (e) SBTMC combinations.

Figure 5. Cloud coverage of time-series Sentinel-2 imagery in the study area in 2017.

Figure 6. The complex spectral characteristics of land cover types in the study area. (a) Rice growing at different stages. The red, yellow and white color rectangles denote flooding, growing, and maturing phases. (b) Different spectral reflectance of rice at different stages. (c) Similar spectral reflectance between fallowed farmlands (red color rectangles) and impervious surfaces (yellow color rectangles).

Table 1. Results of the feature importance measure.

No.	Features	Importance	OOB accuracy	No.	Features	Importance	OOB accuracy
1	NIR-2	0.10050	0.72482	12	NIR-1	0.04578	0.99751
2	Red edge-1	0.09822	0.91278	13	NDVI_cv	0.04481	0.99748
3	Red	0.07824	0.94389	14	Red edge-2	0.04001	0.99788
4	NDVI_mean	0.07004	0.98054	15	ENT _2	0.02297	0.99788
5	Green	0.06544	0.98764	16	ASM _1	0.01926	0.99788
6	MEAN_1	0.06429	0.99484	17	ASM _2	0.01536	0.99777
7	SWIR-1	0.06116	0.99580	18	CON _2	0.01193	0.99788
8	Red edge-3	0.06111	0.99696	19	ENT _1	0.01141	0.99799
9	MEAN_2	0.05809	0.99699	20	COR_2	0.01046	0.99803
10	SWIR-2	0.05230	0.99710	21	COR_1	0.01029	0.99807
11	Blue	0.04825	0.99740	22	CON_1	0.01008	0.99803

Note: The subscript 1 and 2 denote the texture indices of PC1 and PC2, respectively.

Table 2. Confusion matrix for SB classification.

	Natural forest	Plantation forest	Paddy field	Dry field	Water body	Wetland	Impervious surface	Producer’s accuracy
Natural forest	900	36	45	28			18	87.63%
Plantation forest	69	479	34	27			2	78.40%
Paddy field	45	31	1427	105			116	82.77%
Dry field	22	28	33	241			5	73.25%
Water body			13		76		1	84.44%
Wetland	1		11		5	14		45.16%
Impervious surface	6		87	37			188	59.12%
User’s accuracy	86.29%	83.45%	86.48%	55.02%	93.83%	100.00%	56.97%

Overall accuracy: 0.8051; kappa coefficient: 0.7347

Table 3. Confusion matrix for SBT classification.

	Natural forest	Plantation forest	Paddy field	Dry field	Water body	Wetland	Impervious surface	Producer’s accuracy
Natural forest	917	17	57	34			2	89.29%
Plantation forest	66	494	25	26				80.85%
Paddy field	58	7	1516	91			52	87.94%
Dry field	21	22	25	250			11	75.99%
Water body			3		85		2	94.44%
Wetland			20		2	9		29.03%
Impervious surface	5		41	14			258	81.13%
User’s accuracy	85.94%	91.48%	89.86%	60.24%	97.70%	100.00%	79.38%

Overall accuracy: 0.8545; kappa: 0.8010

Table 4. Confusion matrix for SBTM classification.

	Natural forest	Plantation forest	Paddy field	Dry field	Water body	Wetland	Impervious surface	Producer’s accuracy
Natural forest	916	23	42	44			2	89.19%
Plantation forest	59	501	21	30				82.00%
Paddy field	35	2	1552	77			58	90.02%
Dry field	21	22	25	256			5	77.81%
Water body			2		85		3	94.44%
wetland			15		5	11		35.48%
Impervious surface	4		36	12			266	83.65%
User’s accuracy	88.50%	91.42%	91.67%	61.10%	94.44%	100.00%	79.64%

Overall accuracy: 0.8685; kappa: 0.8204

Table 5. Confusion matrix for SBTC classification.

	Natural forest	Plantation forest	Paddy field	Dry field	Water body	Wetland	Impervious surface	Producer’s accuracy
Natural forest	930	27	50	10			10	90.56%
Plantation forest	70	515	20	6				84.29%
Paddy field	74	4	1512	86			48	87.70%
Dry field	20	24	26	253			6	76.90%
Water body			4	1	84		1	93.33%
Wetland			9		11	11		35.48%
Impervious surface	6		11	7			294	92.45%
User’s accuracy	84.55%	90.35%	92.65%	69.70%	88.42%	100.00%	81.89%

Overall accuracy: 0.8714; kappa: 0.8249

Table 6. Confusion matrix for SBTMC classification.

	Natural forest	Plantation forest	Paddy field	Dry field	Water body	Wetland	Impervious surface	Producer’s accuracy
Natural forest	941	26	38	17			5	91.63%
Plantation forest	68	514	19	10				84.12%
Paddy field	45	4	1569	57			49	91.01%
Dry field	19	20	28	258			4	78.42%
Water body			3		85		2	94.44%
Wetland			10		5	16		51.61%
Impervious surface	5		10	5			298	93.71%
User’s Accuracy	87.29%	91.13%	93.56%	74.35%	94.44%	100.00%	83.24%

Overall accuracy: 0.8913; kappa: 0.8514

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Land Cover Mapping in Cloud-Prone Tropical Areas Using Sentinel-2 Data: Integrating Spectral Features with Ndvi Temporal Dynamics

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Area

2.2. Data and Preprocessing

2.2.1. Sentinel-2 Single-Date Image Data

2.2.2. Stacking Time-Series Sentinel-2 NDVI

2.2.3. Ground Survey Data and Sample Datasets

2.3. Methods

2.3.1. Spectral and Textural Feature Analysis

2.3.2. Time-Series NDVI Statistical Indicator Analysis

2.3.3. Random Forest Classification

2.3.4. Accuracy Assessment

3. Results

3.1. Separability of NDVI_mean and NDVI_cv for Different Land Cover Types

3.1.1. NDVI_mean

3.1.2. NDVI_cv

3.2. The Importance of Different Features

3.3. Comparison of Classification Results from Different Feature Combinations

3.3.1. Spectral Bands Only (SB)

3.3.2. Spectral Bands and Textural Features (SBT)

3.3.3. SBT+NDVI_mean (SBTM) and SBT+NDVI_cv (SBTC)

3.3.4. SBT+NDVI_mean+NDVI_cv (SBTMC)

4. Discussion

4.1. The Advantages of Statistical Indicators for Land Cover Classification in Cloud-Prone Regions

4.2. Difficulty in Land Cover Classification at Fine Scales

4.2.1. Natural Forest vs. Plantation Forest

4.2.2. Paddy Field vs. Dry Field

4.2.3. Wetlands vs. Others

4.3. Uncertainty in Feature Importance Measures of the Random Forest Model

5. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics