Automatic High-Resolution Land Cover Production in Madagascar Using Sentinel-2 Time Series, Tile-Based Image Classiﬁcation and Google Earth Engine

: Madagascar, one of Earth’s biodiversity hotpots, is characterized by heterogeneous landscapes and huge land cover change. To date, ﬁne, reliable and timely land cover information is scarce in Madagascar. However, mapping high-resolution land cover map in the tropics has been challenging due to limitations associated with heterogeneous landscapes, the volume of satellite data used, and the design of methodology. In this study, we proposed an automatic approach in which the tile-based model was used on each tile (deﬁning an extent of 1 ◦ × 1 ◦ as a tile) for mapping land cover in Madagascar. We combined spectral-temporal, textural and topographical features derived from all available Sentinel-2 observations (i.e., 11,083 images) on Google Earth Engine (GEE). We generated a 10-m land cover map for Madagascar, with an overall accuracy of 89.2% based on independent validation samples obtained from a ﬁeld survey and visual interpretation of very high-resolution (0.5–5 m) images. Compared with the conventional approach (i.e., the overall model used in the entire study area), our method enables reduce the misclassiﬁcations between several land cover types, including impervious land, grassland and wetland. The proposed approach demonstrates a great potential for mapping land cover in other tropical or subtropical regions. Author Contributions: Conceptualization, M.Z. and P.G.; methodology, M.Z.; software, M.Z.; validation, M.Z.; formal analysis, M.Z., H.H., Z.L. and P.G.; data curation, M.Z., Z.L., R.L.A., T.N.A.N.R., and Y.L.; writing—original preparation, M.Z.; writing—review and editing, M.Z., Z.L., K.O.H., H.H.,


Introduction
The characteristics of the earth's surface are determined by the dominant land cover categories such as impervious surface, vegetation, water, soil and permanent snow and glaciers [1]. Land cover maps provide fundamental information of the surface of the land. Accurate, timely land cover information is a In this study, we used an automatic land cover mapping approach and designed the tile-based model to classify the corresponding region, which integrated high-resolution Sentinel-2 imagery in GEE to improve the 10-m resolution land cover mapping for the highly heterogonous landscapes. We produced a 10-m land cover map of Madagascar for circa 2018. We then compared the performance of our proposed approach to that of the conventional method (i.e., one model is applied to the entire study area) using qualitative and quantitative approaches. We also explored the performance of employing big data in mapping highly heterogeneous areas. Finally, we estimated the performance of the finished land cover map by comparing it with the previous land cover maps of Madagascar.

Materials and Methods
A comprehensive overview of the methodology used for mapping heterogeneous landscape in Madagascar is shown in Figure 1. The sample collection, image preprocessing, feature selection, classification and post classification steps are presented in the following sub-sections.

Sampling Strategy and Classification Scheme
Collecting accurate reference samples is a prerequisite to performing accurate land cover classification [39]. Insufficient and unrepresentative samples have been recognized as the main source of error in land cover classification [30]. In this study, the training and validation datasets were sampled independently using a hexagon random sampling strategy (see Figure 2) based on the prior knowledge provided by our extensive field campaigns and local experts. The hexagon-based equalarea random sampling scheme guaranteed the random and even distribution of different land cover categories in Madagascar. We randomly selected ~10 points of the dominant land cover type within

Sampling Strategy and Classification Scheme
Collecting accurate reference samples is a prerequisite to performing accurate land cover classification [39]. Insufficient and unrepresentative samples have been recognized as the main source of error in land cover classification [30]. In this study, the training and validation datasets were sampled independently using a hexagon random sampling strategy (see Figure 2) based on the prior knowledge provided by our extensive field campaigns and local experts. The hexagon-based equal-area random sampling scheme guaranteed the random and even distribution of different land Remote Sens. 2020, 12, 3663 4 of 15 cover categories in Madagascar. We randomly selected~10 points of the dominant land cover type within each hexagon cell as training samples, but in order to balance the proportion of land cover types with small areas such as impervious area, bare land, and wetland, several hexagons had unequal point numbers. All samples were selected uniformly across the entire island to avoid spatial correlation. For accuracy assessment, we selected random samples from sub-meter to 5-m very high-spatial resolution imagery (VHRI) as an independent validation set. Using geometry tools, samples were drawn in earth engine code editor through visual interpretation. Finally, the total numbers of training and validation samples were 9220 and 1278, respectively. The classification scheme included the following land cover classes: cropland, forest, grassland, shrubland, wetland, waterbody, impervious land land and bare land, and their classification descriptions are presented in Table 1.
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 16 each hexagon cell as training samples, but in order to balance the proportion of land cover types with small areas such as impervious area, bare land, and wetland, several hexagons had unequal point numbers. All samples were selected uniformly across the entire island to avoid spatial correlation. For accuracy assessment, we selected random samples from sub-meter to 5-m very high-spatial resolution imagery (VHRI) as an independent validation set. Using geometry tools, samples were drawn in earth engine code editor through visual interpretation. Finally, the total numbers of training and validation samples were 9220 and 1278, respectively. The classification scheme included the following land cover classes: cropland, forest, grassland, shrubland, wetland, waterbody, impervious land land and bare land, and their classification descriptions are presented in Table 1.

Forest
Areas where tree cover percentage classification to >15%; limits tree height classification to >3 m.

Grassland
Grassland for grazing and natural grassland are identifiable. Herbaceous cover percentage classification to >15%.

Shrubland
Areas characterized by a texture finer than tree canopies but coarser than grasslands, height between 5 and 0.3 m, and cover percentage classification to >15%.

Wetland
Areas dominated by natural and semi-natural aquatic or regularly flooded vegetation.

Image Processing and Feature Collection
The remotely sensed data are from the Sentinel 2 Surface Reflectance imagery archive available on GEE's data pool, with 10-m spatial resolution and 5-day temporal resolution. The GEE's cloud screening algorithm based on quality assessment bands (QA60) was applied in order to remove cloud and cloud shadow contaminated pixels for each of the Sentinel scenes covering Madagascar. All available images during the period of 2017-2019 (i.e., 11083 images in total) were used to generate the circa 2018 wall-to-wall cloud-free mosaics in order to reduce the influence of frequent cloud and less effective observation in Madagascar. JAXA's ALOS 30 m Digital Elevation Models (DEM) [40] are used to characterize the topographic characteristics of Madagascar.
We extracted the spectral-temporal features derived from the time series of Sentinel 2 observation and DEM data. Based on the surface reflectance bands, we calculated five spectral indices, including the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), the Normalized Difference Buildup Index (NDBI), the Normalized Difference Water Index (NDWI), and the Normalized Burn Ratio Index (NBRI). Moreover, we computed three terrain features (i.e., elevation, slope and hillshade) based on SRTM data. A multi-temporal data stack was generated for each of the ten reflectance bands (i.e., B2-B8, B8A, B11, and B12), the four spectral indices using all available images, and the terrain features. Subsequently, for each pixel, the 10th, 25th, 50th, 75th, 90th and (25th-75th) percentiles were calculated. This approach of incorporating spectral-temporal metrics not only solves the problems of data gaps related to clouds, but also adds specific temporal information to the data, and can be helpful when identifying complex land cover classes such as cropland [35,37,41], especially for Madagascar.

Forest
Areas where tree cover percentage classification to >15%; limits tree height classification to >3 m.

Grassland
Grassland for grazing and natural grassland are identifiable. Herbaceous cover percentage classification to >15%.

Shrubland
Areas characterized by a texture finer than tree canopies but coarser than grasslands, height between 5 and 0.3 m, and cover percentage classification to >15%.

Wetland
Areas dominated by natural and semi-natural aquatic or regularly flooded vegetation.
Waterbody Areas dominated by natural waterbodies/artificial waterbodies.

Impervious
Areas dominated by artificial surfaces and associated area(s), primarily based on artificial cover such as asphalt, concrete, sand and stone, brick, glass, and other cover materials.
Bare land Areas where vegetation is hardly observable but dominated by exposed soil, sand, gravel, and rock backgrounds.
We also computed five textural features of NDVI based on gray-level co-occurrence matrix (GLCM) to enlarge the discrepancy of different land cover categories, including contrast, which measures the local contrast of an image; correlation, measuring the correlation between pairs of pixels; inverse difference moment, measuring the homogeneity; entropy, measuring the randomness of a gray-level distribution; and angular second moment, which measures the number of repeated pairs. In addition, three terrain factors-elevation, hillshade and slope, were included into the feature set.

Classification and Accuracy Assessment
Due to the enormous size of the 10-m Sentinel-2 time series data over the entire area of Madagascar, it is essential to use a powerful calculation tool, the GEE cloud platform, for image processing, feature collection, and classification. With regard to the highly heterogeneous land cover in Madagascar, we designed the tile-based classification model to carry out the classification task of corresponding tiles (defining an extent of 1 • × 1 • as a tile) to solve the difficulty in mapping tropically heterogeneous landscapes over the large spatial scale. Specifically, we first generated the 1 • × 1 • tiles over the whole island, resulting in 75 tiles. Each tile-based model was trained using the training samples within this tile and its neighboring 8 tiles (i.e., 3 • × 3 • , see Figure 1), which allowed each tile-based model to have enough samples to be calibrated, subsequently ensuring its predictive ability. Then, we carried out the classification procedure of the corresponding tile. The procedure was repeated for each tile in the study area, which made the classification model more specific to the corresponding areas with larger differences of land cover types. The random forest (RF) classifier was chosen to implement the classification task in this study, with a tree number of 200, as the RF algorithm is able to process massive high-dimensional data while maintaining high accuracy [42,43], and it is also resistant to noise and overfitting issues [44]. The parameters of RF have little influence on classification accuracy [45], and the number of trees was thus adjusted in this study.
For evaluating the predictive accuracy across the classes as well as the overall accuracy, a confusion matrix was produced using the independent validation data, which enable the provision of a cross-tabulation of the validation samples against the corresponding mapped pixel classes. This allowed for an assessment of the producer accuracy (the number of correctly classified pixels divided by the total number of true pixels in a given class) and the user accuracy (the number of correctly classified pixels divided by the total predicted pixels within that class), which were calculated to assess the accuracy of the final land cover map.

Comparison Analysis among Products and Methods
We compared our produced land cover map for Madagascar (namely the MDG LC-10 map) with two existing high-resolution land cover products of Madagascar for evaluating the proposed approach in this study, including the S2 prototype land cover 20- . This product has a legend of eleven types and accuracy varied from 44% (for South Africa) to 91% (for Gabon) [15]. The FROM-GLC10 dataset was produced by the Tsinghua University using Sentinel-2 images to generate a 10-m resolution global land cover map [34]. The classification system of FROM-GLC10 includes ten land cover classes that are described in Gong, et al. [46], and its overall accuracy is 72.76%. Furthermore, we compared our MDG LC-10 map with the sub-meter to 5-m very high spatial resolution imagery from Google Earth, representing the real land cover surface.
In addition, we also used qualitative and quantitative approaches to evaluate the performance of the proposed method in this study. In general, a classification model is used to classify land cover types of the whole study area (namely "the overall model"). We compared our proposed method (i.e., the one tile-based model is applied to one tile) with the method used by the overall model (i.e., one model is applied to the entire study area).

Ten-Meter Circa 2018 LC Map of Madagascar
We produced a contemporary land cover map of Madagascar at 10-m spatial resolution for circa 2018 using Sentinel-2 big data and tile-based random forest classification implemented in the GEE platform (Figure 1), and zoomed in on major locations in Madagascar to illustrate the mapping performance (Figure 3b-d). This land cover map was produced over eight dominant classes, with a good performance in terms of spatial consistency on a national scale. An accuracy assessment was implemented using the independent validation dataset. Based on the confusion matrix shown in Table 2, the overall accuracy and Kappa of the resultant map were 89.2% and 0.87, respectively, demonstrating good potential of the proposed approach for automatic land cover mapping for highly heterogeneous landscapes. As shown in Figure 3a, the proportions of eight land cover classes over the whole island were as follows: cropland, 7.2 %; forest, 21.7%; grassland, 52.3%; shrubland, 17.2%; wetland, 0.3%; waterbody, 0.7%; impervious land, 0.2%; and bare land, 0.5%. Due to large heterogeneity of tropical vegetation, it is difficult to map the fine and detailed land cover in Madagascar using coarser spatial resolution satellite data (e.g., Landsat and MODIS images). Previous studies have proved that the Sentinel-2 derived land cover classifications have an advantage in mapping extensive and smaller-sized cover types [34,47,48], especially for tropical regions. Our results also provided evidence that Sentinel-2 imagery can be applied to accurately delineate the complex and heterogeneous land cover types in Madagascar. Although Sentinel-2 satellite imagery is good for monitoring land surface at a higher spatial resolution, the accuracy assessment indicated that croplands are easily confused with grassland, and grassland is more likely to be misclassified as bare land (Table 2).
is good for monitoring land surface at a higher spatial resolution, the accuracy assessment indicated that croplands are easily confused with grassland, and grassland is more likely to be misclassified as bare land (

Comparisons among Google Earth Images, Two Available High-Resolution Land Cover Maps of Madagascar and the MDG LC-10 Land Cover Map
It was observed that there was a significant improvement in the MDG LC-10 map produced in this study when compared with two available high-resolution land cover maps (i.e., the CCI Africa LC-20 map and the FROM-GLC10 map) and very high spatial resolution imageries from Google Earth. For instance, plenty of misclassified croplands occurred in forest-dominant areas in the CCI Africa LC-20 map (Figure 4a), which were improved in the FROM-GLC10 map but not completely solved. However, this phenomenon was well avoided in the MDG LC-10 map. This can be explained by the use of the tile-based model, which effectively prevented the occurrence of some specific land cover types such as cropland and impervious areas that are not originally available in one tile (i.e., an extent of 1 • × 1 • ). The strength of our method is also reflected in the way it addresses the misclassified imperious type in the other land cover maps. As an example, in Figure 4b, the FROM-GLC10 map wrongly classified lots of grassland areas as impervious surface. Furthermore, it is evident in Figure 3b that the wetland class was well identified in the MDG LC-10 map, while the CCI Africa LC-20 map and the FROM-GLC10 map failed. Moreover, Figure 4c demonstrates that the MDG LC-10 map provided more detailed and accurate extents of impervious areas than the other land cover maps. This is likely the result Remote Sens. 2020, 12, 3663 8 of 15 of introducing the texture features in our approach. Moreover, we also quantified the performances between our map and other maps using the independent validation samples, and then compared their producer accuracies. Figure 5 indicates that the producer accuracies of the eight land cover classes of the MDG LC-10 map (average PA = 88.2%) outperformed those of the CCI Africa LC-20 map (average producer accuracy (PA) = 60.8%) and the FROM-GLC10 map (average PA = 65.9%). Based on these comparisons, we demonstrate that the 10-m resolution land cover map generated in this study (i.e., the MDG LC-10 map) is able to provide a more reliable land cover information than the FROM-GLC10 map and more detailed land cover information than the CCI Africa LC-20 map.

Comparisons of the Overall Model vs. the Tile-Based Model
The advantages of the proposed method in this study are mainly highlighted in two aspects: (1) the use of tile-based model makes the classification more specific to those regions with greater heterogeneity; (2) the employment of big data from GEE provides more observation information, which is used for delineating more detailed characteristics between different land cover classes.
In large-scale land cover mapping, the misclassification of land cover types with similar spectra (e.g., cropland and grassland) is common, and is more severe when classifying areas with high spatial heterogeneity and fragmented landscapes such as Madagascar. The proposed approach in this study demonstrated great potential in addressing this problem. This is because the tile-based model employed in this study made the classification more specific and targeted different heterogeneous landscapes. However, the conventional approach, which adopts a overall classification model, does not possess this advantage. It is evident from Figure 6 that, when the conventional method is used, incorrectly identified cropland (Figure 6b) and impervious land (Figure 6e) classes occur in the results, while the proposed methodology correctly identifies these classes (Figure 6c,f). This shows that the proposed method has great potential to improve the phenomenon of misclassification in the two land cover types-cropland and impervious area.
In addition to the visual comparison, a statistical comparison was made between the land cover maps produced using our proposed method and the conventional method. In this regard, the maps were evaluated based on the independent validation samples described in Section 2.1. Figure 7 represents the producer accuracies (PAs) of the eight different land cover classes identified in the classification results of our proposed method (i.e., the one tile-based model is applied to one tile) and the method based on a overall model (i.e., one model is applied to the entire study area). We found that the PAs of all land cover classes from our proposed method, except for forest, were higher than that for the conventional method. The average PA for our proposed method was 88.2%, while the traditional method had a lower average accuracy of 84.8%. The highest difference in PA was observed for waterbody, followed by wetland and bare land, indicating the advantage of employing a tile-based model for accurately mapping the frequently changing land cover types. For the other classes, the differences in PA varied between 1% and 6%, demonstrating the efficiency of applying the tile-based model for the Madagascar-wide land cover mapping.

Comparisons of the Overall Model vs. the Tile-Based Model
The advantages of the proposed method in this study are mainly highlighted in two aspects: 1) the use of tile-based model makes the classification more specific to those regions with greater heterogeneity; 2) the employment of big data from GEE provides more observation information, which is used for delineating more detailed characteristics between different land cover classes.
In large-scale land cover mapping, the misclassification of land cover types with similar spectra (e.g., cropland and grassland) is common, and is more severe when classifying areas with high spatial heterogeneity and fragmented landscapes such as Madagascar. The proposed approach in this study demonstrated great potential in addressing this problem. This is because the tile-based model employed in this study made the classification more specific and targeted different heterogeneous landscapes. However, the conventional approach, which adopts a overall classification model, does not possess this advantage. It is evident from Figure 6 that, when the conventional method is used, incorrectly identified cropland ( Figure 6b) and impervious land (Figure 6e) classes occur in the results, while the proposed methodology correctly identifies these classes (Figure 6c,f). This shows that the proposed method has great potential to improve the phenomenon of misclassification in the two land cover types-cropland and impervious area. In addition to the visual comparison, a statistical comparison was made between the land cover maps produced using our proposed method and the conventional method. In this regard, the maps were evaluated based on the independent validation samples described in Section 2.1. Figure 7 represents the producer accuracies (PAs) of the eight different land cover classes identified in the The use of big data from GEE also exhibited an advantage in mapping land cover in Madagascar. Based on a visual interpretation and comparison with high-resolution satellite images, it was observed that the proposed big data processing method in this study provided a visually satisfactory depiction of nearly all classes (Figure 8). The results showed that there were lots of misclassified impervious area in the classification results using the Sentinel-2 images with cloud cover below 10% (i.e., 4900 images) as input, while a large improvement was observed when all available Sentinel-2 images covering Madagascar (i.e., 11,083 images) were used as input (Figure 8a-c). Moreover, we also found that the usage of big data enhanced the ability to differentiate bare land and shrubland as well as grassland, compared with the fewer Sentinel-2 images used in the classification (Figure 8d,e). This main reason for this observation is that long-term and dense observations are able to provide seasonal phenological information. Our results demonstrate that the use of big data for land cover mapping makes the classification results finer and more accurate, which highlights the advantages of GEE in providing a large volume satellite imagery pool and data processing ability for mapping tropically heterogeneous land cover. The usage and processing of large amounts of satellite-based data is becoming more difficult as the spatial resolution of images from new sensors increases [26,49,50]. Furthermore, for tropical regions, the usable and efficient observations are fewer than other regions of the world mainly due to frequent cloud cover. GEE applied here overcame the computational challenge of handling large earth observation data, which is favorable for tropical land cover mapping due to its fewer surface observations. The use of big data from GEE also exhibited an advantage in mapping land cover in Madagascar. Based on a visual interpretation and comparison with high-resolution satellite images, it was observed that the proposed big data processing method in this study provided a visually satisfactory depiction of nearly all classes (Figure 8). The results showed that there were lots of misclassified impervious area in the classification results using the Sentinel-2 images with cloud cover below 10% (i.e., 4900 images) as input, while a large improvement was observed when all available Sentinel-2 images covering Madagascar (i.e., 11,083 images) were used as input (Figure 8a-c). Moreover, we also found that the usage of big data enhanced the ability to differentiate bare land and shrubland as well as grassland, compared with the fewer Sentinel-2 images used in the classification (Figure 8d,e). This main reason for this observation is that long-term and dense observations are able to provide seasonal phenological information. Our results demonstrate that the use of big data for land cover mapping makes the classification results finer and more accurate, which highlights the advantages of GEE in providing a large volume satellite imagery pool and data processing ability for mapping tropically heterogeneous land cover. The usage and processing of large amounts of satellite-based data is becoming more difficult as the spatial resolution of images from new sensors increases [26,49,50]. Furthermore, for tropical regions, the usable and efficient observations are fewer than other regions of the world mainly due to frequent cloud cover. GEE applied here overcame the computational challenge of handling large earth observation data, which is favorable for tropical land cover mapping due to its fewer surface observations. Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 16

Conclusions
In this study, we developed a novel classification framework for highly heterogonous landscape, in which the tile-based model was used to classify the complex land cover types of Madagascar. Using a combination of field survey and satellite observation data, we produced the first Madagascar land cover map with a 10-m spatial resolution. The GEE platform provided the tools to the handle high computation requirements and to extract spectral-temporal features from the large time-series images across the spatial extent of the country. The overall accuracy of 89.2% obtained for the map demonstrates the high potential of the proposed method for producing land cover maps in tropical regions. Moreover, by comparing the land cover map produced in this study with those previously generated by CCI and Tsinghua University, the proposed method clearly presents a significantly improved land cover map (i.e., MDG LC-10) of Madagascar. This map will greatly contribute to many fields and, thus, can be incorporated into various applications, such as the inventory of land use status and biodiversity conservation. Additionally, the methodology presented in this study offers an efficient approach for producing Madagascar-wide land cover maps and also makes it possible to automatically produce annual up-to-date country-wide land cover maps, enabling users (e.g., land managers and policy makers) to investigate the dynamic of land cover over longer time periods. We recommend this approach especially for regions dominated by heterogeneous land cover classes and frequent cloud cover.

Conclusions
In this study, we developed a novel classification framework for highly heterogonous landscape, in which the tile-based model was used to classify the complex land cover types of Madagascar. Using a combination of field survey and satellite observation data, we produced the first Madagascar land cover map with a 10-m spatial resolution. The GEE platform provided the tools to the handle high computation requirements and to extract spectral-temporal features from the large time-series images across the spatial extent of the country. The overall accuracy of 89.2% obtained for the map demonstrates the high potential of the proposed method for producing land cover maps in tropical regions. Moreover, by comparing the land cover map produced in this study with those previously generated by CCI and Tsinghua University, the proposed method clearly presents a significantly improved land cover map (i.e., MDG LC-10) of Madagascar. This map will greatly contribute to many fields and, thus, can be incorporated into various applications, such as the inventory of land use status and biodiversity conservation. Additionally, the methodology presented in this study offers an efficient approach for producing Madagascar-wide land cover maps and also makes it possible to automatically produce annual up-to-date country-wide land cover maps, enabling users (e.g., land managers and policy makers) to investigate the dynamic of land cover over longer time periods. We recommend this approach especially for regions dominated by heterogeneous land cover classes and frequent cloud cover.