Effect of Using Different Amounts of Multi-Temporal Data on the Accuracy: A Case of Land Cover Mapping of Parts of Africa Using FengYun-3C Data

Regional or continental-scale land cover mapping requires varying amounts of multi-temporal satellite data, spanning several months, to capture phenological variation in vegetation, which enhances the separability of surface cover types and improves accuracy. However, little has been reported about the number of months/multi-temporal images needed to obtain the best result, or about the impact of using different amounts of these data on the accuracy of individual classes. This work analyzed these effects by utilizing varying amounts of months of time-series FengYun-3C (FY-3C) data within one year for land cover mapping of parts of Africa using a random forest classifier. The study area covers roughly one-third of Africa, including eastern, central, and northern parts of the continent. One year of FY-3C ten-day composite images, each consisting of eleven bands at 1-km spatial resolution, was divided into seven input datasets comprising stacked images of 1 month, 3 months, 6 months, 9 consecutive months, 12 months, images selected from the 12 months using band/feature importance, and a selected 9 months. Comparisons of these datasets on independent test samples revealed that overall accuracy, kappa coefficient, and the accuracy of the individual classes generally increase significantly with the number of months of data. However, the highest accuracy and kappa coefficient, 0.86 and 0.83, were obtained by processing the selected 9-month imagery. The second-highest accuracy and kappa (0.85 and 0.82) were found by processing the 12-month scenes, the same as the results obtained by applying feature reduction. Although 4% and 5% higher accuracy was achieved with the 3-month and 6-month data relative to the 1-month imagery, no difference in accuracy was observed between six and nine consecutive months of data, both yielding the same accuracy and kappa value (0.84 and 0.81), indicating redundancy of information.
Overall, the high accuracy results show the feasibility of FY-3C data for land cover mapping of Africa.


Introduction
Land cover (LC), or the composition and characteristics of land surface elements, is critical environmental data. It is essential for several scientific, resource-planning, and regulatory activities, as well as for a variety of applications [1]. It is a significant determinant of land use and thus the societal value of the land. Land cover differs at a variety of spatial scales, from local to global, and at temporal frequencies ranging from days to millennia. As the importance of environmental management and planning grew, so did the demand for land cover data. Moreover, FY-3D, which carries the same instrument as FY-3C but with a different number of bands, has been shown to yield equal or better results than MODIS and AVHRR data.
Thus, apart from achieving the earlier mentioned goal, this paper also evaluates the performance of FY-3C data that has not been considered so far for Africa LC mapping.

Study Area
The study area, the gray-shaded region, covers approximately 30% of the total area of the African continent (Figure 1). It lies between latitudes 11°58′31.71″ S and 33°0′18.55″ N, and longitudes 19°4′35.03″ E and 51°24′37.33″ E. It includes about 18 countries, fully or partially, located in the eastern, central, and northeastern parts of Africa. The region is characterized by four main climates: arid, Sahelian, tropical, and equatorial; the northern part is mostly arid and includes the Sahara, the world's largest desert, while central Africa is known for its tropical rain forest cover.

Technical Workflow
Various materials and techniques have been employed to achieve the objective of this research as discussed below and the technical route is exhibited in Figure 2.



Materials and/or Resources
Two types of satellite imagery from various sources were used for different purposes. FY-3C data are the primary input data and were acquired from the China Meteorological Administration (CMA). We collected one year, 1 April 2019 to 30 March 2020, of 10-day composite images with 1-km spatial resolution. Then, 11 bands, including visible, infrared (IR), and maximum value composite (MVC) NDVI bands, were selected for the intended goal (see Table 1). Landsat imagery was the other image source, with higher spatial resolution than the input data, used to collect reference data. For this purpose, we acquired one year of Landsat 8 Collection 1 Level-2 atmospherically corrected surface reflectance images over the same date interval as the input data. The scenes were accessed from the United States Geological Survey (USGS) website [16] by setting a cloud cover criterion of <10% and excluding data of unknown cloud cover. Once the images had been obtained, the best scenes were selected manually for reference sample collection at each particular location.
In addition to Landsat data, an existing map, i.e., Copernicus global discrete land cover map at 100 m resolution (CGLS LC100 discrete map) for the year 2019 (https://lcviewer. vito.be/download, accessed on 14 September 2021), Google Earth Pro and/or Google maps have also been used as a complementary and/or reference data, while collecting training and test data.
Furthermore, in order to perform various pre- and post-processing of vector and raster data, software such as ENVI 5.3 and ArcMap 10.7 was employed. The Python 3.8 programming language was also used to process and classify the images using machine-learning methods. To manipulate the data, we used certain Python libraries, such as scikit-learn together with NumPy and Matplotlib (https://scikit-learn.org/stable/, accessed on 14 September 2021), and other important packages such as osgeo/GDAL (Geospatial Data Abstraction Library), ogr, geopandas, and others.

Random Forest Classifier
Machine learning algorithms are nonparametric supervised techniques that have become a major focus and a huge success in remote sensing technology in recent years, e.g., Pal and Mather [17]; Mountrakis et al. [18]; Belgiu and Drǎgut [19]; Maxwell et al. [20] and Wulder et al. [21]. Applying machine-learning algorithms provides significant benefits including the capacity to model complex class signatures, accepting a variety of input predictor data, and being unaffected by the distribution of data [20]. Several studies have found out that these methods generally yield a better result compared to traditional parametric algorithms, particularly, for complex datasets with a high-dimensional feature space, i.e., several estimator features or attributes (e.g., Friedl and Brodley [22]; Hansen and Reed [23]; Huang et al. [24]; Pal [25]; Pal and Mather [17]; Ghimire et al. [26]; Otukei and Blaschke [27]).
The random forest (RF) is an ensemble approach that consists of numerous decision trees that are formed using randomly picked predictor variables from randomly chosen training samples subsets, with class prediction based on a majority vote [19,28] (see Figure 3). It is one of the most widely applied and robust machine-learning algorithms [19].
The method of constructing a random forest is generally a combination of bagging and random subspace methods [29]. The trees are formed by drawing a subset of training samples with replacement (a bagging approach, in which the same sample can be chosen repeatedly at the expense of others) (Figure 3A). About two-thirds of the samples (referred to as in-bag samples) are used to train the trees, whereas the remaining one-third (referred to as out-of-bag samples) are used in an internal cross-validation technique to test how well the resulting RF model performs, an error estimate known as the out-of-bag (OOB) error [28,30]. Each decision tree is independently generated without any pruning, and each node is split using a user-defined number of features (Mtry), selected at random. By growing the forest up to a user-defined number of trees (Ntree), the algorithm creates trees that have high variance and low bias [28]. Finally, the algorithm classifies new samples by taking the majority vote over the individual estimators (trees) [30] (Figure 3B).
Two parameters significantly affect the performance of the RF classifier: Ntree and Mtry [19,20]. Although various studies have shown that the Mtry parameter is more important than Ntree in affecting the classification result [31,32], setting appropriate values for both parameters is essential to find a better accuracy. In this regard, Belgiu and Drǎgut [19] and Gislason et al. [33] suggested 500 as a default value for Ntree, whereas Guan et al. [34] recommended setting Ntree as large as possible, arguing that the RF classifier is computationally efficient and does not overfit. The Mtry parameter is mostly set to the square root of the number of input variables [19,33]. On the other hand, some investigators set Mtry equal to the total number of existing variables (e.g., Ghosh et al. [31]). However, such a setting can compromise computational efficiency, as the algorithm has to calculate the information gain resulting from all of the features used to divide the nodes [19].
A number of advantages of applying the RF algorithm have been mentioned in the literature. RF provides high accuracy [19,35], even better than several other machine-learning algorithms including discriminant analysis, support vector machines, and neural networks [30], and it is robust to overfitting [19,28]. It is also computationally inexpensive compared with classifiers such as support vector machines and ensemble methods such as AdaBoost. Furthermore, it enables us to select important variables [35], which allows us to remove the least significant features; and it mostly requires setting only two parameters, i.e., Ntree and Mtry [19,20,30]. One drawback of RF is that having many trees reduces the capacity to visualize the trees [28]. In this paper, therefore, we employed the RF model by setting Ntree = 500 and Mtry equal to the square root of the number of variables.
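As a minimal sketch of how these two parameters are set in practice, the scikit-learn snippet below grows a forest with Ntree = 500 and Mtry equal to the square root of the number of variables, and reads off the internal out-of-bag accuracy estimate described above; the synthetic dataset is purely illustrative, not the study's imagery:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for multi-band pixel samples (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Ntree -> n_estimators, Mtry -> max_features ("sqrt" of the variable count).
# oob_score=True evaluates each tree on its out-of-bag samples, yielding an
# internal accuracy estimate without a separate test set.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=42)
rf.fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
```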

Selecting Reference Data Location and Landsat Data Acquisition
The location and distribution of the Landsat imageries for reference data collection were determined on Google Earth Pro by creating polygons (see Figure 4) on the basis of the eco-regions of Africa, previous LC maps of Africa, Copernicus global land cover, GlobCover, and personal experience. As shown in Figure 4, the white polygons, in KMZ file format, are randomly distributed across the study area according to the earlier-mentioned criteria. Then, the KMZ files (locations of reference data) were imported to the USGS website to select and acquire Landsat imageries.


Naming of Classes
Land cover types in the study area have been categorized into eight major classes (Table 2) on the basis of the United Nations Land Cover Classification System (LCCS) as stated in Copernicus Global Land Operations-Lot 1 [7].

Cropland
Lands that are covered with intermittent crops, harvested, and then left barren (e.g., single and multiple cropping systems). Perennial woody crops will be categorized as a forest or shrub based on the definition.

Forest
Lands covered by woody plants, with a percent cover of more than 15% and a height of more than 5 m. Exceptions: a woody plant having a distinct physiognomic feature of a tree can be categorized as a tree even if its height is less than 5 m but greater than 3 m.

Herbaceous wetland
Land covered by a persistent mixture of having water and herbaceous or woody plants. Vegetation may be found in salt, brackish, or freshwater.

Herbaceous vegetation
Plants that lack a distinct solid structure and have no persistent stems or shoots above the surface. It may consist of up to 10% trees and shrubs.

Shrubs
These are woody perennial plants, with persistent and woody stems, that are less than 5 m tall and lack a distinct main stem. The leaves of the shrub can be either evergreen or deciduous.

Water bodies
These include lakes, reservoirs, and rivers. The water could be fresh or salty.


Reference Data/ROI Collection
Studies have revealed that the accuracy of land cover maps predominantly relies on the quality and quantity of reference (training and test) data. According to Huang et al. [24], training data may have a greater influence than the type of classifier employed. However, researchers recommend contradictory figures for the size of training data. For instance, Noi and Kappas [36] suggested that the training sample size should be 0.25 percent of the overall study area, whereas Jensen and Lulla [37] proposed that training pixels should be at least ten times the number of variables in the classification model. Other studies, on the other hand, found that machine-learning algorithms require a large amount of training data to achieve better results [33].
For this work, therefore, we managed to collect a large amount of reference data, 91,207 training pixels and 27,667 test pixels, across the study area (see Figure 5) from Landsat 8 surface reflectance imageries. The number of both training and test pixels for the various classes varies according to the extent of the land cover (Table 3), because classes that occupy vast regions require more samples than those that occupy small areas, and area-proportional distribution of training samples per class has been shown to produce the best classification results [38]. Before starting image interpretation to collect land cover samples, false-color composite images, mostly 5,4,3, were created from six stacked bands (bands 2 through 7) in ENVI and overlaid on the existing Copernicus map of the particular area. Then, identification and/or interpretation of the various land cover classes were carried out on the composite Landsat images by employing three techniques/references simultaneously:
1. Interpretation of the images by applying image interpretation elements such as texture, pattern, association, color, tone, and others.
2. Crosschecking against the underlying previous land cover map, i.e., the Copernicus global land cover map (100 m), if the class is homogeneous for at least the minimum mapping unit (1 km × 1 km, i.e., 40 × 40 Landsat pixels) of the input data.
3. Finally, further cross-checking/consultation was done on higher-resolution imageries in Google Earth Pro and Google Maps.
Generally, class naming was based on the interpretation of Landsat images aided by the previous map and Google Earth Pro/Google Maps. However, when discrepancies/disagreements among these references occurred, as happened often, naming of classes was made by referring to Google Earth Pro and/or Google Maps, as they offer higher-resolution imagery than the other two references. After the name of the land cover type was confirmed, the samples were annotated as polygon XML files and subsequently exported as shapefiles using ENVI.
Finally, the shapefiles from the different scenes were merged in ArcMap to create a single shapefile containing all collected reference data. Then, the merged reference data were divided into two sets (Figure 5), using ArcMap, approximately 75% training and 25% independent test data, by randomly selecting 5 sample polygons out of every 20 polygons from each class and exporting them. Once the selected test samples were exported as a separate shapefile, they were deleted from the merged data containing both training and test data so that the remaining data comprised only the training data, 75 percent of the sample.
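The per-class 75/25 split described above can be sketched in Python; the toy table and the column names (`poly_id`, `class_name`) are hypothetical stand-ins for the merged shapefile's attribute table:

```python
import pandas as pd

# Toy attribute table: 20 sample polygons per class, mirroring the
# 20-polygon groups from which 5 were drawn as test data.
polys = pd.DataFrame({
    "poly_id": range(40),
    "class_name": ["forest"] * 20 + ["shrubs"] * 20,
})

# For every class, randomly hold out 5 of each 20 polygons (25%) as test
# data; the remainder (75%) stays as training data.
test = polys.groupby("class_name", group_keys=False).sample(n=5, random_state=0)
train = polys.drop(test.index)
print(len(train), len(test))  # 30 training and 10 test polygons
```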
As mentioned in Section 2.1, a total of one year (1 April 2019 to 30 March 2020) of 10-day composite data was considered and systematically divided into different numbers of months, following yearly seasons, so as to determine how variations in the number of months of multi-temporal data impact the land cover classification accuracy.
Accordingly, seven different types of stacked imagery were formed from the eleven spectral bands including NDVI, by varying the number of continuous months for the first five input datasets listed below and by employing feature selection/reduction methods for the last two:
1. One month stacked image (OMSI): a stacked scene of 1 month of composites, with a total of 33 bands/features.
2. Three months stacked image: a stacked scene of 3 months of composites, with a total of 99 bands/features.
3. Six months stacked image (SMSI): a stacked scene of 6 months of composites, with a total of 198 bands/features.
4. Nine months stacked image (NMSI): a stacked scene of 9 months (April, May, June, July, August, September, October, November, December 2019), with a total of 297 bands/features.
5. One-year stacked image (OYSI): contains the maximum number of features/bands, 396, as it is made of one year, 1 April 2019 to 1 April 2020, of composite images.
6. Selected data from one-year stacked image (SDFOYSI): as the name implies, the data were generated by feature selection/reduction using the variable importance of the random forest classifier. One of the advantages of a random forest classifier is its built-in feature/variable importance functionality, which helps to select the most important variables and remove the least significant ones. Using this algorithm and setting 95% cumulative importance, 337 features/bands were selected out of the 396 features/bands of OYSI (see Figure 6), implying that the remaining 59 bands are the least important features and have an insignificant effect on the overall result.
7. Selected nine months stacked image (SNMSI): a stacked image consisting of selected 9-month imageries (April, May, June, July, August, September 2019; and January, February, and March 2020) that were chosen based on the results obtained by processing SMSI and NMSI. That is, it is formed from the one-year data by removing three months (October, November, December 2019), as no variation in accuracy was observed between SMSI and NMSI, where the latter comprises the excluded months. In other words, it is data derived by a systematic feature selection technique.
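The feature counts of the stacked datasets follow directly from the compositing scheme (three 10-day composites per month, eleven bands each); a small sketch of the arithmetic:

```python
BANDS_PER_COMPOSITE = 11   # ten visible/IR bands plus NDVI (Table 1)
COMPOSITES_PER_MONTH = 3   # three 10-day composites per month

def n_features(months: int) -> int:
    """Total bands/features in a stacked image covering `months` months."""
    return months * COMPOSITES_PER_MONTH * BANDS_PER_COMPOSITE

# Reproduces the totals quoted for NMSI (9 months) and OYSI (12 months).
print(n_features(9), n_features(12))   # 297 396
```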


Model Training
Data processing including model training, classification, and accuracy assessments was performed in the Python 3.8 programming language. The platform provides various libraries such as Scikit-Learn (sklearn), NumPy, and Matplotlib that enable us to execute various machine-learning algorithms including random forest.
In this paper, therefore, we employed a random forest model to classify the data. The algorithm was trained using the seven different input data types discussed in Section 2.8 but with the same training dataset of 91,207 training pixels. Most model parameters were kept at the default values given in the sklearn library, except for Ntree (n_estimators), which was changed from 100 to 500 following suggestions by Gislason et al. [33], Maxwell et al. [20], and Belgiu and Drǎgut [19], and the random state, which was changed from None to 42. For Mtry, we used the default value, equal to the square root of the number of variables, the value most widely applied and suggested by several previous works.
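In scikit-learn terms, the configuration just described amounts to the following sketch; the names `X_train` and `y_train` are placeholders for the stacked-band pixel matrix and its class labels, not variables from the paper:

```python
from sklearn.ensemble import RandomForestClassifier

# All parameters left at their sklearn defaults except:
#   n_estimators (Ntree): 100 -> 500
#   random_state: None -> 42
# max_features="sqrt" makes Mtry the square root of the number of
# variables, which is also the library default for classification.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            random_state=42)
# rf.fit(X_train, y_train)          # X_train: (n_pixels, n_bands)
# predictions = rf.predict(X_test)  # classify unseen pixels
```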


Accuracy Assessment
Performances of the various models generated by varying the input datasets were evaluated using the error matrix, the most common accuracy evaluation technique, together with the f1-score and the kappa coefficient. The error matrix is also known as the confusion matrix because it identifies not only overall errors for each category but also misclassifications (due to confusion between categories) by category. The confusion matrix allows an assessment of the user's accuracy, or precision (the number of correctly classified pixels divided by the total number of pixels predicted within that class), and the producer's accuracy, also known as recall (the number of correctly classified pixels divided by the total number of pixels truly in that class), for each class [39].
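All of the metrics named above are available in scikit-learn; the toy label arrays below are illustrative stand-ins for the classified test pixels:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, cohen_kappa_score,
                             f1_score, precision_score, recall_score)

# Toy reference and predicted labels standing in for the test pixels.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)   # rows: reference, columns: predicted
kappa = cohen_kappa_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, average=None, zero_division=0)
recall = recall_score(y_true, y_pred, average=None)
f1 = f1_score(y_true, y_pred, average=None)
print(cm)
print(f"kappa = {kappa:.2f}")
```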

Results
The performance of all seven input datasets, composed of different numbers of stacked imageries consisting of 3 to 36 composite images, each containing 11 bands including NDVI, was evaluated with the same independent test sample. The number of reference data, training and test pixels, collected for the various classes varies according to the extent of the land cover. That is, land cover types with larger areal coverage, e.g., bare/sparse vegetation, cropland, shrub, and forest, were represented by a large number of pixels, while minority classes, such as built-up and herbaceous wetland, were represented by fewer pixels in proportion to the prevalence of the class (see Figure 7 and Table 4). This strategy has been shown to yield the best accuracy [38].
As stated in Section 2.9, the error matrix (Figure 7) along with the f1-score and kappa coefficient was employed to conduct the accuracy test. Table 4 and Figure 7a-d show that the overall accuracy, kappa score, precision, recall, and f1-score generally increase with the number of months the data comprise. Up to 6% variation in overall accuracy and notable differences in individual class accuracy between OMSI and OYSI were observed. This improvement in accuracy is due to the increase in the number of months and the related number of time-series observations. In other words, images collected over a longer period capture more important attributes of the various classes than scenes acquired over a shorter interval. The longer the period the data cover, the more properties of the land cover types are registered, including seasonal (phenological) variation and mixing of classes, which enhances discrimination between classes.
Hence, using continuous data over a longer period enables us to record the variations/changes caused by different factors thereby the different land cover types are characterized well; and that can help us to find better classification accuracy.
However, this generalization does not necessarily mean that processing a greater number of months of multi-temporal images always results in better accuracy than processing fewer months. For example, SNMSI produced the highest accuracy (0.86) and kappa score (0.83) although it consisted of 3 months less data than OYSI (see Table 4 and Figure 7e,g); consequently, these data were used to produce the land cover map of the study area (Figure 8). Similarly, despite covering the longest time interval, OYSI gave the second-best result, 0.85 accuracy and 0.82 kappa value, which is the same as the outcome found by processing SDFOYSI, a smaller dataset obtained by using the feature reduction method (see Table 4 and Figure 7e,f). Moreover, NMSI produced almost the same accuracy and kappa value (0.84 and 0.81) as SMSI although it contains 3 months more information. These results indicate the existence of redundant information whose removal has little or no impact on the accuracy.


Effect of Feature Selection/Reduction
To address information redundancy and remove the least significant data, two feature selection techniques were implemented. Feature/band reduction is a method of removing the least important variables/bands without significantly altering the final product; in other words, it selects the most vital information representing the object [33]. It is also a means of minimizing overfitting, thereby enhancing the classifier's ability to generalize and reducing computational cost. In this paper, therefore, two feature selection methods, automatic and manual, were employed.
In the automatic method, the variable importance module of the random forest in the scikit-learn library (Python 3.8) was applied to OYSI, which consists of 396 features/bands, to select the most important variables; 337 features/bands were retained using a 95% cumulative importance threshold. Classifying with the selected 337 bands resulted in almost the same accuracy and kappa score (0.85 and 0.82, respectively) as the full OYSI, confirming that the discarded data add no vital information. This implies that the remaining 59 bands are the least important features and their effect on the overall result is insignificant.
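The automatic selection can be sketched as follows: rank the random forest's feature importances and keep bands up to 95% cumulative importance. The data here are synthetic stand-ins for the 396-band OYSI stack:

```python
# Minimal sketch of band selection by 95% cumulative random-forest importance.
# Synthetic data: labels are driven by two of the 20 toy "bands".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 20))                  # 500 pixels x 20 bands (toy stand-in)
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)  # labels depend on bands 0 and 5

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

order = np.argsort(rf.feature_importances_)[::-1]    # bands, most important first
cum = np.cumsum(rf.feature_importances_[order])
keep = order[: int(np.searchsorted(cum, 0.95)) + 1]  # bands up to 95% cumulative

X_reduced = X[:, np.sort(keep)]                      # reduced feature stack
print(len(keep), "of", X.shape[1], "bands retained")
```

On the study's data, the same procedure retained 337 of 396 bands.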
In the second scenario, we manually created a dataset, SNMSI, by systematically selecting 9 months of images from the full one-year series; that is, based on the earlier experiment, the three consecutive months containing redundant information were left out. This selected dataset yielded the best result: the highest overall accuracy (0.86) and kappa score (0.83), and the best precision, recall, and f1-score (Table 4 and Figure 7g). The result suggests that incorporating several features/variables carrying similar information negatively affects both overall and individual class accuracy, as it may lead to overfitting that degrades the algorithm's ability to generalize. In other words, a large number of features (i.e., bands) might yield lower classification accuracy than a subset of those variables, especially if the subset is selected to focus on those that are highly relevant for differentiating the classes [20].
Figure 7 shows that water bodies and bare/sparse vegetation are the most accurately classified classes (above 0.9), with little variation in recall and f1-score across all input datasets. These classes therefore have distinct spectral properties that are only slightly affected by the number of months/temporal data processed; seasonal variation has little impact on them. In contrast, the accuracies (user's, producer's, and f1-score) of four vegetation classes, namely cropland, herbaceous wetland, herbaceous vegetation, and shrub, improve by at least 1% as the number of months/temporal data increases, implying a significant impact of phenological variation in these surface cover types. The built-up class, however, showed no pattern with the number of months, as it is independent of seasonal variation. Moreover, it was classified with the lowest accuracy, barely above 60%, and was highly confused with cropland.
This confusion is likely caused by mixed pixels, as the two classes are spatially associated and often intermixed. Mixed pixels are one of the main factors affecting land cover accuracy when coarse-resolution imagery is used [2]. Class size could be another reason: since built-up is a minority class covering a small area, collecting a large amount of reference data for it is difficult. Similarly, herbaceous wetland and herbaceous vegetation were mapped with lower accuracy. The former is due to its small extent, being a minority class, and its spectral similarity to, and/or pixel mixing with, water body, cropland, and forest; as a result, it was often mislabeled as these classes. Herbaceous vegetation, in turn, was mostly misclassified as shrub or cropland. Beyond the reasons above, this could be attributed to the class/legend definition: herbaceous vegetation may contain up to 10% trees and/or shrubs by definition, so mixed classes are easily mislabeled where one class is dominant and the other is close to the cut-off value. For instance, land covered by 87% herbaceous vegetation and 13% trees/shrubs may be misinterpreted as herbaceous vegetation during data collection, since the reference information is gathered by visual inspection, which is prone to error and ultimately leads to confusion and misclassification. This could also be the main reason for the misclassification of shrub as forest and vice versa. The impact of land cover class definition on accuracy has also been reported by [7].

Conclusions
The objectives of this work were to analyze the effect of processing various numbers of months of multi-temporal satellite data on the accuracy of a land cover classification, and to produce a land cover map of parts of Africa using the best input data and a machine learning (random forest) classifier. We also aimed to assess the effectiveness of data from FY-3C, a Chinese meteorological satellite, for mapping the land cover of Africa.
Stepwise experiments were carried out by increasing the number of months, i.e., of temporal data. We started with a one-month stacked image comprising three 10-day composite scenes, each with 11 bands (including the average maximum NDVI) at 1 km spatial resolution, and then increased the number of months to three, six, nine, and twelve. The months were consecutive, running from April 2019 to March 2020, and were divided according to the four annual seasons. In addition, two input datasets were generated from the one-year data by the band selection/reduction method, bringing the total number of input datasets to seven. For all input datasets, a random forest algorithm was trained with the same training samples (91,207 training pixels, with different quantities per class) collected from Landsat imagery. Finally, all input datasets were evaluated against the same independent test set of 27,667 pixels.
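The workflow summarized above, training a random forest on labeled pixels and scoring it on an independent test set, can be sketched as follows; the arrays are synthetic stand-ins for the Landsat-derived training and test pixels, and the band count assumes the one-month stack (3 composites x 11 bands):

```python
# Minimal sketch of the train/test workflow: fit a random forest on training
# pixels, then compute overall accuracy on held-out test pixels. Toy data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X_train = rng.random((1000, 33))    # one-month stack: 3 composites x 11 bands
y_train = rng.integers(0, 8, 1000)  # toy labels for 8 land cover classes
X_test = rng.random((200, 33))      # independent test pixels
y_test = rng.integers(0, 8, 200)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
oa = accuracy_score(y_test, rf.predict(X_test))  # overall accuracy
```

In the study, the same classifier and samples were reused across all seven input datasets so that accuracy differences reflect only the input data.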
The overall accuracy, kappa coefficient, and individual class accuracies generally increase with the number of continuous months of time-series data. However, the best result was not achieved with the one-year stacked multi-temporal data, despite it comprising the maximum number of months. Those data yielded only the second-best accuracy (0.85), the same as that obtained after feature selection/reduction on the one-year data. The highest overall accuracy (0.86) and kappa coefficient (0.83) were attained when systematically selected 9-month imagery, covering three seasons, was processed; consequently, those data were chosen to create the land cover map of the study area.
In large-area land cover mapping, systematic selection of months and/or features from one-year multi-temporal data allows the best result to be achieved. Moreover, variable selection using a 95% cumulative importance threshold over the one-year data had little to no impact on overall accuracy, even though a substantial number of bands (59) was discarded.
In this study, therefore, despite using a single input data type from FY-3C, a high overall accuracy (above 85%) was attained, suggesting that FY-3 data are highly effective and suitable for land cover classification in Africa.
Finally, better classification accuracy could be reached if the spatial resolution of the input data were improved and other ancillary input data were considered. Increasing the spatial resolution would reduce pixel mixture, which in turn enhances differentiability among classes; minority classes in particular could be mapped with better accuracy. On the other hand, overall accuracy decreases as the thematic resolution (the number of classes) increases [5,21].