Operational Large-Area Land-Cover Mapping: An Ethiopia Case Study

Knowledge of land cover and land use nationally is a prerequisite of many studies on drivers of land change, impacts on climate, carbon storage and other ecosystem services, and allows for sufficient planning and management. Despite this, many regions globally do not have accurate and consistent coverage at the national scale. This is certainly true for Ethiopia. Large-area land-cover characterization (LALCC), at a national scale is thus an essential first step in many studies of land-cover change, and yet is itself problematic. Such LALCC based on remote-sensing image classification is associated with a spectrum of technical challenges such as data availability, radiometric inconsistencies within/between images, and big data processing. Radiometric inconsistencies could be exacerbated for areas, such as Ethiopia, with a high frequency of cloud cover, diverse ecosystem and climate patterns, and large variations in elevation and topography. Obtaining explanatory variables that are more robust can improve classification accuracy. To create a base map for the future study of large-scale agricultural land transactions, we produced a recent land-cover map of Ethiopia. Of key importance was the creation of a methodology that was accurate and repeatable and, as such, could be used to create earlier, comparable land-cover classifications in the future for the same region. We examined the effects of band normalization and different time-series image compositing methods on classification accuracy. Both top of atmosphere and surface reflectance products from the Landsat 8 Operational Land Imager (OLI) were tested for single-time classification independently, where the latter resulted in 1.1% greater classification overall accuracy. Substitution of the original spectral bands with normalized difference spectral indices resulted in an additional improvement of 1.0% in overall accuracy. Three approaches for multi-temporal image compositing, using Landsat 8 OLI and Moderate Resolution Imaging Spectroradiometer (MODIS) data, were tested including sequential compositing, i.e., per-pixel summary measures based on predefined periods, probability density function compositing, i.e., per-pixel characterization of distribution of spectral values, and per-pixel sinusoidal models. Multi-temporal composites improved classification overall accuracy up to 4.1%, with respect to single-time classification with an advantage of the Landsat OLI-driven composites over MODIS-driven composites. Additionally, night-time light and elevation data were used to improve the classification. The elevation data and its derivatives improved classification accuracy by 1.7%. The night-time light data improve producer’s accuracy of the Urban/Built class with the cost of decreasing its user’s accuracy. Results from this research can aid map producers with decisions related to operational large-area land-cover mapping, especially with selecting input explanatory variables and multi-temporal image compositing, to allow for the creation of accurate and repeatable national-level land-cover products in a timely fashion.

of an index such as the NDVI, a statistical measure such as a mean or median, and so on [54][55][56][57][58]. Creation of composite layers based on monthly/seasonal/annual or statistical measures has shown great potential for characterizing land-cover classes [59][60][61][62][63]. Because these composite layers are extracted from time-series of observations, they could provide higher levels of consistency compared to single-time imagery.
In this research, several classifications based on different explanatory input variables are implemented and compared based on classification overall and class-specific accuracies. We are interested in comparing performances of top of atmosphere (TOA) and surface-reflectance (SR) products in terms of classification accuracy and if additional radiometric normalization might improve classification while mapping large areas. In addition, we investigate if and how much different multi-temporal composites can improve classification accuracy of large areas with respect to single-time classification. Finally, ancillary datasets are tested for classification accuracy improvement. Even though, some of those methods have been used in previous research for small-area mapping projects, here we are interested in testing the effects and implications of such methods and decisions on the accuracy of large-area operational land-cover mapping. This will provide more information about the relative performance of different methods/data, for example comparison between surface versus top of atmosphere reflectance products or comparison among different time-series compositing methods, which will assist other researchers with similar questions and enhance the existing discussion on comparisons and uses of the rapidly increasing selection of global land-cover products.

Study Area: Ethiopia
Ethiopia maintains the second largest population among African countries (102 million) and is also among the poorest in the world. Per capita income remains at $660 USD despite the government pushing to reach lower-middle-income status by 2025 [64]. Under the Growth Transformation Plan of Ethiopia, the agricultural sector is listed as the country's major source of economic growth [65]. Farming provides a livelihood to about 80% of the population and 45% of the country's Gross Domestic Product (GDP) comes from the agricultural sector [66]. Approximately 96% of the cultivated land is accounted for by smallholder rainfed agriculture; most of which work with less than one hectare. Cereals constitute most of the production and are estimated to make-up nearly three quarters of the total cultivated area. Other crops include fruits, vegetables, root crops, oil seeds and coffee, though these only make up about 10% of the area under cultivation combined [67,68]. Larger industrial-style farms specialize in the cultivation of crops such as sugarcane and cotton, which contribute far less to total crop production. Smallholder, multiple crop farming is practiced by 80% of the country's population and is concentrated predominantly in the highlands. Similar practices are followed in the lowlands with more drought tolerant varieties of cereals and oil crops. The northern and Somali regions of the country are largely occupied by agro-pastoralism while the southern and western portions use shifting cultivation with slash and burn [69]. Pastoralism areas mainly include north eastern, eastern, south eastern and some part of south of the country.
Ecological zones throughout the country are characterized into fifteen classes based on elevation and rainfall [70]. The elevation ranges from approximately 110 m below sea level to more than 4000 m above sea level. This vast irregularity in topography creates varying rainfall regimes with annual totals of 150 mm in the southeast and as much as 2000 mm in the southwest [71]. The key agricultural regions of Ethiopia experience two rainy seasons known as the Kiremt or Mehere (June-September) and the Belg (March-May) [72]. According to Biazin and Sterk [73], dry lands occupy approximately 65% of the country and only 15.1% of the land is considered arable. The dominant land covers present throughout the country include bare areas in the northeast, grassland/shrubland in the east, and rainfed croplands in the western highlands. Due to the rampant deforestation in the 1900s, forest cover is limited to just 12.5% of total land area and remains mostly concentrated in the southwest [74]. The northwestern Remote Sens. 2020, 12, 954 5 of 24 region is barren with sparse vegetation caused by low elevation that results in high temperatures. The major cause of changes in land use/land cover in the highland regions has been deforestation, mainly due to agricultural intensification [75]. In addition to expanding agricultural land, population growth has exerted pressure through increased urban development [76,77].

Classification Scheme and Reference Data Collection
The classification scheme included six classes as shown in Table 1. As our goal was to study changes in agricultural activities and savanna systems, the classification scheme was mainly focused on these classes, which was considered appropriate given the national scale of this research. While more local or regional classifications do exist, these classifications were inappropriate to scale up nationally, and given the focus of this research was the creation of a national product, which would be reproducible across earlier dates for future research, we focused on these broader land-cover classes at this initial stage. Table 1 also includes the corresponding classes from two existing land-cover maps of Ethiopia: Copernicus Global Land Operations (https://land.copernicus.eu/global/; last accessed on February 2020) and GlobeLand30 [22]. The two existing land-cover maps were used for spatial stratification of the reference pixels (explained below) and benchmarking of the performance of the proposed mapping process (Section 3.2).  Figure 1 shows some examples for each class representing the wide range of variations for most classes. Our initial assessment revealed confusion among both visual appearance and spectral signatures of some classes. The major confusions observed were among Dense Forest and Savanna; Cultivated and Bare Soil; and Urban/Built and Bare Soil classes. While additional refinement and very careful selection of training pixels helped reduce these confusions, it is also clear that these confusions can be expected to be the main sources of misclassifications in the final product and must be considered when reviewing the final products. Examples of the target land-cover classes showing the ranges of class distribution, ordered by land-cover proportion cover of a pixel, from maximum cover to minimum cover to still be assigned to the class, from left to right. Red squares show footprints of Landsat pixels.
Representative precise and robust reference data are essential for land-cover classification. Maintaining consistent, high quality reference data is challenging when the data are being collected for large areas, by different analysts, and over long periods. For our classification, we used GEE to implement a protocol to ensure the high quality of the reference data. Landsat 8 imagery were used as the main remote-sensing data for classification (details are presented in section 2.3) and therefore reference labels were collected for Landsat pixels. Because Ethiopia is located on multiple Universal Transverse Mercator (UTM) projected coordinate system zones, images were reprojected into the WGS84 geographic coordinate system (details are presented in section 2.3). For each class, 250 reference pixels were collected. To distribute the reference data over the entire study area, Copernicus Global Land Operations and GlobeLand30 land-cover products were used to implement a stratified random sampling, in which Copernicus and GlobeLand30 classes and their areas of Figure 1. Examples of the target land-cover classes showing the ranges of class distribution, ordered by land-cover proportion cover of a pixel, from maximum cover to minimum cover to still be assigned to the class, from left to right. Red squares show footprints of Landsat pixels.
Representative precise and robust reference data are essential for land-cover classification. Maintaining consistent, high quality reference data is challenging when the data are being collected for large areas, by different analysts, and over long periods. For our classification, we used GEE to implement a protocol to ensure the high quality of the reference data. Landsat 8 imagery were used as the main remote-sensing data for classification (details are presented in Section 2.3) and therefore reference labels were collected for Landsat pixels. Because Ethiopia is located on multiple Universal Transverse Mercator (UTM) projected coordinate system zones, images were reprojected into the WGS84 geographic coordinate system (details are presented in Section 2.3). For each class, 250 reference pixels were collected. To distribute the reference data over the entire study area, Copernicus Global Land Operations and GlobeLand30 land-cover products were used to implement a stratified random sampling, in which Copernicus and GlobeLand30 classes and their areas of agreement/disagreement were used to define the strata. The Copernicus Global Land Operations is a 100-m pixel size global land-cover product for 2015 and includes 18 land-cover classes. The GlobeLand30 is another global product and has 30-m pixel size. The GlobeLand30 map includes 10 classes and was constructed for 2010. The classes of the two existing land-cover products were recoded to match our six target classes as shown in Table 1, and some of the two products' classes did not exist over our study area. In addition, the two land-cover products were reprojected and resampled, using the nearest neighbor method, to match with the Landsat images. Then, the two maps were overlaid to specify areas of agreement and disagreement. This was done to identify areas that might be potentially simpler (agreement areas) and more difficult (disagreement areas) to classify for each class. The 250 reference pixels of each class were allocated to the areas of agreement and disagreement proportionate to their areas and were selected at random from the corresponding areas. The above procedure was used to ensure reference pixels were spatially well distributed and that there was a balance between areas of agreement and disagreement. The reference labels of sampled pixels were identified through manual interpretation of high-resolution imagery in GEE. To do so, using codes, footprints of reference pixels were highlighted on GEE's high-resolution imagery (as shown in image 1) and a spectrum of examples for each class were produced and used as guides by analysts during reference data collection. The pixels were labeled based on the majority rule, i.e., the most common class within a pixel. To ensure high-quality reference data collection, the same reference pixels (250 pixels per class) were independently labeled by two interpreters. Discrepancies in labels assigned by the two interpreters were double-checked and resolved by a third, more experienced, interpreter.

Landsat-Based Single-Time Classification
Long-term systematic acquisition and well-balanced spatial, spectral, and temporal resolutions of Landsat sensors make them valuable resources for large-area land monitoring and change detection [78][79][80][81]. Additionally, free availability of Landsat images over the last decade has made it the main source for large-area land-cover characterization [82][83][84][85]. Additionally, Landsat sensor was selected over the other sensors, such as Sentinel-2, mainly due to Landsat's longer-term data continuity. This enables us to produce land-cover maps for past periods, before the beginning of the Ethiopia land transactions in early 2000s, to map the land-cover changes due to land transactions in our future work. This need for reproducible methods and high accuracies thus resulted in the selection of Landsat as the main data source for this analysis. The entire area of Ethiopia is covered by 66 Landsat scenes. Different image dates were investigated based on cloud/shadow cover and images from 1 January 2017 to 17 January 2017 were selected because of low cloud cover. The selected 16-day period includes one image acquisition per scene that was used for single-time land-cover classification. Both Landsat 8 Collection 1 TOA and SR data were examined in this research. SR images are TOA images that have undergone pixel level atmospheric adjustment, through the Landsat 8 Surface Reflectance Code (LaSRC) [86], in order to obtain consistent spectral values over space and time, i.e., within and between images [86,87]. The pixels of the selected images were filtered based on Fmask quality flags [88,89] to exclude pixels with cloud and cloud shadow. Because the images fall in different UTM zones, the images were reprojected to WGS84 geographic coordinate system and mosaicked. While reprojecting the images, a bilinear resampling method was used to calculate the new pixel values. Two separate classifications were performed using spectral bands 1 to 7 of each of the TOA and SR images to compare their differences in classification accuracy. Even though SR images provide the required normalization adjustment for consistency of spectral values, these products, just as with any measurement/model-based product, have some imperfections. Therefore, as a simple normalization method, normalized difference indices (NDIs) (Equation (1)) were calculated from the seven spectral Remote Sens. 2020, 12, 954 8 of 24 bands from each image and replaced the original bands during classifications. Indices do not provide an additional source of information for classification, as their information already exists in the original bands. However, NDIs were used, as they are relative measures, i.e., differences/ratios of bands, and, as such, could be more consistent among images acquired across different dates and locations. Six NDIs were created using consecutive pairs of bands. Other pairs of bands were excluded, as they would include redundant information. Additionally, to incorporate textural information content of images in classifications, textural layers were calculated [90]. The textural layers were calculated using the panchromatic band, with 15-m pixel size, from the cloud masked TOA images (the panchromatic band does not have a corresponding SR band). The textural layers were calculated over 3 × 3 pixel windows based on the gray-level co-occurrence matrix (GLCM) using GEE's built-in functions which outputs 18 textural layers such as entropy, variance, contrast, and homogeneity [26].

Multi-Temporal Image Compositing
Incorporation of time-series imagery has been shown as one effective way to improve classification accuracy [91][92][93][94]. Compared to single-time images, time-series images include additional information such as vegetation phenology, which can be utilized to separate land-cover classes that might have similar spectra for some dates but different patterns over time. Therefore, composite layers based on monthly/seasonal/annual or statistical measures have frequently been used to enhance land-cover classification [95][96][97][98][99]. Compositing helps to deal with cloudy and missing value pixels. Moreover, compared to single-time images, summary measures extracted from time-series could be more consistent, both between and within scenes, as they represent longer period distributions of spectral values rather than values at a specific date/condition. In this research, three types of composites for land-cover classification have been investigated including sequential and probability density function (PDF) composites and a sinusoidal model. Sequential composites were calculated as per-pixel/per-band median values over predefined periods. Sequential composites were calculated for different periods and therefore could be used to separate land-cover classes based on their temporal patterns. On the other hand, PDF composites are per-pixel/per-band annual percentiles of spectral values and could be used to model probability density function of pixel values. PDF composites are time-independent and, therefore, they would be more helpful when different regions of the same class have similar distributions but different timing. This is very common for large-area mapping as timing of agricultural activities and vegetation greening could vary for different areas.
Landsat 8 images from 1 March 2016 to 1 March 2018, were used for the construction of composite layers, which included 2807 individual Landsat images. Landsat images from two years were used to increase data availability for cloudy periods. First, the Landsat images were masked per-pixel for cloud and cloud shadow using the Fmask values. This resulted in 90% of pixels with the number of observations in the range of 14-112. Because in single-time classifications the SR images outperformed the TOA images and the NDIs outperformed the original bands (details discussed in the Results section), the Landsat multi-temporal composites were constructed using the SR NDIs. PDF composites were extracted for five percentiles: 5%, 25%, 50%, 75%, and 95%. The five values were obtained independently per-pixel/per-band.
To create the Landsat sequential composite, the same cloud-masked images were divided into four periods of three months. The periods were established using the greenness onset, greenness maximum, senescence, and dormancy of vegetation in Ethiopia's ecoregions [100]. Additionally, precipitation was assessed to ensure appropriate division of the water year. The periods were formed as 1) December, January, and February; 2) March, April, and May; 3) June, July, and August; and 4) September, October, and November. For each period, the median spectral values were calculated per-pixel/per-band. This resulted in four median layers per band. The three-month periods were selected to increase availability of cloud-free observations for most pixels. However, this aggregation would eliminate some finer temporal resolution phenology variations. Therefore, Moderate Resolution Imaging Spectroradiometer (MODIS) images were used to take advantage of their finer temporal resolution. Terra surface reflectance daily products with 250-m pixel size (MOD09GQ version 6) from 1 July 2016 to 30 June 2017 were used for MODIS compositing. MOD09GQ provides spectral values at red and infrared wavelengths. To mask cloudy pixels in the MOD09GQ images, pixel-level cloud state information from Terra surface reflectance daily products with 1000-m pixel size (MOD09GA version 6) were used. First, all MOD09GQ images were matched with their corresponding MOD09GA images based on image acquisition date. Then, non-clear pixels were masked from MOD09GQ based on "state_1km" quality band of the corresponding MOD09GA images. NDVI values were calculated for clear pixels for all images and used for the MODIS composites calculations. This resulted in 90% of pixels with the number of observations in the range of 29-254. The MODIS NDVI images were resampled to 30 m to match with the base Landsat imagery. For PDF composite, ten percentile layers were created in 10% increments from 5% to 95% of NDVI values for each pixel. In addition, 12-monthly NDVI layers were constructed by extracting median NDVI values per-pixel/per-month.
In addition to the multi-temporal image composites, a seasonality analysis was used to model the patterns of Landsat NDVI variations. To do so, a sinusoidal model [101,102] was used to model NDVI variation over one year (1 July 2016 to 1 July 2017) as follows: where t is time in days starting from 1 July 2016, A is the amplitude, ω is the frequency of oscillation (set to equal 1 as we assumed a full cycle over one year), and ϕ is a phase shift. The sinusoidal model was fitted to the annual time-series of Landsat NDVI values independently for each pixel. The four estimated coefficients a 1 − a 4 were used as additional explanatory variables for classifications.

Ancillary Datasets
Elevation information was examined as one source of ancillary data for classification improvement. The Shuttle Radar Topography Mission (SRTM) version 4 [103] was used as the elevation dataset. Besides elevation, slope and aspect layers were calculated and used as additional classification explanatory variables. In addition, night-time light data were used to improve the classification of urban areas [58,104,105]. The Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) was used as the night-time light dataset. DNB are monthly average radiance composite images with 15 arc seconds pixel size. The DNB data has been filtered to remove data that has been affected by stray light, lightning, lunar illumination, and cloud cover (https://ngdc.noaa.gov/eog/viirs/download_ dnb_composites.html, last accessed on November 2019). Seven monthly images from October 2016 to April 2017 were used. The "cf_cvg" band identifies per-pixel number of observations for each monthly composite. The pixels with less than four observations were excluded from monthly data. Then, a single layer was created by extracting the per-pixel minimum value of the seven months. This was done to reduce the effects of faulty observations, which mostly come from unfiltered cloudy pixels. Both elevation and night-time light datasets were resampled to match with the Landsat images.

Land-Cover Classification
To assess the effects of the classification input variables on the classification accuracy, a series of classifications based on different choices of explanatory variables were implemented. This included 13 classifications with the following input explanatory variables: classifications 1-3 corresponded to single-time classifications of TOA, SR, and SR NDIs images (results indicated outperformance of SR NDIs with respect to TOA and SR images and therefore the SR NDIs were used in classifications 4-13); classification 4 included textural layers in addition to classification 3 data; classifications 5-9 included one multi-temporal composite (described in Section 2.3.2) in addition to classification 4 data; classification (10) included all the multi-temporal composites in addition to classification 4 data; classification 11 included the night-time data in addition to classification 10 data; classification 12 included the elevation data in addition to classification 11 data; and classification 13 included classification 12 data except excluding the single-time NDIs.
Random forest [106,107] was used as the classification algorithm. Random forest is a non-parametric ensemble machine learning algorithm which has been used frequently for land-cover mapping and has shown relatively high accuracy compared to other state-of-the-art algorithms [50,108]. Random forest is an easy and fast algorithm to train and is commonly resistant to over-fitting because it averages large numbers of trees trained based on random permutations of training data and explanatory variables. Similar to many other non-parametric classification algorithms, random forest is capable of prioritizing explanatory variables that are more informative for delineating target classes [106,108]. Given the statistical rigor of random forest it is a commonly used approach, especially when many and varied input data are being evaluated, given the lack of over-fitting concerns with this approach. As such, there was no concern regarding the number of variables being used and, indeed, all variables could be included and evaluated using this technique. The reference pixels were randomly divided into training and test data based on 70% and 30% proportions, respectively, which is a commonly used division and appropriate here, in terms of pixel numbers and training data rigor. The same training and test data were used for all classifications. In addition, training of the random forest classifier was conducted in the out-of-bag mode where bag fraction was set to 30% of training data (out-of-bag data was different from the test data that was exclusively used for accuracy assessment). For all classifications, the random forest's number of trees parameter was set to 100. The number of variables per split parameter was determined using a grid search optimization based on out-of-bag error. For the grid search, number of variables around two-thirds of the number of explanatory variables, which depended on the number of variables used for a given classification, over a range of about one-third of the number of explanatory variables were tested, and the number of variables that resulted in the smallest out-of-bag error was selected as optimal number of variables per split.

Comparison of the Land-Cover Classification Processes
Several classifications based on combinations of explanatory variables were performed. The classifications were compared based on classification accuracy of the independent test dataset. Single-time classification of Landsat 8 images were conducted separately for SR and TOA image products where the seven spectral bands were used as classification independent variables. Table 2 presents the estimated overall accuracy (OA) and class-specific accuracy values. SR classification (classification 2 in Table 2) outperformed TOA classification (classification 1 in Table 2) by 1.1% in OA. This improvement might be attributed to the higher degree of radiometric consistency among SR images than TOA images. In addition, the classification of six NDIs from SR images (classification 3 in Table 2) resulted in 1.0% larger classification OA with respect to the seven original SR bands (classification 2 in Table 2). In terms of user's accuracy (UA) and producer's accuracy (PA), each of the three classifications (classifications 1-3 in Table 2) outperformed in some categories, with the overall best performance from SR NDIs. Incorporation of textural layers (classification 4 in Table 2) from the panchromatic band as additional explanatory variables to the SR NDIs improved classification OA by 1.9%. The textural layers improved classification accuracy by up to 5.3% for Cultivated, Dense Forest, Savanna, and Urban/Built classes. Bare Soil and Water/Wetland classes have relatively uniform spatial patterns and less textural information compared to the other classes and did not benefit from the inclusion of textural layers. Because SR NDIs outperformed the original TOA and SR images, the Landsat multi-temporal composites were constructed using the SR NDIs instead of the original spectral bands. Examples of the multi-temporal composites are presented in Figure 2 for an area covering parts of Addis Ababa and its vicinity. In Figure 2a, the area in the south-east is highly urbanized, while patches of dense forest exist immediately in the north and with some distance to the west of Addis Ababa. The majority of land in the north and west of Figure 2 are cultivated fields, visualized mainly as light brown in Figure 2a. Gefersa reservoir is located approximately in the center of Figure 2a. Figure 2b shows the Landsat NDVI sinusoidal model fitted independently to time-series data for each pixel. Red green blue (RGB) colors in Figure 2b correspond to the coefficients a 4 , a 3 , a 2 in Equation (2). As different land-cover classes have different NDVI model amplitudes and phase shifts, the sinusoidal model coefficients can differentiate them. Figure 2c illustrates the Landsat PDF composites with 95%, 75%, 50% of NDI 4 , i.e., NDVI, as RGB colors. The urban area and Gefersa reservoir appear dark as they have low NDVI value distributions. Forest patches appear very bright as NDVI values are larger for longer periods of time compared to the other classes. Cultivated fields appear as red/yellow as they have high NDVI at some periods and lower NDVI at other periods of the year. Figure 2d shows the Landsat sequential composites with the median of NDI 4 , i.e., NDVI, from seasons 4,2,1 (see Section 2.3.2) as RGB colors. Similar to Figure 2c vegetated and non-vegetated classes can be differentiated in Figure 2d due to the differences in their ranges of NDVI values. Due to differences in their phenology patterns, forest patches are differentiated from cultivated fields, mainly because forested areas have large NDVI for longer periods of the year. While NDI 4 , i.e., NDVI, was used for visualization of the Landsat composites in Figure 2c,d, the other NDIs also seemed effective in differentiating the target land-cover classes. The MODIS PDF composite, represented in Figure 2e where RGB colors correspond to the per-pixel 95%, 75%, and 50% of NDVI values, is similar to the Landsat PDF composite in Figure 2c but with lower spatial resolution. The MODIS sequential composite is shown in Figure 2f as the median of NDVI, for June 2016, October 2016, and January 2017 as RGB. Cultivated areas have high NDVI in October but low NDVI in January, therefore, they appear mostly as green/yellow in Figure 2f. Forested areas appear as white/cyan as they have higher NDVI compared to the other classes especially in January.
Overall, the inclusion of any multi-temporal composite, in addition to single-time and textural layers, improved overall classification accuracy (OA). The largest improvement in OA was obtained by the addition of the Landsat sequential composites, which resulted in 3.1% improvement in OA (classification 7 in Table 2), followed by inclusion of the Landsat PDF composites and sinusoidal models (classifications 6 and 5 in Table 2) with 2.7% and 2.1% improvement in OA, respectively. In terms of a comparison between Landsat and MODIS composites, Landsat composites resulted in larger OA improvements than MODIS for both PDF and sequential compositing methods. The advantage of Landsat over MODIS was 1.0% and 1.2% for PDF (classifications 6 and 8 in Table 2) and sequential composites (classifications 7 and 9), respectively. The performances of the two methods of compositing were similar with slight advantages of sequential over PDF compositing; the advantages were 0.4% (classifications 7 and 6 in Table 2) and 0.2% (classifications 9 and 8 in Table 2) for Landsat and MODIS compositing, respectively. In classification 10, where all the Landsat and MODIS composites were used as additional variables to the single-time and textural layers, classification OA improved by 5.1%, which was larger than improvements achieved by each of the multi-temporal composites (classifications 5-9 in Table 2) individually. Generally, Cultivated, Savanna, and Bare Soil classes benefited the most from multi-temporal composites. The largest improvement was obtained for UA of the Bare Soil class that was 16.4%. The Dense Forest, Urban/Built, and Water/Wetland classes did not improve substantially with the addition of multi-temporal composites and in some cases classification accuracy decreased for those classes, especially PA of the Urban/Built class (Table 2). This might be attributed to the fact that spectral signatures of those classes, especially Urban/Built and Water/Wetland, have fewer variations compared to the other classes over different seasons and benefited less from multi-temporal compositing. Overall, the inclusion of any multi-temporal composite, in addition to single-time and textural layers, improved overall classification accuracy (OA). The largest improvement in OA was obtained by the addition of the Landsat sequential composites, which resulted in 3.1% improvement in OA (classification 7 in Table 2), followed by inclusion of the Landsat PDF composites and sinusoidal models (classifications 6 and 5 in Table 2) with 2.7% and 2.1% improvement in OA, respectively. In terms of a comparison between Landsat and MODIS composites, Landsat composites resulted in larger OA improvements than MODIS for both PDF and sequential compositing methods. The advantage of Landsat over MODIS was 1.0% and 1.2% for PDF (classifications 6 and 8 in Table 2) and sequential composites (classifications 7 and 9), respectively. The performances of the two methods of compositing were similar with slight advantages of sequential over PDF compositing; the advantages were 0.4% (classifications 7 and 6 in Table 2) and 0.2% (classifications 9 and 8 in Table 2) for Landsat and MODIS compositing, respectively. In classification 10, where all the Landsat and MODIS Night-time light data from the VIIRS DNB and SRTM elevation dataset were evaluated as ancillary explanatory variables to improve land-cover characterization (classifications 11 and 12 in Table 2). Night-time data improved PA of the Urban/Built class by1.9%; however, UA for this class decreased by 9.6%. This can be partly explained by the blooming effect of night-time data, i.e., the expansion of light beyond urban boundaries, which can lead to an over-estimation of urban areas. Overall, the DNB dataset did not improve classification OA. On the other hand, incorporation of elevation and its derivatives, slope and aspect, improved classification OA by 3.1% (classification 12 in Table 2). This was the classification where all 108 explanatory variables were utilized. Class-specific accuracies also improved by an average of around 2.8%. Classification 13 in Table 2 shows the results of the classification where single-time NDIs were excluded. Exclusion of single-time NDIs decreased classification OA by 2.6%. This suggests that even though multi-temporal information of a sensor, Landsat 8 here, is employed in a classification process, the classification might still benefit from inclusion of single-time image mosaics, if cloud-free images, from a relatively short period, exist for the entire case study.

Comparison with the Existing Land-Cover Products
To further investigate the performance of the classification outputs, the best classification in terms of OA (classification 12 from Table 2), referred to as LC_2017 hereafter, is compared with the two existing contemporary land-cover products for Ethiopia, which are the GlobeLand30 for 2010 at 30 m resolution and the Copernicus Global Land Operations for 2015 at 100 m resolution. As discussed earlier (Section 2.2), these classifications were recoded and resampled to match with LC_2017 classifications. Note that there are some temporal gaps between the two products and LC_2017 classification. Therefore, some of the differences among these products and their accuracies could be related to actual land changes. Consequently, the comparisons in this section are performed to provide some context to interpret the relative performance of the proposed method rather than selecting a superior methodology. Figure 3 shows the three land-cover map products. The same test dataset used for accuracy assessment of LC_2017 product ( Table 2) was used to evaluate the accuracy of the two existing land-cover products (Table 3). Overall, the Copernicus and LC_2017 classifications outperformed the GlobeLand30 product by more than 10% in overall accuracy ( Table 3). The difference between OA of LC_2017 and GlobeLand30 was 11.0% and the difference was statistically significant (p-value < 0.001). The difference between OA of LC_2017 and Copernicus products was minor, 0.7% (p-value = 0.46). Moreover, for most classes, the Copernicus and LC_2017 classifications had larger class-specific accuracies than the GlobeLand30 (Table 3). For the Cultivated class, LC_2017 classification had the largest PA (lowest omission error), 79.0% and the Copernicus map had the largest UA (lowest commission error), 75.3%. As Table 4 shows, the cultivated area was 221,303 km 2 , 264,511 km 2 , and 298,831 km 2 for the GlobeLand30 (2010), Copernicus (2015), and LC_2017, respectively. Given the temporal order of the three products from 2010 to 2017, increasing rates of cultivation activities were observed, which confirms the recent land transactions and increase in intensified agricultural activities. For the Dense Forest class, a high degree of similarity exists between the Copernicus and LC_2017 products where Dense Forest pixels are mainly existing in the southwestern part of the country, whereas in the GlobeLand30 map many Forest pixels also exist in north, north-west, and eastern parts of the country (Figure 3). Moreover, the area of Dense Forest class varies substantially from 64,505 km 2 for the Copernicus map to 139,657 km 2 for the GlobeLand30 map. Besides classification error, the discrepancies among Dense Forest estimations can be attributed to some extent to the differences in the Dense Forest class definition by the different products. Considerable disagreements exist among the pattern of coverage of the Bare Soil class in GlobeLand30 map ( Figure 3A) and the other two products, where many bare areas are classified as Savanna by GlobeLand30. Consequently, the estimated Bare Soil area in GlobeLand30 is less than half of those in the Copernicus and LC_2017 classifications (Table 4). LC_2017 classification had more than 10% larger UA and PA for the Bare Soil class than the existing land-cover products (Table 3). For the Urban/Built class, LC_2017 classification had notably larger PA, 81.1%, than GlobeLand30 and Copernicus maps, 60.0% and 64.3%, respectively. The Urban/Built class UA of the existing maps, however, was larger than LC_2017 map by an average of 6.5%. Finally, with estimated 100% UA, LC_2017 classification outperformed the existing maps, 55% and 78.1% UA for the GlobeLand30 and Copernicus maps, in terms of Water/Wetland class commission error. Water class PA for the Copernicus map was on average 3.7% larger than those of the other two maps (Table 3).
For the Urban/Built class, LC_2017 classification had notably larger PA, 81.1%, than GlobeLand30 and Copernicus maps, 60.0% and 64.3%, respectively. The Urban/Built class UA of the existing maps, however, was larger than LC_2017 map by an average of 6.5%. Finally, with estimated 100% UA, LC_2017 classification outperformed the existing maps, 55% and 78.1% UA for the GlobeLand30 and Copernicus maps, in terms of Water/Wetland class commission error. Water class PA for the Copernicus map was on average 3.7% larger than those of the other two maps (Table 3).

Final Classification Production
The final classification product of this work is illustrated in Figure 4. The newly developed map provides a contemporary high spatial resolution land-cover characterization for Ethiopia, enabling investigation of the land transactions. To quantify agricultural expansion due to land transactions, a similar classification process will be utilized to produce land-cover maps for earlier dates and change maps will be produced. As evidenced from the three different coverage products already shown (Figure 3), there appears to have been a significant increase in agricultural coverage across Ethiopia, so the specific locations, transaction types and conversion trajectories becomes an important area of future study. The creation of the national classification, with our user-defined classes (Figure 4) is an essential first step, as is the development of full accuracy analyses and tools for comparison with existing products and with available high-resolution imagery. This research thus sets the stage for substantial, and meaningful additional research.

Discussion
Land-cover maps are essential variables to study the dynamics of environment. Human activities, such as urbanization, expansion of agricultural activities, and deforestation, as well as natural phenomena such as wildfire, flooding, and desertification frequently lead to changes in land cover. On the other hand, land-cover change itself can direct subsequent changes in the environment and its inhabitants. Therefore, land-cover mapping is integral to understand types of land change, their magnitude and distribution, and the drivers of change. Remote-sensing data provide practical means to map land cover of large areas in more efficient ways, in terms of time and cost, than traditional methods.
The national land-cover product we created shows that 26% of the land surface is in cultivated land use. This is higher than both the GlobeLand30 and Copernicus products (Table 4). GlobeLand30 was created in 2010 and Copernicus is for 2015 and show 221,303 km 2 and 264,511 km 2 of area under cultivated uses, compared to our estimate of 298,831 km 2 . Given that our product is for 2017 much of the differences in area may well relate to actual increases in area under cultivated uses, in part related to increased land transactions. In order to account for the differences across product types, future research will focus on this cultivated class and create a land-cover classification using these same methodologies, now they have been defined, developed and tested to compare increasing cultivated area and locations across Ethiopia from early 2000 through current. The remaining land-cover proportions nationally are 8.8% dense forest cover, 56.7% savanna cover, 7.4 % bare soil, 0.4% urban and built and 0.7% water. Comparing our results to theGlobeLand30 and Copernicus products, we show three times the area in urban/built cover, and less area in savanna cover. While some differences

Discussion
Land-cover maps are essential variables to study the dynamics of environment. Human activities, such as urbanization, expansion of agricultural activities, and deforestation, as well as natural phenomena such as wildfire, flooding, and desertification frequently lead to changes in land cover. On the other hand, land-cover change itself can direct subsequent changes in the environment and its inhabitants. Therefore, land-cover mapping is integral to understand types of land change, their magnitude and distribution, and the drivers of change. Remote-sensing data provide practical means to map land cover of large areas in more efficient ways, in terms of time and cost, than traditional methods.
The national land-cover product we created shows that 26% of the land surface is in cultivated land use. This is higher than both the GlobeLand30 and Copernicus products (Table 4). GlobeLand30 was created in 2010 and Copernicus is for 2015 and show 221,303 km 2 and 264,511 km 2 of area under cultivated uses, compared to our estimate of 298,831 km 2 . Given that our product is for 2017 much of the differences in area may well relate to actual increases in area under cultivated uses, in part related to increased land transactions. In order to account for the differences across product types, future research will focus on this cultivated class and create a land-cover classification using these same methodologies, now they have been defined, developed and tested to compare increasing cultivated area and locations across Ethiopia from early 2000 through current. The remaining land-cover proportions nationally are 8.8% dense forest cover, 56.7% savanna cover, 7.4 % bare soil, 0.4% urban and built and 0.7% water. Comparing our results to theGlobeLand30 and Copernicus products, we show three times the area in urban/built cover, and less area in savanna cover. While some differences vary by each product, these differences are consistent across both products compared to LC_2017. Increases in urban/built are expected over time, and the increase in cultivated area has come at the expense of savanna vegetation and so these differences are explainable. As already noted previously, there will be some confusion in classes with cultivated and savanna, urban/built and bare soil, and dense forest and savanna and so these potential spectral overlaps will be evaluated in more detail in future work focused specifically on the changes in cultivated land due to increased land transactions.
The land transactions and intensified agricultural activities over the last two decades in Ethiopia have caused considerable environmental change, as well as socio-economic impacts on farmers' livelihoods. To understand land-cover distribution nationally, within which the land transactions are located, it is necessary to characterize land cover at a national level. Time-series of land-cover characterizations are required to quantify the magnitude of land transactions and their spatial distribution. Consequently, to identify an optimum classification process, we examined several land-cover classifications using remote sensing imagery for Ethiopia and assessed their accuracy in this work.
Large-area land-cover image classification is associated with a spectrum of technical challenges such as cloud cover, radiometric inconsistencies within/between images, and big data processing. Consequently, classification accuracy for large-area land-cover products are usually less than those of small area single-scene classifications and in many cases the qualities of the final products might not be sufficient for the intended applications. For example, Yang et al. [109] investigated the accuracy of seven land-cover products over China and reported that overall accuracies ranged from 31.9% to 58.7%. A study of 64 global and regional land-cover products revealed that except in a small number of cases, the reported overall accuracies of most products ranged from 60% to 80% [110]. In addition, spatial variations are likely present in classification accuracies of land-cover products, which might increase as the size of the target area and the number of scenes increases [111][112][113][114][115].
Achieving the target quality could be challenging in large-area mapping projects, as it might require revising classification process/data and investigating multiple classification improvement avenues, which would require considerable effort, time, and budget. These requirements commonly confine analysts' ability to implement and examine several classification processes to determine which processes and input data are more appropriate for their specific case-study. Therefore, in this research, the classification was replicated using Landsat 8 TOA, SR, and SR NDIs' variables and the best-performing variable set was identified. In addition, different image compositing methods were investigated to quantify their relative performances.
On review, the NDIs outperformed the original spectral bands. The advantage of the spectral derivatives over the original bands may be attributed to the fact that the derivatives are differences and ratios of bands and are less affected by within and between image radiometric variations. Also, SR images resulted in larger classification OA than TOA images. In addition, multi-temporal composites improved classification accuracies. The advantage of multi-temporal composites is that they are constructed based on time-series of images experiencing broad ranges of sun illumination and atmospheric conditions. These composites can communicate longer-term characteristics of land objects and, therefore, could be more consistent than single-time images that represent only a single snapshot of the case study. Landsat multi-temporal composites resulted in larger OA improvements than MODIS composites. However, because of shorter revisit time, MODIS composites might be more helpful than Landsat composites when short time series of data are being analyzed or when the classification area experiences frequent cloud cover and less probability of clear observations. Sequential composites outperformed PDF composites for both Landsat and MODIS sensors, although the differences were small. Sequential composites could be especially useful with delineating target classes when different target classes have different phenology across a year, such as for Cultivated versus Dense Forest, as in this study (see Figure 2). On the other hand, PDF compositing might be prioritized over sequential compositing for areas with consistent cloud cover where there might not exist enough observations for some months/seasons/periods. In addition, timing of vegetation phenology and agricultural activities could vary over large areas, especially when the target area covers a wide range of latitudes. This can lead to within-class variations of variables of sequential composites. In such cases, PDF composites might provide more consistency than sequential composites, as PDF composites model the overall distributions of spectral values independent from their timing. Moreover, a combination of Landsat and MODIS multi-temporal composites outperformed each of them individually. This means that the two datasets provide complementary information for classification purposes. The efficacy of multi-temporal composites depends on the classification scheme target classes as well. For instance, Urban/Built and Water/Wetland did not largely benefit from those composites in this study as these classes have limited variation over different seasons. Moreover, night-time light data decreased omission error but at the cost of increasing commission error, partly due to a blooming effect. Depending on the application of a target map and effects of those errors on subsequent analyses, an analyst might prioritize one type of error over another. Finally, elevation data and its derivatives as ancillary explanatory variables improved classification accuracy.

Conclusions
In this research we examined the effects of several factors on the accuracy of land-cover map production at a national level. The Landsat 8 OLI surface reflectance products are aimed to provide spectral consistency among images captured from different scenes. However, replacing the top of atmosphere images with their corresponding reflectance images resulted in a moderate improvement of 1.1% in classification overall accuracy (OA) in this research. On the other hand, a simple substitution of the original spectral bands with normalized difference spectral indices resulted in an improvement of 1.0% in OA. Therefore, replacement of the original bands with spectral indices might be considered by image analysts, especially for situations (or sensors) where surface reflectance products are not available. The largest improvement (5.1% improvement in the classification OA with respect to the single-time classification) was achieved by the application of the multi-time image composites. Overall, Landsat time-series composites resulted in larger OA improvements than MODIS for both probability density function (PDF) and sequential compositing methods. Among the three approaches for multi-temporal image compositing, the sequential composites slightly outperformed the PDF and sinusoidal composites. The elevation data and its derivatives improved classification accuracy by 1.7%. The night-time light data improve producer's accuracy of the Urban/Built class with the cost of decreasing its user's accuracy. Results from this research can aid map producers with decisions related to operational large-area land-cover mapping projects, especially with selecting input explanatory variables and multi-temporal image compositing.
In our future work, we will examine additional methods to improve LC_2017 classification such as incorporation of active sensor data, alternative radiometric normalization methods, object-based image analysis methods, and post-processing techniques. Additional analysis will be conducted on the Cultivated class to separate smallholder fields from industrial fields. To quantify rates and locations of land-cover change in Ethiopia, especially those that are due to recent agricultural land transactions, a similar classification will be conducted for the years 2005-2006 and will be used for post-classification change detection. Developing these methods now allows for multiple applications across multiple dates for more accurate land-cover change analyses.
The land-cover map produced in this research provides up-to-date land information that is reproducible and accurate such that we can use this as a base map for future work, specifically the creation of a 2005-2006 land-cover national product to more fully evaluate the impact of increased land transactions and thus cultivated area, across Ethiopia. Such information is vital to study land transactions, as those changes have been occurring and accelerating over the last 10-15 years. Indeed, a frequent time-series of consistent and well validated accurate land-cover maps is required to investigate