Long-Term Land Use/Land Cover Change Assessment of the Kilombero Catchment in Tanzania Using Random Forest Classification and Robust Change Vector Analysis

: Information about land use/land cover (LULC) and their changes is useful for different stakeholders to assess future pathways of sustainable land use for food production as well as for nature conservation. In this study, we assess LULC changes in the Kilombero catchment in Tanzania, an important area of recent development in East Africa. LULC change is assessed in two ways: first, post-classification comparison (PCC) which allows us to directly assess changes from one LULC class to another, and second, spectral change detection. We perform LULC classification by applying random forests (RF) on sets of multitemporal metrics that account for seasonal within-class dynamics. For the spectral change detection, we make use of the robust change vector analysis (RCVA) and determine those changes that do not necessarily lead to another class. The combination of the two approaches enables us to distinguish areas that show a) only PCC changes, b) only spectral changes that do not affect the classification of a pixel, c) both types of change, or d) no changes at all. Our results reveal that only one-quarter of the catchment has not experienced any change. One-third shows both, spectral changes and LULC conversion. Changes detected with both methods predominantly occur in two major regions, one in the West of the catchment, one in the Kilombero floodplain. Both regions are important areas of food production and economic development in Tanzania. The Kilombero floodplain is a Ramsar protected area, half of which was converted to agricultural land in the past decades. Therefore, LULC monitoring is required to support sustainable land management. Relatively poor classification performances revealed several challenges during the classification process. The combined approach of PCC and RCVA allows us to detect spatial patterns of LULC change at distinct dimensions and intensities. With the assessment of additional classifier output, namely class-specific per-pixel classification probabilities and derived parameters, we account for classification uncertainty across space. We overlay the LULC change results and the spatial assessment of classification reliability to provide a thorough picture of the LULC changes taking place in the Kilombero catchment. exemplary shows the multitemporal minimum, mean and maximum NDVI composites and a schematic presentation of their calculation. floodplain and in the western part of the catchment. The change directions reveal


Introduction
Land use and land cover (LULC) and their change are key drivers of global change [1]. Over the past two decades, land-use change has accelerated, particularly in sub-Saharan Africa. The main reasons for this are growing population and consequently increasing demand for food and space. LULC change puts ecosystems under pressure. Hence, wetlands traditionally untouched landscapes were discovered as potential food production zones due to their capacities to reliably provide clean water and therefore good conditions for agricultural production and numerous other ecosystem services [2]. Natural wetlands, however, are among the most threatened ecosystems with a decline in area of approx. 35% between 1970 and 2015 [3][4][5]. Within the multi-disciplinary research project "GlobE -Wetlands in East Africa", challenges of stagnating or declining trends in food production and nature protection were addressed for four representative wetland sites in Kenya, Rwanda, Uganda and Tanzania. The objectives of the project were the assessment of wetland contribution to food security and of the sustainability of current wetland uses, as well as the spatiotemporal extrapolation of results through simulation modeling and scenarios. Knowledge of land cover holds quantified and spatially explicit information as an imprint of socio-economic activity in wetland landscapes and can be integrated in hydrological and agricultural modeling [6]. However, such data is particularly scarce in many African regions. Earth Observation provides the means for detailed inventory, status and trend analysis, and long-term monitoring [7,8]. The present study therefore focuses on the remote sensing based long-term LULC change assessment in the Tanzanian Kilombero catchment.
Knowledge of the land surface is crucial, since primary production takes place at the land surface, making it the key resource for all animals and human beings. In turn, LULC become central ecosystem indicators [9]. However, extensive and repeated information on LULC for the Kilombero catchment is limited. First scientific assessments of the Kilombero valley ecosystem characterizing geology as well as LULC and economic specifications emerged in the 1960s [10,11]. Attention was put on the Kilombero area again since it became increasingly important for Tanzania's agricultural production. Recent publications focus on hydrological impacts of LULC change within the Kilombero catchment [6,12,13]. They reveal a high dependency of the wetland on baseflow contribution from the enclosing catchment and demonstrate strong impacts of anthropogenic LULC change on water balance components at subcatchment scale.
LULC classification in the context of wetlands is often complicated by frequent, sometimes persistent cloud cover and high air moisture content leading to reduced data availability of optical images compared to other regions. Wetlands are dynamic systems with sometimes substantial intraand interseasonal variation [14]. Therefore, optimal acquisition times for repeated assessments usually do not exist. To capture specific events such as floods or specific features such as peak season it is most beneficial to take into account all available information. Although various approaches to assess wetlands were published recently [15,16] their analysis is often limited to the actual wetland area and usually a buffer around it. Catchment-wide studies are rare, in particular in East Africa. LULC change in the Kilombero valley was addressed in several case studies [6,12,[17][18][19][20][21][22][23][24][25][26]. Only a few of these studies explored the whole catchment and only few provided an exhaustive evaluation of the generated LULC change maps. LULC maps are needed, however, to balance the different interests in the catchment, to support local authorities and to implement policies [20].
With this study we want to assess and quantify LULC of the Kilombero catchment, Tanzania, and their changes over the past 45 years. The underlying research questions are to which extent and where LULC changes have occurred and of which nature these changes are. This spatially explicit information is needed to address land management in the Kilombero valley. Our objective is further to address major challenges occurring during the classification and change detection process. Postclassification comparison (PCC) is a common technique to assess LULC changes [27,28]. This wellestablished technique is prone to error propagation because errors from both the first and the second classification translate into the resulting map. Therefore, change tends to be overestimated. To reduce this source of error, PCC can be combined with spectral change detection methods [29]. Applying a threshold allows us to differentiate changed from unchanged areas. Assuming that only areas with strong spectral changes translate in a new classification, map update is often applied only to changed areas [30]. A further objective of this study is hence to analyze the correlation of PCC and spectral changes in order to achieve better understanding of the results. In addition, we explore the per-pixel classification probabilities to identify regions of different classifier reliability. Even though many machine learning algorithms provide probabilities as secondary output, these are not considered in most studies. The rest of the paper is organized as follows. In Section 2, we introduce the study site and describe satellite and field data and their processing. This section also includes a description of the LULC mapping and change detection approach and a protocol of the accuracy assessment. Section 3 presents major findings of this study and a thorough discussion; Section 4 briefly presents a conclusion.

Study Area
The Kilombero catchment ( Figure 1) is located in southern Tanzania and covers an area of 40,240 km 2 . It contains a vast floodplain wetland at an elevation of about 200-250 m above sea level. The Udzungwa mountains adjacent to the floodplain in the north and northwest reach an elevation of approx. 2500 m above sea level [31]. The Mahenge mountains further bound the valley zone in the south, and an undulating peneplain borders it to the west [11]. According to the Köppen-Geiger climate classification [32,33], the study area lies in the bioclimatic zone of "tropical savannah" (Aw), and the mean temperature and annual rainfall are 25°C and 1,418 mm in Ifakara, the largest town in the catchment. Woodlands and edaphic grasslands compose the natural vegetation [34]. However, due to the large extent of the catchment spanning across various topographic zones, rainfall patterns change with more orographic rainfall towards the Great Escarpment mountains and less rainfall in flatter areas [11]. There are two rainy seasons: a short one between November and January and a long one from March to May. A distinct dry season stretches from July to October [35,36]. The Kilombero river is a naturally shaped tributary of the Rufiji river with a high yearly discharge variability of about 92-3044 m³/s as observed at the Swero gauging station situated close to the outlet of the catchment and regular extensive flooding of the wetland during the long rainy season [20,37,38]. The Kilombero wetland is classified as an important bird area and was included in the Ramsar Convention on Wetlands list of wetlands of international importance in 2002.
The only connection of the Kilombero and Ulanga districts north and south of the Kilombero River was a ferry until 2017, when a bridge was finally installed after several attempts. With the recently established bridge and hence improved accessibility of the southern part of the valley, an increased population growth can be expected in the Ulanga district.
The road network inside the Kilombero valley is very loose, and access in particular to the south is limited although the road network is being developed (Figure 1). Rice cultivation in the Kilombero valley began in the mid of the last century. Today, rice production provides the main economic value of the Kilombero floodplain followed by sugarcane, forest products, fishing and livestock [39]. Over the past decades, pastoralist and agro-pastoralist groups of Maasai, Sukuma, and Barbaig have migrated from other Tanzanian regions into the catchment [40]. Consequently, both agricultural and livestock productivity have experienced considerable growth in the 21st century [40]. In the multistakeholder setting, land conflicts are sometimes attributed to these recent immigrants as they are often seen as intruders by local villagers and blamed for environmental degradation and threatening of resources. Large-scale farming and large protected areas are further competitors for land. During the past years, teak (Tectona grandis) plantations were installed in the uplands mainly south of the Kilombero river [24]. Additionally, smallholder forest plantations are a quickly developing phenomenon throughout south-western Tanzania [41].

Satellite Data and Pre-Processing
Landsat is the only operational medium spatial resolution earth observation (EO) system providing data covering the last 45 years. Therefore, this study relies on data from the Landsat system. The challenge of wetland monitoring with optical data is that they are often obscured by clouds due to higher evaporation rates caused by higher water availability compared to drylands. Additionally, the mountain ranges surrounding the Kilombero floodplain act as a cloud trap. In this study, we produced decadal LULC maps for 1974, 1994, 2004, and 2014. Since it was not possible to generate a gapless dataset for the 1980s even when the whole decade was considered and different compositing approaches were tested, we had to skip this period. All images from the three time periods (1994,2004,2014) ± 1 year with less than 80% cloud cover were downloaded from the United States Geological Survey (USGS) earth explorer (https://earthexplorer.usgs.gov/). Additional 17 images were downloaded for the 1970s. The satellite imagery as of Landsat 5 possesses the same spatial resolution of 30 m, and although bandwidths differ slightly between sensors, the products are nevertheless compatible [42]. The dataset properties are presented in Table 1. Cloud masking was performed using the function of mask (Fmask) output included in the pixel quality files appended to each dataset [43,44]. For each Landsat image we calculated the normalized difference built-up index (NDBI) [45], the normalized difference vegetation index (NDVI) [46], the normalized difference water index (NDWI) [47], and the Tasseled Cap [48,49] components brightness, greenness, and wetness as defined for Landsat surface reflectance data [50]. Image tiles from one orbit (i.e., with identical acquisition dates) were mosaicked to avoid double occurrence of along-track overlapping areas. Temporal variability is often a challenge of wetland classification since some classes are changing dynamically their condition [14,[51][52][53], e.g., through flooding, pronounced dry phases, or cropping cycles. Several image compositing approaches were developed in the past years [54][55][56][57][58][59][60][61][62]. The general idea of those methods is to select on a per-pixel basis the most suitable observations from a predefined selection of images that fulfill different quality criteria and represent spectral reflectance of one particular day of year in the best way. As a result, one gapless and cloud free image composed of most suitable observations from multiple -possibly partly clouded -images is generated. However, both high seasonal class variability and high cloud coverage make conventional mosaicking and compositing approaches inadequate for the study area at the given data availability. Therefore, we made use of multitemporal metrics to generate gapless and cloud free synthetic images from a selection of images with noise (e.g., clouds, cloud shadows, gaps from Landsat 7 scan line corrector failure) [63][64][65][66][67][68]. Since extreme values may be caused by artifacts such as missed clouds and cloud shadows, percentiles are usually preferred, although actual extreme values may be missed ( Figure 2). We used multitemporal metrics (10th percentile, 25th percentile, mean, 75th percentile, 90th percentile) of the six reflective Landsat bands (blue, green, red, near infra-red, and two shortwave infra-red), NDVI, NDBI, NDWI, Tasseled Cap brightness, greenness, and wetness. NDBI and TC brightness are sensitive to non-vegetated areas and settlements; NDVI and TC greenness are useful for discriminating vegetation properties; and NDWI and TC wetness highlight open water bodies. Although the use of NDBI, NDVI, NDWI, and the Tasseled Cap components may appear redundant, the approaches to calculate them are different and make them complementary. Figure 2 exemplary shows the multitemporal minimum, mean and maximum NDVI composites and a schematic presentation of their calculation.
Earlier Landsat products from MSS sensor have a coarser spatial resolution (60 m) and crudely different radiometric characteristics, and were therefore processed in a different way. Individual images were radiometrically normalized to a master image using the iteratively re-weighted multivariate alteration detection (IR-MAD) algorithm [69,70]. Clouds were masked based on manually selected thresholds of image brightness (i.e., the squared root of the sum of all spectral bands) and manual corrections. Finally, the resulting images were mosaicked to one gapless and cloud free image that represents the 1974 period. From that we calculated NDVI and the TC components.
The 30 m resolution shuttle radar topography mission digital elevation model (SRTM DEM) was used to account for the distinct morphology of the Kilombero catchment. The near-global DEM is composed from C-band radar data acquired in 2000 [71]. The data is widely used because of its relatively high and spatially coherent accuracy compared to other global datasets and its availability free of cost [72]. The floodplain wetland developed on the lower part of a fault, whereas the Udzungwa Mountains represent the upper part, leading to harsh differences in elevation and inclination across morphological zones. Since these are associated with land use, land cover, and species composition [73], the main Landsat dataset was complemented with the SRTM DEM and a number of derivatives: the morphometric indices calculated from the SRTM DEM were the terrain ruggedness index (TRI) [74], slope variability (SV) [75], the topographic position index (TPI) [76,77], and for potential soil moisture the topographic wetness index (TWI) [78], which all combine different derivatives of elevation. In addition to those four variables, slope and elevation were used in the classification. TRI, SV, and TPI were calculated using the ArcGIS geomorphometry and gradient toolbox [79].

Field Data and Other Reference Data
Several field campaigns and uncrewed aerial vehicle (UAV) flight campaigns were conducted between 2013 and 2016 to collect ground information. During three UAV campaigns in 2014 and 2015 a fixed-wing drone took images of ~20 cm spatial resolution in the area around Ifakara (approx. 10×3 km 2 ). The field campaigns covered dry and rainy season and included trips by foot, motorbike and car, during which global positioning system (GPS) points and photos of regions of interest were taken. We recorded LULC data as well as information about vegetation cover along paths (roads, footpaths, flight paths) for a subset of points. Independently, between 2015 and 2016 the Belgian Development Agency (Enabel, https://www.enabel.be/) conducted flight and road campaigns using a small aircraft and a jeep to collect high resolution imagery of the floodplain. The photos were taken with an action camera tied to the vehicles, and a handheld GPS at 2-second intervals.
Another 28 high-resolution RapidEye scenes were obtained via the RapidEye science archive (RESA) of the German aerospace center (DLR). They were used to complement the ground reference database. The images were captured between 25/08/2013 and 06/06/2015, thus covering almost two years' wet and dry season cycles. The spatial coverage, however, is limited to the northern part of the catchment, including the town of Ifakara. The commercial RapidEye sensors provide a spatial resolution of 6.5 m, 5 spectral bands, and an approximate revisit time of 6.5 days [80]. To complement the ground information, we subdivided the catchment in a regular grid of 25×25 km 2 cells and selected 10 points per grid cell using random sampling. Each point was labeled based on photo interpretation of GoogleEarth and RapidEye images.

LULC Mapping
To assess long-term LULC changes, we performed a classification based on historical records of remote sensing data. We used a robust classification scheme targeting environmentally and socioeconomically important classes, based on an approach that can handle data scarcity and that can capture variable classes. Our classification scheme includes 11 classes: montane forest, closed woodland, open woodland, teak plantation, swamp, grassland, savanna, upland agriculture, rice, built-up area, and water (Table 2). Open woodland mainly refers to Miombo dry forests (Brachystegia longifolia). Upland agriculture comprises maize, vegetables and other food crops usually grown in mosaics of small patches and in crop rotation. In contrast, floodplain agriculture comprises rice fields that rely on the water of the annual Kilombero flooding and its tributaries. Grassland refers to natural flood grassland dominated by sedges such as Cyperus distans as described in [11] whereas savanna comprises all non-flooded open grasslands with isolated trees. Open water, including streams, ponds, and lakes The 60 image bands (five multitemporal metrics for each of the six spectral bands and the six spectral indices) were stacked together with the topographic properties of the DEM (elevation, slope, TPI, SV, TRI, TWI) and subjected to a random forest (RF) supervised classification [82]. Yielding high accuracy rates, RF classifier is more and more used in different domains, including wetland monitoring [83,84]. In this study, we used the free and open-source randomForest package [85] in the R statistical software [86]. RF is an ensemble classifier based on decision trees, contributing to a 'forest'. For each tree, a random and independent bootstrapped sample (in bag) from the original input dataset is used to determine the tree's split values, best-reducing entropy among the samples' values, and the result is evaluated with the remaining samples (out of bag). From these, the so-called out of bag error (OOB) is estimated as an internal validation. Among its advantages are the convergence instead of overfitting when using large numbers of trees [82] and the non-parametric nature, which means independence from strong a priori assumptions about the statistical characteristics of a class [87]. RF have proven to handle categorical data, imbalanced data, and data with missing values and insignificant features well with similar performances than other classifiers like Support Vector Machines (SVM) [88]. Also, it provides the relative importance of predictive variables by removing a variable for each tree and subtracting the resulting accuracy from the original one. For normalization of that value, the raw variable importance is divided by its standard deviation. High values indicate high importance for the whole RF model [89,90]. The ability to evaluate the model internally with the OOB error can be seen as an advantage, but with caution. Although this error can serve as an approximation, this estimation can differ significantly from an independent accuracy assessment and should in any case be complemented by one, as shown by [84]. As a general advantage of ensemble classifiers, along with the final classification, the classifier provides the option to draw a probability map that consists of the distribution of votes of the single tree classifier components. The vote distribution across classes, for example measured as entropy, can be seen as an estimate of classification uncertainty across the study area and used as additional quality criterion [91]. For the classification reference, we split the data randomly into 60% for training and 40% for validation. Thus, in a stratified context, small classes were intentionally oversampled, a common practice in the use of imbalanced training data [92]. We found that buffering the reference points resulted in consistently higher classification accuracies and therefore buffered built-up and water by 30 m (in order to retain pure pixels we kept the buffer small) and all other classes by 75 m. All other tests to improve the classification performance (e.g. feature selection, different training/validation data splits) resulted in negligible differences. The number of trees was set to 500; the number of variables tried at each split was 6 for 1974, 24 for 1994 and 32 for 2004 and 2014.

Change Assessment
We used PCC to quantify the changes of the 40-year study period [27]. Interpretation of the resulting maps is simple and statistics can be calculated based on 'from-to' classes independently from the original data used for the classifications [93]. Subtle land-cover changes can be interpreted as modification of a landscape unit without necessarily changing its classification. Hence, spectral changes may occur due to anthropogenic activity or climate variability that does not translate in a different land use class. Therefore, we also assessed spectral changes in addition to the PCC. Multitemporal metrics represent pseudo-spectral bands rather than spectral measurements. We achieved spectral comparability between the time steps by selecting the most appropriate bands with an all-against-all correlation analysis. We used percentiles of TC brightness, greenness, and wetness at 5% intervals and correlated them between consecutive time steps (e.g. minimum brightness 2004 against minimum brightness 2014, 5th percentile brightness 2004 against minimum brightness 2014 and so forth). The TC components were selected because of their physical interpretability and their comparability across Landsat sensors. Based on the assumption that the highest correlation includes least spurious changes, we calculated Pearson's correlation coefficient and selected those band pairs that showed highest correlation. With only a few exceptions the mean values yielded highest correlation. For each consecutive reference year, we generated a TC component stack from the bands that showed the highest correlation and performed change detection based on the robust change vector analysis (RCVA) [94], a modification of the well-known change vector analysis (CVA) [95]. For example, the 40th percentile brightness 2004, mean greenness 2004, and mean wetness 2004 were assessed against the 35th percentile brightness 2014, mean greenness 2014, and mean wetness 2014. This variant of the well-known change vector analysis provides information about change intensity expressed as magnitude, and the nature of change expressed as direction [94,[96][97][98]. Change direction can be calculated in different ways. Here, we applied coding according to increase or decrease in each index band difference [98]. Using three bands, this results in only eight values (directions), each of them providing a unique combination of increase or decrease in the TC components. We applied the Otsu threshold method [99] on each magnitude image to separate changes from unchanged areas and applied the resulting binary mask on both images, magnitude and direction. A combination of PCC and RCVA change results allowed us to distinguish areas that show no change, areas where only PCC or only RCVA recorded changes and those where both methods detected changes. PCC reveals changes related to the conversion of land use classes to other classes, whereas RCVA is more suited to detect within-class spectral changes. These can be related for example to forest growth or selective logging and other forms of degradation. Sometimes these spectral changes indicate the beginning of land conversion at sub-pixel scale and are therefore important to locate potential future LULC changes.

Accuracy Assessment and Evaluation of Classification Results
We calculated overall accuracy, user's accuracy, and producer's accuracy, well-established measures of accuracy which are all averaged across space [100], following the suggestions by Olofsson et al. (2014) [101]. Also, authors increasingly explore map quality and spatial uncertainty, which complement regular accuracy measures [91]. Many studies applying RF do not further consider the class probabilities. In contrast, we calculated individual class probabilities within the RF classification to account for the level of classifier-internal uncertainty [84]. As additional measure, we calculated Shannon's entropy which shows how distinct the class votes for each pixel are. Low entropy indicates only little confusion between classes and a concordant decision whereas high entropy denotes ambiguity. Hence, sources of uncertainty crystallize in low probability levels for the resulting class and high entropy, whereas areas of high probability and low entropy values for the resulting class can be called areas of low internal uncertainty. Here, we additionally calculate the highest probability per pixel from the per-class probabilities to identify areas with high and low classification confidence. In addition, we determined the second highest probability and calculated the difference between highest and second highest probability. This difference shows how dominant a class score is. Low probabilities do not necessarily translate into misclassification, but understanding misclassification can help to interpret mapped LULC changes.
RF final class decision results from the most popular class vote from all trees [82]. The proportions can be translated to probabilities, where a high probability does not necessarily mean that a pixel is correctly classified. However, high values indicate a high probability of a pixel to belong to certain class. A pixel therefore does not need to have probabilities higher than 50% in order to be assigned to a class. Higher ambiguities can be expected from those pixels that are assigned to a class with votes of less than 50% of the trees. We looked on all pixels with less than 50% maximum classification probability, determined the class with second highest classification probability, and calculated for each class the percentages of the second-placed classes. Whereas this cannot be used as a validation measure it reveals dominant confusion and hence similarities in feature space.

LULC Classification
Most of the selected classes are of continuous nature making it challenging to distinguish them. For example, floodplain grassland and savanna are both dominated by grass species; rainfed crops and grasses have similar phenology; and open and closed woodland are at the ends of a continuum. However, we achieved consistent classification results with reasonable overall accuracies between 64% and 74% based on validation with independent data (Table 3-6). Internal classifier performance was good with low OOB errors ranging between 3.28% and 6.67% (see supplementary material). Most classes had high user's and producer's accuracies, in particular the forest classes. Built-up areas are very small in extent compared to the other classes and therefore show low producer's accuracy. Confusion also exists between savanna and grassland, and between upland agriculture and rice crops. Indeed, it is common practice to grow rice as the main crop and other food crops such as maize and vegetables during the small rainy season or in the dry season. With the present approach, we are not able to account for these management practices but rather classify the dominant crop. Figure 3 shows the classification results of the entire catchment for the 1970s, 1994, 2004, and 2014.   Since rice is usually planted within the areas regularly flooded during the rainy season, it can be seen as contrasting to upland agriculture even though typical upland agriculture crops (e.g., maize, vegetables, cassava, sugar cane) are often planted as secondary crops in the same places.
Since the statistical figures of the confusion matrix represent aggregated information over the whole study area, it is useful to assess spatially explicit classifier performance. Figure 4 shows the highest per-pixel probability, the difference between highest and second-highest probability, and entropy. Classification of a pixel is reliable when its highest probability is high, the difference to the second-highest probability is high, and entropy is low (in Figure 4 good performance is indicated by blue colors).
Distinct patterns show across all classifications with higher uncertainty in the floodplain where savanna, grassland, open woodland and upland agriculture are more likely to be confused. This can be explained by higher variability as compared to stable classes like montane forest and teak, by cooccurrence of those classes in small-sized mosaics, crop rotation as well as by spectral similarities. On the one hand, the use of multitemporal metrics over the period of three years leads to robust data to be analyzed. On the other hand, multitemporal metrics reduce the spectral contrast between spectrally related classes, potentially contributing to class confusion. Other studies confirm that temporal aggregation does not necessarily lead to higher classification accuracies compared to wellchosen two-date images [102]. However, in our study selection of appropriate images was not an option due to persistent cloud cover and due to the size of the study area. To get a better understanding of the per-class RF performance, the reader is referred to the supplementary material that shows the per-pixel classification probability per class. The assessment of the pixels with less than 50% classification probability considered the secondplaced class. As a result, it is possible to plot confusion percentages per class ( Figure 5). It can be seen that the similarity in feature space is rather stable over time, e.g. the second-placed classes of all montane forest pixels classified with less than 50% probability are open and closed woodland with only a few swamp and water pixels in 1974 and 1994 (Figure 5a and b). This is in line with gradient nature between dense montane forests, closed woodland and open woodland. Hence, open woodland has similarity with montane forest, closed woodland, and savanna. Most likely because there are only scattered trees in the floodplain grassland only little similarity exists with this class. There is however increasing similarity with upland agriculture. Small-scale agricultural fields are often covered by trees and might lead to this kind of confusion. Exploring pixels with less than 50% classification probability focuses on pixels that are likely to be confused with other classes. In turn, if we explore pixels with high probabilities, confusion is likely to include classes that are less well defined. This mainly refers to built-up, which has the largest probability to include mixed pixels. The general pattern of similarity between first and second-ranked classes of pixels with high classification probability (>75%, Figure 6) and those with higher ambiguity (probability <50%, Figure 5) is identical with the exception of higher shares of built-up when the classification probability of the first class is high. The percentage of pixels with classification probabilities lower or higher than 50% is shown in Table 7. It can be seen that more than 80% of all pixels are classified with more than 50% of the votes for a particular class. Some classes are more challenging and show a high percentage of pixels with less than 50% classification probability, e.g. built-up, grassland and rice.

LULC Change
We assessed two different types of change: 1. LULC conversion, i.e. replacement of a LULC class by another, and 2. spectral changes. The latter usually apply for LULC conversion but also for modification, i.e. within-class changes that do not necessarily translate in a different classification. LULC conversion is visible in Figure 3 which reveals most pronounced changes within the Kilombero floodplain where natural grasslands, savanna vegetation and swamps were converted to agricultural areas with rice dominating in the flood prone areas. It also becomes obvious that the changes accelerated between 2004 and 2014. Apart from the obvious LULC conversion, spectral changes are evident in huge parts of the study site (Figure 8, 9). High values of RCVA magnitude indicate strong spectral changes whereas values close to zero indicate unchanged areas. The thresholds separating change from no-change were calculated based on the magnitude histograms using the Otsu thresholding method [99]. Figure 8 shows that pronounced spectral changes between 1974 and 1994 appear in the floodplain and the eastern part of the catchment, but also the western part shows some change. These changes might result from undesired atmospheric effects caused by high water vapor concentration within the Kilombero valley, as indicated by the respective change direction. Changes in the western part (blueish-greenish colors in Figure 8d) are characterized by decreasing TC greenness -most likely due to dryer conditions or due to removal of vegetation that might be attributed to agricultural activities and clearing of natural vegetation. This area (Njombe region) is comparably well developed with infrastructure ( Figure 1) and well known for maize, Irish potato, tea and flower production. There are, however, only a few areas affected by LULC conversion between 1974 and 1994. If classes are spectrally similar it may happen that their classification changes from one period to the next, e.g. closed woodland is replaced by open woodland. The respective spectral change, however, may lie below the threshold and is therefore not recorded as change. In other places spectral changes may occur without necessarily changing the classification of a pixel. However, areas where PCC and spectral changes overlap are very likely to show distinct alterations of the bio-physical properties. Areas with highest ambiguities (Figure 4) correlate with areas of most pronounced spectral changes (Figure 8a-c). Maps of the overlaps of the PCC results, the overlaps of the spectral RCVA-based changes, and the overlaps of both of them are shown in Figure 9. There are only a few larger areas that have not changed at all during the observation period (white areas in Figure 8f). They amount to 927,739 ha corresponding to less than one-quarter of the catchment. At the same time, about 1,440,899 ha show both, PCC-based and spectral changes corresponding to about one-third of the catchment area. Those areas are dominant in the Kilombero floodplain and in the west. Fluxes of LULC classes are depicted in the Sankey plots of Figure 10. Looking at the entire catchment area reveals that extension of agricultural areas (upland agriculture and rice cultivation) is the dominant type of change (a). However, a closer look to the Ramsar site shows that rice became the dominant LULC class in this protected area over the past period (b). The Sankey plots also reveal that rice extension is at the expense of savanna and grassland whereas upland agriculture takes place mainly at the expense of open woodland. Land cover conversion in the Kilombero catchment strongly accelerated within the past two decades, resulting in a range of documented effects. LULC changes within the whole catchment have an impact on the hydrological and energy budget of the Kilombero floodplain [12,25]. The feedback loops between LULC change, hydrology and climate are yet to be understood [12,13]. Drivers of change were not addressed in this study but were examined by Msofe et al. (2019) [22] and addressed by Daconto et al. (2018) [103] in the Kilombero valley integrated management plan. According to our findings most intense LULC changes are visible within the Ramsar site, whose extent coincides with the floodplain for the most part. The detected changes are dominated by increased conversion to agriculture, in particular by rice cultivation after 2004. This LULC conversion mostly affected savanna and floodplain grassland, which declined over the past years. Only 1,900 km 2 of floodplain grassland remain. This trajectory can be explained by the growing population in-situ as well as through migration, but also reflects governmental strategies [22,40]. While there is a lack of wetland inventories in Africa and hence only limited knowledge about wetland extent and change, a general decline of wetlands is evident [3,5,104], which is in line with our results. A recent LULC change assessment in the Wami river basin in Tanzania revealed similar results to our study with a remarkable decline of grassland and woodland resulting from a conversion to cultivated land [105]. By buying the land, domestic and foreign investors push the local farmers into the wetland and the forests. Increased forest use intensity does not necessarily translate into LULC change between forest classes or in forest loss.
Food production is of high priority in Tanzania, as reflected by Tanzania's kilimo kwanza (agriculture first) strategy entailing modernization of agriculture. In particular, this manifests in the implementation of the SAGCOT in the Kilombero catchment. However, the cropland area within the Kilombero floodplain spreads along the margins of the floodplain, leaving only little space for further changes. Crop production increasingly extends inside the flooded areas with the risk of drowning crops. About 7,250 km 2 of the catchment's natural vegetation, predominantly grassland and savanna were converted to cropland since the 1970s (≙ 18% of the catchment area); most of the changes occurred between 2004 and 2014. Suitable food production areas are exploited almost to their limits. Further LULC conversion can only take place at the expense of natural woodland or within the floodplain. Going further inside the floodplain will only work if the Kilombero floods are regulated, which could have dramatic negative impacts on the ecosystem. For the study period, agricultural expansion into forests is of minor importance. Converting forest to cropland impacts hydrological processes since the forests are the place where runoff and groundwater discharge are generated [12]. In addition, forest loss would directly translate into further habitat loss, biodiversity loss and negative impacts on local livelihoods that depend on their ecosystem services. Hence, further agricultural development must rely on measures to increase productivity, e.g., moderate irrigation, use of fertilizer, leveling, weeding, bunding [106,107]. Such options must be weighed carefully, as also agricultural intensification without areal extension can have detrimental effects on the environment, if not done in a sustainable manner, causing soil compaction, erosion or pollution through input of chemicals [108]. Nevertheless, good agricultural practices can improve yields but are currently not common practice in the Kilombero area [107]. Intensification, however, requires financial capital, access to fertilizer, and capacities. All these measures must be balanced with the guidelines of the different protected areas that cover vast areas of the Kilombero catchment. A management plan was established in late 2018, which outlines various scenarios depending on the funding [103]. It forms a policy framework for future sustainable development in the region. Our LULC maps are integral part of it, and continuous monitoring may support those efforts [20]. LULC often results in extensive loss of ecosystem services [23,109]. Hence, our LULC change assessment may serve to quantifying related ecosystem services.

Conclusion
Long-term LULC change assessments with dense time series of satellite imagery are often used to generate a comprehensive understanding of landscape changes and natural resources management. Despite a large historical data record provided by the Landsat program, change analyses carried out in tropical wetlands face limited availability of usable data due to persistent cloud cover triggered by excessive moisture. It is precisely in these environments where LULC change analyses are needed the most, due to the intense expansion of agriculture and urban environments experienced during this century. In this study, we demonstrate an appropriate approach to LULC analysis using multitemporal metrics of Landsat data. We generated decadal gapless composites that reflect seasonal patterns and are comparable between years, achieving reasonable classification accuracies for these dynamic environments. The analysis of per-pixel classification probabilities and derived parameters, as performed here, is well suited to detect spatial patterns of classification reliability and to support the interpretation of the results. Our combined approach of spectral change detection and PCC revealed complementarity, since LULC conversion and subtle within-class changes could be identified. Using least correlated multitemporal metrics of TC components allowed us to minimize spurious changes. Our results also showed that the area of land per capita for agricultural development is becoming increasingly scarce in the Kilombero catchment. Almost half of the Kilombero Ramsar site, a protected area with international importance, was converted from natural to anthropogenic land use, mostly during the 21st century. Located alongside a widely unregulated river system the Kilombero wetland is still a precious ecosystem. Under the current situation of constant economic and population growth and accelerating LULC change over the past decades it is likely that anthropogenic land use expansion will further increase. A regular LULC monitoring may serve as a means to guide decision makers through the process of sustainable land resource management, and to facilitate the implementation and success of such plans.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: classification probabilities per pixel and per class, Figure S2: variable importance for the classifications of 1974, 1994, 2004, and 2014, Table S5: From-to-LULC changes between 1974 and 1994, Table S6: From-to-LULC changes between 1994 and 2004, Table S7

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.