A Forest Monitoring System for Tanzania

Abstract: Tropical forests provide essential ecosystem services related to human livelihoods. However, the distribution and condition of tropical forests are under significant pressure, causing shrinkage and risking biodiversity loss across the tropics. Tanzania is currently undergoing significant forest cover changes, but monitoring is limited, in part due to a lack of remote sensing knowledge, tools and methods. This study demonstrates a comprehensive approach to creating a national-scale forest monitoring system using Earth Observation data to inform decision making and policy formulation and to combat biodiversity loss. A systematic wall-to-wall forest baseline was created for 2018 using Landsat 8 imagery. The classification was developed using the extreme gradient boosting (XGBoost) machine-learning algorithm; it achieved an accuracy of 89% and identified 45.76% of the country's area as covered with forest. Of those forested areas, 45% were found within nationally protected areas. Using an innovative methodology based on a forest habitat suitability analysis, the forest baseline was classified into forest types, with an overall accuracy of 85%. Woodlands (open and closed) were found to make up 79% of Tanzania's forests. To map changes in forest extent, an automated system for downloading and processing Landsat imagery was used along with the XGBoost classifiers trained to define the national forest extent; Landsat 8 scenes were individually downloaded and processed, and the identified changes were summarised on an annual basis. Forest loss identified for 2019 was 157,204 hectares, with an overall accuracy of 82%. These forest losses within Tanzania have already triggered ecological problems, alterations in ecosystem types and species loss. Therefore, a forest monitoring system, such as the one presented in this study, will enhance conservation programmes and support efforts to save the last remnants of Tanzania's pristine forests.


Introduction
Tropical forests form the most distinct and complex biome on Earth, with unique plant species of high economic value, and provide habitat for many animal species. They provide numerous valuable ecosystem services while also aiding the mitigation of climate change [1-4]. Africa is home to some of the world's most magnificent tropical forests. With more than 60 million people dwelling within or near these forests, they are relied upon for many ecosystem services, with livelihoods dependent on them for food, medicinal plants, fuel, fibres, and non-timber forest products. At the same time, they are also important for societal and cultural purposes [5]. Despite the importance of these forests, tropical forest cover varies considerably in space and time, and forest cover decline has increased. However, throughout many parts of Africa, forest cover change remains poorly understood due to field-based monitoring challenges and a lack of remote sensing studies. Government policies have often failed to prevent illegal forestry activities due to the absence of defined tenure, which has also increased over-utilisation. This is also linked to the lack of national capacity for monitoring and reporting deforestation, especially in many sub-Saharan African countries [3], which remains a barrier.
Therefore, the loss of tropical forests has been of international concern [6] in various Multilateral Environmental Agreements (MEAs) such as the Convention on Biological Diversity (CBD) and the United Nations Framework Convention on Climate Change (UNFCCC) [7], as it is linked to increases in erosion, run-off and flooding, CO2 concentration, climate change, and biodiversity loss [4,8-10]. This demonstrates the need for greater conservation of tropical forests, particularly against threats from the expansion of agriculture, urban expansion and other forms of deforestation in contemporary society. Accurate information on the extent and loss of tropical forests is necessary as the yardstick for policy and decision makers.
The application of Earth Observation data and growing computational power can help characterise and monitor tropical forests at the landscape level and so address the lack of information [11]. The provision of satellite-based data has come a long way since the 1970s, with more missions to be launched over the coming years [12] that will support the mapping of changes in tropical forest areas with broad geographic coverage. However, efforts to operationalise Earth Observation data for tropical forest monitoring have, to date, been heavily biased toward particular regions, such as South America [11], while Africa, apart from the Congo Basin, has been severely understudied. For example, from 1995 to 2003, approximately two-thirds of studies focused on the Amazon Basin, 18% on central Africa, and 17% on Southeast Asia [13]. Therefore, tropical forest monitoring needs to expand beyond these areas to capitalise on the availability of Earth Observation data with extensive area coverage. Perhaps most importantly, there is a need for work that accounts for societal constraints while increasing scientific knowledge and providing up-to-date and appropriate maps and statistics to support tropical forest conservation activities on the ground.
The situation is more complicated in developing countries such as the United Republic of Tanzania (hereafter referred to as Tanzania), where many households are highly dependent on forest resources. Despite the value of forests, changes in use patterns pose a noticeable threat to the sustainability of forest resources in terms of both socio-economic and ecological functioning. This has led to increased scarcity of forest resources, further aggravated by the continuing high deforestation rate [14] and associated with the change in climatic conditions experienced in many parts of Tanzania. Therefore, the need for data and information on the state of Tanzania's forest resources is of increasing importance. Yet, Tanzania's forest resource status and trends are mostly unknown, with current data being fragmented and outdated [15].
The lack of institutional capacity has largely constrained the reliability of data on Tanzanian forest resources, and nationwide forest monitoring coverage using an Earth Observation-based system remains inadequate. Using freely accessible satellite data and advanced remote sensing methods can provide a cost-effective and timely approach to achieving systematic wall-to-wall information for Tanzania's forests. This information is required to support national policy processes aimed at improving sustainable forest management, while also addressing international reporting obligations such as Reducing Emissions from Deforestation and Forest Degradation (REDD+) and Greenhouse Gas (GHG) reporting. It also supports the Sustainable Development Goals (SDGs) of the 2030 Agenda, in particular life on land (goal 15) and its targets on combating deforestation [16].
Large-scale forest monitoring by remote sensing has been reported in global forest change studies, which provide initial forest loss estimates. Hansen et al. [14] provide a global forest loss estimate from 2001 to 2019 at a spatial resolution of 30 m through the publicly available Global Forest Watch dataset (version 1.7). As a global and freely available dataset, it offers extensive forest change information, but its accuracy should be assessed for Tanzania.
This study aimed to create the basis for a long-term national forest monitoring system for Tanzania. Such a system is needed to help the country bridge the gap in information and knowledge concerning remote sensing for forest monitoring at a national level, as a cost-effective and timely alternative to time-consuming and costly ground surveys. Specifically, this study focuses on the following research questions: (i) What is the current (baseline) spatial extent of forest cover in Tanzania? (ii) What is the distribution of the different forest types in Tanzania? (iii) How can changes in forest extent be mapped as part of an ongoing monitoring system?

Study Area
Our study area is located in mainland Tanzania. The country's territory covers 945,100 km² (between 29° and 41° E and 1° and 12° S) (Figure 1). Tanzania has diverse terrain, with a combination of plains, hills and forested mountains, with the highest peak at 5895 m a.s.l. The country has different bioclimatic and topographic zones, ranging from dry regions where precipitation levels are below 400 mm to humid areas where precipitation levels reach over 2000 mm per year [17], with a maximum mean temperature range of 26.6-33.1 °C and a minimum of 5.3-18.3 °C [18]. This broad diversity of geographic conditions has given rise to various ecosystems and habitats, including semi-arid, humid, and tropical subhumid zones [19]. The primary natural forest types are montane, lowland, mangrove forest and woodlands (open, closed and thickets) [20] and managed timberline trees, with different localised habitat distributions [21].

Software and Data Processing
Data acquisition and processing were undertaken using the open-source Remote Sensing and GIS Software Library (RSGISLib; [22]), the KEA file format [23] and the XGBoost [24] machine learning library. The Earth Observation Data Downloader (EODataDown) was used for downloading and processing the Landsat 8 data to an analysis-ready data (ARD) product [25]. For the detection of change, EODataDown plugins were implemented, automating the scene-based processing following the generation of the ARD product. The EODataDown software is written in Python and uses a PostgreSQL database for metadata storage, with dependencies on RSGISLib and the Atmospheric and Radiometric Correction of Satellite Imagery (ARCSI) [26] software. For this study, the system was deployed on the SuperComputing Wales (SCW) high-performance computing (HPC) infrastructure. The use of SCW significantly reduced the processing time by parallelising the data processing, with approximately 100 cores used at each processing stage.

Landsat-8 Pre-Processing
The forest baseline for Tanzania was developed using multispectral data from the Landsat 8 Operational Land Imager (OLI) [27]. At the time of analysis, all the Collection 1 images (2013 to 2018) for May to November with a cloud cover of less than 80% were downloaded. The images were selected from the dry period to minimise cloud and cloud shadow and discrepancies in reflectance caused by seasonal vegetation fluxes [28]. A total of 3200 Landsat 8 OLI images were downloaded using the Google Cloud API.
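The scene selection described above can be sketched as a simple metadata filter. This is a minimal illustration rather than the EODataDown implementation: the `SceneRecord` type, its field names and the example records are hypothetical, and the cloud cover threshold is assumed to be an upper bound on the cloud cover of a scene.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SceneRecord:
    """Hypothetical metadata record for one Landsat 8 scene."""
    scene_id: str
    acquired: date
    cloud_cover: float  # percentage of the scene flagged as cloud


def select_dry_season_scenes(scenes, first_year=2013, last_year=2018,
                             first_month=5, last_month=11, max_cloud=80.0):
    """Keep dry-season scenes in the study period with cloud cover below max_cloud."""
    return [
        s for s in scenes
        if first_year <= s.acquired.year <= last_year
        and first_month <= s.acquired.month <= last_month
        and s.cloud_cover < max_cloud
    ]


# Made-up catalogue entries, for illustration only.
catalogue = [
    SceneRecord("scene_a", date(2015, 7, 14), 12.0),  # kept
    SceneRecord("scene_b", date(2015, 1, 20), 5.0),   # outside May-November
    SceneRecord("scene_c", date(2017, 10, 2), 93.0),  # too cloudy
    SceneRecord("scene_d", date(2019, 6, 1), 10.0),   # outside 2013-2018
]
selected = select_dry_season_scenes(catalogue)
```

Keeping the period, months and cloud limit as parameters makes it easy to test how sensitive the number of usable scenes is to each choice.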
Pre-processing to surface reflectance was undertaken using the ARCSI [26] software, which uses a dark object subtraction (DOS) [29] in the visible bands to retrieve an estimate of the aerosol optical depth (AOD) [26] from the image. This is then used to parameterise the 6S radiative transfer model [30] and apply the resulting correction to create standardised reflectance using the topographic and bidirectional reflection correction proposed by Shepherd and Dymond [31]. More details of this processing chain are provided in [32].

Classification Methodology
The classification followed a hierarchical approach, first delineating a binary forest extent (level 1) and then classifying the forest pixels into forest types (level 2) (Figure 2), generating the forest baseline for Tanzania. Training a single classifier on the samples from all scenes at once would create an XGBoost classifier with many levels and very large trees that would be very slow to train and apply. Therefore, 100 training sets were derived, each from 500 randomly selected scenes (of the 3200). The associated training samples were merged and then balanced to ensure an equal number of forest and non-forest samples in each training set. The number of samples within the 100 sets varied, with a minimum of 60,874,216 and a maximum of 78,833,485; the mean was 68,118,012 and the standard deviation was 3,537,652. Each of the 100 sets was then split by taking random non-overlapping samples to create training (50%), validation (25%) and testing (25%) datasets.
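The construction of one training set (random scene selection, class balancing, then a non-overlapping 50/25/25 split) can be sketched as follows. This is a toy illustration under stated assumptions: the in-memory layout of `samples_by_scene` and the function name are hypothetical, and balancing is assumed to mean randomly downsampling the larger class.

```python
import random


def build_training_set(samples_by_scene, n_scenes, rng):
    """Merge samples from n_scenes random scenes, balance the classes and split.

    samples_by_scene maps a scene id to a list of (features, label) pairs,
    where label is 1 for forest and 0 for non-forest (illustrative layout).
    """
    chosen = rng.sample(sorted(samples_by_scene), n_scenes)
    forest, non_forest = [], []
    for scene in chosen:
        for features, label in samples_by_scene[scene]:
            (forest if label == 1 else non_forest).append((features, label))

    # Balance: randomly downsample the larger class to the size of the smaller.
    n = min(len(forest), len(non_forest))
    balanced = rng.sample(forest, n) + rng.sample(non_forest, n)
    rng.shuffle(balanced)

    # Non-overlapping 50% / 25% / 25% split.
    n_train = len(balanced) // 2
    n_valid = len(balanced) // 4
    return {
        "train": balanced[:n_train],
        "valid": balanced[n_train:n_train + n_valid],
        "test": balanced[n_train + n_valid:],
    }


rng = random.Random(42)
# Toy stand-in: 20 scenes, each with 30 forest and 50 non-forest samples.
toy_scenes = {
    f"scene_{i:02d}": [((i, j), 1) for j in range(30)]
                      + [((i, j), 0) for j in range(30, 80)]
    for i in range(20)
}
training_sets = [build_training_set(toy_scenes, n_scenes=5, rng=rng)
                 for _ in range(3)]
```

In the study, the same procedure would be repeated 100 times with 500 scenes per set; at that scale the samples would be held on disk rather than in Python lists.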

XGBoost Classifier
The XGBoost algorithm, which was used in this analysis, utilises a gradient boosting machine (GBM) approach that creates scalable tree structures designed for memory efficiency with parallel processing capability. It also includes tree pruning and regularisation to reduce overfitting and bias and so improve classifier performance [24]. The algorithm can therefore make use of the large training datasets [33] produced as part of this study.

Optimising the XGBoost Parameters
The XGBoost algorithm has a large number of hyperparameters which can impact the classifier performance and therefore require optimisation. A Bayesian optimisation was used to optimise the parameters for each of the 100 training sets. To reduce the time required, the full training and validation datasets were not used for the hyperparameter optimisation. Instead, randomly selected subsets were created: 60,000 training samples each of forest and non-forest (i.e., 120,000 in total) and 20,000 samples of each class (i.e., 40,000 in total) for the validation dataset.

Training the XGBoost Classifiers
Using the full training and validation datasets and the optimised hyperparameters, the 100 forest/non-forest classifiers were consecutively trained using 40 cores per job. It took 35 days for all 100 models to be trained. Using the testing datasets, the average accuracy of the classifiers was 99%.

Creating the Final Forest Extent Map
To generate the forest extent classification, the 100 classifiers were applied to each of the 3200 scenes. The 100 classifications were summarised on a per-pixel basis, providing the percentage of classifiers that classified a pixel as forest. Each scene was then thresholded at 30%, 50% and 80%. The thresholds were selected based on a visual inspection of a subsample of scenes and chosen to capture the probability of a pixel being forest at three different levels. To merge the scene-based classifications into a national forest mask, a 100 km tiling was used to allow parallel processing. For each pixel, the percentage of scenes in which it was classified as forest was calculated, resulting in three output images, one for each scene-based threshold (i.e., 30%, 50% and 80%). Those outputs were subsequently thresholded using the same 30%, 50% and 80% thresholds, creating 9 forest extent maps for Tanzania (e.g., scene threshold of 50% and national threshold of 80%). An independent accuracy assessment was used to identify the optimal forest extent map.
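The two-level thresholding can be illustrated for a single pixel. The sketch below is not the processing code used in the study: the function names are hypothetical, votes are represented as 0/1 values per classifier, and a pixel is assumed to count as forest when the percentage is greater than or equal to the threshold.

```python
THRESHOLDS = (30, 50, 80)


def percent_true(values):
    """Percentage of entries that are 1/True."""
    return 100.0 * sum(values) / len(values)


def national_forest_calls(pixel_votes_by_scene, thresholds=THRESHOLDS):
    """Return {(scene_threshold, national_threshold): bool} for one pixel.

    pixel_votes_by_scene holds, for each scene covering the pixel, the list
    of 0/1 votes (1 = forest) from the ensemble of classifiers.
    """
    calls = {}
    for scene_t in thresholds:
        # Level 1: forest / non-forest decision within each scene.
        scene_calls = [percent_true(votes) >= scene_t
                       for votes in pixel_votes_by_scene]
        # Level 2: share of scenes in which the pixel was called forest.
        scene_share = percent_true(scene_calls)
        for national_t in thresholds:
            calls[(scene_t, national_t)] = scene_share >= national_t
    return calls


# One pixel seen in four scenes, with 90, 85, 60 and 20 of 100 forest votes.
votes = [[1] * n + [0] * (100 - n) for n in (90, 85, 60, 20)]
calls = national_forest_calls(votes)
```

With these made-up votes, a scene threshold of 80% gives two forest calls out of four scenes, so the pixel is forest at national thresholds of 30% and 50% but not at 80%.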

Accuracy of the Forest Extent Map
The National Forest Inventory (NFI; NAFORMA) collected by Tanzania Forest Services from 2011 to 2015 [20] and other local forest inventories for the period 2016-2018 were considered for this assessment of accuracy. However, the differences in temporal and spatial scale made it difficult to define the forest extent from these data, and the NFI data were therefore not considered reliable reference data against which to assess the forest extent map. Instead, the accuracy assessment was conducted using stratified random sampling in nine subregions of 185 km by 180 km distributed throughout Tanzania, ensuring that the validation took into account the variability in forest and non-forest land cover across the country. High-resolution images in base-map layers from virtual globe web-based maps (i.e., ESRI Satellite, Bing Satellite and Google Satellite), available through QGIS, were used as reference data, as these layers have been found to be adequate for validation by previous studies (e.g., [34]) and enabled national coverage of the reference points rather than a distribution biased by the availability of high-resolution imagery. For each subregion, 1000 sample points were generated for each of the forest and non-forest classes, making 2000 points for each subregion and 18,000 reference points in total.
The ClassAccuracy QGIS plugin [32] was used to efficiently verify each point against the two classes with an overlay on the virtual globe web-based maps. The accuracy metrics from the error matrix were summarised as overall accuracy (OA), user and producer accuracy (UA and PA), allocation disagreement (AD), quantity disagreement (QD) [35,36], F1-score [37], and the Matthews Correlation Coefficient (MCC) [38]. These metrics enable users to understand the distribution of errors in the products.
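For a two-class error matrix, the metrics listed above can be computed directly from the four cell counts. The following is a minimal sketch using the standard definitions, with quantity and allocation disagreement in their two-class form; the function name and the example counts are hypothetical, and the counts are used unweighted (area-weighting of a stratified sample is not shown).

```python
import math


def binary_accuracy_metrics(tp, fp, fn, tn):
    """Accuracy metrics for a forest (positive) / non-forest error matrix.

    tp: mapped forest, reference forest
    fp: mapped forest, reference non-forest (commission)
    fn: mapped non-forest, reference forest (omission)
    tn: mapped non-forest, reference non-forest
    """
    n = tp + fp + fn + tn
    overall = (tp + tn) / n
    users = tp / (tp + fp)       # user's accuracy of forest (precision)
    producers = tp / (tp + fn)   # producer's accuracy of forest (recall)
    f1 = 2 * users * producers / (users + producers)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Quantity disagreement: mismatch between mapped and reference proportions.
    quantity = abs(fp - fn) / n
    # Allocation disagreement: errors that cancel out in the class totals.
    allocation = 2 * min(fp, fn) / n
    return {"OA": overall, "UA": users, "PA": producers, "F1": f1,
            "MCC": mcc, "QD": quantity, "AD": allocation}


# Made-up counts for one 2000-point subregion (not results from the study).
metrics = binary_accuracy_metrics(tp=850, fp=100, fn=150, tn=900)
```

Quantity and allocation disagreement always sum to the total disagreement (1 - OA), which gives a quick consistency check on an error matrix.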

Forest Type Classification
The second step (level 2) of forest classification was to classify the forested pixels into forest types (montane, lowland, mangrove, plantation forest, closed woodland, open woodland and thicket). The classification of forest types is necessary for generating detailed forest distribution information to evaluate forest ecological systems and support monitoring and management practices. To constrain this analysis, the habitat suitability of each forest type, previously published in John et al. [21], was used. This novel approach was selected to minimise the classification error, such that a pixel was only considered for the forest types that the habitat suitability analysis had identified for it. It therefore constrained the classification of forest types based on their adaptation and corresponding bioclimatic patterns, minimising misclassification.

Forest Types Mask
The habitat suitability extent maps were intersected to merge the individual forest type suitability maps, identifying 34 combinations. For a small number of areas, the habitat suitability result provided suitability for only a single class (e.g., open woodland). However, a classifier cannot be trained on a single class, so a second class was added in these cases. For example, for the areas which only had suitability for open woodland, closed woodland was added (Table 1).
The habitat suitability analysis was undertaken at a pixel resolution of 1 km as this was the resolution of the environmental variables used for the analysis, which allowed local environmental variability to be captured [39] at a country scale, while the corresponding bioclimatic patterns enabled relationships between different forest type habitats to be inferred. A nearest neighbour resampling was therefore used to create the 30 m resolution product required for the Landsat classification. However, due to this resolution change, there were 30 m pixels that were within the forest mask but did not have a habitat suitability. This occurred, for example, along the coast and at other forest/non-forest boundaries present at 1 km (e.g., wetlands and lakes). A k-Nearest Neighbour (k-NN) fill was performed to fill these pixels, where an unknown pixel was filled with the mode of the k spatially nearest pixels; k = 5 was used for this analysis.
To define the number of samples for each of the 34 class combinations (e.g., 'mangroves' and 'lowland'), the data were combined, creating 34 training sets. The training data were balanced using random sampling to ensure an equal number of samples per class, avoiding bias towards the majority class. The maximum number of samples for a class was limited to 10,000,000 to reduce processing time. Therefore, for a combination of Mangroves, Lowland Forest, and Closed Woodland, the Lowland Forest and Closed Woodland samples were subset to 2,225,183 (i.e., the number of samples of mangroves, as this was the smallest of the three classes). However, if the combination were Closed Woodland and Open Woodland, then the number of samples would have been limited to 10,000,000.
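The k-NN mode fill and the per-class sample limit described above can be sketched as follows. Both functions are illustrative and hypothetical in name: the fill uses a brute-force search that is only practical for small grids (a national raster would need a spatial index), ties are broken by scan order, and the class counts other than the 2,225,183 mangrove samples are made up.

```python
from collections import Counter


def knn_mode_fill(grid, k=5, nodata=0):
    """Fill nodata cells with the most common value of the k nearest valid cells."""
    known = [(r, c, v) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v != nodata]
    filled = [list(row) for row in grid]
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v != nodata:
                continue
            # Brute-force search: sort all valid cells by squared distance.
            nearest = sorted(
                known, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)[:k]
            filled[r][c] = Counter(p[2] for p in nearest).most_common(1)[0][0]
    return filled


def samples_per_class(class_counts, cap=10_000_000):
    """Balanced sample size: the smallest class, limited to the cap."""
    return min(min(class_counts.values()), cap)


# Suitability combination codes on a tiny grid; 0 marks a pixel without one.
grid = [
    [1, 1, 1, 2],
    [1, 0, 1, 2],
    [1, 1, 2, 2],
]
filled = knn_mode_fill(grid, k=5)

n_three_class = samples_per_class(
    {"mangrove": 2_225_183, "lowland": 40_000_000, "closed woodland": 90_000_000})
n_woodlands = samples_per_class(
    {"closed woodland": 90_000_000, "open woodland": 120_000_000})
```

The two calls reproduce the cases in the text: the three-class combination is limited by the mangrove count, while the two woodland classes are limited by the 10,000,000 cap.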

Training of the Classifiers
A single classifier was trained for each combination, where the samples were split into training (50%), validation (25%) and testing (25%) sets. A 10% sample of the training and validation datasets was randomly extracted to optimise the XGBoost hyperparameters using Bayesian optimisation. The XGBoost classifiers were then trained using the optimised hyperparameters and the full training and validation datasets. Using the independent testing dataset, the average classifier accuracy was 99%.

Final Forest Types Map
As with the forest/non-forest classification, the classification was applied on an individual scene basis. The habitat suitability derived mask was used to define the classifier applied to each pixel. Therefore, each pixel was only considered for the classes defined by the habitat suitability analysis. To summarise the scene-based forest type classifications into a national map, the mode of the scene-based analysis was taken on a per-pixel basis using the same 100 km tiles to allow for parallelisation of the processing.
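The per-pixel logic above (choose the classifier from the suitability combination, then take the mode across scenes) can be sketched as follows. The classifier shown is a hypothetical stand-in rule on a made-up `canopy_cover` feature, not a trained XGBoost model, and ties in the mode are broken arbitrarily by first occurrence.

```python
from collections import Counter


def classify_pixel(features, suitable_types, classifiers):
    """Apply the classifier trained for this pixel's suitability combination."""
    return classifiers[frozenset(suitable_types)](features)


def national_forest_type(scene_labels):
    """Most frequent forest type among the scene-based labels for one pixel.

    None entries (e.g., cloud-covered observations) are ignored.
    """
    valid = [label for label in scene_labels if label is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Hypothetical stand-in for one trained classifier; a real system would hold
# one trained model per suitability combination.
classifiers = {
    frozenset({"open woodland", "closed woodland"}):
        lambda f: ("closed woodland" if f["canopy_cover"] >= 0.4
                   else "open woodland"),
}

scene_features = [{"canopy_cover": 0.55}, {"canopy_cover": 0.35},
                  {"canopy_cover": 0.60}]
labels = [classify_pixel(f, ["open woodland", "closed woodland"], classifiers)
          for f in scene_features]
labels.append(None)  # a scene in which the pixel was cloud-covered
pixel_type = national_forest_type(labels)
```

Because the classifier is looked up by combination, a pixel can never be assigned a forest type outside its suitability set.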

Accuracy of the Forest Types Map
To assess the forest type map, the National Forest Inventory (NFI; NAFORMA; [20]) was used. Temporal differences prevented the NFI data from being used to assess the forest extent map. However, for the forest type classification, the NFI data were masked using the forest extent defined by this study, which removed the majority of the temporal change from the dataset: if a pixel was defined as forest in both our 2013-2018 baseline mapping and the NFI data (2011-2015), it was unlikely that the forest type had changed. Any remaining NFI points defined as being outside of a forested area were then removed, and the points for the period 2011-2015 were checked for validity using virtual globe web-based maps in QGIS (e.g., Google Earth data). Forest types are also more difficult to assess by analysing high-spatial-resolution remotely sensed data, particularly the differences between classes such as open and closed woodlands. Therefore, a field-derived dataset is strongly preferred.
A final total of 13,200 NFI field points were used to assess the accuracy of forest type classification (n = 3895 closed woodland, n = 1708 lowland forest, n = 57 mangrove forest, n = 401 montane forest, n = 6721 open woodland, n = 216 plantation forest and n = 202 thicket). The number of points per class reflects the spatial distribution of the forest types across Tanzania, with 6721 points for open woodland but just 57 for mangroves.

Forest Cover Change and Monitoring

System Architecture
One of the requirements for identifying change was the generation of a monitoring system, rather than just a classification of forest extent change. Therefore, the system was based on the EODataDown software system (Figure 3). EODataDown can be configured to automatically download and process Landsat, Sentinel-2 and Sentinel-1 data to an analysis-ready data product. The analysis is executed in date order with the oldest image first. EODataDown can also execute a set of user-defined plugins that perform data analysis tasks (e.g., the detection of change). Each time the system is executed, the latest imagery is downloaded and analysed. Using a tool such as cron, the system can be automated to run independently at a set time interval (e.g., daily or weekly), creating an automated monitoring system. An advantage of the EODataDown system is that it allows the end-user to focus on their data analysis, while the EODataDown architecture manages the data storage and processing.

Landsat 8 Imagery
EODataDown uses the ARCSI software to generate optical ARD data products, and therefore the same ARD product is used for the change analysis as for the baseline classification. However, to minimise the identification of false positives for change, a further processing step was applied in which an additional 'clear-sky' mask was derived through the RSGISLib software [22] and implemented as an EODataDown plugin.
The aim of the 'clear-sky' mask is to identify the large continuous areas of available imagery which have a 'clear view to the sky', discarding small isolated regions of imagery close to clouds that are sources of error within the cloud masking and ARD generation. The generation of the 'clear-sky' product is a two-step process. The first step is to buffer the identified cloud and cloud shadows by 30 km. Those regions outside of the resulting mask are clumped, and only those with a size greater than 3000 pixels are selected. Those regions are then grown so that they are not within 10 km (30 Landsat pixels) of a cloud or cloud shadow pixel.
While applying the 'clear-sky' mask reduces errors in the following change analysis associated with omissions in the cloud masking, it also reduces the extent of data available for the change analysis, including regions of valid data.

Forest Change Definition
Forest change is described as the complete or partial removal of forest cover (Figure 4) that causes changes in forest structure [40]. In the context of remote sensing, this needs to consider the resolution of the imagery being used, and three pixels (each 30 × 30 m), an area of approximately 0.27 ha, were considered the minimum mapping unit for this analysis. Defining the minimum mapping unit minimises the number of false-positive changes due to the complex land surface conditions, especially in savanna ecosystems, which are naturally more variable. The forest change detection is designed to detect and track abrupt forest change events from anthropogenic and natural catastrophes. It augments the forest information as a novel source of forest change map products in Tanzania. A summary of the forest change detection process is presented in Figure 4.
Within savanna ecosystems, fire is a natural process. Therefore, it is normal for these ecosystems to have a patchwork of burnt areas, and fire rarely produces long-term changes to the ecosystem. The proper use of fire in woodland savannas is essential for maintaining these ecosystems. Early burning is carried out to reduce more severe fire damage later in the fire season. Therefore, in the context of this study, these changes are not considered "real change". To separate fire from other changes within the analysis, the Normalised Burn Ratio (NBR) index (Equation (1); [41]) and Burn Area Index (BAI; Equation (2); [42]) were used. Thresholds of NBR > −0.02 and BAI < 100 were used to define the unburnt areas, and it was within these areas that the remaining change analysis was undertaken. These thresholds were identified through a visual sensitivity analysis across a range of scenes and locations throughout Tanzania.
$$\mathrm{NBR} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}} \qquad (1)$$

$$\mathrm{BAI} = \frac{1}{(0.1 - \mathrm{R})^{2} + (0.06 - \mathrm{NIR})^{2}} \qquad (2)$$

where, in Landsat 8, NIR = band 5, SWIR = band 7 and R = band 4.
Plantations are heavily managed forests within the landscape and are not considered in terms of national forest change statistics, as they are already considered 'changed' and under anthropogenic modification. Change regularly occurs in plantation forests, with partial forest loss after harvesting. However, replanting occurs shortly after and therefore, while the land cover may have temporarily changed, the land use has not. To mask the plantation forest areas, a mask was extracted from the forest type classification.
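The burnt-area masking can be sketched per pixel using the standard formulations of NBR and BAI, which are assumed here to be those referred to as Equations (1) and (2). The function names and example reflectances are hypothetical, and surface reflectance is assumed to be scaled from 0 to 1.

```python
def nbr(nir, swir):
    """Normalised Burn Ratio from NIR and SWIR reflectance (0-1 scale)."""
    return (nir - swir) / (nir + swir)


def bai(red, nir):
    """Burn Area Index from red and NIR reflectance (0-1 scale)."""
    return 1.0 / ((0.1 - red) ** 2 + (0.06 - nir) ** 2)


def is_unburnt(red, nir, swir, nbr_min=-0.02, bai_max=100.0):
    """Apply the thresholds from the text: NBR > -0.02 and BAI < 100."""
    return nbr(nir, swir) > nbr_min and bai(red, nir) < bai_max


# Made-up reflectances for a vegetated pixel and a recently burnt pixel.
vegetated = {"red": 0.04, "nir": 0.35, "swir": 0.10}
burnt = {"red": 0.08, "nir": 0.10, "swir": 0.20}
vegetated_unburnt = is_unburnt(**vegetated)
burnt_unburnt = is_unburnt(**burnt)
```

The sketch does not guard against division by zero (e.g., a pixel where NIR + SWIR = 0), which a raster implementation would need to mask beforehand.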

Scene-Based Change Detection
The change analysis was undertaken in two steps. The first identified possible change pixels within the previously defined forest extent baseline, following masking for burnt areas and plantations. A normalised difference vegetation index (NDVI) threshold of <0.35 was applied to identify potential change areas. This threshold was identified based on expert knowledge of the environment and a visual sensitivity analysis across a number of scenes and time periods. To avoid seasonal changes (i.e., loss of leaves due to phenology), particularly within the savannas of western Tanzania, an additional threshold was applied to the whole scene: if >30% of the forested pixels were identified as a possible change, then the whole scene was ignored. This is justified because, at the extent of a Landsat scene, even following the removal of cloud, the extent of true change will be small (i.e., probably much less than 1%). Finally, features of less than 3 pixels were removed from the layer to reduce noise.
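The first stage can be sketched on a small grid as follows. This is an illustration under stated assumptions, not the EODataDown plugin: the function name is hypothetical, burnt areas and plantations are assumed to be already excluded from `forest_mask`, and features are grouped using 8-connectivity.

```python
from collections import deque


def candidate_change_mask(ndvi_grid, forest_mask, ndvi_max=0.35,
                          max_scene_share=0.30, min_pixels=3):
    """Boolean grid of candidate change pixels, or None if the scene is skipped."""
    rows, cols = len(ndvi_grid), len(ndvi_grid[0])
    flagged = [[forest_mask[r][c] and ndvi_grid[r][c] < ndvi_max
                for c in range(cols)] for r in range(rows)]

    n_forest = sum(1 for row in forest_mask for v in row if v)
    n_flagged = sum(1 for row in flagged for v in row if v)
    if n_forest == 0 or n_flagged / n_forest > max_scene_share:
        return None  # no forest, or likely phenology rather than change

    # Remove connected groups smaller than the minimum mapping unit.
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not flagged[r][c] or seen[r][c]:
                continue
            group, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and flagged[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) < min_pixels:
                for y, x in group:
                    flagged[y][x] = False
    return flagged


# Toy scene: all forest, one 3-pixel low-NDVI patch and one isolated pixel.
ndvi_grid = [[0.7] * 6 for _ in range(5)]
for r, c in [(0, 0), (0, 1), (1, 0), (4, 5)]:
    ndvi_grid[r][c] = 0.2
forest = [[True] * 6 for _ in range(5)]
mask = candidate_change_mask(ndvi_grid, forest)
leafless = candidate_change_mask([[0.1] * 6 for _ in range(5)], forest)
```

In the toy scene, the three-pixel patch survives, the isolated pixel is removed as noise, and the fully leafless scene is rejected outright.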
Using the pixels identified in the first stage, the second stage identified the change through classification, using the same 100 XGBoost classifiers trained to generate the forest extent baseline. Reducing the number of pixels classified significantly reduces the processing time for a scene, as only a small percentage of the total number of pixels within the scene is classified. The 100 classifications were merged, and a threshold of 50% was applied to identify forest and non-forest regions. The resulting non-forest regions were considered the final change features for the scene.
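The second stage can be sketched as an ensemble vote restricted to the candidate pixels. The classifiers below are hypothetical stand-ins (simple cut-offs on a single made-up feature value) rather than trained XGBoost models, and a pixel is assumed to be non-forest when fewer than 50% of the classifiers vote forest.

```python
def scene_change_pixels(candidates, features, classifiers, vote_threshold=50.0):
    """Classify only the candidate pixels and return those voted non-forest."""
    change = set()
    for pixel in candidates:
        forest_votes = sum(clf(features[pixel]) for clf in classifiers)
        forest_share = 100.0 * forest_votes / len(classifiers)
        if forest_share < vote_threshold:
            change.add(pixel)
    return change


# 100 stand-in classifiers, each voting forest (1) above its own cut-off.
cutoffs = [0.25 + 0.001 * i for i in range(100)]
classifiers = [lambda x, t=t: 1 if x > t else 0 for t in cutoffs]

features = {"a": 0.20, "b": 0.34, "c": 0.90}
# Pixel "c" was not a first-stage candidate, so it is never classified.
changed = scene_change_pixels(["a", "b"], features, classifiers)
```

Only the candidates are passed to the classifiers, which is where the saving in processing time comes from.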

Confirming Changes and Updating the Forest Baseline
On a per-scene basis, the identification of a change is considered to be of low confidence due to omissions in the cloud masking, topographic shadowing and misclassification of the forest extent. However, changes can be confirmed through multiple observations, as errors are unlikely to coincide on consecutive scenes. Through experimentation, a threshold of 5 observed changes was selected to confirm a change. The changes were summarised to provide an annually updated forest baseline (Figure 3), as required for national reporting. The date of the first observed change was used to define the time point of the change occurring. EODataDown can be implemented as a scoring system where confirmed changes are provided as automated alerts to end users.
Change occurrence summaries were therefore generated for 2018, 2019 and 2020. The 2018 change layer included all changes identified against the forest extent for the 2013-2018 period of the imagery used to define the baseline, and applying the 2018 changes to the baseline was considered to create a forest baseline with a discrete date of 1 January 2019. The change alerts for 2019 were identified and applied to generate a baseline for 1 January 2020. Finally, while the Landsat imagery was processed for 2020, few changes were confirmed with 5 observations owing to the frequency of cloud-free Landsat observations, meaning that many changes are only confirmed once the following year's data (i.e., 2021) are available. Lowering the threshold to 3 observations increased the amount of change identified for 2020, indicating that change is similar to 2019. However, there was also an increase in the number of false positives. Therefore, the focus will be on reporting and validating the changes identified for 2019.
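The confirmation logic can be sketched as a per-pixel counter that records the date of the first detection. The `ChangeTracker` class and its methods are hypothetical, and observations are assumed to accumulate without needing to fall on consecutive scenes.

```python
from datetime import date


class ChangeTracker:
    """Count scene-level change detections per pixel and confirm repeated ones."""

    def __init__(self, min_observations=5):
        self.min_observations = min_observations
        self._first_seen = {}
        self._counts = {}

    def add_scene(self, acquired, change_pixels):
        """Record change pixels for one scene; scenes are expected in date order."""
        for pixel in change_pixels:
            self._counts[pixel] = self._counts.get(pixel, 0) + 1
            self._first_seen.setdefault(pixel, acquired)

    def confirmed(self):
        """Map each confirmed pixel to the date of its first observed change."""
        return {p: self._first_seen[p] for p, n in self._counts.items()
                if n >= self.min_observations}

    def annual_summary(self, year):
        """Confirmed pixels whose first observed change falls in the given year."""
        return {p for p, d in self.confirmed().items() if d.year == year}


tracker = ChangeTracker(min_observations=5)
scene_dates = [date(2019, 6, 3), date(2019, 6, 19), date(2019, 7, 5),
               date(2019, 8, 6), date(2019, 9, 7)]
for i, acquired in enumerate(scene_dates):
    pixels = {(10, 12)}
    if i < 2:
        pixels.add((3, 4))  # flagged twice only, so never confirmed
    tracker.add_scene(acquired, pixels)
confirmed = tracker.confirmed()
```

Lowering `min_observations` to 3 would confirm changes sooner, at the cost of more false positives, as noted above for 2020.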

Forest Change Accuracy Assessment
Assessing change detection accuracy is difficult as change is rare, with only a small percentage of pixels changing within a given period (e.g., one year). Therefore, to assess the accuracy of the changes identified, relatively small but intensively sampled plots were used. A single plot was defined as a 1 × 1 km area. This extent was found to be the most suitable trade-off, providing sufficient area for a representative assessment while remaining viable for intensive interpretation of accuracy points. Sixteen sample plots were selected in areas where changes were known to have occurred and stratified across the different forest types. For each 1 × 1 km plot, 1000 reference points were randomly generated, with a minimum spacing of 30 m between the points. The minimum distance constraint was designed to ensure a pixel was only assessed once. In total, 16,000 verification points were used for the assessment. The verification points were not stratified using the generated change layer to avoid bias in the location of the points, but the densely allocated points should enable the estimation of change omissions, a significant challenge when assessing the accuracy of a change product [43].
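One way to generate points that respect the 30 m spacing is to draw distinct Landsat pixels within the plot and use their centres. This is an assumption made for illustration (the text does not state how the points were generated), and the function name is hypothetical.

```python
import math
import random


def sample_plot_points(n_points, plot_size=1000.0, pixel=30.0, rng=None):
    """Draw n_points pixel-centre locations (metres from the plot origin).

    At most one point is placed per pixel, so no two points are closer
    than one pixel width.
    """
    rng = rng or random.Random()
    cells_per_side = int(plot_size // pixel)  # 33 whole pixels per side
    cells = [(row, col) for row in range(cells_per_side)
             for col in range(cells_per_side)]
    if n_points > len(cells):
        raise ValueError("more points requested than pixels in the plot")
    chosen = rng.sample(cells, n_points)
    return [((col + 0.5) * pixel, (row + 0.5) * pixel) for row, col in chosen]


points = sample_plot_points(1000, rng=random.Random(0))
min_spacing = min(math.dist(a, b)
                  for i, a in enumerate(points) for b in points[i + 1:])
```

A 1 × 1 km plot holds 33 × 33 = 1089 whole 30 m pixels, so 1000 points sample most of the plot, which is what allows omissions to be estimated.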
The reference points were visually examined using the available Landsat 8 and, for some areas, Planet 3 m datasets. The ClassAccuracy accuracy assessment tool in QGIS [32] was used to efficiently assess the points. Images acquired in October or November of the two target years (i.e., 2018 and 2019) were used for the analysis to minimise seasonal variation. For example, the validation was undertaken by observing the earlier image from 2018 and then using the later image from 2019 to confirm whether a forest change was "real" or "false".
Forest change was evaluated using precision, recall, F1-score, user's and producer's accuracy, and overall accuracy, metrics which suit binary classification and are widely applied in remote sensing classification methods [44]. The evaluation therefore focused on the counts of true positives, true negatives, false positives and false negatives. The final forest loss accuracy assessment was compared with the available global forest change dataset from Hansen et al. [14] version 1.7 for 2019.
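As an illustration of how these metrics follow from the four counts, the sketch below computes them for the change class using the standard definitions. The counts are hypothetical and are not taken from Table 12, and the function name `binary_change_metrics` is illustrative only.

```python
import math
from typing import Dict


def binary_change_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics for the change class.

    For the change class, precision equals user's accuracy and recall
    equals producer's accuracy.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    overall_accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient, which also uses the true negatives.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "overall_accuracy": overall_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for 1000 verification points.
metrics = binary_change_metrics(tp=80, tn=880, fp=20, fn=20)
```

Because change is rare, the overall accuracy in such an example is dominated by the true negatives, which is why the class-specific F1-score is the more informative figure for the change class.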

Forest/Non-Forest Classification
The binary analysis (forest/non-forest) produced nine forest extent maps for Tanzania (Table 3). To identify which map to take forward for further analysis, the accuracy of each map was considered.

Accuracy Assessment and Model Selection
The nine models exhibited a satisfactory level of OA, ranging from 68.46 ± 0.50% to 89.66 ± 0.40% (Table 3). The best three models were further evaluated to select the final model. The final chosen model (Figure 5) had a single-scene threshold of 80% and a multi-scene threshold of 50%, with an overall accuracy of 89.66 ± 0.40%, an F1-score of 0.87 and an MCC value of 0.78 (Table 4), and sufficiently separated primary forests from non-forest classes. Classification errors were generally associated with wetlands and agricultural cropping being misclassified as forest. In contrast, the underrepresentation of forest extent was mainly associated with open woodlands, where strong seasonal patterns, intermixing with edaphic areas (especially in semi-arid regions), disturbance by frequent fire, and limited image availability may have contributed to the misclassification.
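The interplay of the two thresholds can be sketched as follows. This is one plausible reading, stated here as an assumption: a pixel counts as forest in a single scene when its classifier probability reaches the single-scene threshold, and it is mapped as forest when the share of scenes voting forest reaches the multi-scene threshold. The function name `forest_from_scenes` and the probabilities are hypothetical.

```python
from typing import List


def forest_from_scenes(scene_probabilities: List[float],
                       single_scene_threshold: float = 0.8,
                       multi_scene_threshold: float = 0.5) -> bool:
    """Combine per-scene forest probabilities for one pixel.

    Assumption: each scene votes 'forest' if its probability is at least
    the single-scene threshold, and the pixel is forest if the fraction
    of forest votes is at least the multi-scene threshold.
    """
    if not scene_probabilities:
        raise ValueError("at least one scene observation is required")
    votes = [p >= single_scene_threshold for p in scene_probabilities]
    return sum(votes) / len(votes) >= multi_scene_threshold


# Hypothetical probabilities from four cloud-free scenes of two pixels.
stable_forest = forest_from_scenes([0.90, 0.85, 0.60, 0.95])  # 3 of 4 scenes vote forest
seasonal_pixel = forest_from_scenes([0.90, 0.50, 0.40, 0.70])  # 1 of 4 scenes votes forest
```

Under this reading, raising either threshold shrinks the mapped forest area, which is consistent with the sensitivity of the area estimates to the thresholds reported in Table 5.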

Forest Area Estimates
Table 5 presents the forest cover extent for Tanzania, as generated from the nine classification models. The results were compared with the previous national field inventory (2011-2015; [20]; NAFORMA). As shown in Tables 3-5, the classification identified as having the highest accuracy also resulted in forest area estimates closest to the NFI estimates [20]. Table 5 also demonstrates the relationship between the forest area mapped and the single-scene and multi-scene thresholds. As seen, the differences in the area mapped between the thresholds are also quite large (Table 5). This implies that an improvement in the classification accuracy might be possible with further refinement of the threshold selection and is an area for further study. Similarly, Table 6 summarises the forest extent by region in Tanzania.

Forest Types Classification
The novel forest type classification methodology, which used the forest habitat suitability to constrain the classification, resulted in an OA of 85%. This result provides important information on the current status of unique forest ecosystems and patterns in Tanzania, a complex forest landscape with varying climatic conditions, from dry savanna to moist montane forest.

Accuracy Assessment
The overall accuracy of the forest type classification map (Figure 6) was 85%, with F1-scores ranging from 0.77 to 0.99. A quantity disagreement of 0.02 was calculated along with an allocation disagreement of 0.11, demonstrating that error in the classification has resulted in areas being misclassified but that these areas largely cancelled each other out to provide a more accurate overall area estimation (Table 7).
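For readers unfamiliar with these two measures, the sketch below computes them from a confusion matrix using the commonly used definitions of Pontius and Millones. It is a simplified illustration: the matrix holds hypothetical sample counts for three classes (not the values behind Table 7), and no area weighting of the strata is applied.

```python
from typing import List, Tuple


def disagreement(matrix: List[List[float]]) -> Tuple[float, float, float]:
    """Return (quantity, allocation, overall agreement) for a confusion matrix.

    Rows are mapped classes and columns are reference classes.
    Assumption: raw counts are converted directly to proportions.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[cell / total for cell in row] for row in matrix]
    row_tot = [sum(p[g]) for g in range(n)]                        # mapped proportion
    col_tot = [sum(p[i][g] for i in range(n)) for g in range(n)]   # reference proportion
    # Quantity: mismatch in class proportions, regardless of location.
    quantity = sum(abs(row_tot[g] - col_tot[g]) for g in range(n)) / 2
    # Allocation: errors that swap location between classes and cancel in area.
    allocation = sum(
        2 * min(row_tot[g] - p[g][g], col_tot[g] - p[g][g]) for g in range(n)
    ) / 2
    agreement = sum(p[g][g] for g in range(n))
    return quantity, allocation, agreement


# Hypothetical counts for three forest types.
counts = [
    [40, 5, 5],
    [3, 30, 2],
    [2, 5, 8],
]
quantity, allocation, agreement = disagreement(counts)
```

A low quantity disagreement alongside a higher allocation disagreement, as reported above, means that most errors are swaps between classes that leave the class area totals nearly unchanged.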
The majority of the classification error (15%) was found between the deciduous forest types, closed and open woodland. The boundary between these classes is challenging to define (Figure 7a,b) as the class definitions are described by a variation in tree cover rather than species composition. Therefore, the spectral difference between the classes is associated with the percentage of background soil and grassland reflectance versus canopy leaf reflectance. This is also associated with a broad geographic overlap between open and closed woodland and other forest communities (Figure 7c). For example, the possibility of habitat overlap between closed and open woodland is estimated at 80%, and between closed woodland and lowland forest at approximately 40% (Figure 7c). However, this study has demonstrated the ability to reliably retrieve quantitative information for forest types within the heterogeneous landscape of Tanzania and at a national level, meeting the requirements for national forest monitoring and reporting.

Forest Type Area Estimates
The forest type areal estimates were compared with the National Forest Inventory (NAFORMA), with some small differences between the forest types (Table 8) mostly associated with open woodland and related forest types such as closed woodland (Figure 7a,b). Open woodland is the most significant forest type by area, with 57%, followed by closed woodland with 22%. Therefore, woodlands occupy around 79% of the forest types, spreading from central to the western part of the country and with a mosaic of lowland forest along the coast and in the southernmost part of the country (Figure 6).

Estimated Forest Extent in Protected Areas
The forest cover of in situ conservation strategies such as protected areas is necessary for biodiversity and ecosystem protection in Tanzania. Therefore, this study also provides forest extent in forest reserves and wildlife managed areas (Tables 9 and 10) as an ecological parameter required to produce desired conservation outcomes (Figure 8a,b).

Forest Cover Change Results
The accuracy assessment focuses on the 2019 change product, as this is the only full year. The 2018 change layer contains the changes that occurred between 2013 and the end of 2018. While there were not sufficient observations to confirm the changes for 2020, the 2019 result, a countrywide wall-to-wall map of forest cover change over Tanzania (Figure 9), adequately detected forest changes from the baseline map and could be used for reporting annual forest change statistics. The forest loss area estimates for 2019 from this study were compared with the global forest change analysis of Hansen et al. [14] version 1.7 (Table 11) and found to be comparable.

Accuracy Assessment
Table 12 summarises the accuracy assessment results for the changes identified in 2019. The accuracy was found to be good, with an F1-score of 0.82 compared to an F1-score of 0.45 for the global forest change assessment of Hansen et al. [14], highlighting that the Hansen et al. [14] product is not capturing the full extent of change within Tanzania. For the no-change class, the accuracy was similar, with an F1-score of 0.96 for this study and 0.89 for Hansen et al. [14]. The no-change results would be expected to be similar because change represents only a few per cent of the landscape, so most accuracy assessment points fall in no-change regions. Even significant errors within the change result would only result in small changes to the no-change class extent. This difference in the accuracy of the change product also demonstrates an improvement in forest change data quality. This result also highlights the importance of locally optimised analysis methods compared to being reliant on global datasets for national reporting.

Estimated Forest and Forest Type Change by Region
The forest change results were also summarised at a localised level to indicate regions with high deforestation rates (Table 13). The majority of forest changes identified are within the open woodlands, primarily due to shifting cultivation. However, closed woodlands and lowland forests have also witnessed significant change (Table 14), although it should also be noted that the extent of change within each forest type corresponds with the area of that type (i.e., open woodland, closed woodland and lowland forest are the three most extensive forest types by area in Tanzania) (Table 8).

Estimated Forest Change in Protected Areas
Although forest loss is more pronounced outside the protected areas, the result also highlights that forest loss is occurring in protected areas (forest reserves and wildlife areas) (Table 15 and Figure 10). These protected areas are designed to protect and support the country's biodiversity, and therefore changes within these areas are particularly significant. An example is shown in Figure 10, where deforestation in the protected area is occurring due to encroachment from the boundaries, with changes occurring up to 500 m from the protected area boundary. This is particularly concerning as, if left unchecked, these protected areas could witness further encroachment, increasing the vulnerability of these important habitats and ecosystems.

Updating Earlier Forest Baseline
The confirmed forest changes for 2018 and 2019 were used to update the forest extent baseline to generate national forest baselines for 2018 and 2019. Table 16 presents the comparison of forest loss from this study with the global dataset of [14] version 1.7 over six years (2013-2019). It should be noted that these studies are not directly comparable for the period 2013-2019: while the [14] product provides an annual change product for each year, this study uses a baseline forest mask which is the product of imagery from 2013 to 2018 and not a 2013 map of forest extent. Therefore, while the accuracy assessment of 2019 suggests that Hansen et al. [14] is underestimating the true extent of change in Tanzania, this study has a lower value as it is against a different baseline. Only the 2019 annual change is directly comparable to the Hansen et al. [14] products. Table 17 summarises the updated baseline forest extent from the forest loss detected from the earlier baseline.

Summary of Results
Our study demonstrated the potential application of free and open-source software, freely available satellite data, and advanced remote sensing techniques to provide a cost-effective method to obtain wall-to-wall information on the forest extent (Figures 5a,b and 6) and associated changes (Figure 9) in Tanzania. It bridges the information and knowledge gap concerning the use of remote sensing for generating forest information about status, extent, types and location over a larger geographical area (national level) [46]. The forest baseline (reference level) therefore represents the extent of intact forest area required to monitor future forest cover in Tanzania. The information is vital for developing practical, long-term plans to conserve and manage biodiversity, based on forest extent, type and composition [46].
The forest/non-forest classification model achieved an overall accuracy of 89.22% with an F1-score of 0.87 (Table 4). However, this still resulted in a 10% error in the classification. From a visual assessment of the map (Figure 5a), the likely sources of error were ascertained to be forest intermixed with edaphic areas, especially in woodland areas, and the forest-grassland mosaic that remains evergreen throughout the year. Yet, the results are sufficient for the reporting of forest extent in Tanzania. This study introduced a novel method using habitat suitability modelling [21] to constrain forest type classification (Figure 5b) such that only appropriate forest types were considered for classification in appropriate geographical regions. For forest type classification, an overall accuracy of 85% with an F1-score ranging from 0.77 to 0.99 was achieved (Table 7). A particular challenge for the classification of forest types (15% error) was differentiating closed and open woodland areas (Figure 7a-c), as the boundary between these classes is based on the tree cover rather than the species composition of the woodlands. Future studies could also consider approaches that aim to retrieve associated biophysical parameters such as canopy cover. However, the result is considered the best mapping of Tanzania currently available and could be applied to other neighbouring countries (e.g., Kenya and Mozambique) which have similar ecosystems. These maps will help to establish a structured, long-term forest monitoring system in Tanzania. Forest cover information is needed to support the national forest policy to sustainably manage, conserve, restore and utilise forests and associated resources for Tanzania's socio-economic growth and climate resilience.
A further consideration is that this study used imagery over 5 years (2013-2018) to mitigate the issues of data availability given the high level of cloud cover, particularly in the coastal areas. However, during this period, change will have occurred in the forest extent. Therefore, the forest extent and type maps represent the forest cover for the majority of the scenes within the period. Thus, the maps were updated to provide maps for 2018 and 2019 through the change detection result (Section 3.4.4).

Forests and Forest Types Extent
The present analysis estimated a forest cover area of 407,976 km², representing 45.76% of the country landmass (Table 5). The forest type classification (Figure 5b) result indicated a prominent class of woodlands (closed, open woodland and thickets), estimated to cover approximately 336,405 km², which make up 81.20% of the forested land (Table 8), an important ecosystem of great significance to human economies [1][2][3], mainly covering the central and western part of the country. The montane forests represent biodiversity hotspots along a chain of isolated mountain ranges (Figure 5b), supporting a diversity of endemic species [47], with an area of approximately 9717 km² representing 2.35% of the forest cover (Table 8). Similarly, montane forests harbour the world-famous tropical montane rivers, including the Eastern Arc Mountains [48], feeding major rivers, floodplains and ocean. The proposed construction of the large Julius Nyerere hydro-power station across the Rufiji river will depend on forest conservation, especially upland montane forests, to reduce siltation, but also the Selous Game Reserve and Nyerere National Park, home to a wealth of flora and fauna as the long-term resource sustainability base for the nation at large. The lowland forest habitat overlapping with montane forest and woodlands, with the most significant biological value and source of water supply for wildlife and people, was estimated to cover 60,718 km², representing 14.16% of the forest cover, next to closed woodland (Table 8). Therefore, the result will support developing diverse conservation states for different forest types that minimise overexploitation, especially on fragile sites.
In the previous National Forest Inventory (NAFORMA) released in 2015, the area was estimated at 481,000 km², representing 54% of the total land area [20]. Arguably, the main observational gap between these analyses should not be attributed to forest loss in Tanzania, but to differences in timing, methodology and accuracy between the two studies, i.e., wall-to-wall mapping using Earth Observation data versus sample-based forest inventory plots with a relative sampling error of 8.89% [20]. Some areas are challenging to reach with traditional forest inventory, which pushes for reduced sampling intensity and focused sampling efforts, with few samples/plots selected from these areas [49], particularly in the mountains. The forest is evenly distributed over the country. The top three regions with extensive woodland areas include Lindi, Ruvuma and Morogoro, followed by the western part of the country, covering extensive dry miombo woodlands stretching across Tabora, Katavi, Kigoma and parts of the Rukwa region (Figure 5b). Overall, these areas occupy essential protected areas (Figure 8) in the country and are the cornerstone of forest and biodiversity conservation in Tanzania [50]. Therefore, increasing and maintaining a well-connected system of protected areas is a viable conservation strategy as a natural solution to global challenges, including climate change and deforestation [1].
Similarly, the increase in industrial forest plantations from government, communities and individual farmers supports the increase in forest extent in the southern highlands (Figure 6) [51]. For example, approximately 564,678 ha of plantation forest are found in the three regions of Iringa, Njombe and Mbeya. Accordingly, the result provides a consistent forest extent at a national level, from which conservation policy actions can be planned and future forest changes and carbon storage evaluated, e.g., Suarez et al. [52].

Scene-Based Forest Change Detection
Seasonality changes and persistent cloud cover in Tanzania create low data availability and excessive gaps (missing data) in the Landsat archive. Yet, the scene-based change detection method proposed in this study was found to overcome these challenges and achieved an overall accuracy of 82% (Table 12). However, future work could also include other pixel-based time series change detection methods such as Breaks For Additive Seasonal and Trend (BFAST; [53]), BFAST Monitor [54], Continuous Change Detection and Classification (CCDC; [55]), Jumps Upon Spectrum and Trend (JUST; [56]) and Exponentially Weighted Moving Average Change Detection (EWMACD; [57]). These methods have been demonstrated to be applicable to a large range of land cover change problems (e.g., [58][59][60][61]), primarily with Landsat but also Sentinel-2. For Tanzania, the availability of a longer time series of Sentinel-2 imagery will likely make such approaches viable at a national scale. The high levels of cloud cover, particularly in the east of the country, can make optical remote sensing difficult.

Forest Change Area Estimates
The forest cover change estimates provide essential information to guide policy formulation and implementation in protecting forests with better decision making in government programmes and other forest protection fiscal incentive projects [62], notably for addressing issues such as the UN Reducing Emissions from Deforestation and Forest Degradation (REDD+) and greenhouse gas (GHG) international reporting commitments and the 2030 Agenda for Sustainable Development Goals (SDGs) on fighting deforestation [16]. The result identified 157,204 ha of forest loss for 2019 in Tanzania, representing a 0.39% loss of intact forest, close to the global forest change analysis of Hansen et al. [14] (version 1.7), which mapped 142,773 ha in the same period. However, this study's forest change analysis presented a methodology that was optimised for Tanzania and therefore had a higher degree of accuracy (Figure 9). Given that change is infrequent, even large changes in the accuracy of the change detection algorithm can result in relatively small changes in the area estimated. However, these relatively small geographic areas can be significant if found to occur in areas of importance (e.g., protected areas). Therefore, it can be considered that global forest change datasets remain suitable for providing an indicative trend of forest loss at a national scale [63,64], but locally optimised products are preferred for national and regional management decision making. The forest change area estimates from this study thus provide an essential reference point in the region to which the Hansen et al. [14] product can be compared. Few countries have established a wall-to-wall forest change map or method for long-term national-scale forest monitoring.
The top three regions with high deforestation are Tabora, Katavi and Rukwa (Table 13). The forest cover change analysis was achieved with an accuracy of 82% compared to 45% from the global forest change analysis [14] (Table 12). The changes detected by the monitoring system were also used to update the forest baseline map. Hence, the baseline map (forest mask) was updated from 407,976 to 397,514 km² by 2019, indicating a decrease of 2.56% of the forest cover in Tanzania over six years (2013-2019) (Table 17).
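The percentages quoted in this section can be reproduced from the reported areas. The short check below uses only figures stated in the text; the variable names are illustrative.

```python
HA_PER_KM2 = 100  # 1 km² = 100 ha

baseline_km2 = 407_976   # forest baseline extent
updated_km2 = 397_514    # baseline after applying detected losses
loss_2019_ha = 157_204   # forest loss identified for 2019

baseline_ha = baseline_km2 * HA_PER_KM2  # 40,797,600 ha, as in the Table 9 caption

# Share of the baseline forest lost in 2019.
loss_2019_pct = 100 * loss_2019_ha / baseline_ha

# Relative decrease of the baseline over 2013-2019.
decrease_pct = 100 * (baseline_km2 - updated_km2) / baseline_km2

print(round(loss_2019_pct, 2))  # 0.39
print(round(decrease_pct, 2))   # 2.56
```

Both values agree with the 0.39% annual loss and the 2.56% six-year decrease reported above.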
The developed deforestation monitoring methodology aimed to provide the ability to respond immediately to reduce or stop newly detected illegal deforestation from expanding further. In the future, the proposed change system might be considered for the formation of an early warning system. For early warning, the number of scenes used to confirm a change can be reduced (e.g., two observations), but this would require visual checking to confirm the change and is recommended for forest guardians working in areas of high importance, such as protected areas under pressure from deforestation, rather than for a national system. For the annual reporting of forest loss for policy makers, using five observations of change to confirm a true change has been demonstrated to produce reliable results which are fit for this purpose.

Forest Management Outlook
Deforestation is detected beyond general-use land in protected areas (Table 15 and Figure 10) typical of explicit forest conservation status, indicating a threat to the last remnant of important tropical forests in Tanzania. Protected areas and other conservation strategies support forest protection. However, they will become increasingly isolated and fragmented (Figure 11) as surrounding forest land is removed, turning to non-forest. Similarly, these protected areas will lose all of their protected status, leading to conversion to other land uses. Likewise, forest loss will increase the isolation (patches) of the protected areas, impacting wildlife corridors. For example, the Msaginia forest reserve (Figure 10a) supports wildlife movement between Katavi National Park and other protected areas, but the ongoing forest loss will limit this corridor.
Similarly, the detected deforestation in the western part of the country (Figure 9i), if it remains unchecked, will disrupt water flow, increase soil erosion, threaten the Malagarasi river that supports the Malagarasi-Muyovozi Ramsar site, and increase the siltation of Lake Tanganyika. It will raise the severity of the flooding that is already occurring and disrupting livelihoods [65]. Therefore, it can be argued that forest cover loss is endangering Tanzania's economy. Increasingly, natural catastrophes such as droughts and El Niño climatic crises have influenced much agricultural productivity, power generation and transportation. Consequently, the forest monitoring system aimed to enhance forest law enforcement in protecting forests with better decision making.

Conclusions
Timely tropical forest monitoring is required to provide information about forest extent and changes over time, reducing data gaps in the information necessary for forest conservation, management and responding to climate change for sustainable development. This study has provided the first consistent and robust forest cover extent, with an area of 407,976 km² (45.76% of the country's area) and an overall accuracy of 89%. The estimated forest loss for 2019 was 157,204 ha, with an overall accuracy of 82%, contributing essential information for both science and forest management in Tanzania. These results have improved the quality of information available for Tanzania, which was previously considered inadequate in terms of quality and coverage to provide a baseline for national reporting. The forest monitoring system developed through this study is intended to link policy making on forest conservation and protection to meet national forest data requirements and integrate them into national institutions. It will enhance conservation programmes, which are rearguard efforts to save the last remnants of pristine forests remaining in Tanzania.
Innovative methods were developed through this study, including constraining forest type classification using the results of a habitat suitability analysis, which achieved an overall accuracy of 85%. The methods used could be directly applied to other optical remotely sensed data, particularly Sentinel-2, which for the monitoring of change would provide an increase in the temporal frequency of the observations and therefore speed up the identification and confirmation of changes. In the future, SAR data such as Sentinel-1, ALOS-2 PALSAR-2 and NISAR data could also be integrated into the monitoring system, reducing the impacts of clouds, haze, and dust, common in Tanzania. Future studies might also consider removing the single-scene and multi-scene thresholds and merging all individual forest/non-forest classifications to create a single probability surface for forest extent. A local sensitivity analysis could be carried out to derive the optimal thresholds within biogeographic regions rather than nationally.

Figure 1. Map of the study area with the distribution of reference data from forest inventory samples [20] and regions labelled.

2.5. Forest/Non-Forest Classification
2.5.1. Defining Training Data
The training polygons for the forest/non-forest classification were generated with reference to the Landsat-8 imagery, higher-resolution Google Earth imagery and field knowledge. A total of 46,176 training polygons were collected (forest, n = 22,440 and non-forest, n = 23,736). These samples were then rasterised onto each of the 3200 images and the associated image pixels were extracted, creating 435,808,135 forest samples and 1,423,875,598 non-forest samples.

Figure 3. Forest change and monitoring system design.

Figure 4. A flowchart of forest change detection analysis.

Figure 5. Map showing estimated areal proportions from the classification result: (a) forest/non-forest and (b) forest types in Tanzania.

Figure 6. A set of example sites, with field photos, illustrating the classification result and forest types. These examples demonstrate the complex nature of these forests and the local quality of the resulting classification (photos acquired by authors).

Figure 7. (a) Classification results; (b) ESRI satellite image, which highlights a detailed sample of woodland landscape with a mosaic of closed and open woodland overlaid with forest inventory plots, and (c) the estimated probability of mapped forest type overlap (mosaic). Similarly, the mosaic pattern tends to increase on the woodland landscape as compared to other forest types. This makes separability, and hence accuracy assessment, challenging.

Figure 8. Distribution of protected areas (PA) [45] to the estimated (a) forest/non-forest and (b) forest types in Tanzania.

Figure 9. Map showing deforestation area for 2019 with detailed sample areas (i) (a-d) and (ii) (a-d).

Figure 10. A sample of forest cover loss in protected areas on the western part of the country with a buffer of 500 m: (a) Msaginia forest reserve, (b) Ugalla North forest reserve, (c) Loasi river forest reserve, and (d) Lugufu and Mkuti forest reserve.

Figure 11. Aerial view of fragmented forest landscape based on drone capture for savanna woodland at (a) Igombe river forest reserve and (b) Itigi thicket, October 2019 (photos acquired by authors).

Table 1. A sample of predicted combined forest types suitability depicting single classes occurrences.

Therefore, a second class was added in these situations to enable the classifier to perform classification. Similar to the forest/non-forest classification, the training polygons were defined for each forest type and extracted from all 3200 Landsat scenes. A total of 20,370 sample polygons were defined for the forest types and resulted in 249,302,636 pixel samples (Table 2).

Table 2. Summary of the training dataset.

Table 3. Evaluating classification models performance using accuracy assessment metrics for binary classification (forest/non-forest).

Table 4. Detailed accuracy metrics for the best three classification models.

Table 5. Estimated forest area from the classification results for the nine models, compared with the National Forest Inventory (NFI)-NAFORMA [20].

Table 6. Summarised forest extent using a map ranked by regions.

Table 7. Thematic accuracy measures of the forest types classification.

Table 8. Estimated area for forest type classification as compared with the National Forest Inventory (NFI) NAFORMA assessment [20].

Table 9. A summary of forest extent (ha) in protected areas. Area percentage computed from a total estimated forest area (40,797,600 ha) of the classification result (Table 5).

Table 10. Forest types in protected areas.

Table 11. Forest loss area for the year 2019 (this study) compared with global forest cover loss from Hansen et al. [14] version 1.7.

Table 12. Model performance evaluation metrics for potential forest changes in 2019 at a 95% confidence interval, compared to global forest change analysis version 1.7 of 2019 Hansen et al. [14].

Table 13. Summary of forest change extent by region in Tanzania.

Table 14. Forest type change by regions.

Table 15. Estimated forest cover change in protected areas.

Table 17. Update of forest baseline extent from the detected forest loss for the period 2013-2019.