Forecasting Amazon Rain-Forest Deforestation Using a Hybrid Machine Learning Model

The present work analyzes deforestation of the Amazon rain-forest from actual data and predicts it by means of artificial intelligence algorithms. A hybrid machine learning model was implemented, using a dataset of 760 Brazilian Amazon municipalities, with static data, namely geographical, forest, and watershed variables, among others, together with a time series of annual deforested area for the last 20 years (1999–2019). The designed learning model combines dense neural networks for the static variables and a recurrent Long Short-Term Memory neural network for the temporal data. Many iterations were performed on augmented data, testing different configurations of the regression model, adjusting the model hyper-parameters, and running a battery of tests to obtain the optimal model, which achieved an R-squared score of 87.82%. The final regression model predicts the increase in annual deforested area (in square kilometers) for the decade from 2020 to 2030, forecasting that accumulated deforestation will reach 1 million square kilometers by 2030, around 15% of the 5.5 to 6.7 million square kilometers of the rain-forest, compared with the present 1%. The obtained results will help to understand the impact of man's footprint on the Amazon rain-forest.


Introduction
In the last 20 years, the world has seen significant economic and productivity growth, but it has come at a great cost in terms of equity and sustainability, where sustainability is understood as our ability to pass on the assets necessary for future generations' well-being [1,2]. With rising population and per capita consumption, one of the major concerns of our time is ensuring long-term sustainability. This is reflected in Rio+20's post-2015 development agenda, where the Sustainable Development Goals (SDGs) replaced the Millennium Development Goals [1]. Under this agenda, efforts to limit deforestation are rising worldwide as a result of climate change mitigation strategies, and have been promoted as one of the United Nations' SDGs for the period 2015–2030 [3]. Target 15.2 (under Goal 15) was added to the SDGs to promote efforts to stop deforestation, calling for "the implementation of sustainable management of all types of forests, halt deforestation, restore degraded forests and substantially increase afforestation and reforestation globally" [4].
In recent years, debates have risen on the difficulties and potentials for sustainable agriculture and natural resource management, particularly of land, water, and forests. This is attributable not only to a more prominent climate change agenda, which aims to reduce greenhouse gas emissions in order to keep global warming below 1.5 degrees Celsius, but also to the current SDGs agenda. Soil erosion, reduced water quality and supply, biodiversity loss, and increased carbon emissions are all consequences of forest loss. To model this problem, a recurrent network is used for the temporal data, together with a dense network for the static data; combined with proper optimization of the model and data augmentation, the aforementioned difficulties can be addressed.
A hybrid learning method is proposed to predict the deforestation rate in the Amazon rain-forest for the coming decade. The workflow of the present research is depicted schematically in Figure 1. This study used a dense neural network, namely a multi-layer perceptron (MLP), to model the static data, and a Long Short-Term Memory (LSTM) network to model the temporal data. The research is based on a dataset that includes information on 760 municipalities (latitude, longitude, area, hydrography, non-forest area), together with annual deforestation data for the last 20 years. Considering the variable typology, the designed model combines the MLP for static variables and a recurrent LSTM neural network for temporal variables. A data augmentation process was carried out to generate synthetic data to train the model, which produced better results, as demonstrated by the hyper-parameter optimization process, where the amount of generated data was included as a hyper-parameter. Once the best hyper-parameter combination was found, the resulting model was used to model the deforestation increment from the observed data. A model loss analysis confirmed that the presented model avoids over-fitting, and a residual (error) analysis was carried out to evaluate the performance of the regression model. The next sections are structured as follows. Section 2, Material and methods, describes the data and details the proposed machine learning model used to predict the increment in deforestation of the Amazon rain-forest. Section 3, Results, presents the data pre-processing and data augmentation carried out, and the experimental results obtained for the model and the deforestation forecast. Finally, Section 4 presents the discussion and conclusions, as well as future work.

Data Description
The data is composed of static variables with geographical and administrative information from 760 municipalities of the Brazilian Amazon, and temporal data of yearly deforestation for each municipality. The collected data comprise the years between 2000 and 2020. Table 1 depicts the dataset variables, with the corresponding data types (measurement scale) and a description of each variable. The data come from the TerraBrasilis [28] platform for the organization, access, and use of geographic data for environmental monitoring. TerraBrasilis was developed by the National Institute for Space Research (INPE), a research unit of the Brazilian Ministry of Science and Technology. Figure 2 depicts the deforestation up to 2020 in yellow, and in orange the Brazilian Amazon biome limits. The map, like the data, has been taken from TerraBrasilis [28]. It can be observed that the south and east of the Amazon rain-forest show a higher degree of deforestation, in accordance with the number of municipalities in the region. Thus, it can be concluded that the more municipalities there are, i.e., the greater the human presence, the greater the deforestation. The total deforestation in the last 20 years is depicted in Figure 3 for the Amazon states of Brazil. In Figure 3a the forest area of 2020 is compared to the forest area that existed in 2000, expressed as the ratio between 2020 and 2000. This ratio, i.e., deforestation, can be expressed as the percentage of forest lost in the last 20 years for each state, and is presented in Figure 3b. The state with the highest deforestation level is Maranhão (MA), which lost around 56% of its forest area, from 68,806.3 km² to 30,346.3 km², for 38,460 km² of forest lost. Roraima (RR) is the next state that lost the most forest in the period, with a 43% decrease, from 156,588.4 km² to 88,778.64 km², for 67,809.76 km² of forest lost.
However, the forest areas of the aforementioned states are smaller than those of states such as Pará (PA) and Amazonas (AM), which lost 21% and 19%, respectively, but in absolute values represent forest losses of (944,248.1 − 761,848.27) = 182,399.83 km² and (1,462,388.9 − 1,343,017.65) = 119,371.25 km², respectively, a substantially larger forest loss for these states. A summary of the plots is presented in Table 2. Figure 4a depicts the deforestation increment for each year from 2001 to 2020. Note that the reference year 2000 is not present, since this graph shows an increment, that is, a difference between the current and past year. A downward trend can be observed from 2000 to 2012, but from this minimum an upward trend in the deforestation increment has developed. Figure 4b depicts the cumulative deforestation from 2000 to 2020. The annual increment in deforestation will be predicted by the hybrid learning model, using both the static and temporal data described in Table 1. If the deforestation increment is known, so is the cumulative deforestation by year, which will also be analyzed. Figure 5 depicts a schematic representation of the hybrid learning model used to predict the increment in deforestation of the Amazon rain-forest. A dense neural network, i.e., a multi-layer perceptron (MLP), is used to process the static input variables described in Table 1. The dense network to model the static data can be expressed as follows:

Hybrid Learning Model
$$\hat{y}^s = f_{\mathrm{MLP}}(X^s_i), \qquad (1)$$

where $X^s_i$ represents the static input and $\hat{y}^s$ the dense network output. The Long Short-Term Memory (LSTM) model for the dynamic data (in Table 1) can be represented as

$$\hat{y}^d = f_{\mathrm{LSTM}}(X^d_i), \qquad (2)$$

where $X^d_i$ represents the temporal input and $\hat{y}^d$ the LSTM output. Note that the temporal model is multivariate; that is, the deforestation increase is forecasted from more than one time-dependent variable. Each variable depends not only on its past values but also on the other variables. This dependency is used for forecasting future values of deforestation, and is expressed in Equations (3) and (4) as matrices indicating the multidimensional nature of the temporal input, where $X = X^d_i$, for $i \in \{1, \ldots, 6\}$. Given the temporal nature of the data used in the LSTM, Equation (2) can be expressed as follows:

$$\hat{x}_{t+1} = f(X_t, X_{t-1}, \ldots, X_{t-k}),$$

where $t \in \{1, \ldots, N-k\}$, $\{X_t, X_{t-1}, \ldots, X_{t-k}\}$ are the actual and past values of the time series, $\hat{x}_{t+1}$ is the forecasted value, $f$ represents the LSTM model, $k$ is the window size used to perform the forecasting, and $N$ is the number of observations (years) in the database. In the LSTM model $k$ is a hyper-parameter to be optimized. As an example, for a window size $k = 3$, this can be written as

$$\hat{x}_{t+1} = f(X_t, X_{t-1}, X_{t-2}, X_{t-3}).$$

Note that the indexation here corresponds to a temporal index $t$.
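The sliding-window construction above can be sketched with a small helper that turns a multivariate series into (input window, next value) pairs; `make_windows` and the toy array below are illustrative, not part of the original pipeline:

```python
import numpy as np

def make_windows(series, k):
    """Split a (N, features) series into LSTM inputs of k+1 steps
    (the actual value X_t plus k past values) and the next target value."""
    X, y = [], []
    for t in range(k, len(series) - 1):
        X.append(series[t - k:t + 1])      # {X_{t-k}, ..., X_t}
        y.append(series[t + 1, 0])         # x_{t+1}: next deforestation increment
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float).reshape(10, 2)  # 10 years, 2 variables (toy data)
X, y = make_windows(series, k=3)                    # X: (6, 4, 2), y: (6,)
```

Each sample then contains $k+1$ consecutive yearly observations of all temporal variables, which is the shape an LSTM layer expects.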
Finally, an additional dense layer is added to the neural network to merge the static and dynamic outputs into the deforestation prediction $\hat{y}$. The concatenation of the two models is schematically represented in Figure 5.
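The two-branch architecture described above can be sketched with the Keras functional API; the layer widths and the feature counts (11 static features, a window of 5 years over 6 temporal variables) are assumptions for illustration, not the authors' exact configuration:

```python
# Sketch of the hybrid model: an MLP branch for the static input, an LSTM
# branch for the temporal input, and a dense merge layer for the prediction.
from tensorflow import keras
from tensorflow.keras import layers

static_in = keras.Input(shape=(11,), name="static")       # X^s: static variables
x_s = layers.Dense(32, activation="relu")(static_in)
x_s = layers.Dense(16, activation="relu")(x_s)

temporal_in = keras.Input(shape=(5, 6), name="temporal")  # X^d: window k=5, 6 variables
x_d = layers.LSTM(32)(temporal_in)

merged = layers.concatenate([x_s, x_d])                   # merge static and dynamic outputs
merged = layers.Dense(16, activation="relu")(merged)
out = layers.Dense(1, name="deforestation_increment")(merged)

model = keras.Model([static_in, temporal_in], out)
model.compile(optimizer="adam", loss="mse")
```

Training then takes a pair of inputs, e.g. `model.fit([X_static, X_temporal], y, ...)`, so both branches learn jointly against the same deforestation-increment target.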
2.2.1. Dense Network: Multi-Layer Perceptron for Static Data

Figure 6 represents the basic computation unit of the multi-layer perceptron (MLP), which is used to model the static data. Each neuron computes the weighted sum of its inputs, and the output of the node is obtained from a so-called activation function. This can be expressed in matrix form, for the whole dense network in Figure 5, as follows:

$$\hat{y}^s = \varphi(XW + w_b),$$

where $X$ represents the input matrix, whose rows are the observed instances and whose columns are the features in the data, $W$ is the weight matrix, $w_b$ is the weight vector for the node biases, and $\varphi$ is the activation function. The weights are calculated iteratively by the backpropagation algorithm [29,30] with the aim of reducing the error between an observed output $y$ and the output of the network $\hat{y}^s$. The backpropagation algorithm can be summarized as follows. First, the input is fed forward through the network, i.e., $f(X)$ is calculated. Second, the error of the network, $\hat{y}^s - y$, is calculated; the error is backpropagated through the network, and the responsibility for the error of each neuron in each layer is calculated. Third, the network weights are updated according to each neuron's responsibility for the error. The backpropagation algorithm minimizes the error function using an optimization technique known as gradient descent. In the process, the chain rule is applied to calculate partial derivatives, thus the activation function should be differentiable.
Usual activation functions are ReLU, sigmoid, and soft-max, among others [31]. From an implementation perspective, a ReLU function is usually employed, given its lower computational cost. Also, the network is not fed with all the instances at once. The neural network is required to learn from the entire training set but, for computational reasons, the weights are generally not updated only after evaluating all the instances. Instead, the entire training set is reordered randomly and divided into batches of a given size, and the weights are updated by applying gradient descent using the elements of each batch.
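The dense forward pass and the mini-batch split described above can be illustrated with NumPy; the sizes and random toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """ReLU activation: phi(z) = max(0, z)."""
    return np.maximum(0.0, z)

def dense_forward(X, W, w_b):
    """One dense layer: weighted sum of inputs plus bias, then activation phi."""
    return relu(X @ W + w_b)

# Toy data: 8 instances, 3 static features, one hidden layer of 4 units.
X = rng.normal(size=(8, 3))
W = rng.normal(size=(3, 4))
w_b = np.zeros(4)

# Mini-batch iteration: shuffle the training set, then split into batches;
# gradient descent would update W and w_b once per batch.
batch_size = 4
order = rng.permutation(len(X))
batches = [X[order[i:i + batch_size]] for i in range(0, len(X), batch_size)]
H = dense_forward(batches[0], W, w_b)   # activations for the first batch
```

A real training loop would follow each forward pass with the backpropagated gradient update sketched in the text.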

Long-Short Term Memory for Temporal Data
The model in Figure 5 is comprised of two parts: the dense network (MLP) that maps the static input to the static output $\hat{y}^s$, described above, and the LSTM network that maps the temporal input to the temporal output $\hat{y}^d$, described as follows. Traditional Artificial Neural Networks (ANN), such as the MLP, build a direct mapping between the input and output data for the forecasting approach. The MLP does not consider the time correlation in a data sequence, so the ANN model is unable to capture the relationship between data and time. This restricts the application of the MLP in time series forecasting. The use of the MLP for the static data, together with a Recurrent Neural Network (RNN) for the temporal data, is proposed to overcome this disadvantage.
An RNN can be considered as multiple copies of the same network, each of which transmits information to its successor in a loop, allowing the information to persist [32]. The RNN is represented schematically in Figure 7. The RNN receives a sequence of temporal inputs $x_i$, $i \leq k$, where $k$ is the window size used to perform the forecast. First it receives the input $x_0$ and generates the output $h_0$. In the next step, the inputs are $h_0$ and $x_1$, and the output $h_1$ is generated, which again serves as an input in the next step together with $x_2$. This process is repeated successively until the last step, which receives the input $x_k$ and the output of the previous step $h_{k-1}$ to predict $h_k$, the output of the model. In general, the recurrent network evaluates each input $x_i$ and the output of the previous step $h_{i-1}$ to generate the output $h_i$ and transmit it to the next step. All the recurrent steps form a chain of repeating modules of the neural network. The training process of an RNN consists of a forward pass and a backward pass. The forward pass of an RNN is the same as that of an MLP with a single hidden layer, except that activations arrive at the hidden layer from both the current external input and the hidden layer activations of the previous time step. The backward pass used to calculate weight derivatives for an RNN is called backpropagation through time (BPTT). Like standard backpropagation, BPTT consists of a repeated application of the chain rule. For an RNN, the loss function depends on the activation of the hidden layer not only through its influence on the output layer but, as aforementioned, also through its influence on the hidden layer at the next time step.
RNNs have the problem of exploding and vanishing gradients when learning long dependencies. The LSTM network model solves this issue. An LSTM network consists of an RNN modified to include a cell with an input gate, output gate, and forget gate [33], so that an LSTM layer is able to learn long-term dependencies, which is useful for time series prediction. The LSTM cell is presented in Figure 8. First, $x_t$ and $h_{t-1}$ are used as the input of a sigmoidal layer known as the "forget gate" in Equation (7), which filters which information is retained from the previous state. The forget gate returns a vector $f_t \in (0,1)$, where a value of 1 represents "completely keep this information" while a 0 represents "completely get rid of this information". Second, the "input gate" in Equation (8), which is also a sigmoidal layer, decides which state values will be updated, and a tanh layer generates a candidate $\tilde{C}_t$, which will update $C_t$. Both layer outputs are multiplied to create an update of the state. The state is then updated, discarding the non-relevant information and adding the state update, $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$. Finally, the output $h_t$ is generated according to the present state in Equation (9): the sigmoidal layer known as the "output gate" decides which state values will form the output.
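The gate equations described above can be sketched as a single LSTM cell step in NumPy; the toy dimensions, random weights, and the packing of the four gates into one weight matrix are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following the forget/input/output gate description.
    W maps the concatenation [h_{t-1}, x_t] to the four gate pre-activations."""
    z = np.concatenate([h_prev, x_t]) @ W + b
    n = len(h_prev)
    f_t = sigmoid(z[0*n:1*n])            # forget gate: what to keep from C_{t-1}
    i_t = sigmoid(z[1*n:2*n])            # input gate: which state values to update
    C_tilde = np.tanh(z[2*n:3*n])        # candidate state
    o_t = sigmoid(z[3*n:4*n])            # output gate
    C_t = f_t * C_prev + i_t * C_tilde   # state update: C_t = f_t*C_{t-1} + i_t*C~_t
    h_t = o_t * np.tanh(C_t)             # output from the current state
    return h_t, C_t

rng = np.random.default_rng(1)
n_hidden, n_in = 4, 3
W = rng.normal(size=(n_hidden + n_in, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h, C = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
```

Iterating `lstm_step` over a window of yearly inputs while carrying `(h, C)` forward is exactly the chain of repeating modules shown in Figure 7.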
In summary, the key to LSTM networks is their internal state, which is updated to decide what to remember from previous steps, how to update it, and which values are relevant for the output. In practice, this significantly improves the results of traditional recurrent neural networks for time series analysis, which justifies its use, together with the dense network for the static data, to predict the levels of deforestation in the Amazon rain-forest in the coming years.

Data Pre-Processing and Augmentation
According to the algorithm used to develop a machine learning model, adequate pre-processing of the data is required so that the learning algorithm works correctly. In this case, given that there are two different data typologies, they must be structured differently to serve as input to the different networks that make up the model. In order for an MLP to interpret data correctly, the data must be numerical, i.e., categorical variables must be transformed into numerical ones. Furthermore, the data must be transformed to bring the values of the features in the dataset to a common scale, without distorting differences in the ranges of values. First, we used two subsets of data: one for static variables and one for annual deforestation. Both can be related by the index of the data-frame or by the name of the municipality. Each is treated differently:
• Static data $X^s_i$ (input of the MLP):
  - One Hot Encoding of the "State" variable. The State variable is transformed into a binary vector of the same size as the set of states, where a 1 is used for the corresponding state and 0 otherwise.
  - Min-max scaling: to assign the numeric variables a value between 0 and 1.
• Temporal data $X^d_i$ (input of the LSTM network):
  - Min-max scaling: to assign the numeric temporal variables a value between 0 and 1.

  - The label $Y$: the values of "Deforestation increment", the variable to predict, in the year following the window. This serves to contrast the output of the model with the real values, for supervised learning in the training of the model.
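The two pre-processing transformations above can be sketched with pandas; the toy table and its values are invented, and only the column names follow the text:

```python
import pandas as pd

# Toy static table; values are invented for illustration.
static = pd.DataFrame({
    "State": ["PA", "AM", "PA"],
    "Total area": [1000.0, 2500.0, 400.0],
})

# One Hot Encoding of the categorical "State" variable: one binary column
# per state, 1 for the municipality's state and 0 otherwise.
encoded = pd.get_dummies(static, columns=["State"])

# Min-max scaling of a numeric variable to the [0, 1] range.
col = "Total area"
encoded[col] = (encoded[col] - encoded[col].min()) / (encoded[col].max() - encoded[col].min())
```

The same min-max formula is applied column-wise to the temporal deforestation series before feeding the LSTM.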

Data Augmentation
One possible solution is to generate "synthetic" data that replicates the distribution of the real data. This technique is known as data augmentation. With this objective in mind, "replicas" of the municipalities are generated with perturbed data. These are real data to which noise has been added, i.e., variations with respect to the real municipalities, so that they are not exact copies and the model is able to generalize correctly. The number of "replicas" to create for each municipality is defined by the number-of-replicas parameter $n_r$. Noise is randomly set for the following variables in the following ranges:
• Latitude, Longitude: ±10% of the standard deviation.
• Length: ±10% of the standard deviation.
• Total area: ±30% of the total area; this allows a greater variation in the area, to create a sample with small and large municipalities.
• No forest: the proportion with respect to the total area is maintained, ±10%.
• Hydrography: the proportion with respect to the total area is maintained, ±10%.
• Deforestation increment: the proportion with respect to the total area is maintained, ±5%; from this, the rest of the temporal variables can be calculated.
In this way, new municipalities are artificially created, which are in the same state (noise is not added to the variable "State") and are geographically located in a nearby area. The area can vary more, to generate municipalities of varied sizes, but similar proportions of water and non-forest, as well as of deforestation throughout the years, are maintained. Figure 9b shows the new geographical distribution of the augmented data for $n_r = 10$ replicas, i.e., 8360 municipalities, a higher density of municipalities compared with the observed municipalities in Figure 9a. For $n_r = 10$ replicas, data augmentation grows the dataset from 760 municipalities to 8360 (the 760 actual municipalities plus 10 "replicas" of each). A maximum of $n_r = 20$ replicas, equivalent to 15,960 municipalities, is used to generate the augmented data.
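The replica-generation scheme can be sketched as follows; the field names, the assumed standard deviations of latitude/longitude, and the toy municipality record are all illustrative, while the noise ranges follow the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def replicate(muni, n_r):
    """Create n_r noisy replicas of one municipality record (dict of floats)."""
    lat_sd, lon_sd = 1.2, 1.5   # assumed std devs of latitude/longitude
    replicas = []
    for _ in range(n_r):
        r = dict(muni)
        r["lat"] += rng.uniform(-0.1, 0.1) * lat_sd       # ±10% of the std dev
        r["lon"] += rng.uniform(-0.1, 0.1) * lon_sd
        r["area"] *= 1 + rng.uniform(-0.3, 0.3)           # ±30% of total area
        ratio = muni["no_forest"] / muni["area"]          # keep proportion ±10%
        r["no_forest"] = r["area"] * ratio * (1 + rng.uniform(-0.1, 0.1))
        replicas.append(r)
    return replicas

muni = {"lat": -3.1, "lon": -60.0, "area": 1000.0, "no_forest": 150.0}
augmented = [muni] + replicate(muni, n_r=10)   # original plus 10 replicas
```

Applied to all 760 municipalities with $n_r = 10$, this yields the 8360 records mentioned above; the "State" field (omitted here) would be copied unchanged.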

Model Hyper-Parameter Optimization
Hyper-parameter optimization is the process of choosing a set of optimal hyper-parameters for the selected learning algorithm for the problem being modeled (dataset). A hyper-parameter is a parameter whose value is used to control the learning process. For the presented model, the hyper-parameters have been discussed in Section 2: Hybrid learning model. The hyper-parameters to be optimized are listed as follows:
• Window size: LSTM temporal window size of the input used to model the temporal output.
• Number of replicas $n_r$: amount of augmented data used for training.
• Number of hidden layers of the MLP static model.
• Batch size: number of samples used per weight update.
Figure 10 shows the results of the hyper-parameter optimization. The best hyper-parameter combination corresponds to the one resulting in a model with the minimum mean square error (MSE). Figure 10a depicts the minimum MSE for a combination of a window size of 5 years (orange dashed curve) with $n_r = 20$ replicas, that is, 15,960 municipalities in the augmented data. Figure 10b depicts that the minimum MSE is achieved for a combination of 2 hidden layers and a batch size of 32 samples. Summarizing, the best model is obtained for an LSTM window size of 5 years, 15,960 municipalities of augmented data, 2 hidden layers, and a batch size of 32 samples for the MLP static model.
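The search over these four hyper-parameters can be sketched as an exhaustive grid; the candidate values and the placeholder scoring function are illustrative assumptions, since a real run would train and evaluate the hybrid model for each combination:

```python
from itertools import product

# Candidate grid over the four hyper-parameters named in the text (illustrative values).
window_sizes = [3, 5, 7]
n_replicas = [10, 15, 20]
hidden_layers = [1, 2, 3]
batch_sizes = [32, 64]

def evaluate(window, n_r, layers, batch):
    """Placeholder for training the hybrid model and returning its test MSE.
    Shaped so that the optimum reported in the text (5, 20, 2, 32) wins."""
    return abs(window - 5) + abs(n_r - 20) / 10 + abs(layers - 2) + batch / 320

best = min(product(window_sizes, n_replicas, hidden_layers, batch_sizes),
           key=lambda cfg: evaluate(*cfg))
```

With 3 × 3 × 3 × 2 = 54 combinations, each requiring a full training run, such a battery of tests is the main computational cost of the optimization.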
The loss of the base model built using the original data, compared with the loss of the optimized model (optimized hyper-parameters), can be observed in Figure 11. The loss function corresponds to the cross-entropy, which is a common loss function when optimizing neural networks [34,35]. The base model is trained with 760 municipalities during 100 epochs. One can observe that over-fitting takes place from approximately 20 epochs onwards, where the loss in the test set (unseen data) increases, as illustrated in Figure 11a. Over-fitting is an indication that the model learns very specific details of the training set, losing the ability to generalize its predictions to new inputs from unseen data. The model loss for the optimized hyper-parameter combination discussed before is presented in Figure 11b. The loss in the test set does not indicate over-fitting compared with the base model trained only with 760 municipalities. In the loss curve of the best model, it is possible to appreciate continuous learning until epoch 30; from there the error starts to increase, suggesting stopping the training at an early stage of the learning epochs. Thus, one can conclude that the optimized model is fit to model the deforestation increment for the original dataset and predict the deforestation for the next decade.
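The early-stopping criterion implied by the loss curves can be sketched as a small patience-based rule; the toy loss sequence and the `patience` value are illustrative assumptions:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch whose weights should be kept: the last epoch at which
    the validation loss improved, once it fails to improve for `patience` epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch
    return best_epoch

# Toy curve: the loss falls until epoch 3, then rises, mimicking Figure 11.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7]
stop = early_stop_epoch(losses)
```

Monitoring the test-set loss this way and restoring the best weights avoids the over-fitting regime visible after the minimum of the curve.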

Model Performance
The regression model obtained from the hyper-parameter optimization process can be tested to measure its prediction performance. The observed data in year 2020 for all 760 municipalities will be compared with the predicted output of the model for the same municipalities. Note that the model is built on $n_r = 20$ replicas of augmented data, and will be tested on the original data corresponding to the 760 municipalities. Each municipality has a unique index, which identifies the original data; in the augmented data, the index is coded with the corresponding replica number appended. The differences between observed values $y$ and values $\hat{y}$ predicted by the model are known as residuals in a statistical or machine learning model. They are a diagnostic tool for evaluating the quality of a model. Residuals are also known as errors, and can be defined as residuals $= y - \hat{y}$ [36]. A residual (error) analysis is carried out for each of the 760 municipalities, comparing the year 2020 data with the model's predicted output. Figure 12 depicts a comparison of the predicted values (blue dots) vs. the observed actual values (orange dots) for deforestation, in km², in the 760 Brazilian Amazon municipalities. It can be appreciated that both predicted and observed values are very close, indicating that the error is generally small. The greater the difference between the blue and orange dots, the larger the error of the value predicted by the model. It can also be appreciated that the model is overestimating the predicted value, i.e., the predicted values (blue dots) are higher than the observed values (orange dots). The residual analysis can also be visualized using the scatter plot in Figure 13a. The observed (actual) deforestation value for each of the 760 municipalities is depicted on the x-axis, while the y-axis depicts the deforestation value predicted by the model.
The orange line represents the 1:1 line, on which a dot lies if the predicted value is exactly equal to the observed one. Thus, the orange line represents the perfect regression (no prediction error), where the residual is equal to zero. The closer a point is to the orange line, the better the prediction. It can be appreciated that most of the points fall above the orange line, corresponding to a steeper upward tilt compared to the orange reference line. As aforementioned, this shows that the model is overestimating the deforestation value, i.e., the predicted value is larger than the observed one. However, in general, the points are close to the orange line, indicating good prediction performance. Besides the visual analysis, the R-squared score, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) can be used to assess quantitatively the quality of the prediction performed by the regression model [36]. The R-squared score can be defined as

$$R^2 = 1 - \frac{SS_{\mathrm{residuals}}}{SS_{\mathrm{total}}}, \quad SS_{\mathrm{residuals}} = \sum_i (y_i - \hat{y}_i)^2, \quad SS_{\mathrm{total}} = \sum_i (y_i - \bar{y})^2,$$

where $\bar{y}$ is the mean of the observed values. That is, $R^2$ indicates the proportion of the variation in the dependent variable that is predictable from the independent variables for the regression model. For the optimized hybrid learning model the R-squared score is $R^2 = 0.8782$, indicating that the regression model accounts for 87.82% of the variance. Such a high value of $R^2$ indicates good prediction quality for the 760 municipalities in the dataset.
The residuals (errors) of the model are plotted in Figure 13b. The residual (error) for each municipality is calculated by subtracting the predicted value from the observed value (residuals $= y - \hat{y}$). A histogram of the residuals is plotted in Figure 13b with a density plot overlaying the histogram. It can be observed that most of the residuals (≈69%) are negative, indicating that the predicted value is higher than the observed one, i.e., the model is overestimating the actual value. However, on average this overestimation is small (close to zero), with a mean residual of −6.8174 km². In the histogram, outliers can be appreciated: a single residual larger than 100 km² in absolute value is present. The residuals have a minimum of −130.022355, with the following quartiles: Q1 (25%) of −9.679812, Q2 (50%) of −3.280831, Q3 (75%) of 0.000000, and a maximum of 15.260053. Thus, the middle 50% of the residuals (between Q1 and Q3) lie between −9.679812 km² and 0 km², which can be considered a good prediction. From the residuals, the quality of the model predictions can be assessed by calculating the mean absolute error (MAE) from the absolute residuals $|y - \hat{y}|$. The MAE of the model is 7.83 km², indicating that the absolute error of the predicted deforestation for any municipality is, on average, 7.83 km². Finally, the root mean square error can be calculated from the squared residuals $(y - \hat{y})^2$; the RMSE of 13.24 km² indicates the average root mean square error over the 760 municipalities. These measures indicate the quality of the prediction returned by the model, which can be considered good for the present problem. In the following subsection, the prediction for all 760 municipalities is summarized for each year from 2020 to 2030, forecasting the next decade of deforestation increase in the Brazilian Amazon rain-forest.
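The three performance measures used above can be computed in a few lines of NumPy; the toy observed/predicted vectors are illustrative, not values from the paper:

```python
import numpy as np

def regression_metrics(y, y_hat):
    """R-squared, MAE, and RMSE from observed y and predicted y_hat."""
    residuals = y - y_hat
    ss_res = np.sum(residuals ** 2)               # SS_residuals
    ss_tot = np.sum((y - y.mean()) ** 2)          # SS_total
    r2 = 1 - ss_res / ss_tot
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    return r2, mae, rmse

y = np.array([1.0, 2.0, 3.0, 4.0])        # toy observed deforestation values
y_hat = np.array([1.1, 1.9, 3.2, 3.8])    # toy model predictions
r2, mae, rmse = regression_metrics(y, y_hat)
```

Applying `regression_metrics` to the 2020 observations and predictions for the 760 municipalities would reproduce the reported R², MAE, and RMSE.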

Deforestation Forecasting
The yearly forecast from 2020 to 2030 is shown in Figure 14. The prediction of the increment in deforestation has been carried out for all 760 municipalities until 2030. The resulting predictions have been aggregated to calculate the total increment of deforestation by year. Figure 14a depicts the observed deforestation increase from 2000 to 2020 in orange bars; the blue bars represent the forecast from 2020 to 2030. Year 2020 is an overlap, comparing observed versus predicted deforestation for that single year. Similarly, Figure 14b shows the cumulative deforestation, again with observed values in orange bars and predicted values in blue bars, with the 2020 overlap. The observed and forecasted data are summarized in Table 3. In light of the actual data, it can be seen that, after a few years in which deforestation was reduced, an upward trend is developing. Figure 14a and Table 3 show that deforestation increases in the 2020s, with a maximum in 2023, and from 2025 it decreases slowly. In general, the model predicts a prolongation of the incremental trend in deforestation observed in the last two years, but at a remarkably greater pace for the next decade. The deforestation in the coming years may have been overestimated by the model; nevertheless, the model presents a probable scenario in which the pace of forest loss is worrisome, inviting all of us to take action before it is too late.
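Forecasting a full decade from a model trained to predict one year ahead implies feeding each prediction back into the input window. A minimal sketch of this recursive loop, with a hypothetical stand-in model (the mean of the window) in place of the trained hybrid network:

```python
import numpy as np

def forecast_decade(history, model, k, horizon=10):
    """Roll a one-step model forward: each prediction is appended to the
    window and fed back in, yielding `horizon` future yearly increments."""
    window = list(history[-(k + 1):])   # actual value plus k past values
    future = []
    for _ in range(horizon):
        nxt = model(np.array(window))
        future.append(nxt)
        window = window[1:] + [nxt]     # slide the window over the prediction
    return future

# Hypothetical stand-in model: predicts the mean of the current window.
mean_model = lambda w: float(w.mean())
preds = forecast_decade([8.0, 10.0, 12.0], mean_model, k=2)
```

In the paper's setting, the loop would run per municipality with the trained hybrid model (static input held fixed), and the 760 predicted increments would then be aggregated per year as in Table 3.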

Discussion and Conclusions
A hybrid learning model was presented to forecast the deforestation increment of the Amazon rain-forest. A dense neural network, namely an MLP, was used to model the geographical static data, together with an LSTM network to model the temporal deforestation data. A data augmentation process proved useful to improve the learning model, as demonstrated in the hyper-parameter optimization, given that the model built on such data performed better. The model proved useful to forecast the Amazon deforestation in the coming years. After analyzing the results obtained, it is clear that the situation is not ideal and that the trend must be reversed to give a hopeful future to the Amazon. The Amazon rain-forest covers between 5.5 and 6.7 million square kilometers, and from 2008 to 2018 less than 8000 square kilometers were deforested annually, less than 1% of the forest area. However, the Amazon rain-forest continues to decline daily. The most worrying observation is that in 2019 deforestation exceeded 10,000 square kilometers, which may have influenced the model to predict this upward trend. The presented model predicts a worrisome scenario, estimating that by 2030 the accumulated deforestation will be 1 million square kilometers, around 15% of the present forest. As stated in the results, the model has probably overestimated the deforestation increase for the next decade, but it is not an impossible scenario. As the hypothesis of this work proposes, a proper data augmentation from 20 years of data and a careful model hyper-parameter optimization were performed to avoid over-fitting. Forecasting is always a complex task, but we are confident that the present model predicts a likely scenario, which invites us all to take action.
The socioeconomic context of Brazil, aggravated by the global pandemic, which is currently the center of attention, may not be the most favorable scenario for taking the pertinent measures to protect the Amazon. In any case, urgent attention must be paid to the problem of deforestation. It is not too late to save the Amazon, but work must begin on a course of action to plan the medium- and long-term sustainability of the rain-forest. Human beings have used the resources offered by the planet for centuries to advance and improve the quality of life. In the past, the impact of our actions on the environment was unknown; today, this can no longer serve as an excuse. Concepts such as climate change, sustainability, and renewable energy are not alien to us. We must be aware that we must take care of the planet we temporarily live on, so that future generations will be able to enjoy it as we do today.
This work lays the foundations of a deforestation prediction model for municipalities in the Amazon, combining dense neural networks and LSTM networks. Over the next few years, we can contrast the results of the predictions and retrain to update the model with new data. If new usable variables become publicly available, the model can be enriched. In the same way, the data can also be crossed with other sources, including socioeconomic variables of the 760 municipalities or the states analyzed. Another possible course of action is to develop a new model with another type of data. It would be interesting to have images of the Amazon to accurately identify the annual deforestation: a convolutional neural network could be used to distinguish, in satellite images, the deforested area from the forest. Thus, the present work warrants a follow-up analysis to test its predictive power, as well as to improve the model. It is imperative to become a more sustainable species, and this research and related results might help to achieve that goal.

Data Availability Statement:
The data is available in the TerraBrasilis web portal, a platform developed by INPE to provide access, query, analysis, and dissemination of spatial data generated by government environmental monitoring programs. The data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License at http://terrabrasilis.dpi.inpe.br/en/home-page/, accessed on 1 December 2021.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: