Configuration of the Deep Neural Network Hyperparameters for the Hypsometric Modeling of the Guazuma crinita Mart. in the Peruvian Amazon

Abstract: Guazuma crinita Mart. is a dominant species of great economic importance for the inhabitants of the Peruvian Amazon, standing out for its rapid growth and for being harvested at an early age. Understanding its vertical growth is a challenge that researchers have continued to study using different hypsometric modeling techniques. Currently, machine learning techniques, especially artificial neural networks, have revolutionized modeling for forest management, obtaining more accurate predictions; for this reason, we consider it of the utmost importance to adapt, evaluate and apply these methods to this species over large areas. The objective of this study was to build a deep neural network and evaluate its efficiency for predicting the total height of Guazuma crinita Mart. from a large-scale continuous forest inventory. To do this, we explored different configurations of the hidden layer hyperparameters and defined the variables according to the function HT = f (x), where HT is the total height as the output variable and x is the input variable(s). Under this criterion, we established three HT relationships: (i) HT = f (DBH), based on the diameter at breast height (DBH); (ii) HT = f (DBH, Age), based on DBH and Age; and (iii) HT = f (DBH, Age, Agroclimatology), based on DBH, Age and agroclimatic variables. In total, 24 different configuration models were established for each function. We conclude that the deep artificial neural network technique performs satisfactorily for predicting the total height of Guazuma crinita Mart. over large areas, and that the function based on DBH, Age and agroclimatic variables, with a validation performance of RMSE = 0.70, MAE = 0.50, bias% = −0.09 and VAR = 0.49, showed better accuracy than the others.


Introduction
Guazuma crinita Mart. (Bolaina Blanca) is a fast-growing forest species established in plantations, in which it reaches growth maturity by the eighth or ninth year, when it is ready for harvesting [1,2]. The wood has a high commercial value and is used to obtain round and sawn wood for the manufacture of stretchers, boxes, laminates, toys, matches, handicrafts and plywood, for the construction and cladding of houses, and for obtaining cellulose for paper, contributing to the livelihood of local farmers [3,4]. According to the Servicio Nacional Forestal y de Fauna Silvestre [5], there are 8530.76 ha of Bolaina Blanca plantations in Peru, which represent 503,839.71 m³ of standing trees.
A hypsometric model is generally expressed as the relationship between the height and diameter of a tree; however, it has also been shown that the variables of age, basal area, site
The plantations are distributed at altitudes that vary between 180 and 500 m above sea level, the average annual temperature is 27 °C, the average annual relative humidity is 85% and the annual precipitation varies between 2000 and 3000 mm, with greater intensity of precipitation between the months of November and March [29]. According to the Holdridge [30] life zone classification, the study area is located in a region covered by tropical humid forest (bh-T), very humid tropical forest (bmh-T), and very humid transitional tropical forest (bmh-TT).
In this study we used agroclimatic variables extracted from the NASA Prediction Of Worldwide Energy Resources website: https://power.larc.nasa.gov/ (accessed on 12

Variable Input, Output, and Data Splitting in Training and Validation
For model fitting, we used deep artificial neural networks with the H2O package [31] in R [32]. We set the function HT = f (x), where HT is the output variable and x is the input variable(s). Under this criterion, we established three HT relationships: (i) HT = f (DBH), based on DBH; (ii) HT = f (DBH, Age), based on DBH and Age; and (iii) HT = f (DBH, Age, Agroclimatology), based on DBH, Age and agroclimatic variables. All downloaded agroclimatic variables were included in the third function. These functions were trained separately, configured with different hyperparameters, and compared in terms of performance. In total, we performed 72 training runs, i.e., 24 training models for each HT = f (x) function set in this study.
The data were standardized and randomly split, with 70% of the data used for training and 30% for validation.
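The standardization and random 70/30 split described above can be sketched as follows. This is a minimal illustration in plain Python, not the H2O workflow used in the study; the function names, the fixed seed, and the toy DBH values are assumptions made only for this example.

```python
import random
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def split_train_validation(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly, then split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy DBH values in cm (hypothetical, not taken from the inventory).
dbh = [8.2, 10.5, 12.1, 9.7, 14.3, 11.0, 13.6, 7.9, 15.2, 10.1]
dbh_std = standardize(dbh)
train, validation = split_train_validation(dbh_std)
```

With ten observations, the split yields seven training rows and three validation rows; a fixed seed keeps the partition reproducible across runs.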

Activation Functions
The Tanh (Equation (1)), Rectified Linear (Equation (2)), and Maxout (Equation (3)) activation functions were used in the hidden layers, while the Linear (Equation (4)) activation function was used in the output layer in all cases.
In these equations, f is the function that represents the non-linear activation used in the entire neural network, b is the bias for the neuron activation threshold, $x_i$ and $w_i$ denote the input values of the unit or neuron and their weights, and α denotes the weighted combination $\alpha = \sum_i w_i x_i + b$.
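As an illustration of the four activation functions named above, the sketch below writes them in their standard textbook forms using the symbols defined in the text. The exact notation of Equations (1) to (4) is not reproduced here; the two-channel form of maxout and all numeric values are assumptions for the example.

```python
import math


def weighted_combination(x, w, b):
    """alpha = sum_i w_i * x_i + b for a single neuron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def tanh(alpha):
    """Hyperbolic tangent activation."""
    return math.tanh(alpha)


def rectified_linear(alpha):
    """Rectified linear activation: max(0, alpha)."""
    return max(0.0, alpha)


def maxout(alpha_1, alpha_2):
    """Maxout over two channels; each channel has its own weights and bias."""
    return max(alpha_1, alpha_2)


def linear(alpha):
    """Identity activation, as used in the output layer for regression."""
    return alpha


# Hypothetical inputs and weights for one neuron with two maxout channels.
x = [0.5, -1.0]
alpha = weighted_combination(x, w=[0.8, 0.3], b=0.1)       # 0.4 - 0.3 + 0.1
alpha_alt = weighted_combination(x, w=[-0.2, 0.6], b=0.0)  # -0.1 - 0.6
```

Because maxout takes the larger of two separately weighted combinations, each hidden neuron carries two sets of weights, which is why this activation increases the parameter count.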

Distribution and Loss Functions
Because our response variable is numerical, the Gaussian distribution function was specified, which is equivalent to the wMSE (weighted mean squared error) (Equation (5)), and the loss function chosen was quadratic (Equation (6)). In Equation (5), y is the true response, f is the predicted response, and ω is the weight.
In Equation (6), $t^{(j)}$ and $o^{(j)}$ are the predicted and actual outputs for example j; W is the collection $\{W_i\}_{1:N-1}$, where $W_i$ denotes the weight matrix connecting layers i and i + 1 for a network of N layers; and B is the collection $\{b_i\}_{1:N-1}$, where $b_i$ denotes the column vector of biases for layer i + 1.
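A small sketch of the two quantities named above may help fix ideas. The normalization of the weighted MSE by the sum of the weights and the factor of 1/2 in the quadratic loss are common conventions assumed here, not forms confirmed by Equations (5) and (6); the function names and values are illustrative.

```python
def weighted_mse(y_true, y_pred, weights=None):
    """Weighted mean squared error; assumes normalization by the sum of weights."""
    if weights is None:
        weights = [1.0] * len(y_true)
    numerator = sum(w * (y - f) ** 2 for w, y, f in zip(weights, y_true, y_pred))
    return numerator / sum(weights)


def quadratic_loss(target, output):
    """Per-example quadratic loss; assumes the 1/2 * squared Euclidean norm convention."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


# Hypothetical total heights in metres.
heights_true = [10.0, 12.0]
heights_pred = [11.0, 12.0]
unweighted = weighted_mse(heights_true, heights_pred)
weighted = weighted_mse(heights_true, heights_pred, weights=[1.0, 3.0])
single_example = quadratic_loss([10.0], [11.0])
```

With equal weights, the weighted MSE reduces to the ordinary MSE, which is why a Gaussian distribution with a quadratic loss is the usual pairing for a numerical response.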

Optimization Algorithm, Regularization, Epoch, and Batch Size
The optimization algorithm used in this study was the adaptive learning rate method ADADELTA (Equation (7)) [33]. The mini-batch size was 1, the number of epochs was 300, and regularization was by early stopping, with 5 stopping rounds, a stopping tolerance of 0.001, and MSE (mean squared error) as the stopping metric.
In Equation (7), $\theta_t$ denotes the parameters at the t-th iteration, $g_t$ is the computed gradient, t is the time step and RMS is the root mean square. For our study, the learning rate time decay factor (rho) was 0.99 and the learning rate time smoothing factor (epsilon) was 1 × 10⁻⁸.
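The ADADELTA update can be sketched for a single scalar parameter as below, using the rho and epsilon values reported above. This is a toy version of the standard update rule for illustration only; it is not H2O's implementation, and the class name and the quadratic objective are assumptions.

```python
import math


class Adadelta:
    """Scalar ADADELTA: the step size is a ratio of running RMS values."""

    def __init__(self, rho=0.99, epsilon=1e-8):
        self.rho = rho
        self.epsilon = epsilon
        self.avg_sq_grad = 0.0   # running average of squared gradients
        self.avg_sq_delta = 0.0  # running average of squared updates

    def step(self, theta, grad):
        """Return the updated parameter after one ADADELTA step."""
        self.avg_sq_grad = self.rho * self.avg_sq_grad + (1.0 - self.rho) * grad ** 2
        rms_grad = math.sqrt(self.avg_sq_grad + self.epsilon)
        rms_delta = math.sqrt(self.avg_sq_delta + self.epsilon)
        delta = -(rms_delta / rms_grad) * grad
        self.avg_sq_delta = self.rho * self.avg_sq_delta + (1.0 - self.rho) * delta ** 2
        return theta + delta


# Toy objective (hypothetical): minimize (theta - 3)^2 starting from theta = 0.
optimizer = Adadelta()
theta = 0.0
for _ in range(100):
    grad = 2.0 * (theta - 3.0)
    theta = optimizer.step(theta, grad)
```

No global learning rate appears in the update: the effective step is set by the ratio of the two running RMS terms, with epsilon keeping the very first steps from being zero.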

Model Performance
The estimates were analyzed according to [8,34]. The estimates for the training and testing data were evaluated with the Root Mean Squared Error, RMSE (Equation (8)), and the Mean Absolute Error, MAE (Equation (9)). For the testing data, we added bias% (Equation (10)) and the variance of the error, VAR (Equation (11)). Likewise, graphs of the percentage of cases by percentage relative error, RE% (Equation (12)), were also interpreted. In Equations (8) to (12), n is the number of observations, $Y_i$ is the observed total height value i, $\hat{Y}_i$ is the predicted total height value i, and $\bar{Y}$ is the mean of the observed total height values. Figure 3 shows the methodological flowchart used in this study.
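The statistics named above can be computed as in the following sketch. The formulas are common forms and are assumptions: in particular, the sign convention (observed minus predicted for residuals, predicted minus observed for RE%), the division of bias by the observed mean, and the n − 1 denominator for VAR may differ from Equations (8) to (12). The height values are hypothetical.

```python
import math


def residuals(observed, predicted):
    """Residuals as observed minus predicted (assumed sign convention)."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def rmse(observed, predicted):
    e = residuals(observed, predicted)
    return math.sqrt(sum(r ** 2 for r in e) / len(e))


def mae(observed, predicted):
    e = residuals(observed, predicted)
    return sum(abs(r) for r in e) / len(e)


def bias_percent(observed, predicted):
    """Mean residual as a percentage of the observed mean."""
    e = residuals(observed, predicted)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * (sum(e) / len(e)) / mean_observed


def error_variance(observed, predicted):
    """Sample variance of the residuals (n - 1 denominator assumed)."""
    e = residuals(observed, predicted)
    mean_e = sum(e) / len(e)
    return sum((r - mean_e) ** 2 for r in e) / (len(e) - 1)


def relative_error_percent(observed, predicted):
    """Per-tree relative error in percent."""
    return [100.0 * (y_hat - y) / y for y, y_hat in zip(observed, predicted)]


# Hypothetical total heights in metres.
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
re_percent = relative_error_percent(observed, predicted)
```

Under these definitions, the squared RMSE is approximately the error variance plus the squared mean residual, which is a useful consistency check when reading a results table.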

The data were processed with the following computer features:

Training Status
The maximum processing time for each model was 50 s. Table 2 shows the status and architecture of each trained model according to each function. None of the trained models needed to complete all 300 training epochs. The complete training status of each model and type of function evaluated to predict the total height of Guazuma crinita Mart. in the Peruvian Amazon can be seen in Table S1.

Model Validation Performance
We statistically analyzed each trained model with its respective function, evaluating RMSE and MAE for training and adding two more statistics, bias% and VAR, for validation. To select the model of each function, we analyzed not only the forecast Key Performance Indicators (KPIs) but also the residual plot, that is, the relative error in percentage of the predicted values (Figure 4).

Training Status for the Prediction of the Total Height of Bolaina Blanca
None of the trained models needed to complete the 300 epochs for the weights to converge: with early-stopping regularization, training stopped once the validation metric no longer improved. This method is not very intrusive and minimizes the established metric across epochs [35]. However, stopping too early can enlarge bias and reduce variance, just as stopping too late can reduce bias and enlarge variance [36]; hence the importance of performing a hyperparameter optimization search with several training runs, observing the bias-variance trade-off and adapting it to each type of problem [37]. In our study, model 5 of the HT = f (DBH) function needed the greatest number of epochs to converge the weights (195.7 epochs), and model 10 of the HT = f (DBH) function needed the fewest (5.4 epochs), which correspond to the longest and shortest training times, respectively. However, model 24 of the function HT = f (DBH), model 12 of the function HT = f (DBH, Age) and model 15 of the function HT = f (DBH, Age, Agroclimatology), with 11.3, 10.7 and 11 epochs, respectively, performed better in their statistical evaluations than the rest (Table 3). Regarding the number of hidden layers and neurons, the best networks of each function had 2 (50:50), 5 (50:25:5:25:50), and 2 (50:50) hidden layers, respectively. This is relatively dependent on each study: when more information is presented, more neurons are needed to converge the weights [38], and with more hidden layers the model becomes more complex or deep.
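The early-stopping behavior discussed above can be illustrated with a simplified rule that uses the settings reported in the methods (5 stopping rounds, tolerance 0.001, MSE metric). This sketch compares each score with the best score so far; H2O's own criterion is based on moving averages of the metric and is not reproduced here. The function name and the validation history are hypothetical.

```python
def early_stopping_index(validation_mse, stopping_rounds=5, stopping_tolerance=0.001):
    """Return the index of the scoring event at which training stops, or None.

    Training stops after `stopping_rounds` consecutive scoring events without a
    relative improvement of at least `stopping_tolerance` over the best MSE so far.
    """
    best = float("inf")
    rounds_without_improvement = 0
    for index, mse in enumerate(validation_mse):
        if mse < best * (1.0 - stopping_tolerance):
            best = mse
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= stopping_rounds:
                return index
    return None


# Hypothetical validation MSE per scoring event: it plateaus at 0.7.
history = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6]
stop_at = early_stopping_index(history)
still_improving = early_stopping_index([1.0, 0.9, 0.8])
```

In this toy history, training halts during the plateau and never reaches the later improvement to 0.6, which illustrates the bias-variance risk of stopping too early noted above.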
The hidden layer activation function of the best-performing models was maxout. The advantage of this activation function is that the network learns not only the relationship between the hidden units but also the activation function of each hidden unit [35]; however, it doubles the number of parameters for each neuron, which leads to a high total number of parameters [11], as shown by the increase in the number of weights in Table 2. The maxout function was initially presented as a natural companion to dropout for training convolutional networks, but it has also been studied without dropout as a substitute for the sigmoidal function, and it has even been tested on regression problems, producing good results [39]. Although, as of the completion of this manuscript, deep learning techniques had not been used for Bolaina Blanca tree height predictions, many studies have been conducted in other species using classical artificial neural network (ANN) techniques, i.e., with a single hidden layer. A large part of these studies used sigmoidal activation functions, such as the hyperbolic tangent and the sigmoid, obtaining satisfactory results [7,40–42]. The processing time of the modeling functions is relative and depends on the characteristics of the computer; however, running our configurations for height modeling does not place high demands on the user's computer.

Growth and Estimation of the Total Height of Bolaina Blanca
In Peru, the forestry and forest management of Guazuma crinita Mart. have been extensively studied since 1992, when Vidaurre and Héctor [43] evaluated its growth and the optimal sites for the development of the species. Subsequently, its economic importance was studied [44] as it became the dominant species for the sustenance of farmers in the Peruvian Amazon, which opened up a series of investigations, such as the geographical variation in its growth and wood density [45] and the modeling of its production [46]. However, it was not until 2018 that the total height of the species was modeled for the first time in the Peruvian Amazon, by Elera Gonzáles [47], who applied six hypsometric regression models and obtained performances in the ranges 1.86 ≤ RMSE ≤ 1.93 and 1.44 ≤ MAE ≤ 1.52; these hypsometric models related total height to DBH, dominant DBH and Age. The DNN in our study outperforms these results (Table 3), and it is very likely that the same would hold when ANN techniques are used for smaller areas. Among the best configurations obtained in our study, the relationship between height and diameter, HT = f (DBH), gave an RMSE of 1.26 (Model 24). Performance improves when age is added, HT = f (DBH, Age), with RMSE = 0.71 (Model 12), and improves further when agroclimatic variables are included, HT = f (DBH, Age, Agroclimatology), with RMSE = 0.70 (Model 15). As we can see, obtaining a relationship between diameter and height in this species is relatively complex using regression techniques, and the results could even worsen if biased data were used, especially from inventories of plantation areas that have not received uniform forest management.
The inclusion of climatological variables could bias the modeling for the prediction of the total height of the species (Figure 5). These variables directly influence decision-making for forest planning, such as silvicultural treatments, land acquisition, and genotype selection [48], and various studies have shown better performance using agroclimatic variables, especially for growth and production models in eucalyptus plantations [49–52].
The models used in this study are efficient, both statistically and practically, and we highlight a specific configuration for each function used. We developed three functions that can be adapted to different areas of the Peruvian Amazon, according to the database of each community. The first function uses only the diameter as an independent variable and can serve as a guide for local communities with smaller-scale production, where plantations are not monitored (for age or other variables of the forest stand). The second and third functions are for companies or cooperatives with medium- or large-scale plantations, where permanent monitoring is carried out.
Extrapolation beyond the observed levels of the predictor variables, for example, DBH, will always carry some risk, so applying the functions to values outside the ranges observed in this study requires caution on the part of the reader. Although this may be a limitation, the data cover a very wide range of values of the predictor variables, which gives the proposed models great potential for use. The study is a contribution to the scientific community, farmers, and companies dedicated to the modeling and production of Guazuma crinita Mart. in the Peruvian Amazon.

Conclusions
The deep artificial neural network technique presents satisfactory performance for predictions of the total height of Guazuma crinita Mart. in modeling large areas. In general, all the variables used influence the predictions. However, the addition of the agroclimatic variables together with the diameter at breast height and age showed better accuracy than the other functions. The proposed hyperparameter configurations (Model 24-HT = f (DBH), Model 12-HT = f (DBH, Age) and Model 15-HT = f (DBH, Age, Agroclimatology)) present the best performance and can be adapted to other forest management problems using a large amount of data. Likewise, we recommend carrying out studies with data from pre-cut inventories and with the addition of categorical variables.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/f13050697/s1, Table S1: Training status of each model and type of function evaluated used to predict the total height of Guazuma crinita Mart. in the Peruvian Amazon.