Comparative Assessment of Sap Flow Modeling Techniques in European Beech Trees: Can Linear Models Compete with Random Forest, Extreme Gradient Boosting, and Neural Networks?

Transpiration and sap flow are physiologically interconnected processes that regulate nutrient and water uptake, controlling major aspects of tree life. They are especially relevant during drought, when impaired sap flow can undermine overall tree growth and development. The present study uses five years (2012–2015 and 2017) of sap flow data from European beech (Fagus sylvatica). Four techniques were used for sap flow modeling, namely, a linear model (LM), random forest (RF), extreme gradient boosting machine (XGBM), and neural networks (NN). We used six variants (Variants 1–6) differing in the captured conditions and the dataset size. The 'prediction power' was defined as the ratio of predicted to observed sap flow. The LM had the highest prediction power for the overall sap flow in beech trees when global radiation was shifted by 1 h. In the remaining variants, the LM provided prediction power comparable to that of RF and XGBM, whereas NN exhibited relatively poor prediction power compared with the other machine learning models. The study supports an easier-to-apply and computationally simpler approach (LM) for assessing sap flow over more sophisticated machine learning approaches (RF, XGBM, and NN).


Introduction
Transpiration creates a negative pressure gradient that drives water movement through the xylem according to the cohesion-tension mechanism. Sap flow, in turn, is the movement of water, nutrients, sugars, hormones, and other dissolved substances within the plant. Sap flow is therefore critical for photosynthesis and plant development, given its role in moving nutrients and photosynthates. Unlike transpiration, sap flow is not limited to xylem vessels, yet the two processes are tightly interconnected. Therefore, any disruption in tree-water relations compromises the interplay between transpiration and sap flow.
An extensive consequence of climate change is the worsening of drought events worldwide. Trees are particularly susceptible to drought because their size entails higher water demands than those of smaller plants, and during drought their roots cannot compensate for this demand [1]. Transpiration is essential to the hydrological cycle because it moves substantial amounts of water from the soil to the atmosphere. Its precise representation within the land surface schemes of climate models is vital for generating accurate and trustworthy climate projections [2]. Accurate measurement or estimation of plant transpiration and sap flow is important for understanding a plant's water use efficiency. It is estimated that more than half of terrestrial precipitation is transpired by forests, which makes forest transpiration a crucial component of the global water cycle [3]. Transpiration is primarily controlled by stomata, internal plant resistances, local hydrological conditions, and climatic variation, especially evaporative demand [4][5][6]. Although quantifying plant transpiration is crucial, accurate direct measurement is technically demanding, costly, time consuming, and labor intensive [7].
Several empirical and physical modeling approaches have been used to estimate transpiration from relevant predictors [8,9]. Among these, the SIM-DualKc, Shuttleworth-Wallace, and modified Jarvis-Stewart models are widely applied [8]. However, the application of these models remains limited by their complicated parameterization and their need for large amounts of observational data. Machine learning models have been suggested as an alternative to conventional models because they do not require knowledge of internal plant processes and can capture non-linear relationships [10]. Furthermore, such models can enrich the understanding of sap flow through precise predictions and support the optimization of tree-water relations in agricultural and ecological contexts.
The manuscript compares various approaches for modeling sap flow, with an emphasis on different soil moisture conditions. The study employs machine learning methods, namely random forest, extreme gradient boosting, and deep neural networks, alongside a multiple linear regression model, to predict European beech (Fagus sylvatica) sap flow.

Study Site
The study was conducted in central Slovakia (Kremnické vrchy Mts.) at an altitude of 450 m above sea level (48°36′43″ N; 19°03′59″ E). The experimental plot, Bienská dolina, was established in spring 2012 in a 65-year-old European beech forest. The study area lies in the oak-beech forest zone (according to the classification of Zlatník et al. [11]), with soil classified as Haplic Cambisol (Humic, Eutric, Endoskeletic, Siltic) derived from volcanic parent material. The experimental plot is located on a gentle east-facing slope at the lower edge of the beech distribution in Slovakia, where trees are potentially more susceptible to drought. The site has a slightly warm and moderately humid climate, with a mean July air temperature of 16 °C. The long-term (1961–1990) mean annual air temperature and total annual precipitation were 7.3 °C and 690 mm, respectively. The sample plot covered an area of 608 m², and the European beech trees within it had a diameter at breast height (DBH) ranging from 5.7 to 42.3 cm and heights ranging from 8.4 to 29.3 m. The selected sample trees were representative, with a mean height of 26.3 m (standard deviation 1.3 m; range 24.7–29.1 m) and a mean DBH of 32.4 cm (standard deviation 4.8 cm; range 27.1–42.3 cm). Sap flow data from the growing seasons of 2012, 2013, 2014, 2015, and 2017 were used in this study; data from 2016 were excluded because they were incomplete for technical reasons. A detailed description of the Bienská dolina study site is provided by Sitková et al. [12].

Sap Flow Measurements
Sap flow systems (model EMS51A) connected to a 16-channel RailBox V16 data logger were installed on 12 selected sample trees (Fagus sylvatica L.). The system uses the tissue heat balance (THB) method, based on volume (three-dimensional) heating of a stem segment, to measure volumetric sap flow directly in kilograms of water per unit time per centimeter of stem circumference [13,14]. Sap flow was measured at 5 min intervals and stored as 20 min averages in the data logger; hourly data were used for further analysis. A more detailed description of the method and equipment is given by Sitková et al. [12]. This study used the average sap flow of the 12 measured beech trees.
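The aggregation from 20 min logger records to an hourly stand-level series can be sketched with pandas as follows. This is a minimal illustration, not the authors' processing code: it assumes a table with a `DatetimeIndex` and one (hypothetically named) column per tree, and it assumes the stored values are rates (kg h⁻¹ cm⁻¹), so hourly values are means rather than sums.

```python
import pandas as pd


def hourly_stand_sap_flow(records: pd.DataFrame) -> pd.Series:
    """Aggregate 20 min sap flow records to hourly means, then average across trees.

    Assumption: `records` has a DatetimeIndex and one column per tree, with values
    expressed as rates (kg h-1 cm-1), so hourly values are means, not sums.
    """
    hourly = records.resample("1h").mean()  # 20 min averages -> hourly mean per tree
    return hourly.mean(axis=1)              # average over all measured trees


# Usage with two hypothetical trees over one hour (three 20 min records each)
idx = pd.date_range("2012-06-01 00:00", periods=3, freq="20min")
demo = pd.DataFrame({"tree_01": [1.0, 2.0, 3.0], "tree_02": [3.0, 4.0, 5.0]}, index=idx)
stand = hourly_stand_sap_flow(demo)  # one hourly value: mean of 2.0 and 4.0
```

If the logger instead stored amounts per 20 min interval, `.sum()` would replace the hourly `.mean()`.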

Monitoring of Environmental Conditions
The meteorological variables used for the models were measured in an open grassy area using an automatic weather station. The station was equipped with sensors for air temperature (AT, °C), relative humidity (RH, %), and global radiation (Rs, W·m⁻²) (EMS33 and EMS11; Environmental Measuring System (EMS Brno) Ltd., Brno, Czech Republic) placed 2 m above the ground (low-cut grass). Precipitation (P, mm) was measured using a type 370 rain gauge with a collecting area of 320 cm² (1 m above ground; Met One Instruments Inc., Grants Pass, OR, USA). Measurements were taken at 5 min intervals, and data were stored every 20 min in an EdgeBox V12 data logger (EMS Brno Ltd., Brno, Czech Republic) powered by a 12 V solar-charged battery.
Soil water potential (SWP, MPa) was monitored using three calibrated gypsum blocks (Delmhorst Inc., Towaco, NJ, USA) and a MicroLog SP3 data logger (EMS Brno Ltd., Czech Republic). The lower measurement limit of the equipment was −1.5 MPa. SWP was measured at soil depths of 15, 30, and 50 cm using six soil probes across the experimental plot. In this study, we used the plot-average SWP values.
From the measured data, we calculated derived variables representing atmospheric evaporative demand, i.e., vapor pressure deficit (VPD, kPa [15]) and potential evapotranspiration (PET, mm) according to Penman [16].
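The text does not state which saturation vapor pressure formula was used for VPD. The sketch below assumes the Tetens-type expression used in FAO-56, with actual vapor pressure derived from relative humidity; it is one standard way to obtain VPD from AT and RH, not necessarily the authors' exact procedure. The Penman PET calculation is not reproduced here.

```python
import math


def saturation_vapour_pressure(t_air_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature t_air_c (deg C).

    Assumption: Tetens-type form as given in FAO-56.
    """
    return 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))


def vapour_pressure_deficit(t_air_c: float, rh_percent: float) -> float:
    """VPD (kPa) = saturation vapour pressure minus actual vapour pressure."""
    es = saturation_vapour_pressure(t_air_c)
    ea = es * rh_percent / 100.0  # actual vapour pressure from relative humidity
    return es - ea


vpd = vapour_pressure_deficit(20.0, 50.0)  # roughly 1.17 kPa
```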
Before developing the models, all input variables (the measured meteorological variables Rs, AT, RH, and P; the derived variables PET and VPD; and SWP) were centered and scaled. The data were randomly divided into training (60%) and validation (40%) subsets; the models were fitted on the training subset, and their performance was evaluated on the validation subset. Model performance was assessed using three metrics: the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute deviation (MAD). The coefficient of determination evaluates how well a regression model fits the data: it quantifies the proportion of the variation in the dependent variable that is accounted for by the independent variables. It typically ranges between 0 and 1, with values closer to 1 indicating a better fit. RMSE measures the typical magnitude of the errors between predicted and actual values; it is calculated as the square root of the mean of the squared differences between predictions and actual values. MAD is the mean absolute difference between predicted and actual values and thus measures the average absolute error of the predictions [17].
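The preprocessing and evaluation steps above can be sketched with NumPy. This is an illustrative sketch only: the function names and the random seed are hypothetical, R² is computed as 1 − SS_res/SS_tot (some packages report the squared Pearson correlation instead, which can differ on validation data), and, following the description, scaling is applied to all data before splitting (fitting the scaling on the training subset only is a common alternative).

```python
import numpy as np


def standardize(X):
    """Center each column on its mean and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X - mu) / sd


def random_split(n, train_fraction=0.6, seed=42):
    """Shuffled index arrays for a training (60 %) and validation (40 %) subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return idx[:n_train], idx[n_train:]


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(obs, pred):
    """Root-mean-square error."""
    diff = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mad(obs, pred):
    """Mean absolute deviation between predicted and observed values."""
    diff = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(diff)))
```

For example, observations `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]` give R² = 0.8, RMSE = 0.5, and MAD = 0.25.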
The modeling process utilized six variants, each incorporating different input data ranges to assess their impact on sap flow estimation. These variants aimed to explore the effect of varying conditions on plant transpiration.
A global radiation threshold was used to reduce the influence of nighttime values on sap flow estimation. Additionally, wetter and drier periods were defined based on SWP values. Following previous research [15], the effect of a one-hour shift in global radiation was also tested.
A description of all variants is given in Table 1. Variant 1 contained all available data, while Variant 2 included all data with global radiation shifted by one hour. Variant 3 represented periods of reduced soil water availability (SWP < −0.8 MPa), again with Rs shifted by one hour. Variant 4 used the data from Variant 3 but included only records with Rs higher than 200 W m⁻² (daylight data). Variant 5 contained only daylight data from the wet period (SWP from 0 to −0.4 MPa) with Rs shifted by one hour. Variant 6 used all daylight data (Rs > 200 W m⁻²) with no shift in Rs. For each dataset (Variants 1–6), four models were used to predict sap flow: neural network, random forest, gradient boosting machine, and linear model. All applied models are supervised learning algorithms.
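The variant definitions can be expressed as filters on an hourly table. The sketch below is a hypothetical construction, not the authors' code: the column names `SF`, `Rs`, and `SWP` are assumptions, the 1 h shift is assumed to pair sap flow at hour t with Rs from hour t − 1, and the daylight threshold in Variants 4 and 5 is assumed to apply to the shifted Rs.

```python
import pandas as pd


def build_variants(df: pd.DataFrame) -> dict:
    """Build the six modelling datasets from an hourly frame.

    Assumed (hypothetical) columns: 'SF' (sap flow), 'Rs' (W m-2), 'SWP' (MPa),
    on a regular hourly DatetimeIndex.
    """
    shifted = df.copy()
    # Assumption: sap flow at hour t is paired with Rs from hour t - 1
    shifted["Rs"] = shifted["Rs"].shift(1)
    shifted = shifted.dropna(subset=["Rs"])

    dry = shifted["SWP"] < -0.8
    wet = (shifted["SWP"] <= 0.0) & (shifted["SWP"] >= -0.4)
    day = shifted["Rs"] > 200  # daylight threshold on the shifted Rs (assumption)
    return {
        1: df,                     # all data
        2: shifted,                # all data, Rs shifted by 1 h
        3: shifted[dry],           # dry period, shifted Rs
        4: shifted[dry & day],     # dry period, daylight only
        5: shifted[wet & day],     # wet period, daylight only
        6: df[df["Rs"] > 200],     # all daylight data, no shift
    }


# Usage with a small synthetic hourly record
idx = pd.date_range("2013-07-01", periods=7, freq="1h")
demo = pd.DataFrame({
    "SF": [0.1, 0.2, 0.6, 0.9, 0.7, 0.4, 0.5],
    "Rs": [0.0, 150.0, 400.0, 500.0, 100.0, 250.0, 350.0],
    "SWP": [-0.2, -0.2, -1.0, -1.0, -0.5, -0.3, -0.3],
}, index=idx)
variants = build_variants(demo)
```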
Before sap flow was modeled with the machine learning models, their hyperparameters were tuned following the methodology of our previous study [18]. This step matters because different hyperparameter settings lead to different model parameter estimates.
The neural network (NN) is a class of algorithms, loosely inspired by the functioning of the human brain, that aims to identify underlying relationships in a data structure. The analysis used the Keras package (version 2.9.0) and the TensorFlow package (version 2.9.0). The network architecture consisted of three hidden layers with 16, 32, and 16 neurons, respectively. Stochastic gradient descent was chosen as the optimization algorithm, with a batch size of 40 (the number of training samples processed before the internal parameters are updated). The model was trained for 500 epochs with a dropout rate of 0.8 to reduce overfitting. A learning rate of 0.01 controlled the size of parameter updates, and the ReLU activation function was used to introduce non-linearity and capture complex input representations.
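The architecture described above can be sketched with the Keras API in Python. This is an approximation, not the authors' script: the text does not say where dropout was applied, so a `Dropout` layer after each hidden layer is an assumption, as are the single linear output unit, the mean-squared-error loss (mentioned in the Results and Discussion), and the seven input features (Rs, AT, RH, P, PET, VPD, SWP). Note that a Keras dropout rate of 0.8 drops 80% of the units during training.

```python
from tensorflow import keras


def build_sap_flow_nn(n_features: int, dropout_rate: float = 0.8) -> keras.Model:
    """16-32-16 ReLU network with dropout, SGD (learning rate 0.01) and MSE loss.

    Assumption: one Dropout layer follows each hidden layer.
    """
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(1),  # linear output unit for regression
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
    return model


model = build_sap_flow_nn(n_features=7)
# Training as described in the text (X_train, y_train are hypothetical arrays):
# model.fit(X_train, y_train, batch_size=40, epochs=500)
```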
Random forest (RF) is an ensemble of decision trees, the "forest", trained using the bagging approach. The central premise of bagging is that combining many learning models improves the final outcome; prediction accuracy generally improves as the number of trees increases and then levels off. The random forest model was built using the ranger method of the caret package (version 6.0-93). It consisted of 1000 decision trees whose predictions were combined. The mtry parameter was set to 9, i.e., the number of features randomly sampled as split candidates at each split. A minimal node size of 7 was specified, setting the minimum number of observations in a terminal node. Variance was used as the splitting rule, a common criterion for assessing split quality in regression trees.
The extreme gradient boosting machine (XGBM) also uses decision trees as its base learners, similar to random forest (RF). The key difference is that RF averages the predictions of independently grown trees, whereas XGBM builds an additive model in which each new tree is fitted to correct the errors of the current ensemble. In other words, RF combines the results at the end of the process, while XGBM combines them along the way [18]. The XGBoost algorithm was implemented with the xgboost package (version 1.7.4.9). A colsample_value of 0.9 was chosen, so that 90% of features were randomly sampled for each tree to promote diversity within the ensemble. An eta value (learning rate) of 0.3 controlled the contribution of each tree to the final prediction. The max_depth parameter was set to 3, restricting the complexity of individual trees. A min_child_weight of 5 was used, requiring a minimum sum of instance weights in each child node. The n_estimators parameter, the total number of boosting rounds, was set to 150, allowing the model to learn complex relationships through the ensemble effect. A subsample value of 0.8 was employed, so that 80% of observations were randomly sampled for each tree.
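The contrast between averaging (RF) and additive fitting (boosting) can be illustrated with a toy NumPy example using one-split regression trees (stumps) on a single input. This is a deliberately simplified illustration, not XGBoost itself: it omits XGBoost's regularized second-order objective and its row and column subsampling, and uses stumps instead of depth-3 trees. Only the learning rate (0.3) and the number of rounds (150) echo the values in the text.

```python
import numpy as np


def fit_stump(x, y):
    """Best single-split regression tree (stump) on 1-D input by squared error."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best = np.inf, (None, ys.mean(), ys.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no valid threshold between equal x values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2.0, left.mean(), right.mean())
    return best  # (threshold, left value, right value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    if threshold is None:  # degenerate sample: constant prediction
        return np.full(x.shape, left_value, dtype=float)
    return np.where(x <= threshold, left_value, right_value)


def bagged_stumps(x, y, n_models=150, seed=0):
    """RF-style combination: fit each stump on a bootstrap sample, then average."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))
        preds.append(predict_stump(fit_stump(x[idx], y[idx]), x))
    return np.mean(preds, axis=0)


def boosted_stumps(x, y, n_rounds=150, eta=0.3):
    """Gradient boosting with squared loss: each stump is fitted to current residuals."""
    pred = np.full(x.shape, y.mean(), dtype=float)
    for _ in range(n_rounds):
        residual = y - pred
        pred += eta * predict_stump(fit_stump(x, residual), x)
    return pred


x = np.linspace(0.0, 1.0, 50)
y = x ** 2
mse_bag = np.mean((y - bagged_stumps(x, y)) ** 2)
mse_boost = np.mean((y - boosted_stumps(x, y)) ** 2)
```

Averaging many stumps still yields roughly a single blurred step, whereas adding stumps sequentially builds a fine staircase that follows the curve. This is why boosting can fit complex relationships even with shallow trees.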
In addition, a multiple linear regression model (LM) was used as a benchmark alongside the RF, XGBM, and NN models. The performance of these models in estimating Fagus sylvatica sap flow was evaluated with three commonly used statistical indices: the coefficient of determination, root-mean-square error, and mean absolute deviation.

Results and Discussion
In this article, we compared the model performance and prediction power of the XGBM, NN, RF, and LM methods for predicting sap flow based on five years of observations. The R², RMSE, and MAD metrics were used to compare the performance of the models across Variants 1–6. The overall prediction power is expressed as the ratio of modeled (predicted) to measured sap flow (SF), each integrated over the whole time interval of the respective variant (pred. SF/SF).
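The prediction power ratio can be computed directly from paired series. In this minimal sketch (the function name is hypothetical), integration over regular hourly steps reduces to a sum, because the step length cancels in the ratio.

```python
import numpy as np


def prediction_power(observed, predicted):
    """pred. SF / SF: integrated predicted sap flow divided by integrated measured sap flow.

    Values above 1 indicate overall overestimation, values below 1 underestimation.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return predicted.sum() / observed.sum()


obs = np.array([0.2, 0.8, 1.5, 0.5])
ratio = prediction_power(obs, 1.2 * obs)  # a uniform 20 % overestimate gives about 1.2
```

The ratio measures a different property from R², RMSE, and MAD. For example, predictions `[3, 1]` for observations `[1, 3]` fit poorly hour by hour, yet give a ratio of exactly 1.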
When inspecting sap flow in drier conditions, the performance of machine learning models (Figure 1: B2, C2, and D2; and Table 2)… In our study, we observed relatively poor prediction power of the NN compared with the remaining algorithms (RF, XGBM, and LM), as shown by the ratio of predicted to measured sap flow (pred. SF/SF) in Table 2. Despite the lower prediction power of the NN, its performance metrics were comparable to those of RF and XGBM, mainly within Variants 3 and 4, suggesting a systematic bias of the NN in the variants that include wetter conditions (Variants 1, 2, 5, and 6). This contrasts with the observation that NNs easily outperform other machine learning models, such as XGBM and RF, on datasets with distributed representations, such as images, speech, and text [19]. Sagawa et al. [20] pointed out that models that are very accurate on average can still perform poorly on rare and unusual examples.
Furthermore, all models performed better in the wet variants (SWP > −0.4 MPa) than in the dry variants (SWP < −0.8 MPa). For the LM, the performance metrics were substantially better in Variant 5 (wet conditions; R² = 0.82) than in Variant 4 (dry conditions; R² = 0.68) (Figure 2). The difference was also pronounced for the NN, with R² = 0.90 and 0.83 for wet and dry conditions, respectively.
RF and XGBM also showed better performance metrics, such as R², in Variant 5. The comparison of measured and modeled sap flow indicates minimal bias and more precise values for RF and XGBM (Figure 2). However, in Variant 5, the NN tended to overestimate low sap flow values, particularly in the range of 0.0 to 0.5 kg h⁻¹ cm⁻¹. Although its metrics were quite good, pred. SF/SF indicated an overestimation of about 20% (Table 2). Overall, during wet conditions (Variant 5), RF and XGBM appeared precise and unbiased in predicting sap flow, the NN precise but biased, and the LM imprecise but unbiased (Figure 2). Similar model behavior was observed under the dry conditions of Variant 4, except that the NN was relatively less biased.
The prediction power of the LM was comparable to that of RF and XGBM. Although the NN had better performance metrics in terms of R², RMSE, and MAD, it overestimated the sap flow for the projected period (Table 2) and therefore exhibited poor overall prediction power in most variants. Conversely, the LM had inferior values of the same metrics (i.e., lower R² and higher RMSE and MAD), yet it predicted the amount of transpired water more appropriately. The LM consistently overestimated the high values and underestimated the low values (e.g., Figure 1: A2), resulting in prediction power better than or comparable to that of the NN, RF, and XGBM models. Furthermore, when global radiation was shifted by 1 h, the LM surpassed the RF, XGBM, and NN models (Table 2: Variant 2). This was unexpected, since RF, XGBM, and NN can capture non-linear relationships, whereas the LM cannot. It therefore appears that, because its systematic over- and underestimations compensate each other, the LM produces a more accurate overall (integrated) prediction. One likely contributing factor is that an ordinary least-squares fit with an intercept (the usual default) yields residuals that sum to zero on the training data, so the integrated predictions match the integrated observations by construction; with a random training/validation split, this balance is expected to hold approximately on the validation subset as well.
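The following sketch checks numerically that an ordinary least-squares fit with an intercept reproduces the integrated target on its training data, even when the true relationship is non-linear and the hour-by-hour fit is imperfect. The data are synthetic and the function names are hypothetical. The property follows from the normal equations: the intercept column of ones forces the residuals to sum to zero.

```python
import numpy as np


def fit_ols(X, y):
    """Multiple linear regression with an intercept, solved by least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.exp(2.0 * X[:, 0]) + X[:, 1] ** 2  # deliberately non-linear, positively skewed
pred = predict_ols(fit_ols(X, y), X)
ratio = pred.sum() / y.sum()  # equals 1 up to floating-point error
```

On a held-out validation subset the ratio is no longer exactly 1, but with a random split it is typically close to it.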
An NN uses pattern matching to solve the regression problem. We noticed a consistent systematic overestimation of about 10–30% in the predicted sap flow of Variants 1, 2, 5, and 6, as indicated by pred. SF/SF (Table 2). This may be due to the positive skew of the sap flow data combined with the fact that the NN was optimized with a squared loss function (mean squared error). Large estimation errors in the range of high values were consequently weighted more strongly than the many minor errors in the range of low values, resulting in overall overestimation by the presented models. When the data were not skewed (Variants 3 and 4), the NN seemed to perform comparably to the other algorithms.
Our results agree with the 'no free lunch' theorem, which states that no single machine learning algorithm is best for every problem: a solution that works well for one problem can fall significantly short on another. In our case, model performance varied substantially with the portion and nature of the data used for model development.
It has long been known that certain ecosystems, species, or geographic areas exhibit a time lag between SF and environmental variables [20,21]. Generally, sap flow is delayed relative to solar radiation [12,22]. Shifting the climatic variables by 1 h relative to sap flow did not yield any pronounced improvement in our models. Nonetheless, shifting global radiation markedly reduced the NN's prediction power and performance metrics when Variants 1 and 2 were compared (Table 1). We also recorded a slight improvement for the tree-based approaches (XGBM and RF).
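A simple way to check the lag between radiation and sap flow is lagged correlation. The sketch below uses a lagged Pearson correlation, which is not necessarily the method used in the cited studies. It assumes gap-free series at hourly steps and uses synthetic data constructed so that sap flow lags radiation by exactly 1 h. The function name and the maximum lag are hypothetical.

```python
import numpy as np


def best_lag(radiation, sap_flow, max_lag=3):
    """Lag (in time steps) at which sap flow correlates best with earlier radiation.

    Lag k compares sap_flow[t] with radiation[t - k]; k = 0 means no delay.
    """
    radiation = np.asarray(radiation, dtype=float)
    sap_flow = np.asarray(sap_flow, dtype=float)
    scores = {}
    for k in range(max_lag + 1):
        r = radiation[: len(radiation) - k]
        s = sap_flow[k:]
        scores[k] = np.corrcoef(r, s)[0, 1]
    return max(scores, key=scores.get)


hours = np.arange(72)
# Synthetic daily radiation cycle (W m-2): zero at night, peak of 800 at noon
rs = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0, None) * 800
sf = np.roll(rs, 1) / 800.0  # synthetic sap flow lagging radiation by 1 h
```

Here `best_lag(rs, sf)` returns 1, whereas `best_lag(rs, rs)` returns 0. Real series would also need gaps and nighttime values handled before the lags are compared.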
Li et al. [23] assessed the performance of six models for simulating sap flow in Agathis australis using a sizeable dataset. Of these, the linear model, the back-propagation neural network (BPNN), and the convolutional neural network (CNN) are comparable with our linear model (LM) and neural network (NN). For this comparison, our models were parameterized using the Variant 1 dataset, i.e., without data manipulation (no shift). Specifically, the R² of our linear regression model was 0.764, close to the R² of 0.796 reported by Li et al. [23]. Similarly, our neural network model reached an R² of 0.939, comparable to the BPNN and CNN models in that study, with R² values of 0.928 and 0.936, respectively. This comparison implies that the performance of our models aligns well with that of the models in the above-mentioned study.
When applied to describe transpiration in Fagus sylvatica (European beech), our NN and RF models performed considerably better than the NN and RF models used by Wu et al. [24] for maize. In the research of Amir et al. [25] on tomatoes, a variety of models was applied, including a gradient boosting model, which reached a maximum R² of 0.8. The study of Tu et al. [26], which employed a three-layer back-propagation NN, showed that varying the number of predictors led to R² values from 0.80 up to 0.95, in line with our findings.
It appears that models perform better when applied to trees than when applied to low-growing plants. This observation is supported by Xing et al. [7] and Liu et al. [27]. Similar to the approach proposed by Tu et al. in 2019 [28], our model could potentially be improved by introducing a phenological function.

Conclusions
In our investigation, the linear model exhibited remarkably high prediction power for long-term cumulative sap flow. It was competitive with more complex machine learning techniques, including random forest (RF), extreme gradient boosting (XGBM), and neural networks (NN). It seems that, because its systematic over- and underestimations compensate each other, the linear model produced a more accurate overall prediction than the NN and one comparable to those of RF and XGBM. We conclude that the linear model does not yield a markedly biased estimate of sap flow integrated over longer time periods. When modeling amounts of transpired water, it is therefore legitimate to use a much easier-to-apply and computationally simpler linear model. For predicting the time course of sap flow (hourly steps), the advanced machine learning techniques (NN, RF, and XGBM) are more suitable.
The generalization of our results should not go beyond Fagus sylvatica species in oak-beech altitudinal forest zones (classification according to Zlatník et al. [11]). These promising results should be used with a certain amount of caution until they are confirmed in different conditions or on other species of trees.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft preparation, P.N. and P.F.J.; writing-review and editing, M.M.; supervision, project administration, funding acquisition, Z.S. and K.S. All authors have read and agreed to the published version of the manuscript.