A Machine Learning Approach for the Estimation of Alfalfa Hay Crop Yield in Northern Nevada

: Increasing pressure over water resources in the western U


Introduction
Alfalfa (Medicago sativa L.) is one of the main crops in the state of Nevada [1]. Alfalfa is a perennial crop, which means it is not replanted after each harvest and can be harvested several times before the crop is reestablished. Nevada is the driest state in the US, which makes agricultural production especially challenging. Due to these conditions, there is an increasing demand for more efficient water management techniques.
Yield estimators, along with a good understanding of the water-soil-plant-atmosphere relationship, are excellent tools for evaluating different irrigation management strategies with the goal of improving water use efficiency. Predicting and understanding the performance of agricultural systems is especially useful for the development of decision support systems in the field of crop irrigation [2]. Crop yield modeling can be defined as a description of the economic output of an agricultural system using mathematical equations [3]. Weather variables are among the main inputs for crop yield modeling, with the others being crop development variables (physiology and phenology), field management, and the soil where the crop grows. Thus, if a model's assumptions are met, a crop model can be used to study the effect of weather variables on a crop's yield.
According to Boogard et al. [4], a crop model can be a descriptive model or an explanatory model. Descriptive models simulate the growth and development of the crop without defining specific plant processes. On the other hand, an explanatory model uses mathematical expressions to define the development of the crop as the interaction of different processes that occur in the plant. Regression models have been widely used as descriptive crop models [5,6]. There is a wide variety of regression models used to predict and describe crop yield, including linear regression (LR), random forest (RF), and artificial neural networks (ANNs) [7]. Previous studies have assessed the reliability of conventional regression models and machine learning techniques for crop yield modeling for perennial crops, including alfalfa [8-11]. These studies showed that machine learning and conventional regression models can accurately estimate the yield of perennial crops as well as identify important weather- and climate-related factors affecting the yield. Furthermore, they all agree that non-linear models outperform conventional linear models.
This work aims to develop two models to estimate the yield of alfalfa hay cultivated in northern Nevada, as well as to evaluate the importance and effects that different weather and crop development variables have on the estimation of alfalfa hay yield. The two crop yield models are a multiple linear regression model and a random forest model. The LR is a conceptually simple model that facilitates the analysis of weather-crop interactions. On the other hand, the RF model can account for non-linear weather-crop relationships that might not be captured by the linear model. LR and RF models were selected because they are interpretable, do not require sample sizes as large as those needed by other machine learning regression models to obtain good results, and have proven reliable when modeling different biological processes, including crop yield [8,10-14]. The models were trained using yield data from previous and recent experiments conducted in northern Nevada (Figure 1).

Materials and Methods
The two alfalfa hay yield models were developed using data from previous experiments conducted from 1973 to 1981 at the Nevada Agricultural Research Station in Fallon, Nevada (NARS Fallon) [15], as well as data obtained during the first year (2021) of an ongoing experiment conducted at the Valley Road Field Laboratory (VRFL) in Reno, Nevada [16] (Figure 1). As part of the previous experiments conducted in Fallon, three non-weighing lysimeters were used to measure the weekly water use (i.e., evapotranspiration) of alfalfa cultivated inside the lysimeters. Irrigation amounts and their dates of application were recorded, as well as the dates of harvests and alfalfa hay yields obtained from each harvest. As part of the experiment conducted at Valley Road, three different irrigation treatments were applied to two alfalfa varieties, one marketed as a drought-tolerant variety (Ladak II, Great Basin Seeds, Ephraim, UT, USA) and the other as a highly productive variety (Stratica, Croplan, Arden Hills, MN, USA). The irrigation treatments consisted of a full irrigation treatment and two deficit irrigation treatments. For the full irrigation treatment, 100% of the water required to replenish the soil water content in the root zone to field capacity was applied. For the deficit irrigation treatments, 80% and 60% of the water amounts required by the full irrigation treatment were applied. For each season (year), there are alfalfa hay yield data corresponding to four different harvests obtained at each location. Both datasets (Fallon and Valley Road) were used and analyzed as a whole. The analyzed dataset consisted of 127 samples, of which 55 samples were from Valley Road and the remaining from Fallon. The variables in Table 1 were calculated for each yield sample; these variables were later used as predictors in the models. Additional details of the recent experiment conducted at Valley Road can be found in [16].

Table 1. Weather and water management indicators.

| Indicator | Description |
|---|---|
| $Php_{ch}$ | Photoperiod of current harvest $= \cos\left(\pi\left(0.94 + \frac{J}{183}\right)\right)$, where $J$ is the Julian day of the current harvest. |
| $Php_{ph}$ | Photoperiod of previous harvest $= \cos\left(\pi\left(0.94 + \frac{J}{183}\right)\right)$, where $J$ is the Julian day of the previous harvest. |
| Water | Average of the total rain and irrigation input during the yield formation period (YFP). |
| SRad | Daily mean solar radiation during YFP. |
| GDD | Total growing degree days during YFP. |
| Wind | Average daily wind speed during YFP. |
| $Tn_{b5}$ | Fraction of days during YFP with minimum daily temperature below 5 °C. |
| $Tx_{b5}$ | Fraction of days during YFP with maximum daily temperature below 5 °C. |
| $Tn_{a20}$ | Fraction of days during YFP with minimum daily temperature above 20 °C. |
| $Tx_{a30}$ | Fraction of days during YFP with maximum daily temperature above 30 °C. |
| DLH | Days since the last harvest. |
| Cut | Harvest (cut) number within the season, e.g., 1 for the first harvest, 2 for the second, and so on. |

The following five daily weather variables were considered for the analysis: rainfall, maximum and minimum temperatures, mean solar radiation, and average wind speed. Irrigation amounts and their dates of application were also considered. As rainfall amounts are negligible in the region during most growing seasons compared to the irrigation amounts, they were merged with the irrigation into a single variable (Water). Weather data for NARS Fallon were obtained from the FALLON NAAS weather station through the NOAA Web Services API [17]. Weather data for Valley Road were obtained from the Reno (UNR Campus) weather station from RAWS USA Climate Archive data [18]. The UNR weather station is located at the same facilities where the Valley Road experiment was conducted; meanwhile, the FALLON NAAS weather station is approximately 6 km away from the Fallon experiment location.
Constraint tests were applied as a basic quality control to the weather data, following World Meteorological Organization (WMO) guidelines [19]. Missing data imputation procedures were also applied. As the FALLON NAAS station does not have records for radiation and more than 70% of wind records are missing, the NOAA-CIRES-DOE V3 reanalysis [20] was used to fill the gaps for those variables. For the rest of the variables of the NARS weather data, and all the variables of VRFL, less than 4% of the data was missing. For these data, an autoregressive integrated moving average (ARIMA) model was used to fill the gaps in each station. For the case of NARS, missing data were evenly distributed through the dataset, with a maximum gap of approximately 50 days for minimum temperature during the spring of 1984. For VRFL weather data, there was a maximum gap of 20 days for all the variables for the summer of 2020.
Whitmire et al. [11] used weather indicators to summarize the weather variables throughout the yield formation period (YFP), i.e., the period between harvests. A similar approach was followed in this study to obtain different weather and water management indicators derived from the five aforementioned variables, dates of harvest events, and irrigation amounts and their dates of application. The photoperiod is an important regulator of the biomass partitioning of the alfalfa crop [21]. Consequently, the photoperiod during YFP was included as a weather indicator and was accounted for by using a cosine transformation of the Julian day so that 1 corresponds to the longest day of the year and −1 to the shortest day. All the weather and water management indicators used in the analysis can be found in Table 1.
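The aggregation of daily records into YFP indicators can be illustrated with a minimal Python sketch (the study itself used R). The function name, the record layout, and in particular the growing degree day base temperature of 5 °C are assumptions made for illustration; the text does not state the base temperature or the exact GDD formula used.

```python
def summarize_yfp(days, gdd_base_c=5.0):
    """Summarize daily records of one yield formation period (YFP).

    `days` is a list of dicts with keys tmin, tmax (deg C), srad, wind and
    water (rain plus irrigation). The layout is hypothetical.
    `gdd_base_c` is an ASSUMED base temperature, not taken from the paper.
    """
    n = len(days)
    # Simple averaging method for growing degree days, floored at zero.
    gdd = sum(max((d["tmax"] + d["tmin"]) / 2.0 - gdd_base_c, 0.0) for d in days)
    return {
        "DLH": n,  # assumes `days` spans the whole period since the last harvest
        "Water": sum(d["water"] for d in days) / n,
        "SRad": sum(d["srad"] for d in days) / n,
        "Wind": sum(d["wind"] for d in days) / n,
        "GDD": gdd,
        # Fractions of days beyond each temperature threshold.
        "Tn_b5": sum(d["tmin"] < 5.0 for d in days) / n,
        "Tx_b5": sum(d["tmax"] < 5.0 for d in days) / n,
        "Tn_a20": sum(d["tmin"] > 20.0 for d in days) / n,
        "Tx_a30": sum(d["tmax"] > 30.0 for d in days) / n,
    }
```

Each harvest would yield one such dictionary, which then forms one row of the predictor matrix.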
Each vector of indicators was paired with a yield value in such a way that alfalfa hay yield is a function of weather and water management indicators:

$$Yield = f(W_1, W_2, \ldots, W_D)$$

where $W_i$ is any weather or water management indicator and $D$ is the number of indicators in the vector.
The yield accumulates over time as the crop grows and the biomass increases. This establishes a direct relationship between the yield and the yield accumulation time until harvest (DLH, days since last harvest). When the crop water requirements are fully satisfied, the first harvest of the season usually has the highest yield [22], as the first harvest immediately follows the dormancy period (highest DLH); moreover, having Yield as a function of DLH might lead to a spurious relation. To remove the effect of DLH on Yield, the response variable to be analyzed was defined as the mean daily increment in yield, ∆Yield = Yield/DLH, where Yield is alfalfa hay yield at 13% moisture content.
All the predictor variables, except for Cut, were standardized to compare the resulting coefficients of the model and assess the importance of each predictor in the estimated mean daily increment in yield. Feature importance was assessed based on the statistical significance (p-values) of every predictor variable in the model. Multicollinearity was evaluated using the variance inflation factor (VIF). The predictor variables of the final LR model were selected based on their statistical significance and their independence from the other predictors (measured using the VIF). A sequential process was carried out to obtain the final model. In that process, one predictor was dropped at a time based on its significance and VIF. This was repeated until only statistically significant, independent predictors were left in the model. The LR goodness of fit during the model selection was assessed using the adjusted $R^2$ coefficient, since it is penalized by the number of parameters in the model and therefore favors simpler models. The effect and importance of the predictors were assessed using the coefficients of the model and effect plots.
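The three quantities used during LR model selection follow standard textbook definitions, sketched below in Python for illustration (the study used R). The function names are hypothetical, and the sketch assumes z-score standardization with the sample standard deviation, which the text does not specify.

```python
import statistics


def standardize(values):
    """Z-score a predictor: subtract the mean, divide by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return [(v - mean) / sd for v in values]


def vif_from_r2(r2_j):
    """Variance inflation factor of predictor j.

    r2_j is the R^2 obtained when predictor j is regressed on all the
    other predictors; VIF = 1 / (1 - R_j^2).
    """
    return 1.0 / (1.0 - r2_j)


def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2, which penalizes R^2 by the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

For example, with 11 samples and 2 predictors, an $R^2$ of 0.90 corresponds to an adjusted $R^2$ of 0.875, and a predictor whose $R_j^2$ against the others is 0.8 has a VIF of 5.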
A random forest model was fitted as an alternative yield-forecasting model that, unlike the LR model, can account for possible non-linear relations between predictor variables and alfalfa hay yield. An RF model is an ensemble machine-learning model that consists of several different decision trees. A decision tree is a single predictor that makes a prediction based on information learned from the data. The output of an RF regressor is the average of the predictions of all the trees that are part of it. However, to ensure the generalization and interpretability of the model, it is necessary to evaluate the model using new data. For this reason, the available data must be split into training and testing datasets. That results in less available data for learning compared to the LR model.
For the RF model, variable importance and feature selection were assessed using permutation-based feature importance metrics. The effect of the predictor variables was evaluated using partial dependency plots. As RF is prone to overfitting, 75% of the data was used to train the model, and the remaining data were used for model performance evaluation. The dataset is evenly split among the different locations and the soil water conditions. However, as all the data for VRFL was obtained during the same season, at least half of the data for both training and testing corresponds to the 2021 season harvest, which might lead to a model that is biased towards conditions like those of that season.
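The 75/25 split and the idea behind permutation-based importance can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure implemented by the 'randomForest' package: the split here is purely random (the stratification by location and soil water condition described above is omitted), `model` is any hypothetical callable that maps a predictor row to a prediction, and all function names are illustrative.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Randomly split rows into training and testing subsets (no stratification)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mse(model, X, y):
    """Mean squared error of `model` over predictor rows X and targets y."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled_col = [x[column] for x in X]
        rng.shuffle(shuffled_col)
        # Rebuild the rows with only the chosen column permuted.
        X_perm = [x[:column] + [v] + x[column + 1:] for x, v in zip(X, shuffled_col)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats
```

A predictor the model does not use yields an importance of zero, while a predictor that drives the prediction yields a positive increase in error.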
The performance of both models was evaluated using the $R^2$ score as defined in Equation (1). $R^2$ represents the proportion of variance in the response that is explained by the independent variables in the model. The performance evaluation was performed for both ∆Yield and Yield, and the latter was calculated using the response variable as follows:

$$Yield = \Delta Yield \times DLH$$

All the data processing and analysis were performed using the R language [23]. LR was fitted using the R base package. The 'randomForest' package was used to build the RF model [24] and the 'pdp' package was used to generate the partial dependency plots [25].
All the RF model hyper-parameters were left as default except for the maximum number of terminal nodes and the number of trees. The RF model's performance in terms of precision is expected to increase as the number of trees and nodes increases; however, the improvement obtained by including more nodes and trees diminishes as more and more are added [26]. The number of trees and terminal nodes was set to 16 and 100, respectively, which are the values above which the model's performance no longer showed a significant enhancement. Other parameters, like the number of variables for each split, were left as default since changing their values did not result in a significant change in the model's performance. The RF model's predictors were selected using an iterative process. During that process, the least important variable was dropped at each iteration. This continued until dropping any of the remaining variables caused a notable decrease in the $R^2$ of the model. The increase in mean squared error was used to assess predictor importance.
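The scoring and the conversion from the daily increment back to yield at harvest can be sketched in Python as below. The $R^2$ function uses the usual coefficient-of-determination formula, which is assumed here to match Equation (1); the back-transformation assumes a model fitted on log ∆Yield, as was done for the LR model. Function names are illustrative.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def yield_from_log_rate(log_delta_yield, dlh):
    """Recover yield at harvest from a predicted log daily increment.

    Yield = exp(log(delta_yield)) * DLH, following delta_yield = Yield / DLH.
    """
    return math.exp(log_delta_yield) * dlh
```

Because the score can be computed on log ∆Yield, ∆Yield, or Yield, the same model can show different $R^2$ values on each scale.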

Linear Regression Model
The final LR model is represented by Equation (2), where ":" indicates an interaction term and "*" indicates an interaction term plus the interacting variables themselves. This means that $Php_{ch}$ and Cut are included only as their interaction, whereas $Tx_{a30}$ and $Tx_{b5}$ are included both as their interaction and as independent variables. Thus, the term $Php_{ch} : Cut$ in Equation (2) is equal to $\sum_{i=1}^{4} Php_{ch}^{Cut_i}$, where $Php_{ch}^{Cut_i}$ is the photoperiod for the i-th cut of the season; the term $Tx_{a30} * Tx_{b5}$ is equal to $Tx_{a30} \times Tx_{b5} + Tx_{a30} + Tx_{b5}$. Substituting the coefficients in Table 2 into Equation (2) gives the final LR model presented in Equation (3). The response variable had to be log-transformed to satisfy the LR model's assumptions. It should be noted that all the variables in Equation (2) except Cut were standardized, and that Cut is a categorical variable. The final LR model achieved an $R^2$ of 0.948 for log ∆Yield, 0.904 for ∆Yield, and 0.854 for Yield estimation. Figure 2 shows the yield values estimated using the model against observed yield values for both log ∆Yield (a) and Yield (b). Figure 2 indicates that the model captures the variability of the response variable. The resulting coefficients of the model are shown in Table 2. All the coefficients are statistically significant (p < 0.05) except for $Php_{ch}$ in the third and fourth harvests.
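The meaning of the ":" and "*" operators described above can be made concrete by expanding the two terms into design-matrix columns. The Python sketch below is only illustrative; the function names are hypothetical, and it assumes four cuts per season, as in the dataset.

```python
def interaction_only(php_ch, cut, n_cuts=4):
    """Expand Php_ch:Cut into one column per cut.

    Only the column of the current cut carries the photoperiod value, so
    each cut gets its own photoperiod coefficient and no main effects.
    """
    return [php_ch if cut == i else 0.0 for i in range(1, n_cuts + 1)]


def crossed(tx_a30, tx_b5):
    """Expand Tx_a30 * Tx_b5 into both main effects plus their product."""
    return [tx_a30, tx_b5, tx_a30 * tx_b5]
```

For a sample from the second cut with a standardized photoperiod of 0.5, `interaction_only(0.5, 2)` gives `[0.0, 0.5, 0.0, 0.0]`, which is why Table 2 can report a separate photoperiod coefficient for each harvest.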
As expected, Water has a positive effect on the yield. The linear response of the yield of herbaceous crops to water consumption has been widely reported in the literature [27].
Wind exhibits a negative effect on the yield. Wind has been shown to increase evapotranspiration in highly advective arid and semi-arid regions [28,29]. Wind not only increases evapotranspiration rates but concomitantly increases soil salinity due to the accumulation of water-soluble salts on the soil surface after evaporative losses in arid and semi-arid environments [30]. This increase in salinity has a negative impact on plant growth and, ultimately, crop yield and may have been a contributing factor in this case [31]. Furthermore, wind has been reported to be an important factor in the spread of diseases in alfalfa crops, and those diseases can have a negative impact on the yield [32].
The temperature coefficients in the LR model had the highest values among all indicators (Table 2), which suggests that temperature has the greatest effect in the yield model. Figure 3 depicts the interaction effect of $Tx_{a30}$ and $Tx_{b5}$. It suggests that the effect of extreme high temperatures is even greater when extremely low temperatures have occurred during the same YFP. Extreme temperatures lead to stress in the crop, which is reflected in a decrease in the yield [33,34]. The effect of low temperatures is also related to the dormant season. The dormant season in northern Nevada commonly lasts up to 6 months. During the dormant season, the crop is the most exposed to extreme low temperatures. In northern Nevada, this period accounts for approximately 28% of the yield for the first harvest [35]. Such a low yield accumulation rate during the dormant season might be driving the model to consider low temperatures as an important variable.
The photoperiod affects the yield accumulation rate by driving biomass partitioning. When the photoperiod starts to decrease, the crop starts to allocate its assimilates to the roots to have enough reserves for the dormant season [22]. The effect of the photoperiod is the highest for the second harvest (Table 2). Usually, the second harvest occurs during summer, i.e., when the photoperiod is at its peak. According to the LR coefficients shown in Table 2, yield accumulation decreases with the length of the day for the first cut, which is the opposite of the second harvest.
A possible explanation for the contrasting trend in the second harvest compared to the first is the combined effects of the increased intensity and duration of solar radiation and heat accumulation (growing degree days) during the period of the second harvest (longer days) relative to the first (shorter days). These environmental factors are integral across the cycle of crop growth and development and, ultimately, for the yield of the crop, especially in cases where water or other production inputs (e.g., nutrients) are not a limiting factor [36,37].
A comparison of the observed yield against the estimated yield for Valley Road can be found in Figure 4. The LR model shows a good response to the variability in water. The LR model also showed its capability of capturing the response of the yield to different irrigation treatments. An alfalfa hay yield model capable of capturing the response of the crop to water can be used to analyze the effects of multiple irrigation management strategies on yield. Such a model can be incorporated into Decision Support Systems to improve the irrigation management of alfalfa.
Figure 5 shows the estimated and observed total seasonal yield for Fallon and Valley Road. With an $R^2$ of 0.863, the model generalizes the seasonal yield well. The Root Mean Squared Error for the estimated seasonal yield was 1.68 t ha$^{-1}$. This value can be taken as the typical error of the LR model for the seasonal yield prediction.
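The seasonal evaluation amounts to summing the per-harvest yields of each site and season and then comparing the estimated and observed totals. A minimal Python sketch, assuming records stored as hypothetical (site, year, yield) tuples, could look like this:

```python
import math
from collections import defaultdict


def seasonal_totals(records):
    """Sum per-harvest yields into one total per (site, year) season."""
    totals = defaultdict(float)
    for site, year, harvest_yield in records:
        totals[(site, year)] += harvest_yield
    return dict(totals)


def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

The RMSE is expressed in the same units as the yield, which makes it directly comparable with seasonal production figures.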

Random Forest Model
The resulting RF model has five predictor variables. In order of importance, the predictor variables of the RF model are DLH, Water, $Php_{ph}$, Cut, and $Tn_{b5}$. $R^2$ for the estimation of ∆Yield for the RF model was 0.897 for the training dataset and 0.861 for the testing dataset. $R^2$ for the estimation of Yield was 0.793 (Figure 6). The similarity of the metrics on the training and testing datasets is an indicator of the generalization capacity of the model. Based on the $R^2$, the RF model did not outperform the accuracy of the simpler LR model. Furthermore, when calculating $R^2$ for Valley Road (2021 season) and NARS (1974-1981 seasons) separately, we obtained values of 0.738 and 0.789, respectively. These values show that the model did not overfit towards the most abundant season in the dataset (2021).
Figure 7 shows the partial dependency plots (PDPs) for the continuous predictor variables of the RF model. The PDPs show the effect of the variation of a single variable while the remaining variables are held constant in the model. The behavior of DLH (Figure 7a) is highly related to the dormant season. The first harvests are more influenced by DLH than the later harvests; therefore, lower yield accumulation rates are expected as the days since the past harvest increase. The response of alfalfa hay yield to water (Figure 7b) exhibits a nearly logistic behavior. It is reasonable to expect this shape from this predictor variable since there will be some point at which the positive effect of the applied water on the yield will be minimal or null. The photoperiod shows a linear behavior; therefore, it does not reach a constant value within the defined range (Figure 7c), which suggests that the yield would continue increasing as the length of the day increases if there was no other limiting factor.
The occurrence of extremely low temperatures (Figure 7d) follows the same trend as in the LR model, even though the shape is not linear. Like DLH, the behavior of this curve is highly influenced by the dormant season (winter). Overall, the curve suggests a constant yield decrease when the percentage of days within the YFP with temperatures below 5 °C is over 50%.
The relative importance of the most relevant predictor variables can be assessed from the range of the y-axis in the partial dependency plots. Taking that into consideration, Water and DLH could be the predictor variables with the highest effect on alfalfa hay yield. However, based on the slope of the curves, the photoperiod ($Php_{ph}$) and temperature ($Tn_{b5}$) can have a greater effect within a very short range of their values.
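A partial dependency curve such as those discussed above can be computed for any fitted model with a few lines of code. The Python sketch below is an illustration rather than the 'pdp' package implementation; it uses the common formulation in which the predictor of interest is fixed at each grid value and the predictions are averaged over the observed values of the other predictors. `model` is a hypothetical callable that maps a predictor row to a prediction.

```python
def partial_dependence(model, X, column, grid):
    """Average prediction when one predictor is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the chosen column; keep the other predictors as observed.
        preds = [model(x[:column] + [value] + x[column + 1:]) for x in X]
        curve.append(sum(preds) / len(preds))
    return curve
```

The range of the resulting curve along the y-axis gives the overall effect size of the predictor, while its local slope shows where small changes in the predictor matter most.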
Other variables are also relevant for the estimation of alfalfa hay yield. For instance, radiation is expected to be of considerable importance in the yield formation process. However, as the data were obtained from irrigated alfalfa plots, the water consumption is proportional to the crop evapotranspiration and, thus, also proportional to the radiation amount.
Overall, both models (RF and LR) estimated alfalfa hay yield for two locations in northern Nevada with enough accuracy to be used to analyze the effects of different irrigation management scenarios on yield. The relationship found between weather, irrigation, and yield is well established and supported by prior physiological knowledge of the crop [27]. Even though the LR model showed the best performance, the weather-water-yield relationship might be non-linear under non-conventional irrigation management strategies, such as regulated deficit irrigation [38], where deficit irrigation is practiced at stages when the crop is less sensitive to water stress and avoided at stages when the crop is more sensitive to water stress. For this reason, it is important to advance our understanding of this relationship using non-linear models. The analysis performed in this work should be replicated with more samples and with different non-linear models.

Conclusions
Linear models are good approaches with which to understand and explain yield accumulation in an irrigated alfalfa crop in northern Nevada. The LR model achieved an $R^2$ of 0.854 and the RF model an $R^2$ of 0.793 for the prediction of yield. Water and the incidence of extreme temperatures are the most important weather features for the alfalfa yield accumulation process observed in northern Nevada. Having an indicator of the dormant season is important when modeling the alfalfa crop yield.
A linear model can capture and generalize the effect of water on the alfalfa crop; therefore, it is a good model to consider for a decision support system that assists the irrigation management of alfalfa.
Further work should repeat this analysis with additional data and represent the effect of the dormant season more accurately. This will lead to a better understanding of the effect of temperature and dormancy on alfalfa crop yield, and could also help to identify responses to other variables.
The analysis performed here is not, by itself, enough to establish a direct causal relationship between the studied variables and crop yield. Physiological studies are necessary to support relationships that, so far, are only suggested by data and models.

Figure 1. (a) Locations of Valley Road Field Laboratory (VRFL) and Nevada Agricultural Research Station (NARS), where the data used to train the linear regression (LR) and random forest (RF) models were collected. Climographs for both locations were generated using weather data for the study period in each location: 2020-2021 for VRFL (b) and 1973-1981 for NARS (c).

Figure 2. Estimated yield values obtained by the LR model vs. observed yield: (a) Fitted model using the log of the mean daily increment in yield. (b) Yield at harvest calculated using the fitted model. Cut label corresponds to different harvests throughout the season. Red line is the 1:1 line.
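The step from panel (a) to panel (b) in Figure 2 can be sketched as a back-transformation. This is a minimal illustration under a stated assumption: that the mean daily increment equals yield at harvest divided by the number of days in the regrowth period, so the harvest yield is recovered by exponentiating the prediction and multiplying by the days since the previous harvest. The function names, units, and numbers are hypothetical, and no correction for back-transformation bias is applied.

```python
import math

def yield_at_harvest(log_daily_increment, days_since_last_harvest):
    """Convert a predicted log mean daily yield increment into yield at harvest.

    Assumes (hypothetically) mean daily increment = yield / days in the period.
    """
    return math.exp(log_daily_increment) * days_since_last_harvest

def seasonal_yield(cuts):
    """Sum the harvest yields of all cuts; `cuts` holds (log_increment, days) pairs."""
    return sum(yield_at_harvest(log_inc, days) for log_inc, days in cuts)

# Invented example: three cuts with different growth rates and regrowth periods.
example_cuts = [
    (math.log(0.10), 30),  # 0.10 yield units per day over 30 days
    (math.log(0.08), 35),
    (math.log(0.05), 40),
]
print(seasonal_yield(example_cuts))
```

Summing per-cut estimates in this way corresponds to the whole-season comparison shown in Figure 5.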

Figure 3. Effect of the occurrence of extreme high temperatures interacting with the occurrence of extreme low temperatures through the yield formation period. The effect of the occurrence of extreme high temperatures increases as the occurrence of extreme low temperatures increases.
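The interaction described in Figure 3 can be read as a marginal effect in a linear model with a product term. The sketch below assumes a generic form y = b0 + b_hot * hot + b_cold * cold + b_int * hot * cold, in which the effect of one additional hot day is b_hot + b_int * cold. The coefficient values are invented for illustration and are not the fitted coefficients reported in Table 2.

```python
def hot_days_effect(b_hot, b_interaction, cold_days):
    """Marginal effect of one extra hot day, given the number of cold days.

    Derivative of b_hot*hot + b_interaction*hot*cold with respect to `hot`.
    """
    return b_hot + b_interaction * cold_days

# Hypothetical coefficients (NOT taken from the paper).
B_HOT = -0.010
B_INT = -0.002

for cold in (0, 5, 10):
    print(cold, hot_days_effect(B_HOT, B_INT, cold))
```

With these invented signs, the magnitude of the hot-day effect grows as the number of cold days grows, which is the pattern the caption describes.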

Figure 4. The estimated yield obtained from the LR model vs. observed yield for Valley Road. Cut label corresponds to different harvests. The treatment label matches different irrigation treatments (100%, 80%, and 60% of soil water depletion replenishment to field capacity). Red line is the 1:1 line.

Figure 5. Observed yield vs. estimated yield obtained using the LR model for the entire season for both locations. Red line is the 1:1 line.

R² for the RF model's fitted response variable was 0.897 for the training dataset and 0.861 for the testing dataset. R² for the estimation of yield at harvest was 0.793 (Figure 6). Similar metrics on the training and testing datasets indicate that the model generalizes well. Based on the R², the RF model did not outperform the simpler LR model. Furthermore, when calculating R² separately for Valley Road (2021 season) and NARS (1974-1981 seasons), we obtained values of 0.738 and 0.789, respectively. These values show that the model did not overfit towards the most abundant season in the dataset (2021).
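The per-location check described above can be sketched as follows. This is a minimal illustration that assumes R² is defined as 1 - SS_res / SS_tot (the paper may have used a different definition, such as the squared correlation). The site labels and all observed and predicted values are invented and do not come from the study's dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def r_squared_by_group(records):
    """Compute R² separately per group; `records` holds (group, observed, predicted)."""
    grouped = {}
    for group, obs, pred in records:
        obs_list, pred_list = grouped.setdefault(group, ([], []))
        obs_list.append(obs)
        pred_list.append(pred)
    return {g: r_squared(o, p) for g, (o, p) in grouped.items()}

# Invented records: (site, observed yield, estimated yield).
records = [
    ("VRFL", 1.0, 1.0), ("VRFL", 2.0, 2.0), ("VRFL", 3.0, 4.0),
    ("NARS", 2.0, 2.0), ("NARS", 4.0, 5.0), ("NARS", 6.0, 6.0),
]
per_site = r_squared_by_group(records)
print(per_site)
```

Comparable values across groups, as in the two figures reported above, suggest that the fit is not dominated by the group with the most observations.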

Figure 6. Observed yield vs. estimated yield obtained using the RF model for every cut in both locations. Red line is the 1:1 line.

Figure 7. Partial dependency plots for four of the most relevant predictor variables identified for the RF model: (a) Days from past harvest; (b) Mean applied water; (c) Photoperiod at previous harvest; (d) Days with temperatures below 5 °C. The black line indicates the actual RF output. The blue line is a smooth line fitted to the RF output.

Table 1. Weather and water management indicators used in the analysis.

Table 2. Resulting coefficients for the LR model.
