Prediction of Crop Yield for New Mexico Based on Climate and Remote Sensing Data for the 1920–2019 Period

: Agricultural production systems in New Mexico (NM) are under increased pressure due to climate change, drought, increased temperature, and variable precipitation, which can affect crop yields, feeds, and livestock grazing. Developing more sustainable production systems requires long-term measurements and assessment of climate change impacts on yields, especially over such a vulnerable region. Providing accurate yield predictions plays a key role in addressing a critical sustainability gap. The goal of this study is the development of effective crop yield predictions to allow for a better-informed cropland management and future production potential, and to develop climate-smart adaptation strategies for increased food security. The objectives were to (1) identify the most important climate variables that signiﬁcantly inﬂuence and can be used to effectively predict yield, (2) evaluate the advantage of using remotely sensed data alone and in combination with climate variables for yield prediction, and (3) determine the signiﬁcance of using short compared to long historical data records for yield prediction. This study focused on yield prediction for corn, sorghum, alfalfa, and wheat using climate and remotely sensed data for the 1920–2019 period. The results indicated that the use of normalized difference vegetation index (NDVI) alone is less accurate in predicting crop yields. The combination of climate and NDVI variables provided better predictions compared to the use of NDVI only to predict wheat, sorghum, and corn yields. However, the use of a climate only model performed better in predicting alfalfa yield. Yield predictions can be more accurate with the use of shorter data periods that are based on region-speciﬁc trends. The identiﬁcation of the most important climate variables and accurate yield prediction pertaining to New Mexico’s agricultural systems can aid the state in developing climate change mitigation and adaptation strategies to enhance the sustainability of these systems.


Introduction
The need for an increased food security has been the main global concern to meet the demand for a growing population that is expected to reach about 10 billion by 2050 [1]. However, crop yield, and thus the sustainability of agricultural production systems, is confronted by a range of factors that include agronomic practice (e.g., farming technologies, fertilizer applications, and irrigation methods), extreme weather events, limited water supply, variable environmental and economic conditions, among others [2][3][4]. Globally, it is evident that the accelerated pace of climate change can stress these systems beyond their ability to adapt [5][6][7]. Particularly, rising temperature, increased precipitation variability, and persistent and prolonged droughts have become the major sustainability challenges in regions prone to these conditions, such as the Southwestern United States (US), including New Mexico (NM) state [7][8][9][10]. With an increased sense of urgency, prediction of crop yield over New Mexico is critically needed, as recent projections indicated that the state would The study area covers the entire state of New Mexico in southwestern US ( Figure 1). NM's main forage crops include alfalfa, wheat, corn, and sorghum, which are mostly used as feeds and pastures for livestock and wildlife. NM's agriculture systems are characterized by less diverse crop production due to its normally dry climatic conditions and it is expected to get even warmer and significantly drier [49]. New Mexico is the sixth-fastest-warming state in the US [11], with an average annual temperature that showed an increasing trend, with a 0.6 • F increase per decade since 1970 or about 2.7 • F over 45 years [50]. NM has relatively low precipitation amounts, as it is among the five driest states in the US [51][52][53]. To predict crop yield, this study used three different datasets that include climate variables, remote-sensing-based NDVI, and crop yield.
remote-sensing-based NDVI were used to develop regression models for crop yield prediction in New Mexico. A piecewise approach was also used to evaluate observed shifts in crop yields over the past 100 years of available record. Evaluating climate and remote sensing variables and averaging periods can aid in accurate yield prediction and further improve our understanding of climate change impacts on the sustainability of agriculture systems [4].

Study Area
The study area covers the entire state of New Mexico in southwestern US ( Figure 1). NM's main forage crops include alfalfa, wheat, corn, and sorghum, which are mostly used as feeds and pastures for livestock and wildlife. NM's agriculture systems are characterized by less diverse crop production due to its normally dry climatic conditions and it is expected to get even warmer and significantly drier [49]. New Mexico is the sixth-fastestwarming state in the US [11], with an average annual temperature that showed an increasing trend, with a 0.6°F increase per decade since 1970 or about 2.7°F over 45 years [50]. NM has relatively low precipitation amounts, as it is among the five driest states in the US [51][52][53]. To predict crop yield, this study used three different datasets that include climate variables, remote-sensing-based NDVI, and crop yield.

Climate Variables
The climate data used in this analysis include annual minimum, maximum, and mean temperature (°C) (referred to as Tmean, Tmin, and Tmax, respectively), precipitation (P) (mm), and minimum and maximum vapor-pressure deficit (hPa) (referred to as VPDmin and VPDmax, respectively) from 1920 to 2019. The data were obtained from NOAA-NCEI climate database [50]. These variables were averaged over the water year and growing season. The total precipitation for the annual water year and growing season were referred to as Pwy and Pgs, the growing season mean, minimum, and maximum average tem-

Climate Variables
The climate data used in this analysis include annual minimum, maximum, and mean temperature ( • C) (referred to as T mean , T min , and T max , respectively), precipitation (P) (mm), and minimum and maximum vapor-pressure deficit (hPa) (referred to as VPD min and VPD max , respectively) from 1920 to 2019. The data were obtained from NOAA-NCEI climate database [50]. These variables were averaged over the water year and growing season. The total precipitation for the annual water year and growing season were referred to as P wy and P gs , the growing season mean, minimum, and maximum average temperature were referred to as T meangs , T mings , and T maxgs , and growing season average minimum and maximum VPD referred to as VPD mings and VPD maxgs , respectively. Figure 2 shows the trend of the selected climate variables from 1920 to 2019.
Land 2021, 10, x FOR PEER REVIEW 5 of 29 perature were referred to as Tmeangs, Tmings, and Tmaxgs, and growing season average minimum and maximum VPD referred to as VPDmings and VPDmaxgs, respectively. Figure 2 shows the trend of the selected climate variables from 1920 to 2019.

Remote Sensing Data
Remotely sensed NDVI can be used to describe crop growth conditions (i.e., greenness and leaf area index) as it uses the near-IR and red bands of multispectral images, such as those from Landsat sensors as (NIR − Red)/(NIR + Red). The NDVI ranges from −1.0 to 1.0. Landsat 32-day NDVI were obtained from all Landsat sensors that were acquired in each 32-day period between the 1st and 352 nd day of each year [56]. Landsat [57,58]. A range of NDVI values from 0.25 to 0.85 was selected within the crop-type masks to exclude bare lands from other vegetation. In addition to Landsat 32-day, Landsat 8-day NDVI product was also used to derive the relationship of alfalfa yield and NDVI. The growing season of annual crops can be easily described based on the planting and harvesting dates. However, the growing season of perennial crops, such as alfalfa, is complex, with variable harvesting schedules, varieties, and water availability, which might make it challenging to establish a relationship between NDVI and yield.

Crop Yield Data
Based on crop area statistics of 2017, alfalfa, corn, sorghum, and winter wheat (referred to herein as wheat) represented ~47% of the total harvested area in NM. These crops are used as feeds for livestock, except wheat, which is also consumed as food grain. Yield data were obtained from the USDA NASS for the 1920-2019 period ( Figure 3). The water year for NM spans from October to September [59]. The planting and harvesting dates of

Remote Sensing Data
Remotely sensed NDVI can be used to describe crop growth conditions (i.e., greenness and leaf area index) as it uses the near-IR and red bands of multispectral images, such as those from Landsat sensors as (NIR − Red)/(NIR + Red). The NDVI ranges from −1.0 to 1.0. Landsat 32-day NDVI were obtained from all Landsat sensors that were acquired in each 32-day period between the 1st and 352nd day of each year [56].
Landsat 32-day NDVI composites of 30-m spatial resolution from 1984 to 2016 were used to derive monthly crop-specific NDVI values for each crop type based on a mask crop map generated from the Cropland Data Layers (CDL) (https://nassgeodata.gmu. edu/CropScape/, accessed 1 December 2021). For NM, the CDL data are available from 2008 to present (https://www.nass.usda.gov/Research_and_Science/Cropland/Release/, accessed 1 December 2021). Because the crop type information prior to 2008 was not available, two nominal crop-type masks for the periods of 1984-2008 and 2009-2016 were used to assess crop-specific NDVI. Google Earth Engine (GEE) platform was used in this process. The different crop development stages during a growing season exhibit specific NDVI values and can effectively be used to distinguish between crops [57,58]. A range of NDVI values from 0.25 to 0.85 was selected within the crop-type masks to exclude bare lands from other vegetation. In addition to Landsat 32-day, Landsat 8-day NDVI product was also used to derive the relationship of alfalfa yield and NDVI. The growing season of annual crops can be easily described based on the planting and harvesting dates. However, the growing season of perennial crops, such as alfalfa, is complex, with variable harvesting schedules, varieties, and water availability, which might make it challenging to establish a relationship between NDVI and yield.

Crop Yield Data
Based on crop area statistics of 2017, alfalfa, corn, sorghum, and winter wheat (referred to herein as wheat) represented~47% of the total harvested area in NM. These crops are used as feeds for livestock, except wheat, which is also consumed as food grain. Yield data were obtained from the USDA NASS for the 1920-2019 period ( Figure 3). The water year for NM spans from October to September [59]. The planting and harvesting dates of selected crops were collected from the USDA NASS to identify the growing seasons [60] (Table 1).

Crop Yield Prediction Approach
This study followed two main analysis steps. First, a correlation analysis was performed to evaluate the colinear relationships between the predictors (i.e., climate and NDVI) and yield. Second, a stepwise modeling approach along with different forms of regression models (e.g., linear, polynomial, and spline cubic) were used to evaluate their ability to accurately predict crop yield.

Correlation Analysis
A parametric correlation analysis was performed to evaluate the collinearity between all predictors (i.e., variables) and crop yield to identify the most effective ones in developing robust prediction models. Pearson's correlation coefficient (R) was used to evaluate the linear dependencies between two pairs of variables. A correlation close to 0 indicates that the two variables are independent of each other-a change in a variable does not reflect a relative change in the other variable. The Pearson's correlation coefficient was used to evaluate correlation between all potential variables (about 13) and yield individually. The correlation between all potential variables was evaluated using a combination of Pearson's correlation coefficient and principal component analysis (PCA) [61][62][63]. The PCA was used to select the most significant variable based on the variance explained by the principal components, and the corresponding R between them was used to select those that are less correlated with each other and simultaneously correlated with yield. The statistical software R was used to conduct the PCA and Pearson's correlation. The identified variables were then evaluated individually and in combination to develop the most effective regression models for yield prediction. This approach of selecting important climate variables followed what is generally referred to as the stepwise method.

Crop Yield Prediction Approach
This study followed two main analysis steps. First, a correlation analysis was performed to evaluate the colinear relationships between the predictors (i.e., climate and NDVI) and yield. Second, a stepwise modeling approach along with different forms of regression models (e.g., linear, polynomial, and spline cubic) were used to evaluate their ability to accurately predict crop yield.

Correlation Analysis
A parametric correlation analysis was performed to evaluate the collinearity between all predictors (i.e., variables) and crop yield to identify the most effective ones in developing robust prediction models. Pearson's correlation coefficient (R) was used to evaluate the linear dependencies between two pairs of variables. A correlation close to 0 indicates that the two variables are independent of each other-a change in a variable does not reflect a relative change in the other variable. The Pearson's correlation coefficient was used to evaluate correlation between all potential variables (about 13) and yield individually. The correlation between all potential variables was evaluated using a combination of Pearson's correlation coefficient and principal component analysis (PCA) [61][62][63]. The PCA was used to select the most significant variable based on the variance explained by the principal components, and the corresponding R between them was used to select those that are less correlated with each other and simultaneously correlated with yield. The statistical software R was used to conduct the PCA and Pearson's correlation. The identified variables were then evaluated individually and in combination to develop the most effective regression models for yield prediction. This approach of selecting important climate variables followed what is generally referred to as the stepwise method.

Regression Models
The linear regression models used to predict yield based on single or multiple predictors [64,65] followed a typical formula (Equation (1)).
where y i represents crop yield, β i is the regression coefficient of variable (predictor) x i , and n is the total number of variables used. Multiple linear regression models were developed following two different approaches based on the available period of record by using (1) the entire historic record (100 years) and (2) piecewise segmented periods of records.

Long Time Period Regression
The simplest model included a single predictor (i.e., variable). Additional variables were included in a model based on the order of their correlation with each other and with crop yield. These models were also evaluated using different averaging periods for climate variables (i.e., the growing season and water year periods) and months for NDVI. All models' combinations were also evaluated using only climate, NDVI, or combinations of climate and NDVI variables.

Piecewise Regression
The crop yield trend, growth rates, and variability can be categorized into four groups based on crop yield patterns that indicate conditions when yield never improved, stagnated, collapsed, or was still increasing [66]. Widespread decadal-scale changes (denoted herein as breaks) have been observed in yield variability and growth rates since the 1930s that also substantially varied regionally [67]. Some examples of decadal-scale yield trends and variability of selected crops shown in Table 2 can be used to identify the breaks in crop yield patterns. These breaks in yield patterns can be used to guide the development of an effective regression model. Thus, previous studies have focused on shorter periods [66,68]. However, the choice of time period for statistical analysis of crop yield variability can impact any conclusions that can be drawn about the trends in yield variability during the 20th century, as indicated by [67]. Due to the appearance of significant yield variability over specific periods of time, yield data were segmented using these observed break points (patterns) in the yield of each crop ( Table 2). These break points can be caused by a number of factors, such as farming technology, irrigation, and management policies, that are beyond the scope of this analysis to identify. For the segmented periods of time, an alternative and more appropriate regression analysis (e.g., piecewise linear regression) was used to fit shorter data periods to different linear trends in crop yield. A typical regression equation can be seen in the form of Equation (2) as: where k 1 is the x location of the first break point, k 2 is the x location of the second break point, and so forth, until the last break point k nb for nb number of break points; y(x) is the crop yield, in tons per hectare; β i is the regression coefficient of variable x i ; and the independent error terms ε i follow a normal distribution with mean 0 and equal variance.

Models Evaluation
The goodness of fit of the developed models was evaluated using the coefficient of determination (R 2 ) as an indication of the explained variance, the root mean square error (RMSE) as a measure of prediction error, and the modified index of agreement (d) [75]. The index of agreement is a standardized measure of the degree of model prediction error, which varies between 0 and 1 and represents the ratio of the mean square error and the potential error. The agreement value of 1 indicates a perfect match, and 0 indicates no agreement at all [76]. The correlation relationships were arranged in decreasing order of correlation coefficients and helped in selecting the variables that can be used to develop the multiple regression models. Ranking from high to low was used to select the first important variable for yield prediction, followed by a stepwise inclusion of the second, third, and fourth important climate variables in developing multiple regression models. The first selected variable was treated as the most influential one for yield ( Figure 4a) for each crop type and was first used individually. The models with the best yield prediction were subsequently used in piecewise analysis.

Climate Variables, NDVI, and Crop Yield
A summary of the results of the correlation analysis between crop yield and climate variables averaged over the growing season and water year is shown in Figure 4a. The climate variables showed varying effects on yield depending on the averaging period. The maximum temperature averaged over the growing season (Tmaxgs) was negatively correlated (R = −0.09, p < 0.05) with sorghum yield. As expected, the precipitation amounts during the growing season, water year, and calendar year showed pronounced positive correlation with all the crops. The minimum vapor pressure deficit of the growing season (VPDmings) was highly correlated with all crop yields. Figure 4b indicated that monthly NDVI, with varying months depending on crop type, was strongly correlated with corn yield (R = 0.5, p = 0.0001) during the growing season. The May NDVI showed the highest correlation with sorghum (R = 0.16, p < 0.05), while that of July and August showed the highest correlation with corn (R = 0.52, p < 0.05) and wheat (R = 0.31, p < 0.05).
The climate variables and monthly NDVI were ranked from high to low based on the significance of correlation with yield using R. The climate variables that were significantly correlated with wheat included Pwy, Tmings, Pgs, VPDmings, VPDmin, and P (with R of 0.36, 0.3, 0.29, 0.28, 0.27, and 0.25); corn included VPDmings, Tmin, Tmean, and VPDmin (with R of 0.63, 0.54, 0.51, and 0.44); sorghum included VPDmings, VPDmin, Pwy, P, and Pgs (with R of 0.28, 0.23, 0.22, 0.21, and 0.21); and alfalfa included VPDmings, Tmean, Tmin, Tmax, Tmeangs, and VPDmin (with R of 0.43, 0.39, 0.38, 0.33, 0.31, and 0.31), respectively. These climate variables were used individually to develop the regression models (see Section 3.2). The value of May, July, and August NDVI, which are referred to herein as NDVI5, NDVI7, and NDVI8, were significantly correlated with the yield of sorghum, corn, and wheat, respectively.  Similarly, the most influential monthly NDVI was selected for constructing climateand NDVI-based regression models. The most important monthly NDVI was used in combination with climate variables and compared with climate only regression models.

Climate Variables, NDVI, and Crop Yield
A summary of the results of the correlation analysis between crop yield and climate variables averaged over the growing season and water year is shown in Figure 4a. The climate variables showed varying effects on yield depending on the averaging period. The maximum temperature averaged over the growing season (T maxgs ) was negatively correlated (R = −0.09, p < 0.05) with sorghum yield. As expected, the precipitation amounts during the growing season, water year, and calendar year showed pronounced positive correlation with all the crops. The minimum vapor pressure deficit of the growing season (VPD mings ) was highly correlated with all crop yields. Figure 4b indicated that monthly NDVI, with varying months depending on crop type, was strongly correlated with corn yield (R = 0.5, p = 0.0001) during the growing season. The May NDVI showed the highest correlation with sorghum (R = 0.16, p < 0.05), while that of July and August showed the highest correlation with corn (R = 0.52, p < 0.05) and wheat (R = 0.31, p < 0.05). The climate variables and monthly NDVI were ranked from high to low based on the significance of correlation with yield using R. The climate variables that were significantly correlated with wheat included P wy , T mings , P gs , VPD mings , VPD min , and P (with R of 0.36, 0.3, 0.29, 0.28, 0.27, and 0.25); corn included VPD mings , T min , T mean , and VPD min (with R of 0.63, 0.54, 0.51, and 0.44); sorghum included VPD mings , VPD min , P wy , P, and P gs (with R of 0.28, 0.23, 0.22, 0.21, and 0.21); and alfalfa included VPD mings , T mean , T min , T max , T meangs , and VPD min (with R of 0.43, 0.39, 0.38, 0.33, 0.31, and 0.31), respectively. These climate variables were used individually to develop the regression models (see Section 3.2). The value of May, July, and August NDVI, which are referred to herein as NDVI5, NDVI7, and NDVI8, were significantly correlated with the yield of sorghum, corn, and wheat, respectively.
The correlation analysis of alfalfa yield with 8-day NDVI values was also performed within the growing season. A significant correlation could not be established due to the complexity of harvesting and planting dates, alfalfa varieties, climate, water availability, and topography.

Selection of Significant Climate Variables
It appeared that four to six variables can be used to predict crop yield. Using a smaller number of variables can be more effective, especially in regions with limited data. Moreover, collinearity issues can arise when using these variables in developing multilinear regression models. The Pearson correlation coefficients between all climate variables are shown in Figure 5, with dark blue and dark red colors indicating strong positive and negative correlation, respectively. The white background and light color shades indicate no and relatively weak correlation, respectively, between the variables and, therefore, they were used in combination with each other to develop prediction. The identified variables for corn include VPD mings , Tmings, and P wy , with Pearson correlation of 0.63, −0.05, and 0.18 ( Figure 4); those for wheat include P wy , T mings , and VPD mings , with Pearson correlation of 0.36, 0.06, and 0.19 ( Figure 4); those for sorghum include VPD mings and P wy , with Pearson correlation of 0.28 and 0.17 ( Figure 4); and those for alfalfa include VPD mings , P wy , and T mings , with Pearson correlation of 0.43, 0.17, and 0.29, respectively ( Figure 4).

Models Based on Climate Variables
The relationships between yield and climate variables were evaluated using three different regression models (i.e., linear, polynomial, and spline cubic). However, the regression coefficients of linear regression models were the only ones that showed significant correlations. Consequently, only the results of linear regression models are presented and discussed further. The developed linear regression models are referred to herein as C1, S1, W1, and A1 for corn, sorghum, wheat, and alfalfa, respectively, with the letters referring to the crops and number 1 referring to the order of the model in the list of evaluated models. The list of all evaluated regression models is shown in Table A1 (Appendix A). The total number of models that were evaluated for each crop was 10, 11, 16, and 9 for sorghum, corn, alfalfa, and wheat, respectively. The models with the highest R 2 and d, and lowest RMSE were S8, C14, A15, and W8, as summarized in Table 3. For example, the 10 models that were evaluated for sorghum (i.e., S1-S10) indicated that model S8 was the best one, with the highest R 2 of 0.81 (p-value < 0.05), d of 0.35, and the lowest RMSE of 1.18 tons/ha using two climate variables that include the VPD mings and either P gs , P, or P wy . In other words, model S8 suggested that using VPD min (averaged over the growing season) along with precipitation (averaged either over growing season, water year, or entire year) can be more effective in predicting sorghum yield. The results indicated that VPD min and mean temperature T mean , both averaged over the growing season, were more effective in predicting corn yield. For both alfalfa and wheat, it appeared that VPD min and T min (both averaged over the growing season), and precipitation (averaged over the water year) were more effective in predicting their yields. and relatively weak correlation, respectively, between the variables and, therefore, they were used in combination with each other to develop prediction. The identified variables for corn include VPDmings, Tmings, and Pwy, with Pearson correlation of 0.63, −0.05, and 0.18 ( Figure 4); those for wheat include Pwy, Tmings, and VPDmings, with Pearson correlation of 0.36, 0.06, and 0.19 ( Figure 4); those for sorghum include VPDmings and Pwy, with Pearson correlation of 0.28 and 0.17 ( Figure 4); and those for alfalfa include VPDmings, Pwy, and Tmings, with Pearson correlation of 0.43, 0.17, and 0.29, respectively (Figure 4).

Piecewise Models
The results of all crop yield prediction models that were evaluated using segmented data periods between 1920 and 2019 are shown in Tables 4 and A2 (Appendix B). The total number of models evaluated for each crop was 6, 11, 5, and 2 for sorghum, corn, alfalfa, and wheat, respectively. Based on the R 2 , d, and RMSE, the best regression models for each segment of the data period for each crop were identified ( Table 4). The best model for the selected data segments for sorghum was ST4 (i.e., sorghum time segment model number 4) for the 1920-1956 period, with the highest R 2 of 0.92, d of 0.39, and RMSE of 0.31 tons/ha (Table A3). Likewise, the best models with the highest R 2 and lowest RMSE for all segmented data periods for the four crops are listed in Table 4.

Models Based on NDVI and Climate Variables
The combined climate and NDVI regression models were developed for the 1984-2016 period because of data availability of remote sensing based NDVI. A number of models that combined climate and NDVI variables were evaluated and summarized in Table A3 (Appendix C). Based on the obtained R 2 and RMSE, the best model for corn was CT11N8 (i.e., CT11 refers to the model for the segmented data period number 11 and N8 refers to the monthly NDVI for August). The CT11N8 model resulted in the lowest RMSE of 0.76 tons/ha and highest R 2 of 0.99 and index of agreement of 0.54 (Table 5)

Summary of Selected Models
The best regression models based either on climate only, climate and NDVI, or segmented data periods are shown in Tables 3-5, respectively. It was clear that the combined climate and NDVI models (Table 5) provided higher R 2 and d and lower RMSE compared to those based on climate only (Table 3). These results suggested that including NDVI as a predictor had the potential to improve the performance of yield prediction models. A comparison between the models that used the entire timeseries ( Table 3) and those that  (Table 4) indicated that an improved performance of yield prediction can be achieved with the latter. A summary of all three model types is provided in Table 6. Climate and NDVI models were not reported for alfalfa crop because of the obtained insignificant NDVI models. The best yield prediction models were those based on a segmented data period of climate only for all evaluated crops.

Prediction with Climate Only
Using climate only models, two sets of yield predictions were developed (i.e., based on the entire and segmented data periods). In Figure 6, the left panel shows predicted yield using the entire period of record (i.e., 1920-2019) and the right panel shows those based on segmented data periods. Yield estimates based on the entire period of record significantly overestimated and underestimated observed yields before and after the 1960s, respectively. Yield estimates based on segmented data periods provided a much better match with observed yield during the different segments.

Prediction with Climate Only
Using climate only models, two sets of yield predictions were developed (i.e., based on the entire and segmented data periods). In Figure 6, the left panel shows predicted yield using the entire period of record (i.e., 1920-2019) and the right panel shows those based on segmented data periods. Yield estimates based on the entire period of record significantly overestimated and underestimated observed yields before and after the 1960s, respectively. Yield estimates based on segmented data periods provided a much better match with observed yield during the different segments.  Figure 6. The comparison of observed corn, sorghum, wheat, and alfalfa yields with their respective predicted yields resulting from the best models using 100 years and segmented data period scales. Figure 7 shows scatter plots of predicted and observed yield derived from the models that used segmented data periods. The points are closely distributed around the 1:1 line, except for a few outliers. The scatter plot of predicted and observed yield derived from the entire data period (i.e., 1920-2019) is presented in Figure 8, which shows a wide distribution of points around the 1:1 line. Figure 7. The scatter plot of observed and predicted corn, sorghum, wheat, and alfalfa yields resulting from the best models in segmented time scales. Figure 6. The comparison of observed corn, sorghum, wheat, and alfalfa yields with their respective predicted yields resulting from the best models using 100 years and segmented data period scales. Figure 7 shows scatter plots of predicted and observed yield derived from the models that used segmented data periods. The points are closely distributed around the 1:1 line, except for a few outliers. The scatter plot of predicted and observed yield derived from the entire data period (i.e., 1920-2019) is presented in Figure 8, which shows a wide distribution of points around the 1:1 line.
Land 2021, 10, x FOR PEER REVIEW 15 of 29 Figure 6. The comparison of observed corn, sorghum, wheat, and alfalfa yields with their respective predicted yields resulting from the best models using 100 years and segmented data period scales. Figure 7 shows scatter plots of predicted and observed yield derived from the models that used segmented data periods. The points are closely distributed around the 1:1 line, except for a few outliers. The scatter plot of predicted and observed yield derived from the entire data period (i.e., 1920-2019) is presented in Figure 8, which shows a wide distribution of points around the 1:1 line.    Figure 8. The scatter plot of observed and predicted corn, sorghum, wheat, and alfalfa yields resulting from the best models in long data period from 1920 to 2019.

Prediction with Climate and NDVI variables
A comparison between observed and predicted yield of corn, sorghum, and wheat derived based on the three models' combinations (i.e., climate only, climate and NDVI, and NDVI only) is shown in Figure 9. The period used to develop these predictions was from 1984 to 2016. This comparison indicated that the models listed in Table 4 (i.e., based on segmented data periods) provided better predictions compared to the others. The scatter plots of predicted and observed yield derived from climate only, climate and NDVI, and NDVI only for the 1984-2016 period are presented in Figure 10, which shows a wide distribution of points around the 1:1 line. Figure 8. The scatter plot of observed and predicted corn, sorghum, wheat, and alfalfa yields resulting from the best models in long data period from 1920 to 2019.

Prediction with Climate and NDVI Variables
A comparison between observed and predicted yield of corn, sorghum, and wheat derived based on the three models' combinations (i.e., climate only, climate and NDVI, and NDVI only) is shown in Figure 9. The period used to develop these predictions was from 1984 to 2016. This comparison indicated that the models listed in Table 4 (i.e., based on segmented data periods) provided better predictions compared to the others. The scatter plots of predicted and observed yield derived from climate only, climate and NDVI, and NDVI only for the 1984-2016 period are presented in Figure 10, which shows a wide distribution of points around the 1:1 line.

Crop Yield and Climate
Understanding the changes in growing season characteristics, including length, planting, and harvesting dates, is critical in evaluating crop yields. Crop growth, and thus yield, can be enhanced or suppressed by the environmental conditions that prevail prior to and during the growing, such as excessive heat, freeze, or limited water availability, Figure 10. The scatter plot of observed and predicted corn, sorghum, and wheat yields resulting from the best climate only, climate and NDVI, and NDVI only models (1984 to 2016).

Crop Yield and Climate
Understanding the changes in growing season characteristics, including length, planting, and harvesting dates, is critical in evaluating crop yields. Crop growth, and thus yield, can be enhanced or suppressed by the environmental conditions that prevail prior to and during the growing, such as excessive heat, freeze, or limited water availability, that can cause delay in planting dates and stress plant growth. Thus, careful selection of averaging period of climate variables plays a key role in capturing the variability of these conditions and developing effective yield prediction models [41]. While many studies used growing season [24,[77][78][79][80][81][82][83] and monthly averages of climate variables [84,85] as appropriate averaging periods, this study evaluated three averaging periods that include annual (i.e., January-December), growing season, and water year. The results indicated that the total amount of precipitation received during the three averaging periods provided similar prediction quality for sorghum yield. The total precipitation in a water year, and growing season and annual averaged vapor-pressure deficit were similarly significant predictors of wheat yield. Vapor-pressure deficit (VPD) averaged over the growing season followed by temperature were significant predictors of corn and alfalfa yield. Minimum VPD averaged over the entire year and growing season and total precipitation of all the three averaging periods were significant predictors of sorghum yield. Moreover, VPD max for all the crops; temperature for sorghum; and precipitation and VPD max for alfalfa were significant predictors of yield. As expected, different averaging periods and climate variables were significant predictors of yield depending on the agronomic and environmental factors.
One of the identified climate change impacts on crops is related to the observed and projected impacts on agronomic conditions and yield [10]. For instance, increased temperature has a significant impact on grain yield than on vegetative growth because of the increased minimum temperatures. These effects are evident with the observed increased rate of senescence, which reduces the ability of crops to efficiently fill the grain [38]. Moreover, increased temperature can result in heat stress for some crops (e.g., corn and wheat) and modest boost in yields depending on the region (wheat yield in northern China). However, globally, increased temperature has shown an overall decline in crop yields [86]. Increased nighttime temperatures during the grain production period can result in lower productivity and reduced quality. During pollination growing stage, increased temperature plays a critical role in grain crop development. The exposure to such temperatures at the onset of the reproductive stage can reduce grain production [7]. High VPD affects evapotranspiration, photosynthesis, nutrient uptake, and almost all plant processes, such as allocation, use of carbohydrate reserves, and growth. Compared to temperature and VPD, precipitation did not have a significant role in alfalfa yield prediction. Precipitation (during a water year) had a significant effect on wheat and sorghum. Increased wheat yield can be attributed to current trends of winter precipitation and temperature, while increased summer precipitation and maximum temperature adversely affected corn yield [87]. Prediction of wheat yield was highly influenced by water year precipitation. Winter wheat yield was not much correlated with the maximum temperature, as it is grown during the winter season, while minimum temperature has more effects. Thus, if minimum temperature changes in the future over NM, negative impacts on wheat yield would be realized.

Segmeneted Data and Yield Prediction
Yield prediction models that were developed in this study using a long period of records from 1920 to 2019 resulted in insignificant predictions ( Figure 6)-indicating their inability to depict long-term variability. The models intrinsically assumed that the observed trend in yields is solely due to environmental and related climate change impacts. However, the long timeseries of yields exhibited breakpoints (about 30-40 years) that were possibly due not only to climate change, but also some socioeconomic factors-which pose additional modeling challenges. These breakpoints in yield trend appeared historically in concurrence with the introduction of advanced technology, policy and management decisions, increased demand use of hybrid varieties, or changes in import-export chains based on market demands [67,[69][70][71][72][73][74].
Due to rapid adoption of hybrid corn in the late 1930s, a significant improvement (referred to as a miracle in some cases) was observed in grain yield from 1937 through 1955 [88]. Again, a second miracle of corn grain yield improvement was observed in the mid-1950s due to continued improvements in genetic yield potential and stress tolerance, increased adoption of nitrogen fertilizer, chemical pesticides, agricultural mechanization, and overall improved soil and crop management practices [88]. Lastly, a third corn grain yield improvement was observed in the mid-1990s with the advent and rapid adoption of transgenic hybrid traits (insect resistance, herbicide resistance). Leath et al. [89] mentioned an increased world demand that was responsible for nearly tripled corn production between 1950 and 1979. These significant improvements in the corn yield trend indicated three breaks in the corn yield trend from 1920 to 2019.
Jackson et al. [90] indicated that the development of the feedlot industry in the southwest U.S. has led to the need to increase the production of a major feed crop-sorghum as, prior to the 1950s, sorghum yields were relatively low. However, the adoption of hybrid varieties and new technologies coincided with a sharp rise in average yields during the late 60s and 70s. The increased trend in sorghum yield leveled off after 1970 due to a shift in production from irrigated sorghum to corn, low grain prices, and rising costs [90]. Another socioeconomic factor that had an impact on sorghum yield was related to the establishment of the United Sorghum Checkoff Program (USCP) in 2008, with its mission to increase yields through investment in research and increase the demand using marketing and promotion programs [69]. The observed historical increasing trends and leveling off in sorghum yield were concurrent with the two identified breaks in yield.
Wheat production has shown a relatively smaller increasing trend over the past quarter century compared to corn and sorghum, mainly because its rising yields have offset the decline in harvested areas that was observed since the 1990s. Harvested areas of wheat have dropped from their early 1980s highs, due mostly to declining economic returns relative to other crops and cropping choice flexibility provided under a number of government programs, such as the authorization of the Conservation Reserve Program (CRP) in the 1985 Farm Act and the planting flexibility provisions in the 1990 Farm Act [73]. These programs allowed wheat farmers to pursue the cultivation of other crops. Because of these two policies, the historical wheat yield data were segmented by one breakpoint in this analysis.
The observed historical increasing trend in alfalfa can be explained by an increase in regional market demand for forage crops due to an expanding dairy industry and its ability to compete with other profitable crops [91,92]. The dramatic increase in the dairy industry in NM that triggered the need for increased forage crop production, such as alfalfa, has been met with a number of forage production challenges, including limited water supplies and associated water costs, and the potential replacement of alfalfa with higher value crops that have been supported by government subsidy programs [91]. The expansion of the dairy industry has seen a significant increase since the 1970s, as the herd size of dairy cows more than doubled by 2008 and continues to increase [74]. Consistent with the expansion of the dairy industry, alfalfa yield nearly doubled since 1920, with an observed increasing trend that started in the 1950-1960s but has stabilized in recent years of the 21st century. Therefore, alfalfa historical yield trend was segmented into three separate data periods.
In contrast to other studies that used recent 30+ years of data [44], this analysis used 100 years of record, during which there was considerable interannual variability in yield that was not entirely due to environmental variables. Using the data segments for the recent past 35 years, as shown in this analysis, allowed for this interannual variability to be accounted for and integrated already matured agricultural production practices in the region (i.e., irrigation, technology adoption, varieties)-resulting in more accurate predictions using climate only variables. Therefore, models based on recent data segments should be used for future predictions.

Remote Sensing and Yield Prediction
The findings of this study agreed with those that indicated the advantages of using remote sensing data in combination with climate variables [93] for improved yield prediction because it allows growth variability during critical agronomic stages, and consequently yield, to be efficiently captured (Table 5 and Figure 8). These stages are those that are most sensitive to climate variability [38,[94][95][96][97]. Thus, it is important to select NDVI (or other similar remotely sensed data) timing that effectively corresponds with the development, reproduction, and grain filling stages. For example, after physiological maturity, corn leaves start to senescence, at which point NDVI declines, thus affecting its effectiveness in yield estimation. This study identified the appropriate timing of monthly NDVI that is effective in yield prediction for the three major crops in New Mexico. Based on evaluating the different months during a growing season (between April and September) for annual crops (corn, sorghum, and wheat), the study showed that monthly NDVI (32-day average) was able to capture the variations in short development stages (e.g., flowering), along with the effects of climate variability on crop yield with low RMSE and high R 2 . It was challenging to develop an effective prediction model for alfalfa yield. This is because alfalfa is a perennial crop with multiple harvesting and planting dates; there are different varieties grown in different parts (counties) of NM with asynchronous planting and harvesting dates; NDVI values over specific alfalfa fields can have mixed signals from different cuts; and the timing of the 8-day NDVI can mismatch the planting and harvesting cycles (with values that are not always consistent with the same growth stage due to satellite overpass). The results can be improved by using specific harvesting information at the county level (instead of average NDVI values over crop mask) to capture the spatial and temporal variability. However, it should also be noted that crop-specific NDVI derived for nominal crop mask (Section 2.3) might not provide a pure response of individual crop and may include some spectral mixing from other crops and bare lands. Therefore, there is a need for more analysis to develop improved alfalfa yield prediction models.

Limitations and Future Directions
Statistical analysis allowed the development of empirical relationships between climate variables and crop yield and helped to identify the most effective climate variables that vary regionally. Future studies should investigate yield prediction using advanced techniques, such as machine learning algorithms. The deviation of crop yield trends from the linear response might be an indicator of nonlinear effects of climate variability on yield, as well as human factors. An improved understanding of the functional relationships between yield and climate variables is required for accurate predictions. Revealing such relationships requires comprehensive datasets that include those related to farming technology and management practices. The use of emerging new technologies, such as machine learning and deep neural networks, can allow big datasets to be synthesized and analyzed with high-performance computing [98][99][100][101], and would increase our capacity to predict crop yield more accurately [102].
The availability of high-resolution advanced remote sensing products (e.g., unmanned aerial system) can allow improvement of the selection of appropriate averaging periods of prediction variables [103]; help in capturing yield variability more effectively; and help in identifying important variables. In addition to the benefits of supplementing regression analysis with remote sensing variables, the remote sensing data have a limitation due to the nonexistence of datasets for future predictions.
The averaging periods and breakpoints identified in this study are applicable to NM. They might differ from other areas according to the region-specific yield trends [104]. Future use of averaging periods and piecewise regression analysis is only possible with the focus on the refinement of averaging periods based on crops and location. This can be challenging given the temporal diversity of the planting dates [105] and a proper knowledge about the breaks in linear trends relative to historical socioeconomic factors.

Conclusions
This study focused on developing statistical regression models to determine important climate and remotely sensed variables for crop yield prediction in New Mexico. Crop yields were best correlated by climate variable using linear regression models. The study evaluated three model combinations that included the use of climate only, climate and NDVI, and NDVI only to predict the yields for corn, sorghum, and wheat. The most effective prediction models are summarized in Table 7. The results indicated that the use of NDVI only is less effective in predicting crop yields. The combination of climate and NDVI variables provided better predictions compared to the use of NDVI only to predict wheat and corn yields. The models highlighted in bold in Table 7 are those considered appropriate for yield prediction for these crops in New Mexico. The findings of the analysis add to a vast literature that concluded that remote sensing data can provide improved results when combined with climate compared to using climate only for yield prediction. Moreover, yield predictions can be more accurate with the use of shorter data periods that are based on region-specific trends. The identification of the most important climate variables and accurate yield prediction pertaining to New Mexico's agricultural systems can aid the state in developing climate change mitigation and adaptation strategies to enhance the sustainability of these systems.

Acknowledgments:
The authors would like to thank two anonymous reviewers for their time and effort in providing valuable comments and suggestions that helped in improving this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest." The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Appendix A
The Appendix A provides a summary of evaluated regression models in an alphanumerical format, in which the letter represents crop, and the number represents the order of the model for different climate variable combinations. This table included 11 models for Land 2021, 10, 1389 21 of 27 sorghum (i.e., S1-S11), 15 models for corn (i.e., C1-C15), 16 models for alfalfa (i.e., A1-A16), and 10 models for wheat (i.e., W-1-W10).

Appendix B
Appendix B provides a list of all models evaluated using segmented data periods for each crop. The model names are presented in an alphanumerical format. The first two letters represent crop type and segmented period, and the number presents the order of the model using step by step addition of climate variables.

Appendix C
Appendix C provides a list of climate and NDVI regression models in alphanumerical format, in which the letter represents the crop type, and the number represents the order of the model, and the letter 'N' represents NDVI. The regression models include both long and segmented data period models. The models constructed for segmented data period are named as crop type, time segment, model order, and NDVI (e.g., CT11N8 is the model number 11 developed for corn segmented data period using climate and NDVI of the month August).