Yield Prediction Modeling for Sorghum–Sudangrass Hybrid Based on Climatic, Soil, and Cultivar Data in the Republic of Korea

The objective of this study was to construct a sorghum–sudangrass hybrid (SSH) yield prediction model based on climatic, soil, and cultivar information in the southern area of the Korean Peninsula. The effects of climatic factors on SSH yield were investigated at the same time. An SSH dataset (n = 105) was developed for model construction, including Dry Matter Yield (DMY, kg/ha), Seeding-Harvest Accumulated Temperature (SHaAT, °C), Seeding-Harvest Accumulated Precipitation (SHAP, mm), Seeding-Harvest Sunshine Duration (SHSD, h), Soil Suitability Score (SSS), and cultivar maturity information. Using the general linear modeling method, the SSH yield prediction model was constructed as follows: DMY = 6.5SHaAT – 4.9SHAP + 13.8SHSD – 54.4SSS – 1036.4 + Maturity. The impacts of the accumulated thermal climatic variables and accumulated precipitation during crop growth on the variance of SSH yield in this region were confirmed. The summer-concentrated precipitation in the southern area of the Korean Peninsula exceeded the suitable range of the SSH water requirement and stressed its yield production. Furthermore, to improve data quality for constructing better-fitting models, the development of a standard protocol for forage crop cultivation experiments in this region is recommended, especially given the data requirements of the big data era.


Introduction
Sorghum-sudangrass hybrid (Sorghum bicolor × Sorghum bicolor var. sudanense, SSH) is a representative summer forage crop in the southern area of the Korean Peninsula [1]. In this area, SSH is seeded from late April to May, and its growth duration generally lasts until late September or early October [1]. SSH uses sunlight and soil moisture efficiently to produce biomass and is not demanding in its edaphic requirements; high yields can be achieved with enough water and fertilizer in a thick soil layer [2]. SSH contains a high proportion of crude protein and ether extract; its contents of easily digestible cellulose and hemicellulose are high, and its lignin ratio is low [3]. These characteristics indicate that SSH is suitable for making silage for livestock, which improves digestibility substantially and subsequently raises its economic value [4,5]. At the same time, the demand for high-quality domestic forage is drawing great concern, since importing forage is costly and long-distance transportation may lead to unstable forage quality. In recent years, SSH has ranked first in cultivated area among the summer forage crops in this region [1].
As SSH is the summer forage crop with the largest cultivated area in this region, its sustainable production is considered important for the supply of high-quality forage to the Korean livestock industry. At the same time, the sustainable development of the forage production sector requires reducing the carbon footprint of production. With the development of precision agriculture in the big data era, crop yield modeling using environmental big data has been considered a useful tool for promoting agricultural management practices and optimizing cropping systems [6][7][8]. Crop models, remote sensing, and empirical modeling using statistical methods are the main tools for crop yield prediction. Crop models need numerous data items, which are hard to obtain in practice [9]. Remote sensing is suitable for yield estimation over large contiguous cultivation areas [10,11]. In the southern area of the Korean Peninsula, forage crop farms are small in scale and geographically scattered. Therefore, the statistical method was adopted for SSH yield modeling in this study.
Climatic and soil factors are considered the main environmental causes of crop yield development and variability [12,13]. Climatic factors affect the phenological process of vegetation and also determine yields [14]. Worldwide meteorological instability has been disturbing the sustainability and stability of crop production and eventually threatens the supply security of forage sources, especially from the perspective of forage importers in the context of climate change [15,16]. Against this background, weather-crop yield modeling that considers climatic factors could be an effective means of understanding agricultural productivity [17,18]. Studies on yield prediction of food crops, such as rice [19] and grain maize [20], in the southern area of the Korean Peninsula have been actively performed. Similar studies were also carried out on economic plants in this region, such as Chinese cabbage [21] and apple [22]. For forage crops cultivated in this region, yield modeling studies based on climatic data were carried out on whole crop rye [23], Italian ryegrass [24], and whole crop maize [25]. However, no research on yield modeling of forage crops cultivated in the southern area of the Korean Peninsula based on climatic, soil, and cultivar data together has been reported. Soil is a basic factor affecting crop growth and yield production. Meanwhile, many new SSH cultivars have been developed and introduced in Korea in recent years, and cultivar maturity was considered a contributor to SSH yield variance [26]. Thus, this study was conducted to perform SSH yield modeling using regional climatic, soil, and cultivar data. Besides yield prediction, weather-crop yield modeling can also be used to investigate the effects of climatic factors on SSH yield. Evaluating the impacts of the detected driving climatic factors on yield is considered helpful for enhancing the capacity of SSH cultivation to respond to fluctuating meteorological conditions.

Crop Cultivation, Climatic, and Cultivar Data
The SSH cultivation data (n = 856), including dry matter yield (DMY), seeding dates, heading dates, harvest dates, cultivar information, and cultivated locations, was collected from the nationwide forage cultivation experiments operated by the National Agricultural Cooperative Federation of Korea and National Livestock Research Institute of Korea.
To generate the climatic variables for yield modeling, raw meteorological data (daily mean temperature, daily precipitation, and daily sunshine hours) were collected based on the cultivated locations and dates in the SSH cultivation dataset. Cumulative temperature, water, and solar radiation related variables during crop growth were considered the main climatic factors affecting crop yield [27,28]. Therefore, using the raw meteorological data, three climatic variables were generated: Seeding-Harvest Accumulated Temperature (SHaAT, °C), Seeding-Harvest Sunshine Duration (SHSD, h), and Seeding-Harvest Accumulated Precipitation (SHAP, mm). The temperature related variable, SHaAT, accumulated the daily mean temperature, with 10 °C as the base temperature, from the seeding date until the harvest date [29,30]. The solar radiation related variable, SHSD, accumulated the daily sunshine hours from the seeding date until the harvest date [28]. The water related variable, SHAP, accumulated the daily precipitation from the seeding date until the harvest date [28].
Furthermore, Seeding-Harvest Mean Temperature (SHMT, °C), which is the mean of the daily mean temperature from the seeding date to the harvest date, was also calculated to describe the temperature conditions during the SSH growth period in this region.
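The generation of the four seeding-to-harvest variables can be illustrated with a minimal Python sketch. The record layout (`date`, `tmean`, `precip`, `sunshine`), the function name, and the example values are hypothetical. The text does not state whether the base temperature is subtracted from each daily mean or only used as a threshold; the sketch assumes the degree-day reading (daily mean minus 10 °C, floored at zero), which may differ from the authors' calculation.

```python
from datetime import date

def seeding_harvest_variables(daily, seeding, harvest, base_temp=10.0):
    """Accumulate daily weather records between seeding and harvest (inclusive).

    daily: list of dicts with keys 'date', 'tmean' (deg C), 'precip' (mm),
    'sunshine' (h). The key names are illustrative, not from the paper.
    """
    window = [d for d in daily if seeding <= d["date"] <= harvest]
    if not window:
        raise ValueError("no daily records between seeding and harvest")
    # Assumption: SHaAT sums the part of the daily mean above the base temperature.
    shaat = sum(max(d["tmean"] - base_temp, 0.0) for d in window)
    shap = sum(d["precip"] for d in window)
    shsd = sum(d["sunshine"] for d in window)
    shmt = sum(d["tmean"] for d in window) / len(window)
    return {"SHaAT": shaat, "SHAP": shap, "SHSD": shsd, "SHMT": shmt}

# Hypothetical four-day record; the last day falls after the harvest date.
daily_records = [
    {"date": date(2020, 5, 1), "tmean": 15.0, "precip": 0.0, "sunshine": 8.0},
    {"date": date(2020, 5, 2), "tmean": 9.0, "precip": 12.5, "sunshine": 2.0},
    {"date": date(2020, 5, 3), "tmean": 20.0, "precip": 0.0, "sunshine": 10.0},
    {"date": date(2020, 5, 4), "tmean": 25.0, "precip": 3.0, "sunshine": 9.0},
]
result = seeding_harvest_variables(daily_records, date(2020, 5, 1), date(2020, 5, 3))
print(result)
```

If the threshold reading is preferred instead, only the SHaAT line needs to change (summing `tmean` on days where it is at least `base_temp`).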
During the generation of the climatic variables, 641 data points that lacked seeding, heading, or harvest dates were eliminated. In addition, 3 data points were deleted because their DMY values were detected as outliers under the normality assumption via box plots using SPSS 24.0 (IBM Corp, Somers, NY, USA), and 107 data points with no credible cultivar maturity information were eliminated. As a result, an SSH dataset (n = 105) with yield values, 3 climatic variables (SHaAT, SHAP, and SHSD), and cultivar information (Table 1) was generated.
Table 1. Sorghum-sudangrass hybrid cultivar maturity groups.

Maturity Group Cultivars
Early-maturity
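The data screening described before Table 1 can be sketched as follows. The box-plot rule is assumed to be the usual 1.5 × IQR fence; SPSS computes its hinges slightly differently from `statistics.quantiles`, so this is an approximation and not a reproduction of the SPSS output. The DMY values are made up for illustration; only the record counts come from the text.

```python
import statistics

def boxplot_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot fence rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical DMY values (not from the SSH dataset).
dmy_values = [10, 12, 11, 13, 12, 11, 40]
mask = boxplot_outlier_mask(dmy_values)

# Record counts reported in the text: 856 raw records reduced to n = 105.
remaining = 856
for reason, dropped in [
    ("missing seeding, heading, or harvest dates", 641),
    ("DMY outliers", 3),
    ("no credible maturity information", 107),
]:
    remaining -= dropped
    print(f"after removing {dropped} ({reason}): {remaining}")
```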

Generation of Soil Suitability Score
To quantitatively measure the effects of soil physical and chemical attributes on the yield variance of SSH in the yield modeling, calculation criteria for generating the SSH soil suitability score (SSS) were developed [31,32] (Table 2). The criteria comprise the soil attributes, the level score, and the weight of each soil attribute. The soil attributes included soil physical attributes (soil texture, drainage class, slope, effective soil depth, and gravel in topsoil) and soil chemical attributes (acidity, salinity, and organic matter content). The level scores were improper (0), poor (0.5), possible (0.8), and proper (1). Each soil attribute was given a weight score.
Table 2 (partial; class ranges are listed in the order given in the source, followed by the weight):
Acidity (pH): >7.5 or <4.5; 4.5-6; 6-7.5; weight 10
Organic matter content (%): <0.5; 0.5-1.5; >1.5; weight 15
Total score: 100
According to the locations recorded in the forage cultivation dataset, the soil data were gathered from the Korean soil information system, and the following equation derived from the above criteria was used to calculate the scores:
$$\mathrm{SSS} = \sum_{k} G_k W_k$$
where $G_k$ is the level score and $W_k$ is the weight score of the kth soil attribute in Table 2. The generated soil scores were then added to the SSH dataset for yield modeling.
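The weighted-sum scoring can be illustrated with a short Python sketch. The level scores are those stated in the text. Only the weights for acidity (10) and organic matter content (15) are recoverable from Table 2; all other weights below are placeholders chosen so that the total is 100, and the attribute key names are hypothetical.

```python
LEVEL_SCORES = {"improper": 0.0, "poor": 0.5, "possible": 0.8, "proper": 1.0}

# Weights: acidity and organic_matter are from the text; the remaining six
# values are PLACEHOLDERS (assumptions) chosen only so that the sum is 100.
WEIGHTS = {
    "soil_texture": 15,
    "drainage_class": 15,
    "slope": 15,
    "effective_soil_depth": 10,
    "gravel_in_topsoil": 10,
    "acidity": 10,
    "salinity": 10,
    "organic_matter": 15,
}

def soil_suitability_score(levels, weights=WEIGHTS):
    """Return SSS = sum over attributes of level score G_k times weight W_k."""
    if set(levels) != set(weights):
        raise ValueError("levels must cover exactly the weighted attributes")
    return sum(LEVEL_SCORES[levels[k]] * w for k, w in weights.items())

# Hypothetical site: every attribute is 'proper' except organic matter ('poor').
site_levels = {name: "proper" for name in WEIGHTS}
site_levels["organic_matter"] = "poor"
sss = soil_suitability_score(site_levels)
print(sss)
```

With these placeholder weights, a site rated proper on every attribute scores 100 and a site rated improper on every attribute scores 0, which matches the total score of 100 in Table 2.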

Yield Modeling Method
Descriptive statistics of the response and explanatory variables were generated to check their distributions. In addition, the variance inflation factor (VIF) values and the Pearson's, partial, and part correlation coefficients of SHaAT, SHAP, and SHSD, obtained with the enter method of regression analysis, were calculated to detect multicollinearity among the climatic explanatory variables [33].
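As an illustration of the multicollinearity check, the VIF of each explanatory variable can be computed by regressing it on the remaining ones and taking 1 / (1 - R²). The sketch below uses NumPy and synthetic data; the analysis in this study was run in SPSS, so this is only an equivalent calculation under that standard definition, not the authors' procedure.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic example: three independent columns, then one made nearly collinear.
rng = np.random.default_rng(0)
X_indep = rng.normal(size=(200, 3))
X_coll = X_indep.copy()
X_coll[:, 2] = X_coll[:, 0] + 0.1 * rng.normal(size=200)

vif_independent = variance_inflation_factors(X_indep)
vif_collinear = variance_inflation_factors(X_coll)
print(vif_independent, vif_collinear)
```

Values close to 1 indicate little shared variance among predictors, while the nearly collinear pair in the second example produces values far above the threshold of 10 used in this study.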
The general linear regression model, including both continuous variables and a dummy variable, was used to construct the SSH yield prediction model. The equation of the general linear regression model is as follows:
$$Y = X\beta + Z\gamma + \varepsilon$$
where Y is the vector of the response variable (DMY), X is the matrix of the continuous explanatory variables (SHaAT, SHAP, SHSD, and SSS), Z is the matrix of the dummy variable, which is the cultivar maturity (early-, medium-, and late-maturities), β and γ are the corresponding coefficient vectors, and ε is the error term. The partial eta-squared of each explanatory variable in the model was calculated to evaluate its effect size.
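A minimal sketch of fitting such a model by ordinary least squares is given below, using NumPy and synthetic data. Late maturity is taken as the reference level of the dummy coding, which is consistent with the late-maturity coefficient of 0 reported in the results; the function names and the synthetic coefficients are illustrative and unrelated to the fitted SSH model.

```python
import numpy as np

MATURITY_LEVELS = ("early", "medium", "late")

def design_matrix(X_cont, maturity, reference="late"):
    """Stack intercept, continuous predictors, and maturity dummies (reference dropped)."""
    X_cont = np.asarray(X_cont, dtype=float)
    for m in maturity:
        if m not in MATURITY_LEVELS:
            raise ValueError(f"unknown maturity level: {m}")
    kept = [lvl for lvl in MATURITY_LEVELS if lvl != reference]
    dummies = np.array([[1.0 if m == lvl else 0.0 for lvl in kept] for m in maturity])
    return np.column_stack([np.ones(len(X_cont)), X_cont, dummies])

def fit_glm(X_cont, maturity, y):
    """Least-squares estimate of [intercept, continuous coefs, dummy coefs]."""
    A = design_matrix(X_cont, maturity)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

# Synthetic, noise-free data with known coefficients (illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))  # stand-ins for SHaAT, SHAP, SHSD, SSS
maturity = ["early", "medium", "late"] * 10
true_coef = np.array([2.0, 1.0, -2.0, 3.0, -4.0, 5.0, 3.0])
y = design_matrix(X, maturity) @ true_coef
coef = fit_glm(X, maturity, y)
print(np.round(coef, 6))
```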
To check the fitness of the model, residual diagnostics were performed, including the normal quantile-quantile plot (Q-Q plot) of the standardized residuals and the scatter plot of the standardized residuals against the predicted DMY values. The 3-fold cross-validation method was applied to measure the accuracy of the constructed model. In 3-fold cross-validation, the SSH dataset was randomly divided into 3 subsets. Each subset was used as the test set once, with the remaining 2 subsets used as the training set, giving 3 rounds of validation. The R-squared and the normalized root-mean-square error (NRMSE) were used as accuracy indicators. The calculation of NRMSE and the judgement criteria followed a previous study [34]. Microsoft Excel 2010 (Microsoft Corp, Redmond, WA, USA) was used to prepare the datasets, and SPSS 24.0 was used for the statistical analyses.
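The 3-fold cross-validation procedure can be sketched in Python as follows. Two points are assumptions, since the text defers to [34] for the definitions: NRMSE is taken as RMSE divided by the mean of the observed values (in percent), and R-squared is computed as 1 - SS_res / SS_tot. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def r2_and_nrmse(y_true, y_pred):
    """Return (R-squared, NRMSE in %) with NRMSE normalized by the observed mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(ss_res / len(y_true))
    return 1.0 - ss_res / ss_tot, 100.0 * rmse / y_true.mean()

def k_fold_cv(A, y, k=3, seed=0):
    """Fit least squares on k-1 folds and score on the held-out fold, k times."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    results = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        r2_fit, nrmse_fit = r2_and_nrmse(y[train], A[train] @ coef)
        r2_val, nrmse_val = r2_and_nrmse(y[test], A[test] @ coef)
        results.append({"R2_fit": r2_fit, "NRMSE_fit": nrmse_fit,
                        "R2_val": r2_val, "NRMSE_val": nrmse_val})
    return results

# Synthetic data with the same sample size as the SSH dataset (n = 105).
rng = np.random.default_rng(42)
A = np.column_stack([np.ones(105), rng.normal(size=(105, 2))])
y = A @ np.array([10.0, 2.0, -1.0]) + rng.normal(size=105)
cv_results = k_fold_cv(A, y, k=3)
for fold in cv_results:
    print({name: round(float(value), 3) for name, value in fold.items()})
```

Averaging the `*_fit` and `*_val` entries over the three folds gives summary indicators of the kind reported in the results.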

Results and Discussion
Descriptive statistics of DMY, SHaAT, SHAP, and SHSD were calculated (Table 3). The mean and median of DMY were close, and the difference between the mean and the first quartile was similar to the difference between the mean and the third quartile. DMY could therefore be considered symmetrically distributed and used as the response variable in the following general linear regression analysis. For the climatic variables used for SSH yield modeling, none was considered to have serious multicollinearity, since their VIF values were much less than 10 (Table 4) [35]. In addition, the signs of the Pearson's, partial, and part correlation coefficients of the variables were compared to further check the independence of SHaAT, SHAP, and SHSD [35]. As presented in Table 4, the sign did not change across the three correlation coefficients for any of the three variables, which further confirmed that there was no serious multicollinearity among them. Therefore, SHaAT, SHAP, and SHSD were confirmed as independent climatic variables for SSH yield prediction modeling together with SSS and cultivar maturity.
The constructed SSH yield prediction model was as follows:
$$\mathrm{DMY} = 6.5\,\mathrm{SHaAT} - 4.9\,\mathrm{SHAP} + 13.8\,\mathrm{SHSD} - 54.4\,\mathrm{SSS} - 1036.4 + \mathrm{Maturity}$$
where, for a given cultivar, the Maturity predictor in the model should be substituted by the constant value calculated as shown in Table 5. SHaAT and SHSD were found to have positive effects on the variance of DMY of SSH. Temperature and solar radiation during the growth period of SSH have important effects on its yield [36,37]. The selection of SHaAT and SHSD suggests that choosing proper seeding and harvest dates in the southern area of the Korean Peninsula, so as to ensure sufficient accumulated temperature and light for SSH yield development, is important in management, since rainfall in this region is mostly concentrated in summer, which partially overlaps the SSH growth period.
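As a worked illustration, the reported model can be evaluated directly. The coefficients are those given in the text; the maturity constants (early: 2242.2, medium: 1470.9, late: 0) are the cultivar maturity coefficients reported in the discussion and are assumed here to be the constants referred to in Table 5. The input values in the example are hypothetical and chosen only to show the arithmetic.

```python
# Maturity constants as reported in the text (assumed to match Table 5).
MATURITY_CONSTANT = {"early": 2242.2, "medium": 1470.9, "late": 0.0}

def predict_dmy(shaat, shap, shsd, sss, maturity):
    """Predicted dry matter yield (kg/ha) from the reported linear model.

    shaat: accumulated temperature (deg C), shap: accumulated precipitation (mm),
    shsd: sunshine duration (h), sss: soil suitability score (0-100).
    """
    return (6.5 * shaat - 4.9 * shap + 13.8 * shsd - 54.4 * sss
            - 1036.4 + MATURITY_CONSTANT[maturity])

# Hypothetical inputs (not taken from the SSH dataset).
dmy_early = predict_dmy(1500.0, 900.0, 700.0, 80.0, "early")
dmy_late = predict_dmy(1500.0, 900.0, 700.0, 80.0, "late")
print(round(dmy_early, 1), round(dmy_late, 1))
```

For these inputs the early-maturity prediction is about 11,854 kg/ha, and the early and late predictions differ by exactly the early-maturity constant. Given the low explained variance reported for the model, such point predictions should be read as rough indications only.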
The suitable precipitation range during the growth period of SSH is 500-800 mm [29]; more or less precipitation may stress yield production. However, in the southern area of the Korean Peninsula, precipitation is concentrated in summer, which is the growing period of SSH. In the SSH dataset, the first quartile of SHAP is 701.6 mm (Table 3), while the mean (933.8 mm) and median (930.0 mm) of SHAP both exceed 800 mm. Over 63% of the SHAP values in the SSH dataset were over 800 mm (Figure 1). Therefore, excessive precipitation stresses the yield development of SSH in this region, which was expressed as the negative sign of the SHAP coefficient. The coefficient values of the cultivar maturities (early: 2242.2, medium: 1470.9, and late: 0) also indicated that, as the growth period of SSH gets longer, the yield is more negatively affected because the crop is exposed to the concentrated precipitation for a longer period. Cultivars with strong waterlogging tolerance and upland fields with good drainage conditions are recommended for the cultivation of SSH in this area.
The values of the partial eta-squared of SHaAT, SHAP, and SHSD were 0.040, 0.034, and 0.033 (Table 5), respectively. SSH is a C4 forage crop; therefore, its growth and development need sufficient thermal and moisture conditions. The suitable growth temperature of SSH is 24-33 °C [38]. As presented in Table 3, the mean, median, and first and third quartiles of the SHMT values were all near the optimum temperature for the growth of SSH. Together with the SHAP conditions described above, this may explain why the partial eta-squared value of SHaAT in the model was higher than that of SHAP.
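To make the effect-size values concrete, partial eta-squared can be written as SS_effect / (SS_effect + SS_error). Under Type III sums of squares (assumed here, as the usual SPSS default), SS_effect for a single term equals the increase in the residual sum of squares when that term is dropped from the full model. The sketch below uses hypothetical sums of squares, not values from this study.

```python
def partial_eta_squared(sse_reduced, sse_full):
    """Partial eta-squared of one term from residual sums of squares.

    sse_full: residual SS of the full model.
    sse_reduced: residual SS of the model refitted without the term.
    """
    if sse_reduced < sse_full:
        raise ValueError("dropping a term cannot reduce the residual sum of squares")
    ss_effect = sse_reduced - sse_full
    return ss_effect / (ss_effect + sse_full)

# Hypothetical example: dropping a term raises the residual SS from 96 to 100.
eta_example = partial_eta_squared(sse_reduced=100.0, sse_full=96.0)
print(round(eta_example, 3))
```

In this reading, a value such as 0.040 means that the term accounts for about 4% of the variation left unexplained by the other terms in the model.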
The adjusted R-squared of the constructed model was 16.6%, which was lower than anticipated. Although the linear model has advantages for yield prediction modeling when data records are insufficient, it also has natural limitations in predictive ability [37]. Apart from the limitations of the modeling method itself, the following reasons were also considered to have affected the modeling results. Firstly, owing to the limited soil data records, soil attributes that are more directly related to SSH yield, such as total nitrogen, alkali-hydrolysis nitrogen, available phosphorus, and available potassium, were not measured. Secondly, converting the eight soil attributes into one measure might cause information loss and subsequently shrink its contribution to yield prediction (Table 5). Thirdly, management information was not considered in the modeling since the data were not well recorded. To overcome these points, the development of a standard schedule for forage crop cultivation experiments is recommended, especially given the data requirements of the big data era. Although many field studies have been performed in the southern area of the Korean Peninsula, data recording was not in a proper state [19,20]. As mentioned above, the dates related to the SSH phenological phases were not well recorded, which led to the elimination of 641 data points during the preparation of the SSH dataset. Meanwhile, the cultivar maturity information was insufficient, which led to the deletion of 107 data points from the SSH dataset. Moreover, research investment in this sector has received little attention because the forage production sector in Korea is not large. Few forage cultivation experiments measured and reported the soil attributes of the experimental sites. This led to the low contribution of SSS, since the soil data were regional mean values obtained from the Korean soil information system. Therefore, it is recommended to develop a standard protocol for forage crop cultivation experiments to accumulate high-quality data for future crop yield projection.
The points in the Q-Q plot (Figure 2) are well aligned along the 45-degree line. In the scatter plot (Figure 3), the points around the zero line do not show a particular pattern. The residual diagnostics from the two plots therefore supported the fitness of the model. The results of the 3-fold cross-validation (Figure 4) are separated into training and test sets in each subfigure. The average coefficient of determination (R²_fit) and the average NRMSE (NRMSE_fit) of the 3 training sets were 0.23 and 30.8%, respectively, while for the 3 test sets, the average coefficient of determination (R²_val) and the average NRMSE (NRMSE_val) were 0.16 and 33.3%, respectively. These results indicated that the accuracy of this model was at a moderate level.
Agriculture 2020, 10, x FOR PEER REVIEW

Conclusions
In this study, a yield prediction model for sorghum-sudangrass hybrid using climatic, soil, and cultivar data in the southern area of the Korean Peninsula was constructed. The impacts of the accumulated thermal climatic variables and accumulated precipitation during crop growth on the variance of SSH yield in this region were confirmed. The summer-concentrated precipitation in the southern area of the Korean Peninsula exceeded the suitable range of the sorghum-sudangrass hybrid water requirement and stressed its yield production. Furthermore, to improve data quality for constructing better-fitting models, the development of a standard protocol for forage crop cultivation experiments in this region is recommended, especially given the data requirements of the big data era.