Applied Sciences
  • Article
  • Open Access

19 March 2020

Prediction of Weights during Growth Stages of Onion Using Agricultural Data Analysis Method

1 Department of Statistics, Chonnam National University, Gwangju 61186, Korea
2 Resource Management Office, Jeollanamdo Agricultural Research and Extension Services, Jeollanamdo 58213, Korea
3 Agriculture Bigdata Team, Rural Development Administration, Jeonbuk 54875, Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Big Data Analysis and Visualization

Abstract

In this study, we propose a new agricultural data analysis method that predicts the weight of field-grown onions during their growth stages using a functional regression model. The onion weight at each growth stage is the response variable, and six environmental factors (average temperature, average ground temperature, rainfall, wind speed, sunshine, and humidity) are the explanatory variables. We define a least minimum integral squared residual (LMISE) measure to estimate the functional regression coefficients, and we apply principal component regression analysis to derive the estimates that minimize this measure. To evaluate the performance of the proposed model, we collected data and obtained the following results from their analysis. First, graphical and correlation analysis shows that ground temperature, mean temperature, and humidity have a very significant effect on onion weight, whereas wind speed, sunshine, and rainfall have a small negative effect. Second, functional regression analysis shows that ground temperature, sunshine, and precipitation have a significant effect on onion growth and are essential in the goodness-of-fit test, whereas wind speed, mean temperature, and humidity do not significantly affect onion growth. In conclusion, to promote onion growth, an appropriate ground temperature and amount of sunshine are essential, rainfall and humidity must be low, and an appropriate wind speed or mean temperature must be maintained.

1. Introduction

Onions produced in various parts of Korea are high-value-added vegetables, as they are used not only in many dishes but also as part of a healthy diet. Farmers who cultivate vegetables in the field are therefore concerned with cultivation strategies that can improve yields. In addition, related agencies, such as the Rural Development Administration, devote much attention to developing onion cultivation techniques that address farmers' needs [1].
However, among vegetables, onion growth is especially sensitive to changes in the weather. Onions generally grow well in low temperatures and dry weather. Because of this characteristic, onion yields drop significantly when rainy weather prevails and increase considerably in predominantly sunny weather. As a result, onion prices fall when production is excessive and rise sharply when production is low. For this reason, it is vital for government offices and farms to maintain an appropriate level of production.
Hence, we examine how the production of field-grown onions is affected by various weather conditions and environmental factors, and we try to develop the best farming strategy based on them. To this end, we first examine both the mathematical theory and the application of the functional regression model, which is the most suitable statistical model for analyzing time series data collected directly from the agricultural field. Second, on the basis of onion data collected from various regions, we examine the effects of various environmental factors and climatic conditions on onion yields using the functional regression model. Finally, we propose an optimal farming strategy that maintains appropriate levels of the various environmental factors and weather conditions.

3. Dataset and Method

3.1. Dataset

Here we consider how to cultivate onions, which are popular among Koreans and used in various dishes, so as to improve their growth. Generally, onions are sown from mid-August to mid-September, transplanted from early October to early November, and harvested between May and June of the following year. Figure 1 shows the onion growing process step by step [11].
Figure 1. Onion growing states in the cultivation process.
In this study, we used datasets of various environmental factors and of onion weights during the growth stages, collected from farmers in several regions of Korea from March 2019 to June 2019. The six explanatory variables of the model were mean wind speed, mean temperature, mean ground temperature, mean humidity, daily sunshine, and daily rainfall, and the onion weight at each growth stage was the response variable. A description of each variable is given in Table 1 below.
Table 1. The response variable and environmental variables used in the dataset.
We collected a total of 178 observations from 28 farms. Over the observation period, the environmental factor data were taken from regional meteorological office records, and the onion weights were measured by staff of the RDA (Rural Development Administration). Of the data collected from the 28 farms, we used a total of 63 observations in the experiment: 9 repeated measurements at each of 7 farms with relatively good measurements. Each environmental variable was measured on 7 days in each of 9 weeks, and onion weight was measured 9 times at each of the 7 farms during the same period. The environmental data therefore consist of six factors measured on 7 days in each of 9 weeks for the 7 farms, and are given in the form of a 64 × 35 matrix. The onion weight data consist of nine repeated measurements at the 7 farms and are given as a 9 × 7 matrix. Table 2 and Table 3 below show part of the data on average wind speed, one of the six environmental factors, and part of the onion weights observed during the growth period.
Table 2. Example of input data of environmental variables of growth period.
Table 3. Example of input data of onion weight of growth period.

3.2. Method

We considered a functional regression model with $p$ functional covariates to determine how $p$ environmental factors influence the $n$ observed onion weights over time [12,13,14]. This model is defined as follows:
$$y_i(t) = \alpha(t) + \sum_{k=1}^{p} \sum_{s=1}^{q} \beta_k(s,t)\, x_{ik}(s) + \epsilon_i(t), \quad i = 1, \ldots, n, \tag{1}$$
where $n$ is the number of observations, $p$ is the number of functional covariates, $y_i(t)$ is the onion weight at time $t$, $x_{ik}(s)$ is the $k$-th functional covariate, $\alpha(t)$ is the mean function, $\beta_k(s,t)$ is the regression coefficient function for the $k$-th covariate, and $\epsilon_i(t)$ is a random error function.
To derive the estimators of the regression coefficient functions $\beta_k(s,t)$ of the proposed model, we use the least minimum integral squared residual (LMISE) criterion, defined as:
$$\mathrm{LMISE} = \lVert y - \hat{y} \rVert^2 = \sum_{i=1}^{n} \int_{T_Y} \Big[ y_i(t) - \Big( \alpha(t) + \sum_{k=1}^{p} \sum_{s=1}^{q} x_{ik}(s)\, \beta_k(s,t) \Big) \Big]^2 dt. \tag{2}$$
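As a minimal numerical sketch of this criterion, assume the observed and fitted curves are sampled on a common grid of time points and that the integral over $t$ is approximated by the trapezoidal rule. Both the sampling scheme and the function name `lmise` are assumptions made for illustration and do not come from the source.

```python
import numpy as np

def lmise(y, y_hat, t):
    """Sum over observations of the integrated squared residual.

    y, y_hat : arrays of shape (n, m), observed and fitted curves on grid t
    t        : array of shape (m,), increasing time points
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    t = np.asarray(t, dtype=float)
    sq = (y - y_hat) ** 2
    # Trapezoidal rule along the time axis, written out explicitly.
    dt = np.diff(t)
    integrals = np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * dt, axis=1)
    return float(integrals.sum())
```

A finer grid gives a closer approximation to the integral; with equally spaced points the value is proportional to the ordinary sum of squared residuals up to end-point weights.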
Here, we have applied functional principal component analysis to obtain estimators of the functional regression parameters that minimize the LMISE measure defined above. In this case, we center the explanatory variables $x_{ik}$ and the response variable $y_i$ as follows to remove the intercept term $\alpha(t)$ of the model given in (1):
$$y_i^*(t) = y_i(t) - \bar{y}(t), \quad x_{ik}^*(s) = x_{ik}(s) - \bar{x}_k(s), \quad k = 1, \ldots, p. \tag{3}$$
First, we express the $x_{ik}^*$ and $y_i^*$ as finite sums using the basis functions $\phi_{jk}$ and $\psi_l$ as follows:
$$x_{ik}^*(s) = \sum_{j=1}^{J} c_{ijk}\, \phi_{jk}(s) = c_{ik}^T \phi_k(s), \quad k = 1, \ldots, p, \tag{4}$$
and
$$y_i^*(t) = \sum_{l=1}^{L} d_{il}\, \psi_l(t) = d_i^T \psi(t), \tag{5}$$
where $\phi_k$ and $\psi$ are vectors of basis functions of size $J$ and $L$, respectively. Furthermore, if we denote the coefficient matrices of these basis function vectors by $C_k$ and $D$, we can write these expressions in the following matrix form:
$$X_k^* = C_k \phi_k, \qquad y^* = D \psi. \tag{6}$$
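The coefficient matrices $C_k$ and $D$ can be obtained by projecting the sampled, centered curves onto the chosen basis. The sketch below does this by ordinary least squares. The source does not state which basis was used, so the Fourier basis here is an assumption, and `fourier_basis` and `basis_coefficients` are hypothetical helper names.

```python
import numpy as np

def fourier_basis(s, J, period):
    """Evaluate J Fourier basis functions at points s; returns shape (len(s), J)."""
    s = np.asarray(s, dtype=float)
    cols = [np.ones_like(s)]          # constant term
    k = 1
    while len(cols) < J:
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * s))
        if len(cols) < J:
            cols.append(np.cos(w * s))
        k += 1
    return np.column_stack(cols)

def basis_coefficients(X, Phi):
    """Least-squares coefficients C such that X is approximately C @ Phi.T.

    X   : (n, m) curves sampled at m grid points, one row per observation
    Phi : (m, J) basis functions evaluated on the same grid
    """
    # Solve Phi @ C.T = X.T in the least-squares sense.
    C_T, *_ = np.linalg.lstsq(Phi, X.T, rcond=None)
    return C_T.T
```

The number of basis functions $J$ should not exceed the number of grid points, otherwise the projection is not unique.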
Here, we can express the regression coefficient function $\beta_k(s,t)$ as a double-sum expansion:
$$\beta_k(s,t) = \sum_{j=1}^{J} \sum_{l=1}^{L} b_{jlk}\, \phi_{jk}(s)\, \psi_l(t) = \phi_k^T(s)\, B_k\, \psi(t), \tag{7}$$
where $B_k$ is a $(J \times L)$ matrix of coefficients $b_{jlk}$, or, more compactly, as $\beta_k(s,t) = \phi_k^T B_k \psi$.
We define $J_{\phi_k}$ and $J_\psi$ as the matrices of inner products between the elements of the $\phi_k$ and $\psi$ bases, respectively. Then, we have the following expressions:
$$J_{\phi_k} = \int_{I_s} \phi_k(s)\, \phi_k(s)^T\, ds, \qquad J_\psi = \int_{I_s} \psi(s)\, \psi(s)^T\, ds. \tag{8}$$
In addition, if we substitute the expansions given in Equations (4) and (7) for $x_{ik}^*$ and $\beta_k$, respectively, in Equation (1), we obtain:
$$\hat{y}^*(t) = \sum_{k=1}^{p} \int_{I_s} C_k\, \phi_k(s)\, \phi_k^T(s)\, B_k\, \psi(t)\, ds = \sum_{k=1}^{p} C_k J_{\phi_k} B_k\, \psi(t). \tag{9}$$
If we denote by $\hat{D}$ the matrix of coefficients of the basis expansion of the vector of predictors $\hat{y}^*$ (corresponding to the matrix $D$ for the vector $y^*$), we obtain the following matrix form for the estimated model:
$$\hat{D} = \sum_{k=1}^{p} C_k J_{\phi_k} B_k. \tag{10}$$
Therefore, we obtain a matrix form for the integrated squared residual:
$$\lVert \hat{y}_i - y_i \rVert^2 = \lVert \hat{y}_i^* - y_i^* \rVert^2 = \big( (\hat{D} - D)\, J_\psi\, (\hat{D} - D)^T \big)_{ii}, \tag{11}$$
and, finally, the least minimum integral squared residual (LMISE) criterion
$$\mathrm{LMISE}(B_k) = \operatorname{trace}\big( (\hat{D} - D)\, J_\psi\, (\hat{D} - D)^T \big) \tag{12}$$
is given by a sum of quadratic forms in the unknown coefficient matrices $B_k$.
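The trace form can be checked numerically against direct integration of the squared residual curves. The sketch below assumes, purely for illustration, a monomial response basis $\psi_l(t) = t^l$ on $[0, 1]$, for which the inner-product matrix has the closed form $1/(i + j + 1)$. The basis choice and all function names are assumptions, not part of the source.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def lmise_trace(D_hat, D, J_psi):
    """Criterion in trace form: trace((D_hat - D) J_psi (D_hat - D)^T)."""
    E = D_hat - D
    return float(np.trace(E @ J_psi @ E.T))

def monomial_gram(L):
    """Inner products of 1, t, ..., t^(L-1) on [0, 1]: entry (i, j) is 1/(i+j+1)."""
    idx = np.arange(L)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)

def lmise_direct(D_hat, D):
    """Integrate each squared residual polynomial exactly over [0, 1] and sum."""
    total = 0.0
    for e in (D_hat - D):
        antideriv = P.polyint(P.polymul(e, e))
        total += P.polyval(1.0, antideriv) - P.polyval(0.0, antideriv)
    return float(total)
```

For any pair of coefficient matrices with the same shape, `lmise_trace(D_hat, D, monomial_gram(L))` and `lmise_direct(D_hat, D)` should agree up to rounding error.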
Next, we consider the minimization of the criterion $\mathrm{LMISE}(B_k)$ given in Formula (12). If $J_{\phi_k}$ and $J_\psi$ are identity matrices, the matrix $B_k$ minimizes Formula (12) if and only if
$$C_k^T C_k B_k = C_k^T D, \tag{13}$$
so that
$$B_k = (C_k^T C_k)^{-1} C_k^T D. \tag{14}$$
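The step from the criterion to these normal equations can be made explicit. The following is a sketch for a single covariate ($p = 1$, so the index $k$ is dropped), assuming that $J_\phi$ and $J_\psi$ are identity matrices and that $C^T C$ is invertible:

```latex
\begin{aligned}
\mathrm{LMISE}(B) &= \operatorname{trace}\big((CB - D)(CB - D)^{T}\big)
                   = \lVert CB - D \rVert_F^{2}, \\
\frac{\partial\, \mathrm{LMISE}(B)}{\partial B} &= 2\, C^{T}(CB - D) = 0
  \;\Longrightarrow\; C^{T} C\, B = C^{T} D
  \;\Longrightarrow\; B = (C^{T} C)^{-1} C^{T} D .
\end{aligned}
```

Because the criterion is a convex quadratic in $B$, the stationary point is a minimizer.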
The matrix $B_k$ is easily found by using the singular value decomposition (SVD) of $C_k$. Writing $C_k = U \Delta_{C_k} V^T$, where $\Delta_{C_k}$ is a diagonal matrix with strictly positive diagonal elements and $U$ and $V$ have orthogonal columns, we have
$$C_k^T C_k = V \Delta_{C_k}^{2} V^T, \tag{15}$$
and hence the Moore–Penrose generalized inverse of $C_k^T C_k$ is $V \Delta_{C_k}^{-2} V^T$. Substituting it into (13) gives the following equation:
$$B_k = V \Delta_{C_k}^{-1} U^T D. \tag{16}$$
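The SVD solution can be sketched in a few lines of NumPy. This is a minimal illustration for one covariate, assuming that $C$ has full column rank so that every singular value is strictly positive; the function name `solve_coefficients_svd` is hypothetical.

```python
import numpy as np

def solve_coefficients_svd(C, D):
    """Least-squares B minimizing ||C @ B - D||_F via the thin SVD of C.

    C : (n, J) covariate coefficient matrix, assumed to have full column rank
    D : (n, L) response coefficient matrix
    """
    U, sing, Vt = np.linalg.svd(C, full_matrices=False)
    # B = V diag(1 / sing) U^T D
    return Vt.T @ ((U.T @ D) / sing[:, None])
```

With $n$ observations the rank of $C$ is at most $\min(n, J)$, so when $J > n$ some singular values are zero and they would have to be dropped (a pseudo-inverse) rather than inverted.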
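Therefore, if we substitute the estimate $\hat{B}_k$ for $B_k$ in the expansion of $\beta_k(s,t)$ in Equation (7), we get the following functional estimates of the regression coefficient functions:
$$\hat{\beta}_k(s,t) = \phi_k^T(s)\, \hat{B}_k\, \psi(t). \tag{17}$$
Finally, we obtain the following prediction equation for the response values, where $x_{ik}^*$ is the centered covariate defined in Equation (3):
$$\hat{y}_i(t) = \bar{y}(t) + \sum_{k=1}^{p} \int_{I_s} x_{ik}^*(s)\, \hat{\beta}_k(s,t)\, ds. \tag{18}$$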
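In terms of basis coefficients, the prediction reduces to a few matrix products. The sketch below assumes orthonormal covariate bases (so each $J_{\phi_k}$ is the identity) and curves evaluated on a common time grid. The function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def predict_curves(C_list, B_list, Psi, y_bar):
    """Predicted response curves on a time grid.

    C_list : list of (n, J) coefficient matrices of the centered covariates
    B_list : list of (J, L) estimated coefficient matrices, one per covariate
    Psi    : (m, L) response basis evaluated at m time points
    y_bar  : (m,) mean response curve on the same grid
    Assumes orthonormal covariate bases, so each J_phi is the identity.
    """
    # Coefficients of the centered predictions, summed over covariates: (n, L)
    D_hat = sum(C @ B for C, B in zip(C_list, B_list))
    # Map back to curves and add the mean curve: (n, m)
    return y_bar[None, :] + D_hat @ Psi.T
```

Using a single-element list for `C_list` and `B_list` gives the prediction from one environmental factor alone.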

4. Experimental Results

4.1. Graphical Analysis

First, we plotted a line graph of the onion weights collected from seven farmers over nine weeks from transplantation to harvest. From Figure 2, we can see that the onion weight increases over time.
Figure 2. Line graphs of weight and six environmental factors.
Second, we drew two-dimensional scatter plots to examine graphically the relationship between the onion weights and the mean of each of the six environmental factors over the cultivation period. Figure 3 shows these relationships. From Figure 3, we can see first that the average temperature and the average ground temperature have a high positive correlation with the onion weights. Humidity has a small positive correlation with onion weight. Finally, the three remaining environmental variables (wind speed, sunshine, and rainfall) show no relation to onion weight.
Figure 3. Relationship between onion weights and six environmental factors.
Next, we calculated the correlation coefficient between each variable to statistically confirm the results derived so far. Table 4 shows the correlation coefficients between these variables. Given the results in Table 4, we can equally confirm the relationship between the variables discussed above.
Table 4. Correlation coefficients between onion weight and six environmental factors.
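A correlation table of this kind can be computed as follows. The source does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function name and input layout are hypothetical.

```python
import numpy as np

def pearson_with_response(factors, weight):
    """Pearson correlation of each column of `factors` with `weight`.

    factors : (n_obs, n_factors) array, e.g. period means of each factor
    weight  : (n_obs,) array of onion weights
    """
    factors = np.asarray(factors, dtype=float)
    weight = np.asarray(weight, dtype=float)
    fc = factors - factors.mean(axis=0)   # center each factor
    wc = weight - weight.mean()           # center the response
    num = (fc * wc[:, None]).sum(axis=0)
    den = np.sqrt((fc ** 2).sum(axis=0) * (wc ** 2).sum())
    return num / den
```

Each returned value lies between -1 and 1; a factor that is constant over all observations would give a zero denominator and should be excluded beforehand.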

4.2. Functional Data Analysis

We applied a functional regression model with functional covariates and functional responses to see how the six environmental factors affect the onion weights over the growing period. To perform the functional data analysis on the onion data, we used the "fda" package in R.
First, we conducted a functional data analysis to determine how the six environmental variables influence the onion weights. Figure 4 shows how the functional regression coefficient of each environmental factor varies over the cultivation period. In Figure 4, we first see that wind speed has a positive effect on the onion weights from the beginning to the middle of the period, a negative effect beyond the midpoint, and a positive effect again after the third quarter. Second, both the ground temperature and the mean temperature have a positive effect in the early stage of onion weight growth, a negative effect at the midpoint, and a positive effect in the later growth period. The two temperatures therefore have a similar effect on onion growth, but the effect of the ground temperature is relatively large in magnitude. Third, humidity has a negative effect on onion weight throughout the growing season, but its impact is not significant. Fourth, the amount of sunshine has a positive effect early in the growth period of onions, a negative effect from the beginning to the middle, and a positive effect after the middle. Finally, rainfall has alternating positive and negative effects during onion growth, but its impact is not significant.
Figure 4. Regression coefficients in the prediction of onion weights from six environmental variables. (a) Wind speed, (b) mean temperature, (c) ground temperature, (d) humidity, (e) sunshine, (f) rainfall.
Second, we calculated the coefficient of determination ($R^2$), which represents the explanatory power for the onion weights, when the environmental variables were used individually and when they were used together. We also calculated the F-statistic to test the goodness of fit of each environmental variable for onion weight. Table 5 shows, for each environmental variable, the coefficient of determination $R^2$ and the F-statistic used in the goodness-of-fit test. The results in Table 5 are similar to those above. When all environmental factors are used, the value of $R^2$ is the highest and the value of the F-statistic is the largest. Individually, the values of $R^2$ and the F-statistic are high for ground temperature, sunshine, and rainfall, and lower for wind speed, mean temperature, and humidity.
Table 5. Coefficients of determination and F-statistics for six environmental factors.
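One common way to define $R^2$ for curves is one minus the ratio of the residual sum of squares to the total sum of squares around the pointwise mean curve. The source does not spell out its definition, so the sketch below is an assumption. It covers $R^2$ only, because the degrees of freedom used for the F-statistic are not given in the source.

```python
import numpy as np

def functional_r2(y, y_hat):
    """1 - SSE/SST for curves sampled on a common, equally spaced grid.

    y, y_hat : (n, m) observed and fitted curves
    The mean curve is taken pointwise over the n observations.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)              # residual variation
    sst = np.sum((y - y.mean(axis=0)) ** 2)     # variation around the mean curve
    return float(1.0 - sse / sst)
```

On an equally spaced grid the grid spacing cancels in the ratio, so plain sums can stand in for the integrals.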
Third, we plotted the actual observations of onion weights alongside the values predicted from each environmental variable to assess the predictive power of the six environmental variables. Figure 5 shows the relationship between actual observations and predicted values.
Figure 5. Plots for actual observations and predicted values. (a) Actual values for onion weights, (b) predicted values for weights by wind speed, (c) predicted values for weights by mean temperature, (d) predicted values for weights by ground temperature, (e) predicted values for weights by humidity, (f) predicted values for weights by sunshine, (g) predicted values for weights by rain, (h) predicted values for weights by all variables.
We also calculated the root mean square error (RMSE) to measure numerically the predictive power of each environmental variable. Table 6 shows the RMSE values for the six environmental variables.
Table 6. RMSE for six environmental factors and all variables.
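For reference, the RMSE between observed and predicted weights can be computed as below. This is a generic sketch over all observed values; the function name is hypothetical, and the source does not say whether its RMSE was averaged over farms, over time points, or over both.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error over all entries of y and y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

A lower RMSE indicates predictions closer to the observed weights, which is how the comparison between variables is read.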
From Figure 5 and Table 6, we obtain the following results. First, the model that includes all environmental variables has the most predictive power. Second, among the individual environmental variables, ground temperature, sunshine, and rainfall exhibit high predictive power. Third, wind speed, mean temperature, and humidity show the lowest predictive power.
To summarize, we obtained the following results. First, both the ground temperature and the mean temperature have a high positive correlation with the onion weights, humidity has a small positive correlation with the onion weights, and the three remaining environmental variables (wind speed, sunshine, and rainfall) have a small negative correlation with onion weight. Second, ground temperature, sunshine, and precipitation have a significant effect on onion growth and are very significant in the goodness-of-fit test. On the other hand, mean temperature, wind speed, and humidity did not significantly affect onion growth.
In conclusion, to promote onion growth, the appropriate ground temperature and amount of sunshine are essential, rainfall and humidity must be low, and appropriate wind or mean temperature must be maintained.

5. Conclusions

In this study, we applied a statistical functional regression model to investigate the relationship between various environmental factors and onion weights during the growing season. To this end, we performed two tasks. In the first, we identified the six most important environmental factors among those that affect onion weight during the growing season. In the second, we proposed an optimal cultivation strategy that suggests how to manage the six identified environmental factors to maximize the onion weights.
The analysis results lead to the following findings. First, the graphical and correlation analysis shows that ground temperature, mean temperature, and humidity are positively correlated with onion weights, while wind speed, sunshine, and rainfall have a small negative correlation with the onion weights. Second, the functional regression analysis of the six environmental variables and onion weight shows that ground temperature, sunshine, and rainfall have a statistically significant effect on the onion weights, whereas wind speed, mean temperature, and humidity have little effect. In conclusion, to increase onion weights, an appropriate ground temperature, amount of sunshine, wind speed, and mean temperature must be maintained, and rainfall and humidity must be low.
Future work should use functional data analysis to investigate how these environmental factors affect onion yields. In addition, a comprehensive study is needed to understand how environmental factors, growth factors, and the weights and yields of onions affect one another.

Author Contributions

Conceptualization, methodology and software, M.H.N.; validation and investigation, D.H.K.; writing—original draft preparation, Y.P.; formal analysis and writing—review and editing, W.C.; supervision and project administration, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This paper was partially supported by the Research Program of the RDA (Rural Development Administration) (Project No. PJ0138672020) and the Korea National Research Foundation (Project No. 2017R1D1A1B03028808) of the Korea Grant funded by the Korean Government.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hwang, J.H. Guide to Agricultural Management: Onion Business Management; Report of Korea Rural Development Administration; Korea Rural Development Administration: Jeonju, Korea, 2015. (In Korean)
  2. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
  3. Manikandan, K.; Vethamoni, P.I. A review: Crop modeling in vegetable crops. J. Pharmacogn. Phytochem. 2017, 6, 1006–1009.
  4. Bhange, T.; Shekapure, S.; Pawar, K.; Choudhari, H. Survey Paper on Prediction of Crop yield and Suitable Crop. India Int. J. Innov. Res. Sci. Eng. Technol. 2019, 8, 5791–5795.
  5. Mythra, N.; Velayudham, A.; Shamila, E.S.; Pavithra, M. A Survey on Crop Yield Prediction using Data Mining. Int. J. Comput. Trends Technol. (IJCTT) 2018, 65, 1–7.
  6. Sellam, V.; Poovammal, E. Prediction of Crop Yield using Regression Analysis. Indian J. Sci. Technol. 2016, 9, 5.
  7. Abdelkhalik, A.; Pascual, B.; Najera, I.; Baixaulli, C.; Pascual-Seva, N. Regulated Deficit Irrigation as a Water-Saving Strategy for Onion Cultivation in Mediterranean Conditions. Agronomy 2019, 9, 521.
  8. Maskey, M.L.; Pathak, T.B.; Dara, S.K. Weather Based Strawberry Yield Forecasts at Field Scale Using Statistical and Machine Learning Models. Atmosphere 2019, 10, 378.
  9. Rathod, S.; Mishra, G.C. Statistical Models for Forecasting Mango and Banana Yield of Karnataka, India. J. Agric. Sci. Technol. 2018, 20, 803–816.
  10. Villiers, M.D. Predicting Tomato Crop Yield from Weather Data Using Statistical Learning Techniques. Master's Thesis, Commerce in Mathematical Statistics in the Faculty of Economic and Management Sciences at Stellenbosch University, Stellenbosch, South Africa, 2017.
  11. Oscar, I. Start Onion Farming-The Complete Guide. 2018. Available online: https://farmingmethod.com/onion-farming-guide/ (accessed on 25 November 2019).
  12. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 1997.
  13. Ramsay, J.O.; Hooker, G.; Graves, S. Functional Data Analysis with R and MATLAB; Springer: Dordrecht, The Netherlands; Heidelberg, Germany; London, UK; New York, NY, USA, 2009.
  14. Greven, S.; Scheipl, F. A general framework for functional regression modelling. Stat. Model. 2017, 17, 1–35.
