Stochastic Analysis and Neural Network-Based Yield Prediction with Precision Agriculture

: In this paper, we propose a general mathematical model for analyzing yield data. The data analyzed in this paper come from a characteristic corn ﬁeld in the upper midwestern United States. We derive expressions for statistical moments from the underlying stochastic model. Consequently, we illustrate how a particular feature variable contributes to the statistical moments (and in effect, the characteristic function) of the target variable (i.e., yield). We also analyze the data with neural network techniques and provide two methods of data analysis. This mathematical model and neural network-based data analysis allow for better understanding of the variability within the data set, which is useful to farm managers attempting to make current and future decisions using the yield data. Lenders and risk management consultants may beneﬁt from the insights of this mathematical model and neural network-based data analysis regarding yield expectations.


Introduction
The International Society of Precision Agriculture adopted the following definition of precision agriculture in 2019 (see The International Society of Precision Agriculture (n.d.)): "Precision Agriculture is a management strategy that gathers, processes, and analyzes temporal, spatial, and individual data and combines it with other information to support management decisions according to estimated variability for improved resource using efficiency, productivity, quality, profitability, and sustainability of agricultural production". Precision agriculture has emerged as a central tool to address current challenges in agricultural sustainability and profitability. Various methodologies of data-science, such as machine learning, have been implemented with this cutting edge technology. With the use of artificial intelligence and data-science, farmer managers can get the precise data that convey all the information related to the optimum health and productivity of the crops, thereby enabling informed decision-making.
There is an emerging literature on applications of data-science to agriculture. In Lemley et al. (2017), machine learning and deep learning techniques are implemented to solve both precision agriculture related problems and build better, smarter consumer devices and services. In Sharma et al. (2021), a systematic review of various machine learning applications in the field of agriculture is provided. The prediction of soil parameters such as organic carbon and moisture content, crop yield prediction, disease and weed detection in crops and species detection is presented. In addition, this paper demonstrates how knowledge-based agriculture can improve the sustainable productivity and quality of the product. In Bauer et al. (2019), the authors present an automated and open-source analytic platform that combines computer vision, and modular software engineering in order to measure yield-related phenotypes from ultra-large aerial imagery. An analysis is developed to map lettuce size distribution across the field, based on which associated global positioning system (GPS) tagged harvest regions have been identified to enable growers and farmers to conduct precision agricultural practices. In Chlingaryan et al. (2018), the authors analyze research developments performed within the last 15 years on machine learning based techniques for accurate crop yield prediction and nitrogen status estimation. In Treboux and Genoud (2020) the authors present the performances of machine learning algorithms on aerial images object detection for high-precision agriculture. The proposed approach in this paper improves the object detection and obtain an accuracy of 94.27%. In Horng et al. (2020) the authors propose a study of harvesting system based on the Internet of Things technology and smart image recognition. The proposed model is implemented for crop detection by collecting and tagging images. It is shown that the proposed model training has a mean average precision of 84%, which is better than the other existing models. In Bauer et al. (2019) a common problem for apple orchards, namely, the attack of the codling moth, is studied. It is shown that a data-science based near sensor neural network algorithms can be implemented to automatically detect the codling moth. The performance of this system is evaluated, and power consumption ideas are discussed for achieving the zero energy balance of the system. Finally, in Addey et al. (2021) the authors examine the implications of risks, uncertainties and random events on the prediction of crop yields.
Motivated by all these studies, we propose a general mathematical model for analyzing yield data. It is shown that a special case of such a model can be a generalized version of the well-known Barndorff-Nielsen and Shephard model. If the statistical moments of the yield data can be obtained from the stochastic model, then it can play a crucial role in understanding the empirical data set. In fact, this will illustrate how a particular feature variable contributes to the statistical moments (and in effect, the characteristic function) of the target variable (i.e., yield). Consequently, we derive expressions for statistical moments from the underlying stochastic model. This model allows for better understanding of the variability within the data set, which is useful to farm managers attempting to make current and future decisions using the yield data. In addition, this model may help to identify errors in yield data collection and recording. Finally, lenders and risk management consultants may benefit from this model's insights regarding yield expectations.
The data analyzed in this paper come from a characteristic corn field in the upper midwestern United States. The data were collected in 2010. Each observation in the data set represents two seconds of harvesting as a combine harvester traveling through the field. Each observation is associated with a precise latitude and longitude where it was recorded in the field. Yld Vol(Dry)(bu/ac) is the quantity of corn harvested during the 2-seconds interval. It is the variable predicted by the model. The mean Yld Vol(Dry)(bu/ac) for the field is 161.06 and the median Yld Vol(Dry)(bu/ac) is 172.07. Yld Mass(Wet)(lb/ac), Yld Mass(Dry)(lb/ac), and Crop Flw(M)(lb/s) are also measures of corn yield during each 2-seconds interval, but these measures have not been converted to bushels. Furthermore, Yld Mass(Wet)(lb/ac) has not been corrected for moisture, which is represented by the crop's relative moisture, Moisture(%), and Crop Flw(M)(lb/s) measures corn harvested per second rather than per acre. During each 2-seconds interval, the combine harvested travels a variable distance, Distance(ft), which is determined by the Speed(mph) that the combine is traveling during the interval. The combine's speed and the width of the combine's header during the 2-seconds interval, Swth Wdth(ft), determine how many acres could be harvested in one hour at those rates (Prod(ac/h). We implement neural network algorithms to analyze the dry yield volume based on the other observed and significant feature variables.
The organization of the paper is as follows. In Section 2, we propose a general statistical model, and analyze a special case of it-a generalized Barndorff-Nielsen and Shephard model. Theoretical results related to the statistical moments of the target variable are discussed in detail. In Section 3, we provide the description of the data. We analyze the data with neural network techniques and provide two methods of data analysis. Finally, a brief conclusion is provided in Section 4.

General Framework
For the empirical data, we assume that there is one target variable and n feature variables. We model the target variable S t by where b t is a deterministic function of t, W (i) t , i = 1, . . . , n, are independent Brownian motions, and J (i) t is the jump process with intensity λ i , i = 1, . . . , n. We assume that W (i) t and J (i) t , for i = 1, . . . , n, are independent. The coefficients θ In addition to that, σ t is assumed to be stochastic, and its dynamics are governed by for an appropriate function F, where H (j) t , for j = 1, . . . , n, are jump processes with intensities µ j , j = 1, . . . , n. The coefficients β (j) t , at every t satisfy ∑ n j=1 (β (j) t ) 2 = 1. For simplicity, for the rest of the proposal, we assume θ (i) = β (i) , for i = 1, . . . , n.
For a special case of (1), we assume that the individual dynamics of a feature variable where B t is a Brownian motion. Consequently, (1) can be written as The expression (3) provides an alternative explanation for the coefficients θ (i) t , i = 1, . . . , n, and those will be computed in the numerical section. Those represent the significance in terms of big fluctuations (or "jumps") of the i-th ingredient feature process Y (i) t . We write J (i) t in terms of integrals with respect to Poisson random measures N (i) (dt, dx), for i = 1, . . . , n. Consequently,

Special Case: A Generalized Barndorff-Nielsen & Shephard Model
There are some special cases of the proposed model that are studied in the literature in connection to the financial market; for example, the Barndorff-Nielsen and Shephard model (BN-S model). For such a model, the target variable is the stock, see, Shephard (2001a, 2001b); Habtemicael and SenGupta (2016); Issaka and SenGupta (2017) or the commodity price (see, Roberts and SenGupta (2020) where the parameters µ, β, ρ, λ ∈ R with λ > 0 and ρ ≤ 0 and r is the risk-free interest rate where a stock or commodity (the target variable) is traded up to a fixed horizon date T. In the above model, W t is a Brownian motion, and the process Z λt is a subordinator. Additionally, W and Z are assumed to be independent, and (G t ) is assumed to be the usual augmentation of the filtration generated by the pair (W, Z). We consider a special case of (4), where dZ . . , n are subordinators. Making a scaling in the time variable, we define s = λt, for λ > 0. Then, we obtain dZ . . , n, as subordinators. Consequently, we consider S = (S t ) t≥0 on some risk-neutral probability space (Ω, F , (F t ) 0≤t≤T , Q), given by (4). We consider the case that is more aligned with Nicolato and Venardos (2003). In this case, we assume that the generalized version of (6) takes the form where Z (i) , i = 1, . . . , n are independent subordinators. For the drift term, comparing with (6), we thus have µ = B and β = − 1 2 . Additionally, we assume that (F t ) is assumed to be the usual augmentation of the filtration generated by (W, Z (i) ), i = 1, . . . , n. In this case, (7) will be given by The solution of (9) can be explicitly written as The integrated variance over the time period [t, T] is given by σ 2 I = T t σ 2 s ds, and a straight-forward calculation shows We derive a general expression for the characteristic function of the conditional distribution of the log-asset price process appearing in the stochastic model given by Equations (5), (8) and (9).
We quote the following result from Nicolato and Venardos (2003), which is known as the "key formula". This result will play a significant role in the next section.
Lemma 1. Let Z be a subordinator with cumulant transform κ, and let f : We also note that where " d = " represents equality in distribution. Statistical moments are useful quantities, which when computable, can give significant insight into the underlying empirical data. Consequently, it is useful to obtain analytical expressions for statistical moments for the underlying model. In connection to the BN-S model, statistical moments for S t are derived in Ihsan and SenGupta (2018). With this motivation, we derive a couple of results related to certain moments of the target variable, S t . The first result is simple but restrictive. This result is subsequently generalized. For the rest of the paper, we take N ∈ N, where N is the set of natural numbers.
Theorem 1. For a given λ > 0 and ρ < 0, if N satisfies then the N-th moment of S t with respect to the measure Q is given by where a(t, N) = e NBt+ N(N−1)
Proof. Let G denote the σ-algebra generated by the Background Driving Lévy Process, BDLP, Z up to time t. We observe Using the first equality in (13) for the integrated variance process, we obtain where in the last step, we use the independence of Z (i) , i = 1, . . . , n. If N satisfies Nρ + N(N−1) 2λ ≤ 0, that is, N satisfies (14), then for 0 ≤ s ≤ t, Consequently, by using Lemma 1, we obtain (15).
Next, we derive a result without any restriction on N. For this paper, we denote the random measure associated with the jumps of a process A t , and the Lévy density of A t , by J A (·, ·) and ν A (·), respectively. The compensator for J A (dt, dx) is ν A (dx) dt, and we define For the proof of Theorem 3, we use the following version of the Girsanov theorem. The proof may be found in Øksendal and Sulem (2007).
Theorem 2. Let u(t) and θ(t, z) ≤ 1 be predictable processes such that the process exists for 0 ≤ t ≤ T and satisfies E P 1 [Z(T)] = 1. Define the probability measure P 2 by dP 2 = Z(T)dP 1 . Then u(t)dt + dW t is a Brownian motion, and θ(t, z)ν(dz)dt +Ñ(dt, dz) is a compensated Poisson random process with respect to P 2 .
We now proceed to prove a more general result related to the moments of S t , with respect to the BN-S model. This result does not assume any restriction on N ∈ N.
For the short-term, we assume θ (i) s to be a positive constant and θ where τ 2 (t, N) is a deterministic function of t, and κ (i) (·) is the cumulant generating function for Z (i) 1 , with respect to Q, where i = 1, . . . , n.
We remark that an explicit form of a particular case of the function τ 2 (t, N) in (18) will be found in the Corollary 4.

Proof.
Let U be a subordinator with Poisson measure J U (·, ·). We will characterize the process later in this proof. Consider the stochastic differential equation where α, β are constants and γ(y) > −1. Consequently, (see Øksendal and Sulem (2007)) we obtain Set β = 0 in (19) to obtain where α = α − λ y≥1 (ln(1 + γ(y)) − γ(y))ν U (dy). We choose α in such a way that Then by Cont and Tankov (2004) (Proposition 8.23), we obtain M t as a martingale. Consider a new measure dT(t) = M t dQ(t). Note that with respect to T, the Brownian motion W t still remains the same.

Corollary 4.
where κ (i) (·) is the cumulant generating function for Z  For Figure 1, from the histogram and the distribution plot for Yield Vol (Dry), respectively, we observe that the distribution of the sample data points of feature Yld Vol (Dry) is left-skewed. It can also be noted that the maximum count of observation falls within the range of 150-200 of Yld Vol (Dry) (bu/ac) for both these plots. In Figures 2 and 3, for our exploratory data analysis, we implement box plots to visualize the measures of dispersion of our data set for the variable Yld Vol (Dry) with respect to two variables-Moisture (%) and Distance(ft), respectively. We observe from both plots that for a certain fixed interval, there is a cluster of outliers underneath the lower whisker of the plots. The inter-quartile range tends to show an increasing pattern either right before or right after the big cluster of outliers. To help us visualize the distribution of a single variable and relationships between two variables, in Figure 4 we use the pairs plot. For our data set, the pairs plot in the diagonal gives us the univariate distribution via a histogram, and the scatter plots shows the bivariate relationship among variables. For Figure 5, the correlation matrix heatmap for our data set provides an overview on the relation among different variables (features). From the matrix heatmap color codes or annotated values, we can observe that the correlation coefficient between most features is low. The goal of the analysis is to predict the target variable (Yld Vol(Dry)(bu/ac)) based on the feature variables. We conduct the analysis in two different approaches.
Method 1: For this case, at first, we observe that for the "Yld Vol(Dry)(bu/ac)" variable, the maximum and minimum are 399.55 and 10.28, respectively. Consequently, we divide the target into eight equally spaced intervals, as [10,60,110,160,210,260,310,360,410].
Corresponding to each target variable, we create a list of three categorical data [a, b, c], where a, b, c ∈ {0, 1} by the following rule: if the target variable x is such that 10 ≤ x < 60, then After this, we train the neural network model with 80% data and test it on the remaining 20% data. We create a four-layered deep model with 42 nodes in the first layer, 30 nodes in the second layer, and 20 and 10 nodes in the last two layers, respectively. We use the activation function Rectified Linear Unit (ReLU) for the first input layer followed by the tanh activation function for the three hidden layers. For the output, we use the softmax activation function. To determine which hyper-parameter combination is most efficient for our model, we first conduct exploratory data analysis (EDA) to visualize the validation loss and accuracy of our model and then choose to train the model on 120 epochs, with a batch size of 32.
For the testing data, true positive, true negative, false positive, and false negative are denoted as TP, TN, FP, and FN, respectively. The following measurements are standard: The f1-score gives the harmonic mean of precision and recall. The scores corresponding to every class gives the accuracy of the classifier in classifying the data points in that particular class compared to all other classes. The support is the number of samples of the true response that lie in that class. Table 1 provides the classification report for Method 1.   Here we used the keras.callback "EarlyStopping" and monitored the validation loss. We use "EarlyStopping" as it terminates training when the chosen performance measure stops improving. As a result, if we look into the model accuracy plot (Figure 6) we observe that learning stops once the epoch number is 2. We also see that our model performs well enough with such a smaller epoch number and does not crudely over-fit or under-fit the training and testing data. Accuracy at that time is above 97%.
The line plot for the "model loss" (Figure 7) shows that the model is good at minimizing the loss function with fewer epochs.  Method 2: The first part of this method is the same as in Method 1. However, instead of considering a, b, and c separately (as done in Method 1), for this method, we consider those simultaneously. In this case, with the same neural network as described in Method 1, we find that for the test data, the model correctly predicts 97.06% of the time. We create learning curves for different learning rates (lr) in Figures 8-13. When lr = 1, 0.01, we can observe from the loss curve and accuracy curve that the model performs poorly when fitting the training and testing dataset. The test data are over-fitted, and rather than showing any decay, the loss curve shows an increase as the epoch number increases. When 10 −5 ≤ lr < 10 −6 , the model performance is much more improved. This can be justified, as in this case, both model loss and model accuracy have a nicely fitted training and testing data set. When lr > 10 −6 , the model performs much poorly, suggesting that there is probably no learning happening at all. The classification report for Method 1 and the model's ability to correctly predict the test data set in Method 2 tells us that the model is not perfect and there is room for improvement. In our future work, we plan to improve our procedure and further polish this model to enhance the predictive ability for both these methods. From the analysis, it is clear that (4) can be an appropriate model for the target variable. It is also clear that for both the methods, the neural network model provides a very appropriate prediction of the target variable, with the help of feature variables.

Discussion
This paper analyzed yield data from a representative corn field in the upper midwestern United States. Yield data and data on other variables were collected every two seconds as the corn was harvested. These data are modeled to better understand corn yield in this field. In our data analysis, we wanted to answer the question: "Given intervals of fixed length, can a machine-learning-driven neural network model help us predict the interval on which sample points of feature Yld Vol (Dry)(bu/ac) belong to? And if so, how accurately does the model predict that correct interval?" In order to answer these questions, our first task was to create intervals (of fixed size) using the feature Yld Vol (Dry) (bu/ac)) where the left end-points were kept as part of the interval. Once these intervals were constructed, we assigned the sample points of this feature Yld Vol (Dry) (bu/ac) into the correct interval. We labeled them with either 000 or 001 or 010 and so on (eight combination/labels) and created separate columns for these labels in our data set.
After this preparation stage was complete, we trained the data set with six features (Swth Wdth, Distance, Crop Flw, Moisture, Speed and Prod) where the label columns were treated as our test data. To avoid a highly complex deep neural network, we kept the number of hidden layers of our neural network to three. We compiled the model using the Adam optimizer and monitored loss using 'categorical cross entropy'. When fitting our model, we used callback 'EarlyStopping' that monitored the validation loss. 'EarlyStopping' was implemented to speed up model training and to stop model fitting when the training data points is no longer learning efficiently; this also helps avoid overfitting. To observe that the learning does not become inefficient and to avoid overfitting, we computed validation loss and accuracy with different learning rates and analyzed them visually. In the last phase of our analysis, we obtained a classification report for our prediction. The final step of our analysis involved investigating how well our model correctly predicted the intervals. When compared to the test data points, the model correctly predicted almost 97% of the sample test points.
In Stelzer and Barndorff-Nielsen (2013), the authors introduce and analyze a multivariate supOU stochastic volatility (SV) model where they present an example of long memory in log returns in the SV supOU model, and in Willinger et al. (1999), the authors investigate whether stock price returns exhibit long-range dependence. Their work motivates us to investigate further with our pricing model to observe if our model exhibits long-range dependence as a stylized factor in our future project. At the same time, an in-depth analysis to determine whether our time series data exhibits any long-range dependence is something we wish to incorporate in a future sequel of our current work. Even though we have a model that could efficiently predict the interval on which sample data points of Yld Vol (Dry) (bu/ac) belong to, there is still room for improvements and adjustments. Before training our data set, we can apply feature selection along with a random forest classifier to help us decide which features are more important than others. We can later replace them with the six features that we manually selected for our training purpose. Once we have allowed the machine to select the important features for us, in our modeling phase we can focus more on hyperparameter tuning by investigating with different epochs, batch sizes, and increasing the weights in the hidden layers.
Author Contributions: All the authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
Restrictions apply to the availability of these data. Data was obtained from a cooperating farm manager and are available from the authors pending the permission of that individual.