Learning from Data to Optimize Control in Precision Farming

Precision farming is one of many ways to meet a 70 percent increase in global demand for agricultural products on current agricultural land by 2050, with a reduced need for fertilizers and an efficient use of water resources. The catalyst for the emergence of precision farming has been satellite positioning and navigation, followed by the Internet-of-Things, generating vast amounts of information that can be used to optimize farming processes in real time. Statistical tools from data mining, predictive modeling, and machine learning analyze patterns in historical data to make predictions about future events and to propose intelligent actions. This special issue presents the latest developments in statistical inference, machine learning and optimum control for precision farming.


Introduction
The world's population is expected to be nearly 10 billion by 2050, corresponding to a 55 percent increase in global demand for agricultural production based on current trends. In 2011, according to FAO, agriculture made use of 2710 km³ (70 percent) of all water withdrawn from aquifers, streams and lakes, but this number masks large geographical discrepancies. The Middle East, Northern Africa and Central Asia have already withdrawn most of their exploitable water, with 80 to 90 percent of that going to agriculture. Hence, rivers and aquifers are depleted beyond sustainable levels [1]. Shifting the focus to arable land, 1.6 billion hectares are arable worldwide. The total world land area suitable for cropping is 4.4 billion hectares, corresponding to around 40 percent of the world's land. However, in several regions, soil quality constraints affect more than half the cultivated land base, notably in sub-Saharan Africa, Southern America, Southeast Asia and Northern Europe [1]. When forests are converted into farming land, the large stores of carbon locked in the trees are released to the atmosphere, contributing to global warming on top of today's level.
Clearly, crop production on current land needs to be increased by adopting new technologies. To increase profits, reduce waste and maintain environmental quality at the same time, farmers are supplied with decision support systems that propose the right dose or action at the right place and at the right time [2,3]. The core piece of such a decision support system is an agricultural model related to either crop growth, epidemiology, or market development that optimizes a control function based on a probabilistic assessment of causal relationships [4]. Satellite telemetry tracking data, existing geo-referenced digital maps and Internet-of-Things based sensor data act as input to the model. Automated data processing systems, often located in the cloud, train the model. The trend goes from manually trained models to self-calibrating models that adapt to changes in the environment over time. Smartphone applications have become a key interface in precision agriculture between the farmer and the cloud. These applications not only visualize the control parameters and suggest possible actions but also return the farmer's reaction (irrigation, sowing, fertilization etc.) to the cloud. Fully automated actions that go beyond human-level performance while minimizing resources are still subject to research.

Statistical Inference and Machine Learning
The key to effective experimentation in precision farming is blocking, replication, and randomization [5]. To analyze and interpret the experimental results as well as to predict upcoming data, tools from statistics are deployed. Probabilistic models approximate the complex dynamics of the underlying process using statistical assumptions on the generation of sample data. Statistics draws population inferences from data samples. Neither training nor test sets are necessary to infer the parameters. A supervised machine learns from training data to build a statistical model that can be used to make repeatable predictions. An unsupervised machine, in contrast, learns the model on its own without labeled training data. With the development of the Internet-of-Things, machine learning applications for precision farming have developed rapidly over the last years [6].

Low-Order Statistics
Random variables have a discrete or continuous probability distribution. Low-order statistics denote the first and second moments of a sample from the distribution. The first and second moments correspond to the mean and to the statistical auto- and cross-power of the random variables, respectively. Even low-order statistics require a sufficiently large number of samples to be estimated with a reasonable level of confidence. When the random variables are normally distributed, these statistics are often used for ANalysis Of VAriance (ANOVA), which compares the between-group variance with the within-group variance to assess systematic factors (bias) and random factors (covariance). The former have statistical influence on the data set while the latter do not. For example, there is an average weight variation within one kind of pumpkin, but there might be another average weight variation among different pumpkin varieties. The Pearson correlation coefficient, defined as the ratio of the covariance to the product of the individual standard deviations, measures the linear correlation between two random variables. For example, the Pearson correlation between evapotranspiration and precipitation is positive over the southern/deforested but negative over the northern/forested Amazonia [7].
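As a minimal sketch of the two low-order statistics discussed above, the following Python code computes a one-way ANOVA F statistic and a Pearson correlation coefficient using only the standard library. The function names and all numbers, including the pumpkin weights, are hypothetical and chosen for illustration only; they are not taken from the cited studies.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The common 1/(n-1) factors of covariance and deviations cancel out.
    return cov / (sx * sy)


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical pumpkin weights in kg for three varieties (illustrative only).
varieties = [
    [4.1, 3.9, 4.3, 4.0],
    [5.2, 5.0, 5.5, 5.1],
    [4.0, 4.2, 3.8, 4.1],
]
f_stat = anova_f(varieties)

# Hypothetical paired measurements with a nearly linear relationship.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```

A large F statistic indicates that the variation among varieties dominates the variation within each variety; turning it into a significance statement additionally requires the F distribution, which is omitted here.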

Regression
Multiple regression models characterize the relationship between a dependent target variable and multiple weighted independent feature variables. The weights, also known as regression coefficients, describe an average functional relationship between target and features, which might be linear or non-linear. For example, an exponential regression is adequate to model the relation between tree height and leaf-area index of Prunus [8]. The least-squares fitting technique yields the model parameters. A probit regression, in contrast, considers binary target variables with Gaussian distributed model noise and possibly multiple weighted independent variables. The maximum likelihood technique is often used to obtain the model parameters. Voting with a binary outcome is a typical application of probit regression. For example, Sevier and Lee used this method in [9] to predict the probability of Florida citrus producers adopting precision agriculture technologies. Note that regression analysis is sensitive to multicollinearity, which arises whenever two or more independent variables used in a regression are strongly correlated with each other. In this case, the weights become very sensitive to small changes in the model.
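The least-squares fit of an exponential regression can be sketched as follows. This is a simplified illustration, not the procedure of [8]: it assumes the model y = a * exp(b * x) and fits it by ordinary least squares on ln(y), which minimizes the squared error in the log domain rather than in the original domain. The function name `fit_exponential` and the synthetic, noise-free data are assumptions made for this example.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y)."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_log_y = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y))
    b = sxy / sxx                          # slope of the log-linear model
    a = math.exp(mean_log_y - b * mean_x)  # intercept mapped back from log domain
    return a, b


# Synthetic, noise-free data (hypothetical tree heights in m and leaf-area index).
heights = [1.0, 1.5, 2.0, 2.5, 3.0]
leaf_area_index = [0.5 * math.exp(0.8 * h) for h in heights]

a, b = fit_exponential(heights, leaf_area_index)
```

On noise-free data the fit recovers the generating parameters; with noisy measurements the log transform weights small target values more heavily than a direct non-linear fit would.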
An Artificial Neural Network (ANN) consists of many simple connected nodes dubbed neurons, each deploying a real-valued non-linear activation function. Input neurons are activated by data from external sensors. Other neurons are activated through weighted edges from previously active neurons. Feed-forward neural networks, forming a directed acyclic graph, process the sensed data without memory. In contrast, the recurrent neural network (RNN) allows connections among neurons in the same or previous layers. Such networks have internal memory and their graph is directed with cycles. When fed with environmental and historical dynamic information, this type of neural network is well-suited to time series forecasting [10]. In the convolutional neural network (CNN), forward and backward propagations perform convolutional operations. Usually, the edge weights are point estimates based on stochastic gradient training. Bayesian Neural Networks model the uncertainty of the estimated edge weights by interpreting them as maximum likelihood or maximum a posteriori estimates. A comprehensive state-of-the-art overview of ANNs is available in [11]. A notable example in precision farming is the feed-forward neural network by Adisa et al. in [12] for maize production prediction. In this work, the feature space is spanned by the environmental parameters, potential evapotranspiration, soil moisture and land cultivated. Barbosa et al. deployed in [13] a CNN that predicts the spatial yield map of corn fields in Illinois, Nebraska and Kansas, USA. Here, satellite images as well as environmental data span the feature space. In a third notable application, multi-layer (deep) CNNs have been applied in [14] to detect plant leaf diseases based on a large set of 54000 training images. Finally, we want to point out the example in [15], where an RNN has been used for spatio-temporal prediction of leaf area index in a rubber plantation. The feature space in the experiment was spanned by the individual CCD images.
The underlying theory of many neural network architectures is still an active area of research.
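To make the feed-forward structure described above concrete, the following minimal sketch propagates a feature vector through two fully connected layers. The weights, biases and input features are hypothetical placeholders rather than trained values, and the layer sizes are chosen arbitrarily for illustration.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(features, layers):
    """Propagate the features through the layers in order (no cycles, no memory)."""
    out = features
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


def identity(v):
    return v


# Hypothetical network: 3 inputs -> 2 hidden neurons (tanh) -> 1 linear output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0]], [0.2], identity),
]

# Hypothetical normalized features, e.g. soil moisture, evapotranspiration, area.
prediction = feed_forward([0.6, 0.3, 0.9], layers)
```

Training, that is, adjusting the weights by stochastic gradient descent, is deliberately left out; the sketch only shows how activations flow along the weighted edges of a directed acyclic graph.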
Bayesian time-series forecasting is another promising field of research in precision farming. Within this framework, all sources of uncertainty are expressed by stochastic processes. Bayes' theorem turns the a priori probability and the distribution of the observed data, also known as the likelihood, into the posterior distribution of the parameters for predictive inference. A partially observed state-space model such as the Hidden Markov Model (for discrete states) or the Kalman filter (for continuous states) is ideally suited to describe the dynamics of the process. A typical example in agriculture research is the price prediction of crops. In [16], a Kalman filter has been deployed to predict the price time-series of rice. When the model parameters are unknown, the observation sequence and the state sequence can be used to estimate them. The linear dynamic Bayesian network developed in [17] does this by relating indicative parameters of crop development to environmental control parameters. The expectation-maximization algorithm is used to track the states in the expectation step and to learn the parameters of the Bayesian network in the maximization step. Once the iteration has converged, the algorithm provides a time-series predictor many time instants ahead. When the dynamics are additionally non-linear, sequential Monte Carlo techniques often lead to accurate parameter predictions by sampling from the posterior distribution, at the expense of computational complexity. In the special case of sigmoid-type growth dynamics, a linear dynamic model leads to the exact predictor for the reciprocal time-series of the parameter [18].
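The predict and update recursion of a Kalman filter can be sketched for the simplest case of a scalar state. The sketch below assumes a random-walk state model x_t = x_(t-1) + w_t with noisy observations y_t = x_t + v_t; this model, the noise variances `q` and `r`, and the price series are assumptions for illustration and do not reproduce the model used in [16].

```python
def kalman_filter(observations, q, r, x0, p0):
    """Scalar Kalman filter for x_t = x_(t-1) + w_t and y_t = x_t + v_t.

    q and r are the variances of the process noise w_t and observation noise v_t.
    Returns the filtered state estimates and the final posterior variance.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, p


# Hypothetical price series (arbitrary currency units), illustrative only.
prices = [20.1, 20.4, 19.8, 20.9, 21.3, 21.0, 21.6]
filtered, variance = kalman_filter(prices, q=0.05, r=0.5, x0=prices[0], p0=1.0)

# Under the random-walk assumption the one-step-ahead forecast is the last estimate.
forecast = filtered[-1]
```

The ratio of `q` to `r` controls how quickly the filter follows new observations; in practice both variances would be estimated from data, for example with the expectation-maximization algorithm mentioned above.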

Classification
Classification is a supervised learning problem, as is the regression discussed above. Among the models for solving classification problems, the classical Fisher linear discriminant analysis is a standard multivariate technique both for dimension reduction and for supervised classification. The data vectors are transformed into a low-dimensional subspace that maximizes the separation of the class centroids. In many applications, however, linear boundaries do not adequately separate the classes. Roth and Steinhage present in [19] a nonlinear generalization of discriminant analysis that uses the kernel trick to replace dot products with an equivalent kernel function.
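For two classes in two dimensions, the Fisher discriminant direction can be written out by hand, as the following sketch shows. It assumes equal class priors, so the decision threshold is placed at the projected midpoint of the two centroids; the function names and the sample points are hypothetical.

```python
def fisher_direction(class0, class1):
    """Two-class Fisher discriminant in 2-D: w = Sw^-1 (m1 - m0)."""

    def centroid(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m0, m1 = centroid(class0), centroid(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    # Within-class scatter matrix Sw = [[a, b], [b, c]].
    a, b, c = a0 + a1, b0 + b1, c0 + c1
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of the symmetric 2x2 matrix applied to (dx, dy).
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Midpoint threshold: assumes equal priors for both classes.
    threshold = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, threshold


def classify(point, w, threshold):
    """Project the point onto w and compare with the threshold."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


# Hypothetical two-feature samples of two crop classes (illustrative only).
class0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
class1 = [(6.0, 5.0), (7.0, 6.0), (7.0, 4.0), (8.0, 5.0)]
w, threshold = fisher_direction(class0, class1)
label = classify((6.5, 4.5), w, threshold)
```

The projection onto `w` reduces each two-dimensional sample to a single number, which illustrates the dimension-reduction role of the discriminant as well as its use as a classifier.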
Sparse Kernel Machines evaluate the kernel function only at a subset of the training data points to predict a new data point, making the computation time feasible [20]. Specifically, the support vector machine (SVM) by Boser et al. in [21] discards all data points but the support vectors once the model is trained. The determination of the model parameters is a convex optimization problem, so that any local solution is also a global solution, in contrast to many other algorithms. The SVM has become popular for solving problems in classification, regression and novelty detection. For example, Jheng et al. predicted in [22] the rice yield in Taiwan with an SVM using training data from 1995-2015. The relevance vector machine (RVM) [23] is a Bayesian sparse kernel technique that, in contrast to the SVM, provides posterior probability outputs. At the same time, RVM-based prediction models utilise dramatically fewer basis functions than a comparable SVM. To name an example from remote sensing, the RVM with plate spline kernel is able to spatially estimate chlorophyll from an unmanned aerial system at low computational cost [24]. Finally, we want to point out the Informative Vector Machine (IVM), which constructs sparse Gaussian process classifiers by greedy forward selection with criteria based on information theoretic principles. The IVM performs similarly to the SVM using only a fraction of the training data. Roscher et al. use in [25] an incremental version of the IVM to classify hyperspectral image data for various agricultural crops in Italy, Europe, and Indiana, USA.

Clustering
Clustering is an unsupervised process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. Clustering techniques can be categorized into i) partitioning algorithms that construct various partitions and then evaluate the result by some criterion (k-means, k-medoids, CLARANS, ...); ii) hierarchical algorithms that create a hierarchical decomposition of the set of objects by some criterion (AGNES, BIRCH, CURE, DIANA, ...); iii) density-based methods that are guided by connectivity and density functions (DBSCAN, OPTICS, ...); iv) grid-based methods that are based on a multi-level granularity structure (STING, WaveCluster, CLIQUE, ...); and v) model-based clustering methods that find the best fit to a hypothetical model (Autoclass, Rock, EM algorithm, ...). Massive computing power makes it possible, for example, to mine large amounts of existing crop, soil and climatic data. Clustering the result based on districts with maximum wheat yield gives the optimal range of best temperature, worst temperature and rainfall [26]. To scale clustering algorithms with the number of dimensions and the number of data items, attention has been drawn to distributed approaches [27]. Nevertheless, scaling remains a challenge for most of the above clustering algorithms in big data applications.
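As an illustration of the partitioning family, a plain k-means iteration can be sketched as follows. The initial centroids are passed in explicitly to keep the example deterministic, the features are left unscaled for brevity although they would normally be standardized, and the district data are hypothetical rather than taken from [26].

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical districts described by (mean temperature in C, rainfall in mm).
districts = [
    (18.0, 300.0), (19.0, 320.0), (20.0, 310.0),
    (28.0, 90.0), (29.0, 100.0), (30.0, 110.0),
]
centroids, clusters = kmeans(districts, [districts[0], districts[3]])
```

The number of clusters is fixed by the number of initial centroids, which reflects the need to know the model order in advance.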

Closing the Loop
So far, machines have mostly been used to learn from observations with the goal of predicting future outcomes given current conditions. Clearly, with an increasing number of observations, the machine becomes smarter over time, but it does not have control over the environmental conditions. Currently, these are controlled based on the agronomist's experience. A more efficient approach is to let agents take optimal actions subject to minimizing resources. The result is a closed-loop precision farming system where the model learns from data in the forward loop and controls actuators in the backward loop, as outlined in [28]. Reinforcement learning, which makes smarter decisions over time, has enjoyed great success in several domains such as computer games, medical diagnosis and energy management. Bu and Wang build in [29] a smart agriculture IoT system based on deep reinforcement learning that decides the amount of water needed for irrigation by analyzing the collected agricultural environment data. Although there has been great progress, the technology cannot yet achieve human-level performance in adapting to dynamic environments and solving complex tasks. Ergo, there is still a lot of room for research towards optimum precision farming. Table 1 lists strengths and weaknesses of common statistical models and machine learning algorithms.
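The closed-loop idea can be illustrated with tabular Q-learning on a toy irrigation problem. The sketch below is not the deep reinforcement learning system of [29]: the three soil-moisture states, the two actions, the deterministic dynamics and the reward are all hypothetical assumptions chosen to keep the example small.

```python
import random

N_STATES = 3      # hypothetical soil moisture levels: 0 = dry, 1 = adequate, 2 = wet
ACTIONS = (0, 1)  # 0 = do not irrigate, 1 = irrigate


def step(state, action):
    """Toy dynamics: irrigation raises the moisture level, otherwise it drops."""
    next_state = min(state + 1, 2) if action == 1 else max(state - 1, 0)
    # Reward adequate moisture and charge a small cost for the water used.
    reward = (1.0 if next_state == 1 else -1.0) - 0.2 * action
    return next_state, reward


def q_learning(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 1
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                      # explore
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
        next_state, reward = step(state, action)
        # Temporal-difference update towards reward plus discounted best value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

In this toy setting the learned policy irrigates dry soil and withholds water from wet soil. The Q-table has one entry per state and action, which is why the tabular form does not extend directly to large or continuous state spaces.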

Conclusions
Precision farming for current arable land is a promising approach to meet the vast global demand for agricultural products on current land. Internet-of-Things provides vast real-time information on crop related parameters, soil and weather, that feeds machine learning algorithms for better crop productivity while protecting the environment. The ultimate goal is to maximize yield by minimizing water consumption, usage of fertilizers, and amount of arable land in an automatic fashion. Although there has been an evolution of research in this area, more knowledge is needed to close the gap between current practice and optimum precision farming.

Table 1. Strengths and weaknesses of common statistical models and machine learning algorithms.

MANOVA
Strengths:
• Powerful test for finding truly significant factors.
• Robust to Type I errors.
Weaknesses:
• Relation between independent grouping variable and dependent variables sometimes ambiguous.

Multiple Regression
Strengths:
• Theory well understood.
• Good results are obtained with relatively small data sets.
• Ability to determine impact of independent variable on dependent variable.
Weaknesses:
• Missing data erroneously changes regression coefficients.
• Correlation does not necessarily correspond to a causation.

Deep Neural Networks
Strengths:
• Perform well on audio, image, text data.
• Architecture can be adapted to a number of problems.
Weaknesses:
• Computationally intensive to train.
• Tuning hyper-parameters needs expert knowledge.

Dynamic Bayesian Network
Strengths:
• Accurate prediction of temporal behavior.
• Flexibly adapts to environmental changes.
• Underlying theory is well understood.
Weaknesses:
• Cannot handle real biological systems with feedback loops (cycles).
• Initial guess of parameters is crucial for convergence.

Support Vector Machine
Strengths:
• Convex optimization problem with unique solution.
Weaknesses:
• Does not scale with data dimension.
• Finding a proper kernel is often cumbersome.

k-means Clustering
Strengths:
• Fast, simple.
Weaknesses:
• Model order must be known in advance.

[method name missing]
Strengths:
• Estimate is unbiased.
Weaknesses:
• Sensitive to choice of hyperparameters.
• Good results only for uniform densities.

Reinforcement Q-Learning
Strengths:
• Computes most successful rewards even when the environment is large.
• Convergence to the optimum policy is guaranteed.
Weaknesses:
• Computationally complex.
• Assumes that all of the states and all of the actions are presentable as matrix.