Bayesian Sigmoid-Type Time Series Forecasting with Missing Data for Greenhouse Crops

This paper follows an integrated approach of Internet of Things (IoT) based sensing and machine learning for crop growth prediction in agriculture. A Dynamic Bayesian Network (DBN) relates crop growth associated measurement data to environmental control data via hidden states. The measurement data, exhibiting (non-linear) sigmoid-type dynamics, fall into two classes: observed and missing. Since the reciprocal of the logistic sigmoid time series is the solution to a linear dynamic model, the exact expectation-maximization algorithm can be applied to infer the hidden states and to learn the parameters of the model. At iterative convergence, the parameter estimates are used to derive a predictor of the measurement data several days ahead. To evaluate the performance of the proposed DBN, we followed three cultivation cycles of micro-tomatoes (MicroTom) in a mini-greenhouse. The environmental parameters were temperature, converted into Growing Degree Days (GDD), and solar irradiance, both at a daily granularity. The measurement data were the Leaf Area Index (LAI) and Evapotranspiration (ET). Although measurement data were only scarcely available, high-quality predictions of the measurement data turned out to be possible up to three weeks ahead.


Introduction
The need to supply food to a growing population confronts agriculture with new and significant challenges. High-input, resource-intensive farming systems, which have caused massive deforestation, water scarcity, soil depletion, and high levels of greenhouse gas emissions, cannot deliver sustainable food and agricultural production. Instead, agriculture will increasingly need innovative systems that protect and enhance the natural resource base while increasing productivity [1]. Precision agriculture has significant potential to reduce agricultural inputs, enhance agricultural sustainability, and increase production, as it allows greater accuracy in targeting the correct amount of inputs, at the correct time, and in the correct location compared to conventional agricultural methods [2]. The precision agriculture approach currently also includes Internet of Things (IoT) technologies [3]. In a recent review [4], the authors analyzed the existing literature on IoT applications in precision agriculture and concluded that the IoT may contribute significantly to modern agriculture by providing automated and even remote control of farms, thus enabling a more effective management of agricultural activities, an innovation supported by new-age farmers who have already moved away from traditional farming to engage with technological farming.
In single imputation, missing values are filled once and then treated as if they were known in the analysis. Consequently, the bias is reduced. Multiple imputation, in contrast, samples multiple values from the probability distribution of the observed sensor data and takes the statistical mean, variance, and confidence interval to fill the gap [20]. A standard approach for multiple imputation is the Data Augmentation (DA) algorithm, which is a Markov chain Monte Carlo technique [23]. In Bayesian inference, the missing data can be resolved by iterating between the imputation step and the posterior step [21].
The most efficient method, which we follow subsequently, is to approach the maximum likelihood estimate in the presence of missing data using the expectation-maximization algorithm [24]. In contrast to the stochastic approximation in [25], our solution is exact.
In this study, we develop a DBN for time series forecasting with daily granularity of crop growth in the IoT based mini-greenhouse COLTIVazione Automatizzata Miniaturizzata Innovativa (COLTIV@MI) [26]. Our DBN learns the model parameters ad hoc, without the need for time-consuming training cultivation cycles. It relates the non-linear crop growth associated measurement data LAI and ET to the environment associated control data GDD and solar irradiance via hidden states. Many plants have a logistic growth evolution. To resemble such growth dynamics directly, a DBN would have to be based on a highly complex non-linear model. We will show, though, that the reciprocal of the growth curve follows a linear dynamic model with low computational complexity. Often, measurement data are only scarcely available, at an irregular time spacing. For example, the LAI has been obtained by destructive measurement once every three weeks. To learn the parameters of the model in Gaussian noise when measurement data are Missing At Random (MAR), we derive a novel tracking algorithm based on the exact Expectation-Maximization (EM) framework. The self-trained model is then used to derive a measurement predictor many days ahead. As an example, we consider the micro-tomatoes MicroTom [27]; the proposed approach, however, can be applied to any other crop type with sigmoid-type growth dynamics. We will see that LAI and ET can be predicted up to 21 days ahead with high accuracy even if almost all of the measurement data are missing.
The paper is organized as follows. Section 2 develops the DBN, the underlying system model, and the EM algorithm to learn the parameters of the model. The plant growth parameter estimates at iterative convergence are then used to derive a multiple-step-ahead predictor, and the experimental settings are described. The experimental results are discussed in Section 3, followed by conclusions and limitations in Section 4.

The Standard Models and Beyond
The LAI is a dimensionless measure of the total area of leaves per unit projected ground area and is directly related to the amount of light that can be intercepted by plants. The LAI is a key parameter to predict photosynthetic biomass production. For tomatoes, it was shown in [28] that the LAI from planting to the time of harvest can be modeled as a function of thermal time, expressed in Growing Degree Days (GDD) (°C). The resulting function:

LAI_t = α + (β − α) / (1 + exp{−(GDD_t − ζ)/δ}) (1)

has the form of a Boltzmann sigmoid, where α, β, ζ, and δ are constants to be obtained by regression analysis. The index t in (1) determines the t-th Growing Day after Transplantation (GDT). For the calculation of GDD from the average daily temperature, a value of 10 °C was selected as the base temperature according to [29]. The Evapotranspiration (ET) is a combination of the water transpired by plants during growth or retained in the plant tissue plus the moisture evaporated from the soil surface and vegetation. Several ET models have been developed during the last few years, all based on the Penman-Monteith approach [30], which is a worldwide accepted modeling approach to determine evapotranspiration.
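As a purely illustrative sketch, the thermal-time bookkeeping and the sigmoid evaluation can be coded as follows. The Boltzmann parameterization and the daily temperatures below are assumptions (the exact form fitted in [28] may differ); the constants match those re-calibrated for MicroTom later in the paper.

```python
import math

def gdd(avg_temp_c, base_temp_c=10.0):
    """Daily Growing Degree Days: average temperature above the 10 °C base."""
    return max(0.0, avg_temp_c - base_temp_c)

def lai_boltzmann(thermal_time, alpha, beta, zeta, delta):
    """Boltzmann-type sigmoid in thermal time: rises from about alpha to about
    beta, with inflection at zeta (°C d) and slope width delta (°C d).
    The exact parameterization is an assumption for illustration."""
    return alpha + (beta - alpha) / (1.0 + math.exp(-(thermal_time - zeta) / delta))

# Accumulate thermal time over a few hypothetical daily mean temperatures
cum_gdd, lai_series = 0.0, []
for temp in [22.0, 25.0, 18.0, 30.0]:
    cum_gdd += gdd(temp)
    lai_series.append(lai_boltzmann(cum_gdd, 0.03, 0.90, 560.0, 50.0))
```

Since every daily mean temperature above the base adds thermal time, the resulting LAI series is monotonically non-decreasing.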
The current use of the Penman-Monteith model is based on the calculation of ET for outdoor climates, while the Stanghellini ET model [31] was implemented in high-technology controlled environment greenhouses. Both equations require several parameters related to the weather, as well as the crop stomatal and aerodynamic resistances, which are not always available. Alternative, relatively simple empirical models have been developed for irrigation scheduling purposes, while other knowledge-based mechanistic models have been developed for climate control purposes [32]. These models take into account greenhouse climate variables such as radiation and vapor pressure deficit and crop measurements such as the Leaf Area Index (LAI) or stomatal resistance [28,33-39]. In some cases, a multiple linear regression of ET against vapor pressure deficit, as well as outside or inside solar radiation, has been proposed for irrigation management in greenhouse crops [40]. For tomatoes, it was shown in [28] that the Baille model in [33] simplifies to:

ET_t = ϵ (1 − exp{−k · LAI_t}) R_t / λ + γ (2)

The ET is expressed in mm d−1; R_t is the daily value of solar irradiance (MJ m−2 d−1); λ is the latent heat of water vaporization (2.45 MJ kg−1); k is the light extinction coefficient of the canopy (measured as 0.69); and ϵ and γ are the regression parameters. Inspecting (2), it can be seen that the ET is a modulated sigmoid function with a negative exponential envelope. The models (1) and (2) act as the benchmark for our dynamic Bayesian growth model.
Difference equations can be used to predict the growth status one discrete time step ahead. Specifically, the time series of the LAI in (1) resembles the scaled and time-shifted logistic function z_t = [κ(1 + exp{−µ(t − t_0)})]^{−1}, with the system parameters κ, µ, and t_0 denoting the steady state, decay, and point of inflection, respectively. The logistic function is the solution to a non-linear difference equation, which is cumbersome to implement. Its reciprocal dynamics:

x_t = 1/z_t = κ (1 + exp{−µ(t − t_0)}) (3)

however, has an exponential shape that can easily be generated by the first-order linear ordinary difference equation:

x_{t+∆} = e^{−µ∆} x_t + κ (1 − e^{−µ∆}) (4)

at low computational complexity. Here, ∆ denotes the time granularity. Note that the first term on the right-hand side of (4) depends on the latent variable, while the second does not. On the other hand, the ET time series in (2) has the form:

y_t = κ (1 − exp{−µ(t − t_0)}) (5)

Note that despite the different sign in front of the exponential, the expressions in (3) and (5) are solutions to the same difference equation in (4), but with distinct initial conditions.
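A minimal numerical sketch of this observation, with assumed values for κ, µ, and t_0: iterating the linear recursion reproduces the reciprocal logistic curve without ever evaluating a sigmoid inside the loop.

```python
import math

kappa, mu, t0 = 1.0, 0.08, 35.0     # assumed steady state, decay, inflection day
delta = 1.0                         # time granularity of one day

def logistic(t):
    """Scaled, time-shifted logistic curve z_t."""
    return 1.0 / (kappa * (1.0 + math.exp(-mu * (t - t0))))

# The reciprocal x_t = 1/z_t obeys a first-order LINEAR difference equation
a = math.exp(-mu * delta)           # per-step decay factor
b = kappa * (1.0 - a)               # constant drive toward the steady state
x = 1.0 / logistic(0.0)             # initial condition at t = 0
for _ in range(100):
    x = a * x + b                   # linear update, no sigmoid needed
```

After the loop, x coincides with the reciprocal of the closed-form logistic curve at t = 100 and has essentially relaxed to the steady state κ.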

Dynamic Bayesian Network
We now derive a Dynamic Bayesian Network (DBN) for stochastic crop growth with a sigmoid-type activation function when observation data are sparse. When the underlying process is Markovian, the DBN replicates a particular template over discrete steps in time. The template is a directed acyclic graph, representing the state transition distribution from one state to the next and the emission distribution within the same state. The directed edges of the DBN reflect conditional dependencies, while the nodes correspond to one of three kinds of variables. The expression in (4) motivates us to model the reciprocal dynamics of crop growth as a DBN having a length of T growing days. We subsequently define the states, derive the template, and draw the DBN.
For the states, the transition probability distribution from hidden growth state x_t ∈ R^K to the next hidden state:

p(x_{t+1} | x_t) = N(x_{t+1}; (I + A U_t) x_t + B u_t, Σ_n) (6)

follows a first-order Markov process with additive Gaussian noise. Here, N(·; m, Σ) denotes a normal distribution with mean m and covariance Σ. Moreover, A ∈ R^{K×K} and B ∈ R^{K×K} denote the state matrix and the control matrix, respectively; I is the K-dimensional identity matrix, and U_t = diag{u_t} is a diagonal matrix. The control vector u_t ∈ R^K is a deterministic function of the environmental parameters, independent of the crop.
The template of the DBN is comprised of the above state distribution and the subsequent emission distribution:

p(y_t | x_t) = N(y_t; C x_t, Σ_w) (7)

instead of having perfect knowledge of the state variable. Here, the measurement matrix is denoted by C ∈ R^{1×K}. Moreover, y_t^(obs) ∈ R denotes the observed data. The remaining measurement data y_t^(mis) ∈ R are conceptual and denote the data that were not observed. For the initial state,

p(x_1) = N(x_1; µ_1, Σ_1) (8)

With the state transition model defined in (6) and the emission model in (7), we are now ready to derive the DBN as illustrated by two time slices in Figure 1.

[Figure 1: two time slices of the DBN, with nodes for the control data, the hidden states, and the measurement.]

In the sequel, the parameter vector θ = {A, B, Σ_n, C, Σ_w, µ_1, Σ_1}, the latent state sequence x = {x_1, . . . , x_T}, and the missing data are unknown and, hence, require estimation.
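Before turning to estimation, the generative template can be simulated forward. The sketch below assumes a transition mean of the form (I + A U_t)x_t + B u_t with a scalar emission C x_t, consistent with the discretized reciprocal dynamics; all dimensions and numeric values are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 2, 60                                   # state dimension, growing days

# Hypothetical parameter values, not fitted to any data
A = -0.1 * np.eye(K)                           # state matrix (~ decay)
B = 0.1 * np.eye(K)                            # control matrix (~ decay * steady state)
C = np.full((1, K), 1.0 / K)                   # measurement matrix
Sigma_n = 1e-4 * np.eye(K)                     # process noise covariance
sigma_w = 1e-2                                 # measurement noise std

x = np.full(K, 2.0)                            # initial reciprocal growth state
ys = []
for t in range(T):
    u = np.ones(K)                             # control vector from environmental data
    U = np.diag(u)
    # state transition: first-order Markov with additive Gaussian noise
    x = (np.eye(K) + A @ U) @ x + B @ u + rng.multivariate_normal(np.zeros(K), Sigma_n)
    # emission: scalar noisy measurement of the hidden state
    ys.append((C @ x).item() + rng.normal(0.0, sigma_w))
```

With these assumed values, each state component relaxes toward a steady state of one, so the simulated measurement sequence settles near one as well.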

The EM Algorithm
This section derives a maximum-likelihood based tracking algorithm for the state sequence. So far, we have designed the DBN in Figure 1. Unrolling the DBN for T time slices, it follows for the joint state-measurement probability distribution that:

p(x, y | θ) = p(x_1) ∏_{t=1}^{T−1} p(x_{t+1} | x_t) ∏_{t=1}^{T} p(y_t | x_t) (9)

Due to the Markov property in (6), the joint state-measurement distribution decomposes into the product of individual conditional distributions. We now have two options for handling missing observations: (i) modeling the missing data mechanism or (ii) ignoring it, which is equivalent to integrating out the missing observations from the joint density function in (9). Following the latter approach, we get:

p(x, y^(obs) | θ) = ∫ p(x, y | θ) dy^(mis) (10)

Rubin showed in [20] that sufficient conditions for ignoring missing data are:
• The data are MAR, i.e., the missing data mechanism is only allowed to depend on y^(obs);
• The model parameters governing absence and the parameters of interest θ reside in different spaces.
In our case, both conditions are fulfilled, so that we may safely ignore the missing data mechanism. Based on the marginal distribution in (10), we now use the EM algorithm to approximate the Bayesian inference of the state sequence. The EM algorithm postulates complete (hidden) data X that would ease the computation of θ if they were known. Following this approach, let X contain the reciprocal dynamics of plant growth along with the observed measurement data, i.e., X = {x, y^(obs)}. Starting from iteration i = 0, the E-step of the algorithm provides the expected log-likelihood of the complete data given the observed measurements and a guess of the parameter vector, i.e.,

Q(θ | θ^[i]) = E[ln p(X | θ) | y^(obs), θ^[i]] (11)

The E-step iterates between prediction and correction if measurement data are available, or only predicts the complete data based on previous state information without a response when measurement data are missing according to (10). The M-step,

θ^[i+1] = arg max_θ Q(θ | θ^[i]) (12)

is learning from the updated complete data in the E-step. The sequence of log-likelihood values is non-decreasing and converges to a stationary point [24,41].
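The predict/correct behavior of the E-step can be illustrated with a deliberately simplified scalar filter. All parameter values and the observation pattern below are hypothetical; the filter predicts every day and corrects only on days with an observation, which is exactly the "prediction without response" behavior on missing days described above.

```python
# Scalar sketch of the E-step's predict/correct pattern (all values hypothetical)
a, b, u = 0.9, 0.1, 1.0            # assumed transition parameters
c, q, r = 1.0, 1e-3, 1e-2          # measurement gain, process/measurement noise

y = [2.0, None, None, 1.7, None, 1.4, None, None, 1.1]  # sparse observations
m, v = y[0], r                     # initial state belief (mean, variance)
means = [m]
for obs in y[1:]:
    m, v = a * m + b * u, a * v * a + q       # prediction step (runs every day)
    if obs is not None:                       # correction only when observed
        k = v * c / (c * v * c + r)           # Kalman gain
        m, v = m + k * (obs - c * m), (1 - k * c) * v
    means.append(m)
```

On missing days, the belief simply follows the model toward its steady state; on observed days it is pulled back toward the data.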
At iterative convergence, the expected state sequence has the form:

x̂_t = E[x_t | y^(obs), θ^[∞]], t = 1, . . . , T (13)

with the error variance:

V_t = E[(x_t − x̂_t)(x_t − x̂_t)^T | y^(obs), θ^[∞]] (14)

The derivation of (13) and (14) is provided in Appendix A.

Prediction of Measurement Data
If control data were available q days beyond the current growing day T, but measurements were missing, the expected state sequence in (13) would already be the solution to our problem. In practice, however, control data are only available until time step T. A simple yet efficient predictor is to deploy (13), but keep the control data frozen at time step T. Following this approach, we obtain:

x̂_{T+q} = (I + A U_T) x̂_{T+q−1} + B u_T, q ≥ 1 (15)

Substituting (15) into (7), it follows for the q-step ahead predictor of the measurement data that:

ŷ_{T+q} = C x̂_{T+q} (16)

having error variance:

Σ_{T+q} = C V_{T+q} C^T + Σ_w, V_{T+q} = (I + A U_T) V_{T+q−1} (I + A U_T)^T + Σ_n (17)

The above predictor runs freely without response.
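A sketch of the frozen-control mean prediction, under the assumption that the state mean is propagated as (I + A U_T)x + B u_T, consistent with the transition model; the 1-D numbers are hypothetical.

```python
import numpy as np

def predict_q_steps(x_T, u_T, A, B, C, q):
    """Free-running q-step-ahead measurement prediction with the control
    vector frozen at its last available value u_T (no feedback from data)."""
    M = np.eye(len(x_T)) + A @ np.diag(u_T)   # frozen transition matrix
    x = np.asarray(x_T, dtype=float)
    for _ in range(q):
        x = M @ x + B @ u_T                   # repeated mean prediction
    return (C @ x).item()                     # predicted measurement

# Hypothetical 1-D example: reciprocal LAI relaxing toward a steady state of 1
A = np.array([[-0.1]])
B = np.array([[0.1]])
C = np.array([[1.0]])
y_hat = predict_q_steps(x_T=[1.8], u_T=np.array([1.0]), A=A, B=B, C=C, q=21)
```

In this scalar case the iteration reduces to x ← 0.9x + 0.1, so after 21 steps the prediction lies between the last state and the steady state.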

Initialization of the EM Algorithm
The stationary point of the log-likelihood function, reached by the EM algorithm, highly depends on the initial point [42]. Hence, initialization requires special care.
For the noise covariances, we start off with small values on their diagonals:

Σ_n^[0] = Σ_w^[0] = ε I, with a small ε > 0 (18)

The measurement matrix C^[0] is initialized as:

C^[0] = (1/K) E (19)

where E is the all-one matrix. Substituting C^[0] into (8), the state sequence starts off with:

µ_1^[0] = y_1^(obs) E^T, Σ_1^[0] = ε I (20)

The initial matrices A^[0] and B^[0] depend on the growth parameter. For the LAI, the reciprocal logistic curve in (3) is a function of the three parameters κ, µ, and t_0, which are the solutions to three equations:

x_{t_i} = κ (1 + exp{−µ(t_i − t_0)}), i = 1, 2, 3

This system of non-linear equations, however, has no closed-form solution. For MicroTom tomatoes, the reciprocal LAI has the steady state κ ≈ 1 [43]. Having fixed κ, the computation of t_0 is obsolete, and µ has the closed-form solution:

µ = ln[(x_{t_1} − κ)/(x_{t_2} − κ)] / (t_2 − t_1) (21)

From (4) and (6), it can be seen that the state matrix A is proportional to the decay. Ergo,

A^[0] = −(µ ∆ / ū) I (22)

where we normalized A^[0] by the sample mean ū of the control signal. As the control matrix B is proportional to the decay and the steady state, it follows:

B^[0] = (µ κ ∆ / ū) I (23)

For the ET, the negative exponential decay in (5) can be specified by two observation points according to:

y_{t_i} = κ̃ (1 − exp{−µ̃(t_i − t_0)}), i = 1, 2 (24)

which can be solved in an iterative fashion. Substituting µ̃ for µ in (22), as well as κ̃ for κ in (23), we again obtain initial guesses for the matrices A^[0] and B^[0], respectively. With this type of initialization, our EM algorithm is able to find the global maximum of the likelihood function with the aid of only two initial observations. Table 1 lists their descriptive statistics.
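Assuming the two-point closed form for µ with κ fixed, as described above, the initialization step can be sketched as follows; the synthetic curve parameters are invented for the sanity check.

```python
import math

def decay_from_two_points(x1, x2, t1, t2, kappa=1.0):
    """Closed-form decay estimate for the reciprocal logistic curve once the
    steady state kappa is fixed: the inflection time t0 cancels in the ratio."""
    return math.log((x1 - kappa) / (x2 - kappa)) / (t2 - t1)

# Sanity check on synthetic data: sample the reciprocal curve at two assumed
# early growing days and recover the decay.
mu_true, t0 = 0.05, 30.0
recip_lai = lambda t: 1.0 + math.exp(-mu_true * (t - t0))  # kappa = 1 (MicroTom)
mu0 = decay_from_two_points(recip_lai(5.0), recip_lai(15.0), 5.0, 15.0)
```

Because t_0 cancels in the ratio of the two offsets from κ, two observations suffice to recover the decay exactly on noiseless data.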

Experimental Design
The different climatic conditions produced important differences in cumulated ET and growth. The ET was measured in the mornings using a scale. Roughly 50 percent of all data points, however, were missing. Leaf area measurements were made on five plants as the sum of the areas of the leaves of each plant using a planimeter (DT Area Meter MK2, Delta-T Devices, Cambridge, UK), starting from Growing Days after Transplantation (GDT) equal to zero and ending at the ultimate GDT, with two samples in between.
The proposed DBN was benchmarked against the non-linear growth model by Carmassi in Section 2.1, as well as a low-complexity Linear time-series Regression Model (LRM). For round-fruit tomatoes (Solanum lycopersicum L. cv. Jama F1), the parameters of the model were given in [28]. Since the vegetative habitus of our cv. "MicroTom" was quite different, it was necessary to re-calibrate the model parameters. Using the dataset of Env 1, it followed for the LAI reference model in (1) by Carmassi that α = 0.03, β = 0.90, ζ = 560, and δ = 50 when the base temperature was 10 °C. Following the same approach for the ET reference model in (2), we obtained ϵ = 0.109 and γ = 0.32 mm d−1. The competing LRM used the same training data as our DBN did. Due to the lack of measurement data, the LRM describes a linear relationship between the LAI and the cumulative GDD only, according to:

LAI_t = ρ_0 + ρ_1 GDD_t + ε_t (25)

where ρ_0 and ρ_1 are the regression coefficients obtained by a standard least-squares fit and ε_t is the model error.
For the ET, the LRM assumes a linear relation between the dependent variable ET and the independent variables GDD accumulated over time and R_t according to:

ET_t = ρ_0 + ρ_1 GDD_t + ρ_2 R_t + ε_t (26)

Finally, it should be noted that the EM algorithm in our DBN was iterated until the log-likelihood function ln p(y^(obs) | θ^[i]) increased by less than 1‰ or the limit of 100 iterations was reached.
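The least-squares fits behind the LRM baselines can be sketched as follows; the data points are invented for illustration only.

```python
import numpy as np

# Toy data (made up for illustration): cumulative GDD, irradiance, measured ET
cgdd = np.array([50.0, 120.0, 200.0, 290.0, 390.0])
rad = np.array([8.0, 10.0, 12.0, 11.0, 9.0])
et = np.array([0.5, 1.1, 1.8, 2.3, 2.6])

# Ordinary least-squares fit of ET against cumulative GDD and irradiance,
# i.e. the kind of linear baseline the LRM represents
X = np.column_stack([np.ones_like(cgdd), cgdd, rad])
coef, *_ = np.linalg.lstsq(X, et, rcond=None)
et_hat = X @ coef                      # fitted ET values
```

With an intercept column included, the fitted values reproduce the sample mean of the data, and the residual norm can never exceed that of a constant-mean fit.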

Results and Discussion
Given the environmental parameters in Section 2.3 and sparse historical measurement data until some time instant T, our DBN was deployed to predict the posterior distribution q growing days ahead.
In MicroTom LAI prediction, the measurement data were observed in only 6% of all cases. The other 94% were treated as MAR. The control data, however, were available at all time instants. The EM algorithm operated on the reciprocal LAI. The first two measurement points were used to initialize the algorithm according to Section 2.2.3 with κ = 1. Figure 2 reports the performance of our DBN as a function of GDT. At the day of measurement, our predictor forecast the next measurement, which was q = 21 days (very long) ahead. In the upper subplot, the blue markers indicate the reciprocal observations at T = {1, 22, 43, 64}. The black curves show the mean prediction value ŷ_{T+21} of 1/LAI in (16). Its standard deviation Σ^[∞]_{T+21} in (17) was encoded as half the width of the filled region around the mean value. Clearly, the predictor only had access to historical measurement data until time instant T. The lower subplot shows the LAI vs. GDT using the same line styles. For comparison purposes, the Carmassi model in (1) and the LRM in (25) are also added to the plot as red and brown lines, respectively. It can be seen that our DBN correctly anticipated the inflection point of the sigmoid function in all three cultivation environments despite the extremely low number of measurement points. Clearly, with an increasing number of observed data points, the quality of the LAI forecasts improved. The average prediction error, averaged over all T, was equal to 15.5%, 12.2%, and 19.7% for Env 1, Env 2, and Env 3, respectively. Details are listed in Table 2. Note that the inverse of the reciprocal mean depicted in the lower subplot is a lower bound of the mean value shown in the upper subplot (Jensen's inequality). Moreover, the error variance of the reciprocal of a Gaussian random variable tends to infinity and, hence, was omitted from the lower subplot. The analytic Carmassi model accurately estimated the inflection point even in Env 3, beyond the reference environment Env 1 on which it was calibrated.
The simple LRM in (25) could be seen as a first-order approximation of the former in (1) and, hence, was quite accurate in the neighborhood of the inflection point of the sigmoid function, but became more erroneous the larger the time deviation was.

Figure 2. Prediction of the LAI with prediction length q = 21 for micro-tomatoes (MicroTom) with sparse data in three environments: Env 1 (warm and bright), Env 2 (shaded and emergency heating), and Env 3 (indoor and artificial illumination). LRM, Linear time-series Regression Model.

Table 2. Prediction error of LAI vs. GDT for MicroTom with sparse data in three environments: Env 1 (warm and bright), Env 2 (shaded and emergency heating), and Env 3 (indoor and artificial illumination).

Figure 3 shows the performance of our DBN with prediction length q = 1 day (very short). At time T, the EM algorithm either observed or missed the measurement point, requiring reconstruction. The resulting curve, influenced by the different control data every day, was rougher, but generally speaking, more accurate than for a prediction length of q = 21. In particular for Env 3, the prediction error was now under 10%. Table 3 indicates that the mean estimation error for the analytical Carmassi model was equal to 19.42% (13.46%) for Env 2 (Env 3). This error was dominated by the loose approximation near the initial growth state. The mean estimation error of the LRM was equal to 23.88%, 34.0%, and 50.0% for Env 1, Env 2, and Env 3, respectively. In contrast to the Carmassi model, this error was dominated by the steady growth state farthest away from the likely missed inflection point due to the nature of the first-order approximation.

Figure 3. Prediction of the LAI with prediction length q = 1 for MicroTom with sparse data in three environments: Env 1 (warm and bright), Env 2 (shaded and emergency heating), and Env 3 (indoor and artificial illumination).

Table 3. Prediction error of the Carmassi and the Linear Regression Model (LRM) for LAI vs. GDT. The parameters of the former were adjusted during the cultivation cycle in Env 1, while the slope and intercept of the latter were adjusted during the first two measurement points in each cultivation cycle.

Figure 4 reports the ET vs. GDT for our MicroTom in all three environments. The upper, middle, and lower subplots represent the greenhouse environments Env 1, Env 2, and Env 3, respectively. This time, the EM algorithm operated directly on the ET (and not on its reciprocal). In this scenario, measurement data were missing roughly 50% of the time. The start of the prediction was at T = 10, and the prediction length was q = 1. It can be seen that our predictor was able to follow the trend of the measurement curve. It predicted small outliers caused by changes in the environmental parameters with increasing accuracy the longer the observation time was. However, it had difficulties anticipating large outliers. This was true for all environments. The Carmassi model in (2) was only quite accurate for Env 1, on which it was calibrated. The linear dependency of the LRM in (26) overemphasized the impact of the environmental parameters on the ET, in particular for Env 2. Figure 5 illustrates the measured cumulative ET vs. the predicted one for our predictor in (16). It can be seen that the predicted values were roughly Gaussian distributed around the 1:1 line, indicating freedom from bias. This was true for all three environments. The average prediction error of our DBN application was 29.42%. Not shown in the already overcrowded plot, the Carmassi model and the LRM had average prediction errors of 37.21% and 67.23%, respectively. Thus, it could be concluded that our DBN performs similarly to the analytical Carmassi model. Error performance details are listed in Table 4.

Table 4. Estimation error of ET vs. GDT for the micro-tomatoes in three different environments: Env 1 (warm and bright), Env 2 (shaded and emergency heating), and Env 3 (indoor and artificial illumination).

Conclusions
We proposed a Bayesian machine learning approach for the inference of dynamical systems that modeled crop growth indicators with a sigmoid evolution over time. Such an evolution essentially corresponded to a two-phase dynamical process consisting of an exponential growth of the plant followed by a slow-down, leading to the achievement of a steady state. Both LAI and ET, which are among the most common growth-related indicators, exhibit such dynamics for some types of plants.
The main advantage of our approach was flexibility. The model was inferred from the first items of the time series of indicators and environmental parameters, and it could be used to predict future items. This enabled real-time estimation of crop growth without the need to dedicate a whole cultivation cycle to a species-specific model calibration.
Compared to analytical models based on regression, which can fit calibration data in a very accurate way, our approach turned out to be more robust to changes in the environmental parameters. Again, this was a consequence of the fact that the model learned the influence of environmental parameters on the growth indicators from (the first items of) the same time series it would have to predict.
The proposed approach could also take advantage of available information about the usual trend of the growth indicators of interest. For instance, in the considered MicroTom tomatoes example, we used knowledge on the reciprocal LAI steady state (usually close to one) to initialize the matrices for the learning phase properly. This allowed us to obtain a more accurate inference of the dynamical model, and consequently more accurate predictions. Generalizing, this showed that the approach could also take available agronomic knowledge into account in order to improve the accuracy of predictions. For example, another piece of information that could be used, if available, was the usual time the steady state was expected to be reached (i.e., the expected length of the cultivation cycle). As future work, we plan to investigate how we could transform this information into an additional constraint for the learning phase and to measure the effect of such a constraint on the prediction capability of the inferred model.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Appendix A

The E-step of the EM algorithm amounts to a forward-backward recursion over the state sequence. The forward recursion iterates between prediction:

µ_{t+1|t} = (I + A U_t) µ_{t|t} + B u_t, V_{t+1|t} = (I + A U_t) V_{t|t} (I + A U_t)^T + Σ_n

and, whenever a measurement is observed, correction:

µ_{t+1|t+1} = µ_{t+1|t} + K_{t+1} (y_{t+1}^(obs) − C µ_{t+1|t}), V_{t+1|t+1} = (I − K_{t+1} C) V_{t+1|t}

with the Kalman gain matrix K:

K_{t+1} = V_{t+1|t} C^T (C V_{t+1|t} C^T + Σ_w)^{−1}

When a measurement is missing, the correction step is skipped. The backward recursion, in contrast, yields the state estimate of interest, namely a normal distribution with mean and covariance:

µ_{t|T} = µ_{t|t} + J_t (µ_{t+1|T} − µ_{t+1|t}), V_{t|T} = V_{t|t} + J_t (V_{t+1|T} − V_{t+1|t}) J_t^T

respectively, where:

J_t = V_{t|t} (I + A U_t)^T V_{t+1|t}^{−1}

Finally, the boundary conditions are updated with the filtered estimates at t = T and the initial state statistics at t = 1. The M-step of the EM algorithm in (11) learns the parameter vector θ by computing the partial derivative of the expected log-likelihood function in (11), setting the result equal to zero, and solving with respect to the respective parameter. Note that all the parameters related to the measurements in (A13) and (A14) are based on the observed measurement data, ignoring the missing data. The above state and parameter estimates are the base of our DBN in Section 2.2.