Dynamic Modeling of Crop–Soil Systems to Design Monitoring and Automatic Irrigation Processes: A Review with Worked Examples

: The smart use of water is a key factor in increasing food production. Over the years, irrigation has relied on historical data and traditional management policies. Control techniques have been exploited to build automatic irrigation systems based on climatic records and weather forecasts. However, climate change and new sources of information motivate better irrigation strategies that might take advantage of the new sources of information in the spectrum of systems and control methodologies in a more systematic way. In this connection, two open questions deserve interest: (i) How can one deal with the space–time variability of soil conditions? (ii) How can one provide robustness to an irrigation system under unexpected environmental change? In this review, the different elements of an automatic control system are described, including the mathematical modeling of the crop–soil systems, instrumentation and actuation, model identiﬁcation and validation from experimental data, estimation of non-measured variables and sensor fusion, and predictive control based on crop–soil and weather models. An overview of the literature is given, and several speciﬁc examples are worked out for illustration purposes.


Introduction
The rise of the world population motivates the increase in food production. To achieve this goal, irrigation plays a major role, since 70% of fresh water around the world is used in agriculture [1]. A challenge is to increase production while following a sustainable and efficient strategy [2,3]. Efficient irrigation has a determinant impact on the water footprint of agricultural production, which depends not only on the amount of water that is used, but also on the way it is exploited and on the water sources [4]. On the one hand, sustainability is linked to water scarcity and climate change, since historical climatic data may not be sufficient to determine irrigation strategies. This issue was discussed by [5], where the authors proposed an adaptive irrigation strategy. On the other hand, the design of irrigation systems involves a proper interpretation of the crop-soil system in order to propose an optimal irrigation policy. One of the most relevant open challenges is the management of land heterogeneity related to topography and soil properties [6]. Over the years, the representation of crop-soil dynamics has been addressed with mathematical models, where the cause-effect relations between environmental variables and plant/soil components have been incorporated. These models mainly include mass balance equations, heuristic relationships, and statistical functions. The historical evolution of these models was described by [6][7][8]. Nevertheless, modeling of irrigation systems has traditionally been limited to soil dynamics, neglecting the plants' interactions and effects [9], which has led to sub-optimal solutions [10]. Consequently, the design of irrigation systems can be enhanced by more comprehensive models that take interactions between the different system compartments into account and by a more systematic use of system and control theory to improve monitoring and predictive control.
This trend in smart farming and precision agriculture has been boosted by the availability of more sources of information, notably, through remote sensing [8,11,12], thus also leading to the use of machine learning [13]. The applications of these tools have led to improvements in production performance, as discussed in [14].
However, there is still room for a full exploitation of system and control methodologies, particularly in this new era of massive data sources, as recently pointed out by [15]. In this latter review article, the authors emphasized two research directions, i.e., minimal resource usage and the exploitation of current technology to build better management systems. The open challenges include the resource allocation problem, control-oriented modeling, and sensor and actuator placement.
In this present article, the last ten years of developments related to irrigation in agricultural systems are reviewed by following the methodological structure shown in Figure 1. The objective is to highlight the several steps leading to an advanced control strategy, including instrumentation, mathematical modeling, model identification and validation, state estimation (software sensors), and model-based control. The main concepts are illustrated by numerical examples that are carefully worked out. This review builds upon past reviews, such as [16,17], which summarized fundamental control concepts applied to irrigation systems.
The problem of irrigation can be divided into two subproblems, i.e., water transportation and water usage. In this review article, the focus is put on the latter subject, since a complete review about water transportation was recently published [18], where the authors presented a comprehensive description of an open-channel system that included modeling and control features. This paper is organized as follows. In Section 2, the main components of simulationoriented models are extracted to propose control-oriented models. Next, Section 3 presents the sensing and measurement elements. Afterwards, in Section 4, the system identification techniques used to determine the model parameters are presented, with an emphasis on parameter uncertainties and model validation. Section 5 reviews state estimation and the reconstruction of non-measured variables. Then, in Section 6, the most popular control techniques are discussed. Section 7 discusses spatial representation and soil heterogeneity. Finally, Section 8 draws conclusions and perspectives.

Dynamic Crop-Soil Model Components
Crop-growth models are developed to understand how environmental and management factors affect food production. In a predictive irrigation policy, these models are a core component, as illustrated in Figure 1. Two types of models can be distinguished: (i) simulation-oriented models (SOMs) and (ii) control-oriented models (COMs). The goal of the models in the first category is to faithfully mimic the system behavior and to understand the phenomena influencing crop growth. These models combine mechanistic descriptions of biophysical phenomena and heuristic laws. In [19], the authors presented an extensive list of crop-growth models (however, more than 50% of these models have become obsolete over time). The models in the second group are usually reduced or simplified versions of the previous models. Only 12 of the most cited SOMs are usually considered to build a control-oriented framework (see Table 1) based on a hierarchy in growth and production factors proposed by [20]. These include growth-defining factors, growth-limiting factors, and growth-reducing factors. The growth-defining factors determine the maximum possible growth and are dependent on crop-specific parameters and weather influences, such as temperature and solar radiation. The growth-limiting factors are due to a shortage or excess of water and nutrients such as nitrogen. The growth-reducing factors are due to, e.g., weeds and pests. Some of the mentioned factors are listed in Table 2 for representative SOMs, whose evolution has been reviewed by [21]. Other related works, such as [22], used a dynamic system approach to represent full agricultural production systems. These agricultural production systems include livestock production, economic and labor modules, environmental impact, and social components, which are outside of the scope of this review.
In this review, we describe the crop dynamics for a homogeneous portion of land by following two approaches.

Crop and Growth Models
Food production includes perennial and seasonal crops. As mentioned in the introduction, water has a determinant influence on the development of both. For instance, large crops of almonds have been identified as the most significant consumers of freshwater, followed by large-scale seasonal crops of rice, alfalfa, sugarcane, wheat, maize, and plantains [23]. Therefore, efficient irrigation offers a global benefit for long-and short-term crops. However, in short-term crops (e.g., cereals and legumes), the excess or lack of water can be catastrophic, since the crop might be under irrecoverable conditions [24]. Consequently, most COMs are focused on seasonal and short-term crops.
The dynamics of crop processes can be described by a system of differential or difference equations, which represent the plant growth with at least three levels of detail. In the first level, the crop growth is limited solely by weather conditions, primarily solar radiation and air temperature. In the second level, growth is water-limited, i.e., limited by the shortage or excess of water supply, in addition to the constraints imposed by weather conditions. In the third level, growth is also limited by the shortage of nutrients-mainly, but not limited to, nitrogen (N)-during some lapses in the growing season. The shortage of other nutrients, such as phosphorus or potassium, follows the same logic as for nitrogen, imposing more restrictive bounds on the growth function. These levels are explained next.   The time evolution of the biomass y is related to the energy provided by sunlight: where µ is the growth rate (i.e., the light-use efficiency (g/MJ)), w is the daily intercepted solar radiation (MJ/m 2 ), and y f is the maximum (or potential) biomass, which is related to the crop species and the plant density. After the limit of accumulation of solar energy is reached, the phenology of the crop involves maturity and the natural aging process. The stresses caused by the lack or excess of environmental inputs, including the shortage of nutrients (which can be remedied by fertilization), are accounted for by models in two ways-first, by speeding up the aging process (e.g., shortening the maturity time or cumulative temperature), and second, by using multiplicative factors of the growth function. These multiplicative stress factors are usually in the range of [0, 1], where 1 represents a non-stress condition and 0 represents a maximum stress factor. The shape of stress factors is related to the mathematical representation of the observed experimental impact (see Table 3). It is important to note that the multiplicative structure can present serious difficulties for parameter identification. Table 3. Stress factor shapes, where 0 ≤ S ≤ 1 is the relative stress level, α > 0 is the shape factor, S l and S u are the relative stress levels at the lower and upper threshold, respectively, and β is the rate factor.

Shape Equation Uses
Reference Air temperature [19,27] Pests and diseases [46] Other models consider the temperature (T) as a driving variable in order to describe matter formation. The crop grows according to the thermal pace, which corresponds to the number of days that the crop is subject to a range of temperature that has been experimentally proven to be optimal for stomatal exchanges. Crop growth is expressed as the product of the solar radiation (w), the radiation-use efficiency (r), the growth function G(T, t), and the stress function K(X, T, N, t) depending on water (X), temperature (T), and nutrients (N), i.e., dy dt = wrGK.
The growth function G ∈ [0, 1] is formed by the combination of two logistic shapes: one for the plant vegetative growth from germination to the half of crop life with a positive slope, and the second one for aging from the maturity to harvest with a negative slope. This type of model can be illustrated by considering the simple case in [43]. Originally, this model was presented in a discrete-time version that used daily steps, as the environmental inputs were typically sensed and reported daily. The potential growth function is given by where v represents the cumulative temperature, θ 1 is the maximum fraction of radiation interception that the crop can reach, θ 2 is the cumulative temperature requirement for leaf area development to intercept 50% of radiation, θ 3 is the cumulative temperature requirement from sowing to maturity, and v s is the temperature aging function accounting for the negative effects of water and temperature stresses. The stress function K, which penalizes the growth in Equation (2), is represented as the product of single environmental stresses, but also includes an additional positive effect of CO 2 (K CO 2 ∈ [1, 1.3]), i.e., K = K T K X K N K CO 2 . An example of wheat as a test crop that is sown on a deep loamy soil is shown in Figure 2. Figure 2a shows the shape of the growth function and senescence for non-stress crop growth (K = 1) and two stress levels (K < 1). The early senescence is directly related to stress amplitude. In correspondence, Figure 2b shows the biomass evolution. Notice that any stress factor in Equation (2) affects the growth function by narrowing the bell shape, but also by attenuating its height. Consequently, the presence of stress factors changes the slope of the biomass production graph and its final value.

Second Level
This level of model description involves water supply as a limiting growth factor. Most of the water exchanges occur below the ground, which is why the water-in-soil mass balance equation is considered. This equation is the core of COMs, and is often used to compute the water stress factors. The water content in soil x is computed by where x T represents the crop transpiration [47], x r represents the surface runoff, x d represents the drainage, u represents the irrigation, and w r represents the rainfall. As mentioned earlier, a discrete-time version of the model equations is often preferred for practical purposes. This discrete-time version can be obtained using an explicit Euler method for numerical integration, i.e., dx Equation (4) with ∆t = 1 day and k as the discrete time index, therefore, becomes An illustrative example of the behavior of water fluxes in a homogeneous area of soil (Equation (5)) is depicted in Figure 3. The dynamics of water in soil are affected by rainfall and temperature (not in the graph), where the negative effects (i.e., stresses for plants) occur if the water in soil is above the field capacity (saturation) and below the wilting limit. Notice that for proper crop development, it is not necessary to always fill the soil up to the field holding capacity; instead, it is essential to avoid soil dryness.  An example of a more elegant soil representation (i.e., with compartmental vertical layers) is the one proposed by AquaCrop [27]. In this latter model, the differential flow equation is replaced by a set of finite difference equations, which are written in terms of the dependent variable x and the independent variable of time (k). The vertical soil profile is divided into m layers indexed from the surface to the maximum depth considered (i = 1, ..., m), and the time k runs from the sowing to the harvest day. This representation includes the evaporation effect in the surface layer and the rising effect due to the capillarity in the presence of the water table. The moisture content x i (k) is determined for each layer of the soil and for every time step as follows: where the superscript i is the index of the soil layer, x i T (k) is the crop transpiration, x i d (k) is the drainage of the soil profile, x i f (k) is the water infiltration into the soil profile (after the subtraction of surface runoff), x i c (k) is the upward movement by capillary rise from a shallow groundwater table, and x i e (k) represents the amount of water lost by soil evaporation. Notice that the advantage of (6) over (5) is the separation of evaporation and transpiration, which address the temperature effect in the top layers, and the vertical movements of water. However, the price to pay is a large number of equations to solve.

Third Level
In this modeling level, the shortage of nitrogen (N) is considered. Nitrogen is fundamental for leaf development, production of chlorophyll, and the synthesis of carbohydrates. However, nitrogen cycles in agriculture are asynchronous and evolve at different rates. There is a slow rate of natural incorporation of nitrogen into the soil by microorganisms and nitrogen uptake by plants, in contrast with the fast rates of nitrogen use inside the plants and the nitrogen compounds withdrawn from the soil. Moreover, nitrogen provided by fertilization is not fully retained in the soil, as only a fraction is used by plants before emission of ammonia into the atmosphere and leaching losses [48].
The nitrogen balance of the soil can be represented as the difference between the nitrogen added by fertilization or mineralization and the nitrogen removed by crop uptake and losses [49]. Therefore, the rate of nitrogen in the soil (N s ) is given by: where N m is the nitrogen supply through mineralization and biological N fixation, α is the fertilizer nitrogen application rate, N R is the fertilizer nitrogen recovery fraction, and dN u dt is the rate of nitrogen uptake by the crop, which is calculated as the minimum of the N supply from the soil and the N demand from the crop, and N l represents the nitrogen losses by leaching. The net rate of change of nitrogen is where N t is the nitrogen translocation and N d is the loss due to death of the biomass. A crop is assumed to experience N stress at N concentrations below a critical value for unrestricted growth. To quantify crop response to nitrogen shortage, a Nitrogen Nutrition Index (NN I) is defined, ranging from 0 (maximum N shortage) to 1 (no N shortage) as follows: where N n , N c , and N r are the actual, critical, and residual nitrogen content, respectively. This index, as described in Section 2.1, can affect the growth function as a multiplicative factor or as a modulator of the aging function.

Compartmental Approach
A modeling approach that differs from the three levels of detail described in the previous subsections is the use of the concepts of storage compartments or pools. In these pools, the organic matter is divided into shoots, or aboveground biomass (AGB), and roots [50]. This compartmental approach has the advantage of discriminating phenological development before germination and after germination, which is convenient for the use of remote sensing technology to measure state variables. As the supply of carbon (C) creates a demand for nitrogen (N), the biomass above and below the ground can be divided into nitrogen and carbon pools. In this case, the biomass (Y in kg m −2 ) can be decomposed into the shoot (Y Xsh ) and root structures (Y Xrt ), which are related to the C and N substrate pools in shoots (Y CSsh and Y CSrt in kgC m −2 ) and roots (Y NSsh and Y NSrt in kgN m −2 ). Shoot and root substrate concentrations are defined by and the differential equations of the state variables are: where P net is the net photosynthetic rate (KgC m −2 d −1 ), G Xsh and G Xrt , are the shoot and root growth rates, f CX and f NX are the substrate fluxes (kgC,

State-Space Representation and Variable Definition
To design irrigation strategies, mathematical models can be built based on mass balance equations and mechanistic descriptions of crop-soil subprocesses, as illustrated in Table 2 for some popular SOMs. These models can be cast into a standard nonlinear state-space representation:ẋ where x is the vector of state variables, f (x, u, θ) is the vector of functions that describe the dynamics of the crop-soil system, u represents the vector of inputs, including the environmental inputs u e and the artificial, or manipulated, inputs u a , θ is the set of model parameters, y is the vector of measurement outputs, and h(x, θ) is a vector of nonlinear functions relating the state variables with the measurable variables. It is important to note that the crop-soil system is a positive system, i.e., the trajectory of the state variables starting from a positive initial condition always remains positive, regardless of the values of the input variables. However, crop-soil models can preserve this fundamental property only if they are properly formulated. The environmental inputs cannot be influenced, but can be sensed at regular time intervals (for instance daily), whereas the manipulated variables, e.g., fertilization or irrigation, allow one to act on the evolution of the soil-crop system. In Table 2, the main modules (in many reported studies, the models are actually not presented as a global system of equations, but more as a set of software modules achieving the computation of the outcome of specific subprocesses) of selected SOMs are presented based on a functional decomposition of the soil-crop subprocesses. Mass balance modules are solutions of differential or difference equations, whereas other modules are sets of algebraic equations or heuristic formulations, e.g., modules for the evapotranspiration subprocess. Pests, diseases, and weeds may further limit crop production at all developmental stages.
However, these factors are not considered in detail in this review, which is dedicated to mathematical modeling and control to support irrigation. The models have uncertainty related to the inputs, the structure, and the model parameters. These uncertainties can be formulated as probability distributions and can be accounted for in the models. In [51], the authors demonstrated that the uncertainty in the model structure was larger than in the model parameters. However, as SOMs were developed for scientific understanding as well as decision/policy support, the model uncertainty was limited to the bias in the final output, and there was no tracking of how the uncertainty was propagated in the model. As an alternative, in [52], the authors proposed the use of multi-model ensembles (MMEs) to quantify model structural uncertainty, showing that they rely on parts of models that are based on traditional heuristics. In contrast, the uncertainty in a COM can be analyzed by considering the state-space structure.

Sensors and Instrumentation
In modern agricultural systems, there is a network of interconnections among farmers, sensors, actuators, machinery, and cultivated areas, forming a so-called cyber-physicalhuman system (CPHS) [53]. In this context, farmers play a key role as decision makers regarding the management of cultivated areas and food production in general. A global picture is rpresented in Figure 4. . Sensors, actuators, scheduling, and control in the context of farm decision making. When automatic irrigation control is enforced, the feedback is implicit in this block, as the sensors continuously retrieve data and signals from the field to feed the control system.
Of course, the information provided by the sensors is a keystone for the development of dynamic models, the estimation of parameters, and the development of more elaborate monitoring and control schemes. For instance, soil moisture (SM) and soil water content (SWC) are essential in irrigation. Traditionally, these variables were measured using probes, e.g., stick probes, neutron probes, phene cells, time-domain reflectometers (TDRs), and tensiometers. The variables are related to physical properties, such as capacitance, resistivity, heat conductance, and mechanical force. Other methods involve collecting samples to be analyzed offline, such as gravimetric measurement, with the drawback of being noncontinuous in time and space. However, more reliable remote methods have gained interest in the last decade due to their coverage of large fields. These include ground-penetrating radar (GPR), aerial photography, multi-and hyperspectral imagery, and electromagnetic induction (EMI). A list of common state variables included in SOMs, as well as the possible types of sensing techniques, is provided in Table 4. The output equation of the state-space representation (Equation (11b)) is described by the nonlinear vector function h(x, θ), whose structure will highly depend on the underlying physical principles and technology of the measurements. In some specific cases, this nonlinear equation can reduce to a linear one in the form where C(θ) is a constant matrix depending on parameter θ. As a practical example, let us consider the soil moisture sensor presented in [81]. In this work, the output variable y (relative soil moisture) is given by where w is defined as the ratio of the mass of moisture present in the soil sample to the dried mass of the sample, γ d is the dry unit weight of the soil, and γ w is the unit weight of water. However, many of the output variables are not linearly related to the state variables. The data acquisition system includes sensors, measurement instruments, transducers, and algorithms. The sensing devices can retrieve information about physical variables through direct (i.e., ground sensors) or indirect contact (i.e., proximity or remote sensing). Traditionally, direct measurements are restricted by the locations of the sensors. This issue was discussed in [82,83], where the main conclusion was that direct measurements are still required as a part of a monitoring and sensing system. Moreover, there are soil properties that can only be sensed directly, e.g., soil organic carbon [84]. The big advantage of ground sensors is the precision and reliability of measurements independently of environmental conditions. The drawback is the limited capability of representing the heterogeneity of a cropping area, as the ground sensors are fixed in particular locations. To alleviate this latter issue, proximity sensors are used for their flexibility. These sensors require one to be close to the crop and soil, but are portable and, therefore, can be installed in unmanned aerial vehicles (UAV), ground robots, or agricultural machinery. However, portability and mobility cannot ensure a simultaneous and global measurement of a cultivated area, and arrays of sensors-especially wireless sensor networks, as suggested by [85,86]-can resolve this global picture. However, this approach is, in turn, limited by the size of the cropping field. Hence, remote sensing (RS) appears as the ultimate solution due to the ability to simultaneously encompass multiple measurements over diverse scales of cropping land. These techniques are particularly well suited for water-related variables in climate change scenarios [87].
RS is a current research and development trend due to its advantages of capturing large amounts of data all at once. RS gathers Earth observations (EOs), which are limited by the orbit and periodicity of satellites, and aerial imagery, which is limited by the frequency of capture and weather conditions. In either case, some variables must be inferred by combining data from multiple sensors in different wavelengths or other variables [67]. For instance, in [61], the evapotranspiration (ET) was derived from the solution of the energy balance equation by using short-wave and long-wave band inputs from satellite sensors to estimate net radiation (Rn), soil heat flux (G), and sensible heat flux (H), with the latent heat flux (LE) obtained as a residual (LE = Rn − G − H). The LE could then be extrapolated from a quasi-instantaneous value at the time of the satellite overpass to a daily ET value using different methods, such as the evaporative fraction or the reference ET fraction. For large-scale crops, EOs are suited to collecting information that is simultaneously related to crop and field variables. However, the spatial resolution is an issue for non-flat land. A descriptive presentation of the sensors embedded in active satellites is provided in [12], which includes frequency range and spatial and temporal resolution. Optical and thermal sensors are highly sensitive to climatic conditions. For this reason, other sensors, such as synthetic aperture radar (SAR), prove to be an effective technique for monitoring crops and other agricultural targets, as they do not depend on weather conditions [88]. This technology was discussed in [71], which provided a review of the diverse vegetation indexes associated with state variables and discussed the limitations linked to the current information retrieved from SAR sensors and optical/thermal signals to derive variables such as crop height (which is typically sensed by airborne instruments, such as light detection and ranging (LiDAR)).
Reviews of particular data acquisition systems have recently been published. For instance, the authors of [89] surveyed soil moisture measurements by considering ground, proximal, and satellite remote sensing. One of the main challenges is to combine point observations, which are commonly obtained with in situ techniques, such as simple gravimetric sampling or various electromagnetic sensors, with a spatial representation of the water distribution. The problem is to properly capture spatial variability without deploying too many sensors. A possible solution is to combine ground-based and airborne or spaceborne remote sensing techniques to provide a characterization and monitoring of large-scale near-surface soil properties and processes at reasonable temporal and spatial resolutions.
Other environmental and weather variables can be retrieved from either portable or official meteorological stations with regional coverage near the field. The portable stations located in the field have the advantage of reducing the uncertainty of some variables, as their measurement precision is comparable to that of regional meteorological stations, which are usually situated several kilometers away from the field [90].

Model Identification
Once a model structure has been established based on mass and energy balances and the use of physical and heuristic relations, the model needs to be calibrated in order to fit to the experimental observations. This operation of model calibration is called parameter estimation or identification in the systems and control literature. System identification gathers a set of statistical techniques that allow the designer to assess the values of model parameters based on signals and data from measurable inputs and outputs. Identification is a broad subject that is well documented in the literature, and we refer the interested reader to a few monographs and textbooks, including [91,92].
Basically, system identification proceeds in several steps, which can be iterated and can be summarized as follows: • data collection; • definition of the model objectives (simulation or control) and of the model structure; • identifiability study; • parameter estimation; • evaluation of parameter uncertainties and of confidence intervals in the model prediction; • model validation and cross-validation.
This section discusses the parameter estimation problem of a crop-soil mathematical model with a known structure, i.e., once the selection of an appropriate model structure has been achieved by following the ideas in Section 2, starting with the question of parameter identifiability.

Identifiability
Initially, an identifiability study will indicate whether it is possible to determine the parameter values from the available data. More precisely, two types of identifiability studies can be achieved, i.e., structural and practical [93]. The first assesses the possibility of estimating the model parameters given ideal measurements, i.e., it allows the determination of which output signals should be measured, assuming that they are available in continuous time and without noise. Several mathematical approaches can be used to achieve this analysis, and specialized software is available, e.g., DAISY [94], GENSSI [95], SIAN [96], and STRIKE-GOLDD [97], among others. To the best of our knowledge, these methods and software have rarely been used in the modeling of agricultural systems.
Practical identifiability deals with the possibility of assessing all or some of the model parameters under realistic conditions, e.g., sampled data and measurement noise.
The basic tool in this latter study is parametric sensitivity analysis, which can be local [91] or global, and it takes account of the interactions between parameter effects [98][99][100]. Sensitivity analysis has been applied in several studies of crop models. For instance, in [101], the sensitivity of four yield outputs of the APSIM-Sugar model to 13 parameters in rain-fed and irrigated conditions was studied using a global sensitivity method based on the decomposition of variance of the outputs known as Gaussian process emulation. As some crop models are difficult to analyze, as they are not fully mechanistic, but include empirical rules, it might also be interesting to using automatic differentiation tools to achieve sensitivity analysis, as demonstrated in [102].
In this paper, we limit ourselves to a brief introduction of the basic concepts related to a local analysis and consider the model m(t; θ), which is based on the vector of parameters θ, of the measured output y(t): In the case of a Gaussian distribution of zero-mean white noise ε(t), the Fisher Information Matrix (FIM) can be defined as ∂θ where Ω is the covariance matrix of the measurement errors ε(t), ∂m(θ) ∂θ represents the first-order parametric sensitivity matrix, and M represents the number of data points. The inverse of the FIM provides a lower bound (the so-called Cramer-Rao bound) of the covariance matrix of the parameter estimation errors, i.e., A parameter estimation method that can achieve the Cramer-Rao lower bound is called efficient. The FIM is a covariance matrix and, as such, is positive semidefinite. The FIM can be used to assess practical identifiability through a rank condition (a rank-deficient FIM indicates that the model is not identifiable with the data at hand). In turn, if the parameters are identifiable, the Cramer-Rao bound can be used to provide confidence intervals for the estimated parameters (which are, unfortunately, often not given in the published literature on parameter identification of crop models). In general, the use of (local or global) sensitivity analysis will be an important tool for determining the most influential parameters of a model, which requires estimation based on experimental data and may be re-estimated at a later stage if the environmental conditions have changed.
A challenge in the identification of crop-soil models is to deal with a large number of parameters. For instance, DSSAT [32] has more than 300 parameters, STICS [103] has 232 parameters, AquaCrop [104] has 29 parameters, and EPIC [33] has 22 parameters. An approach to reducing the number of parameters to consider is to use specific crop models instead of generic models, e.g., CERES-Wheat [47]. Another option is to derive more compact models, which are usually called summary crop models [6]. These summary models can be developed in several ways-first, as a simplification of full-scale models, e.g., FIELD presented by [105] as a crop-soil subsystem based on APSIM [44] and mini-STICS, a compact and limited version of STICS [103] presented in [106]. Further, compact models can be included in additional software tools, e.g., the OptimiSTICS package [107]. In this case, the aim is to guide the user into parameter estimation under various assumptions. For instance, this software package was used in [108] to estimate a set of parameters of the STICS model. Alternatively, it is possible to build simple new generic crop models on a mechanistic basis, e.g., the simple model [43], which includes 14 equations with 13 parameters. In conclusion, the modeler will generally have to either reduce the model under consideration or to decide on a subset of parameters to estimate in order to ensure practical parameter identifiability before embarking on the actual parameter estimation task.

Parameter Estimation
Parameter estimation is based on the minimization of the deviation between the model prediction m(t; θ), which depends on a set of parameters θ, and the measured outputs y(t). A general approach is the maximum likelihood estimator (MLE) [91]: where M is the number of data points and Ω is the covariance matrix of the measurement errors.
Other weighting matrices can be considered, leading to weighted least-squares estimators. For instance, the generalized least squares (GLS) method was used in [109] to estimate the parameters and to compute the variance-covariance matrix of the cultivar parameters of APSIM [44] and DSSAT [32]. In [108], a Bayesian approach was used to identify the parameter of the STICS crop model [103] based on the Differential Evolution Adaptive Metropolis (DREAM) algorithm [110], which is a variation of the Monte Carlo Markov Chain (MCMC) algorithm.
One of the main dangers in achieving parameter estimation is the risk of local minima of the cost function, leading to biased parameter estimates. This problem can be alleviated by using multistart strategies [111], i.e., repeated optimization starting from a large set of initial parameter guesses sampled in the design space, and through the use of global optimization algorithms [112].

Model Validation and Uncertainty Analysis
After the model parameters have been determined, model cross-validation should be achieved by comparing the model prediction to a fresh, unseen dataset in order to ensure that the model identification has not been subjected to local minima. The uncertainty of parameters should then be evaluated [113] by using various methodologies, including the Fisher Information Matrix and Monte Carlo analysis.
In [101], an uncertainty propagation analysis was performed assuming both normal and uniform distributions to conclude with a cross-validation. The results confirm the distinct variation of parameter influence between climatic and management conditions. The generalized likelihood uncertainty estimation (GLUE) method was used in [114] to generate 10,000 random parameter sets, including one calibrated parameter set, to explore the effects of parameter uncertainty on model output. GLUE was also exploited in [115] to analyze the DSSAT-CERES model [116]. In [108], the posterior distributions of nine specific crop parameters of the STICS model were sampled with the aim of assessing the growth of a winter wheat culture.
In addition, cross-validation is helpful in dealing with the parameter overfitting problem [117]. In [118], the authors identified ten parameters of AquaCrop and used the MCMC to infer the uncertainty in the parameters by approximating the posterior distribution function with eight experimental datasets.

Illustrative Example: Water Balance in Soil
To illustrate the previous concepts, the model introduced in Equation (5) is used as an example. This COM is based on water mass balance in soil. Considering that measurements are available for transpiration (x 1 ), runoff (x 2 ), drainage (x 3 ), and available water in the root zone (x 4 ), the model can be expanded as follows: where w 1 is rainfall, w 2 is the reference grass evapotranspiration, and there is no irrigation, i.e., u = 0. The set of parameters are the water uptake coefficient α, the drainage coefficient β, the runoff curve number η, the available water capacity θ, the root zone depth ζ, and the wilting point γ. S = 25,400 η−254 is the potential maximum retention depending on the soil type. First, we conduct an uncertainty and sensitivity analysis by applying Monte Carlo simulations and a Sobol global sensitivity method, as implemented in the Python library UncertainPy [119] and the Sensitivity Analysis Library SALib [120]. In [121], the same model (Equation (18)) was used to compute a drought index named ARID. The authors proposed different probability density functions (PDFs) (i.e., uniform, normal, triangular, and custom) for five parameters (i.e., α, β, η, θ, ζ) and performed a Fourier amplitude sensitivity test to assess the uncertainty and sensitivity. In contrast with this previous work, we include the wilting point γ and consider the cumulative available water in the root zone as an output variable, since our objective is to determine the positive and negative impacts of water in soil.
The uncertainty decreases from 40 to 24% when changing from uniform to normal distributions. The parameter β is the most relevant, with 36% participation in the global sensitivity. ζ and α follow with 26% and 25%, respectively. The full sensitivity test ranks the influence of the parameters as β > ζ > α > η > γ > θ, with negligible influence of the last three. This result suggests that the parameters related to the mechanical properties of soil are more relevant than the ones related to volume constraints.
Next, we perform a parameter estimation based on nonlinear least squares. The basic assumption is that measurements can be collected daily for the four x i (k) variables (i = 1, ..., 4). In the framework of this example, we generate the measurements using the same model and add Gaussian noise to x 1 , x 2 , x 3 , and x 4 corresponding to 8%, 3%, 5%, and 7% of the relative standard deviation, respectively. These noise levels are coherent with the corresponding instrumentation [55,122], but they are assumed to be unknown to the process modeler such that a simple weighted least-square approach is used here with a weighting matrix that takes account of the scaling of several variables (the variables have different orders of magnitude, and the scaling avoids distortion in the contributions of the different variables to the cost function). Notice that the information content of the data at hand is crucial. For x 2 , only seven measurements contain useful information, and for x 3 , sixteen samples do so (see Figure 5b,c). The nominal values of the process emulator parameters and their estimates are presented in Table 5. Confidence intervals of 95% are computed using the Fisher Information Matrix. The model prediction is shown in Figure 5.  A Monte Carlo simulation is used to assess the uncertainty of the model prediction with the identified parameter values of the parameters in their 95% confidence intervals. The results are shown in Figure 6, where 20,000 runs generate a shaded light-blue region and the solid blue line corresponds to the identified parameters.
Afterward, cross-validation is performed using a new dataset. The results show that the model with its previously estimated parameters is able to reproduce the dynamic behavior of the soil for other climatic conditions with an error of less than 3%, as shown in Figure 7. . Cross-validation with a previously unseen (for the identification procedure) dataset [123] for (a) transpiration and (b) available water in soil.

State Estimation and Software Sensors
State estimation algorithms allow the reconstruction of non-measured state variables in real time based on a model of the system and some on-line measurements collected from available sensors. In this way, a state estimator constitutes a so-called software sensor, in contrast with a conventional hardware sensor. Aside from the estimation of non-measured state variables, state estimators can also be used for the reconstruction of discrete-time information in continuous time, the fusion of redundant information, the detection and isolation of hardware sensor faults, or the estimation of model parameters. Some introductory surveys related to biological systems can be found in [124,125], and a monograph dedicated to crop systems is found in [113].
In the literature, the notion of data assimilation is also sometimes used to designate the process of combining a mathematical model with measurements to improve the prediction of some state variables or to estimate the initial condition of the model. Historically, data assimilation originates from the field of weather forecasting, and it shares many common features with state estimation techniques, even though the problem of reconstructing non-measured variables and the notion of observability are more typical of the literature dedicated to state estimation. In this review, we will stick with the wording of the system and control community and use state estimation instead of data assimilation.
A state estimator (also known as a state observer) usually has the following extended Luenberger form:˙x which embeds a copy of the dynamic systemẋ(t) = f (x, u) for predicting the state trajectory and includes a correction term K(y −ŷ) that measures the deviation between the hardware sensor information y and its prior estimateŷ(t). Many state estimation approaches have been proposed. Some of them take account of measurement noise (such as Kalman filters or receding-horizon observers), model uncertainties or disturbances (such as high-gain observers or sliding-mode observers), or uncertainties in model parameters and inputs (such as interval observers). In general, a system property called observability must be checked before embarking on the selection and design of an estimator.

Observability
A system is said to be observable if the current state vector can be estimated using the information from the sensors collected over a finite period of time. Observability is a structural property of a system, but for nonlinear systems, this property also depends on the input signals.
There are several approaches to the observability analysis of nonlinear systems. The basic one is based on the differentiation of the output and the construction of an observability map. For output j of system (11), the following set of derivatives can be computed: The observability map is given by: . . .
If a partition of the complete observability map is injective with respect to the state variables, then the system is observable. However, the analytical solution of the nonlinear system is often difficult, or even impossible, to find. Local observability can be checked thanks to the inverse mapping theorem (local inverse): If the Jacobian is nonsingular, an inverse function exists in some neighborhood of x, and local observability is proven.
Note that structural identifiability can be studied as observability by considering the parameters as state variables with zero dynamics [126].
A second approach to global observability analysis is provided by the use of a canonical form [127,128]: ... f i (x 1 , ..., x i+1 , u) ... where A system is said globally observable if Many systems naturally appear in this form, or can be brought into this form by a change of variables.
Finally, an alternative approach to studying global observability that includes unknown inputs (some system inputs can be uncertain or unknown, such as some environmental conditions) is provided in [129], where a dynamic interpretation of state observability is developed.

Observability Analysis of a Plant-Growth Model
To illustrate these observability tests, let us consider a qualitative mathematical model representing the relationship between supplied water and plant growth [130]. This model relates the water in soil x 1 (t) with the water inside the plant for its growth x 2 (t) and the amount of dry matter x 3 (t). The state-space representation is given bẏ where it is assumed that the variables x 1 and x 3 can be measured. In this model, the water in soil is related to the biomass production.The soil is modeled as a reservoir of water, and the changes are accounted for in (27a). The first term on the RHS represents the water going out by drainage and evaporation, the second term represents the plant water absorption per unit of time, and the last term u(t) is the irrigation input. Next, the water dynamics inside the plant are described by (27b), where the first term on the RHS stands for the water absorbed by the plants, the second term represents the water removed by transpiration, and the last term relates the water used to produce biomass. Finally, the change in biomass is accounted for in Equation (27c), where the first term on the RHS represents the growth function, and the last term represents the plant degradation. A complete description of the parameters and variables is given in Table 6. Observability can be locally assessed using the Jacobian of the observability map, considering a constant input u(t) = u: We can see that rank(O(x)) = 3, provided that either the third or fourth line of the Jacobian matrix is independent of the first two lines. This will be the case if x 1 and x 3 are not simultaneously equal to zero over some periods of time. This makes sense in an intuitive way, since this would correspond to the situation where the sensors would return zero signals (thus providing no useful information on the dynamical system).
The same conclusion can be reached using the canonical form, rearranging the model equations in the following way: which satisfies properties ( (25) and (26)).

Software Sensors in Agriculture
Bayesian estimation strategies have been particularly popular in the estimation of variables in crop-soil systems. These techniques can be classified as: (i) Kalman filtering, (ii) variational inference, and (iii) sampling algorithms [131]. Kalman filtering, e.g., the Kalman filter (KF) and the extended Kalman filter (EKF), is based on linear or linearized models and the assumption of Gaussian noise. The optimal estimator (or approximate estimator in the case of the EKF) is provided in the form of the mean and is complemented by the covariance of the estimation error, which is useful in assessing confidence intervals. In variational inference, the approximate posterior distribution of the state (given the measurements) can be obtained through the solution of an optimization problem. However, in crop models with nonlinear observation operators and non-Gaussian observations or non-Gaussian uncertainties, the assumption of normality in the posterior might be a poor choice. In these cases, approaches based on sampling, such as the unscented Kalman filter or particle filters, might be preferable. The first will be appropriate for nonlinear models with Gaussian observations, while the second is appropriate for nonlinear systems with non-Gaussian observations at the price of higher computational expenses. An overview of these estimation techniques and their applications to crop-soil models is given in Tables 7 and 8. The interested reader is also referred to the review papers in [8,11,132].
The Kalman filter is an optimal estimator for linear systems with Gaussian observations. It provides the expectation of the state and the covariance of the estimation error. It proceeds in two steps: prediction and correction. It exists in several forms: discrete time, continuous time and continuous (prediction)-discrete (correction) time.
NDVI [133] The extended Kalman filter is based on a model that is linearized along the state estimate trajectory. It requires the on-line computation of the Jacobian matrices. It is no longer an optimal estimator and may diverge. The EKF provides a practical and popular solution for nonlinear systems with Gaussian observations. SM [134] The unscented Kalman filter propagates a few sampling points, called sigma points, through the nonlinear model and computes the weighted sample's mean and covariance. The UKF does not require model linearization and provides more accurate and precise estimates in the case of nonlinear systems under Gaussian noise.
LAI, SM STICS [136] The ensemble Kalman filter is a Monte Carlo filter, which is suitable for systems with a large number of variables. It is well suited for systems described by partial differential equations and their discretization by finite difference or element techniques. It assumes Gaussian observations. Table 8. Summary of the main estimation techniques.

Method Main Equations Basic Description Estimated Variables Model References
PF In the particle filter, a large number of particles are propagated through the nonlinear model, and the posterior distribution is reconstructed. A PF is suitable for nonlinear systems with non-Gaussian observations. It is computationally expensive, but it is amenable to parallel computation.
AGB, LAI STICS [11] The variational filter aims at approximating the posterior distribution with a parametric density of an assumed form. The primary mechanism of the VF is to minimize the Kullback-Leiber divergence between the assumed posterior distribution and the hypothetically true posterior. This approximation approach lends itself to an optimization problem.
LAI, HUR1, HUR2 mini-STICS [136] Variational assimilation is the iterative minimization of a cost function J(x). This solution represents the a posteriori maximum likelihood estimate of the true state given the background (previous forecast) and observations. The use of adjoint operations (based on the chain rule for partial differentiation) allows the calculation of the gradient of the cost function.
4DVAR adds an extra time point in the cost function. 4DVAR is actually a direct generalization of 3DVAR for handling observations that are distributed in time.

HBM
p θ D , θ p , Y X ≈ Hierarchical Bayesian modeling is based on the theoretical foundation of conditional probability distribution. The problem is decomposed into layers. Each layer is connected by the conditional probability, and the solution of a complex joint probability problem is converted into the solution of a series of simpler problems.
NDVI [142] p(X 1 |Y1, θ D ) · · · p(X n |Yn, θ D ) p Y 1 θ p , Y 2 , ..., Y n · · · p Y n θ p p(θ D )p θ p The linear and extended Kalman filtering techniques often result in difficulties due to the nonlinearity of the models, and particularly the threshold functions that are often part of their expression. Moreover, the formulation of the uncertainty in the model itself due to parameter, structural, and input uncertainty is often difficult [143].
In [137], the authors used an ensemble Kalman filter (EnKF) in combination with the CERES-Wheat crop-growth model with remote sensing data in order to improve the estimation of LAI parameters. However, the authors did not discuss the setup of the filter or the convergence rate. In [140], the authors implemented an EnKF to estimate the root-zone soil moisture (RZSM) in a coupled SVAT-vegetation model. They compared the results with those of four variants of particle filter implementations. The results showed a small difference in the estimation capabilities of the filters, and the discussion was centered on the selection of the tuning parameters.
Sampling-based algorithms and, particularly, the particle filter (PF) or sequential Monte Carlo filter allow the estimation of information from sources that have non-Gaussian errors. In [144], a variant of the PF, which was called a convolution particle filter, was tested on two models, namely, STICS and LNAS, with several datasets.
The estimation of parameters and state variables can be carried out simultaneously. For instance, in [136], three parameters and three state variables (LAI and soil water content of two soil layers) of the mini-STICS model were estimated. Three techniques were used, namely, the EKF, PF, and VF. The improved performance of the VF method when compared to that of the EKF and PF was demonstrated. The variational filter had a low computational complexity, and the convergence speed of the state and parameter estimation could be adjusted independently. However, the tests of the observers were probably not fully conclusive, as the initial estimates did not display errors (and without initial errors, convergence is not challenged).

Illustrative Example: State Estimation with a Generic Crop Model
As an illustration of a state estimation problem, an EKF is designed for model (27) with the parameter values given in Table 6. For the sake of simplicity, the irrigation input u = 0.
Some underlying assumptions are considered: • The water absorption rate of the plant is less than or equal to the rate of loss of water from the pond, i.e., θ 4 ≤ θ 2 ; • The dry matter accumulation rate of the plant is approximately equal to the rate of decrease in the water inside the plant that goes toward photosynthesis, i.e., θ 8 ≈ θ 6 ; • The rate of water loss through transpiration is larger than the rate of water loss through evaporation, i.e., θ 5 > θ 1 ; • The degradation rate of the plant is larger than the rate of water loss through evaporation, i.e., θ 9 > θ 1 ; • The crop is wheat on a short life cycle of 140 days under no stress conditions. Therefore, water and nutrients are readily available and do not limit growth; • There is neither damage from pests and diseases nor any competition from weeds growing in the field; • x 1 and x 3 can be measured by existing technology, i.e., by RS based on images, and the measurements are collected daily.
The software sensor is in charge of reconstructing x 2 based on the model and the measurements of the other state variables. The EKF design is based on the selection of covariance matrices Q = 0.0001I and R = 0.001 0 0 0.01 , which are based on an evaluation of the model and measurement noises. In Figure 8, the performance of the observer is shown. Notice that the observer uses the first measurement values as initial conditions for the measured variables. On the other hand, the initial value ofx 2 is intentionally taken far away from the real trajectory to show the convergence speed of the observer (a test that is seldom illustrated in the literature on state estimation in agricultural applications).

Irrigation Control
In this section, we review some control techniques that can be used to improve and possibly optimize the irrigation of the field. The manipulated variable u(z, t) is the amount of water supplied in space and time, whereas the output variable is the biomass production. However, in most cases, developers do not consider biomass as an output variable. Instead, they consider a variable related to the soil water content, since models are mostly developed based on the soil behavior.
A preliminary step to the design of an automatic irrigation system is to verify if the state variables can be controlled using the manipulated variables. This issue is accounted for by a property called controllability, which represents the possibility of driving the states of the system from any initial value to a desired final value in a finite time. This property is useful for the control system designer; however, it was not considered in the applications reviewed.

Control Paradigms
Models for irrigation control can be classified into data-driven and mechanistic models [16]. Data-driven approaches are based on the collection of data from various sensors and the use of machine learning techniques. On the other hand, the mechanistic approach is based on the phenomenological models reviewed in the previous sections, which are possibly simplified to account for only the main phenomena (those that will have a significant effect in a closed-loop operation). Many control techniques can be applied, ranging from classical proportional integral derivative (PID) control to advanced model predictive control (MPC) [17].
A classical feedback control structure is presented in Figure 9. The error e(t k ), i.e., the difference between the reference r(t k ) and the measurement signal y(t k ), feeds the controller, which computes the control action u(t k ) sent to an actuator, e.g., a valve of a sprinkler. A PID controller [145] generates the following control law: where ∆t is the sampling time, K p is the proportional gain, T i is the integration time constant, and T d is the derivative time constant. This control law was implemented and tuned for irrigation using the heuristic methodology in [146], and was further elaborated in [147], where a dielectric tensiometer was used to regulate soil moisture in a strawberry crop.
Controller Crop-soil system y(t k ) + − Figure 9. Classical feedback control system. The signals u(t k ) and w(t k ) represent the vectors of manipulated and environmental inputs and y(t k ) represents the measured system output.
Model-based control systems make use of a hydrologic model, e.g., related to Equation (4), and some sensors of water in soil. These elements are coupled to an irrigation scheduler to determine the irrigation regime. The main problem is that the output variables (soil moisture, humidity, or evapotranspiration) are related to crop growth, but not to crop performance, which makes irrigation solutions based on the maximization of production suboptimal [148,149]. As an alternative, in [9], the authors described how hydrologic models can be combined with crop-growth models, such as APSIM (i.e., an SOM), to properly account for multiple effects of water on biomass production.
Model predictive control (MPC) is a real-time optimal control strategy based on the numerical optimization of a cost function J (or performance index) over a finite prediction horizon [149]: where x i is the ith controlled variable, r i is the ith reference variable, ∆u i is the ith manipulated variable move, and λ x i and λ u i are weighting coefficients, which allow the designer to assign a relative importance to the reference tracking and control energy. H p is the prediction horizon. The main components of MPC are shown in Figure 10.
In [150], the MPC is formulated based on a linear stochastic state-space model of the soil moisture that incorporates variables such as soil evapotranspiration and deep percolation. A similar work is presented in [148], where the model uses the soil moisture control and the climatic factors to determine the optimal amount of water required by the crop while keeping soil moisture under specific thresholds (avoiding water stress). MPC was also designed based on an SOM, i.e., AquaCrop as a crop emulator, in [151]. In this work, the authors described the design process and discussed the issues related to the implementation. MPC can be improved, as proposed in [152], where the authors presented a data-driven robust model predictive control (DDRMPC) approach for automatic control of irrigation. This control framework integrates a set of mechanistic models, which describe dynamics in soil moisture variations, and data-driven models, which characterize uncertainty in forecast errors of evapotranspiration and precipitation.
- Data-driven systems are usually simple and easy to implement. Their main advantage is that they do not require the development of mechanistic models of the crop dynamics [148], and they can be improved over time by optimal learning strategies [153]. In [154], the authors presented an example of a soil moisture control based on continuous measurements of soil water content. This system uses a soil water balance equation to calibrate the model, and then generates an irrigation schedule by combining previous knowledge that is updated by current measurements in an expert system framework. However, the main drawback of these approaches is that they are based on custom solutions for specific areas and crops, which must be constantly calibrated/adapted, and their extrapolation to new situations is not straightforward.
Fuzzy logic has been applied in various agricultural applications, such as irrigation and fertilization. Thanks to its flexibility, this approach allows a qualitative and quantitative interpretation of crop development, which can be defined using expert knowledge and field observations. A design of a fuzzy logic control (FLC) for irrigation was presented in [155]. The stability of the closed-loop system was analyzed using Lyapunov functions. Another application of fuzzy control was presented in [156], where a closed-loop control was integrated with remote sensing. In this work, the objective was to control the speed of the irrigation device (i.e., the central pivot) by sensing the soil moisture, soil temperature, and NDVI.
PID, MPC, and fuzzy logic controllers were compared in [157], where a soil water balance that included soil moisture, temperature, and evapotranspiration was considered. However, the authors only provided general comments on the applicability of such strategies, without design guidelines to support the comparison.

Illustrative Example: PID and Model Predictive Control
To illustrate an irrigation control application, let us consider the crop-soil model presented in [158]. The discrete-time equations can be summarized as follows: where the state variables are the water content available in soil x 1 , the cumulative temperature x 2 , the cumulative temperature until maturity to reach 50% radiation interception due to leaf senescence x 3 , and the biomass x 4 . The independent inputs are the precipitation w 1 , the reference evapotranspiration w 2 , the air temperature w 3 , solar radiation w 4 , and the management input, i.e., the amount of water irrigated u. The parameters K 1 , K 2 , K 3 , K 4 , K 5 and K 6 represent the maximum water content at field capacity, the root water uptake, the maximum daily reduction due to heat stress, the maximum daily reduction due to drought stress, the radiation-use efficiency, and the impact of the CO 2 growth rate on biomass, respectively. θ 1 is the drainage coefficient, θ 2 is the water-holding capacity, θ 3 is the maximum fraction of radiation interception, and θ 4 is the cumulative temperature requirement for leaf area development. h(·) represents an algebraic function computed by following the procedure described in [113].
For a wheat crop in loamy soil, a comparison of four situations aimed at assessing the biomass production is presented. First, the model is used under no irrigation policy to establish a minimum performance that is only bounded by environmental conditions. Second, the system is tested while following a fixed irrigation policy, which is a common practice in agriculture. Third, the model is used in a control loop, as shown in Figure 9, using a PID controller, and finally, the model is used in an MPC strategy. The design of the PID controller is achieved with the auto-tune algorithm in Matlab, whereas the MPC cost function targets a maximization of the final biomass in combination with the minimization of the amount of water required to achieve it, i.e., min u(t k ) where H p is the prediction horizon in days, t h is the harvest time, λ is a weighting factor between the biomass profitability and water expense, and W maxT = t h W max is the maximum amount of water available for the entire crop season. Notice that the proposed cost function considers the maximization of biomass over a limited period H p , whereas the minimization of the water usage is considered over the full crop season. The reason for the short span of maximization is that biomass production is highly affected by rainfall, while the availability of water for irrigation during the crop season is defined in advance. In addition, the weighting factor λ gives the designer the possibility to limit the water usage based on the price or availability of the resource. The results of the comparison are shown in Figure 11 for the water in soil x 1 and biomass x 4 . Without irrigation, the amount of water in the soil reaches a minimum, and biomass production reaches the lowest level for the season. For the case of fixed irrigation, a total of 340 mm of water is applied in the full season. This irrigation policy increases the production by 70% with respect to the no-irrigation case. However, when control actions are enforced, biomass production and water consumption increase compared to the fixed irrigation scenario. When PID control is active, the reference is to keep the water in the soil at field capacity, implying 405 mm of water, which is about 17% more water than the fixed irrigation. Nevertheless, this represents an increase in biomass production of 95% concerning the case of no irrigation. When the MPC is applied, the increase in the production is about 93% with respect to the no-irrigation case by using about 13 − % more water than in the case of the fixed irrigation policy. Hence, PID increases production by 2% more than MPC, but requires 6 − % more water. MPC can be adjusted to relax the constraint related to the water used to reach the same performance as that of the PID control regarding biomass production. The main advantage of MPC is the predictive capability, which can be combined with weather forecasts to reduce water usage to achieve production objectives.

Spatial Heterogeneity
The structure of cultivated fields across the world shows non-uniform characteristics. This can be attributed to the topography and soil properties that change spatially and over time. For a continuous modeling of specific soil properties (related to state variables), the formalism of partial differential equations (PDEs) provides an appropriate mathematical representation. In general, the variation of soil properties is considered only with respect to depth, and solvers of the corresponding PDEs have been developed (e.g., for seepage [159]). As an illustration of the above, let us consider the vertical water movement presented in [160], where the soil moisture is computed using the Richards equation, which is a nonlinear parabolic PDE that describes the movement of water in an unsaturated zone. The one-dimensional form of the Richards equation is given by: where θ is the volumetric soil moisture, t is time, z is the vertical coordinate (positive in the upward direction), k is the hydraulic conductivity, and h is the pressure head. A discrete-time formulation is used with regard to the standard availability of samples (i.e., daily) and the implementation of the models in dynamic simulation algorithms. Moreover, spatial discretization of the soil into layers or compartments is applied to the vertical water movement, yielding fully discrete (in time and space) simulators. In Section 2, we presented an example of such a model using soil layers, i.e., AquaCrop. A general assumption is a flat land topography with no significant side movements of water. An alternative to the PDE formalism and the associated spatial discretization is the agent-based approach proposed in [161]. In this work, the layers were treated as agents that interacted with their upper and lower neighbors, exchanging water downward through infiltration or gravity and upward through evaporation or capillarity. The monitoring of vertical water modeling is linked to the availability of sensors, especially for direct measurements. This has limited the applicability of the models.
Partial differential equations have apparently not yet been used by the crop modeling community to represent movements of water to the side, which is probably due to the multidimensional nature of the problem, the more advanced discretization into finite elements or volumes, and the distributed instrumentation required in order to develop monitoring or control.
Instead, other approaches are emerging, such as the stochastic distribution approach [162] and agent-based modeling [123]. In [162], the authors discussed modeling and monitoring complex systems that vary in both space and time given a limited number of agents or sensors providing measurements that are spread out across a large area. They described the land using Gaussian process modeling and addressed data incorporation with a machine learning technique named Kernel Observers. In [123], the idea was to represent the changes across the surface thanks to a discretization of the land into small patches that were linked by the side exchange of water due to the saturation of its retention capacity and the runoff. This latter representation is well suited for crops located in mountain slopes or rugged terrains.

Conclusions and Perspectives
A vast body of knowledge about dynamic modeling of crop-soil systems exists, which has resulted in various mathematical models and numerical simulators that allow the understanding and prediction of biomass production as a function of environmental factors and human actions, such as fertilization and irrigation. This modeling approach can be complemented by various concepts of system and control theory to develop efficient monitoring, control, and optimization tools. In this review article, we attempted to draw an overview of these concepts, including model parameter identification, validation, uncertainty analysis, observability analysis, nonlinear state estimation, model-based control, expert systems, and real-time optimization.
Moreover, predictive models can nowadays exploit various sources of information provided by ground sensors, aerial measurements, and satellite imaging, resulting in a bigdata perspective and the incorporation of machine learning techniques into the so-called smart farming trend. An important aspect to consider is stochasticity in environmental inputs, which is not a widespread practice in current studies.
There are many appealing directions for future research, including the design of state estimation techniques that exploit sensor networks and take account of uncertainty in the model description and environmental inputs, the development of agent-based models describing soil heterogeneity and incorporating distributed instrumentation and actuation in a possibly asynchronous way, and the design of distributed model predictive control to take full advantage of agent cooperation, real-time optimization, and embedded system technology.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.