Challenges and Opportunities on Nonlinear State Estimation of Chemical and Biochemical Processes

This paper provides an overview of nonlinear state estimation techniques, along with a discussion of the challenges and opportunities for future work in the field. Emphasis is given to Bayesian methods, such as moving horizon estimation (MHE) and the extended Kalman filter (EKF). A discussion of Bayesian, deterministic, and hybrid methods is provided, with examples of each of these methods. An approach for nonlinear state estimation design is included to guide the user/practitioner in the selection of a nonlinear estimator. Some of the current challenges in the field are discussed, involving covariance estimation, uncertainty quantification, time-scale multiplicity, bioprocess monitoring, and online implementation. A case study in which MHE and the EKF are applied to a batch reactor system is presented to highlight the challenges of these technologies in terms of performance and computational time. This case study is followed by possible opportunities for state estimation in the future, including the incorporation of more efficient optimization techniques and the development of heuristics to streamline the further adoption of MHE.


Introduction
Modern chemical and biochemical plants must satisfy technical, economic, and environmental requirements that have become increasingly challenging. Stronger economic competition and sustainable development goals have imposed stringent environmental and safety regulations, as well as the need for more profitable plant operation and tighter product quality specifications. In this scenario, process monitoring, estimation, and control have gained increased attention, driving the development of new technologies to operate modern plants safely and profitably.
Process monitoring plays a key role among the technologies related to process control. Measuring instruments and mathematical tools are generally employed to monitor process variables, providing online state/output information for feedback control systems and for detecting abnormal process operations. When online measurements of state variables are noisy or not available in practice, mathematical tools known as state estimators can be used to reconcile noisy data and estimate missing variables [1,2].

This paper discusses the design of state estimators in general, from the problem definition to the system analysis and performance evaluation, with a focus on Bayesian estimators. Such an approach provides guidelines to practitioners when designing or improving control and monitoring systems. Challenges and opportunities are also discussed on covariance estimation, uncertainty quantification, time-scale multiplicity, bioprocess monitoring, online implementation, and the incorporation of computationally efficient optimization solvers associated with nonlinear state estimators. The rest of the paper is structured as follows. First, a brief historical perspective on the main contributions to estimation theory is presented, as well as a general and practical approach to design and test state estimators. Then, some important challenges in nonlinear state estimation are discussed. This discussion is followed by a section containing a nonlinear batch reactor case study illustrating extended Kalman filter (EKF) and moving horizon estimation (MHE) implementations. Based upon the authors' perspective, opportunities and trends in the development of nonlinear state estimation are finally outlined, followed by the conclusions.

State of the Art
State estimation refers to a set of mathematical techniques designed to reconstruct the state information of a system. In particular, unmeasured state variables are estimated based on available model output predictions and noisy measurements. The main challenge in the state estimation task is to provide reliable online estimation under process and measurement uncertainties while being computationally tractable. Process uncertainties usually come from plant-model mismatch and disturbances, whereas measurement noises are mainly related to the instrument's precision. Figure 1 presents a schematic diagram of state estimation that consists of two main stages: model prediction (i.e., a priori state estimate) and measurement update (i.e., a posteriori estimate, based on current and past measured variables). Past estimates may also be used in recursive estimation algorithms.

Historically, Thomas Bayes was the first main contributor to the state estimation field by proposing the Bayesian rule in the eighteenth century, which is the basis for statistical inference methods [21]. Some decades later, Carl Friedrich Gauss developed the technique of least-squares estimation applied to the inference of parameters related to the orbit of planets [22]. Another significant contribution to estimation theory occurred in 1912, when Fisher introduced the method of maximum likelihood estimation. State estimation methods may be broadly classified as Bayesian, deterministic, or hybrid.

Bayesian methods assume that the process states and measurements are subject to random disturbances. This is the most popular class of state estimators used for chemical processes. The estimation procedure is essentially based on finding the conditional probability distribution of the state and output variables, described mathematically by Bayes' rule and characterized by parameters such as mean and covariance [17]. The Bayesian approach may be divided into two subgroups: recursive and optimization-based methods. Recursive methods consist of single-step estimators that are typically unconstrained and fast.
However, they may not be robust to the presence of data outliers, unmodeled disturbances, and model errors. The extended Kalman filter (EKF), unscented Kalman filter (UKF), and particle filter (PF) are some of the most popular recursive estimation algorithms [3,27,32,37–40]. Optimization-based approaches may use a sliding window of measurements that provides improved robustness. These estimators can also incorporate hard state constraints, such as nonnegative concentrations and pressures, into the problem formulation. The moving horizon estimator (MHE) is the main representative of the optimization-based methods. This category of estimation techniques produces the same results as the recursive methods when considering an unconstrained, linear, infinite-horizon-length optimization problem, as the least-squares optimization problem solution is identical to the solution from maximizing the conditional density function for Gaussian noises [3,9,31,41,42].
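To make the recursive, single-step structure concrete, a minimal predict/update cycle of the EKF can be sketched as follows. This is an illustrative Python/NumPy sketch, not a specific implementation from the literature; the model functions `f`, `h` and their Jacobians are placeholders to be supplied by the user.

```python
import numpy as np

def ekf_step(x, P, u, y, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle of the EKF.

    f, h: discrete process and measurement models (placeholders);
    F_jac, H_jac: their Jacobians evaluated at the current estimate.
    """
    # Model prediction (a priori estimate)
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Measurement update (a posteriori estimate)
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_upd = x_pred + K @ (y - h(x_pred))
    P_upd = (np.eye(len(x)) - K @ H) @ P_pred
    return x_upd, P_upd
```

Because each call needs only the previous estimate and the current measurement, the recursion is fast, which is exactly what makes it attractive online and, at the same time, sensitive to outliers and poor priors.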
Deterministic methods, by contrast, are based on the assumption that both measurement and state variables are deterministic, i.e., system outputs can be considered to be fully determined by the operating conditions (initial conditions and inputs) and the parameter values in the process models. The deterministic approaches can be further divided into two subgroups: first-order systems (FOS)-based and artificial intelligence (AI)-based methods. FOS-based estimators are designed for systems whose dynamics are represented by linearized differential equations of first-order nature. Extended Luenberger and asymptotic observers are the main state estimators under this subgroup. The extended Luenberger observer (ELO) is characterized by the addition of a correction term, proportional to the error between the measured and the estimated states, to the differential equation model. The proportional gain is chosen in such a way that the error dynamics is stable at the zero-observation error case [43]. This estimator has been successfully used for systems described by accurate process models (i.e., considering perfect knowledge of the system parameters) [1,3,44,45]. The asymptotic observer (AO), in turn, takes advantage of the dynamic model structure to rewrite the differential equation system, for example independently of the process kinetics [44]. Thus, this estimator is indicated for specific cases in which the process kinetics (e.g., reaction and heat transfer rates) are unknown [46–48]. However, the asymptotic observer is stable and convergent only for fed-batch and continuous operating systems (i.e., for non-zero dilution rates) [43].

AI-based estimators are an alternative to tackle complex and ill-defined problems using algorithms based on artificial intelligence, such as artificial neural networks (ANN) and fuzzy logic. These techniques, once trained with real data, are able to provide fast estimates based on information of the process inputs and measurements. These estimators have been increasingly employed for biochemical process applications [49–52]. However, AI-based methods may demand additional time and effort for formulation and may fail when subjected to unmodeled conditions.

Hybrid methods combine two or more estimation technologies, either as part of an integrated framework or to perform different tasks in the estimation procedure. Examples of the latter are the use of AI-based methods to describe the process dynamics (modeling task), combined with Bayesian algorithms to carry out the estimation task, taking into account the stochastic nature of the system [14,53–55].
On the other hand, integrated frameworks considering, for instance, EKF/PF-MHE may use time-efficient technologies, such as the EKF and PF, to obtain fast state estimates between MHE optimizations, which are robust to bad priors and unmodeled disturbances [13,17,36]. Therefore, the hybrid approach combines the best features of the individual estimation technologies and potentially overcomes the limitations of a single-estimator design. Since the year 2000, hybrid state estimators have been increasingly developed for chemical and biochemical processes, which suggests a future trend towards the design of hybrid state estimators for nonlinear systems [1].
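The optimization-based formulation discussed above can likewise be sketched as a weighted nonlinear least-squares problem over a sliding window of measurements. The following Python sketch is illustrative only: the weights are inverse square roots of the noise covariances, the arrival cost is reduced to a simple prior-deviation penalty, and hard state constraints are omitted.

```python
import numpy as np
from scipy.optimize import least_squares

def mhe(y_window, u_window, f, h, x_prior, P0_inv_sqrt,
        Q_inv_sqrt, R_inv_sqrt, nx):
    """Sliding-window MHE as weighted least squares.

    Decision variables: the state at the start of the window and the
    process noises w_k along it. All model functions are placeholders.
    """
    N = len(y_window)

    def residuals(z):
        x = z[:nx]
        w = z[nx:].reshape(N - 1, nx)
        res = [P0_inv_sqrt @ (x - x_prior)]        # simplified arrival cost
        for k in range(N):
            res.append(R_inv_sqrt @ (y_window[k] - h(x)))  # fit measurement
            if k < N - 1:
                res.append(Q_inv_sqrt @ w[k])      # penalize process noise
                x = f(x, u_window[k]) + w[k]       # propagate the model
        return np.concatenate(res)

    z0 = np.concatenate([x_prior, np.zeros((N - 1) * nx)])
    sol = least_squares(residuals, z0)
    # Roll the optimal trajectory forward to the current time
    x = sol.x[:nx]
    for k in range(N - 1):
        x = f(x, u_window[k]) + sol.x[nx + k * nx:nx + (k + 1) * nx]
    return x
```

Solving this nonlinear program at every sampling instant is what drives the computational cost of MHE relative to the recursive filters, and it motivates the hybrid EKF/PF-MHE schemes mentioned above.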


Approach for Nonlinear State Estimation Design
When designing a state estimator for a given nonlinear system, a list of tasks must be carried out. These tasks can be gathered in five major steps: (i) problem definition, (ii) process modeling, (iii) observability assessment, (iv) estimator formulation, and (v) performance evaluation. The proposed approach outlined in Figure 3 is suggested as a general guideline to assist researchers and practitioners in the design of nonlinear state estimators.

As the first step, the estimation problem must be defined. In particular, given a system, one should identify the relevant process inputs and outputs. In addition, it is recommended to separate the measurable states from those whose online measurements are not available or reliable in practice. Based on this step, the process states intended to be estimated are selected. It is worth mentioning that model parameters, as well as process inputs, may also be estimated online by properly adapting the state estimators [56,57]. However, this article is focused on the estimation of process states.

The next step concerns the development of the process model, i.e., the derivation of equations that describe the system dynamics (dynamic model), or simply its static behavior according to the operating conditions (steady-state model). Process modeling is based on the definition of the model structure (e.g., dynamic, static, order, etc.), followed by model fitting (e.g., parameter identification from experimental data) and validation (model checking based on training and new fresh data sets) [58].
In essence, process models may be fully derived from first principles (e.g., energy and mass balances), defined as white box models, purely data-driven (black box models), or a combination of both (gray box models) [50]. For Bayesian estimators, the main focus in this article, the nonlinear model should be discrete and stochastic, in which the states x_k can be represented by the deterministic model, F, integrated over a sampling time ∆t = t_k+1 − t_k, with added process noises, w_k, to cope with potential plant-model mismatches, as shown in Equation (1) [15]. The measured outputs, y_k, can be represented as a function of the states, h, with added measurement noises, υ_k, as in Equation (2). Process and measurement noises are generally assumed to be Gaussian with mean zero and covariances Q and R, respectively [15]. The developed models should relate the measured outputs to the unmeasured/estimated states, as a necessary (but not sufficient) condition for the system observability, which is assessed in the following step.

x_k+1 = F(x_k, u_k) + G_k w_k (1)

y_k = h(x_k) + υ_k (2)
in which x ∈ R^n, y ∈ R^p, and u ∈ R^m are the process states, measured outputs, and input variables, respectively. G_k = G(x_k) is the disturbance model that maps the process noises to the state variables [5].
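A stochastic model of this form can be simulated directly to generate synthetic data for estimator testing. The following is a minimal Python sketch under the zero-mean Gaussian noise assumptions of Equations (1) and (2); the model functions `f`, `h`, and `G` are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible noise draws

def simulate(f, h, G, x0, u_seq, Q, R):
    """Generate noisy state/output trajectories from
    x_{k+1} = F(x_k, u_k) + G_k w_k and y_k = h(x_k) + v_k,
    with w_k ~ N(0, Q) and v_k ~ N(0, R)."""
    x = np.asarray(x0, float)
    xs, ys = [], []
    for u in u_seq:
        xs.append(x)
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)
        ys.append(h(x) + v)                      # noisy measurement
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        x = f(x, u) + G(x) @ w                   # noisy state transition
    return np.array(xs), np.array(ys)
```

Such simulated trajectories are what the performance-evaluation step later uses to probe scenarios of poor initial guesses, high measurement noise, and plant-model mismatch.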
The third stage consists of checking the system observability. A system is observable if every state x(t_0) can be determined from the information of the inputs u(t) and measurements y(t) over a finite time interval t ∈ [t_0, t] [59]. While the mathematical formulation of nonlinear observability is well defined in the literature [9], a full analysis in this case is often intractable. For practical applications, local observability may be adopted, and a necessary condition for local observability of a discrete-time nonlinear system at time t_k is that the np × n observability matrix O in Equation (3) is of rank n [60].

O = [C_k; C_k A_k; C_k A_k^2; . . . ; C_k A_k^(n−1)] (3)

As the observability of the dynamic system is dependent on its structure, a directed graph can be constructed to represent the dynamic system, and the structural observability of the system can be checked based on the association between the measured system outputs and the state variables using graph theory [61]. For a large-scale system, when the observability matrix suffers from ill-conditioning, structural observability can be evaluated using a bond-graph approach, in which the linearized system is transformed into a structurally equivalent system with Boolean coefficients [62]. The sensitivity matrices A_k, B_k, and C_k are derived from the linearization of the process and measurement models at the corresponding time t_k, according to Equations (4)-(6).

A_k = ∂F/∂x, evaluated at (x_k, u_k) (4)

B_k = ∂F/∂u, evaluated at (x_k, u_k) (5)

C_k = ∂h/∂x, evaluated at x_k (6)

If the necessary condition for local observability is not satisfied, the system is said to be unobservable, and a revision of the process and/or measurement models as well as the state estimation structure may be needed.
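Once the linearized matrices A_k and C_k are available, the rank condition on the observability matrix of Equation (3) can be checked numerically. A minimal sketch, assuming the matrices are frozen at time t_k:

```python
import numpy as np

def local_observability_ok(A, C):
    """Necessary local observability test: the stacked np x n matrix
    [C; CA; ...; CA^(n-1)] must have rank n."""
    n = A.shape[0]
    blocks, M = [], C
    for _ in range(n):
        blocks.append(M)
        M = M @ A          # next block row: C A^i
    O = np.vstack(blocks)
    return np.linalg.matrix_rank(O) == n
```

For ill-conditioned large-scale systems, the numerical rank computed this way can be misleading, which is precisely where the graph-theoretic and bond-graph structural tests cited above become useful.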
The fourth step aims at the state estimator formulation. The first task here is to select the estimation technology to be employed. A useful guideline is proposed in reference [1] to help designers select appropriate state estimators according to the knowledge of the system dynamics and parameters. Computational cost, the stochastic nature of the system, and the quality of initial guesses for the states and covariances may also influence the choice of the estimation technology. After selecting the estimator, its implementation usually requires the adjustment of the tuning parameters, which impacts the estimation performance. For Bayesian estimators, the process and measurement noise covariance matrices (Q and R, respectively) are the tuning parameters. These matrices can be adjusted manually or automatically by employing advanced algorithms, such as the autocovariance least-squares (ALS) technique [4–6,15].
In the final stage, the estimation performance can be evaluated through simulations followed by the online implementation [3,5,6]. For simulation testing, the process can be simulated using stochastic models, with the addition of process and measurement noises (as in Equations (1) and (2)) [15]. Scenarios of poor initial guesses, high measurement noise, plant-model mismatch, and unmeasured disturbances may be simulated [13]. The performance of the state estimators can be assessed and compared with respect to the estimation accuracy and computational cost, quantified by the mean squared error (MSE) in Equation (7) and the average computation time taken for each iteration (CPU-time), respectively [3,13,39,63]. In the case of unsatisfactory results, one can readjust the tuning parameters or try another estimation technology.
MSE = (1/N) ∑_{k=1}^{N} (x_k − x̂_k)^T (x_k − x̂_k) (7)

in which x_k and x̂_k denote the actual and estimated states, respectively, at time t_k, with k = 1, 2, . . . , N. In practice, the actual state values are often unknown and thus may be substituted in this formulation by the measured variables or the process model simulation states.
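As a minimal sketch of the accuracy metric, assuming Equation (7) averages the squared estimation-error norms over the N samples:

```python
import numpy as np

def mse(x_true, x_est):
    """Mean squared estimation error over a trajectory.

    Rows of x_true/x_est are time samples; columns are state variables."""
    e = np.asarray(x_true, float) - np.asarray(x_est, float)
    return float(np.mean(np.sum(e ** 2, axis=1)))
```

In a simulation study, `x_true` would come from the stochastic plant simulation, while the per-iteration CPU time can be recorded around each estimator call to complete the comparison.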

Covariance Estimation
Many of the state estimation techniques used today are reliant on accurate estimates of both the process noise covariances and measurement noise covariances. As far back as the 1960s, years after the introduction of Kalman filtering, researchers emphasized that imperfect approximations can lead to significantly higher peak variances, slower convergence, and overall suboptimal estimation performance [64,65]. Despite these issues, it is commonplace in industrial applications to simply provide arbitrary covariance values, as it is often challenging to quantify model and measurement uncertainty in real processes [4].
Although there is no definitive way of deriving the process and measurement noise covariances, there is an extensive body of literature that attempts to address this challenge. Often, each of these methods requires different assumptions about the linearity (or lack thereof) of the plant model, the number of noise sources, observability conditions, etc.
Modern covariance estimation techniques can be divided into four categories: correlation techniques, maximum likelihood methods, covariance matching methods, and Bayesian methods [66,67]. Of these four methods, both Bayesian and maximum likelihood methods have fallen out of favor for online use due to their large computational time [67].
The correlation-based methods include modern estimation techniques such as autocovariance least-squares (ALS). ALS has gone through several revisions and has been extended to nonlinear and time-varying models from its initial debut, where only linear models were considered [15,67,68]. This flexibility has allowed ALS to be applied to both high-dimensional nonlinear processes and batch processes with excellent success, as demonstrated in the literature [4–6].
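Correlation-based techniques such as ALS start from sample autocovariances of the innovation (output prediction error) sequence of a running filter. The following sketch shows only this preprocessing step, not the full ALS least-squares problem that maps the autocovariances to Q and R:

```python
import numpy as np

def innovation_autocovariances(innov, max_lag):
    """Sample autocovariances R_hat(j) = (1/(N-j)) sum_k e_{k+j} e_k^T
    of an innovation sequence (rows of innov), for lags j = 0..max_lag."""
    e = np.asarray(innov, float)
    N = e.shape[0]
    return [sum(np.outer(e[k + j], e[k]) for k in range(N - j)) / (N - j)
            for j in range(max_lag + 1)]
```

A well-tuned filter should produce an approximately white innovation sequence, so nonzero autocovariances at positive lags are the data that correlation methods exploit to correct Q and R.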
One of the more popular covariance matching methods used today for both linear time-invariant and linear time-variant systems is the noise covariance matrices estimation with Gaussianity assessment (NEGA) method [66,69]. Based on the reported references, this method was used for linear systems and has the unique feature of performing a statistical test to examine the Gaussianity of the noises [66,69]. This is important, as many estimation methods, such as the EKF and MHE, assume Gaussian noise distributions.
A more in-depth analysis and overview of covariance estimation was published in 2017 [70]. That publication emphasized correlation-based methods but also introduced other techniques, and it can serve as a starting point for surveys focusing on those techniques. Overall, there is an extensive number of covariance estimation techniques that can generate accurate and reliable measurement and process noise covariances for both linear and nonlinear processes, with the main challenges related to method selection and the potential complexity of implementation.

Uncertainty Quantification
Uncertainty quantification remains an important challenge in nonlinear state estimation, as a parametrically distributed disturbance (such as Gaussian noise) can result in a non-standard distribution of the output measurements. Uncertainty quantification in state estimation can be categorized into two major types: forward uncertainty propagation and inverse uncertainty quantification. While these two classes of problems are related, the main objective of the first class is designing a model that accurately represents the real dynamic stochastic process, whereas the second class focuses on finding the state variables and the parameters of the system's model from available measurements.

Typically, the models are formulated in the form of a Hidden Markov Model (HMM) as a prerequisite of a recursive estimation algorithm. An HMM assumes that the current uncertainty is only affected by the uncertainty at the immediately previous time, which allows a faster update of the current time distribution. An HMM contains two stochastic processes, in which one cannot be measured directly ("hidden") and the other is conditioned on the first process's realization. The hidden stochastic process represents the state evolution of a discrete-time first-principles model or black-box model and possesses the Markov property to ensure that the current uncertainty only propagates to the following time. The second stochastic process serves as the model of the measurement with probabilistic errors. Alternatively, the effects of unaccounted measurements in the distant past are captured with an arrival cost function, as in the case of the MHE [31]. Additionally, the models can be designed to ensure conjugacy, so that the posterior distribution keeps the same form as the prior and can be quickly updated online.
The benefit of a conjugate model can be recognized in a linear time-invariant system with a Kalman filter, where the calculation of the estimated mean and the estimated covariance matrix gives full information on the probability density function of the state variables.
Mathematical models become more flexible when some parameters are considered to be time-variant. However, this approach introduces a new source of uncertainty, called parameter uncertainty, which is distinct from process disturbances and measurement errors, and its effects need to be considered in the formulation of the nonlinear state estimation problem. Approaches for simultaneously estimating the parameters along with the states are discussed in the literature, and a common approach corresponds to augmenting the parameters as pseudo-state variables [71], as already mentioned in Section 3.1. Different variants of the Kalman filter can be applied to this augmented system, such as the EKF [72,73], UKF [74], ensemble KF [75], and Kitanidis KF [76]. Additionally, optimization-based dual estimation approaches can be constructed similarly to the MHE [77] or with a moving window likelihood maximization [78].
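The pseudo-state augmentation mentioned above can be sketched as a simple wrapper, assuming a random-walk model for the parameters; the process-model signature `f(x, u, theta)` is an assumption of this sketch:

```python
import numpy as np

def augment(f, nx):
    """Wrap a process model f(x, u, theta) into an augmented model in which
    the parameters evolve as a random walk (theta_{k+1} = theta_k), so that
    states and parameters can be estimated jointly by any KF variant."""
    def f_aug(z, u):
        x, theta = z[:nx], z[nx:]
        return np.concatenate([f(x, u, theta), theta])
    return f_aug
```

In practice, a small artificial process noise is usually added to the parameter block of Q so that the filter can track slowly time-varying parameters.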
Polynomial chaos was first introduced by Wiener as another method of propagating uncertainty without the need for sampling the prior [79], and generalized polynomial chaos (gPC) was developed for various continuous and discrete distributions using orthogonal polynomials [80]. Even though the calculation of the gPC coefficients in parameter and state estimation frameworks, which often requires solving an online optimization problem, typically takes more time than importance sampling methods, the overall online computational time of gPC approaches is shorter due to their fast prediction [81]. Additionally, the gPC coefficients are sufficient to calculate the moments of the state variables [82], so they can be used to calculate the measurement update and the prediction step of the EKF [83] and ensemble KF [84].
In practice, the nonlinearity of the dynamic process often results in non-Gaussian distributed outputs, which can be approximated by a finite convex sum of Gaussian distributions based on the Wiener approximation theorem [20]. This type of uncertainty modeling can be defined as a Gaussian Mixture (GM) model, and it is typically used for multimodal stochastic processes. When the system dynamics are linear or marginally nonlinear, only the GM's weights of the initial states are updated when the measurement data arrive, using a Bayesian recursion [85,86]. The new weights are calculated from the minimization of the integral square errors of the predicted and true probability density functions [87] or using maximum a posteriori estimation [88]. For nonlinear continuous-time systems, the exact forecast of the state probability density function is given by the solution of the Fokker-Planck-Kolmogorov linear partial differential equation [89], and various numerically approximated solutions are available [87,90–92]. For practical applications, an efficient solution of the Fokker-Planck-Kolmogorov equations for a multiple-input, multiple-output system still remains challenging to obtain. Alternatively, a conjugate structure of the disturbance model for nonlinear systems needs further research to enable a fast update of the uncertainty's probability density function.
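For the linear or marginally nonlinear case, the Bayesian recursion on the GM weights reduces to reweighting each component by its measurement likelihood. A minimal sketch, assuming the per-component output predictions and innovation covariances have already been computed:

```python
import numpy as np

def gm_weight_update(weights, y, y_preds, S_preds):
    """Bayesian update of Gaussian-mixture weights when measurement y
    arrives: w_i <- w_i * N(y; y_i, S_i), then renormalize."""
    w = np.array([wi * np.exp(-0.5 * (y - yi) @ np.linalg.solve(Si, y - yi))
                  / np.sqrt(np.linalg.det(2.0 * np.pi * Si))
                  for wi, yi, Si in zip(weights, y_preds, S_preds)])
    return w / w.sum()  # weights remain a convex combination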

Time-Scale Multiplicity
Much of the published literature on estimation lacks emphasis on systems where the dynamics vary significantly between parts of the system. An example occurs in highly integrated systems, in which mass and energy balance dynamics may evolve at vastly different rates. Traditionally, these systems are described using differential-algebraic equations (DAE). In such systems of equations, the differential equations characterize the slower dynamics, while the algebraic equations characterize the fast dynamics. The uniqueness of this type of system often requires the use of specialized estimation techniques.
To specifically address nonlinear DAE systems, a set of modifications to the original EKF were proposed by references [93,94]. The main limitation of this work was the inability to incorporate measurements of algebraic states into the recursive algorithm. This limitation was resolved in references [95,96], as both algebraic and differential states could be measured and used to update the state estimates. In this algorithm, once new estimates are obtained for both the algebraic and differential states, only the differential states are retained. The algebraic states are re-estimated by solving the algebraic equations using the original filter estimates.
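The re-estimation of the algebraic states described above can be sketched as a root-finding step on the algebraic equations at the retained differential states; the algebraic model `g` and the initial guess are illustrative placeholders for a specific DAE system:

```python
import numpy as np
from scipy.optimize import fsolve

def recompute_algebraic(x_diff, g, z_guess):
    """After the filter update, keep the differential states x_diff and
    re-estimate the algebraic states z by solving g(x, z) = 0."""
    return fsolve(lambda z: g(x_diff, z), z_guess)
```

This keeps the algebraic constraints satisfied exactly at every step, at the cost of an extra nonlinear solve per iteration.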
Work has also been conducted to deal with DAEs by employing MHE-based approaches, such as in reference [97], where the Tennessee Eastman process (a benchmark estimation problem) was solved online using a multiple shooting technique. In this direct multiple shooting method, the DAE is relaxed using a set of equality constraints embedded in the online MHE problem. The time-scale multiplicity is overcome by increasing the number of partitions of the time interval to obtain a more accurate representation of the process's time integration. However, too many partition points may increase the complexity of the parameter estimation problem, thus challenging the method's tractability for online implementation.
Additional MHE methods for addressing multi-rate systems have been published [8], in which the covariance matrix is updated using nonlinear programming sensitivity. An alternative approach published in reference [7] considers the decomposition of a two-time-scale system into two separate reduced-order systems using a distributed moving horizon state estimation (DMHE) method. An interesting challenge that remains to be solved is how to define when a system transitions into a time-scale multiplicity case. For example, how different do the system dynamics need to be before a system is classified as a DAE? Does a system become a DAE when the dynamics vary by at least three orders of magnitude, four orders of magnitude, etc.? Such a challenge provides opportunities for further research in state estimation under time-scale multiplicity.

Bioprocess Monitoring
Bioprocesses are usually subject to a highly nonlinear and time-varying influence of process variables, including temperature, pressure, pH level, and concentrations of substrate, product, inducers, and inhibitors. Minor adjustments to these variables often lead to abrupt changes to the dynamics of the cells' growth and production rates, as well as the enzyme kinetics [98]. Thus, in order to ensure the high quality and reproducibility of bioprocesses, it is imperative to accurately monitor and control key process variables, as outlined by the Process Analytical Technology (PAT) initiative [99,100].
The design and implementation of reliable control systems is based on accurate, precise, and real-time measurements of selected variables. Real-time measurement of variables such as the pH level and the oxygen and carbon dioxide compositions in the bioreactor (gas and liquid phases) is well established and relies on affordable sensors. However, the concentrations of other compounds may not be measurable online due to the high price of the sensors (e.g., impedance, infrared, Raman, and mass spectrometers) or even the lack of suitable measurement devices. Even when these sensors are used, they are subject to significant noise, making it hard to directly infer the concentration of certain compounds. Examples of such variables are the concentrations of reactants and products of enzymatic reactions or entire metabolic pathways, as well as the physiological state of cells. As a consequence, those process variables are usually measured only a few times a day in industrial plants, or even in research laboratories, by manual sampling and time-consuming analysis (i.e., at-line and off-line measurements) [101,102]. In this context, the use of nonlinear state estimators in conjunction with direct state measurements can enable reliable bioprocess monitoring and control [103,104].
The development of state estimators for bioprocesses is a challenging task owing to the increased complexity of such systems, mainly associated with their highly nonlinear and time-varying dynamics. Consequently, the adequate description of the dynamics of the enzyme kinetics, or of the cells' growth and production rates, usually involves nonlinear process models with a large number of parameters (that may be time-dependent or not) [105,106]. Moreover, single models are often valid only in a limited period of time or range of operating conditions [13,52,107,108]. A generic nonlinear and time-varying mathematical model to describe bioprocess dynamics is illustrated in Equation (8).
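Equation (8) does not survive in this text. A plausible form, consistent with the symbol definitions that immediately follow it (the exact equation in the original may differ), is a switched nonlinear model:

```latex
\dot{x}(t) = f_j\!\left(x(t), u(t), \theta_j(t)\right), \qquad
y(t) = h_j\!\left(x(t), u(t), \theta_j(t)\right),
\qquad x \in X_j,\; u \in U_j,\; t \in T_j
```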
in which θ_j(t) ∈ R^q are time-varying parameters, and X_j, U_j, and T_j are specific ranges of process states, input variables, and process time, respectively. Over the past few decades, several studies have proposed promising techniques to overcome the challenges related to state estimation for biochemical processes. Some of the main strategies found in the literature are presented and discussed below.
State estimation based on the linearization of process models, as in the EKF method, may provide poor or even diverging estimates for highly nonlinear systems. A promising strategy to tackle this limitation is the use of hybrid state estimators, as illustrated in Figure 4. For instance, a hybrid MHE and EKF has already been designed for a process to produce a recombinant protein with E. coli bacteria [13]. In this framework, the estimators operate hierarchically, with the EKF providing fast estimates between less frequent MHE calculations, which demand a higher computational cost. MHE estimates, in turn, correct potentially poor estimates from the EKF. Another hybrid approach has been proposed using the ELO and AO techniques [109]. In this study, the authors used a confidence function δ, varying from 0 to 1, to weigh the state estimators differently. In particular, the better the bioprocess model accuracy, the closer δ is to 1, bringing the hybrid estimator closer to the ELO algorithm. At the other extreme, when the process kinetics are unknown, δ is close to 0 and the state estimator corresponds to the AO.
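One natural realization of such a confidence-weighted combination is a convex combination of the two estimates; this is a hedged sketch, as reference [109] may use a different blending rule:

```latex
\hat{x} = \delta\,\hat{x}_{\mathrm{ELO}} + (1-\delta)\,\hat{x}_{\mathrm{AO}},
\qquad 0 \le \delta \le 1
```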
When model parameters depend on time, the performance of the state estimators may be hampered due to potential plant-model mismatches. In such cases, it is possible to carry out the joint estimation of both bioprocess states and model parameters, as represented in Figure 4. Through this approach, the mathematical model is updated over time, reducing the model mismatch and hence improving the state estimates' accuracy. This strategy has been proposed and tested for a variety of bioprocess case studies [106,110,111]. A simple approach to perform the joint estimation is to consider the parameters as extra states with no dynamics (i.e., constants) and define an augmented state vector x = [x θ]^T, according to Equation (9) [44,112]. Using this alternative formulation, the parameter estimation task is restated as a state estimation problem, which can thus be addressed using nonlinear state estimators.
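Equation (9) is not reproduced in this text; the standard augmented-state formulation it describes (parameters modeled as constant states) reads:

```latex
\frac{d}{dt}\begin{bmatrix} x \\ \theta \end{bmatrix}
= \begin{bmatrix} f(x, u, \theta) \\ 0 \end{bmatrix}, \qquad
y = h(x, u, \theta)
```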
with x(t_0) = x_0 and θ(t_0) = θ_0.
Finally, there are systems whose dynamics may not be fully described by single models throughout the process or operating conditions, as represented by Equation (8). In these cases, multiple (simple) models can be used, either in an on-off switching policy or combined using fuzzy logic, for example, to guarantee a smooth transition between process models [52], as outlined in Figure 4. Then, state estimators may be implemented based on multiple process models, switching the models abruptly or smoothly according to specific ranges of system states, input variables, and/or process time. The commutation (on-off switching) between three specific growth models, valid under different ranges of substrate concentrations, was applied in reference [107]. The authors used the MHE to estimate the cell and substrate concentrations, as well as the specific growth rate, for a generic cultivation process. In [108], on the other hand, trust factors β, ranging from 0 to 1, were implemented to weight gradually the contribution of output predictions from three process models to the MHE objective function.
These trust factors, in turn, depend on the past and current model accuracy (i.e., distance between measured and predicted outputs) at a time instant t k , during a bioprocess operation to produce poly-β-hydroxybutyrate (PHB).
Overall, the aforementioned approaches (hybrid state estimation, joint state and parameter estimation, and estimation based on multiple process models) are promising to overcome most of the limitations regarding bioprocess applications. However, there is no general solution that applies to all biochemical processes. In other words, each system will demand specific strategies to deal with its particular challenges, possibly beyond the techniques already existing in the literature. This fact reinforces the need for the continuous development of novel nonlinear state estimation strategies in the field of bioprocess monitoring.

Online Implementation
As state estimation is critical for online process control, one crucial challenge of online nonlinear state estimation is the reduction in computational time for implementation. Reduced-order models are commonly formulated as an attempt to reduce problem complexity [113][114][115]. More efficient sampling methods have also been developed specifically for state estimation, using the properties of Markov model approaches as well as parallelization [116,117]. Moreover, more advanced numerical optimization methods have been proposed to allow the solution of the MHE to converge more efficiently by employing sensitivity-based strategies [118,119].
In many industrial applications, some measurements instantly arrive from hardware sensors, while other measurements are accompanied by a random time delay due to the requirements of offline lab procedures or the soft sensor's computational time [120]. The combination of these measurement types often gives a more accurate state estimation for nonlinear processes, and data fusion techniques may be required for their integration [121]. Measurement fusion methods based on KF keep a time stamp on each fast measurement and update the estimations again after the delayed measurements are available [122,123]. Alternatively, the MHE can be reformulated offline according to different scenarios of delayed measurement arrivals, and an optimal procedure derived to select the appropriate formulation online to quickly update the state variables [124].
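The time-stamp-and-replay idea behind KF-based measurement fusion can be sketched as follows. This is a hypothetical scalar random-walk filter for illustration only; the function names and interface are not from references [122,123]:

```python
import numpy as np

def kf_update(x, P, y, R):
    # scalar Kalman correction for a direct measurement of the state
    K = P / (P + R)
    return x + K * (y - x), (1.0 - K) * P

def kf_predict(x, P, Q):
    # random-walk forecast: the state is held, uncertainty grows
    return x, P + Q

def filter_with_replay(fast, delayed, Q=1e-3, R=1e-2, x0=0.0, P0=1.0):
    """Fuse fast and time-stamped delayed measurements by re-filtering.

    fast, delayed: lists of (step, value) pairs; delayed measurements carry
    the time stamp of when they were sampled, not when they arrived, so they
    are spliced into the record at their own step before the filter is re-run.
    """
    record = {}
    for k, y in list(fast) + list(delayed):
        record.setdefault(k, []).append(y)
    x, P, traj = x0, P0, []
    for k in range(max(record) + 1):
        x, P = kf_predict(x, P, Q)
        for y in record.get(k, []):   # apply every measurement stamped at k
            x, P = kf_update(x, P, y, R)
        traj.append(x)
    return traj
```

In a real implementation the replay would restart from the stored state and covariance at the delayed measurement's time stamp rather than from the initial condition, which is what makes the bookkeeping of past (x, P) pairs necessary.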
While traditional model predictive control (MPC) algorithms initialize with the state variables given by the state estimator, more advanced MPC algorithms also consider the uncertainty of the states to form more conservative constraints [125] or to guarantee a maximum chance of constraint violations [126,127]. However, more advanced methods for combining MPC and MHE are still lacking and thus could be a potentially fruitful area of research. In the next section, a case study is addressed to highlight some nonlinear state estimation challenges encountered during implementation.

Background
To highlight the performance and computational time challenges of nonlinear state estimation methods, a case study involving the implementation of the EKF and MHE is performed on the nonlinear batch reactor system from reference [128]. This case study consists of a batch reactor model considering the reaction mechanism shown below in Equations (10) and (11) and the corresponding system of differential equations in Equations (12)-(14).
in which k1, k−1, and k2 are the reaction kinetic parameters and Ca, Cb, and Cc are the concentrations of the three components (the state variables). For all case studies performed, k1 and k−1 are fixed to a value of 1 with appropriate units, while k2 takes a value of 0.8, 8, or 80 with appropriate units. These k2 values are assumed to be the real k2 values and are used to generate the simulated data/measurements associated with the plant model. As component C does not influence the estimation implementations below, its concentration has not been included in the state-space model shown in Equation (15). The system of differential equations above is integrated between 0 and 6 time units, with a data point collected every 0.05 time units, using the MATLAB ode15s function. As the value of k2 increases, the system becomes increasingly stiff, mandating the use of a stiff ODE solver. To generate noisy measurement data, the MATLAB function randn is used to apply 5% normally distributed noise to the concentrations of component B, which are the only measurements used in the case study. The initial concentrations of components A and B are set to 1 and 0 concentration units, respectively. The concentration profiles for the different k2 values are shown in Figure 5.
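A simulation of this setup can be sketched in Python (the paper's implementation is in MATLAB with ode15s; SciPy's stiff BDF integrator is the closest analog). The reaction mechanism below is hypothetical, chosen only to be nonlinear and to stiffen as k2 grows; Equations (10)-(14) in the original may differ:

```python
import numpy as np
from scipy.integrate import solve_ivp

K1, KM1 = 1.0, 1.0  # k1 and k-1 fixed to 1, as in the case study

def rhs(t, c, k2):
    # Hypothetical nonlinear mechanism (the actual Equations (10)-(14) are
    # not reproduced in this text). Component C is omitted because it does
    # not influence the estimation.
    ca, cb = c
    return [-K1 * ca**2 + KM1 * cb,
             K1 * ca**2 - KM1 * cb - k2 * cb]

def simulate(k2, t_end=6.0, dt=0.05, noise=0.05, seed=0):
    """Integrate the stiff ODEs and return noisy measurements of C_B."""
    t_eval = np.arange(0.0, t_end + dt / 2, dt)
    sol = solve_ivp(rhs, (0.0, t_end), [1.0, 0.0], args=(k2,),
                    method="BDF", t_eval=t_eval, rtol=1e-8, atol=1e-10)
    rng = np.random.default_rng(seed)
    # 5% normally distributed noise on the measured state C_B
    y = sol.y[1] * (1.0 + noise * rng.standard_normal(sol.y[1].shape))
    return sol.t, sol.y, y
```

Sampling every 0.05 time units over [0, 6] yields 121 grid points; the noisy C_B series is the only signal passed to the estimators.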
To increase the complexity of the estimation problem, the real k2 values are not used in the estimation model. Instead, a new set of k2 values known as the ideal k2 values (1, 10, and 100) are used to induce a plant-model mismatch. This mismatch is common in industrial settings where unmodeled disturbances such as catalyst deactivation or coking can occur over time and cause process models to deviate from the behavior happening in the real process. Furthermore, as the real state values of components A and B are known in this case study, the MSE calculation uses these values to characterize estimator performance.

EKF Results
Using the simulated measurements and ideal k2 values, an EKF is applied to the reactor system employing a testing value of 0.001 for the diagonal elements of the 2 × 2 process noise matrix (Q) and a testing value of 0.0035 for the 1 × 1 measurement noise matrix (R). These values were chosen because most of the noise in the system derives from the measurements, hence R is greater than Q. Additionally, approximations of the noise were used, as this mirrors how estimation is often conducted in industrial processes. It is also assumed that an accurate initial estimate of the concentration of state B is unavailable, providing an additional challenge to the implemented state estimator. Thus, the initial concentrations of components A and B are assumed to be 1 and 0.5 concentration units, respectively. Additionally, the initial covariance matrix P^−(0) for the EKF is specified as a diagonal 2 × 2 matrix with diagonal elements of 0.001. Equations (15) and (16) display the symbolic linearized state-space model for this system. Figure 6 shows the EKF implementation results, considering each ideal k2 value.
The estimation results shown in Figure 6 are those taken after the correction step of the EKF. As a result, at time 0, the EKF generates an estimated concentration of B of approximately 0.38, which is between the poor initial guess of 0.5 and the measurement of 0. Figure 6 shows that when k2 has a value of 1, there is a large offset between the EKF estimates and the real plant values for the concentration of component A. This offset is the result of linearizing the nonlinear reactor model at the current iteration of the EKF and conducting the forecasting step using the linearized model. As the value of k2 increases, this offset becomes minimal due to a decrease in the linearization error of component A, as reflected in the MSE values in Table 1. This decrease in the linearization error helps to reduce the offset between the nonlinear reactor model and the EKF model and ultimately lowers the MSE values. In terms of the estimation performance of component B, the EKF generates estimates that are mostly consistent with the measurement data, and thus it does not appear to be impacted by this linearization error. The most significant offset for component B occurs during the first few estimation steps, where only a poor initial value of the state is available. Although component A is not measured directly, the real plant data shown for this state serve as a way of gauging the performance of the EKF. To improve the EKF performance, the algorithm was modified to use ode15s to forecast the state estimate employing the original nonlinear differential equation model, while the linearized model was employed to update the covariance uncertainty. Figure 7 shows the results after making these modifications to the EKF forecasting step.
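The modified forecasting step (nonlinear propagation of the mean with linearized covariance propagation) can be sketched as follows. The two-state reaction model f and its Jacobian are hypothetical stand-ins, since Equations (12)-(16) are not reproduced in this text; the Q, R, and initial values follow the case study. A single explicit Euler step stands in for the ode15s call, which a stiff k2 would require in practice:

```python
import numpy as np

# Hypothetical two-state reactor model (nonlinear in C_A, stiffens with k2)
def f(c, k2, k1=1.0, km1=1.0):
    ca, cb = c
    return np.array([-k1 * ca**2 + km1 * cb,
                      k1 * ca**2 - km1 * cb - k2 * cb])

def jac(c, k2, k1=1.0, km1=1.0):
    ca, _ = c
    return np.array([[-2.0 * k1 * ca, km1],
                     [ 2.0 * k1 * ca, -(km1 + k2)]])

def ekf_step(x, P, y, k2, dt=0.05, q=1e-3, r=0.0035):
    """One EKF iteration: nonlinear forecast of the mean, linearized
    covariance propagation, then correction with the C_B measurement."""
    F = np.eye(2) + dt * jac(x, k2)   # discretized Jacobian at the estimate
    x = x + dt * f(x, k2)             # nonlinear forecast of the mean
    P = F @ P @ F.T + q * np.eye(2)   # covariance forecast
    H = np.array([[0.0, 1.0]])        # only C_B is measured
    S = float(H @ P @ H.T) + r
    K = (P @ H.T) / S                 # Kalman gain (2x1)
    x = x + (K * (y - x[1])).ravel()  # correction step
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Starting from the poor initial guess [1, 0.5] with P(0) = 0.001·I, a first measurement near 0 pulls the C_B estimate down toward the data, mirroring the behavior described for Figure 6.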
Additionally, even the marginal offset present for the other k2 values is reduced when using nonlinear forecasting. In terms of the estimation of component B, the results remain largely unchanged, as there is still offset during the first few estimation steps due to the poor initial guess. To quantify the improvement of the estimation performance, the MSE defined in Equation (7) is used to generate the values in Table 1 for components A and B using the true state values.
The values in Table 1 support the visual observations about the estimation performance. When nonlinear forecasting is applied in place of linear forecasting, there is a consistent decrease in the MSE for each of the k2 values shown. The most significant improvement, a decrease of more than one order of magnitude in the MSE, occurs when k2 has a value of 1. This result makes sense, as nonlinear forecasting eliminates the linearization error associated with the EKF forecasting step. Although the EKF results improve when using this technique, it carries an increased computational time, as discussed below in Section 4.4. A similar examination of linear vs. nonlinear forecasting was applied to the MHE implementation, which is discussed in the next subsection.

MHE Results
When running the MHE implementations for this case study, the same covariances from the EKF and the same initial concentration estimates are applied. Additionally, as the MHE corresponds to a constrained optimization problem, both lower and upper bounds on the state variables are introduced. The lower bounds are specified such that all states have a concentration greater than or equal to 0. The upper bounds limit the concentrations to 2 arbitrary units. Initially, MATLAB's fmincon solver with the sqp algorithm was employed to solve the optimization problem with a horizon length (N) of 30, considering linear forecasting of the states, where the objective function is the minimization of the MSE. For this specific case study, the arrival cost was not considered in the MHE formulation. The results for this case are shown in Figure 8.
The estimation results shown in Figure 8 are the final converged values of the MHE at the corresponding time instant. As a result, it can be seen that the MHE is able to overcome the poor initial concentration of component B by generally following the trajectory of the measurements to minimize the MSE. When using a horizon length of 30 and the fmincon subroutine, the estimation results of the MHE with linear forecasting are poor, as the offset between the model and estimation results for component A is significant for all k2 values. Beyond this offset, the estimated concentration of A experiences unexpected behavior, as seen with perturbations such as the one around 2 time units when k2 has a value of 1. As the value of k2 increases, the estimation performance appears to improve, as the estimation results begin to converge to the real plant model with little offset. To attempt to improve the quality of these estimation results, nonlinear forecasting of the states in conjunction with different optimization solvers was explored.
In particular, in addition to fmincon, the modified sequential quadratic programming (modSQP) algorithm from reference [129] was used to increase computational efficiency. The MHE results using modSQP and nonlinear forecasting are shown in Figure 9. When using the modSQP optimization algorithm with an N of 30, more robust estimation results are achieved in terms of several metrics. First, the offset of component A is less pronounced, as it decreases consistently and smoothly. Additionally, the estimation of component B is still acceptable, and the issue of the poor initial concentration estimate is overcome.
Although the estimation performance of B may appear worse when compared to the linear case when k2 has a value of 1, the MSE values in Table 2 do not support this conclusion. Even though the decrease in MSE is less drastic than for the EKF implementation, switching to nonlinear forecasting produced smoother estimation curves with lower MSE for all cases. To ensure that a horizon length of 30 was sufficiently large, a horizon length of 40 was applied to the nonlinear forecasting case. Ultimately, this had no effect, as the estimation performance remained similar and the MSE values remained constant.
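The MHE subproblem described here (squared-error objective on the C_B measurements, no arrival cost, concentration bounds of [0, 2]) can be sketched with SciPy's SLSQP standing in for fmincon's sqp. The two-state model below is a hypothetical stand-in for the actual batch reactor equations, and the forecast uses a plain Euler roll-out:

```python
import numpy as np
from scipy.optimize import minimize

def forecast(x0, n, k2, dt=0.05, k1=1.0, km1=1.0):
    """Euler roll-out of a hypothetical two-state reactor model over n steps."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(n):
        ca, cb = traj[-1]
        traj.append(traj[-1] + dt * np.array(
            [-k1 * ca**2 + km1 * cb,
              k1 * ca**2 - km1 * cb - k2 * cb]))
    return np.array(traj)

def mhe_window(y_window, x_guess, k2):
    """Solve one MHE window: find the state at the start of the horizon that
    minimizes the MSE to the C_B measurements, subject to concentration
    bounds of [0, 2]; no arrival cost, as in the case study."""
    n = len(y_window) - 1
    def cost(x0):
        cb_pred = forecast(x0, n, k2)[:, 1]
        return float(np.mean((np.asarray(y_window) - cb_pred) ** 2))
    res = minimize(cost, x_guess, method="SLSQP",
                   bounds=[(0.0, 2.0), (0.0, 2.0)])
    return forecast(res.x, n, k2)[-1]   # estimate at the current time
```

A horizon of N = 30 gives a 31-point window, and the window is re-solved at every sampling instant as new measurements arrive; this repeated optimization is the source of the large computational gap relative to the EKF discussed in the next subsection.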

Computational Performance Analysis
One of the most important aspects of estimation that must be considered for online use is the balance between computational time and performance. Sections 4.2 and 4.3 show that nonlinear forecasting generally produces more accurate estimation results, but they do not highlight the increased computational time that occurred when changing the estimator formulation. For systems where rapid estimation is required, high computational times make the estimation unsuitable and may require using less robust forms of estimation to maintain computational tractability. To compare the computational times of the case studies performed above, the stopwatch timer in MATLAB is used by calling the tic and toc functions. For reference, the computer used for all simulations is a Predator Helios 300 laptop equipped with a quad-core hyperthreaded i7-7700HQ processor with a base clock of 2.81 GHz, running MATLAB R2020a.
Table 3 depicts a comparison of the computational times of the two different EKF implementations tested. The linear forecasting results in Table 3 refer to Figure 6, and the nonlinear forecasting results refer to Figure 7. The linear forecasting version of the EKF is an order of magnitude faster on average than the nonlinear forecasting version, as it avoids the ode15s calls used to forecast the state estimate. Despite this order of magnitude difference, nonlinear forecasting can still be used in real processes, as all 120 estimation points were solved in under 0.1 s while still providing excellent estimation results. On average, this method solves a single iteration of the EKF in under 0.001 s. Table 4 depicts the various MHE computational times for comparison against the EKF. From Table 4, it is immediately evident that the MHE implementation is much slower than the EKF, as the computational times are often three or more orders of magnitude greater.
Intuitively, this is reasonable, as a large optimization problem must be solved to generate the estimation results, unlike the EKF, which directly calculates the estimation forecast without optimization. To reduce the size of the optimization problem, the horizon length of the modSQP case was lowered to an N of 20 and yielded computational times approximately 57% less than the N of 30 case. However, there was an approximately 2% increase in the average MSE values when compared to the N = 30 case.
When attempting nonlinear forecasting using fmincon, the computational time increased by an order of magnitude compared to the linear forecasting case, as the optimization problem became more difficult, and the computational time per step reached 46.2 s. This value may make the fmincon-based MHE intractable depending on the sampling rate. In contrast, the modSQP-based MHE with nonlinear forecasting outperformed both the linear and nonlinear forecasting fmincon-based cases in estimation accuracy and computational time. Furthermore, for the k2 = 1 case, the average computational time to complete the estimation of one point was under 3 s. This relatively low computational time significantly improves the tractability of the MHE and makes it possible to implement in many real processes.
Therefore, by conducting this relatively simple nonlinear case study on a batch reactor model, the strengths and limitations of the EKF and MHE were illustrated. Both techniques fail to generate desirable estimation results for the concentration of component A when using the linearized model to forecast the state estimate, despite having the lowest computational times. As a result, nonlinear forecasting should be employed to improve performance despite the increased computational time. This is of particular concern when using the MHE, as the computational time can approach intractability, but it can be mitigated through more efficient optimization techniques such as modSQP with an appropriate horizon length. Furthermore, based upon this case study, the MHE was shown to be more tolerant of a poor initial guess of the states, as it outperformed the EKF during the first steps of the estimation algorithm, as seen in their respective MSE values.

Development of Fast Optimization Techniques
As the case study highlighted, any MHE implementation depends heavily on the underlying optimization algorithm used in the state estimation formulation. Although this paper is not a review of state-of-the-art optimization techniques, some of the most promising techniques for use in state estimation are worth discussing. The case study showed that modSQP may be among the optimization algorithms suited to the next generation of state estimators.
The modSQP algorithm from [129] builds upon the existing SQP framework by utilizing a backtracking line search with a relaxed set of acceptance conditions to reach faster convergence. When applied as part of a model predictive control (MPC) implementation on a supercritical coal-fired power plant cycling problem, it reduced computational time by approximately 25% compared to classical SQP MPC solvers used as a benchmark [129]. This advantage was also demonstrated in the case study above when comparing the fmincon algorithm with the sqp option against the modSQP algorithm.
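The backtracking mechanism that modSQP builds on can be sketched generically. The code below implements a standard backtracking line search with the Armijo sufficient-decrease condition; the relaxed acceptance conditions of [129] are not reproduced, and the steepest-descent usage, the Rosenbrock test function, and the parameter values are illustrative assumptions.

```python
# Generic backtracking line search (Armijo condition); SQP-type methods
# such as modSQP build on this mechanism, though the exact relaxed
# conditions of [129] are not reproduced here.
import numpy as np

def backtracking(f, grad, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink the step length until f(x + a*p) satisfies Armijo decrease."""
    a, fx, slope = alpha0, f(x), c * grad(x) @ p
    while f(x + a * p) > fx + a * slope:
        a *= rho
    return a

# Usage: damped-gradient steps on the Rosenbrock function (illustrative)
def f(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def grad(x):
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                     200 * (x[1] - x[0] ** 2)])

x = np.array([-1.2, 1.0])
for _ in range(500):
    p = -grad(x)                      # steepest-descent direction
    x = x + backtracking(f, grad, x, p) * p
```

Relaxing the acceptance test (e.g., allowing occasional non-monotone steps, as modSQP-style methods do) reduces the number of backtracking iterations per step, which is where the reported speedups originate.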
In addition to modSQP, other algorithms that have been successful in nonlinear MPC applications, such as IPOPT, can likely reduce the computational time of the MHE optimization problem as well [130]. IPOPT is of interest as it is an open-source solver that has already been applied successfully in a case study of a low-density polyethylene (LDPE) tubular reactor as a solver for MHE and nonlinear model predictive control (NMPC) problems [119]. More specifically, this implementation of IPOPT was combined with an advanced-step framework, which uses the current estimate for process control-associated predictions of both the states and measurements [118,119]. Alternatively, the solvers available through the CasADi framework can be used for large-scale MHE optimization problems [131].
Furthermore, as the number of cores on a CPU increases, so do the benefits of incorporating parallelization into optimization. Modern CPUs are available in a variety of multicore configurations, including 8, 16, 32, and even 64 cores. Concurrently, the performance of consumer-grade GPUs has increased rapidly, with each new product generation offering high performance at relatively low cost. This combination of falling cost and rising performance creates an interesting opportunity to incorporate easily parallelized algorithms, such as particle swarm optimization (PSO) or ant colony optimization, into estimation applications.
Global optimization techniques such as PSO disperse agents through the search space of an optimization problem and converge toward the best-performing agent. This process has a chance of locating the global optimum, and such techniques have a history of solving complex problems, but at a high computational cost [132]. To overcome this limitation, GPU-based versions of PSO have been used since 2009 to reduce the computational expense by orders of magnitude [132], and this reduction will likely continue as hardware performance improves with new product launches. Additionally, it may be possible to combine optimizers such as PSO and SQP to improve overall performance, as in reference [133], where PSO provides the initial guess for the SQP optimizer, improving the robustness of the gradient-based method while avoiding the computational cost of running a full PSO algorithm. Although the specific application of this joint optimizer was dynamic real-time optimization (DRTO), similar hybrid optimizers could be formulated in conjunction with MHE to provide robust and fast results.
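The hybrid idea of [133] can be illustrated with a minimal sketch: a basic particle swarm explores the search space, and its best point seeds a gradient-based SQP-type solver (scipy's SLSQP is used here as a stand-in). The Rastrigin test function, swarm size, and inertia/attraction coefficients are illustrative assumptions, not values from the reference.

```python
# Minimal PSO + SQP hybrid sketch: the swarm's best point becomes the
# initial guess for a local gradient-based solver (SLSQP as a stand-in).
# Test function and swarm coefficients are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def f(x):   # Rastrigin: a multimodal test objective with many local minima
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

rng = np.random.default_rng(1)
n, dim, w, c1, c2 = 40, 2, 0.7, 1.5, 1.5
pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_f = pos.copy(), f(pos)
for _ in range(100):
    gbest = pbest[np.argmin(pbest_f)]          # swarm's best-known point
    vel = (w * vel
           + c1 * rng.random((n, dim)) * (pbest - pos)
           + c2 * rng.random((n, dim)) * (gbest - pos))
    pos = pos + vel
    fp = f(pos)
    better = fp < pbest_f
    pbest[better], pbest_f[better] = pos[better], fp[better]

# Polish the swarm's best point with a fast local (SQP-type) solver
x0 = pbest[np.argmin(pbest_f)]
res = minimize(f, x0, method="SLSQP")
```

The particle updates are independent within each iteration, which is exactly what makes population-based methods amenable to the CPU/GPU parallelization discussed above; the cheap local polish then supplies the refined solution without running the full swarm to convergence.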

Heuristics for MHE Implementation
Although the MHE is one of the state-of-the-art estimation techniques and is highly customizable, characterization of best practices or heuristics for MHE remains sparse in the literature. For example, many MHE publications provide no guidelines for choosing the horizon length, even though this choice directly affects the computational time of the implementation. The main rule of thumb reported is simply to use a horizon length of twice the order of the system [31]. For more widespread adoption and implementation of MHE, a more comprehensive set of guidelines/heuristics should be available, as exists for chemical process design [134]. In process design, for example, there are heuristics associated with economics, operation of process equipment, and safety considerations. Although these heuristics are not directly applicable to every case, they assist in a great number of them.
If a set of MHE heuristics is developed, it could cover topics such as optimization method selection, horizon length selection, and objective function selection. Characteristics such as model complexity, required computational time, available computational power, and the amount of system or measurement noise could guide these choices. The hardest characteristic to define will likely be model complexity: tests could be developed to determine how well conditioned a model is, how nonlinear it is, or how computationally expensive it is to solve. Overall, the goal of these heuristics would be to reduce the time needed to implement an MHE successfully. After successful implementation, case-specific tuning could be conducted to fine-tune performance.

Integration of Estimators with Commercial Software
As commercial chemical process simulators become more comprehensive through the incorporation of advanced control algorithms, dynamic optimization, and process monitoring tools, there is an excellent opportunity to develop state estimation tools in conjunction with such simulators. Although estimation may be available in some commercial simulators such as Aspen Dynamics, it is often limited to parameter estimation for improving dynamic models from a dataset. The features of Aspen DMC3 are limited to model conditioning, controller tuning, model adaptation, and linear programming tuning [135]. Previous controllers from AspenTech, such as the APC State-Space Controller, did incorporate a state estimation layer, but this came at the cost of having to develop a disturbance model and tune the filter. This additional task is often challenging in industrial processes, as process engineers frequently lack the formal control theory training that may be required [136].
Despite the limited industrial adoption of state estimation, Shell has historically incorporated these technologies in its products, such as the SMOC and SMOCPro (Shell Multivariable Optimizing Control) advanced controllers. This technology employs disturbance models, model uncertainty, measurement noise, and state estimation via Kalman filtering [137,138]. More recently, Shell released a new generation of SMOC called PACE (Platform for Advanced Control and Estimation), which aims to provide more flexibility in model structures [137].
The next generation of process simulators could follow Shell's approach and introduce estimation into their software. Many simulator vendors offer products for online monitoring of real processes, or products that can connect to existing monitoring systems through OPC (Object Linking and Embedding for Process Control) servers. OPC is already an industrial standard and has been adopted by around 20 industries, including pharmaceuticals [139]. As a result, the infrastructure already exists to combine and share data between process simulators, plant sensors, estimation algorithms, and advanced control systems, moving towards the concept of Industry 4.0.

Conclusions
Today, state estimation is a very broad field of study and can be applied to almost any process, including those outside the chemical and biochemical process industries. Given the vast size of the field, the primary goal of this work was to provide a general overview with a focus on chemical and biochemical processes. As Bayesian estimation methods are the most commonly applied and studied techniques, they were the primary focus of this work, with some discussion of deterministic and hybrid methods. Bayesian estimation methods include recursive techniques such as the EKF, UKF, and PF, along with optimization-based methods such as MHE.
As part of the general overview of the field, special attention was paid to some of the current challenges in nonlinear state estimation: bioprocess monitoring, uncertainty quantification, online implementation, covariance estimation, and time-scale multiplicity. Bioprocess monitoring presents many challenges to be addressed due to the lack of suitable measuring devices, highly complex nonlinear models (often black- or grey-box in nature), and time-varying parameters that require innovative estimation techniques.
Following the discussion of the current challenges, a case study was presented in which EKF and MHE were applied to a nonlinear batch reactor model. The case study showed that linear forecasting of state estimates often results in poor estimation performance in nonlinear systems, despite local linearization of the nonlinear model before forecasting, whereas nonlinear forecasting improved estimation performance. Additionally, the case study examined the computational time of EKF- and MHE-based estimation implementations and noted that MHE was often three to four orders of magnitude more computationally expensive than the EKF methods. Furthermore, it was noted that for MHE, special attention must be paid to optimizer selection, as this directly affects the computational time and robustness of the estimation results. Finally, a brief discussion of the opportunities and future trends in state estimation was included, with emphasis on optimizer selection.
Even though the focus of this paper was on chemical and biochemical processes, fields such as navigation, automotive control, and aviation can benefit from the development of state estimation for chemical processes. We envision a growing reliance on estimation in the future as the Internet of Things, big data, and cloud computing grow in scale and increase access to raw sensor data. Today, almost everyone relies on state estimation in some form; for example, most people carry a GPS-equipped smartphone, which relies heavily on estimation to provide location services. As this reliance on state estimation increases, it will likely create unique and additional opportunities for research in the future.
Author Contributions: This work was a collaborative effort amongst all authors. R.A., G.C., and S.D. wrote parts of the manuscript. R.A. performed the case study, overseen by F.V.L. All authors assisted in editing the paper, contributed to the structure of this project, and provided critical references. All authors have read and agreed to the published version of the manuscript.