Estimation of Biomass Enzymatic Hydrolysis State in Stirred Tank Reactor through Moving Horizon Algorithms with Fixed and Dynamic Fuzzy Weights

Abstract: Second generation ethanol faces challenges before profitable implementation. Biomass hydrolysis is one of the bottlenecks, especially when this process occurs at high solids loading and with enzymatic catalysts. Under this setting, kinetic modeling and reaction monitoring are hindered by the conditions of the medium, which also increase the mixing power. An algorithm that addresses these challenges might improve the reactor performance. In this work, a soft sensor based on agitation power measurements, using an Artificial Neural Network (ANN) as internal model, is proposed to predict free carbohydrate concentrations. The developed soft sensor is used in a Moving Horizon Estimator (MHE) algorithm to improve the prediction of state variables during biomass hydrolysis. The algorithm is developed and used for batch and fed-batch hydrolysis experimental runs. An alteration of the classical MHE is proposed to improve prediction, using a novel fuzzy rule to alter the filter weights online. This alteration improved the prediction when compared to the original MHE in both the training data sets (tracking error decreased by 13%) and the test data sets, where the error reduction obtained is 44%.


Introduction
The optimization of reactors to operate at high solids concentrations is an important step toward making the enzymatic hydrolysis of lignocellulosic biomass a feasible technology. However, operating under these conditions can cause several issues in the saccharification kinetics, from substrate and product inhibition to heterogeneous agitation. Some of these challenges can be attributed to reactor homogenization, which is a difficult task at high apparent viscosity. The suspended solids also greatly increase the power consumption of the stirring motor; the energy to agitate the reactor can account for as much as half of the production cost of the final bioethanol [1]. Thus, great attention should be paid to this production stage if a financially sound second generation production technology is the end goal. This challenge motivates the study of the hydrolysis reactor and its architectures.
To overcome these bottlenecks, the torque necessary to stir the reactor, and thus, indirectly, the reactor rheology, must be studied. Stirring torque, or agitation power, is a measurement that has been applied to submerged fermentations [18,19], metabolic flux estimation [20], and lipid production [21]. Thus, it is a good candidate for application in the enzymatic hydrolysis of biomass.
The aim of this work is first to design and test a soft sensor based on agitation power measurements to predict free carbohydrate concentrations, using an ANN as the internal model. The soft sensor and a recently developed kinetic model are then used in an MHE algorithm. The MHE is tuned to optimize the monitoring of the enzymatic hydrolysis using sugarcane bagasse as substrate. The Materials and Methods section is divided into three subsections. Section 2.1 provides information regarding the source of the experimental data and the kinetic model, including how the experimental data were obtained and how the kinetic model was developed. Section 2.2 details how the soft sensor was developed. Section 2.3 explains how the MHE parameters (weights and membership degree) were optimized. The methods section is followed by the results of the employed methods and the conclusions.

Hydrolysis Data and Kinetic Model
The experimental assays that were used in this paper to tune both the ANN and MHE algorithm, as well as the kinetic model used in the state estimator, are presented in [22]. Sections 2.1.1-2.1.3 detail the material and analytical methods.
All of the numerical procedures (model integration, soft sensing, and filter tuning) were performed in SCILAB (v.6.0.2, ESI Group, Paris, France, 2019) on a computer with an AMD FX-8350 processor and 15.7 GB of random access memory, running 64-bit Linux Mint 18.3 as the operating system. For the model integration, a fourth-order Runge-Kutta algorithm was used (the ode function with the "rk" specification).
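The integration step can be sketched in Python (the authors used Scilab's ode function with the "rk" option; this standalone analogue is for illustration only, and the ke value used in the example is invented):

```python
import numpy as np

def rk4(f, x0, t):
    """Classical fourth-order Runge-Kutta integration over a time grid t."""
    x = np.zeros((len(t), len(x0)))
    x[0] = x0
    for k in range(len(t) - 1):
        h = t[k + 1] - t[k]
        k1 = f(t[k], x[k])
        k2 = f(t[k] + h / 2, x[k] + h / 2 * k1)
        k3 = f(t[k] + h / 2, x[k] + h / 2 * k2)
        k4 = f(t[k] + h, x[k] + h * k3)
        x[k + 1] = x[k] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Example: first-order enzyme inactivation dE/dt = -ke*E (Equation (3)),
# with an illustrative ke and initial enzyme concentration.
ke = 0.001                               # min^-1 (made-up value)
t = np.linspace(0.0, 600.0, 601)         # minutes, 1-min steps
E = rk4(lambda t, x: -ke * x, np.array([75.0]), t)
```

For this linear test problem the numerical solution can be checked against the analytical decay 75·exp(−ke·t).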

Enzyme and Substrates
The enzymatic complex employed in all assays was Cellic CTec2, donated by Novozymes Latin America (Araucária, Paraná, Brazil). The enzymatic complex protein concentration was 75 mg protein mL⁻¹, with an activity of approximately 203 FPU mL⁻¹.
Steam-exploded sugarcane bagasse (1667 kPa and 205 °C for 20 min), provided by the Centro de Tecnologia Canavieira (Piracicaba, São Paulo, Brazil), was used as substrate. The main biomass components were 43.1 ± 0.1% cellulose, 12.4 ± 0.1% hemicellulose, 28.8 ± 1.9% lignin, and 4.7 ± 0.1% ash (g compound (g dry biomass)⁻¹); the remainder consists of non-analyzed compounds, not considered in the models. The bagasse composition was determined according to Sluiter and collaborators [23]. The moisture content of the pretreated bagasse was 59.9 ± 1.6%. The volume of water at the end of all assays was 3 L.

Assay Conditions
Assays were conducted with three feeding profiles. All of the profiles added up to 600 g of substrate and 2.22 g of protein (approximately 10 FPU g substrate⁻¹) of enzymatic complex at the end of each process, regardless of the feeding policy. The first assay, Data Set 1, is a High Solids Batch (HSB); this assay had all of the compounds added at the beginning of the process. The other two assays were operated in fed-batch mode, where the feeding profiles added up to the same substrate mass as that of the batch process. Feedings were performed at discrete times, as presented in Table 1. Data Set 2 is the Low Solids Fed-Batch (LSF), as the feeding intervals were spaced widely enough not to generate a visibly high solids concentration throughout the hydrolysis. In Data Set 3, additions were closer in time than in the LSF; an initial low solids concentration occurred, but the following substrate additions turned the medium into a high-solids process. Thus, this assay was named the Mixed Profile Fed-Batch (MPF). Duplicates of all assays were performed by conducting the experiment on different days under the same policy.
The reactor used was a 3 L working volume baffled, jacketed, stirred reactor (reactor diameter 0.16 m) agitated by two elephant ear impellers (diameter 0.08 m, New Brunswick Scientific®, Edison, NJ, USA) equidistant from one another, the reactor bottom, and the liquid surface. The upper impeller generates a downward pumping flow and the lower impeller pumps upward. The stirring motor was placed on top of a plate with bearings, so the motor was free to rotate when opposing forces acted upon it. A digital dynamometer (FG 6005 SD, Lutron Electronics, Coopersburg, PA, USA) was coupled to the bearing assembly to measure the opposing force, allowing the agitation power to be measured. The reactor medium was kept at 50 °C by a thermostatic bath, and the agitation speed was 470 RPM. Sampling was performed manually at 0.5, 1, 2, 4, 6, 8, 12, 24, 36, 48, 60, 72, and 96 h. Samples were placed in a boiling water bath for 10 min to stop the reaction and then refrigerated. These were analyzed for glucose and xylose by high performance liquid chromatography [24]. The samples were filtered (hydrophilic polyvinylidene fluoride filter, 0.2 µm) into autosampler vials and analyzed in a Shimadzu SCL-10A chromatograph (Shimadzu Corp., Kyoto, Japan) with a RID-10A refractive index detector (Shimadzu Corp., Kyoto, Japan), an Aminex HPX-87H column (Bio-Rad Laboratories Inc., Rio de Janeiro, Brazil), and 5 mM sulfuric acid at 0.6 mL min⁻¹ as the mobile phase. The sample values were compared to previously established standards.

Mathematical Modeling
The kinetic model that was used as part of the MHE implementation was developed in a previous work [22]. This kinetic model was the one that best describes glucose and xylose during enzymatic hydrolysis of a steam-exploded sugarcane bagasse under batch and fed-batch operations. A brief description is given in this subsection, and more information can be obtained in the dedicated paper.
Reactions 1 and 3 are considered to be heterogeneous, as they represent the breakdown of cellulose and hemicellulose. These are modeled via Modified Michaelis-Menten kinetics with competitive product inhibition, presented in Equation (1).
Processes 2020, 8, 407

In this equation, α_i (g L⁻¹ min⁻¹) is the reaction rate, where the subscript i denotes the reaction used, either 1 or 3. The same nomenclature is used in the subsequent variables: k_i are kinetic constants (min⁻¹), E_i are enzyme concentrations (g L⁻¹), [S_i] are substrate concentrations (g L⁻¹), Km_i are the modified Michaelis-Menten constants (g Enz L⁻¹), Kp_i are competitive inhibition constants for the products (g L⁻¹), and P_i are product concentrations (g L⁻¹).
The hydrolysis of cellobiose into glucose (Reaction 2) is homogeneous. This equation is modeled with a Pseudo-Homogeneous Michaelis-Menten with Competitive Product Inhibition model, as presented in Equation (2).
The same variables are used in this equation; however, Km_2 is the Michaelis-Menten constant (g L⁻¹) for the substrate. Reaction 5 is included even though lignin is inert; this "reaction" simply reflects the accumulation of lignin in the reactor when it operates in fed-batch mode.
Enzymatic inactivation is represented by a generic first-order equation, as presented in Equation (3):

$$\frac{dE_i}{dt} = -k_e E_i \qquad (3)$$

where k_e is a first-order inactivation parameter (min⁻¹). Table 2 presents the parameters of these equations.

Table 2. High and low solids model parameters (parameter ± standard error).
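Equations (1) and (2) are given in full in [22] and are not reproduced in this excerpt. The sketch below shows one plausible reading of the stated functional forms (modified and pseudo-homogeneous Michaelis-Menten with competitive product inhibition), consistent with the units listed above; the exact denominators are an assumption of this sketch, not the authors' equations:

```python
def rate_heterogeneous(k, E, S, Km, Kp, P):
    """Assumed form of Equation (1): modified Michaelis-Menten with
    competitive product inhibition; Km is expressed in g enzyme / L,
    matching the units given in the text."""
    return k * E * S / (Km * (1.0 + P / Kp) + E)

def rate_homogeneous(k, E, S, Km, Kp, P):
    """Assumed form of Equation (2): pseudo-homogeneous Michaelis-Menten
    with competitive product inhibition; Km in g substrate / L."""
    return k * E * S / (Km * (1.0 + P / Kp) + S)

def enzyme_inactivation(ke, E):
    """Equation (3): generic first-order inactivation, dE/dt = -ke * E."""
    return -ke * E
```

All parameter values would come from Table 2 of the paper; none are hard-coded here.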

Following the formalism proposed in reference [26], Equation (4) represents the mass balance for the components in the reactor. Since substrate feeding was discrete, the fed-batch process was solved as a sequence of batch processes, re-initializing the initial conditions at each feeding with the previous final concentrations.
In Equation (4), the 7 × 5 matrix on the right-hand side is the pseudo-stoichiometric matrix, and the vector (α_i) contains the reaction rates of Reactions 1 to 5, as previously described.
The novelty of this model lies in how it was fitted and how it operates. A fuzzy membership rule interpolates between two different sets of parameters for the described kinetic equations: one set was fitted using data from Data Set 1, the High Solids Model (HSM), and the other was fitted using Data Set 2, the Low Solids Model (LSM). Thus, when solving for the reaction kinetics, both models generate independent reaction rates, α_HSM (High Solids Model reaction rate) and α_LSM (Low Solids Model reaction rate). With the HSM and LSM membership degrees, the reaction rate for the fuzzy model (FM) was calculated with Equation (5), the output of the Takagi-Sugeno system:

$$\alpha_{FUZZY} = MD_{HSM}\,\alpha_{HSM} + MD_{LSM}\,\alpha_{LSM} \qquad (5)$$
where α FUZZY are the reaction rates for each hydrolysis reaction. Thus, a prediction of the entire state vector is obtained. SCILAB's Ordinary Differential Equation Solver (ODE) was used to integrate the state variables, with the generated model parameters.
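Equation (5) then amounts to a one-line convex combination of the two rates. Treating the LSM membership degree as the complement of the HSM degree is an assumption of this sketch:

```python
def fuzzy_rate(alpha_hsm, alpha_lsm, md_hsm):
    """Takagi-Sugeno output, Equation (5): the HSM and LSM reaction
    rates weighted by their membership degrees. The complement rule
    md_lsm = 1 - md_hsm is assumed here for illustration."""
    md_lsm = 1.0 - md_hsm
    return md_hsm * alpha_hsm + md_lsm * alpha_lsm
```

At md_hsm = 1 the fuzzy model reduces to the HSM rate, and at md_hsm = 0 to the LSM rate, with a linear blend in between.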

Soft Sensing Data
ANN training requires a relatively large amount of data to recognize the useful patterns. Using only the analytical data from the data sets would not be enough in this case, or would at least generate a network with poor performance. A cubic spline was therefore used within each internal batch of an assay to generate new glucose and xylose data points in the periods between feeding times, where previously there were none. The interpolated data may not completely match the actual state variable profiles; however, this procedure is necessary when the available data set is not enough for fitting complex empirical models such as ANNs.
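For illustration, the augmentation step might look as follows; the sample times match the protocol above, but the concentration values are invented:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sparse analytical samples within one internal batch (invented values)
t_sample = np.array([0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0])   # h
glc = np.array([3.1, 5.2, 8.9, 14.0, 17.5, 20.1, 24.3])     # g/L

# Fit a cubic spline through the analytical points and resample it on a
# dense grid (one point every 10 min) to enlarge the training set.
spline = CubicSpline(t_sample, glc)
t_dense = np.linspace(0.5, 12.0, 70)
glc_dense = spline(t_dense)
```

The spline passes exactly through the analytical points, so the augmented set is consistent with the measured data at the sampling times.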
Agitation power was measured during the assays every 10 s, and the average of all measurements within each 10 min window was used as one data point for that period. The resulting data were fed through a LOWESS (Locally Weighted Scatterplot Smoothing) filter to reduce noise. This smoothed signal was used as an input for the ANN.
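A self-contained sketch of these two preprocessing steps, block averaging and LOWESS smoothing, using a minimal tricube local-linear smoother rather than a library implementation (the window length and frac defaults are illustrative choices, not the authors' settings):

```python
import numpy as np

def block_average(t, p, window=600.0):
    """Average raw power readings (every 10 s) into one point per
    `window` seconds (10 min by default)."""
    edges = np.arange(t.min(), t.max() + window, window)
    idx = np.digitize(t, edges) - 1
    keys = np.unique(idx)
    return (np.array([t[idx == k].mean() for k in keys]),
            np.array([p[idx == k].mean() for k in keys]))

def lowess(x, y, frac=0.3):
    """Minimal LOWESS: a local linear fit with tricube weights at
    each point; bandwidth set by the k-th nearest-neighbor distance."""
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))
    out = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1] or 1.0
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        A = np.vstack([np.ones(n), x - x[i]]).T
        beta = np.linalg.lstsq(A * w[:, None] ** 0.5, y * w ** 0.5,
                               rcond=None)[0]
        out[i] = beta[0]          # local fit evaluated at x[i]
    return out
```

In practice a library smoother (e.g., the one bundled with a statistics toolbox) would be used; the hand-rolled version is only meant to make the operation concrete.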
Other inputs to the neural network were the first derivative of the agitation power and the biomass added to the reactor up to the current data point. The agitation power derivative was calculated within internal batches using numerical differentiation, with forward, central, and backward finite differences for the first, internal, and last data points, respectively.
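This differentiation scheme is exactly what numpy.gradient implements with its default edge order: a forward difference at the first point, central differences internally, and a backward difference at the last point (the power values below are invented):

```python
import numpy as np

t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])    # min
p = np.array([50.0, 48.0, 45.0, 44.0, 43.5])   # W (invented readings)

# Forward difference at the first point, central differences at the
# internal points, backward difference at the last point.
dp_dt = np.gradient(p, t)
```

The derivative signal, like the power itself, would then be smoothed before being presented to the network.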
Data from the HSB and LSF were used to train the network, and data from the MPF were used as the test assay.

Local Linear Model Trees Algorithm
A Local Linear Model Trees (LOLIMOT) algorithm was used to translate the input variables into the desired carbohydrate concentrations. LOLIMOT is a special case of the Local Linear Neuro-Fuzzy Models (LLNFM), which are a subset of ANNs. In local linear models, different models are generated for different regions of the input domain, and these models may interact to compose the predicted value following different approaches. LLNFMs use a different neuron in a hidden layer of an ANN for each Local Linear Model (LLM). Each neuron also possesses a validity function that regulates where, and to what degree, each LLM is used. In LLNFMs, a fuzzy membership function is used alongside each linear model, which allows multiple models to be "active" for a given data point, generating a more flexible model structure. Several types of membership functions can be used; however, normalized Gaussian functions are usually chosen, since they are highly flexible with a relatively small number of parameters [27].
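The weighting scheme can be sketched as follows; the centers, widths, and local model parameters are illustrative placeholders, not values from the fitted networks:

```python
import numpy as np

def llnfm_predict(u, centers, sigmas, thetas):
    """Local linear neuro-fuzzy prediction: each local linear model
    (a row of `thetas`, [intercept, slopes...]) is weighted by a
    normalized Gaussian validity function centered in its partition."""
    u = np.atleast_1d(u)
    # Gaussian activations of each local model for input u
    mu = np.exp(-0.5 * np.sum(((centers - u) / sigmas) ** 2, axis=1))
    phi = mu / mu.sum()                      # normalized validities
    y_local = thetas[:, 0] + thetas[:, 1:] @ u
    return float(phi @ y_local)              # validity-weighted output
```

Normalization makes the validities sum to one, so the network output is a smooth blend of the local linear predictions.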
The "tree" in LOLIMOT regards the manner by which the validity range of each LLM and membership functions are generated. The algorithm that is illustrated here is based on reference [28], and it was implemented while using the Lolimot Scilab ATOM module, based on the same algorithm.
This algorithm uses axis-orthogonal cuts in the input space to generate hyper-rectangles, where the centers of the Gaussian functions and the linear models are placed. Each additional partitioning of the input variables generates a new LLM and activation function. Figure 1 demonstrates how subsequent cuts are added in a two-dimensional input space. The algorithm loops around and looks for the worst partition to be cut if a stopping criterion is not met [28].
Several stopping criteria can be used. In this manuscript, optimum architecture was achieved when the average standard error from the validation data departs from the average standard error from the training data. When this happens, the addition of new neurons starts to model the noise from the samples, which disrupts the network inference and characterizes overfitting [27] .
HSB and LSF data were combined and normalized between 0 and 1. Twenty percent of the dataset was randomly separated as a validation set, used only to evaluate and choose the network complexity. The remainder of the combined dataset was used as a training set to fit the model parameters. The data from the MPF were used only as a test dataset, in order to evaluate the extrapolation capacity of the optimized LOLIMOT.
The LOLIMOT algorithm is highly optimized to be independent of external operators, and the Scilab Lolimot Toolbox implementation leaves only one tuning parameter to the operator: a smoothing parameter for the predicted output. As this is an empirical parameter, six levels were tested, ranging from 0.1 to 0.6 in uniform increments of 0.1. Furthermore, separate models were composed for glucose and xylose, and each network was tested with up to 90 rules. The main reason for using different models for glucose and xylose, as noted by Nelles (2001) [27], is that MISO (multiple-input single-output) models tend to be more flexible, and easier to construct and apply, than MIMO (multiple-input multiple-output) models. Figure 1 illustrates how the LOLIMOT algorithm adds a new partition in a simple model: first, it evaluates the loss function of all existing LLMs; the worst partition is then divided into as many new hyper-rectangles as there are input variables; new membership functions and LLMs are then added and optimized for all new partitions.

Moving Horizon Estimator
MHE is a modification of the full information estimator to diminish the total number of points used in the optimization problem solved for state estimation. This is achieved by using a moving window of prediction. As the process occurs, new time periods to be used in the optimization are added to the window until the size of the window (N) is filled with T time periods. When the window is full (T = N), the oldest state group is dropped to make room for a new time period set of states. Figure 2 presents a schematic of how the moving window works.
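The window bookkeeping described above is easy to express with a bounded queue (the numeric values are illustrative):

```python
from collections import deque

N = 5                      # window size (time periods)
window = deque(maxlen=N)   # a full deque drops its oldest entry on append

for T, measurement in enumerate([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]):
    # While T < N the window is still filling; once T >= N, appending a
    # new time period automatically discards the oldest state group.
    window.append(measurement)
    # ... solve the MHE optimization over the states in `window` ...
```

Each iteration of the loop corresponds to one movement of the window in Figure 2.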

The states within the window are obtained by minimizing the MHE cost function, Equation (6):

$$\min_{\hat{X}} \; J = \sum_{j=T-N}^{T} \left\| \hat{X}_j - X_j^{-} \right\|^2_{L_w} + \sum_{j=T-N}^{T} \left\| Y_j - \hat{Y}_j \right\|^2_{L_v} \qquad (6)$$

where the notation ‖z‖²_R represents zᵀRz, the squared norm of a vector z weighted by a matrix R. X̂_j and Ŷ_j are, respectively, the state and output vectors predicted by the MHE at the j-th time. X⁻_j is the state predicted by the model alone, not considering the current measurement Y_j. N is the window size and T is the current estimation time. The instrumentation data (Y) are obtained via soft sensing, and the model prediction (X⁻) is obtained by solving the kinetic model from T−N until the current time T. The filter estimation (X̂ and Ŷ) is computed at each observation step (each movement of the window). In this work, the instrumentation directly predicts glucose and xylose concentrations, and these are the only two state variables subjected to optimization. Other variables are stoichiometrically calculated from the estimated glucose and xylose concentrations.
The MHE cost function is optimized in each step via a Levenberg-Marquardt algorithm, operating up to 1000 iterations, or until the error improvement between two iterations is less than 10⁻⁶.
The filter estimation is highly sensitive to the matrices L_w and L_v. In the classical implementation of the MHE, these are the inverses of the covariance matrices of the process and measurement noises, respectively. However, in complex systems, as in the case of biomass hydrolysis, obtaining the noise covariances and the associated weights may not be so trivial. L_w, for instance, could be approximately estimated from the parameter covariance matrix obtained during the fitting procedure. It would be a function of state, and the parameter sensitivity matrix would have to be evaluated at each time step. This approach would be computationally demanding in a system with high output data frequency. Usually, L_w is assumed to be a constant matrix and is estimated from the variances of the errors between model and analytical data, or from the noise covariance. L_v is usually a constant matrix and might be obtained from output replicates.
In a practical sense for the implementation, L_w and L_v are used as weights for the model and measurement errors. These weights regulate the MHE optimization. Therefore, when L_w and L_v estimates do not provide good inference results due to the complexity of the system, an optimization methodology can be applied to determine these weights. Thus, in this work, a tuning methodology was used to ensure that the values of the matrices L_w and L_v would generate adequate prediction. In this implementation, L_w and L_v are 2 × 2 diagonal matrices, with the weights placed in the main diagonal, as only the free carbohydrates (xylose and glucose) are considered in the state estimation.
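A minimal sketch of one window solve, assuming the identity output map described above (the soft sensor predicts glucose and xylose directly) and using SciPy's Levenberg-Marquardt solver in place of the authors' Scilab implementation:

```python
import numpy as np
from scipy.optimize import least_squares

def mhe_step(x_model, y_meas, Lw, Lv):
    """One MHE window solve (sketch). States are [glucose, xylose] and
    the output map is identity. x_model: (N, 2) model predictions over
    the window; y_meas: (N, 2) soft-sensor values; Lw, Lv: 2x2 diagonal
    weight matrices."""
    sw, sv = np.sqrt(np.diag(Lw)), np.sqrt(np.diag(Lv))

    def residuals(flat):
        x = flat.reshape(x_model.shape)
        # Stacked weighted residuals of the two sums in Equation (6)
        return np.concatenate([(sw * (x - x_model)).ravel(),
                               (sv * (y_meas - x)).ravel()])

    sol = least_squares(residuals, x_model.ravel(), method='lm',
                        xtol=1e-6, max_nfev=1000)
    return sol.x.reshape(x_model.shape)
```

With an identity output map, the optimum is a weighted compromise between the model trajectory and the soft-sensor values, which makes the role of the weights easy to verify.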

Fixed Weights Tuning
An optimization loop is used outside of the MHE to better estimate L_w and L_v. A Levenberg-Marquardt algorithm is employed to estimate the values of each weight. The analytical data from the HSB and LSF are used to build a weighted sum of squared errors (F) as a performance index for this optimization. Equation (7) presents this cost function:

$$F = e^{T} Q\, e \qquad (7)$$

where the error vector e is the difference between the final MHE prediction and the experimental data, Q is an n × n diagonal weight matrix, and n is the number of experimental data points. The elements Q_ii are the inverses of the carbohydrate replicate variances (1/σᵢ²). The variance estimated for glucose was 0.392 g² L⁻², and for xylose 0.773 g² L⁻². The analysis of errors that led to these weight values is presented in the Supplementary Material of reference [12]. Data Set 3 (MPF) is used as a test case to check the estimator extrapolation capacity.
The tuning algorithm operated for 100 iterations. The only filter parameter optimized outside of the tuning algorithm was the size of the window; tuning was performed with window sizes of 4, 5, and 6 time periods. State variable estimation occurred every 10 min of real process time, i.e., optimizations occurred six times for every hour of process operation. Table 3 presents a pseudo-code of the tuning algorithm.
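The performance index of Equation (7) reduces to a few lines; the default variances are the replicate values quoted above:

```python
import numpy as np

def tuning_cost(e_glc, e_xyl, var_glc=0.392, var_xyl=0.773):
    """Weighted SSE of Equation (7): F = e'Qe with Q_ii = 1/sigma_i^2,
    using the glucose and xylose replicate variances from the text."""
    return float(np.sum(e_glc ** 2) / var_glc + np.sum(e_xyl ** 2) / var_xyl)
```

The outer Levenberg-Marquardt loop would repeatedly run the MHE over the training assays, evaluate this cost on the resulting errors, and update the four diagonal weights.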

Fuzzy Weights Tuning
As mentioned earlier, L_w could be estimated from the parameter covariance, turning it into a time-varying matrix. In this work, a different approach is proposed to obtain dynamic weights in the state estimation and improve prediction in such a complex system: more pertinence is given to the model or to the instrumentation where each can provide more information about the system. Notice that the fuzzy dynamic weights are computationally intensive to build, but require less computational effort when the filter is running.
Thus, an alteration of the MHE algorithm is proposed here, in which the diagonal matrices L_w and L_v are multiplied by a linear fuzzy membership function. The membership function is a linear fuzzy rule ranging from 0 to 1; it is presented in Figure 3 and described by Equation (8). The parameters LL, LU, HU, and HL are optimized together with the weights in the outer optimization loop. The membership degree of the model is obtained by subtracting MD_Inst from 1. These membership degrees then multiply the weight matrices. Thus, as solids are hydrolyzed, the weights change, altering Equation (6) into Equation (9).
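Figure 3 and Equation (8) are not reproduced in this excerpt. The sketch below assumes a trapezoidal piecewise-linear rule over the four optimized parameters (the exact shape is an assumption) and shows how the membership degrees scale L_w and L_v:

```python
def md_inst(x, LL, LU, HU, HL):
    """Assumed trapezoidal membership for the instrumentation: rises
    linearly from 0 to 1 between LL and LU, stays at 1 between LU and
    HU, and falls back to 0 between HU and HL. The shape and parameter
    ordering (LL < LU < HU < HL) are illustrative assumptions."""
    if x <= LL or x >= HL:
        return 0.0
    if LL < x < LU:
        return (x - LL) / (LU - LL)
    if LU <= x <= HU:
        return 1.0
    return (HL - x) / (HL - HU)

def fuzzy_weights(Lw, Lv, md):
    """Equation (9) weighting: the model degree is 1 - MD_Inst, and
    each membership degree scales its weight in the MHE cost."""
    return (1.0 - md) * Lw, md * Lv
```

At md = 0 the filter trusts only the model, at md = 1 only the instrumentation, and the linear region provides the online transition as the solids load evolves.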
The rest of the tuning procedure remains the same as the one presented in Table 3.

Soft Sensing Optimization
The best results for the network number of rules optimization are obtained when using the smoothing parameter of 0.4 for both networks based on the MSE of training and validation data sets. Figure 4 presents the error plots for these optimizations. Figure 4 demonstrates that, for both networks, the validation MSE initially presents a somewhat erratic pattern. This situation is addressed when more rules are added.
As more rules are added, the validation MSE becomes greater than the training error, which is the expected behavior, and it follows the decrease of the training MSE. It is noticeable that, at some points, the validation error increases from one quantity of rules to another. At such points, one could be tempted to consider the previous smaller error condition for the optimum architecture. However, if the validation MSE continues to decrease at following rule additions, it is possible to attribute the increase in the error to small variations in the current fitting stage. The optimum number of rules is the one where the validation MSE deviates clearly from the training MSE decreasing tendency. In the presented algorithm, the number of rules where this was considered to occur was at 38 and 67 rules for the glucose and xylose networks, respectively. These points are marked in Figure 4 with vertical solid bars. Figure 5 presents the prediction of the optimum networks for the training sets HSB and LSF input data and analytical data.
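The departure criterion can also be approximated programmatically. The heuristic below is an illustration, not the authors' exact procedure (the selection in this work was based on the plotted MSE curves): it returns the last rule count whose validation MSE is still close to the running minimum, before a persistent rise begins:

```python
def pick_rules(val_mse, tol=0.05):
    """Return the last rule count (1-based) whose validation MSE is
    within `tol` relative distance of the best value seen so far;
    later, persistently worse counts are taken as overfitting."""
    best = 1
    for k in range(1, len(val_mse)):
        if val_mse[k] <= min(val_mse[:k]) * (1.0 + tol):
            best = k + 1
    return best
```

Small, transient upticks in the validation MSE are tolerated, mirroring the argument above that isolated increases should not fix the architecture prematurely.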

Figure 5 shows that the estimates in the training data have high inherent noise; however, for a prediction with no phenomenological or kinetic data, using only instantaneous measurements, the prediction follows the analytical values fairly well, being capable of predicting the steps in concentrations due to the biomass feedings in the LSF. Figure 6 presents the network prediction when dealing with data that are not present in the training/validation data set.
The prediction in Figure 6 is clearly less accurate than the one displayed in Figure 5. This is expected for the LOLIMOT, as it is an empirical model and thus might have poor extrapolation capability. The subsequent feedings in the MSF profile (from 0 h to 2 h), much closer together than those in the LSF (from 0 h to 24 h), generate an overestimation of the carbohydrates' concentrations. However, towards the end of the process, the prediction becomes more accurate. From Tables 4 and 5, it is also clear that the soft sensor is less accurate than the model prediction. Nevertheless, the behavior of such predictions can be improved using the process model and the state estimation implementation.
[Figure caption: bottom graphs are glucose and xylose predictions; error bars in data points are standard errors between experiment duplicates for compound concentration; vertical dotted lines are the biomass feeding times.]

Moving Horizon Tuning
The soft sensing predictions are then used in the MHE algorithm. In a first attempt to use the MHE, L_w was a diagonal matrix estimated from the model prediction RMSE obtained for the training data sets (Table 5), which results in elements of 0.359 for glucose and 1.23 for xylose. In turn, L_v was a diagonal matrix estimated from the soft sensing RMSE (Table 5), resulting in elements of 0.140 for glucose and 0.350 for xylose. Although the variance of the soft sensor error is greater than the variance of the model, the MHE still gives too much weight to the soft sensor. The MHE results practically follow the soft sensor prediction (data not shown). Hence, no significant improvement in the RMSE was obtained when compared to the use of the soft sensor alone.
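As a sketch, this first-pass tuning amounts to building the two diagonal weight matrices directly from the reported RMSE values; the matrix layout below is an assumption, since only the diagonal elements are given in the text.

```python
import numpy as np

# First attempt: diagonal weight matrices taken directly from the RMSE
# of each information source on the training data sets (Table 5).
model_rmse = [0.359, 1.23]    # L_w diagonal: glucose, xylose (model)
sensor_rmse = [0.140, 0.350]  # L_v diagonal: glucose, xylose (soft sensor)

L_w = np.diag(model_rmse)
L_v = np.diag(sensor_rmse)
```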
The next approach was to find the optimal fixed and dynamic weights, L_w and L_v (Sections 2.3.1 and 2.3.2), to predict the training data set. A summary of the tuning algorithms' Root Mean Squared Errors (RMSE) is presented in Table 4, for both the fixed and fuzzy weights and for all of the window sizes. Table 5 shows the results separated by component.
From Table 4, it is clear that using the smallest window of prediction (four elements) yields the best results for the MHE.
The fixed weights MHE with four elements in the window employed optimized weights of 32.06 for glucose and 78.72 for xylose in L_w, and 3.42 for glucose and 2.65 for xylose in L_v. Effectively, this means that, in this implementation, the knowledge from the model was weighted more heavily than that from the instrumentation. This can be observed in the training data set predictions presented in Figure 7. Figure 7 illustrates the results presented in Table 4. A slight decrease in prediction performance (an increase of approximately 11% in the RMSE relative to the model) occurs in the HSB data set to improve the prediction in the LSF data set. However, as depicted in Table 5, when compared to the pure model prediction, the MHE reduced the errors for both products when HSB and LSF are treated together. Thus, the overall decrease in the RMSE when using the MHE algorithm was of approximately 17% across assays, showing that the use of the agitation measurements may improve the prediction. However, an issue occurs when the prediction of the test data set (MSF) is considered, as presented in Figure 8.
In the test data set, the prediction performance is less than ideal. The fixed weights tuned on the other data sets are not capable of sustaining the MHE performance in this situation. Although the initial prediction of the glucose concentration is improved, it reaches concentrations far above the actual value. Additionally, the glucose concentration is underestimated towards the end of the reaction, due to a decrease in the network prediction. The algorithm is more accurate for xylose, as the final concentration is predicted better than with the pure model (see Table 5). However, the MHE with fixed weights does little to improve the initial concentration predictions. Nevertheless, this behavior does not occur when using the fuzzy weights implementation.
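The role of the two weight matrices can be illustrated with a generic weighted least-squares MHE cost over the moving window. This is a sketch only; the actual formulation is given in Section 2.3, and the trajectory shapes and function name here are assumptions.

```python
import numpy as np

def mhe_cost(x_hat, x_model, y_sensor, L_w, L_v):
    """Cost of a candidate state trajectory over the moving window.

    x_hat, x_model, y_sensor: arrays of shape (N, 2) holding the
    candidate, model-predicted, and soft-sensed [glucose, xylose]
    trajectories for an N-element window (N = 4 gave the best results).
    L_w, L_v: 2x2 diagonal weight matrices; a larger L_w penalizes
    departures from the model, a larger L_v departures from the sensor.
    """
    w = x_hat - x_model   # model residuals
    v = x_hat - y_sensor  # soft-sensor residuals
    # sum_i w_i^T L_w w_i + sum_i v_i^T L_v v_i
    return float(np.sum((w @ L_w) * w) + np.sum((v @ L_v) * v))
```

With the optimized fixed weights (L_w diagonal [32.06, 78.72] versus L_v diagonal [3.42, 2.65]), minimizing this cost pulls the estimate much closer to the model trajectory than to the soft sensor, which matches the behavior observed in Figure 7.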
When analyzing the MHE with fuzzy weights in Tables 4 and 5, it is clear that the dynamic weight approach greatly improves the state prediction within the training data, even when considering the HSB model prediction, especially for the smallest windows, for which the best results are obtained. In this implementation, the weights optimized by the tuning algorithm are L_w elements of 69.01 for glucose and 126.90 for xylose, and L_v elements of 1.08 for glucose and 0.54 for xylose. This follows the same trend of assigning higher values to the weights that multiply the model prediction, as the model clearly provides the more accurate prediction.
The fuzzy rule thresholds, for the four-element moving window, are 10.77, 63.63, 80.06, and 190.23 g L⁻¹ for LL, LU, HL, and HU, respectively. These values mean that, above 190 g L⁻¹ of solids within the reactor, only the model prediction is used and, below 10 g L⁻¹, only the soft sensing is used. However, these situations do not occur within the data set, meaning that the algorithm always uses both predictions to compose the new state estimate, relying more on the model when the solids concentration is high and more on the network when it is low. The results for the tuned fuzzy weights MHE with four elements in the prediction window for the training data sets are presented in Figure 9.
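One monotone reading of these thresholds is a piecewise-linear "trust in the model" factor. The exact membership shape between LL, LU, HL, and HU is not spelled out in this excerpt, so the linear ramps and the mid-level plateau below are assumptions.

```python
def model_trust(solids_g_per_L, LL=10.77, LU=63.63, HL=80.06, HU=190.23):
    """Fraction of the estimate taken from the model (0 = soft sensor
    only, 1 = model only) as a function of solids concentration in g/L.

    Below LL only the soft sensor is used; above HU only the model.
    The linear ramps and the 0.5 plateau between LU and HL are an
    assumed interpretation of the fuzzy rule, not taken from the paper.
    """
    s = solids_g_per_L
    if s <= LL:
        return 0.0
    if s >= HU:
        return 1.0
    if s <= LU:                               # low-solids ramp
        return 0.5 * (s - LL) / (LU - LL)
    if s <= HL:                               # intermediate plateau
        return 0.5
    return 0.5 + 0.5 * (s - HL) / (HU - HL)   # high-solids ramp
```

The dynamic weights could then be formed, for example, as λ·L_w for the model term and (1 − λ)·L_v for the sensor term, so that the estimator leans on the model at high solids loading and on the network at low loading.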
The behavior expected from the fuzzy rule is clearly seen in Figure 9. In the HSB, the prediction with the MHE closely follows the model prediction. The use of this methodology also represents a small improvement over using only the model in this data set (1.63%). A more interesting trend is observed in the LSF. During low solids concentrations (before the second addition of biomass at 24 h), the MHE prediction closely follows the network prediction. After this point, an interpolation between both predictions prevails. This greatly improves the prediction, decreasing the RMSE by approximately 30.8% when compared with the model results alone.
Yet, the great strength of this version of the MHE is observed when analyzing the test data set prediction, as presented in Figure 10. In the test data set, a decrease of 36.8% in the overall RMSE was obtained using the fuzzy MHE, when compared to the model prediction alone. The error was reduced for both xylose and glucose relative to the pure model prediction (Table 5). The xylose prediction improved most towards the end of the process, while the glucose prediction improved throughout the entire reaction, especially at the beginning, where the interpolation between model and soft sensing predictions generated a more accurate estimation.
Therefore, it is clear that using the proposed approach not only improves the prediction within training data, but it also improves prediction in previously unforeseen circumstances. This helps to enhance the robustness of the state estimation and it can greatly improve the control of the process if a controller is used alongside the proposed MHE implementation.

Conclusions
In this study, a LOLIMOT-based soft sensor using torque measurements was successfully built to monitor the glucose and xylose profiles during batch and fed-batch enzymatic hydrolysis of sugarcane bagasse. The developed soft sensor did not, however, present good extrapolation performance. The use of the MHE together with the soft sensing was successful in improving the prediction of the desired state variables, regardless of the weights used within the implementation. In particular, the novel fuzzy dynamic weights for the MHE greatly improved the prediction in both the training data sets (16.2% RMSE reduction when compared to the model prediction alone) and the test data sets (36.8% RMSE reduction when compared to the model prediction alone). The approach proposed here might be a valuable tool for monitoring the hydrolysis of lignocellulosic material, or any other type of process that has online agitation measurements, especially when the state estimation is needed in a feedback control system.