Review of Soft Sensors in Anaerobic Digestion Process

Anaerobic digestion is associated with various crucial variables, such as biogas yield, chemical oxygen demand, and volatile fatty acid concentration. Real-time monitoring of these variables not only reflects the state of the anaerobic digestion process directly but also improves the efficiency of resource conversion and the stability of the reaction process. However, the real-time monitoring equipment currently on the market cannot be widely used in industrial production because it is expensive, inaccurate, and slow to deliver analysis results. Therefore, it is essential to build soft sensor models for the unmeasurable variables and use auxiliary variables to realize real-time monitoring, optimization, and control of the anaerobic digestion process. In this paper, the basic principle and process flow of anaerobic digestion are first briefly introduced. Subsequently, the development history of traditional soft sensors is systematically reviewed, the latest developments in soft sensors are detailed, and the obstacles facing soft sensors in industrial production are discussed. Finally, the future development trend of deep learning in soft sensors is discussed, and future research directions are provided.


Introduction
Anaerobic digestion is a highly complex biochemical reaction process, characterized by multi-factor influence, dynamic change, and complex nonlinearity [1]. Anaerobic digestion can not only treat organic pollutants but also produce clean energy [2]. Therefore, anaerobic digestion technology has broad development space in the treatment of wastewater and organic solid waste [3] and is one of the practical ways to solve energy and environmental problems. However, the microorganisms involved in anaerobic digestion are highly sensitive to changes in the digestion environment, and methanogens have extremely strict requirements on the external environment [3]. Unexpected changes in the external environment affect the hydrolysis, acidification, and methanation processes of anaerobic digestion [4,5]. This can cause large amounts of volatile fatty acids (VFA) to accumulate in the reactor, inhibit the progress of methanation, and even result in the failure of the anaerobic reactor operation [6][7][8]. Therefore, a more advanced online measurement system must be used to fully monitor the anaerobic digestion process in real time to ensure that the process is stable and efficient while obtaining a higher biogas yield [9].
In terms of monitoring anaerobic digestion process variables, there is mature and reliable online monitoring equipment for temperature, pressure, flow rate, gas composition, and other variables [10,11]. However, there are still many key variables, such as biogas yield, chemical oxygen demand (COD), and VFA concentration, that cannot be directly measured or for which the measurement equipment is expensive [12]. Online monitoring equipment for these variables cannot be widely used in industrial production due to factors such as high cost, low accuracy, and lagging analysis [13][14][15][16]. Consequently, the soft sensor using online measurable auxiliary variables to estimate the unmeasurable variables
1. pH: The optimal pH range differs between microorganisms. Methanogens are extremely sensitive to pH, and their optimal pH range is 6.5-7.2 [30]. The fermenting microorganisms produce acetic acid and butyric acid when the pH is low. Acetic acid and propionic acid are formed when the pH is higher than 8.0 [31]. Therefore, reasonable monitoring of pH can help ensure the maximum biological activity of the microorganisms.
2. Alkalinity: Methanogens usually produce alkalinity in the form of carbon dioxide, ammonia, and bicarbonate, which helps neutralize the VFA produced during anaerobic digestion [32]. Thus, real-time monitoring of alkalinity can improve the stability of the anaerobic digestion process when the concentration of carbon dioxide is stable.
3. Temperature: Temperature has a crucial influence on the physical and chemical properties of anaerobic digestion and fermentation substrates. It affects the growth rate and metabolism of microorganisms, which in turn influences the population dynamics of the anaerobic digestion process [33]. When the temperature changes by more than 1 °C/day, the biochemical activity of methanogens is severely affected, causing the process to fail.
4. VFA concentration: VFA is an intermediate product of the anaerobic digestion process. Excessive accumulation of VFA can reduce the pH of the system and inhibit the activity of methanogens. The VFA concentration reflects the current operating conditions of the system and is extremely sensitive to imbalances in the incoming feed [34]. Hence, it is urgent to establish a soft sensor that predicts the VFA concentration by monitoring measurable and easy-to-obtain process variables in real time.
5. COD and biogas yield: COD is an important indicator of the organic content of the effluent from the anaerobic digestion process [35]. Biogas yield is a vital indicator of the efficiency of anaerobic digestion [36]. Real-time monitoring of COD and biogas yield can demonstrate the operating efficiency and stability of the anaerobic digestion process and contribute to the real-time calibration and optimization of production conditions and control methods.

Anaerobic Digestion Process
In industrial production, anaerobic digestion processes are usually classified according to factors such as operating temperature, feeding method, and the number of reactors [37]. Based on the number of reactors, they can be divided into single-phase digestion and two-phase digestion [38]. The single-phase digestion process was widely used in the early stage, when anaerobic digestion theory was still immature, due to its low cost and simple operation. In single-phase digestion, the hydrolysis, acidification, acetic acidification, and methanation steps that degrade macromolecular organics are all conducted in the same digestion tank, so the inhibition of any one step affects the overall digestion efficiency [39]. With the development of anaerobic digestion theory, researchers and technologists have developed a two-phase digestion process to avoid acid inhibition. In two-phase anaerobic digestion, the hydrolysis, acidification, and acetic acid stages are conducted in the acid production tank, while the methane production stage is performed in the methane production tank [40]. This method can effectively avoid mutual inhibition between the steps, improve the efficiency of anaerobic digestion, shorten the reaction time, and increase methane production [41].
Different two-phase anaerobic digestion devices are generally selected according to the biodegradability of the input materials [42]. When industrial wastewater with low solid content is treated, the acid production tank and the methane production tank usually adopt a continuous stirred tank reactor and an up-flow anaerobic sludge blanket, respectively [43]. When organic wastewater with high solid content is treated, both the acid production tank and the methane production tank use the up-flow solid reactor [44]. When organic sludge with higher solid content is processed, both the acid production tank and the methane production tank employ the continuous stirred tank reactor [45]. The specific process flow is as follows [28]. First, the pretreated organic materials are fed into the hydrolysis acidification tank, where the hydrolysis of macromolecular organics and the acidification of small molecular organics take place. Then, the acidified product is fed into the methane production tank for the methane production reaction. Since the stages of acid production and methane production are performed separately, acid-producing bacteria and methanogens can each be kept in optimal environmental conditions and exert maximum activity. Moreover, the acid production process improves the biochemical properties of the material, and the acidified product provides a suitable substrate for methanogens. The two-phase anaerobic digestion process is illustrated in Figure 1.

Soft Sensor Based on Process Mechanism
Mechanism modeling determines the mathematical relationship between the target variables and the auxiliary variables by establishing balance equations based on a deep understanding of the process mechanism [23]. It has the advantages of high accuracy, strong interpretability, and a clear industrial background. However, the biochemical reaction process of anaerobic digestion is extremely complicated, with strong nonlinearity and uncertainty, making it difficult to establish an accurate mechanism model [46,47]. Moreover, the biochemical reaction process is described by a large number of algebraic and differential equations, so mechanism models suffer from a heavy computational load and slow convergence, which makes it hard for them to meet the requirements of real-time monitoring of target variables [48][49][50]. In addition, the mechanism model parameters of anaerobic digestion, such as the Monod maximum specific uptake rate and the first-order decay rate among the kinetic parameters of the Anaerobic Digestion Model No. 1 (ADM1), are mostly empirical values [51]. Determining these parameters requires considerable experimental verification, and many of the relevant indicators are not measured in industrial production. Therefore, it has been proposed to combine the mechanism model and the data-driven model to establish a hybrid model of anaerobic digestion [52,53]. The hybrid model takes advantage of the fact that a data-driven model only considers inputs and outputs and does not require a clear internal mechanism, which reduces the difficulty of mechanism modeling, while the mechanism model enhances the interpretability of the data-driven model. However, the prediction accuracy and generalization ability of the hybrid model need to be further improved.
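To make the role of kinetic parameters such as a Monod maximum rate and a first-order decay rate concrete, the following is a minimal sketch of a mechanism-style balance model. It assumes a single substrate and a single biomass in a continuously fed, well-mixed tank, which is far simpler than ADM1. The function name `simulate_chemostat` and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_chemostat(mu_max=0.4, k_s=0.5, k_dec=0.02, yield_xs=0.1,
                       dilution=0.1, s_in=5.0, s0=1.0, x0=0.1,
                       dt=0.01, t_end=200.0):
    """Explicit Euler integration of a toy substrate/biomass balance.

    ds/dt = D * (s_in - s) - mu(s) * x / Y
    dx/dt = (mu(s) - k_dec - D) * x
    mu(s) = mu_max * s / (k_s + s)        (Monod kinetics)

    All parameter values are illustrative assumptions, not ADM1 values.
    """
    s, x = s0, x0
    for _ in range(int(round(t_end / dt))):
        mu = mu_max * s / (k_s + s)                       # Monod growth rate
        ds = dilution * (s_in - s) - mu * x / yield_xs    # substrate balance
        dx = (mu - k_dec - dilution) * x                  # biomass balance
        s = max(s + dt * ds, 0.0)                         # keep concentrations non-negative
        x = max(x + dt * dx, 0.0)
    return s, x


s_final, x_final = simulate_chemostat()
# Analytical steady state of the toy model: mu(s*) = k_dec + D
s_star = 0.5 * (0.02 + 0.1) / (0.4 - 0.02 - 0.1)
print(f"simulated s = {s_final:.4f}, analytical s* = {s_star:.4f}")
```

Even in this two-equation toy model, the result depends directly on `mu_max`, `k_s`, and `k_dec`, which illustrates why empirical kinetic parameters limit the accuracy of a pure mechanism model.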

Soft Sensor Based on State Estimation
In the soft sensor based on state estimation, state observation and state estimation methods are adopted to obtain the predicted value of the state variables from auxiliary variables and then acquire the predicted value of the target variable [54,55]. With the development of anaerobic digestion soft sensors, various soft sensors based on state estimation have been proposed [56][57][58][59][60][61]. Among them, the nonlinear observer presented by Dochain under the improved anaerobic digestion model can estimate the VFA concentration online under different working conditions [62]. The improved anaerobic digestion model can be expressed as:
where $\Delta f$ denotes the uncertainty term related to unmodeled dynamics and load disturbance; $x$ is the vector of dynamic states; $f$ denotes the vector field; $C = [0, 0, 1]$; $u$ and $y$ denote the input and output of the model, respectively. For the improved anaerobic digestion model, the nonlinear observer can be expressed as:
where $\hat{x} \in \mathbb{R}^3$ represents the state estimation vector, $\hat{y}$ indicates the predicted value of the output signal, and $k_l$, $k_d$, and $\gamma$ denote the observer gains. The estimated error of the model is presented in Formula (3).
where $Ce = y - \hat{y}$. This nonlinear observer overcomes the poor performance of the local observer away from the set point and solves the problem that the asymptotic observer is very sensitive to unknown load disturbances [63]. Additionally, the author has verified the convergence of the observer through Lyapunov stability analysis. Soft sensors based on state estimation can handle situations such as differences in dynamic characteristics between the variables and system lag. However, state estimation mainly applies to mature models and to models that still reflect the characteristics of the measured object after approximation. Moreover, simplifying the system to reduce the difficulty of modeling increases the online estimation error, so the use of this method is restricted by the modeling accuracy required of the anaerobic digestion model [64][65][66][67].
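The general idea of state estimation, correcting a model copy with the output error $y - \hat{y}$, can be sketched on a toy example. The code below is not the robust nonlinear observer of [62]. It is a plain linear Luenberger-type observer on a hypothetical two-state plant, where only the second state is measured and the first is reconstructed. The plant coefficients `a`, `b`, `c` and the gains `l1`, `l2` are assumptions chosen so that the error dynamics are stable.

```python
def run_observer(a=0.5, b=1.0, c=1.0, l1=2.0, l2=2.0, dt=0.01, t_end=10.0):
    """Simulate a toy plant and a Luenberger-type observer side by side.

    Plant:    x1' = -a*x1 + u,   x2' = b*x1 - c*x2,   y = x2 (measured)
    Observer: same equations plus gain * (y - y_hat) as a correction term.
    Returns the absolute estimation error of the unmeasured state x1 per step.
    """
    x1, x2 = 1.0, 0.0            # true states (x1 is not measured)
    x1_hat, x2_hat = 0.0, 0.0    # observer starts from a wrong guess
    u = 0.2                      # constant input, hypothetical value
    errors = []
    for _ in range(int(round(t_end / dt))):
        innovation = x2 - x2_hat                      # y - y_hat
        dx1 = -a * x1 + u
        dx2 = b * x1 - c * x2
        dx1_hat = -a * x1_hat + u + l1 * innovation
        dx2_hat = b * x1_hat - c * x2_hat + l2 * innovation
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        x1_hat, x2_hat = x1_hat + dt * dx1_hat, x2_hat + dt * dx2_hat
        errors.append(abs(x1 - x1_hat))
    return errors


errs = run_observer()
print(f"initial error {errs[0]:.3f}, final error {errs[-1]:.2e}")
```

With these gains the error on the unmeasured state decays quickly. In a real digester the model is nonlinear and uncertain, which is why the observer gains and the convergence proof need the more careful treatment described above.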

Soft Sensor Based on Regression Analysis
Soft sensors of anaerobic digestion based on regression analysis mainly include soft sensors based on multiple linear regression (MLR) and soft sensors based on partial least squares regression (PLSR).
MLR establishes a linear mapping between auxiliary variables and target variables through the least squares method [68]. The soft sensor of anaerobic digestion based on MLR proposed by Hu assumes the following linear relationship between auxiliary variables and biogas yield [69]:
$$\hat{y} = X\theta$$
where $X$ is the auxiliary variable, $\theta$ is the parameter to be calculated, and $\hat{y}$ is the predicted value of biogas yield. The target parameter $\theta$ is solved by minimizing the error between the real biogas yield and the predicted biogas yield with the least squares method. However, the biochemical reaction process of anaerobic digestion is significantly nonlinear, and MLR cannot accurately describe the nonlinear process. Therefore, the anaerobic digestion soft sensor based on MLR has disadvantages such as low accuracy and susceptibility to external interference [70].
The anaerobic digestion soft sensor based on PLSR, which was proposed by Yang [1], can extract the principal components of auxiliary variables and target variables while maximizing the correlation between them [71]. The objective function of the soft sensor is expressed as
$$\max \operatorname{Cov}(t, y) = \sqrt{\operatorname{var}(t)\,\operatorname{var}(y)}\;\operatorname{corr}(t, y)$$
where $t$ represents the principal component of the auxiliary variables, and $y$ denotes the COD. The Lagrange multiplier $l$ is introduced to solve the objective function:
where $x$ and $p$ indicate the auxiliary variable and the weight coefficient, respectively. Subsequently, the linear fitting between the principal component and the COD is realized by the MLR algorithm. This model solves the problem of the collinearity of auxiliary variables in the anaerobic digestion process. Unfortunately, the process of dimensionality reduction may eliminate secondary principal components that are beneficial to regression and retain irrelevant noise, affecting the accuracy of the model. Meanwhile, PLSR is a linear algorithm and is only suitable for linear and weakly nonlinear models. However, the anaerobic digestion process is severely nonlinear, which limits the prediction accuracy and generalization ability of the model.
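The two regression ideas above can be sketched in a few lines of NumPy. The example assumes synthetic, noise-free data and hypothetical helper names (`fit_mlr`, `first_pls_component`). It shows the least squares solution for the MLR parameters and the first PLS weight vector, which for centered data is the unit vector proportional to $X^{T}y$ and therefore maximizes the covariance between the score $t$ and $y$. It is a sketch of the general techniques, not of the specific models in [69] or [1].

```python
import numpy as np


def fit_mlr(X, y):
    """Least squares estimate of theta in y_hat = X @ theta."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def first_pls_component(X, y):
    """First PLS weight vector p (unit norm) and score t for centered data.

    Among all unit vectors q, p = Xc^T yc / ||Xc^T yc|| maximizes Cov(Xc q, yc).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.T @ yc
    p = p / np.linalg.norm(p)
    return p, Xc @ p


# Synthetic stand-in for auxiliary variables and a target (assumed data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.5, -2.0, 0.5])
y = X @ theta_true

theta_hat = fit_mlr(X, y)
p, t = first_pls_component(X, y)
print("theta_hat:", np.round(theta_hat, 3))
print("first PLS weight vector:", np.round(p, 3))
```

A full PLSR model would deflate `X` and repeat the weight computation to obtain further components before regressing the target on the scores.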

Soft Sensor Based on Artificial Neural Network
Artificial neural networks, including back propagation (BP) neural networks and radial basis function (RBF) neural networks, can establish a nonlinear mapping between auxiliary variables and target variables through network learning.
Soft sensors based on the BP neural network for the anaerobic digestion process have been proposed by several researchers [72][73][74][75][76][77][78]. In this soft sensor, the gradient descent algorithm is used to update the network weights. Such a network can approximate continuous nonlinear functions with arbitrary precision and can therefore handle the highly nonlinear and uncertain problems in the anaerobic digestion process [79,80]. However, it is prone to falling into a local optimum or overfitting, affecting the prediction accuracy and generalization ability of the soft sensor [81].
To address the tendency of the BP-based anaerobic digestion soft sensor to fall into a local minimum, Yilmaz proposed a soft sensor based on the RBF neural network to predict COD [82]. The soft sensor based on the RBF neural network has the characteristics of global best approximation and strong nonlinear mapping ability. The loss function of the soft sensor is expressed as
where $Y$ and $\hat{Y}$ denote the measured and predicted values of COD, respectively; $\lambda$ represents the weighting factor of the regularization term; $D$ indicates the linear differential operator. With the regularization term, the curvature of the approximation function can be controlled, and the tendency of the model to overfit is addressed. The soft sensor based on the neural network can better handle the problem of nonlinearity in the anaerobic digestion process. However, the performance of soft sensors is strongly affected by the network topology and hyperparameters in practical applications. Therefore, proper hyperparameters and network topology are selected through optimization algorithms such as the genetic algorithm and the particle swarm optimization algorithm to improve model prediction accuracy and generalization ability [83][84][85][86][87].
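As a rough illustration of a regularized RBF model, the sketch below fits Gaussian basis functions to a one-dimensional toy signal. It makes two simplifying assumptions that differ from the loss described above: the centers and width are fixed by hand, and the differential-operator penalty is replaced by a plain L2 (ridge) penalty on the output weights. The names `rbf_design` and `fit_rbf` are hypothetical.

```python
import numpy as np


def rbf_design(x, centers, width):
    """Matrix of Gaussian basis functions evaluated at the points x."""
    diff = x[:, None] - centers[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))


def fit_rbf(x, y, centers, width, lam):
    """Output weights minimizing ||Phi w - y||^2 + lam * ||w||^2."""
    Phi = rbf_design(x, centers, width)
    A = Phi.T @ Phi + lam * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ y)


# Toy nonlinear target standing in for a measured signal (assumed data)
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)
centers = np.linspace(0.0, 2.0 * np.pi, 10)

w_small = fit_rbf(x, y, centers, width=0.8, lam=1e-6)
w_large = fit_rbf(x, y, centers, width=0.8, lam=1.0)
y_hat = rbf_design(x, centers, 0.8) @ w_small
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(f"training RMSE with weak regularization: {rmse:.4f}")
print(f"weight norm: {np.linalg.norm(w_small):.3f} (lam=1e-6) "
      f"vs {np.linalg.norm(w_large):.3f} (lam=1.0)")
```

Increasing `lam` shrinks the weight vector and smooths the fitted function, which is the same trade-off that the weighting factor of the regularization term controls in the soft sensor.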

Soft Sensor Based on Statistical Machine Learning
The soft sensor based on support-vector regression (SVR) uses a kernel function to map auxiliary variables to a high-dimensional feature space and adopts linear algorithms to analyze the nonlinear characteristics of the samples in that space. Following the structural risk minimization criterion, training reduces to solving a convex quadratic programming problem, which also addresses the high-dimensional and small-sample problems that artificial neural networks cannot solve [88]. Given the small-sample problem caused by the difficulty of obtaining target variables in the anaerobic digestion process, Kazemi proposed the soft sensor based on SVR to predict the VFA concentration [89]. The loss function of the soft sensor is expressed as
The constraints are
where $x$ is the auxiliary variable; $y$ indicates the VFA concentration; $a_i$ and $a_i^{*}$ are Lagrangian multipliers; $k(\cdot)$ represents the kernel function; $\varepsilon$ is the insensitivity coefficient.
Given the high complexity of solving SVR models, a soft sensor based on least-squares support-vector regression (LS-SVR) was proposed by Liu to monitor the VFA concentration in the anaerobic digestion process in real time [90]. In the soft sensor based on LS-SVR, the slack variable in the optimization objective is replaced with the squared training error.
Then, the inequality constraints are replaced with the following equality constraints.
where $w$ and $b$ indicate the learnable parameters of the model, $\gamma$ denotes the regularization coefficient, and $\xi$ refers to the training error. Solving the convex quadratic programming problem is thus transformed into solving a set of linear equations, reducing the complexity of the model. However, the simplified soft sensor is more sensitive to abnormal values in the anaerobic digestion process, weakening the robustness of the soft sensor. Therefore, optimization algorithms are used to select the appropriate kernel function and hyperparameters to improve the prediction accuracy and generalization ability of the model [91][92][93].
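The reduction of LS-SVR training to a set of linear equations can be illustrated with the textbook least-squares SVM formulation, which is assumed here and may differ in detail from the model in [90]. With a kernel matrix $K$, regularization coefficient $\gamma$, bias $b$, and dual coefficients $\alpha$, the conditions are $\sum_i \alpha_i = 0$ and $(K + I/\gamma)\alpha + b\mathbf{1} = y$. The Gaussian kernel, the synthetic data, and the function names are illustrative choices.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_lssvr(X, y, gamma, sigma):
    """Solve the LS-SVR linear system for the bias b and coefficients alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                  # sum(alpha) = 0
    A[1:, 0] = 1.0                                  # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def predict_lssvr(X_train, alpha, b, X_new, sigma):
    """Prediction f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Small synthetic sample, mimicking the small-sample setting (assumed data)
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(30, 1))
y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.normal(size=30)

gamma, sigma = 100.0, 0.7
b, alpha = fit_lssvr(X, y, gamma, sigma)
y_fit = predict_lssvr(X, alpha, b, X, sigma)
print(f"training RMSE: {np.sqrt(np.mean((y - y_fit) ** 2)):.4f}")
```

Because every training error enters the solution through $\alpha_i = \gamma \xi_i$, a single outlier shifts the whole fit, which is one way to see why the simplified model is less robust to abnormal values.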

Practical Application of Soft Sensors for Anaerobic Digestion
The soft sensor of anaerobic digestion is widely used in various industries owing to its low cost and ease of development and maintenance.
The soft sensor based on the process mechanism proposed by Fan [53] is employed to predict the bacterial concentration in the high-temperature anaerobic digestion of cow manure. The kinetic model of anaerobic digestion of cow manure is expressed as:
where $X$, $P$, and $S$ denote cell concentration, product concentration, and substrate concentration, respectively; $\mu_{\max}$ and $X_{\max}$ indicate the maximum growth rate and maximum concentration of the bacteria, respectively; $k_1$, $k_2$, $k_3$, and $k_4$ represent the cell growth rate, acid production rate coefficient, total enzyme activity, and cell activity coefficient, respectively, and the latter two factors directly affect the cell growth rate and fermentation cycle. It can be observed that the cell concentration and substrate concentration are the direct factors affecting anaerobic digestion. Therefore, the cell growth rate, acid production rate, total enzyme activity, and cell activity are selected as auxiliary variables. However, the versatility of the soft sensor is poor. The prediction accuracy of the model decreases significantly when fermentation conditions and fermentation batches change.
The robust nonlinear observer proposed by Dochain [62] is adopted to predict the VFA concentration during the anaerobic digestion of industrial wastewater. The mass balance equation of anaerobic digestion is expressed as:
where $X$, $S$, and $Q_M$ indicate the methanogenic biomass, the soluble organic substrate, and the methane outflow rate, respectively; $k_t$ and $k_m$ represent the yield coefficient related to substrate degradation and the yield coefficient of methane production, respectively; $u$, $a$, and $\mu(\cdot)$ denote the dilution rate, the proportion of bacteria that are not attached to the support, and the growth rate of methane bacteria, respectively. Considering the limited online monitoring equipment available in an actual factory, the soft sensor only uses the methane outflow rate as an auxiliary variable to predict the VFA concentration under different working conditions and has high engineering practicability. However, the prediction accuracy of the soft sensor is generally not high when the observation model is established by simplifying the biochemical reaction and mass balance equations.
Strik [75] employed a soft sensor based on the BP neural network to predict the content of ammonia in biogas. According to the kinetic model of anaerobic digestion, the calculation formula of related variables in biogas can be expressed as:
where $C_N$, $C_{TAN}$, $T$, pH, and $K$ denote the ammonia content, the total inorganic nitrogen concentration, the reaction temperature of anaerobic digestion, the pH value of the collected sample, and the rate constant of methane production, respectively; $M_0$ and $M_t$ represent the methane production potential and the cumulative methane production at time $t$, respectively. As revealed by the model, pH, total inorganic nitrogen concentration, ammonium ion concentration, and temperature are the direct factors influencing ammonia content, and methane production is an indirect factor. Therefore, the ammonia content, ammonium ion concentration, total inorganic nitrogen concentration, nitrogen loading rate, pH, biogas production, and organic loading rate in the reactor are selected as the auxiliary variables of the model. However, the soft sensor lacks a real-time correction function. As actual working conditions and external interference factors change, the prediction accuracy of the model continues to decrease.

The Latest Development of Anaerobic Digestion Soft Sensor
The previous chapter introduced traditional anaerobic digestion soft sensors, which reflect the mapping relationship between auxiliary variables and target parameters to a certain extent. The characteristics of traditional soft sensors are summarized in Table 1. However, soft sensors still face many challenges in practical applications. For example:
1. The traditional soft sensor cannot extract the deep features of auxiliary variables. The performance of traditional soft sensors depends on the auxiliary variables provided, and the selection of auxiliary variables requires rich prior knowledge [94].
2. The traditional soft sensor does not exploit the large number of unlabeled samples generated in the anaerobic digestion process. A semi-supervised learning mechanism that mines the information in unlabeled samples can effectively improve the prediction performance of soft sensors [95].
3. The traditional soft sensor does not consider the dynamic and time lag characteristics of anaerobic digestion. It cannot adapt to changes in working and production conditions, so its prediction accuracy gradually deteriorates over time [96]. Meanwhile, the slow hydrolysis process of anaerobic digestion leads to a certain time lag between the real-time monitoring variables of the acid-producing tank and those of the methane-producing tank.
4. The traditional soft sensor only considers the mapping relationship between auxiliary variables and target variables while ignoring the mutual influence between auxiliary variables [97]. In actual industry, combinations of auxiliary variables are generally highly correlated with the target variable, while a single auxiliary variable often has only a weak correlation with it.
In this chapter, the latest developments in anaerobic digestion soft sensors are introduced in detail. Furthermore, suitable solutions are discussed for the obstacles encountered by traditional soft sensors in the industrial production process.

Soft Sensors for Extracting Deep Features
The deep belief network (DBN) approximates complex functions through unsupervised layer-by-layer pre-training and supervised backpropagation fine-tuning [98,99]. In unsupervised pre-training, the auxiliary variables are passed through the nonlinear mappings of stacked restricted Boltzmann machines to extract abstract features of the training samples. In supervised fine-tuning, the network weights are further adjusted and optimized by backpropagating the supervised signal.
To overcome the dependence of the traditional anaerobic digestion soft sensor on the features selection, Li proposed a soft sensor based on a deep belief network to predict the concentration of VFA for the anaerobic digestion process [100]. The structure diagram is illustrated in Figure 2. The gradient descent algorithm cannot effectively train the deep network. Therefore, the contrast divergence (CD) algorithm is adopted to update the weights of the restricted Boltzmann machine, layer by layer: where v denotes the state vector of the visible layer, h refers to the state vector of the hidden layer, represents the learning rate, and w and b denote the weights and biases of the network, respectively. The soft sensor, with excellent feature learning capabilities, can effectively learn the essential features from the training samples and address the defects of excessive dependence on prior knowledge in feature selection. However, the random setting of the weights of DBN's output layer increases the randomness of the model's prediction performance. To further improve the stability of prediction performance and generalization performance, Li proposed to adopt the extreme learning machine (ELM) algorithm after the weights of the first n-1 layers were obtained using the CD algorithm to determine the weights of the output layer, and establish a soft sensor based on an improved deep belief network (IDBN) to predict the VFA concentration. IDBN structure diagram is presented in Figure 3.
where ℎ −1 ( ,̂) + indicates the output of the hidden layer of the n-1 layer, β represents the weights of the output layer, and y denotes the VFA concentration. Compared with the soft sensor based on DBN, the improved soft sensor has preferable prediction accuracy and generalization performance in the experimental. However, the unsupervised layerby-layer training process based on the CD algorithm requires a lot of iterative calculations, and the training process does not consider the mapping relationship between auxiliary variables and target variables. Therefore, Wang proposed a soft sensor based on the stacked supervised autoencoder combined with the kernel extreme learning machine (SSAE-KELM) algorithm to predict the VFA concentration [101]. The structure of SSAE-KELM is shown in Figure 4. The gradient descent algorithm cannot effectively train the deep network. Therefore, the contrast divergence (CD) algorithm is adopted to update the weights of the restricted Boltzmann machine, layer by layer: where v denotes the state vector of the visible layer, h refers to the state vector of the hidden layer, η represents the learning rate, and w and b denote the weights and biases of the network, respectively. The soft sensor, with excellent feature learning capabilities, can effectively learn the essential features from the training samples and address the defects of excessive dependence on prior knowledge in feature selection. However, the random setting of the weights of DBN's output layer increases the randomness of the model's prediction performance. To further improve the stability of prediction performance and generalization performance, Li proposed to adopt the extreme learning machine (ELM) algorithm after the weights of the first n-1 layers were obtained using the CD algorithm to determine the weights of the output layer, and establish a soft sensor based on an improved deep belief network (IDBN) to predict the VFA concentration. 
IDBN structure diagram is presented in Figure 3.
where h n−1 w i ,b i + indicates the output of the hidden layer of the n-1 layer, β represents the weights of the output layer, and y denotes the VFA concentration. Compared with the soft sensor based on DBN, the improved soft sensor has preferable prediction accuracy and generalization performance in the experimental. However, the unsupervised layer-bylayer training process based on the CD algorithm requires a lot of iterative calculations, and the training process does not consider the mapping relationship between auxiliary variables and target variables. Therefore, Wang proposed a soft sensor based on the stacked supervised autoencoder combined with the kernel extreme learning machine (SSAE-KELM) algorithm to predict the VFA concentration [101]. The structure of SSAE-KELM is shown in Figure 4. Processes 2021, 9, x FOR PEER REVIEW 11 of 21  For the soft sensor, the ELM algorithm is employed to train supervised autoencoders (SAE), and the deep features of auxiliary variables are extracted through stacked SAE. The loss function of the training process is expressed as: By minimizing the loss function, the output weight is obtained: refers to the auxiliary variables; represents the VFA concentration; denotes the hidden layer output; 1 and 2 indicate the hidden layer weights and the supervised item weights, respectively; 1 and 2 are the weight coefficients. Finally, the kernel extreme learning machine is adopted to establish a regression model to predict the VFA concentration on the extracted deep abstract features. Compared with soft sensors based on IDBN, the soft sensor introduces supervised items by improving the loss function. As a result, the soft sensor can extract the deep features of the auxiliary variable while   For the soft sensor, the ELM algorithm is employed to train supervised autoencoders (SAE), and the deep features of auxiliary variables are extracted through stacked SAE. 
For the soft sensor, the ELM algorithm is employed to train supervised autoencoders (SAE), and the deep features of auxiliary variables are extracted through stacked SAE. The output weight is obtained by minimizing the loss function of the training process, in which $X$ refers to the auxiliary variables; $Y$ represents the VFA concentration; $H$ denotes the hidden layer output; $r_1$ and $r_2$ indicate the hidden layer weights and the supervised item weights, respectively; and $C_1$ and $C_2$ are the weight coefficients. Finally, the kernel extreme learning machine is adopted to establish a regression model that predicts the VFA concentration from the extracted deep abstract features. Compared with soft sensors based on IDBN, this soft sensor introduces supervised items by improving the loss function. As a result, it can extract the deep features of the auxiliary variables while considering the mapping relationship between the auxiliary variables and the VFA concentration, and can therefore extract the essential features that have a greater impact on the VFA concentration. Moreover, the ELM algorithm compensates for the slow training speed of the traditional CD algorithm and improves the training efficiency of the model.
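To make the speed argument concrete, the sketch below shows the generic closed-form step that ELM-type methods rely on: the hidden weights are drawn at random and only the output weights are solved by regularized least squares, so no iterative training is needed. This is a plain single-layer ELM regressor, not the SSAE-KELM loss of the cited work; the `tanh` activation, the regularization constant `C`, and the function names are assumptions for illustration.

```python
import numpy as np

def elm_fit(X, Y, n_hidden, C, rng):
    """Fit a single-hidden-layer ELM regressor (illustrative sketch)."""
    # Random, untrained hidden layer
    w = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ w + b)
    # Closed-form ridge solution: beta = (H^T H + I / C)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ Y)
    return w, b, beta

def elm_predict(X, w, b, beta):
    return np.tanh(X @ w + b) @ beta

# Usage on synthetic data standing in for auxiliary variables and a target
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)
w, b, beta = elm_fit(X, Y, n_hidden=60, C=1e3, rng=rng)
train_mse = float(np.mean((elm_predict(X, w, b, beta) - Y) ** 2))
```

The only cost is one linear solve of size `n_hidden`, which is why ELM-based training is usually much faster than iterative CD updates.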

Soft Sensors for Extracting Information from Unlabeled Samples
In the anaerobic digestion process, the long sampling period and high cost of target variable collection make it difficult for soft sensors to obtain sufficient labeled samples [102]. However, there are many unlabeled samples composed of process variables in the industrial process. With the semi-supervised learning mechanism, the information of unlabeled samples can be fully mined, and the prediction accuracy and generalization ability of soft sensors are improved. In recent years, semi-supervised learning mechanisms have been widely used in deep neural networks. Therefore, Yan proposed a soft sensor based on the semi-supervised hierarchical extreme learning machine to predict VFA concentration in the anaerobic digestion process [103]. The model structure of the semi-supervised hierarchical extreme learning machine is illustrated in Figure 5.

Hierarchical extreme learning machine (HELM) is a multi-layer feedforward neural network composed of stacked extreme learning machine autoencoders (ELM-AE). During the training process, ELM-AE can achieve the lossless reconstruction of auxiliary variables. Therefore, the combined feature information of auxiliary variables can be extracted to a certain extent when the number of neurons in the hidden layer of ELM-AE is less than the number of neurons in the input layer [104]. The output weight of ELM-AE is obtained by minimizing its reconstruction loss function, in which $\gamma$ indicates the weight of the output layer of ELM-AE; $C$ is the weight factor; $J$ denotes the output of the hidden layer; and $X$ and $Y$ represent the auxiliary variables and the VFA concentration, respectively.
Manifold regularization is used as a semi-supervised learning mechanism to learn the distribution of unlabeled samples. It preserves the neighborhood relationship between the data vectors in the original space; the essential idea is to keep the local geometric structure of the original feature space in the new projection space. The output weight of HELM is acquired by minimizing a loss function that introduces the manifold regularization term, in which $\gamma$ indicates the output layer weight of HELM; $\lambda$ is the weight factor; $\mathrm{Tr}(\cdot)$ represents the trace of the matrix; $L$ refers to the graph Laplacian matrix; and $H$ and $\hat{Y}$ denote the hidden layer output and prediction output of all samples, respectively. Compared with traditional soft sensors, soft sensors based on a semi-supervised learning mechanism can learn from both unlabeled and labeled samples. The semi-supervised learning mechanism can make full use of the many unlabeled samples in the industrial process, contributing to the improvement of the prediction accuracy and generalization ability of soft sensors.
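The role of the graph Laplacian can be illustrated with a small sketch of Laplacian-regularized output weights. It assumes a commonly used semi-supervised ELM objective, $\tfrac{1}{2}\lVert\gamma\rVert^2 + \tfrac{C}{2}\sum_{\text{labeled}}\lVert y_i - h_i\gamma\rVert^2 + \tfrac{\lambda}{2}\mathrm{Tr}(\hat{Y}^{T} L \hat{Y})$ with $\hat{Y} = H\gamma$, which may differ in detail from the loss used in the cited work. The k-nearest-neighbor graph, the constant `C`, and the function names are likewise assumptions.

```python
import numpy as np

def knn_graph_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetric k-NN graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # exclude self-neighbors
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0
    W = np.maximum(W, W.T)                # symmetrize
    return np.diag(W.sum(axis=1)) - W

def ss_elm_output_weights(H, Y_labeled, L, C, lam):
    """Solve (I + H^T C~ H + lam H^T L H) gamma = H^T C~ Y.

    H: (n, m) hidden outputs of all samples, labeled rows first.
    Y_labeled: (n_labeled, d) targets of the labeled rows.
    C~ is diagonal: C on labeled rows, 0 on unlabeled rows.
    """
    n, m = H.shape
    n_labeled = Y_labeled.shape[0]
    c = np.zeros(n)
    c[:n_labeled] = C
    Y = np.zeros((n, Y_labeled.shape[1]))
    Y[:n_labeled] = Y_labeled
    A = np.eye(m) + H.T @ (c[:, None] * H) + lam * (H.T @ L @ H)
    return np.linalg.solve(A, H.T @ (c[:, None] * Y))

# Usage: 10 labeled and 20 unlabeled synthetic samples
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
H = np.tanh(X @ rng.standard_normal((3, 8)))
Y_labeled = rng.standard_normal((10, 1))
L = knn_graph_laplacian(X, k=4)
gamma = ss_elm_output_weights(H, Y_labeled, L, C=10.0, lam=0.1)
```

Setting `lam` to zero reduces the solution to ordinary ridge regression on the labeled rows, which makes explicit that the unlabeled samples influence the model only through the Laplacian term.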

Soft Sensors for Extracting Dynamic Information
In the industrial production process of anaerobic digestion, changes in operating tasks, production materials, and the production environment cause changes in system operating conditions, so the prediction accuracy of soft sensors gradually decreases over time. Moreover, the different start-up times of the methane tanks can lead to large differences in the digestion degree, substrate concentration, and biological activity, leading to inconsistent data distribution in the original data set. To handle this complication, Wang proposed to use the domain space transfer extreme learning machine (DSTELM) algorithm to adjust the data distribution [103]. The output weight is obtained by minimizing the reconstruction loss function of DSTELM, in which $c$ and $\lambda$ are weighting factors; $r$ denotes the output weight; $X_T$ represents the auxiliary variables of the test set; $H = [H_S; H_T]$ indicates the output of the hidden layer; and $\mathrm{Tr}(\cdot)$ refers to the trace of the matrix.
In the resulting solution for the output weight, $C = \mathrm{diag}(0_{n_S \times n_S}, c, c, \ldots, c)$. The algorithm minimizes the distribution distance between the training set and the test set while retaining the essential characteristics of the test set. It thereby addresses the low prediction accuracy caused by the inconsistent data distribution of the training set and the test set. Furthermore, a soft sensor based on the domain space transfer hierarchical extreme learning machine (DSTHELM) is established by stacking DSTELM to extract the deep features of auxiliary variables. Compared with traditional soft sensors, soft sensors based on DSTHELM can better adapt to modal changes and data drift and thus present higher prediction accuracy and generalization ability.
Additionally, the hydrolysis reaction is slow in the anaerobic digestion process, resulting in a certain time lag between the real-time monitoring variables of the acid-generating tank and those of the methane-generating tank. This suggests that the target variable is affected not only by the auxiliary variables in the current state but also by the operating conditions and production conditions at the previous moment, as well as by its own earlier values. Therefore, McCormick proposed a dynamic soft sensor based on the long short-term memory (LSTM) network to predict biogas yield [105]. The LSTM structure is exhibited in Figure 6.
In the training process, the soft sensor realizes the retention or deletion of current information and historical information through the gate control unit. The input gate determines the extent to which the current input is retained in the current state. The forget gate determines the extent to which the state at the previous moment is retained in the current state. The output gate determines the extent to which the current state is passed to the output. In the corresponding formulas, $i_t$, $f_t$, $h_t$, and $\sigma$ represent the input gate, the forget gate, the output gate, and the sigmoid activation function, respectively. The soft sensor can extract the different characteristics of the auxiliary variables at different times. Meanwhile, the soft sensor can retain historical biogas yield and its main influencing factors as auxiliary variables for current biogas yield forecasting, realizing the persistence of historical information.
The dynamic soft sensor considers the influence of historical data on the current state and overcomes the defect that the traditional soft sensor neglects the time scale information. Therefore, the dynamic soft sensor, to a certain extent, addresses the time lag caused by the slow reaction of the anaerobic digestion process. Furthermore, a dynamic soft sensor based on a combined convolutional neural network and long short-term memory network is established to predict biogas yield, using the deep feature extraction ability of the convolutional neural network and the dynamic information extraction ability of LSTM. It can effectively extract the deep features of the data while using LSTM for timing error compensation. Thus, dynamic correction of the model is realized, and the prediction accuracy and generalization ability of the model are further improved.
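The gating mechanism described for the LSTM-based soft sensor can be sketched as a single time step of a standard LSTM cell. This is the textbook formulation rather than the exact equations of the cited work. Note the naming: the sketch uses `o_t` for the output gate and `h_t` for the hidden state, and the stacked parameter layout (`W`, `U`, `b`) is an assumption chosen for compactness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: (n_in,) current auxiliary variables
    h_prev, c_prev: (n_hid,) previous hidden and cell states
    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,)
    Parameter blocks are stacked in the order input, forget, output, candidate.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i_t = sigmoid(z[0:n])            # input gate: how much new input to keep
    f_t = sigmoid(z[n:2 * n])        # forget gate: how much old state to keep
    o_t = sigmoid(z[2 * n:3 * n])    # output gate: how much state to expose
    g_t = np.tanh(z[3 * n:4 * n])    # candidate cell update
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Usage: run a short synthetic sequence through the cell
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = 0.1 * rng.standard_normal((4 * n_hid, n_in))
U = 0.1 * rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because `c_t` carries information forward across steps, a model built from this cell can relate the current biogas yield to conditions several sampling intervals earlier, which is the property the dynamic soft sensor exploits.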

Soft Sensors for Extracting Spatiotemporal Information
In recent years, the graph convolutional network (GCN) has been widely used owing to its powerful feature representation ability [106]. GCN can reduce the complexity of the soft sensor through the parameter sharing of the convolution kernel in the local area. Moreover, the adjacency matrix of the GCN enables the soft sensor to quantify the mutual influence between auxiliary variables, that is, to consider the degree of influence of surrounding nodes on the target node and extract the spatial information of the sample data. In actual industrial processes, the combined auxiliary variables are generally highly correlated with the target variable, while a single auxiliary variable often has a weak correlation with the target variable. Therefore, researchers proposed a soft sensor based on GCN to predict VFA concentration [107]. The GCN structure is exhibited in Figure 7.
The output of the soft sensor can be expressed as:
$$Y = f(\hat{A} X W)$$
where $X$ indicates the auxiliary variable; $Y$ refers to the output of the soft sensor; $f$ represents the nonlinear activation function; $\hat{A}$ is the normalized adjacency matrix; and $W$ denotes the learnable convolution kernel parameter. A proper adjacency matrix can be adopted to effectively extract the spatial information between auxiliary variables and improve the prediction accuracy and generalization ability of the soft sensor. Since the maximal information coefficient (MIC) can calculate the correlation between auxiliary variables, the normalized MIC is used to construct the adjacency matrix. The normalized MIC $\alpha_{ij}$ between auxiliary variables $i$ and $j$ is obtained by applying the softmax normalization function to the MIC $m_{ij}$ between them. Compared with the traditional soft sensor, this soft sensor can learn the spatial information of the auxiliary variables by fully considering the influence of the combined feature information on the VFA concentration.
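A single graph convolution with a softmax-normalized adjacency matrix can be sketched as follows. The MIC values in `M` are hypothetical placeholders (computing MIC requires a dedicated estimator that is not shown), the row-wise softmax is one plausible reading of the normalization, and ReLU is assumed for the activation $f$. Each node is one auxiliary variable, and its feature vector is assumed to be a short window of recent measurements.

```python
import numpy as np

def softmax_rows(M):
    """Row-wise softmax, so each node's incoming weights sum to 1."""
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_layer(X, A_hat, W):
    """One graph convolution Y = f(A_hat X W), with f = ReLU (assumed).

    X: (n_nodes, n_feat) node features
    A_hat: (n_nodes, n_nodes) normalized adjacency matrix
    W: (n_feat, n_out) learnable kernel parameters
    """
    return np.maximum(A_hat @ X @ W, 0.0)

# Hypothetical MIC matrix for three auxiliary variables (placeholder values)
M = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
A_hat = softmax_rows(M)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))        # 3 variables, window of 5 samples each
W = 0.1 * rng.standard_normal((5, 2))
Y = gcn_layer(X, A_hat, W)
```

Each output row mixes the features of all variables in proportion to their normalized MIC, which is how weakly informative single variables can contribute as part of a correlated group.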
Given the dynamic characteristics and time lag characteristics of the anaerobic digestion process, a dynamic soft sensor based on the spatiotemporal graph convolutional network (STGCN) is established by introducing a gated recurrent unit (GRU). GRU can learn the dynamic changes of sample data to capture time information and consider the impact of historical sample information on current sample information. Therefore, this soft sensor can simultaneously consider the time information and spatial information of the anaerobic digestion process data. The structure of STGCN is presented in Figure 8.
During the training process, the STGCN can better handle the spatial and temporal characteristics of samples. The combined feature information of the sample is extracted using GCN to obtain its spatial dependence. Moreover, GRU is used to capture the dynamic changes in the historical information and obtain the temporal dependence. In the corresponding calculation formulas, $A$ is the adjacency matrix; $f(X_t, A)$ represents the graph convolution process; $r_t$ denotes the reset gate; $z_t$ represents the update gate; $h$ refers to the state of the hidden layer; $\sigma$ is the activation function; and $\odot$ represents the Hadamard product. Compared with the traditional soft sensor, the dynamic soft sensor based on STGCN can effectively extract the time information and spatial information from the anaerobic digestion process data, contributing to the accurate prediction of the current VFA concentration.
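For orientation, one common way to combine a graph convolution with a GRU is to feed the graph-convolved input $f(X_t, A)$ into each gate. The following is a sketch of that standard formulation, not necessarily the exact equations of the cited work; the parameter names $W_z$, $W_r$, $W_h$, $b_z$, $b_r$, $b_h$ and the candidate state $\tilde{h}_t$ are assumptions introduced here.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z\,[\,f(X_t, A),\, h_{t-1}\,] + b_z\right) \\
r_t &= \sigma\left(W_r\,[\,f(X_t, A),\, h_{t-1}\,] + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h\,[\,f(X_t, A),\, r_t \odot h_{t-1}\,] + b_h\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

Here $[\cdot,\cdot]$ denotes concatenation. Some references swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabeling the update gate.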

Conclusions
The anaerobic digestion process is a time-varying, non-linear, and highly complex system with constraints. It is difficult to establish an accurate mechanism model to describe the anaerobic digestion process. The soft sensor based on regression analysis is more suitable for handling linear problems. However, there are strong nonlinear characteristics in the anaerobic digestion process. Soft sensors based on artificial neural networks are significantly affected by the network topology and the quality of training samples. They are prone to local optima and over-fitting, and their generalization ability is weak. The soft sensor based on statistical learning is not suitable for processing large-scale data and is unable to monitor the anaerobic digestion process in real-time with high precision. In contrast, soft sensors based on deep learning can learn essential features from training samples, introduce a semi-supervised learning mechanism to fully use unlabeled sample information, consider the dynamic characteristics in actual working conditions and the mutual mapping relationship between auxiliary variables, and extract the time information and space information of the sample data. Therefore, the soft sensor based on deep learning has higher prediction accuracy and generalization ability. The general idea of this paper is illustrated in Figure 9.
At present, soft sensors for anaerobic digestion based on deep learning can be further developed. In the industrial production process, the mechanism model can be combined with deep learning to enhance the interpretability of the soft sensor and realize the closed-loop guidance of the industrial process. Furthermore, the difficulty of sample collection during anaerobic digestion prevents researchers from obtaining enough samples to train soft sensors. Therefore, constructing generated samples with a generative adversarial network is an effective solution to the shortage of soft sensor training samples.