Causal Network Structure Learning Based on Partial Least Squares and Causal Inference of Nonoptimal Performance in the Wastewater Treatment Process

Abstract: Due to environmental fluctuations, the operating performance of complex industrial processes may deteriorate and affect economic benefits. To obtain maximal economic benefits, operating performance assessment has become a new focus. Therefore, this paper proposes a complete framework from operating performance assessment to nonoptimal cause identification based on partial-least-squares-based Granger causality analysis (PLS-GC) and Bayesian networks (BNs). The proposed method makes three main contributions. First, a multiblock operating performance assessment model is established to separately extract economic-related information and dynamic information. Then, a Bayesian network structure is established by PLS-GC, which excludes the strong coupling of variables and simplifies the network structure. Lastly, the nonoptimal root cause and the nonoptimal transmission path are identified by Bayesian inference. The effectiveness of the proposed method was verified on Benchmark Simulation Model 1.


Introduction
Along with the continuous development of industrial technology, the requirements of modern industry are increasing. Process monitoring is no longer limited to fault detection; operating states with low economic benefits also need to be detected. Even though a nonoptimal operating state is not as serious as a fault, it still affects the economic benefits of the process. In order to ensure the economic benefits of processes, a nonoptimal operating state needs to be detected immediately. Due to production environment changes, equipment aging, parameter drift, etc., industrial processes may deviate from the optimal state, showing multimode characteristics. Therefore, operating performance assessment is increasingly important; it divides operating conditions into an optimal grade and multiple nonoptimal grades according to the economic benefits of the corresponding states. Due to the high complexity of industrial processes, it is difficult to establish a model according to the process mechanism alone. Data-driven methods are attracting increasing attention [1,2], and many basic data-driven methods have been applied in performance assessment, such as principal component analysis (PCA) [3]. Then, with the enlargement of data, complex characteristics in these data have gradually attracted the attention of researchers. For example, in consideration of the nonlinearity of the process, Liu et al. [4] put forward a method based on kernel total projection to latent structures and kernel-optimality-related variations. Considering the existence of process noise and outliers, Chu et al. [5] proposed a total robust kernel projection to the latent structure algorithm. The above methods aimed at single-process characteristic problems, while operating performance assessment is oriented to complex industrial systems with multiple process characteristics.
In order to evaluate performance in a multiple-characteristic process, the whole process can be divided into multiple blocks [4,6-8].
In addition, in order to improve the comprehensive production profit, when the process enters a nonoptimal state, operators need to make adjustments according to different nonoptimal causes. The process of nonoptimal cause identification is similar to fault diagnosis. Venkatasubramanian et al. [9] classified fault diagnosis methods into qualitative-model-, quantitative-model-, and process-history-based methods. The most popular quantitative-model-based method is contribution plots [10], which calculate the influence of each variable on the statistics to identify the cause variable. Because the basic contribution-plot method suffers from the fault smearing effect, Cheng et al. [11] proposed a moving average residual difference reconstruction contribution plot to identify the root cause in wastewater treatment processes. In addition, due to the existence of dynamic characteristics, Li et al. [12] proposed a dynamic time-warping-based causality analysis method to perform root diagnosis for nonstationary faults. However, these methods can only find the variable most related to the fault. Detecting the cause-effect relationships between variables is needed, and many studies have been conducted on this topic [13]. The most prevalent measure is Granger causality analysis (GC) [14]. Granger causality analysis was initially used on time series in economic studies; with continued research, it has shown promise in many other fields [15,16]. Because GC is only applicable to linear processes, Chen et al. [17] addressed nonstationarity and nonlinearity in industrial processes by embedding Gaussian process regression into a multivariate Granger causality framework. Transfer entropy can also effectively solve nonlinear problems. Lindner et al. [18] proposed a method to find the optimal parameters of transfer entropy from the dynamics of the process.
In addition to the above methods, Bayesian networks are also widely used in root-cause identification [19] because they have the structure of directed acyclic graph (DAG), which is very helpful in the identification of fault transmission paths. However, it is difficult to construct Bayesian networks in a complex industrial process. Therefore, the hierarchical approach was used in many studies. Chen et al. [20] constructed a hierarchical Bayesian network structure, and built a statistical index for process monitoring and fault diagnosis. Suresh et al. [21] proposed a hierarchical approach to capture cyclic and noncyclic features.
Although the above methods are effective, they also come with defects. For example, GC can only be used for linear systems, transfer entropy is sensitive to parameters, and BN is only suitable for directed acyclic graphs (DAGs). Chemical industrial processes are composed of a large number of highly correlated process variables, so the causal structure is probably not acyclic. Therefore, in this paper, contribution plots are used to select some variables before constructing the causal network, which simplifies the calculation and reduces the possibility of generating cyclic structures. Then, considering the strong coupling characteristics between process variables, the partial least squares (PLS) algorithm [22,23] is incorporated into the regression operation of GC to eliminate the influence of the correlation between variables. This operation also reduces the number of detected causal relationships, further reducing the possibility of generating cyclic structures in the causal network. Lastly, on the basis of the causal structure established by PLS-GC, the root cause can be identified through BN. The main contributions of this paper are summarized as follows: (1) A complete framework from operating performance assessment to nonoptimal cause identification is established. Process data are divided into multiple operating grades, so that field operators can detect operational states with poor economic benefits and adjust them in time. (2) In order to establish a causality network, contribution plots and Granger causality analysis are used in this paper, which avoids the NP-hard problem of searching for the causal network structure. (3) The PLS-GC method is proposed to replace simple GC, which can remove false causalities caused by variable coupling and reduce the possibility of generating a cyclic structure in causal networks.
(4) Through Bayesian network inference, both nonoptimal causes can be identified, and the transmission path of nonoptimal causes can be obtained.
The rest of this paper is organized as follows. In Section 2, the basic concepts of GC and BN are introduced. Subsequently, Section 3 presents an operating performance assessment strategy and the nonoptimal root-cause identification method. In Section 4, the effectiveness of the method is proved on the basis of an experiment on Benchmark Simulation Model 1. Section 5 concludes this work.

Granger Causality Analysis
In the definition of GC [14], if a variable X 1 causes another variable X 2 , knowing the past of X 1 is beneficial in predicting X 2 . Consider two time series, X 1 and X 2 . If the prediction of X 1 considering the past information of X 1 and X 2 is more accurate than that only considering the past information of X 1 , according to the definition of GC, X 2 is considered to be the Granger cause of X 1 .
An autoregression model that contains only the past information of X_1 is constructed as follows:

X_1(t) = a_0 + \sum_{i=1}^{p} a_i X_1(t-i) + \varepsilon_1(t)

If the past information of X_1 and X_2 is taken into consideration, the union-regression model is defined as follows:

X_1(t) = b_0 + \sum_{i=1}^{p} a_i X_1(t-i) + \sum_{j=1}^{q} b_j X_2(t-j) + \varepsilon_{12}(t)

where p and q are the lags of X_1 and X_2, respectively, which can be determined by the Akaike or Bayesian information criterion; a_0, b_0, a_i, and b_j denote regression coefficients; and \varepsilon_1(t) and \varepsilon_{12}(t) represent the autoregressive residual of X_1 and the union-regression residual of X_1 and X_2, respectively. According to the definition of GC, prediction accuracy is expressed by the variance of the residuals. The Granger indicator from X_2 to X_1 can be constructed as follows:

F_{X_2 \to X_1} = \ln\left( \frac{\mathrm{var}(\varepsilon_1)}{\mathrm{var}(\varepsilon_{12})} \right)

If F_{X_2 \to X_1} > 0, X_2 can be considered the Granger cause of X_1, and if F_{X_2 \to X_1} \le 0, there is no causal relationship from X_2 to X_1.
Then, the statistical significance of F_{X_2 \to X_1} can be tested by the F statistic:

F = \frac{(RSS_{AR} - RSS_{UR})/q}{RSS_{UR}/(N - p - q - 1)}

where RSS_{AR} and RSS_{UR} represent the residual sums of squares of the autoregression and union regression, respectively, and N denotes the number of samples.
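As a concrete illustration, the bivariate test above can be sketched in Python. This is a minimal sketch: the function name `granger_f_test` and the use of fixed lag orders (rather than AIC/BIC selection) are our own choices, not from the paper.

```python
import numpy as np
from scipy import stats

def granger_f_test(x1, x2, p=2, q=2):
    """Pairwise Granger causality test of x2 -> x1 (illustrative sketch).

    Fits an autoregression of x1 on its own past (restricted model) and a
    union regression that also includes the past of x2 (full model), then
    compares residual sums of squares with an F-test. The lag orders p and q
    are assumed given, e.g., chosen beforehand by AIC or BIC.
    """
    n = len(x1)
    lag = max(p, q)
    y = x1[lag:]
    # Lagged design matrices: intercept + past of x1 (restricted),
    # plus past of x2 (union).
    X_ar = np.column_stack([np.ones(n - lag)] +
                           [x1[lag - i:n - i] for i in range(1, p + 1)])
    X_ur = np.column_stack([X_ar] +
                           [x2[lag - j:n - j] for j in range(1, q + 1)])
    # Least-squares residual sums of squares for both models.
    rss_ar = np.sum((y - X_ar @ np.linalg.lstsq(X_ar, y, rcond=None)[0]) ** 2)
    rss_ur = np.sum((y - X_ur @ np.linalg.lstsq(X_ur, y, rcond=None)[0]) ** 2)
    dof = len(y) - X_ur.shape[1]
    f_stat = ((rss_ar - rss_ur) / q) / (rss_ur / dof)
    p_value = 1.0 - stats.f.cdf(f_stat, q, dof)
    return f_stat, p_value
```

A small p-value indicates that the past of x2 significantly improves the prediction of x1, i.e., x2 Granger-causes x1.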
There are many variables in an actual industrial process. In order to solve multivariate problems, researchers extended GC to multivariate conditional Granger causality analysis. The past information of the other variables is introduced into the autoregression and union regression in order to reduce their interference in the causal analysis between X_1 and X_2. The autoregression and union-regression models are calculated as follows:

X_1(t) = a_0 + \sum_{i=1}^{p} \sum_{j \neq 2} a_{ij} X_j(t-i) + \varepsilon_1(t)

X_1(t) = b_0 + \sum_{i=1}^{p} \sum_{j=1}^{m} b_{ij} X_j(t-i) + \varepsilon_{12}(t)

where m is the number of variables, and a_{ij} (respectively, b_{ij}) represents the regression coefficient of variable j at time delay i. Similar to GC, the causal relationship between variables X_1 and X_2 can also be analyzed by the F statistic.

Bayesian Network
A Bayesian network [24] is a directed graphical model composed of nodes and directed edges. Through a directed acyclic graph, the conditional dependencies of a set of variables are presented [25]. In the last few decades, BNs have been widely used in fault diagnosis [26][27][28].
Because BN is a model based on a causal graph, it is suitable for root identification. Fault transmission paths can be obtained through the causal relationship structure of BN.

Fundamentals of Bayesian Networks
BN consists of a qualitative and a quantitative part. The qualitative part is the Bayesian network structure, and the quantitative part is the conditional probability table (CPT) that represents dependencies between variables. The structure of a BN can be expressed as

B = (G, P)

where G represents the network structure, and P represents the network parameters. G = (V, E), where V denotes the variable set, and E denotes the directed arc set that describes dependencies between variables. In chemical processes with recycling, duplicate dummy variables [29] can be used to remove circular structures, as shown in Figure 1. Node A is divided into two nodes so as to construct a directed acyclic graph. Considering n nodes of a BN, X = {X_1, X_2, \cdots, X_n}, the general expression of a Bayesian network is

P(X_1, X_2, \cdots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))

where pa(X_i) denotes the parent nodes of X_i, and P(X_1, X_2, \cdots, X_n) is the joint probability distribution. Network parameters consist of prior and conditional probabilities. Prior probabilities are usually calculated according to expert knowledge or observations. Conditional probabilities are usually contained in the CPT. For instance, a simple Bayesian network model is given in Figure 2. On the left is a Bayesian network structure, and on the right is the CPT of node E, where E = 0 represents that event E does not occur. Because C is a parent node of E, whether C occurs should also be taken into consideration in the calculation.
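The chain-rule factorization above can be illustrated with a toy three-node network; the structure A → C → E and all probability values below are invented for illustration and are not taken from the paper.

```python
# Joint probability of a small Bayesian network A -> C -> E,
# factorized as P(A) * P(C|A) * P(E|C). All numbers are illustrative.
p_a = {1: 0.2, 0: 0.8}                   # prior P(A)
p_c_given_a = {1: {1: 0.7, 0: 0.3},      # CPT rows: P(C=c | A=a)
               0: {1: 0.1, 0: 0.9}}
p_e_given_c = {1: {1: 0.9, 0: 0.1},      # CPT rows: P(E=e | C=c)
               0: {1: 0.2, 0: 0.8}}

def joint(a, c, e):
    """P(A=a, C=c, E=e) via the chain-rule factorization of the DAG."""
    return p_a[a] * p_c_given_a[a][c] * p_e_given_c[c][e]

# Marginal P(E=1) is obtained by summing the joint over the other nodes.
p_e1 = sum(joint(a, c, 1) for a in (0, 1) for c in (0, 1))
```

Because the joint distribution is a product of one small local CPT per node, the full table never has to be stored explicitly; this is the property that makes BN inference tractable for sparse graphs.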

Inference Algorithm of Bayesian Network
BN can perform backward inference through the Bayes theorem. The BN inference problem is NP-hard, and many scholars have conducted in-depth research and achieved extensive progress [30,31]. In general, inference algorithms can be divided into two categories: exact and approximate inference. Exact inference, such as junction trees, can obtain the exact probability value of each node, while approximate inference uses statistical methods to compute approximate probabilities. Next, two exact inference algorithms are introduced: variable elimination and belief propagation.
The variable elimination algorithm is a basic exact inference algorithm. The core idea is dynamic programming. Conditional independence is used to reduce calculation. Through changing the operational order of summation and product, the elimination order calculating joint probability is changed. Figure 3a is an example, and X 1 , X 2 , X 3 , X 4 , X 5 are nodes in BN. If our target is calculating marginal probability P(X 5 ), X 1 , X 2 , X 3 , X 4 should be eliminated.
According to the relationships between variables, the joint probability

P(X_5) = \sum_{X_4} \sum_{X_3} \sum_{X_2} \sum_{X_1} P(X_1, X_2, X_3, X_4, X_5)

can be rewritten as a product of local conditional probabilities, and the calculation order is changed by moving each summation inward past the factors that do not involve the summed variable. Then, m_{ij}(X_j) is used to represent an intermediate result, in which i means that it is the result of summing over X_i, and j represents the other variable in the term. Following this idea, the above formula can be transformed into P(X_5) = m_{35}(X_5), which is only related to X_5. The calculation is thus simplified.
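The idea of pushing summations inward can be sketched for a simple chain of binary nodes. The chain structure and the CPT values below are illustrative only; they are not the network of Figure 3a.

```python
import numpy as np

# Variable elimination on a chain X1 -> X2 -> X3 -> X4 -> X5 of binary nodes.
# T[i][x_prev, x_next] = P(X_{i+1} = x_next | X_i = x_prev); values invented.
prior = np.array([0.6, 0.4])                  # P(X1)
T = [np.array([[0.9, 0.1],
               [0.3, 0.7]])] * 4              # same CPT between each pair

# Eliminate X1..X4 one at a time: each step is a sum-product that
# produces an intermediate message m(X_{i+1}) = sum_x m(x) P(X_{i+1} | x).
message = prior
for cpt in T:
    message = message @ cpt
p_x5 = message                                # marginal P(X5)
```

Each matrix-vector product is one elimination step; the cost is linear in the chain length, whereas summing the raw joint distribution would be exponential in the number of nodes.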
The belief propagation exact inference algorithm was proposed on the basis of the variable elimination algorithm to overcome redundant computation. In belief propagation algorithms, the summation operation is a process of message passing, as shown in Figure 3b. The process of message passing comprises two steps. First, a root node is assigned, and messages are delivered to the root node from all leaf nodes until the root node receives messages from all adjacent nodes. Then, messages are delivered from the root node to the leaf nodes until all leaf nodes receive messages. Because each variable node receives information from its adjacent nodes, the marginal probability distribution of each variable can be calculated.

Method Development
In order to obtain better economic benefits, we need to monitor the operational process so as to detect nonoptimal operation states in time. When the operating process is nonoptimal, it is necessary to identify its root cause for further adjustment. Complex industrial processes are composed of a large number of highly coupled process variables, many of which may change during nonoptimal grades; we need to find the variable that causes the changes in the others, that is, the root cause of the nonoptimal grade. Therefore, a framework from operating performance assessment to nonoptimal cause identification is established in this section.

Establishment of Operating Optimality Assessment Model
Inspired by the multiblock technique, mutual information is used to divide process variables into two blocks. Then, dynamic and economically relevant information is extracted from the two blocks. The process of performance assessment is shown in Figure 4. First, the training process data are divided into several operating performance grades according to the comprehensive economic indicator (CEI). Here, we assumed that the process data came from three different performance grades. The performance grade with the highest economic benefit was labeled good, and the two other grades were labeled medium and poor; both are considered nonoptimal. According to the correlation between variables and the economic indicator, process variables are divided into economic-related (ERS) and economic-independent (EIS) subspaces. Correlation is measured by mutual information (MI), a measure of the interdependence between random variables [32]:
I(X; Y) = H(X) - H(X|Y)

where H(X) represents the information entropy of X, and H(X|Y) represents the conditional entropy, that is, the remaining uncertainty of X when Y is known. From the perspective of probability, MI is represented as

I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}

where p(x, y) denotes the joint probability distribution, and p(x) and p(y) denote the marginal probability distributions. Offline training data from one performance grade are denoted as X ∈ R^{m×n}, where m represents the number of process variables and n represents the number of samples. The corresponding comprehensive economic indicator is represented as y_CEI ∈ R^{1×n}. According to the mutual information between each variable and the economic indicator, process variables are divided into two subspaces: the economic-related subspace X_E and the economic-independent subspace X_D. The ERS contains information more directly related to the economic indicator, so canonical correlation analysis (CCA) is used to extract economic information in the ERS and construct the T²_E statistic [33]. Although the EIS contains less economic-related information, it covers much of the process variation information. Therefore, slow feature analysis (SFA) [34] is used to extract dynamic features and construct the T²_D statistic. Then, for an input sample x_k with statistic T²_l, the probability that x_k belongs to operating performance grade C_l is expressed by Bayesian inference:

Pr(C_l \mid T^2_l(x_k)) = \frac{Pr[T^2_l(x_k) \mid C_l] \, Pr(C_l)}{\sum_{s} Pr[T^2_l(x_k) \mid C_s] \, Pr(C_s)}

where Pr(C_l) is the prior probability of grade C_l, and Pr[T²_l(x_k) | C_l] is the likelihood of the statistic under grade C_l, estimated from the training data of that grade. Lastly, according to Bayesian inference, the global probability, which combines the statistics of the two subspaces, can be obtained [35]; it serves as the similarity index of x_k to grade C_l.
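The MI-based subspace division can be sketched as follows. The histogram-based MI estimator and the helper names (`mutual_information`, `split_subspaces`) are our own simplifications; the paper does not specify which MI estimator it uses.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X;Y) = sum p(x,y) log(p(x,y) / (p(x) p(y)))."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def split_subspaces(X, cei, threshold=0.3):
    """Assign each column of X to the economic-related subspace (ERS) if its
    MI with the economic indicator exceeds the threshold, else to EIS."""
    mi = np.array([mutual_information(X[:, j], cei) for j in range(X.shape[1])])
    ers = np.where(mi > threshold)[0]
    eis = np.where(mi <= threshold)[0]
    return ers, eis
```

The threshold of 0.3 mirrors the value used in the paper's experiments; in practice it would be tuned to the process at hand.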

Nonoptimal Root Cause Identification
In order to maintain the best performance of an operating process, operators must know the cause of nonoptimal states. In this section, a data-based cause identification method is proposed. A system block diagram is given in Figure 5. Nonoptimal root cause identification comprises two steps. First, a causal network is constructed. Then, the evidence node is determined, and the nonoptimal root cause is identified.

Causal Network Establishment
First, the network structure needs to be constructed. Because chemical processes have complex structures, we conducted a preliminary screening of variables with contribution plots and selected the candidate variables most related to the economic indicators to reduce calculation. The PCA contribution-plot method is simple and intuitive, and has been widely studied in fault diagnosis [4,10].
Assume that the loading matrix P of PCA has been obtained. The T² statistic of a sample x is

T^2 = x^T P S^{-1} P^T x

where S is the covariance matrix of the principal component scores of the training data. Then, the contribution rate of the i-th variable can be calculated as

C_i = x^T P S^{-1} P^T \xi_i \xi_i^T x

where ξ_i is the i-th unit column vector; the contributions of all variables sum to T². If the contribution rate is above average, the corresponding process variable is included in the candidate variable set. After selecting the candidate variable set, the network structure can be constructed with PLS-GC. Because multivariate conditional Granger causality analysis only uses the least-squares method to construct regression models, it easily mistakes correlation between variables for a causal relationship, which leads to wrong results. Therefore, PLS is used to replace the least-squares regression in GC, which can effectively deal with the joint correlation among multiple variables and construct a network structure with clear causal relationships. Because GC requires variables to be stationary, we need to ensure that the candidate variables are all stationary. Generally, stationarity is tested by checking whether there is a unit root in the time series. Therefore, the augmented Dickey-Fuller (ADF) unit root test [36] is conducted in advance, and a differencing operation is performed on nonstationary variables.
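A minimal sketch of the T² contribution computation, assuming the score-covariance form given above; the function name and the PCA details (centering, eigendecomposition) are our own choices.

```python
import numpy as np

def t2_contributions(x, X_train, n_pc=2):
    """Per-variable contributions to the PCA T^2 statistic of sample x.

    Uses the partial-decomposition contribution c_i = x^T D xi_i xi_i^T x
    with D = P S^{-1} P^T, so that the contributions sum to T^2 exactly.
    """
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_pc]
    P = eigvec[:, order]                      # loading matrix (m x n_pc)
    S_inv = np.diag(1.0 / eigval[order])      # inverse score covariance
    D = P @ S_inv @ P.T
    xc = x - mean
    contrib = (xc @ D) * xc                   # c_i = (x^T D)_i * x_i
    return contrib, float(xc @ D @ xc)        # contributions and T^2
```

Variables whose contribution exceeds the average over all variables would then be placed in the candidate set, as described above.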
The specific process of network construction is given in Figure 6:
1. Candidate variables are selected for causality structure construction by PCA contribution plots.
2. The ADF test is conducted to ensure that all variables are stationary.
3. For all stationary variables x_1, x_2, ..., x_n, two variables x_i and x_j, i, j ∈ {1, ..., n}, are selected; the autoregression and union-regression models are established by PLS, and the F statistic is calculated. If the p value of the F statistic falls below the significance level β, there is a causal relationship from x_i to x_j. Such a causality test is conducted on each pair of variables in turn.
4. A causal network structure is constructed and adjusted according to expert knowledge.

Nonoptimal Cause Identification
Before nonoptimal cause identification, we need to calculate the network parameters, that is, the nonoptimal and conditional probabilities. The nonoptimal probability is generally determined by parameter learning with historical data. Since the process data were known, maximum likelihood estimation (MLE) could be used to learn the parameters of the Bayesian network.
MLE is the most popular parameter learning method at present, being effective and suitable for large-scale datasets. The whole process contains optimal and nonoptimal states. Data within the threshold range were considered optimal, and data outside the threshold range were considered nonoptimal. The threshold range of each variable can be determined by kernel density estimation [37], so that each variable can be divided into two states.
Given a sample set D = {u_1, u_2, ..., u_p} containing p variables, each of which contains N samples, u_i = {u_{i1}, u_{i2}, ..., u_{iN}}, i = 1, 2, ..., p, MLE requires sample set D to satisfy the independent-and-identically-distributed hypothesis, so the likelihood function can be written as

L(\Theta \mid D) = P(D \mid \Theta)

If variable u_i takes r_i values, and its parent node \pi_{u_i} has q_i different combinations of values, the unknown parameters can be expressed as \Theta = \{\theta_{ijk} \mid i = 1, \ldots, p;\ j = 1, \ldots, q_i;\ k = 1, \ldots, r_i\}, where \theta_{ijk} represents the probability that u_i takes value k when its parent configuration is j. The goal is to find the parameters that satisfy

\Theta^* = \arg\max_{\Theta} L(\Theta \mid D)

where the logarithmic likelihood function can be expressed as

\log L(\Theta \mid D) = \sum_{i=1}^{p} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} m_{ijk} \log \theta_{ijk}

where m_{ijk} is the number of samples in D with u_i = k and parent configuration j. Maximizing under the constraint \sum_k \theta_{ijk} = 1 yields \hat{\theta}_{ijk} = m_{ijk} / \sum_{k} m_{ijk}.
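Because the closed-form MLE for a CPT reduces to counting, it can be sketched in a few lines. The helper name `mle_cpt` and the restriction to a single binary parent are our own simplifications.

```python
import numpy as np

def mle_cpt(child, parent, n_states=2):
    """Maximum-likelihood CPT estimate: theta_jk = m_jk / sum_k m_jk,
    i.e., the relative frequency of child state k for each parent state j."""
    counts = np.zeros((n_states, n_states))
    for j, k in zip(parent, child):
        counts[j, k] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```

With the variables discretized into two states by the kernel-density thresholds described above, one such table is estimated per node from the historical data.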
Then, the maximum likelihood values are calculated to obtain the CPT of each node. After determining the network structure and parameters of the BN, network inference can be carried out. First, evidence nodes are determined according to expert knowledge and contribution plots. Second, the posterior probability of each variable is calculated with sum-product belief propagation, and the CPT of the Bayesian network is updated. The detailed steps of BN nonoptimal cause identification are as follows:
1. On the basis of PCA contribution plots, select the candidate variable set.
2. Construct the network structure on the basis of PLS-GC.
3. Calculate the conditional probabilities and obtain the CPT.
4. Determine the evidence variable according to industrial field experience and contribution plots. Among the resulting variables in the causal network, the one with the largest contribution is the evidence node. Then, update the network parameters with the belief propagation method and identify the root nonoptimal variable.

Process Description
Benchmark Simulation Model 1 (BSM1) was developed by the International Water Quality Association and the European Cooperation in the field of Scientific and Technical Research, and has been widely used in control simulation and performance evaluation of the wastewater treatment process. The activated sludge model ASM1 and a double-exponential secondary sedimentation tank model were used to simulate the actual wastewater treatment process. The research object of BSM1 is predenitrification biological nitrogen removal technology, which is composed of five activated sludge reaction units and a secondary sedimentation tank. The structure diagram of the model is shown in Figure 7 [38]. The activated sludge reactor comprises two anoxic tanks and three aerobic tanks. Part of the water from the reaction tank flows into the sixth layer of the secondary sedimentation tank, wastewater is discharged from the tenth layer after sedimentation, and the excess sludge is discharged from the bottom layer. In order to evaluate the performance of BSM1 under different operating conditions, the model provides an overall cost index (OCI) [39], which reflects the overall cost of the whole process during operation and is calculated as follows:

OCI = AE + PE + 5 \cdot SP + 3 \cdot EC + ME

where AE is aeration energy consumption, PE is pumping energy consumption, SP is sludge production, EC is external carbon source consumption, and ME is mixing energy consumption.
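Assuming the commonly cited BSM1 weighting (sludge production weighted by 5 and external carbon dosage by 3), the OCI is a simple weighted sum:

```python
def oci(ae, pe, sp, ec, me):
    """Overall cost index of BSM1: sludge production and external carbon
    dosing are weighted more heavily than the three energy terms."""
    return ae + pe + 5.0 * sp + 3.0 * ec + me
```

In the experiments below, this scalar index is what ranks the operating states into good, medium, and poor grades.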

Data Preparation
In order to verify the effectiveness of the proposed method, two training datasets with three performance grades each were simulated on BSM1. The dissolved oxygen concentration of tank 5 was changed in Dataset 1, being set to 2, 3, and 4 gCOD/m³, respectively. The nitrate and nitrite concentration of tank 2 was changed in Dataset 2, being set to 1, 1.5, and 2 gN/m³. According to the OCI of the training data, operating states with a higher overall cost were assigned a worse grade. In this way, the training data were divided into three grades (good, medium, poor).
All experiments were carried out on MATLAB. Process variables of training data are given in Table 1. According to the structural schematic diagram in Figure 7, variable location and relationships are given in Figure 8. In order to render the model more realistic, a delay device with a delay factor of 0.001 was added after the first reaction tank of model BSM1.

Operating Performance Assessment and Nonoptimal Cause Identification
First, performance grades were divided according to the average OCI of each state, as shown in Figure 9. The operating state with the minimal OCI was considered to be of good grade. Then, the mutual information between each variable and the economic indicator was calculated, and process variables were divided into two subspaces. Parameters were set as follows: window width, 10; threshold of mutual information, 0.3; threshold of similarity index, 0.95. The similarity index of the test dataset is given in Figure 10. In order to prove the effectiveness of this method, we conducted a comparative experiment on a test dataset. In this test dataset, the grade of samples 1-192 was good, that of 194-385 was medium, and that of 386-672 was poor. Then, accuracy was calculated according to the known labels of the test dataset. Table 2 shows that CCA-SFA was more effective than CCA or SFA alone. Of these three operating grades, the medium and poor grades were both considered to be nonoptimal. In follow-up experiments, the medium grade was taken as an example for nonoptimal cause identification. The results of the contribution plots are given in Figure 11. The red line represents the average contribution rate, and variables above the average were selected as the candidate sets. According to Figure 11, variable set {1, 4, 6, 15, 21, 26, 27, 28, 29, 30, 33, 34, 35} was selected for Dataset 1, and variable set {1, 6, 7, 9, 12, 17, 22, 28, 29, 30, 31, 32, 33, 34, 35, 36} was selected for Dataset 2. Then, the network structure diagram was established with PLS-GC; the delay factor was set to 2, and the number of latent variables in PLS was set to 6. The results of the two datasets are given in Figure 12, where the x axis represents cause variables, and the y axis represents effect variables. A black square in the i-th row and j-th column indicates that the variable in the j-th column is the Granger cause of the variable in the i-th row. As a comparison, the result of multivariate conditional Granger causality analysis is shown in Figure 13.
By introducing PLS into Granger causality analysis, many indirect causalities that may lead to complicated causality networks are eliminated. According to Figure 12, the corresponding causal networks are shown in Figures 14 and 15. In order to show the accuracy of causality identification, the causality diagram according to the process mechanism is given in Figure 16. As shown in Figures 14 and 15, the causalities in the learned networks exist, directly or indirectly, in Figure 16. With multivariate conditional Granger causality analysis, the causal structure is much more complex, and there are many cyclic structures (e.g., 1 → 30 → 4 → 1 in Figure 13a). Due to the decoupling effect of PLS, the causality networks that we obtained were both directed acyclic graphs. Then, in order to find the root-cause variable, the causality relationships in Figures 14 and 15 were input into the Bayesian network.
After constructing the causal network, we calculated the initial probabilities with MLE. Then, an evidence node was selected. According to knowledge of the process mechanism, the effluent variables were closely related to AE and PE, and thus to process operation quality. Therefore, the effluent variable with the highest contribution rate was considered to be the evidence node. The effluent variables were variables 29 and 31-36. Then, according to Figure 11, the evidence node of Dataset 1 was variable 27, and the evidence node of Dataset 2 was variable 33. We set the probability of the evidence node to 1 and updated the network with a belief propagation algorithm. The results of Bayesian network inference are shown in Figures 17 and 18, where the red part represents the probability of a nonoptimal state, and the blue part represents the probability of an optimal state. Detailed data are given in Tables 3 and 4. According to Table 3, after being updated, the probability of variable 21 changed from 0.83 to 0.9943. According to the result of PLS-GC, variable 21 had no parent node; from the perspective of the mechanism, however, it had an indirect causal relationship with variable 1. This causal relationship is marked with a dashed line in Figure 14, but variable 1 showed no obvious change in probability after the update. Therefore, the root cause was variable 21, S_O,5. The cause of the nonoptimal state was identified correctly.
For Dataset 2, the results could be analyzed in the same way. The evidence node was variable 33, which had four parent nodes: variables 12, 29, 31, and 32. After the update, only the probability of variable 12 greatly increased, to 0.9730. The parent node of variable 12 was variable 9, whose probability increased to 0.9960. Therefore, the root cause was variable 7, S_NO,2. The nonoptimal root cause of Dataset 2 was identified correctly. We compared our result with that of the traditional contribution-plot method. Figure 11 shows that the variables with the largest contribution rates were variables 21 and 33. Only the root cause of Dataset 1 was correctly identified by contribution plots, while variable 33 was not the root cause of Dataset 2. This comparison shows that the proposed method could identify the root cause more reliably.

Conclusions
In previous works, it was difficult to determine the causal relationship between variables due to the complex variable coupling relationships in industrial processes. In this work, a complete framework from operating performance assessment to nonoptimal cause identification was established. This scheme could handle strong coupling between process variables by using PLS in GC. We also employed BN to identify the root cause according to evidence nodes, which could also find the transmission path of the nonoptimal cause in the causal network. This root-cause identification method can help field operators in taking corresponding measures to improve current operating performance. As a case study, we tested its effect on BSM1. According to the results, the proposed method could correctly identify the root cause.
However, this method still has a few limitations to be improved. On the one hand, the methods used to transform the causal structure into an acyclic graph may render the nonoptimal transmission path incomplete. This problem can be mitigated by using methods suitable for cyclic problems, such as the dynamic causality diagram. On the other hand, the process of root-cause identification still needs knowledge of the process mechanism to assign an evidence node. It may be helpful to design an evaluation index to select evidence nodes. In the future, we aim to conduct more in-depth studies on these aspects.
Author Contributions: Methodology, Y.W.; software, Y.W.; validation, Y.W.; resources, W.Z.; data curation, D.Y. and Y.W.; writing-original draft preparation, Y.W.; writing-review and editing, D.Y., X.P. and H.C.; supervision, W.Z. and H.C.; project administration, X.P. and W.Z.; funding acquisition, X.P. and W.Z. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: Data used in this paper were generated by BSM1, which can be found at http://iwa-mia.org/benchmarking/.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: