A Novel Ensemble Adaptive Sparse Bayesian Transfer Learning Machine for Nonlinear Large-Scale Process Monitoring

Process monitoring plays an important role in ensuring the safety and stable operation of equipment in a large-scale process. This paper proposes a novel data-driven process monitoring framework, termed the ensemble adaptive sparse Bayesian transfer learning machine (EAdspB-TLM), for nonlinear fault diagnosis. The proposed framework has the following advantages: Firstly, the probabilistic relevance vector machine (PrRVM) under Bayesian framework is re-derived so that it can be used to forecast the plant operating conditions. Secondly, we extend the PrRVM method and assimilate transfer learning into the sparse Bayesian learning framework to provide it with the transferring ability. Thirdly, the source domain (SD) data are re-enabled to alleviate the issue of insufficient training data. Finally, the proposed EAdspB-TLM framework was effectively applied to monitor a real wastewater treatment process (WWTP) and a Tennessee Eastman chemical process (TECP). The results further demonstrate that the proposed method is feasible.


Introduction
Due to the increasing diversification of industrial demand, the combination of processes and equipment results in increasingly complex system structures. Therefore, if the operation status of a plant cannot be monitored comprehensively and efficiently, it may not only cause serious economic losses [1], but also irreversible damage to social communities. Timely detection and prediction of faults has therefore become a focus of attention in academia and industry [2][3][4]. Recently, data-driven process monitoring has developed into the most effective "whistleblower" for extreme or abnormal events in a plant. This is because a data-driven process monitoring method does not need an accurate mechanism model; rather, it uses a data-driven model to establish a global monitoring scheme for complex large-scale industrial processes [5]. Moreover, data-driven monitoring methods have been successfully applied in many different scenarios [1,[6][7][8][9][10][11]. Liu et al. proposed a variational Bayesian principal component analysis (PCA) model to effectively monitor a wastewater treatment process (WWTP) [6]. Ge et al. proposed a two-step information extraction strategy to monitor a Tennessee Eastman chemical process (TECP) [7]. Zhu et al. proposed a novel two-step probabilistic independent component analysis-probabilistic PCA (PICA-PPCA) strategy to improve the robustness of the traditional method [8]. However, the above-mentioned data-driven methods ignore some practical issues, such as insufficient training data. In the proposed framework, the weight of a misdetected sample will be increased to ensure more attention is paid to its optimization in the next iteration. In each iteration period, the updated data are used to train a new PrRVM detection model.
Note that the data collected by the process industries (WWTP and TECP) are not designed for transfer learning. Therefore, the dataset must be split before executing the corresponding strategy. Firstly, the real-time collected TD dataset is split into two components: the first component is the labeled target domain (LTD) dataset, which is defined as the training dataset. The second component is the unlabeled target domain (ULTD) dataset, which is defined as the real-time testing dataset. The SD dataset is the auxiliary training dataset, which is the out-of-date dataset. Then, the SD dataset and LTD dataset are updated by adaptive boosting technology and transfer learning. To summarize, we propose a modified version of a PrRVM for fault diagnosis that can enable a high quantitative fault diagnosis performance in the design process. Additionally, transfer learning is embedded in the PrRVM to solve the problem of insufficient training data. The ensemble monitoring model constructed using two-layer iteration (weight iteration and hyperparameter iteration) with the ensemble rule is termed the ensemble adaptive sparse Bayesian transfer learning machine (EAdspB-TLM). Finally, key performance indicators (KPIs) are used to evaluate the performance of different methods.
The paper is organized as follows: Section 2 presents the basic theory of the approach. Section 3 provides a detailed formula derivation of the EAdspB-TLM. In Section 4, the EAdspB-TLM is used to monitor different types of faults, and the experiment results are discussed and analyzed. Finally, the paper ends with conclusions in Section 5.

Transfer Learning
The purpose of transfer learning is to gain knowledge from an environment (source domain) to help the learning task in a new environment (target domain) [23]. To facilitate the subsequent use of transfer learning algorithms, the general symbols related to transfer learning are defined as follows: (1) Detection model Φ: X −→ Y , where X represents the training data or testing data. Y represents the corresponding sample label. In this study, the premise is to assume that the training data are not sufficient to train a reliable detection model Φ.
(2) Domain (D): A domain is represented by D = {X, P(X)}, where X = {x_1, ..., x_n} ⊂ X and X is a feature space. D_s = {X_s, P(X_s)} is the source domain (SD). D_t = {X_t, P(X_t)} = D_t1 ∪ D_t2 is the target domain (TD). D_t1 and D_t2 are the LTD and ULTD, respectively. In this paper, LTD data are used as the training data and ULTD data as the testing data.
(3) Task (T): A task is represented by T = {Y, f(·)}, where Y ∈ {0, 1} is the sample label and f(·) is the corresponding prediction function, f(X) = P(Y|X). Its task is to minimize the deviation between the predicted label and the real label Y.
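The domain split described above can be sketched in a few lines of Python. The function name, the toy data shapes, and the random LTD/ULTD split are our own illustration, not code from the paper:

```python
import numpy as np

def split_domains(X_src, y_src, X_tgt, y_tgt, pr=0.05, seed=0):
    """Split target-domain data into a small labeled part (LTD, training)
    and an unlabeled part (ULTD, testing); the out-of-date source domain
    (SD) serves as auxiliary training data.  `pr` is the proportion
    PR = n_LTD / n_SD used in the experiments."""
    rng = np.random.default_rng(seed)
    n_ltd = max(1, int(round(pr * len(X_src))))   # PR = n_LTD / n_SD
    idx = rng.permutation(len(X_tgt))
    ltd, ultd = idx[:n_ltd], idx[n_ltd:]
    train_X = np.vstack([X_src, X_tgt[ltd]])      # SD + LTD = training set
    train_y = np.concatenate([y_src, y_tgt[ltd]])
    return train_X, train_y, X_tgt[ultd], y_tgt[ultd]

# toy demo: 200 SD samples, 100 TD samples, PR = 5%
Xs, ys = np.ones((200, 3)), np.zeros(200)
Xt, yt = np.zeros((100, 3)), np.ones(100)
trX, trY, teX, teY = split_domains(Xs, ys, Xt, yt, pr=0.05)
print(trX.shape, teX.shape)   # (210, 3) (90, 3)
```

Note that, as in the paper, the LTD and ULTD samples come from the same (target) distribution, while the SD samples may follow a shifted distribution.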

Sparse Bayesian for Fault Diagnosis
The essence of data-driven fault diagnosis is to identify the running state of the equipment. Corresponding labels can be set for different running states; for example, the data label of the normal working condition is set to 0, and the data label of the fault state is set to 1. The fault diagnosis model in the sparse Bayesian framework (PrRVM) is then equivalent to a supervised classifier. In this study, the PrRVM is a sparse model with probabilistic output. Suppose the training dataset is {(x_i, y_i)}_{i=1}^n, where x_i is the input data and y_i ∈ {0, 1} is the corresponding label. The prediction formula of the PrRVM can be expressed as follows: y(x_j) = Σ_{i=0}^{n} w_i f_i(x_j) + ε, where w = [w_0, w_1, ..., w_n]^T is the weight vector and ε represents the additive noise, with ε ~ N(0, σ²).
f_i(x_j) = k(x_j, x_i) is the kernel function, which maps low-dimensional non-separable data to a high-dimensional space. When the weight vector w and the variance σ² are known, the label vector y = [y_1, y_2, ..., y_n]^T can be derived using the corresponding probability expression; according to [17], Ψ is the n × (n + 1) "design" matrix. w and σ² can be estimated by expectation maximization, but this is subject to over-fitting [17]. To avoid over-fitting, a common approach is to impose additional constraints on the parameters. We use a Bayesian strategy and define an explicit prior probability distribution to "constrain" the parameters. The zero-mean Gaussian prior distribution on the weight vector w can be expressed as follows: p(w | a) = Π_{i=0}^{n} N(w_i | 0, a_i^{-1}), where a = [a_0, a_1, ..., a_n] is the hyperparameter vector. w and σ² can be further solved by Bayesian inference and Bayes' rule. Here, we first assume that w and σ² are known, and then derive the solution formula of the classification problem. Firstly, the logistic sigmoid function σ(z) = (1 + e^{−z})^{−1} is introduced. Assuming that the data obey a Bernoulli distribution, the corresponding likelihood function can be expressed as follows: p(y | w) = Π_{i=1}^{n} σ(y(x_i))^{y_i} [1 − σ(y(x_i))]^{1−y_i}. When the predicted value y = 1, it indicates that the system is out of control.
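The design matrix and Bernoulli likelihood above can be made concrete with a short Python sketch. The Gaussian kernel form and its width are our own assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def design_matrix(X, width=0.6):
    """n x (n+1) 'design' matrix Psi: a bias column plus Gaussian kernels
    k(x_j, x_i) = exp(-||x_j - x_i||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([np.ones((len(X), 1)), K])

def bernoulli_log_likelihood(w, Psi, y):
    """log p(y | w) = sum_i [ y_i log s_i + (1 - y_i) log(1 - s_i) ],
    with s_i = sigmoid(Psi_i w)."""
    s = sigmoid(Psi @ w)
    eps = 1e-12                      # guard against log(0)
    return float(np.sum(y * np.log(s + eps) + (1 - y) * np.log(1 - s + eps)))

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0, 0, 1])
Psi = design_matrix(X)               # shape (3, 4): bias + 3 kernel columns
w = np.zeros(Psi.shape[1])
print(Psi.shape, round(bernoulli_log_likelihood(w, Psi, y), 4))
```

With w = 0 every sigmoid output is 0.5, so the log-likelihood reduces to 3 log(0.5), a useful sanity check.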

Adaptive Boosting Technology and Transfer Learning
The proposed process monitoring framework is shown in Figure 1. The adaptive sparse Bayesian transfer learning machine is mainly composed of two components. The first component is the adaptive boosting technology in the transfer learning framework, and the second component is the PrRVM fault diagnosis model in the Bayesian framework. The first part was proposed by Dai et al. [24] and named the TrAdaBoost algorithm. In this paper, TrAdaBoost is used to assign the data weights. Before the algorithm is implemented, suppose that the following symbols represent the divided data and labels. SD data: X_s ∈ R^{p_s × n_s}, with label Y_s ∈ R^{1 × n_s}. LTD data: X_t1 ∈ R^{p_t × n_t1}, with corresponding label Y_t1 ∈ R^{1 × n_t1}. ULTD data: X_t2 ∈ R^{p_t × n_t2}. n_s and p_s represent the source domain sample number and the corresponding monitored variable number, respectively. n_t1 and n_t2 represent the numbers of samples of the LTD and ULTD, respectively. p_t is the number of monitored variables in the target domain, and p_t = p_s. The procedure of the TrAdaBoost algorithm can be derived as follows:


Firstly, initialize the weight vector τ^1 = (τ_1^1, ..., τ_{n_s+n_t1}^1). Secondly, call the detection model (PrRVM) and, according to the detection results, update the corresponding data weights as follows. Based on [24], set β = 1/(1 + sqrt(2 ln n_s / L)), where L is the number of iterations. In iteration j, re-normalize the weights of the SD data and LTD data: p^j = τ^j / Σ_{i=1}^{n_s+n_t1} τ_i^j. The sub-PrRVM (ϕ_j) is trained using the data with the weight distribution p^j. Then, return the detection model ϕ_j: X → Y and calculate the error of ϕ_j on X_t1: ε_j = Σ_{i=n_s+1}^{n_s+n_t1} τ_i^j |ϕ_j(x_i) − y_i| / Σ_{i=n_s+1}^{n_s+n_t1} τ_i^j. The change parameter β_j of X_t1 is obtained as β_j = ε_j / (1 − ε_j). Then, the weight vector is updated: τ_i^{j+1} = τ_i^j β^{|ϕ_j(x_i) − y_i|} for SD samples (1 ≤ i ≤ n_s) and τ_i^{j+1} = τ_i^j β_j^{−|ϕ_j(x_i) − y_i|} for LTD samples (n_s + 1 ≤ i ≤ n_s + n_t1), so that misdetected SD samples are down-weighted and misdetected LTD samples are up-weighted. Finally, L sub-detection models (ϕ*_1, ϕ*_2, ..., ϕ*_L) are obtained through L iterations of the whole process. In this paper, the common formulas are presented; the corresponding rigorous theoretical proofs are provided in previous research papers. For example, the weight distribution formula refers to the Hedge(β) theorem [25]. The proof of β = 1/(1 + sqrt(2 ln n_s / L)) can be found in [26].
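The weight iteration above can be sketched as follows. This is our own minimal Python rendering of a TrAdaBoost-style loop [24]: the base learner is an assumed placeholder passed in as `train_fn` (standing in for the sub-PrRVM), and the error clipping is our addition to keep β_j well-defined:

```python
import numpy as np

def tradaboost_weights(train_fn, Xs, ys, Xt1, yt1, L=10):
    """TrAdaBoost-style weight iteration (a sketch).  `train_fn(X, y, p)`
    returns a predictor fitted with sample weights p.  Misdetected SD
    samples shrink by the fixed beta; misdetected LTD samples grow by
    beta_j = eps_j / (1 - eps_j)."""
    ns, nt1 = len(Xs), len(Xt1)
    X = np.vstack([Xs, Xt1]); y = np.concatenate([ys, yt1])
    tau = np.ones(ns + nt1)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(ns) / L))
    models, betas = [], []
    for _ in range(L):
        p = tau / tau.sum()                      # normalized weight distribution
        phi = train_fn(X, y, p)
        err = np.abs(phi(X) - y)                 # per-sample 0/1 error
        eps = (tau[ns:] * err[ns:]).sum() / tau[ns:].sum()
        eps = min(max(eps, 1e-10), 0.49)         # keep beta_j in (0, 1)
        bj = eps / (1.0 - eps)
        tau[:ns] *= beta ** err[:ns]             # down-weight bad SD samples
        tau[ns:] *= bj ** (-err[ns:])            # up-weight hard LTD samples
        models.append(phi); betas.append(bj)
    return models, betas, tau

def stump(X, y, p):
    """Trivial weighted threshold learner, used here only as a stand-in."""
    thr = np.average(X[:, 0], weights=p)
    return lambda Z: (Z[:, 0] > thr).astype(float)

Xs = np.array([[0.], [1.], [4.], [5.]]); ys = np.array([0, 0, 1, 1])
Xt1 = np.array([[0.5], [4.5]]); yt1 = np.array([0, 1])
models, betas, tau = tradaboost_weights(stump, Xs, ys, Xt1, yt1, L=5)
print(len(models))   # 5
```

In the paper's framework the stand-in learner would be replaced by the weighted PrRVM of Section 3.2.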

Adaptive Probabilistic Relevance Vector Machine
In this section, the evolution steps of the adaptive PrRVM within the joint framework of transfer learning and sparse Bayesian learning are further deduced. According to Section 2.2, we can obtain the probability derivation process of p(y | w, σ²). In the derivation process, w and σ² need to be updated in each training process. Therefore, the iterative process of w and σ² in the transfer learning framework is re-defined. Assume that the posterior probability of w, σ², and a can be expressed as p(w, σ², a | y).
According to Bayesian inference, p(w, σ², a | y) can be further decomposed as follows: p(w, σ², a | y) = p(w | y, σ², a) p(σ², a | y). The solution of the unknown parameters w, σ², and a depends on p(w | y, σ², a) and p(σ², a | y). For the classification problem, the posterior probability of the weight w cannot be calculated directly. Here, we assume that the hyperparameter vector a is known, and p(w | y, σ², a) can be further derived. To facilitate the subsequent derivation, we omit the indirect relationship between variables. According to the Bayes rule and the Markov property, we obtain p(w | y, σ², a) ∝ p(y | w, σ²) p(w | a). In addition, we can further deduce p(σ², a | y) ∝ p(y | σ², a) p(σ²) p(a). Because we cannot directly solve p(w | y, σ², a) and p(σ², a | y), we instead solve p(y | σ², a) and p(y | w, σ²) to derive the desired result. When the hyperparameter vector a is fixed, Newton's method can be used to maximize p(y | w, σ²) p(w | a); the objective is a penalized logistic log-likelihood function and necessitates iterative maximization [17]. The second-order Newton method is used to optimize the target function, noting that log p(w | y, σ², a) ∝ log[p(y | w, σ²) p(w | a)]. Based on [27], we take the derivative with respect to w. Assuming that the solved extreme point is w_MP, the second derivative with respect to w can be expressed as ∂² log p(w | y, σ², a)/∂w² |_{w_MP} = −(Ψ^T H Ψ + Λ), where H = diag(σ(Φ_1)(1 − σ(Φ_1)), ..., σ(Φ_n)(1 − σ(Φ_n))) and Λ = diag(a_0, a_1, ..., a_n). Based on [28], the covariance matrix Σ and −Ψ^T H Ψ − Λ can be linked as follows: Σ = (Ψ^T H Ψ + Λ)^{−1}. It can be seen that the Laplace approximation effectively maps the classification problem to a regression problem with data-dependent noise [29], with the inverse noise variance for ε given by σ(Φ_i)(1 − σ(Φ_i)). In addition, according to ∂/∂w log p(w | y, σ², a) |_{w_MP} = 0 and Σ, w_MP can be further derived as w_MP = Σ Ψ^T H ŷ. Next, we can iteratively update the hyperparameter vector a by fixing Σ and w_MP.
According to the relation p(σ², a | y) ∝ p(y | σ², a) p(σ²) p(a), we only need to further simplify log p(y | σ², a) and then repeat the previous derivation steps. The following relation can be obtained: u = Σ Ψ^T H ŷ = w_MP. This relation can be further converted into the marginal-likelihood form used to re-estimate the hyperparameters in the next section.
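The Laplace-approximation step above (Newton iterations on the penalized logistic log-likelihood, yielding w_MP and Σ = (Ψ^T H Ψ + Λ)^{−1}) can be sketched in Python. This is our own minimal implementation under the stated formulas, not the paper's code; the toy design matrix is assumed:

```python
import numpy as np

def laplace_posterior(Psi, y, alpha, n_iter=25):
    """Find the posterior mode w_MP and covariance Sigma of the weights
    by Newton iterations:
        grad = Psi^T (y - s) - Lam w,   Hess = -(Psi^T H Psi + Lam),
    with s = sigmoid(Psi w), H = diag(s (1 - s)), Lam = diag(alpha)."""
    Lam = np.diag(alpha)
    w = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        s = 1.0 / (1.0 + np.exp(-Psi @ w))
        H = np.diag(s * (1.0 - s))
        g = Psi.T @ (y - s) - Lam @ w
        A = Psi.T @ H @ Psi + Lam
        w = w + np.linalg.solve(A, g)          # Newton step
    Sigma = np.linalg.inv(A)                   # (Psi^T H Psi + Lam)^{-1}
    return w, Sigma

# tiny separable demo: a bias column plus one feature column
Psi = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w_mp, Sigma = laplace_posterior(Psi, y, alpha=np.array([1.0, 1.0]))
print(w_mp.shape, Sigma.shape)
```

With the labels increasing along the feature, the fitted slope w_mp[1] comes out positive, as expected.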

Updating the Weight Vector and Sparse Analysis
In this section, the weight w is defined as the "hidden" variable, and the general expectation maximization (EM) algorithm is selected accordingly. EM mainly includes an expectation (E) step and a maximization (M) step. The adaptive PrRVM derived in this paper is used for classification. Assuming that ε is random additive noise, when the output satisfies Σ_{i=0}^{n} w_i f_i(x_j) + ε ≥ 0, the corresponding label is y_j = 1; otherwise, y_j = 0. The probit model can be presented as follows: P(y_j = 1 | x_j) = Φ_N(Σ_{i=0}^{n} w_i f_i(x_j) / σ), where Φ_N(·) denotes the standard normal cumulative distribution function. The probability derivation of the weight vector w can be expressed as p(w | y, σ², a) ∝ p(y | w, σ²) p(w | a). The corresponding log-posterior is log p(w | y, σ², a) = log p(y | w, σ²) + log p(w | a).
Suppose that the hyperparameter at time t is denoted as a_i^(t). According to [30], define a Q function Q(w^t, w^{t+1}) = log p(w^t | y^{(t+1)}, σ²^{(t)}, a^{(t)}), from which the expectation step can be derived. In the maximization step, we update a^{(t+1)} at time t + 1 through w^t. Calculating the partial derivative of Q(·) and setting it to zero, we obtain a_i^{new} = 1/(u_i² + Σ_ii). This illustrates that the hyperparameters can be updated adaptively with available new inputs. In addition, during the update process, it is found that some a_i^{new} will approach infinity. At this point, automatic relevance determination (ARD) can be used to update the corresponding u and Σ. When a_i^{new} approaches infinity, ARD makes the corresponding u_i and Σ_ii equal to zero [20], and w_i is updated to zero. In this way, the matrix becomes sparse. Finally, it is assumed that the parameter probability estimation of the adaptive PrRVM is expressed by the following symbols: the weight parameter w* = [w*_0, w*_1, ..., w*_n] and Λ* = diag(a*_0, a*_1, ..., a*_n), with rank(Λ*) < n + 1.
The iteratively updated sparse covariance matrix Σ* = (Ψ^T H Ψ + Λ*)^{−1} and the final prediction label y* can then be obtained.
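The ARD re-estimation and pruning step can be illustrated with a short Python sketch. The pruning threshold used to approximate "a_i → ∞" is our own assumption:

```python
import numpy as np

def ard_update(u, Sigma, prune_at=1e6):
    """EM-style hyperparameter re-estimation with automatic relevance
    determination (a sketch): a_i_new = 1 / (u_i^2 + Sigma_ii).
    Weights whose hyperparameter diverges are pruned (u_i set to zero),
    which is what makes the final PrRVM sparse."""
    alpha_new = 1.0 / (u ** 2 + np.diag(Sigma))
    pruned = alpha_new > prune_at            # proxy for a_i -> infinity
    u = np.where(pruned, 0.0, u)             # w_i -> 0 when a_i -> infinity
    return alpha_new, u, pruned

# toy posterior mean and covariance: the middle weight is negligible
u = np.array([1.2, 1e-5, -0.8])
Sigma = np.diag([0.1, 1e-7, 0.2])
alpha, u_new, pruned = ard_update(u, Sigma)
print(pruned.tolist())   # [False, True, False]
```

In a full implementation this update would alternate with the Laplace step until the set of retained (relevance) vectors stabilizes.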

Ensemble Detection Model and Key Performance Indicator
The finite set of adaptive sparse Bayesian transfer learning machines ϕ*_1, ϕ*_2, ..., ϕ*_L can be derived as in Sections 3.1 and 3.3. Moreover, effective system decision making needs to consider the detection results of multiple adaptive sparse Bayesian transfer learning machines simultaneously. Based on [24], the ensemble detection model is constructed by combining the outputs of the last ⌈L/2⌉ sub-models, weighted by their change parameters β_j. Once the ensemble detection model is obtained, it is necessary to verify the performance of the model. KPIs are the critical decision tools for evaluating method performance; they are quantifiable, results-based statements. In this study, the missed alarm rate (MAR), false alarm rate (FAR), accuracy, and pre-alarm rate (PAR) were carefully selected as KPIs. The corresponding formulas are as follows: FAR = Fr(alarm | normal), MAR = Fr(no alarm | fault), and Accuracy = (TP + TN)/(TP + TN + FP + FN); the PAR is constructed by combining the false alarm and missed alarm indicators with a weight parameter in [0, 1] [12]. Note that "normal" is the fault-free condition and Fr(·) represents the conditional frequency [12]. TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
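The four KPIs above can be computed with a few lines of Python. This is our own sketch: the paper leaves the PAR weight symbol unnamed, so the name `eta` and the equal default weighting are assumptions:

```python
import numpy as np

def kpis(y_true, y_pred, eta=0.5):
    """Compute FAR = Fr(alarm | normal), MAR = Fr(no alarm | fault),
    Accuracy = (TP + TN) / N, and PAR as a weighted combination of FAR
    and MAR (weight `eta` in [0, 1] is our assumed notation)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    normal, fault = (y_true == 0), (y_true == 1)
    far = np.mean(y_pred[normal] == 1)       # false alarms on normal data
    mar = np.mean(y_pred[fault] == 0)        # missed alarms on faulty data
    acc = np.mean(y_pred == y_true)
    par = eta * far + (1 - eta) * mar
    return far, mar, acc, par

# toy labels: 4 normal samples, 4 fault samples
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]
far, mar, acc, par = kpis(y_true, y_pred)
print(far, mar, acc, par)   # 0.25 0.25 0.75 0.25
```

Lower FAR, MAR, and PAR and higher accuracy indicate better monitoring performance, matching how the tables in Sections 4 and 5 are read.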

Experimental Design and Compared Approaches
In this section, the dataset splitting steps are introduced in detail. Firstly, the SD data are filtered according to the LTD data, in such a way that the SD data and LTD data have the same types of labels. The data splitting is shown in Figure 2. The LTD data have the same distribution as the ULTD data. In contrast to the previous transfer learning, we combine the SD data and LTD data to form a new training set, and use the ULTD data as a testing set. The proportion of LTD data is 1%-10%. The "proportion" formula is defined as PR = n_LTD / n_SD, where n_LTD is the number of labeled samples in the target domain and n_SD is the number of samples in the source domain. Moreover, the main aim of the experiment is to monitor a single fault of the system; multiple fault cases can be expanded accordingly. To verify the proposed method, traditional statistical methods and transfer learning methods were used to monitor a chemical plant and a WWTP simultaneously. The traditional statistical methods PCA-T² [31], SVM [32], and RVM [17] were trained using LTD data. RVMt and the proposed method were trained using the SD data and LTD data simultaneously. In this study, the proposed EAdspB-TLM framework was used to monitor the TE chemical plant and a full-scale wastewater treatment plant (WWTP). The main tools used in the study were a personal computer (PC), MATLAB R2016a, SigmaPlot 12, and office software. The parameters of the PC are a CPU Intel Core i7-6700HQ, 8 GB RAM, and 1 TB SSD. The data are from a TE simulation platform and a real WWTP.

Case Study on the Tennessee Eastman Chemical Process

Background
The Tennessee Eastman chemical process (TECP) was designed by a chemical company as a benchmark for testing process control and diagnosis methods. As shown in Figure 3, the TECP consists of five core units: reactor, compressor, stripper, condenser, and separator. The process includes measured variables and manipulated variables. There are four gaseous reactants (A, C, D, E), an inert component (B), and two liquid products (G and H). The reactions are as follows: A + C + D → G, A + C + E → H, A + E → F, and 3D → 2F, where F is the byproduct in the reactor, and the process is irreversible. More detailed reaction information on the TECP can be found in [33]. Moreover, the simulation program and operation introduction can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html#Basic_TE_Code. According to [33], 52 observation variables were selected for process monitoring depending on their process importance. Firstly, the platform started with a 25 h steady state. Then, the simulation ran for 97 h in each case. The sampling time was set to 3 min. The source domain dataset resulted from the initial 59 h of simulation. The corresponding dataset started with a normal working condition, but with faults being imposed after 39 simulation hours. Target domain (TD) data were collected from the simulation period of 59-97 h. The TD data mainly include two parts: the LTD and ULTD datasets. In this study, the ULTD dataset is defined as the testing dataset.
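Under our reading of this timeline (3-min sampling, SD covering the first 59 h with faults imposed after hour 39, and TD covering hours 59-97), the sample-index boundaries work out as in the following sketch; the arithmetic is our own illustration:

```python
# 3-min sampling => 20 samples per simulation hour
SAMPLES_PER_HOUR = 60 // 3

sd_end   = 59 * SAMPLES_PER_HOUR    # end of the source-domain segment
fault_at = 39 * SAMPLES_PER_HOUR    # first faulty sample inside the SD segment
td_end   = 97 * SAMPLES_PER_HOUR    # end of the target-domain segment

# SD spans samples [0, sd_end); TD spans [sd_end, td_end)
print(sd_end, fault_at, td_end - sd_end)   # 1180 780 760
```

The TD segment (760 samples) is then further split into the LTD training part and the ULTD testing part according to the PR proportion of Section 4.1.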

Analysis and Discussion of Experimental Results
EAdspB-TLM differs from previous modeling methods in that it has the abilities of adaptive adjustment and transfer learning. To verify the performance of the proposed EAdspB-TLM, five fault cases were used; the fault type descriptions are shown in Table 1. Simultaneously, according to engineering experience and cross-validation, the basic parameters of the proposed method were set by trial and error: the kernel function is Gaussian with a width parameter of 0.6, and the maximum number of iterations and the period are 1000 and 100, respectively. In this study, step and sticking faults are the most noteworthy among the above five fault types. When the external disturbance is strong, it easily causes step faults of the sensor or other equipment. Table 2 presents the monitoring results for Fault 1. In addition to the transfer learning methods of EAdspB-TLM, unsupervised statistical (PCA) and supervised statistical (SVM, RVM) methods are also presented. It is worth noting that the step fault in the D feed temperature is difficult to monitor: because the abnormality is not obvious, most monitoring models cannot effectively detect this fault [3,34]. According to the experimental results, when PR = 8%, the detection accuracy of EAdspB-TLM is the highest among the five methods, with an accuracy of 87.2%; when PR = 3%, the accuracy of EAdspB-TLM is only 85.48%. In addition, the PAR of EAdspB-TLM is the lowest among the five methods. For example, when PR = 8%, the PAR of EAdspB-TLM is 13.52%, whereas the PARs of PCA-T² and RVM are 49.06% and 16.83%, respectively. Moreover, the missed alarm rate of EAdspB-TLM is the lowest among the five methods. This shows that the proposed EAdspB-TLM is effective. Fault 5 relates to a control problem with the reactor cooling water valve, which is a common sticking fault in the TECP. The reactor is an important component in the normal operation of the chemical plant.
Once the fault occurs, other components (reactor, compressor, etc.) will not function normally. Therefore, it is imperative to monitor Fault 5 in real time. The monitoring results for Fault 5 are tabulated in Table 3. When the PR value increased from 3% to 8%, the detection accuracy of EAdspB-TLM improved from 93.22% to 96.88%, which is much higher than that of the other four methods. The detection accuracy of EAdspB-TLM for Fault 5 is also much higher than that for Fault 1, indicating that the complexity of Fault 1 is higher than that of Fault 5. Overall, the performance of the monitoring methods improves as the PR value increases. Table 4 shows the average false and missed alarm rates of the five methods, that is, the averages over all faults in monitoring the TE chemical process. In terms of false alarms, when PR increases from 3% to 8%, the false alarms of EAdspB-TLM decrease from 2.2% to 1.62%, those of RVM decrease from 21.6% to 5.12%, and those of SVM decrease from 32.02% to 19.06%. This shows that as PR increases, the false alarms of the methods gradually decrease. In addition, the average values of the two comprehensive KPIs are shown in Figure 4: the pre-alarm rate is shown in Figure 4a, and the fault diagnosis accuracy is shown in Figure 4b. It can be seen that as the PR value increases, the fault diagnosis accuracy of the five monitoring methods gradually increases. In addition, the EAdspB-TLM method has the highest fault diagnosis accuracy; when the PR value increases to 8%, the accuracy of EAdspB-TLM is 94.61%.

Background
In this case study, the proposed method was used to monitor a real full-scale WWTP. The plant serves a population of 480,000, with a daily treatment flow of 170,000 m 3 and a hydraulic retention time of 16.5 h. A long solid residence time (SRT) is used to achieve good nitrogen removal performance, and is typically maintained at 15-22 days. The schematic of the WWTP is shown in Figure 5. It is mainly composed of three components: selector, oxidation ditch, and secondary settler. Due to external disturbances, such as weather, temperature, and sludge activity, the filamentous sludge bulking occurs frequently and is difficult to monitor online in real time. The data were collected from 1 September to the following 31 March. Fifteen observation variables were selected as modeling variables. The sampling interval was one day and filamentous sludge bulking occurred during this period. The source domain dataset is based on the first samples of 110 days. This dataset starts with normal working conditions, but with faults occurring after 70 sample days.


Analysis and Discussion of Experimental Results
Filamentous sludge bulking is a type of drift fault [35]. In contrast to an abrupt fault, sludge bulking may return to normal through the self-regulation of microorganisms in the early stage [36]. During this period, the abnormality is less obvious (Figure 6). Figure 6a shows the dynamic trend of BOD5 (the five-day biochemical oxygen demand), and Figure 6b shows the curve of the sludge volume index (SVI). These indicators can be used to determine whether sludge bulking occurs in the WWTP, but the corresponding experiments are time-consuming; therefore, real-time monitoring of the WWTP cannot be effectively implemented with them. In addition, consecutive filamentous sludge bulking will cause secondary pollution to the environment. Therefore, it is desirable to design an effective method for real-time monitoring of sludge bulking in WWTPs.

As with the TECP case, the data-splitting technology of Section 4.1 was used to split the data accordingly. Then the proposed method and the other four methods were simultaneously used to monitor the wastewater treatment process; the false alarm rate, missed alarm rate, accuracy, and pre-alarm rate of the five methods are tabulated in Table 5.
Because sludge bulking is a slow drift fault, false and missed alarms become more obvious. When PR = 8%, the false alarm rate of PCA-T² is 11.11%, and the FARs of the SVM and RVM are both 1.85%; in comparison, the false alarm rate of EAdspB-TLM is zero. However, EAdspB-TLM is not the best in terms of missed alarms: the missed alarm rate of RVMt is lower than that of EAdspB-TLM. This unconventional result implies that EAdspB-TLM may not always be optimal on a single indicator. Thus, we further explore the effectiveness of EAdspB-TLM using the comprehensive KPIs (PAR and accuracy). According to Table 5 and Figure 7, the EAdspB-TLM-based pre-alarm rate is the lowest among the five methods (Figure 7a). For example, when PR = 8%, the PAR of EAdspB-TLM is 4.62%, whereas the PARs of the comparison methods RVMt and SVM are 5.19% and 13.05%, respectively. In addition, when the PR value increased from 3% to 8%, the pre-alarm rate of EAdspB-TLM decreased from 5.09% to 4.62%. Based on the above analysis, we can conclude that the performance of EAdspB-TLM is the best among the five monitoring methods. At the same time, as the PR value increases, the performance of all five methods improves. The fault diagnosis accuracy further verifies this conclusion, as shown in Figure 7b: when PR = 8%, the average detection accuracy of EAdspB-TLM reaches 96.77%.

Conclusions
In this paper, a process monitoring framework, termed EAdspB-TLM, is proposed for monitoring nonlinear large-scale processes. When training data are insufficient to train a reliable model, traditional process monitoring methods cannot work well. As a result, faults of wastewater treatment and chemical processes cannot be identified and pre-alarmed in time, thus increasing the cost of system maintenance. Therefore, the proposed EAdspB-TLM was equipped with the ability of transfer learning, which allows useful information of unused data to be transferred to assist in training the model. EAdspB-TLM effectively alleviates the problem of insufficient label data in factories. Furthermore, the corresponding results also further verify the feasibility of the proposed EAdspB-TLM. According to the experimental results, with the increase of labeled target domain data, the diagnostic accuracy of EAdspB-TLM is improved. In addition, the pre-alarm rate (PAR) of EAdspB-TLM is also reduced. Overall, EAdspB-TLM achieved the best performance in monitoring the wastewater treatment and TE chemical processes. Using the WWTP as an example, when PR = 8%, the accuracy of the five methods can be ranked as follows: EAdspB-TLM (96.77%) > RVMt (92.47%) > SVM (90.32%) > PCA-T 2 (84.95%) > RVM (68.82%).
The batch dataset needed for wastewater treatment process monitoring is drawn mostly from a collection of sensors. However, the data collected by some sensors has little value in training the monitoring model. Therefore, future research work will aim to optimize the number of selected sensors for monitoring and improve the monitoring efficiency of EAdspB-TLM.
Author Contributions: Material preparation, data collection, and analysis were performed by H.C., C.X., and J.W. H.C. performed the experiments and wrote the paper. Y.L. reviewed and revised the paper. Funding was provided by J.W., D.H., and Y.L. All authors have read and agreed to the published version of the manuscript.