Membrane Fouling Diagnosis of Membrane Components Based on MOJS-ADBN

Given the strong nonlinearity and pronounced time-varying characteristics of membrane component fouling in the membrane water treatment process, a membrane fouling diagnosis method for membrane components based on the multi-objective jellyfish search adaptive deep belief network (MOJS-ADBN) is proposed. First, an adaptive learning rate is introduced into the unsupervised pre-training phase of the DBN to improve the convergence speed of the network. Second, the MOJS method replaces the gradient-based layer-by-layer weight fine-tuning method of the traditional DBN to improve the feature extraction ability of the network. The convergence of the MOJS-ADBN learning process is then proven by constructing a Lyapunov function. Finally, MOJS-ADBN is applied to membrane fouling diagnosis to verify the diagnostic performance of the model. The experimental results show that MOJS-ADBN converges quickly and achieves high diagnostic accuracy, and can provide a theoretical basis for membrane fouling diagnosis in the actual operation of membrane water treatment.


Introduction
Membrane bioreactor (MBR) technology, an important means in sewage treatment engineering, is a new wastewater treatment process that combines membrane technology with biological treatment technology and is mainly composed of membrane components and a bioreactor [1-3]. It has been recognized as one of the most promising new technologies in the field of water treatment in the 21st century due to its excellent comprehensive performance. However, membrane fouling of the membrane components increases the operating cost of the MBR and has become a bottleneck that restricts its wide application [4,5]. Therefore, researchers are increasingly focusing on membrane fouling diagnosis technology for membrane components in the field of water treatment. The traditional fault diagnosis method is divided into three steps. First, the signal is preprocessed by denoising and decomposition. Second, time-domain, frequency-domain, or other features are obtained from the preprocessed signal through feature extraction methods such as the wavelet transform [6], synchronous extraction [7], and the empirical wavelet transform [8]. These methods filter out the useless features of the signal, making the desired fault features more obvious. Finally, the extracted features are input into a machine-learning classifier for training, and the trained classifier recognizes the fault class. The backpropagation neural network (BP-NN) [9] and support vector machine (SVM) [10] have been applied to fault classification. These methods offer simple feature extraction and easy adjustment of classifier parameters, and the final diagnostic recognition rate can meet most requirements.
However, these methods still separate fault feature extraction from diagnosis and recognition. They require considerable expert experience in signal feature processing and rely on manual feature extraction, whose ability is limited [11][12][13]. Similarly, the traditional fault diagnosis method based on signal processing extracts features manually and feeds them into a classification model for fault identification [14][15][16]. The process relies heavily on manual experience and prior knowledge, which are insufficient for large data scales and fast acquisition speeds.
In view of the dynamic and nonlinear characteristics of the membrane water treatment system, the traditional diagnostic model is inefficient, and potentially valuable features are ignored in the offline modeling stage, resulting in false alarms and inaccurate interpolation [17]. As a breakthrough in modern artificial intelligence, deep learning can automatically learn valuable features from original feature sets and even from raw data, which means that it can largely remove the dependence on advanced signal processing, manual feature extraction, and cumbersome feature selection. Therefore, deep learning is widely used in fault diagnosis for its powerful learning and feature extraction abilities [18][19][20]. Ba-Alawi et al. proposed an inclusive framework for missing data imputation and sensor self-verification that integrates a variational autoencoder with a deep residual network structure [21]. By learning the latent probability distribution of the input data, the framework automatically extracts complex features while reducing the risk of vanishing gradients. By imputing missing data, detecting anomalies, identifying fault sources, and reconstructing faulty data to a normal state, it improves the reliability of faulty sensors. In recent years, a series of deep learning fault diagnosis models based on the convolutional neural network (CNN) have greatly improved diagnosis efficiency and accuracy. Shi et al. used attention mechanisms and improved convolutional neural networks to diagnose membrane fouling, which improved diagnostic accuracy and efficiency [22,23]. However, such deep learning models require a large amount of data to optimize their parameters and are prone to over-fitting [24,25]. Consequently, more researchers have studied the application of deep belief networks (DBNs) in fault diagnosis.
DBNs have strong feature extraction abilities, which can automatically extract features from a large number of data, reduce the dependence on expert fault diagnosis experience and signal processing technology, and reduce the uncertainty of feature extraction and fault diagnosis caused by manual participation in traditional methods [26][27][28]. A DBN characterizes the complex mapping relationship between signals and the health status by establishing a deep model, which is suitable for the diagnosis and analysis of diverse, nonlinear, high-dimensional health monitoring data in the context of big data [29]. Therefore, applying a DBN to the field of fault diagnosis has certain timeliness, practicality, and versatility. Zhao et al. proposed a fault diagnosis method based on a DBN, which adaptively extracted features from the original time series signals, increasing flexibility [30]. Simulation results show the effectiveness of this method in fault diagnosis. The structural parameters of a typical DBN model are determined by the learning rate [31]. Therefore, Liu et al. applied an optimized DBN to improve the accuracy of fault diagnosis [32]. Zhang et al. proposed a fault diagnosis model of complex chemical processes based on an extensible DBN [33]. With the help of mutual information technology, a DB subnetwork is used to extract individual fault features in the space-time domain. A global two-layer backpropagation network has been trained and used for fault classification, and the effect of fault diagnosis of this method was verified. Dai proposed a DBN fault diagnosis model with an improved model structure, which adopted multi-layer and multi-dimensional mapping to extract more detailed fault type differences and accurately diagnose faults [34]. Zhu et al. introduced a DBN network into a multi-sensor information fusion model to identify uncertain, unknown, and changing fault modes [35]. 
Compared with the traditional artificial neural network information fusion diagnosis method, this method has higher recognition accuracy. Su et al. used a support vector machine whose parameters were optimized by GWO to diagnose the signal features extracted by a DBN, realizing the online detection of equipment faults and improving the diagnostic accuracy [36]. Zhu proposed an intelligent fault diagnosis method based on PCA and DBN [37]. PCA is used to reduce the dimension of the original signal and to extract fault eigenvalues and eigenvectors. The modified samples are then trained and tested by the DBN for fault classification and diagnosis. This method does not need complex signal processing of the original data, so it is easy to implement and has wide applicability. Due to the uncertainty of the dynamic system model of membrane water treatment, the nonlinearity of data signals, and the uncertainty of the membrane fouling state, extracting membrane fouling characteristics from membrane components is difficult. In addition, with the increase in the scale and complexity of industrial control systems, membrane fouling data signals are often composed of a large amount of high-dimensional data, which makes the processing of the original membrane fouling data more complex.
Based on the above problems, this article proposes a membrane fouling diagnosis method based on MOJS-ADBN that optimizes the DBN from the perspectives of both unsupervised and supervised learning. In the unsupervised part, an adaptive learning rate is used to accelerate network convergence, and the part optimized by the adaptive learning rate is proven to be stable. In the supervised part, the MOJS algorithm is used to fine-tune the weights, and MOJS optimization is proven to have global convergence and stability in the Lyapunov sense. The MOJS-ADBN model is applied to the membrane fouling diagnosis of a parallel ultrafiltration membrane component, and its comprehensive performance is verified through a number of comparative tests.

Subsection
In 2006, Hinton proposed the DBN, a probabilistic generative model composed of multiple stacked restricted Boltzmann machines (RBMs).
As a two-layer network, an RBM consists of a visible layer and a hidden layer that are bidirectionally connected, and the neurons within the same layer are independent of each other. The visible layer is used to input training data, while the hidden layer is used to extract features. The structure diagram of the RBM is shown in Figure 1, where w R represents the connection weight, b is the bias coefficient of the hidden layer, and a is the bias coefficient of the visible layer.
The feature extraction process of the DBN is divided into two stages: the pre-training stage and the fine-tuning stage. In the pre-training stage, all RBMs are first pre-trained layer by layer without supervision to form a feature model of unsupervised learning. Next, a supervised algorithm is used for reverse training, and all the initial connection weights of the RBMs are fine-tuned to reduce the error caused by training, which helps the DBN extract the essential characteristics of the input data. The structure is shown in Figure 2.

Unsupervised Learning
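To determine the initial weights of the network, Hinton used an unsupervised training method to learn the parameters. An RBM includes a visible layer and a hidden layer, which are represented by $v$ and $h$, respectively. Given the model parameters $\theta = \{w^{R}, a, b\}$, the joint probability distribution $P(v, h; \theta)$ of the visible layer and the hidden layer is defined by the energy function $E(v, h; \theta)$ as:

$$P(v, h; \theta) = \frac{\exp\left(-E(v, h; \theta)\right)}{Z(\theta)}, \qquad Z(\theta) = \sum_{v}\sum_{h} \exp\left(-E(v, h; \theta)\right)$$

In the formula, $Z(\theta)$ is the normalization factor.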

For an RBM with a Bernoulli (visible layer)-Bernoulli (hidden layer) distribution, the energy function of the unit joint configuration is defined as:

$$E(v, h; \theta) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i w^{R}_{ij} h_j$$

In the formula, $w^{R}_{ij}$ is the connection weight of the RBM, and $a_i$ and $b_j$ are the offsets of the visible layer cells and hidden layer cells, respectively.
The conditional distributions of $v$ and $h$ are:

$$P(h_j = 1 \mid v; \theta) = \sigma\Big(b_j + \sum_{i} v_i w^{R}_{ij}\Big), \qquad P(v_i = 1 \mid h; \theta) = \sigma\Big(a_i + \sum_{j} w^{R}_{ij} h_j\Big)$$

In the formula, $\sigma$ is the activation function. Because the visible layer and the hidden layer take Bernoulli binary states, their states are usually determined from the probability values by setting a threshold. Taking the hidden layer as an example, it can be expressed as:

$$h_j = \begin{cases} 1, & P(h_j = 1 \mid v; \theta) > \delta \\ 0, & \text{otherwise} \end{cases}$$

In the formula, $\delta$ is a constant between 0.5 and 1. By calculating the gradient of the log-likelihood function $\log P(v; \theta)$, the RBM weight update formula can be obtained as:

$$\Delta w^{R}_{ij} = \eta \left( E_{data}[v_i h_j] - E_{model}[v_i h_j] \right)$$

In the formula, $\eta$ represents the learning rate, $E_{data}[v_i h_j]$ is the expectation observed in the training set, and $E_{model}[v_i h_j]$ is the expectation under the distribution determined by the model, which can be obtained by the Gibbs approximation.

Membranes 2022, 12, 843
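The weight update above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `cd1_update`, the batch layout, and the use of stochastic binary sampling (instead of the fixed threshold δ) are assumptions made for the example, and the model expectation is approximated by a single Gibbs step (CD-1).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, w, a, b, eta, rng):
    """One CD-1 step for a Bernoulli-Bernoulli RBM.

    v0: (n_samples, n_visible) binary batch; w: (n_visible, n_hidden);
    a, b: visible and hidden offsets. All names are illustrative.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ w + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again.
    pv1 = sigmoid(h0 @ w.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ w + b)
    n = v0.shape[0]
    # E_data[v_i h_j] - E_model[v_i h_j], with the model term taken
    # from the one-step reconstruction.
    w_new = w + eta * (v0.T @ ph0 - v1.T @ ph1) / n
    a_new = a + eta * (v0 - v1).mean(axis=0)
    b_new = b + eta * (ph0 - ph1).mean(axis=0)
    recon_error = float(np.mean((v0 - pv1) ** 2))
    return w_new, a_new, b_new, recon_error

rng = np.random.default_rng(0)
v0 = (rng.random((8, 6)) < 0.5).astype(float)   # toy binary data
w = 0.01 * rng.standard_normal((6, 4))
a, b = np.zeros(6), np.zeros(4)
w, a, b, err = cd1_update(v0, w, a, b, eta=0.1, rng=rng)
```

Replacing the two random comparisons with `ph0 > delta` and `pv1 > delta` would give the thresholded binarization described in the text.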

Supervised Learning
Supervised learning involves fine-tuning the weight $w^{R}$ obtained by unsupervised learning. Taking the output layer and the last hidden layer of Figure 2 as examples, let F be the expected output of the model and define the cross-entropy function as the loss function:

$$L = -\sum_{i=1}^{n} \hat{y}_i \log y_i$$

In the formula, $y_i$ represents the network output after SoftMax, $\hat{y}_i$ represents the expected output after SoftMax, and $n$ represents the number of categories.
The weight update formula can be expressed as:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

Using this method, the weight $w = (w_{out}, w_{l}, w_{l-1}, \cdots, w_{2}, w_{in})$ of the whole DBN network can be obtained by fine-tuning from the top output layer to the bottom input layer.
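As an illustration of the gradient-based fine-tuning step for the output layer, the following sketch applies one gradient descent update to a SoftMax layer trained with cross-entropy. The function names, the toy data, and the restriction to the output layer only are assumptions for the example; they are not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_pred, y_true):
    # Mean over samples of -sum_i y_true_i * log(y_pred_i).
    return float(-np.mean(np.sum(y_true * np.log(y_pred + 1e-12), axis=1)))

def output_layer_step(h, w_out, y_true, eta):
    """One gradient descent step on the output weights.

    For SoftMax with cross-entropy, the gradient with respect to w_out
    is h^T (y_pred - y_true) averaged over the batch.
    """
    y_pred = softmax(h @ w_out)
    grad = h.T @ (y_pred - y_true) / h.shape[0]
    return w_out - eta * grad

rng = np.random.default_rng(0)
h = rng.random((8, 5))                      # last hidden layer features (toy)
y_true = np.eye(3)[rng.integers(0, 3, 8)]   # one-hot expected outputs
w_out = np.zeros((5, 3))
loss_before = cross_entropy(softmax(h @ w_out), y_true)
w_out = output_layer_step(h, w_out, y_true, eta=0.1)
loss_after = cross_entropy(softmax(h @ w_out), y_true)
```

In MOJS-ADBN this gradient step is the part that the MOJS search replaces.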

Adaptive Learning Rate CD Algorithm
In the unsupervised learning process of the DBN, Gibbs sampling, as the core of the contrastive divergence (CD) algorithm, is a Markov chain Monte Carlo (MCMC) algorithm. When it is difficult to directly sample the joint distribution, it is used to generate a set of approximate observations of a specific multi-parameter probability distribution. Gibbs sampling mainly consists of three steps.
(1) The Gibbs chain is initialized with sample $V$ to obtain the visible layer input $v^{(0)}$.
(2) The hidden layer state $h^{(t)}$ is sampled from $P(h \mid v^{(t)})$, and the visible layer state $v^{(t+1)}$ is then sampled from $P(v \mid h^{(t)})$.
(3) We repeat the second step.
Because each RBM requires multiple iterations, a fixed learning rate $\eta$ is prone to convergence difficulties. Therefore, an adaptive learning rate is used, which determines $\eta$ according to the update direction of the parameters. The learning rate increases if the parameter update direction is the same in two consecutive iterations and decreases if the direction is opposite. The update mechanism of the adaptive learning rate $\eta$ is as follows:

$$\eta^{(t+1)} = \begin{cases} B\,\eta^{(t)}, & \Delta w^{(t)} \Delta w^{(t-1)} > 0 \\ b\,\eta^{(t)}, & \Delta w^{(t)} \Delta w^{(t-1)} < 0 \end{cases}$$

In the formula, $B = 1.4$ and $b = 0.7$.
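The adaptive learning rate rule can be sketched as follows. The sketch assumes one learning rate per weight and compares the signs of two consecutive updates; whether the paper uses a per-weight or a single global rate is not stated, so the elementwise form and the function name are assumptions.

```python
import numpy as np

def adapt_learning_rate(eta, delta_prev, delta_curr, B=1.4, b=0.7):
    """Scale eta by B where two consecutive updates agree in sign,
    by b where they disagree, and leave it unchanged where either is zero."""
    agreement = np.sign(delta_prev) * np.sign(delta_curr)
    factor = np.where(agreement > 0, B, np.where(agreement < 0, b, 1.0))
    return eta * factor

eta = np.array([0.1, 0.1, 0.1])
delta_prev = np.array([1.0, -1.0, 0.0])   # previous weight updates
delta_curr = np.array([2.0, 3.0, 1.0])    # current weight updates
eta_next = adapt_learning_rate(eta, delta_prev, delta_curr)
```

Here the first rate grows to 0.14, the second shrinks to 0.07, and the third stays at 0.1.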

Supervised Fine Adjustment Based on MOJS
MOJS has three main features: (1) An archive is integrated into the jellyfish search to save and retrieve Pareto optimal solutions. (2) The crowding distance and roulette selection are used to manage the archive population effectively, including the optimal non-dominated solutions found during the search. (3) To avoid being trapped in local optima, Lévy flight, an elite group, and opposition-based jumping are added to MOJS. The weights obtained from the unsupervised process are fine-tuned by MOJS.

Time Control Function
Jellyfish are attracted by nutrients in the ocean current and gather in it, thus forming jellyfish groups. Jellyfish also move within the group, through either passive movement (A-type movement) or active movement (B-type movement). The transition of jellyfish from A-type movement to B-type movement is governed by the time control function $c(t)$, whose expression is as follows:

$$c(t) = \left| \left(1 - \frac{t}{Max_{iter}}\right) \left(2\,\mathrm{rand}(0, 1) - 1\right) \right|$$

In the formula, $t$ is the current iteration, $Max_{iter}$ is the maximum number of iterations, and $c_0 = 0.5$ is the constant against which $c(t)$ is compared.
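A minimal sketch of how the time control function can switch between movement types is given below. It follows the switching rule of the original single-objective jellyfish search (ocean current when c(t) ≥ c0, otherwise A-type when a random number exceeds 1 − c(t), else B-type); that MOJS-ADBN uses exactly this rule is an assumption, and the function names are illustrative.

```python
import numpy as np

def time_control(t, max_iter, rng):
    # c(t) = |(1 - t / Max_iter) * (2 * rand(0, 1) - 1)|
    return abs((1.0 - t / max_iter) * (2.0 * rng.random() - 1.0))

def choose_movement(t, max_iter, rng, c0=0.5):
    c = time_control(t, max_iter, rng)
    if c >= c0:
        return "ocean_current"
    # Inside the group: passive (A-type) early on, active (B-type) later,
    # because 1 - c(t) approaches 1 as t grows.
    return "passive_A" if rng.random() > 1.0 - c else "active_B"

rng = np.random.default_rng(0)
moves = [choose_movement(t, 100, rng) for t in range(1, 101)]
```

Because c(t) shrinks toward zero as t approaches the iteration limit, late iterations are dominated by B-type movement.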

Elite Choice
We added an archive to the MOJS algorithm to store and retrieve the best approximations of the true Pareto optimal solutions found during optimization. The elite target is selected from the region of the Pareto optimal front that contains the fewest jellyfish. To identify this region, the search space is divided by finding the best elite target and the worst target among the obtained Pareto optimal solutions, defining a hypersphere and n grid elements covering all solutions, and dividing the hypersphere into equal sub-hyperspheres in each iteration; the roulette mechanism is then used for selection. The roulette mechanism can improve the distribution of the whole Pareto optimal front. The more Pareto optimal solutions a segment contains, the smaller its probability of being selected, as shown in the following formula:

$$P_i = \frac{C}{N_i}$$

In the formula, $C = 10$, and $N_i$ is the number of Pareto optimal solutions obtained in segment $i$.

Lévy Flight
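The behavior of most flying animals can be described by Lévy flight. When the spatial dimension of the random walk is higher than one and the step size distribution of the Lévy flight is isotropic, the Mantegna algorithm is used to generate a stable step size:

$$s = \frac{u}{|v|^{1/\beta}}$$

In the formula, $\beta$ is the Lévy index, and $u$ and $v$ obey normal distributions:

$$u \sim N(0, \sigma_u^2), \qquad v \sim N(0, \sigma_v^2), \qquad \sigma_u = \left[ \frac{\Gamma(1+\beta)\sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right)\beta\, 2^{(\beta-1)/2}} \right]^{1/\beta}, \qquad \sigma_v = 1$$

In the formula, $\Gamma(z)$ is the Gamma function:

$$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt$$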
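The Mantegna step generation can be sketched as below. The Lévy index `beta = 1.5` is a commonly used default and is an assumption here, since the value used in the paper is not given; the function names are illustrative.

```python
import math
import numpy as np

def mantegna_sigma_u(beta):
    # Standard deviation of u in Mantegna's algorithm (sigma_v is 1).
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_steps(size, beta=1.5, rng=None):
    """Draw Levy-distributed step sizes s = u / |v|^(1/beta)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, mantegna_sigma_u(beta), size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

steps = levy_steps(1000, rng=np.random.default_rng(0))
```

Most steps are small, but the heavy tail occasionally produces long jumps, which is what lets the search leave a local region.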

Update and Archive
The archive is updated in every iteration, and it may reach its size limit during optimization. A management mechanism is used to filter the archive, as follows:
(1) If a new solution dominates one or more solutions in the original archive, the new solution is stored and the dominated solutions are deleted from the archive.
(2) If a new solution A and the solutions in the original archive do not dominate each other, the solutions in the original archive are retained. If the archive has not reached its size limit, solution A is added to the archive.
(3) If the archive has reached its size limit, a solution is deleted from the most crowded segment, and solution A is added to the archive.
(4) If solution A is dominated by a solution in the original archive, solution A is discarded.
To select the solutions to be deleted from the archive effectively, the worst hypersphere (the one with the most jellyfish) should be selected, which prevents jellyfish from searching in crowded areas without food. The selection is realized through the roulette wheel mechanism, and the probability of each segment is:

$$P_i = \frac{N_i}{C}$$

In the formula, $C = 10$, and $N_i$ is the number of Pareto optimal solutions obtained in segment $i$.
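The four archive rules and the roulette-based deletion can be sketched as follows. The sketch assumes minimization, stores only objective vectors, and takes a hypothetical `segment_of` function that maps an objective vector to its segment index; the hypersphere partition itself is not reproduced, and the deletion weights assume the form N_i / C.

```python
import numpy as np

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (minimization)."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def update_archive(archive, candidate, max_size, segment_of, rng, C=10.0):
    # Rule (4): discard a candidate dominated by any archived solution.
    if any(dominates(a, candidate) for a in archive):
        return list(archive)
    # Rule (1): drop archived solutions that the candidate dominates.
    kept = [a for a in archive if not dominates(candidate, a)]
    # Rule (3): if full, remove one member of a segment chosen by roulette,
    # with segment weights N_i / C so crowded segments are more likely.
    if len(kept) >= max_size:
        segments = [segment_of(a) for a in kept]
        keys = sorted(set(segments))
        p = np.array([segments.count(k) / C for k in keys])
        chosen = keys[rng.choice(len(keys), p=p / p.sum())]
        members = [i for i, s in enumerate(segments) if s == chosen]
        kept.pop(int(rng.choice(members)))
    # Rule (2): a non-dominated candidate joins the archive.
    kept.append(candidate)
    return kept

rng = np.random.default_rng(0)
segment_of = lambda f: int(f[0] // 2)          # hypothetical 1-D segmentation
archive = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0)]
archive = update_archive(archive, (1.5, 2.5), 5, segment_of, rng)
```

In this toy run the candidate (1.5, 2.5) dominates (2.0, 3.0), so it replaces that solution in the archive.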

MOJS
Lévy flight is used to speed up the local search along the ocean current. The formula of the ocean current motion is:

In the formula, EL_X i (t) is the elite member in X i (t), ∑ EL_X is the elite group, n pop is the group size, and X * (t) is the elite solution at time t selected from the archive.
Similarly, elite solutions replace the current best solutions in the active and passive movements within jellyfish groups.
Passive movement: Active movement: In the formula:

Population Initialization
Compared with random initialization, initialization by logistic mapping is less prone to premature convergence and preserves population diversity. The formula is as follows:

$$X_{i+1} = \eta X_i (1 - X_i)$$

In the formula, $X_i$ is the logistic value of the $i$-th jellyfish position, $X_0$ is used to generate the initial population of jellyfish, $\eta$ is equal to 4, $X_0 \in (0, 1)$, and $X_0 \notin \{0.0, 0.25, 0.75, 1.0\}$.
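A short sketch of logistic-map initialization follows. Mapping the chaotic values onto the search bounds `lower` and `upper`, and using one seed value per dimension, are assumptions for the example. The sketch also rejects a seed of 0.5, which the logistic map sends to 1 and then to 0.

```python
import numpy as np

def logistic_population(n_pop, lower, upper, x0, eta=4.0):
    """Generate n_pop positions from the logistic map X_{i+1} = eta X_i (1 - X_i)."""
    lower, upper, x = (np.asarray(v, dtype=float) for v in (lower, upper, x0))
    bad = (x <= 0.0) | (x >= 1.0) | np.isin(x, [0.25, 0.5, 0.75])
    if bad.any():
        raise ValueError("x0 must lie in (0, 1) and avoid 0.25, 0.5, 0.75")
    chaos = np.empty((n_pop, x.size))
    for i in range(n_pop):
        x = eta * x * (1.0 - x)     # next chaotic value per dimension
        chaos[i] = x
    # Scale the chaotic values in [0, 1] onto the search bounds.
    return lower + chaos * (upper - lower)

pop = logistic_population(20, [-1, -1, -1], [1, 1, 1], [0.1, 0.3, 0.7])
```

Each row of `pop` is one jellyfish position, and successive rows follow the chaotic sequence rather than independent random draws.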

Increase Diversity through Opposition-Based Jumping
This mechanism is effective when the population has nearly converged toward the optimal solution. If the jump condition $\mathrm{rand}(0, 1) < t / Max_{iter}$ is satisfied, the opposition-based population $\bar{X}_i(t)$, $i = 1, 2, \cdots, n_{pop}$, is calculated. After the new population is generated, the most suitable individuals are selected from the current population and the opposite population. The opposition-based population $\bar{X}_i(t)$ is calculated as:

$$\bar{X}_i(t) = Lb + Ub - X_i(t)$$

In the formula, $Lb$ and $Ub$ are the lower and upper bounds of the search space.
We extracted the hidden layer states obtained by unsupervised learning and then carried out MOJS fine-tuning in sequence, according to the above steps, to obtain $w = (w_{out}, w_{l}, w_{l-1}, \cdots, w_{2}, w_{in})$.
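Opposition-based jumping can be sketched as below. For simplicity the sketch ranks individuals by a single scalar fitness to be minimized, whereas MOJS compares solutions by Pareto dominance; this simplification, the bound names, and the function name are assumptions for the example.

```python
import numpy as np

def opposition_jump(pop, lower, upper, fitness, t, max_iter, rng):
    """With probability t / max_iter, merge pop with its opposite
    population and keep the len(pop) individuals with lowest fitness."""
    if rng.random() >= t / max_iter:
        return pop                      # jump condition not met
    opposite = lower + upper - pop      # opposite point of each individual
    merged = np.vstack([pop, opposite])
    scores = np.apply_along_axis(fitness, 1, merged)
    best = np.argsort(scores)[: len(pop)]
    return merged[best]

rng = np.random.default_rng(0)
pop = np.array([[0.9, 0.9], [0.2, 0.1]])
lower, upper = np.zeros(2), np.ones(2)
sphere = lambda x: float(np.sum(x ** 2))
jumped = opposition_jump(pop, lower, upper, sphere, t=100, max_iter=100, rng=rng)
```

With t equal to the iteration limit the jump always fires, and the poor individual (0.9, 0.9) is replaced by its opposite (0.1, 0.1).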
So far, the supervised fine adjustment based on MOJS is complete. Firstly, the adaptive learning rate is used to accelerate the unsupervised training process and obtain the initial weight. Secondly, the MOJS algorithm is used to fine-tune the initial weight obtained from the unsupervised process to complete the MOJS-ADBN algorithm process.

Adaptive Learning Rate CD Algorithm Analysis
(1) The convergence rate refers to the time the RBM takes, over repeated Gibbs sampling, to reach the expected reconstruction error. The shorter the training time, the faster the convergence. As a probabilistic model, the RBM uses unsupervised learning mainly to learn features, which is called coding. The adaptive learning rate automatically adjusts the learning factor by changing the step size. By comparing the sampling states of the visible layer and the hidden layer over every two consecutive iterations, the efficiency of Gibbs sampling improves and the convergence of the CD algorithm accelerates. Hinton pointed out that hierarchical dimensionality reduction can reduce the dimension of high-dimensional data exponentially. Similarly, since MOJS-ADBN is a hierarchical stack of multiple RBMs, when each single RBM converges faster through the adaptive learning rate, the convergence speed of the DBN is expected to increase exponentially.
(2) The learning process of RBM weights is different from that of traditional BP networks. RBM is unsupervised learning, while BP is supervised learning; therefore, similar conclusions of the BP algorithm cannot measure RBM. In the unsupervised training stage, the adaptive learning rate algorithm adaptively increases or decreases the learning rate according to the parameter update direction. In addition, in the supervised fine-tuning stage, the algorithm can avoid being in cyclic fluctuations and falling into local optimization in the optimization process.
At the same time, the adaptive learning rate involves regularly increasing or decreasing the learning intensity of the algorithm on the internal correlation of data in the way of the variable step size, and converging in the shortest time.

Unsupervised Training Phase
In the unsupervised training phase of the DBN, the RBMs are trained in turn with the adaptive learning rate to achieve fast convergence. Without loss of generality, in Formulas (4) and (5), the upper and lower asymptotes of the sigmoid function are denoted by A H and A L , and the input of the RBM visible layer and the reconstruction state obtained after t samplings are denoted by f 0 i and f t j , respectively. The visible layer and hidden layer in one Gibbs sampling process are then expressed as follows:

After t Gibbs samplings, it can be concluded that:

From the formula, the network output is related to the intermediate state of the sampling process. At the same time, the convergence speed and accuracy of the algorithm are related to the adaptive learning rate. An adaptive learning rate that is too large or too small will affect the convergence speed and may even make the network unstable. From the above, we can obtain the following performance analysis:
(1) Proof of sufficiency.
If the network is stable, the visual and hidden layers of each RBM layer meet the input-output boundedness. Because the sigmoid function is monotonically increasing, and the number of open neurons is also increasing, we can obtain: Then Furthermore, we know: Assume that f 0 j , f t i , f t j represent the input state, intermediate state, and output state of RBM, respectively, the sufficient and necessary condition for network stability is: According to Formula (6), the greater the δ, the smaller the probability that the neuron takes 1, resulting in the increased sparsity of neurons in the visible layer and hidden layer in the Gibbs sampling process, and the possibility that the weight update direction is the same in the Gibbs sampling iteration process for two consecutive times will increase.
According to (8)- (12), if the error fluctuation is not obvious in the process of adjusting the weight, the increase in the learning rate can accelerate the convergence of the weighted network.
Then there is:

According to Gibbs sampling, each weight update is accompanied by two binarization samplings of the intermediate state, and the weight update is proportional to the sampled states, so the relationship between δ and the learning rate coefficients B and b can be obtained:

δ is used to judge the state of the binary neurons and is generally set to 0.7.

Multi-Objective Jellyfish Behavior Process
For the optimization problem, the calculation formula is as follows:

In the formula, $f(X)$ is the objective function, $g_i(X)$ is the $i$-th constraint, $M$ is the total number of constraints, $X$ is the $n$-dimensional unknown variable, and $Z$ is the search space. The position state of a jellyfish is equivalent to a Pareto optimal solution, and the set of position states represents the Pareto solution set, which is expressed as follows:

Assuming that the search space $Z$ is a continuous state space, the interval $[X_i^l, X_i^h]$ where $X$ is located can be decomposed into $h - l$ discrete values. Then the accuracy can be expressed as $\varepsilon = \frac{X_i^h - X_i^l}{h - l}$, where $\varepsilon$ is the accuracy of the optimal solution. $Z$ is then a discrete space, and its state size is:

The position state $X \in Z$ of each jellyfish, and its food energy, is defined as:

Then $|F| < |Z|$ is obtained, so:

According to the difference of energy, the search space set $Z$ can be classified into several non-empty subsets $\{Z_i\}$, in the formula:

The energy of jellyfish (that is, food) is defined as:

Let $X_S$ be the set of all jellyfish, and let $X$ be an $n$-vector variable. For all $X \in X_S$, $F_{|F|} \le E(X) \le F_1$, so the set $X_S$ can be reduced to non-empty subsets, expressed as follows:

So,

Let $X_{i,j}$ satisfy $i = 1, 2, \cdots, |F|$, $j = 1, 2, \cdots, |X_S^i|$. $X_{i,j}$ represents the position of the $j$-th jellyfish in $X^i$. The movements of multi-mechanism jellyfish include ocean current movement, jellyfish A-type movement, and jellyfish B-type movement. Assume that the transition of the $j$-th jellyfish from one motion state to another is represented by $X_{i,j} \to X_{m,n}$, with probability of occurrence $P_{ij,mn}$; assume that the transition of the $j$-th jellyfish in $X_i$ from the $i$-th region to the $m$-th region is represented by $X_{i,j} \to X_m$, with probability of occurrence $P_{ij,m}$, and

Assume that the jellyfish in $X_i$ change from the $i$-th region to the $m$-th region, represented by $X_i \to X_m$, with probability of occurrence $P_{i,m}$, which satisfies $P_{i,m} \ge P_{ij,m}$.

Stability of Reducible Random Matrix
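Theorem 1: Let $P$ be a reducible stochastic matrix of order $N$ that, after the same row and column permutations, takes the form

$$P = \begin{pmatrix} C & 0 \\ R & T \end{pmatrix}$$

In the formula, $C$ is a primitive stochastic matrix of order $M$, $R$ and $T$ are the blocks formed by the remaining $N - M$ rows, and neither $R$ nor $T$ is a zero matrix. Therefore,

$$P^{\infty} = \lim_{k \to \infty} P^{k} = \begin{pmatrix} C^{\infty} & 0 \\ R^{\infty} & 0 \end{pmatrix}$$

In the formula, $P^{\infty}$ is a stable stochastic matrix, and $P^{\infty} = \mathbf{1}' p^{\infty}$, $p^{\infty} = p^{0} P^{\infty}$ are uniquely determined and independent of the initial distribution. $p^{\infty}$ satisfies the condition:

$$p_i^{\infty} > 0 \ (1 \le i \le M), \qquad p_i^{\infty} = 0 \ (M < i \le N)$$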
Let X i,j be the artificial jellyfish after t iterations, recorded as X(t), and let the jellyfish with the highest energy in X(t) be X Best , where X Best is an n-dimensional vector; then E(X Best ) = F i . According to the definition of the archive update in multi-mechanism jellyfish, the update of the highest-energy jellyfish in the iteration process is:

The proof of Formula (53) is as follows. Depending on the time state and the environmental state, ocean current movement, jellyfish A-type movement, or jellyfish B-type movement will occur. If X(t + 1) is the best jellyfish and B(t + 1) = X(t + 1), the following three phenomena will occur.
Phenomenon 1. Let the jellyfish carry out ocean current movement, and let the probability of producing the ocean current movement be P Ocean ≥ 0. The jellyfish group is attracted by the nutrients in the ocean current and updates its position, so the food concentration at the position before moving is lower than that at the position after moving; that is, E(X(t + 1)) > E(X(t)), which proves that ∃m < i, P i,m > 0.
Phenomenon 2. If the jellyfish carries out jellyfish A-type movement, with the probability of generating the A-type movement set as P A ≥ 0, it moves around its own position. Two situations can then occur.
Situation 1. The food concentration at the position after moving is higher than that at the position before moving. Let the probability of this situation be P A1 ; the proof is the same as for Phenomenon 1.
Situation 2. The food concentration at the current location of the jellyfish is higher than that at the surrounding location. The probability of this situation is P_A2 = 1 − P_A1, and the surrounding location needs to be re-selected. After t attempts, the probability is P_A2^t. If the food concentration at the position after moving is higher than that at the position before moving, the case is the same as Situation 1. If the condition is still not satisfied after t iterations, then according to the time control function c(t), the jellyfish movement gradually changes from A-type movement to B-type movement as the number of iterations increases, as described in Phenomenon 3.
Phenomenon 3. If the jellyfish carries out B-type movement, it is caused by one of two conditions. Situation 1. The B-type movement is produced at the beginning. Let the probability of producing B-type movement be P_B1 ≥ 0 and P_B2 = 1 − P_Ocean − P_A. If the food concentration at the location of a jellyfish in the neighborhood is higher than the food concentration at the current location, then E(X(t + 1)) > E(X(t)), which means ∃m < i, P_{i,m} > 0. Situation 2. The jellyfish gradually evolves from A-type movement to B-type movement under the time control function c(t). Let the probability of occurrence be P_B2 ≥ 0. If the food concentration at the location of a jellyfish in the neighborhood is higher than the food concentration at the current location, then E(X(t + 1)) > E(X(t)), which means ∃m < i, P_{i,m} > 0.
As t increases, the opposition-based jump effectively prevents the search from being trapped in a local optimum.
According to the multi-mechanism jellyfish algorithm, the probabilities of the three movements satisfy P_B2 + P_Ocean + P_A = 1, and ∃m < i, P_{i,m} > 0 has been proved in each case.
Theorem 2. The multi-mechanism jellyfish algorithm is convergent.
Proof: X i S, i = 1, 2, ..., |F| depends only on the current state and not on the history, and the sample space is finite, so the process can be regarded as a finite Markov chain. According to Lemma (1) in Section 4.3.3, the transfer matrix of the Markov chain is: According to Lemma (2) in Section 4.3.3: If P is a reducible stochastic matrix of order N, then is a stable stochastic matrix, which leads to: Here, F_B is the optimal objective function, so the multi-mechanism jellyfish algorithm has global convergence.
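The limiting behaviour used in this proof can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the 3-state transition matrix `P` is hypothetical (it is not the transfer matrix of the paper), state 0 stands for the optimal state, and every worse state i has a positive probability of moving to some better state m < i, mirroring the condition P_{i,m} > 0 proved above. The helper names `mat_mul` and `mat_pow` are introduced here for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, steps):
    """Return p raised to a non-negative integer power by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


# Hypothetical reducible stochastic matrix (assumption, for illustration only).
# State 0 is absorbing (the optimum); states 1 and 2 can only stay or improve.
P = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]

# After many steps every row concentrates its probability on state 0.
P_limit = mat_pow(P, 200)
for row in P_limit:
    print([round(value, 6) for value in row])
```

In this toy chain the probability mass of every starting state ends up on the optimal state, which is the qualitative statement the lemma is used for; the actual block structure of the paper's matrix is not reproduced here.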

Global Stability Proof
From Section 4.3.3, it can be seen that the multi-mechanism jellyfish algorithm finally converges to the global best, so any initial position X_0 will eventually converge to the global best x_max. x_max is assumed to be the equilibrium point in the Lyapunov sense.
Proof: Assume the objective function of the multi-mechanism jellyfish algorithm is f(X); then the dynamic formula is:
Let the x axis be translated upward by f(x_max); then the dynamic formula is updated as:
According to the convergence of the algorithm, when t → ∞, the position state X of the jellyfish tends to the global best x_max: So, for all t, the equilibrium state is satisfied.
Here, x_max is the equilibrium point in the MOJS algorithm, and X_e = f(x_max, t) is the equilibrium state. Therefore, there are equilibrium points and equilibrium states in the MOJS algorithm.

Stability of the MOJS Algorithm in the Lyapunov Meaning
Assume that the initial state of the MOJS algorithm lies within the hypersphere S(δ) with the equilibrium point x_max as the center and δ as the radius; then X ∈ S(δ), where S(δ) = {X | ‖X − x_max‖ ≤ δ}; that is: As shown in Figure 3, S(γ) is a circle with radius γ and S(δ) is a circle with radius δ; the points on its two sides are denoted x_1 and x_2, and the circle S(γ) intersects f(x) at x_3 and x_4. Here f(x) is the graph of the objective function, f(x_max) is the maximum value of the function, f(x_max1) is the next largest value of the function, and S_max is the region between the maximum value and the next largest value of the objective function, which is called the optimal region.
It is assumed that the MOJS algorithm satisfies the stability in the Lyapunov meaning, and the equilibrium state is uniformly asymptotically stable.
Proof: According to the global convergence of MOJS in Section 4.3.3, when X is in S_max, X will be attracted by food and move towards x_max. The initial solution X(t : X_0, t_0) of the equation is located in S_max, and S_max is included in the intersection region of S(δ) and f(x), so X will not escape S(δ). Then δ satisfies: Therefore, as t → ∞, X(t : X_0, t_0) ∈ S(γ) holds for all t, which satisfies stability in the Lyapunov sense.
If, for every γ > 0, there exists a δ that satisfies formula (67), and the initial state x_0 satisfies ‖x_0 − x_max‖ ≤ δ, then ‖X(t : X_0, t_0) − X_e‖ ≤ δ. Therefore, δ is independent of t_0, and the equilibrium state x_max of the MOJS algorithm is uniformly stable, which completes the proof.
So far, the stability proof of the membrane fouling fault diagnosis model based on MOJS-ADBN has been completed.

Membrane Fouling Data Acquisition
To address the problem that the membrane flux is easily affected by influent flow and temperature, this article took the parallel hollow fiber membrane device as the research object and classified the factors that cause membrane pollution. CFD software was used to simulate and calculate the water production in the MBR system in order to collect fault data.
Using the modeling process of the parallel hollow fiber membrane unit as an example, the Euler multiphase flow model was selected to build the MBR simulation system. The equations of mass and momentum conservation are as follows.
Mass-conservation equation: In the formula, α_q is the volume fraction, ρ_q is the density (kg · m^−3), µ_q is the average velocity vector of the q-th phase (m · s^−1), and q denotes the liquid phase.
Momentum conservation equation: In the formula, q represents the liquid phase, j represents x, y, z in three directions, α q is volume fraction, µ q is velocity (m · s −1 ), ρ q is density (kg · m −3 ), P q is pressure (Pa), τ q is viscous stress tensor (Pa), F q is interaction force (N · m −3 ), g is gravitational acceleration (m · s −2 ).
In the solution controls, we first set up the solution method. In the drop-down list for pressure-velocity coupling, a phase-based coupling algorithm was selected to calculate the grid file. In the spatial discretization options, we set the gradient to cell-based least squares and the transient formulation to first-order implicit. We then set the monitoring window and the convergence threshold. From the simulation data, we summarized nine types of membrane contamination data, covering values that are too large, too small, and within the tolerance range; these data were collected for the main influencing factors of membrane contamination.
According to the analysis of the importance of membrane pollution factors, when the transmembrane pressure difference was constant, four influencing factors had obvious effects on membrane pollution and were selected as the research objects: the COD concentration difference between inlet and outlet water (C), the BOD concentration difference between inlet and outlet water (B), the mixed suspension solid concentration (X), and the hydraulic retention time (H). After testing and comparison, a tolerance of 5% was set for the COD concentration difference and BOD concentration difference of the inlet and outlet water in the series tubular membrane device, and a tolerance of 7% was set for the mixed suspension solid concentration and hydraulic retention time. A tolerance of 5% was set for the values of the membrane fouling factors in the parallel hollow fiber membrane device. When the membrane fouling factor value was within the set tolerance range, it indicated that there was no pollution in the series tubular membrane device. When the values of the membrane pollution factors exceeded the set tolerance, it meant that the COD concentration difference, BOD concentration difference, mixed suspension solid concentration, or hydraulic retention time was too large, resulting in membrane pollution; the corresponding types of membrane pollution are f2, f4, f6, and f8, respectively. When the value of a membrane pollution factor was lower than the set tolerance, it indicated that the COD concentration difference, BOD concentration difference, mixed suspension solid concentration, or hydraulic retention time was too small, resulting in membrane pollution; the corresponding categories of membrane pollution are f3, f5, f7, and f9.
Membrane pollution codes f1-f9 correspond to the different membrane pollution types caused by "normal", "too large", and "too small" membrane pollution factors of the parallel hollow fiber membrane device in the actual operation of the membrane water treatment; see Table 1. To speed up the training of the network model, make the data easier to compute with, and obtain more general results, the input data were standardized; the mathematical expression is:
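Because the standardization expression itself is not reproduced here, the following is only a hedged sketch that assumes a column-wise z-score (zero mean, unit variance); the paper may have used a different scaling. The function name `standardize_columns` and the four-column toy readings (C, B, X, H) are hypothetical and serve illustration only.

```python
import statistics


def standardize_columns(rows):
    """Z-score each column: subtract the column mean, divide by the column std.

    Assumption: population standard deviation; constant columns map to 0.0.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up illustrative readings, one row per sample:
# [COD difference (C), BOD difference (B), suspended solids (X), retention time (H)]
samples = [
    [120.0, 60.0, 8000.0, 6.0],
    [135.0, 64.0, 8200.0, 6.5],
    [110.0, 58.0, 7900.0, 5.5],
    [128.0, 61.0, 8100.0, 6.2],
]

scaled = standardize_columns(samples)
for row in scaled:
    print([round(value, 3) for value in row])
```

After this transformation each input feature has a comparable scale, which is the stated purpose of the standardization step.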

Experimental Process
The experimental process of this article consists of fault data collection, fault classification and coding, data pre-processing, data analysis and division, MOJS-ADBN model construction, prediction coding, and result analysis. The specific steps are as follows: (1) Collect the membrane fouling data.
(2) Classify and encode the membrane fouling data.
(3) Classify the data into a training set and test set according to the ratio of 7:3.
(4) Build the MOJS-ADBN model, retain the weight in the unsupervised learning process, and use the adaptive learning rate to accelerate the training process. In the process of supervised learning, MOJS is used to optimize the algorithm and fine-tune the weight. The training set is used to adjust the network model to make the model optimal.
(5) Compare the actual code of the test set with the prediction code generated by the model. If the prediction code is consistent with the real coding result, the classification is correct; if the prediction code is inconsistent with the real coding result, the classification is wrong.
(6) Further analyze the model and judge the performance of the model from the perspective of average accuracy, average precision, average recall, and running time.
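Steps (3) and (5) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `split_train_test` and `accuracy` are hypothetical, the data set is a placeholder of 2700 labelled indices (300 per code, matching the data set size reported for this experiment), and the random seed is arbitrary.

```python
import random


def split_train_test(samples, train_ratio=0.7, seed=0):
    """Shuffle a copy of the samples and split it by the given ratio (7:3 by default)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_codes, predicted_codes):
    """Fraction of samples whose predicted code equals the actual code."""
    matches = sum(t == p for t, p in zip(true_codes, predicted_codes))
    return matches / len(true_codes)


# Placeholder data set: (sample index, fault code), 300 samples for each of f1..f9.
dataset = [(i, f"f{i % 9 + 1}") for i in range(2700)]

train, test = split_train_test(dataset)
print(len(train), len(test))

# Step (5): a prediction counts as correct only if the codes are identical.
print(accuracy(["f1", "f2", "f3", "f4"], ["f1", "f2", "f3", "f9"]))
```

A purely random split like this does not guarantee equal class counts in the test set; a stratified split would be needed for that, and the text does not state which was used.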
In this article, the MOJS-ADBN was given three hidden layers, and the optimal number of hidden layer neurons was selected based on the model error and running time. According to the experiments, the performance is best when the number of neurons in the hidden layer is 20, as shown in Figure 4, which plots the relationship between the model error and the number of hidden layer neurons. At this point, the APE and MSE are 0.0618 and 0.0742, respectively. Here, APE and MSE represent the absolute percentage error and the mean square error, y_i and ŷ_i represent the real value and the predicted value, respectively, and N_t represents the number of test samples.
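The two error measures can be sketched as below. Since the defining formulas are not reproduced in this section, the forms used here are assumptions: MSE is taken as the usual mean of squared differences over the N_t test samples, and APE is taken as the mean of |y_i − ŷ_i| / |y_i|, which may differ from the authors' exact definition. Function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between real and predicted values."""
    n_t = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n_t


def absolute_percentage_error(y_true, y_pred):
    """Mean of |y - y_hat| / |y| (assumed form; requires non-zero real values)."""
    n_t = len(y_true)
    return sum(abs(y - y_hat) / abs(y) for y, y_hat in zip(y_true, y_pred)) / n_t


# Toy values for illustration only.
real = [1.0, 2.0, 4.0]
predicted = [1.0, 2.0, 2.0]
print(mean_squared_error(real, predicted))
print(absolute_percentage_error(real, predicted))
```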
To objectively show that the best model structure of MOJS-ADBN is 18-20-20-20-9, 300 samples were collected for each membrane pollution category of the parallel hollow fiber membrane device, giving a total of 2700 experimental samples. A total of 1890 samples were randomly selected as training samples, and the remaining 810 samples were used as test samples.
In the unsupervised training phase, each RBM was set to iterate 378 times, and the learning rate coefficients were set to B = 1.4 and b = 0.7, respectively. Kernel principal component analysis (KPCA) was used to extract the three principal components of the first RBM output feature and the three principal components of the final DBN output feature, which are shown in Figure 5a,b, respectively. It can be seen from Figure 5a that only f1 does not overlap with other faults; f2, f7, and f9 overlap, and the distribution of similar faults in f2 and f7 is relatively scattered. Moreover, f4, f5, and f8 overlap seriously, and the fault types cannot be classified correctly. Although f3 and f6 can be classified, there is still a small amount of overlap. It can be seen from Figure 5b that the different kinds of faults do not overlap and can be classified better. Therefore, the DBN model can accurately distinguish the fault categories, and the distribution of similar faults is more compact than that in Figure 5a, because the input data undergo four nonlinear mappings and are reconstructed after passing through four RBMs, which expresses the input data more accurately and abstractly.
We used the MOJS algorithm for supervised fine-tuning. We set the layers as in [9,20,20,20], respectively, to establish the MOJS-ADBN model. Figure 6a,c,e,g show the Pareto front scatter diagrams, where the abscissa and ordinate represent the objective functions of the Pareto optimal solutions; Figure 6b,d,f,h show the Pareto front broken-line graphs, where the abscissa represents the number of Pareto optimal solutions, the two broken lines represent the two objective functions of the Pareto optimal solutions, and the color block represents the overlapping part of the Pareto front scatter diagram. It can be seen from the graphs that the weights are improved after four rounds of MOJS optimization and supervised fine-tuning, which makes the weight distribution more reasonable.
Figure 6. Pareto frontier analysis of the MOJS-ADBN model.
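The notion of a Pareto front used in Figure 6 can be illustrated with a small non-dominated filter. This is a generic sketch, not the MOJS archive update itself; it assumes two objectives that are both minimized, and the candidate points and function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumption: all objectives are minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (objective 1, objective 2) values of candidate weight sets.
candidates = [
    (0.10, 0.90),
    (0.20, 0.50),
    (0.30, 0.60),  # dominated by (0.20, 0.50)
    (0.40, 0.20),
    (0.50, 0.25),  # dominated by (0.40, 0.20)
]

front = pareto_front(candidates)
print(front)
```

Each surviving point trades one objective against the other, which is what the scatter diagrams of a Pareto front display.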
To reduce the influence of experimental randomness on the evaluation of the model's diagnostic performance, 10 independent diagnostic experiments were carried out on the parallel hollow fiber membrane device. Figure 7a presents the average confusion matrix of the 10 diagnostic experiments of the MOJS-ADBN model. From the figure, it can be seen that there are 9 fault codes from f1 to f9, and each fault is counted 900 times in total. The total number of f1 misclassifications is 14: f2 accounts for five, f6 for three, f9 for two, and f3, f4, f7, and f8 for one each. The total number of f2 misclassifications is 14: f1 accounts for six, f6 for four, and f3, f5, f8, and f9 for one each. The total number of f3 misclassifications is eight: f1, f4, f7, and f9 account for one each, and f5 and f8 for two each. The total number of f4 misclassifications is eight: f1, f2, and f7 account for one each, f5 for three, and f8 for two. The total number of f6 misclassifications is 20: f1 accounts for five, f2 for eight, f3, f4, f5, f7, and f8 for one each, and f9 for two. The total number of f7 misclassifications is nine: f1, f3, f5, and f9 account for one each, f4 for three, and f8 for two. The total number of f8 misclassifications is eight: f1, f3, f6, and f7 account for one each, and f4 and f5 for two each. The total number of f9 misclassifications is eight: f5, f6, and f8 account for one each, and f1 and f2 for two each. From the figure, it can be seen that f1, f2, and f6 are more easily confused than the other faults. Figure 7b shows the curves of the accuracy, precision, and recall of all kinds of faults.
From the figure, it can be seen that the accuracy, precision, and recall of all kinds of faults are above 97%; therefore, the MOJS-ADBN proposed in this article has strong robustness.
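Per-class accuracy, precision, and recall of the kind plotted in Figure 7b can be derived from a confusion matrix as sketched below. The one-vs-rest definitions are the standard ones and are assumed here; the 3-class matrix is a toy example, not the data of Figure 7a, and the function name is hypothetical.

```python
def per_class_metrics(confusion):
    """One-vs-rest accuracy, precision, and recall for every class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp    # other, predicted k
        tn = total - tp - fn - fp
        metrics.append({
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return metrics


# Toy 3-class confusion matrix (illustrative values only).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]

for index, m in enumerate(per_class_metrics(toy), start=1):
    print(f"class {index}:", {name: round(value, 4) for name, value in m.items()})
```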

Comparative Test of Different Learning Rates
As a probability model, the RBM is mainly affected by its weights, so reasonable weights are the premise for accurate network classification. Figure 8 shows the weights obtained by using the adaptive learning rate and the fixed learning rate, respectively. As can be seen from the figure, the weight distribution obtained with the adaptive learning rate is more compact than that obtained with the fixed learning rate, which effectively avoids the problems of ignored detail features or vanishing gradients caused by weights that are too large or too small.

In the past, the learning rate of the DBN was determined by experience. To further demonstrate the applicability of the adaptive learning rate, comparative experiments were carried out with 0.01, 0.05, 0.1, 0.5, and 1 as fixed learning rates of the RBM while the supervised learning part was kept unchanged, and 10 experiments were carried out with the parallel hollow fiber membrane device as the research object. The training and test data were classified for verification. Table 2 shows the diagnostic comparison experiment. It can be seen from the table that the diagnostic accuracies of learning rates 0.1 and 1 were higher than those of the other fixed learning rates, but the adaptive learning rate proposed in this article not only ensured the accuracy but also accelerated the network convergence. Therefore, the adaptive learning rate proposed in this article, which is based on the parameter update direction, is an improvement over the traditional empirical method.
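The idea of adapting the learning rate from the parameter update direction can be sketched as follows. The exact update rule of the paper is not given in this section, so the rule below is an assumption: the rate is multiplied by a growth factor when two consecutive updates point in the same direction and by a shrink factor otherwise. The default factors 1.4 and 0.7 are borrowed from the coefficients B and b reported in the experimental setup; the quadratic toy objective and all names are hypothetical.

```python
def adapt_learning_rate(rate, previous_update, current_update, grow=1.4, shrink=0.7):
    """Increase the rate if consecutive updates agree in sign, otherwise decrease it."""
    if previous_update * current_update > 0:
        return rate * grow
    return rate * shrink


# Toy problem: minimize f(w) = (w - 3)^2 with a direction-aware learning rate.
w, rate, previous_update = 0.0, 0.05, 0.0
for _ in range(50):
    update = -2.0 * (w - 3.0)          # negative gradient of the toy objective
    if previous_update != 0.0:
        rate = adapt_learning_rate(rate, previous_update, update)
    w += rate * update
    previous_update = update

print(round(w, 6), round(rate, 4))
```

In this toy run the rate grows while the iterate keeps moving in one direction and is cut back as soon as an update overshoots, which is the behaviour a fixed rate cannot provide.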

Comparison of Ablation Experiments
To further demonstrate the effectiveness and superiority of the MOJS-ADBN model for membrane device-membrane fouling diagnosis, this method was compared with some common fault diagnosis and classification methods. We combined the wavelet transform with PCA to extract features. Among the shallow neural network methods, BP, the extreme learning machine (ELM), SVM, and the least squares support vector machine (LSSVM) were used for classification diagnosis. Among the deep learning methods, the traditional DBN and the adaptive learning rate DBN (ALRDBN) were used; in addition, the data set was expanded by overlapping sampling, and a convolutional neural network (CNN) was used for comparison. Following the method in this article, training data and test data were classified for 10 independent diagnosis experiments, and the comparison indicators included the network structure, the average time, and the mean value and variance of the test MSE; the results are shown in Table 3. It can be seen from the table that, compared with the shallow networks, the DBN can effectively extract the essential and deep characteristics of faults. After optimization, the DBN improved in both accuracy and network performance to varying degrees, and the nonlinear mapping between the initial data and the characteristics was more obvious. Among the deep networks, although the CNN has a lower diagnosis time than MOJS-ADBN, the diagnosis rate of the improved CNN is lower than that of MOJS-ADBN, and the CNN needs a large data set and a reasonable division to ensure the rationality of the model, so the MOJS-ADBN proposed in this article is more conducive to the accurate identification of faults. The membrane fouling simulation data set of the parallel hollow fiber membrane device was used to carry out ablation experiments.
Five performance indicators, namely average accuracy, average precision, average recall, average time, and average determination coefficient R^2, were used as the basis for model judgment, and the performances of the DBN, ALRDBN, improved CNN, and MOJS-ADBN were verified, respectively.
According to the analysis in Figure 9, the performance of the improved model in this article improved to varying degrees. Although the reduction in running time was not pronounced, the accuracy improved significantly; apart from the running time, the other four indicators of the MOJS-ADBN model were clearly better than those of the other three network models, which verifies the effectiveness and superiority of the MOJS-ADBN diagnostic model proposed in this article.
During the actual operation of the membrane bioreactor, there was environmental noise when the membrane component was treating sewage. In addition, the characteristics of the membrane component itself also introduced noise, which produced unwanted randomness in the collection of the membrane pollution data. Because the simulated data needed to be more consistent with the uncertainty of the operation of the membrane component under actual working conditions, it was important to add a variable noise experiment to the membrane fouling diagnosis experiment. To verify whether this method could obtain higher fault diagnosis accuracy and better generalization ability in the variable noise experiment, the experimental results of this article were compared with the experimental results of the methods proposed in references [34] and [36].
Reference [34] proposed a DBN fault diagnosis model with an improved model structure. The model uses multi-layer and multi-dimensional mapping to extract more detailed fault type differences and accurately diagnose faults. Reference [36] used a support vector machine with optimized parameters to diagnose the signal features extracted by the DBN, realized the online detection of equipment faults, and improved the accuracy of the diagnosis. In this article, the membrane pollution data of the parallel hollow fiber membrane component were used as the training samples, Gaussian white noise (with SNRs of −2, 0, 2, and 4 dB) was added to the test samples, and the obtained membrane fouling diagnosis results were compared with those of the other diagnostic methods. The experimental results are shown in Table 4. From Table 4, it can be seen that, at each of the four SNRs, the accuracy of the membrane component-membrane fouling diagnosis based on MOJS-ADBN was higher than that of the other methods, and its anti-noise performance was stronger than that of the other diagnostic methods.
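Adding Gaussian white noise at a prescribed SNR can be sketched as follows. This is a generic illustration under stated assumptions: the SNR is defined as 10·log10(signal power / noise power) with power taken as the mean squared value, the sine wave is a stand-in for a test sample, and the function names and seed are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus zero-mean Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Stand-in test sample (assumption): a sampled sine wave.
clean = [math.sin(0.01 * i) for i in range(20000)]

for target in (-2, 0, 2, 4):
    noisy = add_white_noise(clean, target)
    print(target, round(measured_snr_db(clean, noisy), 2))
```

At −2 dB the noise power exceeds the signal power, so this setting is the most demanding of the four for any classifier.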

Conclusions
This article presents a membrane fouling diagnosis method for membrane components based on MOJS-ADBN, which optimizes the DBN from the perspectives of unsupervised learning and supervised learning: (1) An adaptive learning rate was used to accelerate the convergence of the network, and the unsupervised part optimized by the adaptive learning rate was proved to be stable.
(2) The supervised part used MOJS optimization to fine-tune the weights, and MOJS optimization was proved to have global convergence and stability in the Lyapunov sense.
(3) MOJS-ADBN was verified by a simulation experiment with a parallel hollow fiber membrane component. The experimental results show that the MOJS-ADBN model can effectively classify and locate faults, and can be used as a new solution in the field of membrane fouling diagnosis for membrane water treatment.