Two-Stage Hybrid Model for Efficiency Prediction of Centrifugal Pump

Accurately predicting the efficiency of centrifugal pumps at different rotational speeds is important but remains difficult in practice. To enhance the prediction performance, this work proposes a hybrid modeling method that combines process data with process knowledge of centrifugal pumps. First, according to the process knowledge of centrifugal pumps, the efficiency curve is divided into two stages. Then, the pump affinity law and a Gaussian process regression (GPR) model are each used to predict the efficiency in the flow stage to which they are suited. Furthermore, a probability index is established from the prediction variance of the GPR model and Bayesian inference to select a suitable training set and improve the prediction accuracy. Experimental results show the superiority of the hybrid modeling method over using only a mechanism model or only a data-driven model.


Introduction
Centrifugal pumps are widely used in construction, municipal water supply and drainage, the petroleum and chemical industries, thermal power, and other sectors [1,2]. Most pumps are driven by motors, and their electricity consumption is huge [3,4]. Because of this high energy consumption, frequency conversion technology has been widely adopted to adjust the speed of centrifugal pumps by means of a frequency converter, and it plays an important role in energy saving in the pump industry [5].
When the change in rotational speed does not exceed about 33% of the rated speed of the centrifugal pump, the change in efficiency can be ignored [6]. This approximation only means that points on the pump curve maintain the same efficiency; the pump will not operate at the same efficiency once it is inserted into the system. In fact, the operating point of a centrifugal pump is defined by the intersection of the pump curve and the system curve. Consequently, the overall efficiency depends not only on the efficiency of the centrifugal pump itself but also on the influence of the system [7]. If the operating efficiency of centrifugal pumps at different speeds can be accurately predicted, the energy-saving effect of centrifugal pumps under variable frequency conditions will be noticeable. Meanwhile, the centrifugal pump can be kept in a suitable operating condition, which extends its effective service life.
Traditionally, the state of centrifugal pumps at different speeds is predicted mainly with the pump affinity law and computational fluid dynamics (CFD) software. However, the affinity law assumes that the efficiency of a pump is approximately constant at different speeds. In fact, the volumetric, hydraulic, and mechanical efficiencies all change when the rotational speed of a centrifugal pump changes [7,8].

Experimental System
The diagram of the experimental system is shown in Figure 1. To obtain the efficiency curves at different speeds, the ZW1150-20-20 self-priming centrifugal pump shown in Figure 2 is used in the experiment. The instruments of the experimental system are listed in Table 1. The centrifugal pump is driven by a variable frequency motor, and water flows into the system through the centrifugal pump. The operating speed of the centrifugal pump is changed through a variable frequency drive. At a fixed speed, the outlet flow of the centrifugal pump is adjusted through the opening of the outlet valve in the pipeline system. Under different valve openings $V$, the flowmeter, the pressure sensors, and the rotation speed sensor record the outlet flow $Q$, the inlet pressure $P_s$, the outlet pressure $P_d$, and the rotational speed $n$, respectively. Additionally, the shaft power $N$ is obtained from the power meter. The efficiency at each flow point is calculated according to Equation (1) [12]:
$$\eta = \frac{\rho g Q H}{N} \times 100\% \tag{1}$$
where $\rho$ is the density of the transferred liquid, $g$ is the gravitational acceleration, $H = \frac{P_d - P_s}{\rho g} + H_{st}$ is the head of the centrifugal pump, and $H_{st}$ is the static head of the system of the centrifugal pump.
The experimental data are collected by adjusting the frequency of the variable frequency drive and the opening of the outlet valve, and the efficiency curves at different speeds are then obtained according to Equation (1). A total of ten efficiency curves at different speeds (i.e., 1200 r/min, 1320 r/min, 1560 r/min, 1680 r/min, 1920 r/min, 2040 r/min, 2280 r/min, 2400 r/min, 2640 r/min, and 2900 r/min for the datasets $S_1$, $S_2$, $S_3$, $S_4$, $S_5$, $S_6$, $S_7$, $S_8$, $S_9$, and $S_{10}$) are collected from the experimental system, including the efficiency curve at the rated speed, as shown in Figure 3.
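As a minimal sketch of the efficiency calculation described above, the Python snippet below first evaluates the head from the measured pressures and then the efficiency as hydraulic power over shaft power. It assumes SI units throughout (Pa, m^3/s, W, kg/m^3), which the text does not state. The function names and the sample operating point are hypothetical and are not measurements from the experiment.

```python
G = 9.81  # gravitational acceleration in m/s^2


def pump_head(p_s, p_d, rho=1000.0, h_st=0.0, g=G):
    """Head H = (P_d - P_s) / (rho * g) + H_st, with pressures in Pa and H in m."""
    return (p_d - p_s) / (rho * g) + h_st


def pump_efficiency(q, p_s, p_d, shaft_power, rho=1000.0, h_st=0.0, g=G):
    """Efficiency in percent: hydraulic power rho*g*Q*H divided by shaft power N.

    q is the outlet flow in m^3/s and shaft_power is in W (assumed units).
    """
    if shaft_power <= 0:
        raise ValueError("shaft power must be positive")
    head = pump_head(p_s, p_d, rho, h_st, g)
    return rho * g * q * head / shaft_power * 100.0


# Made-up operating point for illustration only.
head = pump_head(p_s=0.0, p_d=196200.0)
eta = pump_efficiency(q=0.005, p_s=0.0, p_d=196200.0, shaft_power=1962.0)
print(f"head = {head:.2f} m, efficiency = {eta:.1f} %")
```

Repeating the calculation over all valve openings at one speed gives one efficiency curve of the kind shown in Figure 3.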

Process Mechanism Analysis
The efficiency curves at different speeds share a common feature: as the flow increases, the efficiency first rises rapidly and then decreases gradually, as shown in Figure 3. The main reason is that when the outlet valve opening is small, the flow into the system is small and the system loss is large. Operating away from the design flow of the centrifugal pump also causes a large impact loss inside the pump. As the valve opening gradually increases, the efficiency of the centrifugal pump increases. Beyond the best efficiency point, the efficiency gradually decreases as the flow rate increases, because excessive flow also moves the pump away from its design flow, resulting in excessive shock loss inside the pump and hence a lower efficiency [8,12].
Based on the common features of the efficiency curves at different speeds and on the pump affinity law, the change of efficiency with rotational speed is expressed as an empirical formula relating the efficiency ratio to the speed ratio [6], as shown in Equation (2):
$$\eta_x = 1 - (1 - \eta_e)\left(\frac{n_e}{n_x}\right)^{m} \tag{2}$$
where $\eta_e$ represents the efficiency at the speed $n_e$ (the rated speed of the centrifugal pump), $\eta_x$ represents the required efficiency at the speed $n_x$, and $m = 0.1$ is an empirical coefficient obtained from the relationship between the efficiency ratio and the speed ratio [6]. However, Equation (2) is an approximation; in particular, it ignores the friction loss of the pipeline system. The system friction loss in the large flow region is small, so the efficiency prediction there is relatively accurate [7,9]. In the small flow region, however, the efficiency prediction is not accurate. The efficiency curves at different speeds share common characteristics mainly because of the pump affinity law [11]. Generally, the similarity of pump operating conditions is described through the operating speed of the pump: the closer the operating speed is to the rated speed, the higher the similarity [11]. However, there is no criterion that clearly measures the similarity of pump operation at different speeds. To this end, the probability information of the GPR model is used to establish a criterion that measures the similarity between operating conditions at different speeds, thus providing a reasonable training set for GPR.
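The mechanism model can be illustrated with a short Python sketch. It assumes that Equation (2) takes the commonly cited empirical form eta_x = 1 - (1 - eta_e) * (n_e / n_x)^m with m = 0.1, and that efficiencies are expressed as fractions between 0 and 1. The function name and the rated efficiency of 0.60 are hypothetical values chosen for illustration.

```python
def efficiency_at_speed(eta_e, n_e, n_x, m=0.1):
    """Estimate the efficiency at speed n_x from the efficiency eta_e at rated speed n_e.

    Assumed form: eta_x = 1 - (1 - eta_e) * (n_e / n_x) ** m, efficiencies as fractions.
    """
    if not 0.0 <= eta_e <= 1.0:
        raise ValueError("eta_e must be a fraction between 0 and 1")
    if n_e <= 0 or n_x <= 0:
        raise ValueError("speeds must be positive")
    return 1.0 - (1.0 - eta_e) * (n_e / n_x) ** m


# Hypothetical rated point: 60 % efficiency at 2900 r/min.
for n_x in (2900, 2400, 1920, 1200):
    eta_x = efficiency_at_speed(0.60, n_e=2900, n_x=n_x)
    print(f"n_x = {n_x} r/min -> eta_x = {eta_x:.3f}")
```

Because the exponent is small, the predicted efficiency changes only slowly with speed, which is consistent with the remark that the formula is most trustworthy where system friction loss plays a minor role.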

One appealing property of the GPR model is that its prediction variance provides a confidence level. Generally, the GPR model is fitted to a training set $S = \{X, y\} = \{x_i, y_i\}_{i=1}^{N}$ with $N$ training samples. The valve opening $V$, the outlet flow $Q$, the inlet pressure $P_s$, and the outlet pressure $P_d$ are selected as the input variables, i.e., $x_i = [V_i, Q_i, P_{s,i}, P_{d,i}]^T$. The actual efficiency is the output variable, i.e., $y_i = \eta_i$. For an output variable $y$, the GPR model is a regression function with a Gaussian prior distribution and zero mean, or in discrete form [34]:
$$y = (y_1, \cdots, y_N)^T \sim G(0, C) \tag{3}$$
where $C$ is the $N \times N$ covariance matrix with the $ij$th element $C(x_i, x_j)$. The matrix $C$ can be estimated by training the GPR model with the Bayesian method. For a test sample set with $N_t$ input samples, $X_t = \{x_{t,i}\}_{i=1}^{N_t}$, $t = 1, \cdots, T$, the output variable $\hat{y}_{t,i}$ and its variance $\sigma^2_{\hat{y}_{t,i}}$ can be calculated as follows [34]:
$$\hat{y}_{t,i} = \mathbf{k}_{t,i}^T C^{-1} y \tag{4}$$
$$\sigma^2_{\hat{y}_{t,i}} = k_{t,i} - \mathbf{k}_{t,i}^T C^{-1} \mathbf{k}_{t,i} \tag{5}$$
where $\mathbf{k}_{t,i} = [C(x_{t,i}, x_1), \cdots, C(x_{t,i}, x_N)]^T$ is the covariance vector between the new input and the training data, and $k_{t,i} = C(x_{t,i}, x_{t,i})$ is the covariance of the new input [34].
Multiple GPR soft sensor models are trained separately on the sample subsets at different speeds, and the relationship between each single GPR model and the test sample set is evaluated. The mean of the posterior probability $P(\mathrm{GPR}_l \mid x_t)$, which is built from the prediction variance $\sigma^2_{\hat{y}_{t,i}}$ and Bayes' theorem in the GPR model, is used to measure the similarity of datasets at different speeds. The posterior probability is defined in Equation (6), and the mean ensemble posterior probability (MEPP), denoted $\mathrm{MEPP}_{l,t}$, is defined in Equation (7) as its mean over the test set. Here $N_l$ represents the number of samples in the training sample subset; $N_t$ represents the number of samples in the test set; $v_{l,x_{t,i}} = \frac{\sigma^2_{\hat{y}_{t,i}}}{|\hat{y}_l|} \times 100\%$, $l = 1, \cdots, L$; and $\sigma^2_{\hat{y}_{t,i}}$ represents the prediction uncertainty of the $\mathrm{GPR}_l$ model for $x_{t,i}$.
Sensors 2022, 22, 4300
A larger value of $\mathrm{MEPP}_{l,t}$ indicates a larger $P(\mathrm{GPR}_l \mid x_t)$, so the test set $X_t$ is better suited to prediction by the $\mathrm{GPR}_l$ model. In other words, the training set $X_l$ used to train the $\mathrm{GPR}_l$ model is more similar to the test set $X_t$. When a test set at a new speed appears, the MEPP index is used to find similar training sample subsets, which then form the training set for GPR to predict the efficiency of the test set. Additionally, to reduce the dependence on experimental data and the experimental burden, the mechanism model based on the pump affinity law is incorporated. Consequently, a hybrid modeling method is proposed to predict the efficiency of the centrifugal pump at different flow stages.
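The GPR prediction and its variance, on which the similarity index relies, can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the squared-exponential covariance function, its fixed hyperparameters, and the names `sq_exp_cov` and `gpr_predict` are all assumptions, and no hyperparameter training is performed.

```python
import numpy as np


def sq_exp_cov(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and b (assumed kernel)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length**2)


def gpr_predict(x_train, y_train, x_test, length=1.0, signal_var=1.0, noise_var=1e-2):
    """Return the zero-mean GPR predictive mean and variance at each test input."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    # N x N covariance matrix C of the training inputs, with noise on the diagonal.
    C = sq_exp_cov(x_train, x_train, length, signal_var) + noise_var * np.eye(len(x_train))
    # Row i is the covariance vector between test input i and the training data.
    K_star = sq_exp_cov(x_test, x_train, length, signal_var)
    # Covariance of a new input with itself; constant for this stationary kernel.
    k_self = signal_var + noise_var
    mean = K_star @ np.linalg.solve(C, y_train)
    var = k_self - np.einsum("ij,ji->i", K_star, np.linalg.solve(C, K_star.T))
    return mean, var


# Toy one-dimensional data for illustration only.
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.sin(x_train[:, 0])
mean, var = gpr_predict(x_train, y_train, np.array([[1.5], [10.0]]))
print(mean, var)
```

The variance grows for test inputs far from the training data, which is the property that lets a model trained at one speed signal how well it covers samples collected at another speed.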

Proposed Two-Stage Hybrid Model
A two-stage hybrid modeling method is used to construct an integrated soft sensor model that predicts the efficiency of the centrifugal pump at different speeds. The efficiency curve is segmented by analyzing the impact of the valve opening on efficiency. Subsequently, the efficiency in the different flow stages is predicted with the data-driven model and the mechanism model, which are switched automatically.

The flow adjustment at a fixed speed mainly depends on the outlet throttle valve of the system [15]. The head curve of the system can be obtained from knowledge of the piping and the static head through simple hydraulic laws. The head curve of a general water supply system can be defined as [32]:
$$H_p = H_{st} + K Q^2$$
where $H_p$ represents the piping system head, and $K$ represents the dynamic head coefficient (friction loss). As shown in Figure 4, the curves of $K$ versus valve opening at different speeds show two common characteristics. First, as the valve opening increases, the $K$ value decreases sharply, and when the valve opening is about 50%, the $K$ value is close to zero. Second, when the valve opening is between 30% and 50%, the $K$ value at the same valve opening differs between speeds. Therefore, for simplicity, the stage where the valve opening is larger than 50% is defined as the large flow stage, and the stage where the valve opening is less than 50% is defined as the small flow stage.
The training sample set is divided into two stages according to the valve opening. For convenience, the training sample set is redefined as $S = (S_m, S_h)$ and the test sample set as $X_t = (X_{t,m}, X_{t,h})$, where $S_m$ is the training sample set of the small flow stage, $S_h$ is the training sample set of the large flow stage, $X_{t,m}$ is the test sample set of the small flow stage, and $X_{t,h}$ is the test sample set of the large flow stage.
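The stage division can be sketched as follows, assuming the usual quadratic system curve H_p = H_st + K Q^2, so that K = (H_p - H_st) / Q^2. The function names are hypothetical. The text does not say which stage a valve opening of exactly 50% belongs to; assigning it to the large flow stage here is an assumption.

```python
import numpy as np


def dynamic_head_coefficient(head, flow, h_st=0.0):
    """K = (H_p - H_st) / Q^2, from the assumed system curve H_p = H_st + K * Q^2."""
    head = np.asarray(head, dtype=float)
    flow = np.asarray(flow, dtype=float)
    return (head - h_st) / flow**2


def split_by_valve_opening(valve_opening, threshold=50.0):
    """Return boolean masks (small_flow, large_flow) for valve openings in percent."""
    v = np.asarray(valve_opening, dtype=float)
    small = v < threshold  # small flow stage: opening below the threshold
    return small, ~small   # an opening equal to the threshold counts as large flow (assumption)


# Made-up valve openings in percent.
small, large = split_by_valve_opening([20, 40, 50, 80])
print(small, large)
```

The same masks can be applied to both the training set and the test set, so that each stage is handled by its own model.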

Stage Modeling Method
The two-stage hybrid modeling method is implemented as follows. In the small flow stage, the prediction of the mechanism model is not accurate because of the system friction loss. Additionally, the opening of the outlet valve is small, the pressure difference between the inside and outside of the valve is large, and the valve opening is more sensitive to the change of flow, so more samples can be obtained. Therefore, according to the MEPP index in Equation (7), the LGPR model is trained using suitable sample sets of the small flow stage and is used to predict the efficiency in this stage. This differs from the GPR model, which is constructed using samples from the whole flow range. In the large flow stage, the friction loss of the system is small, and the mechanism model is used to predict the efficiency. For a test sample, the valve opening is first used to decide whether the sample belongs to the large flow stage or the small flow stage, as shown in Figure 5.
The proposed modeling method uses available process knowledge and model information. The main implementation steps are illustrated in Figure 6 and are described step by step as follows.
Step 1: Collect data at different speeds of the centrifugal pump, $S = \{X, y\} = \{x_i, y_i\}_{i=1}^{N}$.
Step 2: Train multiple GPR models on the sample subsets at different speeds using Equation (3).
Step 3: For a test sample set $X_t$ at a new speed, evaluate it with the multiple GPR models using Equations (4)-(7) to obtain $\mathrm{MEPP}_{l,t}$, and select the training sample subsets whose GPR models give relatively larger values of $\mathrm{MEPP}_{l,t}$ to form a new training sample set $S^*$.
Step 4: Segment the test sample set $X_t$ and the new training sample set $S^*$ according to the valve opening to obtain the test sample sets $X_{t,m}$ and $X_{t,h}$ and the training sample sets $S^*_m$ and $S^*_h$.
Step 5: For the test sample set $X_{t,h}$ in the large flow stage, calculate the predicted efficiency using Equation (2). For the test sample set $X_{t,m}$ in the small flow stage, first train the LGPR model on the training sample set $S^*_m$ using Equation (3), and then calculate the predicted efficiency using Equation (4).
Step 6: Finally, integrate the predicted efficiencies of the two stages to obtain the predicted efficiency of the test sample set $X_t$.
In the above modeling steps, useful process knowledge and GPR-based probability information are integrated into the hybrid model to predict the efficiency of centrifugal pumps at different speeds. Modeling each stage separately better handles the different process characteristics, reduces the dependence on experimental data, and improves the prediction accuracy. From an engineering point of view, this two-stage hybrid modeling method is straightforward to implement.
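The switching logic of the steps above can be outlined in Python. This sketch covers only the subset selection of Step 3 and the routing and merging of Steps 4 to 6. The similarity scores and the two predictor callables are stand-ins: the scores are made-up numbers rather than MEPP values, and the lambdas replace the LGPR and mechanism models. All names are hypothetical.

```python
import numpy as np


def select_subsets(scores, n_keep=3):
    """Return the names of the n_keep training subsets with the largest similarity score."""
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


def hybrid_predict(x_test, valve_col, predict_small, predict_large, threshold=50.0):
    """Route each test sample to the small-flow or large-flow predictor and merge the results.

    x_test has one sample per row; valve_col is the column holding the valve opening in percent.
    """
    x_test = np.asarray(x_test, dtype=float)
    small = x_test[:, valve_col] < threshold
    eta = np.empty(len(x_test))
    if small.any():
        eta[small] = predict_small(x_test[small])    # data-driven model in the small flow stage
    if (~small).any():
        eta[~small] = predict_large(x_test[~small])  # mechanism model in the large flow stage
    return eta


# Made-up similarity scores for six training subsets (placeholders, not MEPP values).
scores = {"S1": 0.31, "S3": 0.27, "S5": 0.20, "S6": 0.12, "S9": 0.06, "S10": 0.04}
chosen = select_subsets(scores, n_keep=3)

# Stand-in predictors returning constants so that the routing is easy to follow.
x_test = [[20.0, 0.1], [80.0, 0.5], [40.0, 0.2]]
eta = hybrid_predict(x_test, valve_col=0,
                     predict_small=lambda x: np.full(len(x), 1.0),
                     predict_large=lambda x: np.full(len(x), 2.0))
print(chosen, eta)
```

Passing the predictors in as callables keeps the switching rule separate from the models, so either stage model can be replaced without touching the routing.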

Experimental Results and Discussion
Under the operating conditions described in Section 2.1, a total of 165 samples at ten operating speeds, denoted as $S = (S_1, \cdots, S_{10})$, are collected from the experimental system shown in Figure 1. Six sets ($S_1$, $S_3$, $S_5$, $S_6$, $S_9$, $S_{10}$) are used for training and the remaining four sets ($S_2$, $S_4$, $S_7$, $S_8$) are used for testing. To compare the prediction performance of different models, two common performance indices, the root mean square error (RMSE) and the maximum absolute relative error (MARE), are adopted:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_t}\sum_{i=1}^{N_t}\left(\hat{y}_{t,i} - y_{t,i}\right)^2}$$
$$\mathrm{MARE} = \max_{i}\left|\frac{\hat{y}_{t,i} - y_{t,i}}{y_{t,i}}\right|$$
where $\hat{y}_{t,i}$ represents the predicted value of $y_{t,i}$.
First, the effect of the MEPP index is verified. A larger MEPP value means that the test sample set is more similar to the training sample subset of the corresponding GPR model, and thus the RMSE value is smaller. For the four test sample sets $S_2$, $S_4$, $S_7$, and $S_8$, the MEPP and RMSE values of the GPR models trained on the six sample subsets are shown in Figure 7. The results indicate that the similarity between the test sample set and a training sample subset can be measured by the MEPP index, and that a new training set $S^*$ can be formed by selecting the more similar subsets to construct a suitable LGPR model for the small flow stage. In this case, the new training set for $S_2$ is ($S_1$, $S_3$, $S_5$), for $S_4$ it is ($S_3$, $S_5$, $S_6$), for $S_7$ it is ($S_5$, $S_6$, $S_9$), and for $S_8$ it is ($S_6$, $S_9$, $S_{10}$).
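The two performance indices named above can be computed as in the following sketch. It assumes the standard definitions implied by their names, with MARE returned as a fraction rather than a percentage. The function names and the sample values are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted efficiencies."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mare(y_true, y_pred):
    """Maximum absolute relative error, as a fraction of the measured value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs((y_pred - y_true) / y_true)))


# Made-up efficiencies in percent, for illustration only.
y_true = [50.0, 60.0, 40.0]
y_pred = [52.0, 57.0, 40.0]
print(rmse(y_true, y_pred), mare(y_true, y_pred))
```

RMSE summarizes the average error over a test set, while MARE reports the single worst relative error, so the two indices together show both typical and worst-case behavior.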
Since the valve opening in the large flow stage is large, the friction loss of the system is small, and the efficiency in this stage is predicted by the mechanism model based on the pump affinity law. The prediction results of the small flow stage for the four test sample sets $S_2$, $S_4$, $S_7$, and $S_8$ are shown in Figure 8. For comparison with the GPR model and the mechanism model, the training set $S^*$ is divided into two stages by the valve opening, and the training set $S^*_m$ in the small flow interval is used to train the LGPR model. Note that the GPR and LGPR models are trained with different samples. The LGPR model shows good prediction performance in the small flow stage. As also shown in Table 2, the MARE values of the three models confirm that the LGPR model is suitable for the small flow stage.

The results of the two stages are integrated into the hybrid model to predict the efficiency of the centrifugal pump at different speeds. For the four test speeds, namely the test sample sets $S_2$, $S_4$, $S_7$, and $S_8$, the prediction results shown in Figure 9 indicate that the hybrid model achieves a good prediction of the centrifugal pump efficiency.
The hybrid model, the GPR model, and the mechanism model based on the pump affinity law are compared. Table 3 lists the performance comparison of the three models. Among them, the hybrid model gives the best prediction, while the GPR and mechanism models are inferior. As shown in Table 4, the mechanism model requires the least experimental data (for the efficiency prediction of the $S_2$, $S_4$, $S_7$, and $S_8$ datasets using Equation (2)), mainly because it requires only the efficiency points at the rated speed with the same valve openings as the test datasets. In contrast, the GPR model is purely data-driven and thus requires the most experimental data, as determined by the selected training samples (e.g., for $S_2$ the training set is ($S_1$, $S_3$, $S_5$)). The hybrid model combines LGPR for the small flow stage and the mechanism model for the large flow stage. Consequently, the required samples can be determined separately for each stage.
In summary, the hybrid model makes full use of the process knowledge of the centrifugal pump while avoiding the empirical error of the mechanism model, so it has better predictive performance. Compared with the GPR model, the hybrid model requires fewer samples, which reduces the experimental burden. Consequently, the hybrid model can be readily applied to practical centrifugal pumps.

Conclusions
This work proposes a hybrid knowledge-and-data soft sensor model to predict the efficiency of centrifugal pumps at different speeds. GPR, with its probabilistic inference, is used to select suitable datasets for constructing an appropriate data-driven prediction model for the low flow region. The advantages of the mechanism and GPR models are combined, so better prediction results can be obtained in the different flow stages. Consequently, the hybrid model maintains the prediction accuracy while remaining simple, because it requires fewer modeling samples than a purely data-driven model. The experimental results validate its feasibility and simplicity. Future research topics include how to improve the prediction accuracy in the large flow stage and how to collect more informative experimental data in an active and efficient manner.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this paper: