Pump Feature Construction and Electrical Energy Consumption Prediction Based on Feature Engineering and LightGBM Algorithm

Abstract: In recent years, research on improving the energy consumption ratio of pumping equipment through control algorithms has advanced. However, the actual behavior of pump equipment does not always correspond to the pump characteristic information, resulting in deviations between the calculated operating point and the actual operating point, and ultimately in wasted power. To solve this problem, the data from circulating pumping equipment in a large pumping facility are analyzed


Introduction

Research Significance
Generally speaking, the delivery volume of a pump is 3-4 times its load volume, which allows pump operation to be rationalized. Pumps are divided into several types, such as piston, diaphragm, and slide types [1,2], which can be adapted to overall transport requirements and improve the efficiency of transport operations. When a pump is used to transfer oil, gas, or water, aging of the pump or abnormalities in the transport process lead to additional energy consumption and therefore to wasted resources. In 2001, for example, this waste accounted for an estimated 73% of the energy consumed by pumps installed across the EU [3]. At the same time, it is estimated that water supply accounts for 7% of global energy consumption, with 90% of this energy being used to operate the pump systems themselves [4]. If the installed global capacity of rotating equipment with electric drives is broken down by type of work machine according to [5], an energy-saving potential of 20-30% for pump systems would translate into possible savings of 2-3% of total electricity demand worldwide. Therefore, pump research and technology exploration should be carried out with the aim of changing how pumps are developed and raising the level of the technology. With the progress of industrial IT and storage technology, the acquisition and accumulation of big data in modern production processes have energized the development of optimal decision-making for electrical energy consumption, while also bringing great challenges [6]. Thus, the rational extraction of effective information from big data and the combination of oil pump energy consumption optimization with big data technology have become research hotspots.
In summary, in an information era characterized by big data, developing technology that optimizes decision-making for the electrical energy consumption of pump transport based on big data and artificial intelligence is of great significance for improving the economy and timeliness of pump-based material transport. In the next section, we examine the factors that play a role in achieving these potential savings.

Current Status of Pump Applications
Increasing the piping cross-section reduces system losses and allows for maximum resource savings, because the lower flow velocity reduces the dynamic head, which grows with the square of the velocity [4,7]. Waide et al. proposed integrating variable-speed drives into the pump unit [5]. This not only compensates for the increased power requirements of the pump unit (which is usually designed to be larger than needed for precautionary reasons) but also reduces the velocity-related losses by adjusting the flow rate. In this way, optimized automation can achieve energy savings of 30-50% during pump operation [8]. Higher operating efficiency reduces the life-cycle costs of medium-sized pump systems, with energy consumption costs statistically accounting for nearly half of the total cost of a pump system. These costs can range from 25% to 50% of the total energy costs of an industrial company, or 90% of the total energy costs of water utilities. Using a variable-speed pump alone does not provide any benefit if the pump only runs at its rated operating point, because of the additional losses that occur in the inverter. In the case of a single pump, the general rule is that it is preferable to pump a given amount of fluid continuously over a given period to save on electricity costs [9]. This rule follows from the marginal efficiency coefficient described by Crease [10], according to which positive and negative deviations of the flow rate always lead to an increase in the required energy [11]. A practical application of single-pump systems is constant pressure control, where the required energy can be reduced by demand-dependent pressure adjustment. Luo and Gevorkov proposed suitable solutions based on the affinity laws [12,13]. In storage systems, the pump is primarily subjected to increasing static pressure: when an atmospheric storage tank is filled through a pipe connection at its bottom, the static pressure rises during filling, whereas a pressure-loaded storage system generates an increase in pressure as the gas volume is compressed. Bene and Hos [14] outlined a model-based approach that calculates the optimal flow rate based on the pump and system characteristics. The linear control approach proposed by Ahonen [15] adapts the speed to the static head: the optimal speed is determined experimentally and is almost a linear function of the static head fraction, which in turn shows that an optimal filling process is not possible at a constant speed. Heininger et al. [16] provide a speed control concept with automatic parameterization. Because energy-optimized operation reduces dynamic losses but significantly increases the filling time, Lindstedt and Karvinen developed a method that combines a fixed time specification with energy efficiency to determine the optimal flow specification [17]. Combining the highly dynamic part of the system characteristic curve with a speed specification related to the fill level results in theoretical savings of up to 60%, although the actual savings may be lower due to partial-load losses. If there are multiple pumps in the system, coordination is required. In this case, two optimization options are available: individual speeds can be calibrated, or the switching points of individual pumps can be identified. A marginal efficiency factor can be used to determine the optimal speed regulation [12]. The National Renewable Energy Laboratory published a 2012 report demonstrating the utility of intermittent process energy audits for identifying systemic inefficiencies. Shankar et al. provide a comprehensive review of opportunities to improve pump efficiency [8], focusing on selection and operational optimization. Torregrossa noted a lack of feasible methods for analyzing pump energy consumption [18]. Longo et al. also address this gap in their 2019 paper [19]. Johnson et al. conducted a data analysis of pumping station systems and the pump operating space [20].
In general, better prediction of pump energy consumption is important for improving pump energy efficiency. However, most current research focuses on improving energy efficiency by optimizing and controlling a single variable of the pumping process [21]. Moreover, due to the lack of real-time data and an intelligent processing flow, the monitoring and control of pump system energy consumption cannot be improved [22]. Currently, most forecasting models use a data-driven approach. Data-driven methods can be further classified into two categories: linear modeling-based methods and nonlinear modeling-based methods [23]. The prediction accuracy of linear modeling algorithms is usually lower than that of nonlinear modeling algorithms [24], so model prediction mostly focuses on nonlinear prediction. For example, Wang constructed a prediction model for pump performance using an artificial neural network trained with test data from an axial-flow pump [25]. The results show that the BP neural network can improve the accuracy of performance prediction and shorten the time and cost required for experiments. To improve numerical calculation methods and performance prediction for centrifugal pumps, Tan used the commercial software FLUENT to simulate the performance of centrifugal pumps at six different speeds under design and off-design flow [26]. Based on a BP artificial neural network, Liu established a characteristic prediction model to predict the power of a centrifugal pump in a shutdown state [27], proposed the input mode of the BP network prediction model, and determined the number of hidden layers through extensive experiments. Based on a data-fitting method, Luo proposed a multi-condition performance prediction method for centrifugal pumps [28], incorporated the performance relationship into the particle swarm optimization algorithm, and optimized the prediction model by automatically satisfying the performance constraints. Y. Li used XGBoost to extract features from big datasets, trained multiple models separately, and then used MAE to fuse them with the independent prediction results of LightGBM to derive energy consumption-related variables, which were successfully applied on a self-developed IoT platform [29]. We are inspired by Yoshua Bengio's research on improving the effectiveness of industrial prediction models [30]. By collecting real-time data, summarizing various pump characteristics such as flow and pressure, and using appropriately improved deep learning algorithms, we realize a pump prediction method that also makes it possible to make data-driven, real-time decisions based on feature mining.

Key Issues
Our research focuses on whether the pump operating mechanism has a significant impact on pump energy consumption prediction. Therefore, we improved the LightGBM method to verify whether fusing the mechanism can improve the prediction of pump energy consumption. The specific process is as follows. The pumping equipment data of the pumping station of a large sewage treatment plant are taken as the research object, and the ratio of the input power to the output power of the pumping equipment is selected as the research target. First, the mechanism formulas are used to complete the necessary feature selection for the pumping equipment, and feature engineering is then used to explore the key features affecting its electrical energy consumption. Finally, based on the feature engineering dataset, the prediction model of pump power consumption is constructed and the reasons for its prediction performance are explored. Our main contributions are as follows: (1) Inspired by Yoshua Bengio's research on improving model prediction [30], we propose a new pump energy consumption prediction model based on the pump mechanism, which can improve the prediction efficiency of industrial pump energy consumption and support decisions that reduce energy consumption; (2) We explore a feature construction process that combines feature engineering and data-driven approaches, and identify three pump features through feature mining that effectively improve the prediction of pump energy consumption; (3) By combining artificial intelligence deep learning networks, we propose an industrial prediction process framework that integrates a deep algorithm, feature engineering, and a physical mechanism, which can broaden the research perspective on industrial intelligence and resource efficiency and provide ideas for researchers in the field of energy consumption.
The remainder of this paper is organized around how to improve pump energy consumption prediction. Section 1 introduces the significance of pump energy consumption research for industry, describes the current status of pump applications, and identifies the key problem of this paper: how to integrate the pump mechanism into an energy consumption prediction method. Section 2 introduces the technical route of the energy consumption prediction method, the data sources and their processing, the rules for pump feature extraction, the feature engineering construction, and the pumping equipment energy consumption prediction model. Section 3 presents the experimental results and verifies how the prediction of pump energy consumption changes after the mechanism is fused. Section 4 summarizes the performance of the mechanism-fused energy consumption prediction model, draws conclusions, explains the shortcomings of the research, and suggests steps for future work.

Experimental Method
The technical route of this paper can be divided into three parts: (1) data pre-processing; (2) feature engineering construction; and (3) the construction of a pumping equipment prediction method, as shown in Figure 1. First, in the data pre-processing part, the data are segmented by pumping equipment (see Section 2.1.1), filtered by strong rules, and then combined. Next, during feature engineering construction, the main features are filtered according to the pump mechanism, and feature selection is performed to construct the feature dataset. Finally, a pump energy consumption prediction method based on 5-fold cross-validation and an improved LightGBM algorithm is constructed. This paper proposes a rule for pump feature extraction and feature simplification, which effectively simplifies the operation and provides a good solution for practical engineering problems. It also characterizes the data effectively by establishing a mechanistic formulation and then generating new feature attributes based on the operational mechanism and experimental data. The feature engineering dataset is split into a training set (80%) and a test set (20%), and a LightGBM-based pumping equipment prediction method is then constructed. Through the above process, this paper establishes a pump energy consumption model for the actual oil and gas transportation process, which can provide effective prediction support for energy consumption in pump operation scenarios and is also an innovative practice that integrates physical science and computational science. We believe that this will be helpful for the field of petrochemical energy conservation and can help researchers in related fields to develop their ideas.

Data Source and Processing
The pump test rig, represented by the piping and instrumentation diagram (P&ID) in Figure 2, is operated by two pump units, A and B, each consisting of an IE3 asynchronous motor and a centrifugal pump. The pumps convey water from tank 1 to tank 2, from which it flows back into tank 1. The available operating modes are single mode, parallel mode, and serial mode, which can be set using individual valves. The frequency inverter for variable-speed operation is controlled by a PLC via a shared control unit. The pump operation data were obtained from the dataset in [20], which contains operational data for the pump station system of a large wastewater treatment plant. The pump system consists of 10 vertically mounted centrifugal pumps and 425-blade impellers. Each pump is driven by a 2610 kW (3500 HP) variable-speed synchronous three-phase motor with variable frequency. Each pump is initially rated at 4.8 m³/s, 450 kPa, 400 rpm, and 100% speed. The dataset contains raw hourly data for the 10 pumps for the 13 months from September 2020 to January 2022, including pump status (on/off), flow rate, power, motor speed, and pump suction tunnel elevation (ft). The pump shutdown data were removed from the set to obtain n = 32,440 data points.
In this paper, pre-processing operations such as data merging, tagging, removal of invalid data, and moving averages are performed on the dataset, and some data features are filtered according to international standards. Following the ISO 10816-7 and ISO 7919-3 international standards [31,32], data for which the limit of the vibration evaluation zone for non-rotating parts is exceeded are filtered out. For example, if the pump vibration speed exceeds 12 mm/s in the experiment, the data are considered invalid and are removed.
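The filtering and smoothing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record keys `status` and `vibration_mm_s`, the helper names, and the window length of 3 are assumptions made for the example; only the 12 mm/s limit is taken from the text.

```python
from typing import Dict, List

VIBRATION_LIMIT_MM_S = 12.0  # vibration speed limit quoted in the text


def drop_invalid(records: List[Dict[str, float]],
                 limit: float = VIBRATION_LIMIT_MM_S) -> List[Dict[str, float]]:
    """Keep records of running pumps whose vibration speed is within the limit.

    The keys 'status' (1 = on) and 'vibration_mm_s' are hypothetical names.
    """
    return [r for r in records
            if r["status"] == 1 and r["vibration_mm_s"] <= limit]


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Trailing moving average; the first window-1 points use a shorter window."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out
```

A trailing window is used here so that each smoothed value depends only on past samples, which suits hourly operating data that arrive in time order.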

1. Simplify the physical structure of the pump study object.
Pump systems have complex and diverse structures. They can be classified according to whether they are self-influencing, where self-influencing [33,34] means that the delivery process of the pump system itself changes the system. For example, in a filling process, the static back pressure of the pump changes when the fluid is injected from the bottom: as the level rises [34], the back pressure increases, and self-influence occurs. Pump systems can thus be classified as non-self-influencing single-pump systems, self-influencing single-pump systems, non-self-influencing multi-pump systems, and self-influencing multi-pump systems [21]. These structures can be classified more finely. The non-self-influencing single-pump system can be divided into two categories according to the presence or absence of interference, and the self-influencing single-pump system can be divided into two categories by the same criterion. The non-self-influencing single-pump system can also be divided according to whether it is controlled by speed, by valve opening, or by a combination of the two. To obtain the most streamlined and effective research object, we use the experimental data to select the non-self-influencing single-pump system, which requires the fewest assumptions, for studying the physical structure.

2. Focus on the most important features in the physical structure of the pump.
Selecting the most important features of the physical structure of the pump, namely those that capture the core requirements, is crucial for making the research object more general. By studying only the necessary features and keeping the focus on the research objectives, the most effective results can be obtained at the lowest cost. We expand the features related to pump electrical energy through the pump mechanism formulas and use Pearson correlation coefficients to select the key features for study, which makes the experiments simpler and the results more representative.

Pumping Equipment Mechanism
According to the 2018 Turing Award winner Yoshua Bengio [30], compared with structural causal models, causal graph models, and statistical models, mechanistic and physical models are better predictors in the presence of confounding factors, when the data distribution changes, or when the data are anomalous. Mechanistic models are mathematical models of objects or processes based on mass balance equations, energy balance equations, momentum balance equations, phase equilibrium equations, certain physical property equations, chemical reaction laws, basic laws of electrical circuits, etc. Therefore, we refer to the ISO 5199 international standard [35] and T. Hieninger's description of the characteristics of a single-pump system [21], and summarize the relevant characteristics of the pump based on the experimental data. The specific features are described as follows.
Given the wide range of operating pressures, flows, and speeds, advanced operation and pump control can further reduce energy consumption [8,18,36]. We therefore derive the following characteristics from the experimental data, building on the description of the pump system in Hilary A. Johnson's paper, the ISO 5199 international standard, and the description of single-pump system characteristics published by T. Hieninger [21].
The energy-efficient operation of a non-self-influencing single-pump system can be described in terms of its specific energy $\vartheta$, which is calculated as shown in Equation (1).
In Equation (1), $W$ denotes the output power and $Q$ the volume flow rate. The flow velocity ($\upsilon$) of the pump is given by Equation (2), where $S$ is the cross-sectional area of the pump cylinder. The dynamic pressure ($D_p$) is given by Equation (3). The pump head ($H$) is given by Equation (4), where $\rho$ is the density of the liquid transported by the pump. The mass flow rate ($G$) is given by Equation (5), the output power ($W$) by Equation (6), and the input power ($W_{all}$) by Equation (7), where $U$ is the operating voltage, $I$ is the operating current, and $\cos\varphi$ is the power factor. The motor energy consumption ratio ($E_s$) is given by Equation (8). The torque (rev) is given by Equation (9), where $n$ is the rotational speed. The flow rate ($Q$) is given by Equation (10), where $S$ is the cross-sectional area of the pump cylinder, $l$ is the distance moved by the pump cylinder piston, and $t$ is the time interval (see Section 2.1.1).
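For orientation, the textbook relations that match the symbol definitions above can be written as follows. This is a hedged sketch under standard hydraulic assumptions, not a reproduction of the paper's equations: $g$ (gravitational acceleration) is a symbol introduced here, the correspondence to the numbered equations is an assumption, and the head, input power, energy consumption ratio, and torque (Equations (4) and (7) to (9)) are not attempted because their exact form cannot be inferred from the definitions alone.

```latex
\begin{align*}
\vartheta &= \frac{W}{Q}
  && \text{specific energy, cf. Equation (1)} \\
\upsilon &= \frac{Q}{S}
  && \text{flow velocity, cf. Equation (2)} \\
D_p &= \tfrac{1}{2}\,\rho\,\upsilon^{2}
  && \text{dynamic pressure, cf. Equation (3)} \\
G &= \rho\,Q
  && \text{mass flow rate, cf. Equation (5)} \\
W &= \rho\,g\,Q\,H
  && \text{output (hydraulic) power, cf. Equation (6)} \\
Q &= \frac{S\,l}{t}
  && \text{flow rate, cf. Equation (10)}
\end{align*}
```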

Feature Design and Extraction
The pumping equipment features need to cover the internal and external environment of the pump as well as the actual parameters of the pumping equipment. The internal environment features are temperature, pressure, flow rate, flow velocity, current, and voltage; the external environment features generally include temperature, humidity, and pressure. The actual parameters of the pumping equipment include impeller diameter, pump cylinder cross-sectional area, pump cylinder piston travel, power, torque, and speed. Some features are obtained directly in the pre-processing stage of the experimental data, and the pumping features related to the pump mechanism are then designed and extracted: we select existing features that are linked by a mechanism formula and use that formula to calculate additional features. For example, the output power of the pump has a linear relationship with the liquid density, head, and gravitational acceleration. Through this design and extraction process, a total of 18 pumping equipment features are extracted. Among them, the non-self-influencing single-pump specific energy, calculated average efficiency, pump cylinder cross-sectional area, head, number of revolutions, flow pressure, impeller diameter, and mass flow rate are constructed from the mechanism. The parameters extracted by feature engineering are shown in Table 1. Feature selection is one of the key aspects of feature engineering construction; its purpose is to select the most effective features from the many available. The selection of features has an important impact on the prediction of the electrical energy consumption of pumping equipment, mainly in two respects: the complexity of model calculation and the generalization ability. For this reason, this paper uses Pearson correlation analysis for feature selection.
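The idea of expanding measured quantities into mechanism-based features can be illustrated with a short sketch. It assumes the standard relations for flow velocity, dynamic pressure, mass flow rate, hydraulic output power, and specific energy; the class `PumpSample`, the function `mechanism_features`, the field names, and the default water density are hypothetical choices for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict

G_ACCEL = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


@dataclass
class PumpSample:
    flow_m3_s: float               # volume flow rate Q
    head_m: float                  # pump head H
    area_m2: float                 # cross-sectional area S
    density_kg_m3: float = 1000.0  # liquid density rho (water assumed)


def mechanism_features(s: PumpSample) -> Dict[str, float]:
    """Derive mechanism-based features from one measured sample.

    Assumes a running pump (Q > 0); shutdown records are expected to be
    removed beforehand, as described in the data processing section.
    """
    velocity = s.flow_m3_s / s.area_m2                       # v = Q / S
    dynamic_pressure = 0.5 * s.density_kg_m3 * velocity ** 2  # D_p = rho v^2 / 2
    mass_flow = s.density_kg_m3 * s.flow_m3_s                 # G = rho Q
    output_power = s.density_kg_m3 * G_ACCEL * s.flow_m3_s * s.head_m  # W = rho g Q H
    specific_energy = output_power / s.flow_m3_s              # W / Q
    return {
        "velocity": velocity,
        "dynamic_pressure": dynamic_pressure,
        "mass_flow": mass_flow,
        "output_power": output_power,
        "specific_energy": specific_energy,
    }
```

Each derived value is a deterministic function of the measured inputs, so these features add physical structure rather than new information, which is why a later correlation-based selection step is still useful.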

Feature Selection Based on Pearson Correlation Coefficient
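The full set of pump samples (100%) was used, and the Pearson correlation coefficient between any two attributes of the pump was defined as the quotient of their covariance and the product of their standard deviations, as shown in Equation (11).

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y} \tag{11}$$

The above equation defines the population correlation coefficient, denoted $r_{XY}$. The sample Pearson correlation coefficient is obtained by estimating the covariance and standard deviations from the samples and is denoted here by the lowercase letter $k$. The calculation of $k$ is shown in Equation (12).

$$k = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{12}$$

$k$ can also be estimated from the mean of the products of the standard scores of the $(X_i, Y_i)$ sample points, which gives an expression equivalent to the above equation, as shown in Equation (13).

$$k = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma_X}\right)\left(\frac{Y_i - \bar{Y}}{\sigma_Y}\right) \tag{13}$$

where $\frac{X_i - \bar{X}}{\sigma_X}$, $\bar{X}$, and $\sigma_X$ are the standard score, sample mean, and sample standard deviation for the $X$ samples, respectively, and $n$ is the number of samples.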
Following the principles proposed in this paper, the necessary features that correlate strongly with the energy consumption ratio were selected for the experiments; the pump feature correlation results are shown in Figure 3. Features whose correlation with the energy consumption ratio has an absolute value greater than 0.8 (80%) are selected for algorithm training and testing [37]. As Figure 3 shows, this holds for flow rate, fluid mass, pump cylinder cross-sectional area, calculated output power, head, and flow pressure.
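The selection rule can be sketched in a few lines of plain Python. The function names `pearson` and `select_features` and the dictionary-of-columns layout are assumptions made for this illustration; the 0.8 threshold on the absolute correlation follows the text.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0  # a constant column carries no linear information
    return cov / (spread_x * spread_y)


def select_features(columns: Dict[str, List[float]],
                    target: List[float],
                    threshold: float = 0.8) -> List[str]:
    """Return the names of columns with |correlation to target| > threshold."""
    return [name for name, col in columns.items()
            if abs(pearson(col, target)) > threshold]
```

Using the absolute value keeps strongly negatively correlated features as well, since they are as informative for a regression model as positively correlated ones.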

LightGBM-Based Prediction Model for Pumping Equipment
LightGBM is a fast, distributed, high-performance gradient-boosting framework based on decision trees [38]. It is designed and implemented from an engineering perspective, with features such as histogram optimization, leaf-wise splitting with a depth limit, gradient-based one-side sampling [38,39], exclusive feature bundling, direct support for categorical features, and parallel learning, which make fast model iteration easier when working with large-scale data.
LightGBM relies on two techniques: gradient-based one-side sampling (GOSS), which keeps the samples with large gradients and randomly samples those with small gradients when estimating the information gain, and exclusive feature bundling (EFB), which bundles mutually exclusive features to reduce the feature dimensionality. Together, these greatly reduce the time needed to process samples while largely preserving prediction accuracy.
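The sampling idea behind GOSS can be illustrated with a simplified sketch. It is not LightGBM's internal implementation: the function name `goss_sample`, the default rates of 0.2 and 0.1, and the fixed seed are choices made for this example. The sketch keeps the fraction `top_rate` of samples with the largest absolute gradients, draws a fraction `other_rate` of all samples at random from the rest, and up-weights the drawn samples by `(1 - top_rate) / other_rate` so that the gradient statistics stay approximately unbiased.

```python
import random
from typing import List, Tuple


def goss_sample(gradients: List[float],
                top_rate: float = 0.2,
                other_rate: float = 0.1,
                seed: int = 0) -> List[Tuple[int, float]]:
    """Return (sample index, weight) pairs selected by one-side sampling."""
    n = len(gradients)
    # Sort sample indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Randomly draw from the small-gradient samples.
    rng = random.Random(seed)
    drawn = rng.sample(rest, min(n_other, len(rest)))
    # Compensate for the under-sampling of small-gradient samples.
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in drawn]
```

With the default rates, only 30% of the samples are used per iteration, which is where the reduction in processing time comes from.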
For the LightGBM-based pump energy consumption prediction model, Pearson correlation analysis was carried out on the flow, fluid mass, cross-sectional area of the pump cylinder, calculated output power, head, flow pressure, total pressure, input power, energy consumption ratio, operating voltage, operating current, number of revolutions, and impeller outer diameter. According to the results of the correlation analysis, six factors (flow, fluid mass, cross-sectional area of the pump cylinder, calculated output power, head, and flow pressure) were selected as the input parameters of LightGBM, and a regression model between these characteristic parameters and pump energy consumption was established. The overall process is shown in Figure 4. The pump dataset is divided, the model structure is configured, and the input features are passed to the gradient-based one-side sampling algorithm. The decision trees are then built iteratively with the histogram algorithm and the leaf-wise growth strategy, and the final learner is output.
The specific steps are as follows. Before training, the necessary features of each dimension in the pump data are sorted. After sorting, the LightGBM structure is set, and the feature values are discretized into histograms (256 bins in this paper) and then iterated through a depth-limited leaf-wise growth strategy. In subsequent training, the histograms are used iteratively as 'features' for decision tree construction. Finally, the optimal decision tree is obtained and the result of the prediction model is output. Consider a training set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x$ represents the pump characteristic samples and $y$ the energy consumption samples. The energy consumption model can be trained by minimizing a loss function [40]. The loss function $L(y, F(x))$ is computed as shown in Equation (14), where $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right)$, $M$ is the maximum number of iterations, and $h_m(x)$ is the base decision tree.
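The stage-wise structure implied by the definition of $\gamma_m$ is the standard gradient-boosting scheme, which can be written out as below. This is a sketch of the generic algorithm, not a reconstruction of Equation (14); the initial constant model $F_0$ and the squared-error example in the last line are assumptions added for illustration.

```latex
\begin{align*}
F_0(x) &= \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma), \\
F_m(x) &= F_{m-1}(x) + \gamma_m\, h_m(x), \qquad m = 1, \ldots, M, \\
F_M(x) &= F_0(x) + \sum_{m=1}^{M} \gamma_m\, h_m(x), \\
r_{im} &= -\left[\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right]_{F = F_{m-1}}
        = y_i - F_{m-1}(x_i)
        \quad \text{for } L = \tfrac{1}{2}\left(y - F(x)\right)^2 .
\end{align*}
```

Each tree $h_m$ is fitted to the pseudo-residuals $r_{im}$, so later trees concentrate on the samples that the current model predicts poorly.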
In this paper, the study of pump electrical energy consumption prediction driven by operational mechanisms and feature engineering shows that combining real pump operational mechanisms with feature engineering can produce strong prediction results. To demonstrate the practical effectiveness and potential of this approach, all of the data used in this paper come from plant operation. To show what the algorithm can ultimately achieve, the model parameters are tuned with the heuristic particle swarm optimization (PSO) algorithm (which has the advantages of high accuracy, insensitivity to outliers, and no assumptions about the input data); the PSO parameters are set as shown in Table 2, and the results after tuning are shown in Table 3. The model was subjected to 5-fold cross-validation, and the gradient-boosting decision tree (GBDT, which consumes more time and memory but is highly accurate and stable) was used as the base learner, with a learning rate of 0.05.
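A 5-fold cross-validation loop around a LightGBM regressor might look like the sketch below. Only the GBDT base learner, the learning rate of 0.05, and the 256 bins come from the text; `n_estimators`, `num_leaves`, and `max_depth` are placeholders, since the PSO-tuned values of Table 3 are not reproduced here. The helper names `make_lgbm`, `kfold_indices`, and `cross_val_rmse` are hypothetical, and `lightgbm` is imported lazily so that the fold logic can be used with any model exposing `fit` and `predict`.

```python
import random
from math import sqrt
from typing import Callable, Iterator, List, Sequence, Tuple


def make_lgbm():
    """Build a LightGBM regressor (requires the optional lightgbm package)."""
    import lightgbm as lgb  # lazy import: the helpers below do not need it
    return lgb.LGBMRegressor(
        boosting_type="gbdt",  # GBDT base learner, as stated in the text
        learning_rate=0.05,    # learning rate stated in the text
        max_bin=256,           # mirrors the 256 histogram bins in the text
        n_estimators=200,      # placeholder, not taken from the paper
        num_leaves=31,         # placeholder (library default)
        max_depth=-1,          # placeholder: no depth limit
    )


def kfold_indices(n: int, k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Shuffle the indices 0..n-1 and yield (train, test) index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def cross_val_rmse(make_model: Callable[[], object],
                   X: Sequence[Sequence[float]],
                   y: Sequence[float],
                   k: int = 5) -> List[float]:
    """Train a fresh model on each fold and return the per-fold RMSE."""
    scores: List[float] = []
    for train, test in kfold_indices(len(X), k):
        model = make_model()
        model.fit([X[i] for i in train], [y[i] for i in train])
        pred = model.predict([X[i] for i in test])
        mse = sum((p - y[i]) ** 2 for p, i in zip(pred, test)) / len(test)
        scores.append(sqrt(mse))
    return scores
```

With the package installed, `cross_val_rmse(make_lgbm, X, y)` would return five RMSE values, one per fold; each fold trains on 80% of the samples and tests on the remaining 20%.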

Results
Techniques for pump design and selection are well developed. For example, Bortoni et al. provide another model-based approach that minimizes the power required by two pumps operating in parallel by dynamic programming [41]. For a constant system characteristic curve, Zhao proposed a theoretical model and an online control optimization method for variable-flow hydraulic pumps in a central air conditioning system based on the characteristics of parallel hydraulic pumps, including the optimal configuration of the speed ratio of the parallel variable-frequency hydraulic pumps and the number of operating units [42]. Koor determined the optimal operation of two pumps of equal size operating in parallel by providing information about the system [43]. At present, most research on pump energy consumption focuses on pump design and range control, which leaves the problem that the actual behavior of the pump is inconsistent with the predicted behavior [21]. Research that combines equipment mechanisms, data mining, and deep learning technology to improve pump energy efficiency can effectively address this inconsistency. Therefore, our research focuses on how to combine deep learning and data mining techniques to build a pump energy consumption model. We improved the LightGBM method to verify whether fusing the mechanism can improve the prediction of pump energy consumption. Experiments show that our energy consumption prediction model achieves good results.
The prediction results of the model are evaluated using the root mean square error (RMSE), $R^2$, and the mean absolute percentage error (MAPE).
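In this paper, RMSE is the square root of the mean squared error between the predicted and true values, calculated as shown in Equation (15):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{15}$$

MAPE is calculated as shown in Equation (16):

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{16}$$

$R^2$ is the coefficient of determination, i.e., the proportion of the variance of the true values that is explained by the model, calculated as shown in Equation (17):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{17}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the number of samples. The pumping equipment error results obtained through the test are shown in Table 4.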
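The three metrics can be computed with a few lines of plain Python. The sketch uses the standard definitions of RMSE, MAPE (in percent), and the coefficient of determination; the function names are chosen for this example, and MAPE assumes that no true value is zero.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (true values must be nonzero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with true values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, the single error of 1 gives an RMSE of 0.5, a MAPE of 6.25%, and an $R^2$ of 0.8.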
The calculated results in Table 4 show that RMSE = 0.016, MAPE = 1.5, and R² = 94%. The smaller the values of RMSE and MAPE and the closer R² is to 1, the better the model results are; thus, the LightGBM-based prediction model for pumping equipment electrical energy consumption achieves good prediction results.
To further verify the effectiveness of this model, this paper compares the algorithm with the traditional back propagation (BP) neural network, the XGBoost algorithm [44], the support vector regression (SVR) algorithm, and the ExtraTree algorithm [45]. All algorithms use the same feature dataset. The BP neural network uses the ReLU activation function and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) solver, with a learning rate of 0.1 and 1000 hidden-layer neurons. XGBoost uses gbtree as the base learner, with 100 base learners and a learning rate of 0.1. The ExtraTree algorithm uses mean square error (MSE) as the node-splitting criterion, with 100 decision trees, sampling with replacement, and 5-fold cross-validation. The data used in the experiments were collected at an interval of T = 1; the collection interval is long and the samples have no interaction relationship. To make the experimental results easy to analyze and discuss, the pumping equipment data at the interval T = 1 were selected as the experimental data. Each algorithm was evaluated using 5-fold cross-validation, i.e., the data were shuffled and divided into five folds, with 80% of the dataset used for training and the other 20% used for testing in each round. The resulting prediction errors for pumping equipment electrical energy consumption are shown in Table 4. The comparison shows that the LightGBM algorithm has an R² closer to 1 and RMSE and MAPE closer to 0 than the XGBoost, SVR, ExtraTree, and BP algorithms; thus, it can be concluded that the LightGBM algorithm has higher model accuracy.
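The 5-fold procedure described above (shuffle, split into five folds, train on 80% and test on 20% in each round) can be sketched as follows. This is a minimal, library-free illustration rather than the code used in the experiments: the names `k_fold_indices` and `cross_validate`, the fixed random seed, and the callable-based model interface are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # shuffle once before splitting
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(fit, predict, score, X, y, k=5, seed=0):
    """Average a score over k folds.

    fit(X_train, y_train) -> model
    predict(model, X_test) -> list of predictions
    score(y_true, y_pred)  -> float
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```

Each compared regressor would be wrapped in the same `fit` and `predict` callables, so that all of them are scored on identical folds.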
The prediction results of the electrical energy consumption of pumping equipment based on the different algorithms were obtained through training and testing, as shown in Figure 5. Over the 40 continuous time sample points, the LightGBM algorithm has the highest degree of fitting when points with an error not greater than 0.03 are counted as accurately fitted. By interval, in the 0.90 to 1.0 energy consumption interval, 100% of the points are fitted accurately in Figure 5a, compared with 50% in Figure 5b and 50% in Figure 5e; the other algorithms have a relatively low degree of fitting. In the 0.80 to 0.90 energy consumption interval, more than 92% of the points are fitted accurately in Figure 5a, approximately 87% in Figure 5b, 84% in Figure 5c, and 75% in Figure 5e. By continuous time sample, Figure 5a,d,e show a better prediction effect in the 0 to 20 sample interval, where fluctuations are small, and Figure 5a,b show better fitting results in the 20 to 30 sample interval, where fluctuations are large. In addition, in terms of prediction accuracy, R² for LightGBM in Table 4 is closest to 1. From the above comparison, it can be concluded that the LightGBM prediction method based on mechanism construction performs best.
The importance distribution of pump features based on feature engineering is shown in Figure 6. In the LightGBM regression algorithm, flow has the highest importance, at 35.5%, followed by the calculated output power (C_p) and the cylinder cross-sectional area (S). In the XGBoost regression algorithm, S is the highest, at 49.8%, followed by flow and the calculated output power. In the ExtraTree regression algorithm, the cylinder cross-sectional area is the highest, at 26.1%, followed by calculated power, flow, dynamic pressure, and pump head. The above analysis shows that, across the algorithms, the average importance of flow, calculated power, and cylinder cross-sectional area is greater than that of the other features. Averaged over the three algorithms, the contribution of S is the highest, at 30.6 percentage points, followed by flow at 26 percentage points and C_p at 23.6 percentage points. It can therefore be roughly concluded that these features can serve as important feature items in the mechanism-based algorithm model.
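The interval-wise fitting percentages discussed for Figure 5 can be reproduced with a calculation of the following kind. This is a hedged sketch: it assumes that a point counts as accurately fitted when its absolute error is not greater than 0.03, that the intervals are half-open [low, high) on the true energy consumption value, and that the function name `fit_rate_by_interval` is hypothetical. The sample values in the usage note are illustrative and are not the experimental data.

```python
def fit_rate_by_interval(y_true, y_pred, intervals, tol=0.03):
    """Percentage of points with |error| <= tol, grouped by true-value interval.

    intervals: list of (low, high) pairs, treated as half-open [low, high).
    Returns a dict mapping each interval to a percentage, or None if the
    interval contains no points.
    """
    rates = {}
    for low, high in intervals:
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t < high]
        if not pairs:
            rates[(low, high)] = None
            continue
        hits = sum(1 for t, p in pairs if abs(t - p) <= tol)
        rates[(low, high)] = 100.0 * hits / len(pairs)
    return rates
```

For example, calling `fit_rate_by_interval(y_true, y_pred, [(0.80, 0.90), (0.90, 1.00)])` on the true and predicted series of one model returns the share of accurately fitted points in each energy consumption interval, which can then be compared across models.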

Conclusions
Pump system energy consumption accounts for a large proportion of global energy use, so the study of pump system energy consumption is very important. This paper aimed to construct pump features that are strongly correlated with pump energy consumption, and to build a pump energy consumption prediction method based on these features. To this end, it proposes a construction method for a pump energy consumption prediction model based on mechanism feature engineering. In traditional pump control methods, the calculated operating point is often inconsistent with the actual operating point. In this paper, 18-dimensional data are selected as the pump features based on the pump operation mechanism, and Pearson correlation coefficient analysis and one-way analysis of variance are used for verification. Six pump equipment features that characterize pump power consumption at a deeper level are selected as the model input, and a LightGBM-based prediction model for pump power consumption is then constructed. In contrast to the traditional data-fitting modeling method, this paper uses the GBDT-based LightGBM algorithm and searches for its optimal parameters, which enables good regression prediction results on the pump equipment feature engineering dataset. In addition, the model is compared with XGBoost, SVR, BP, and ExtraTree. It is found that the LightGBM algorithm has better regression prediction accuracy than the other algorithms on the feature engineering dataset constructed in this paper.
Although the model construction method based on feature engineering and LightGBM proposed in this paper achieves good prediction results, there are still some shortcomings: (1) In mining the data feature dimensions based on mechanism and feature engineering, the mechanism is mainly drawn from existing research and has not been studied in depth for specific scenarios; (2) The main research object of this paper is a single pump, whose own changes (such as head changes) do not cause changes in other pumps. Multi-pump systems, and the interactions within them, have not yet been studied; (3) How to combine the research results of this paper with actual industrial scenarios has not been considered.
Thus, future work will combine the method proposed in this paper with actual industrial scenarios in order to investigate the role of prediction, including mechanism reconstruction, system analysis, and other areas.

Figure 1. The technical route of this paper can be divided into three parts: data pre-processing; feature engineering construction; and the construction of a pumping equipment prediction method.

Figure 2. P&ID of the test rig with two pumps.

Figure 3. The correlation coefficient is represented by the yellow-to-red gradient, where red represents a correlation coefficient of 1, which is the strongest correlation. The strongest correlations with pump power consumption (E_s) are found for flow rate, fluid mass, pump cylinder cross-sectional area, calculated output power, head, and flow pressure.

Figure 4. The pump dataset is divided, the optimal network structure is set, and the input features are fed into the gradient-based one-side sampling (GOSS) algorithm to meet the optimal setting requirements. The optimal decision tree is obtained through the iterative histogram algorithm and the leaf-wise growth strategy, and finally the optimal learner is output.

Figure 5. (a) LightGBM prediction results for 40 points of pump energy consumption data. (b-e) Comparison model results for 40 points of pump energy consumption data.

Figure 6. Contribution of each feature to the prediction in the LightGBM, XGBoost, and ExtraTree algorithms. The contributions of the pump cross-sectional area (S), flow rate (flow), and calculated output power (C_p) are relatively high.

Table 1. Feature engineering extraction parameter list.

Table 4. Error results of the prediction method.