Condition-Based Failure-Free Time Estimation of a Pump

Reliable and continuous operation of the equipment is expected in the wastewater treatment plant, as any perturbations can lead to environmental pollution and the need to pay penalties. Optimization and minimization of operating costs of the pump station cannot, therefore, lead to a reduction in reliability but rather should be based on preventive works, the necessity of which should be foreseen. The purpose of this paper is to develop an accurate model to predict a pump’s mean time to failure, allowing for rational planning of maintenance. The pumps operate under the supervision of the automatic control system and SCADA, which is the source of historical data on pump operation parameters. This enables the research and development of various methods and algorithms for optimizing service activities. In this case, a multiple linear regression model is developed to describe the impact of historical data on pump operation for pump maintenance. In the literature, the least squares method is used to estimate unknown regression coefficients for this data. The original value of the paper is the application of the genetic algorithm to estimate coefficient values of the multiple linear regression model of failure-free time of the pump. Necessary analysis and simulations are performed on the data collected for submersible pumps in a sewage pumping station. As a result, an improvement in the adequacy of the presented model was identified.


Introduction
Company managers are currently trying to maximize profits by optimizing the use of resources and reducing costs while taking into account such criteria as ecology, environmental protection, and minimizing the carbon footprint [1]. However, there are examples of such applications in which optimization and minimization of costs, especially maintenance costs, cannot be the most important management criterion. One such example is sewage treatment plants that must operate continuously and reliably 24/7 [2]. In the event of a major failure, which involves stopping the flow and treatment of wastewater, the environment may be contaminated, and high fines may be payable [3]. Equipment in the sewage treatment plant, such as pumps, must be capable of continuous operation with high reliability [4,5]. It should also be taken into account that wastewater treatment processes are partly biological in nature, and errors in operation can lead to very-long-term consequences due to the need to obtain an appropriate bacterial flora [6,7].
The main pumping station of a wastewater treatment plant (WWTP) or water treatment plant (WTP) is usually designed as a multi-pump unit consisting of redundant systems due to the need to be flexible in terms of varying operating conditions (e.g., temperatures, depending on climate/season of the year; varying amounts of incoming wastewater, solid, and other contaminants; and chemicals, in the case of treatment of not only municipal sewage but also from industrial plants) [8,9]. In such conditions, it is necessary to minimize the maintenance costs and, at the same time, maintain the continuity of operation, already taken into consideration, in part, thanks to the redundancy of the systems [2,10]. The ability to anticipate impending failures by monitoring the available operating parameters of the

Discussion of Existing Works
In this section, we provide a more thorough discussion of the literature items that may be useful in our project.
Methods used in condition-based and predictive maintenance can be sub-divided into three main categories: model-based methods, data-driven methods, and signal-based methods. The following paragraphs describe examples of these methods, especially in relation to pump maintenance issues.
In the article [11], the authors focused on an attempt to reduce the operating costs of devices, attempting to depart from the maintenance deadlines rigidly imposed by the pump manufacturer. It is proposed to extend the time between maintenance operations but with the option of completing them sooner, if necessary. The Risk-Based Maintenance (RBM) Strategy has been developed to balance cost, efficiency, and risk by scheduling repairs first to equipment being most at risk of failure. Data on failures of pumps, valves, pipework, switchboards, telemetry, and structures were taken into account. Failure data were obtained from the CMMS (Computerized Maintenance Management System). A fault tree and risk rating matrix were developed, which allowed a risk rating distribution to be obtained. This approach made it possible to indicate which subsystems should be subjected to preventive maintenance, depending on the MTBF and which systems can operate in Run-To-Failure (RTF) mode. Preventive maintenance and the supply of spare parts could, therefore, be planned according to the risk of failure of certain components. It can be seen that, in this case, current data from the SCADA system or sensors of the control system were not used, but the higher-level data on the occurrence of failures and repairs were used. Such systems are not able to predict failure in real-time on the basis of observation of the current state of the pump but can only statistically estimate the time of failure to prevent it or prepare for it.
Anis [12] developed a mathematical model to optimize planning decisions related to the maintenance of centrifugal pumps operating in a wastewater treatment plant (WWTP). The goal was to reduce costs resulting from either planned or unscheduled downtime for repairs. On the basis of the available data, four models were identified: hydraulic, operation, deterioration, and optimization. The main parameters taken into account are the level of liquid in the tank that the pumps were required to pump out and the planned pump operation schedule; it was also necessary to take into account the fact that the pumps could work with adjustable capacity (intermediate states of the pumps are possible, not only off Sensors 2023, 23, 1785 3 of 22 or working at maximum capacity). The deterioration model made it possible to take into account the impact of wear on various components, such as seals, on pump performance.
The article [17] presents an analysis of the application of data on the operation of high-power pumps in order to identify efficiency opportunities, evaluate efficiency metrics, and assess applicability of operating space analysis. The goal was to reduce energy costs and minimize maintenance by using the most efficient pumps and matching capacity and number of pumps in operation to current demand. Datasets with hourly and 5 min sampling rates were averaged and used to calculate true weighted efficiency, demonstrating the effects of efficiency metric selection. Operating space analysis was used to evaluate pump control algorithms (i.e., control of variable speed drives), affecting the performance of a system. Data on pump status, flow, consumed power, and motor speed were collected automatically, while differential pressure data were intermittently measured by analog gauges and recorded manually. On the basis of this data, strategies to improve system efficiency using time series analysis, pump and system curve intersections, comparison of operation of different pumps, and control optimization were proposed. It has been shown that the characteristics of the pumps, identical at the beginning of operation, differ after some time due to the different degree of their wear and repair methods. Knowing these parameters allows them to be included in the control algorithms to optimize energy efficiency.
Monitoring of the motor operating parameters (especially the current) was the basis for the cloggage detection method of the pump rotor in the article [18]. The subject of the analysis were submersible pumps, often found in applications such as WWTP and WTP, to which direct access during operation is impossible. In such a situation, the ability to remotely monitor the status of devices is particularly important, which can be implemented through continuous control of selected parameters. Detection of a cloggage should allow the pump to be stopped before it becomes damaged. Due to the limitations of Motor Current Signature Analysis (MCSA), as this method does not work during start-ups or under changing load conditions, the authors decided to additionally use the Advanced Transient Current Signature Analysis (ATCSA) diagnostic method. Both of these methods are based on spectral analysis. As a result, it was possible to detect the symptoms of impending cloggage of the rotor, allowing for the appropriate response of the control system during various phases of pump operation.
The subject of analysis in the article [19] were gas turbines, but the similarity of the methodology used and the type of available data allows an attempt to apply these studies to optimize pump operation. Sensors collecting real-time data on temperatures of exhaust gas and burner tip were sources of data; an additional problem stems from the need to detect faulty sensors and to assess the number of required sensors in multi-sensor systems, allowing the machine to continue operation. Signal reconstruction techniques were also applied, which can be additionally used to detect sensor faults.
Failure analysis of the rainwater axial pumps installed in a wastewater pumping station are developed to enable service work to be carried out while minimizing unnecessary actions and, at the same time, reducing the risk that it will cause unwanted unforeseen failures [20]. In addition, device manufacturers are trying to provide better, optimized devices [21,22], ensuring long-term operation and resistance to typical hazards in this work environment [18]. Advanced control algorithms are also being developed, from the level of individual devices and their groups, to the complex control of the SCADA process at a high level-overall control of all sewage treatment processes [23]. Various modeling methods and combining theoretical models with real devices as a digital twin are also being developed [12,24,25]. Dhanraj et al. [26] provided a review of solar panel failure detection methods, the methods of fault detection, and monitoring of photovoltaic panels that can be adapted to other devices that require continuous operation and quick response to changing conditions. Methods used in reliability analysis and failure prediction include algorithms that compare parameters collected under normal operating conditions with current ones, parameter-based models, fault tree analysis, and machine learning techniques. Summing up the review of the literature on the operation of devices, especially pumps, the following methods are distinguished along with the frequency of data collection: periodic inspections of technical condition (troublesome in the case of submerged pumps), periodic or continuous monitoring of the values normally available in control systems, as well as installing additional sensors to collect more detailed data [27].
Let us now analyze examples of maintenance methods, in particular those based on linear regression issues. In the article [19], a generalization of multi-dimensional linear regression was proposed in order to facilitate multi-sensor fault detection and signal reconstruction through the use of analytical optimization. The analytical formulation was used to propose a low computational overhead technique for real-time monitoring. Authors recommend this technique for a variety of devices that require continuous, reliable operation for safety reasons.
When developing methods for modeling and predicting pump failures in sewage treatment plants, one can also rely on the experience of other industries. Gong et al. [28] worked on a condition-based decision-making approach for the inspection time of a ship, assuming that the life-cycle-cost probability distribution contains two components. The first is the cost of a ship maintenance (distinguishing dry-docking and structural renewal), and the second is the cost of failure consequences. In this case, it is important to determine cost-effective inspection times that allow for longer operation of the ship while maintaining an acceptable level of risk and safety [28].
Girtler, in [29], proposed the three-state semi-Markov model of the process of state transitions of a ship's main engine with three states: full serviceability, partial serviceability, and unserviceability. Empirical data concerning the engine were used for calculating limiting probabilities for the process, which can be used in decision making with the support of Bayesian statistical theory.
Wu et al. [30] developed a component maintenance priority measure used to select additional components for preventive maintenance (PM) when performing PM on critical component of a ship. In article [31], the author proposed the use of regression analysis to control the level of maintenance of the ship's main engines. Factors taken into account when determining the level of maintenance were, among others, the weight of the element, the number of structural connections, and the operating temperature of the element. Regression functions were also used for the piston and crankshaft system, fuel system, cooling and lubrication systems, turbocharging system, and the starting system, with high multiple determination coefficients.
The authors of article [32] focused on the subject of shortening the time of the ship's stay in the ship repair yard. This stay should be minimized, and the authors used a multiple linear regression model in which the ship's repair time was a function of the ship's age, deadweight, and previous repairs to the paint coating of the hull, tanks, and structural steel. Method of least squares was applied to estimate the regression coefficients. Multiple linear regression analysis were also adopted from this article by the authors of [33] in order to assess the ship's maintenance needs. The authors proposed a genetic algorithm (GA) to determine the coefficient values of the multiple linear regression model, and the analyses were carried out on data from ships intended for the transport of oil and chemicals. Improved algorithms from this publication have been adapted to predict the time of correct operation of pumps in a sewage treatment plant.
The literature review shows that even though there are some articles on pumps control and maintenance optimization [12,17,20], only a few address the problem of estimating the failure-free time of a pump of the sewage treatment plant by observing the relationship among the data on pump condition and environment and failure-free time. This paper presents an approach to estimating the failure-free time for managers to assist in advance maintenance planning. The presented approach was also applied to find a hidden pattern that shows the relationship between maintenance work and maintenance duration. The original value of this paper is the application of a multiple linear regression model to a new problem: failure-free time estimation of a pump in a sewage treatment plant.

Goals and Approaches
One of the managers' goals is to pre-define the scope of maintenance works on a critical infrastructure. Another goal is to estimate the duration of maintenance and failurefree times in order to build predictive maintenance schedules and plan the workload of maintenance workers [34]. The purpose of this paper is to develop a model to predict a pump maintenance time. A multiple linear regression model is used to describe the effect of tank level (m), power (kW), voltage (V), frequency (Hz), current (A), and torque (Nm) on the failure-free time of a pump. The linear function with the highest value of the multiple determination coefficient and the smallest standard deviation value is sought. Data on operation of pumps are collected from SCADA system at sewage treatment plants in a region with a relatively high population density and concentration of industrial plants.
Another objective is to identify variables that have a significant impact on the pump's failure-free time and their inter-relationships. The objective of the paper is to present a condition-based prediction model to evaluate the failure-free time considering historical data.
The use of a GA to estimate values of linear regression coefficients describing the relationship between the recorded data on the pump condition and the time of failure-free operation is the original part of this article.
The paper is organized as follows: Section 2 contains characteristics of the pump station in the investigated sewage treatment plant; the next Section 3 presents a model for estimating the duration of a repair, as well as the GA for estimation of the regression coefficients. Regression coefficients estimation is presented in Section 4, including application of the Solver appendix of MS Excel (Section 4.1) and proposed GA (Section 4.2). Section 5 contains further analyses and experimental test results related to the research on the application of the GA. The paper concludes with a brief summary of the results (Section 6).

Characteristics of the Pump Station in the Sewage Treatment Plant
The wastewater treatment plant (WWTP) in which the analyzed pumps operate is modern and relatively new (built in 2002, modernized in 2013). It is equipped with an integrated automatic control system based on 8 PLCs, a SCADA (Supervisory Control and Data Acquisition) system, and a fieldbus fiber optic network. This is to ensure minimization of employment with simultaneous optimal and safe operation, allowing for obtaining a legal level of wastewater treatment.
A WWTP is an example of an installation that must operate continuously, 24 h a day, 7 days a week, all year round. The WWTP staff works in three shifts and consists of the management staff (manager and chief technologist, who work only on the first shift), dispatchers (one on each shift, who are responsible for continuous supervision of the sewage treatment plant operation via the SCADA system and overview of operators and maintenance workers), teams of operators working in individual shifts (responsible for ongoing operation of processes, equipment supervision, auxiliary and transport work, and removing minor problems), and maintenance workers (mechanics, electricians, and automation specialists, responsible for solving emerging problems as well as carrying planned maintenance works).
Working in a WWTP is associated with many inconveniences and dangers, including unpleasant odors, the risk of falling into open tanks and installations, microbiological hazards, dangerous gases, and the risk of explosion (in some wastewater treatment processes, methane and other gases are byproducts that can be even used to generate heat and electricity) [6,10].
Due to the need for continuous operation and changing conditions, including the amount and composition of wastewater (variable proportions of municipal and industrial sewage), sewage temperature, and ambient temperature, a large part of the key equipment and installations is redundant, and there are spare devices and connections between/among the sewage treatment plant departments/modules. Data adopted at the design stage of the WWTP include: average daily sewage flow (Q dśr ) 51,000 m 3 /day, maximum daily flow (Q dmax ) 60,000 m 3 /day, maximum hourly flow during rainless weather (Q hmax ) 3825 m 3 /h, and maximum hourly flow during rainy weather (Q hmax-max ) 5988 m 3 /h.
Wastewater is treated in the first stage by mechanical processes (draining and sedimentation) and in the second stage biologically (wastewater is processed by microorganisms that use it as food and bind various chemical compounds, which then must be removed) [2]. Biological processes require very precise regulation of conditions (temperature and time/flow rate), and any deviations may lead to the death of the favorable microorganisms and possibly the development of unfavorable ones. The variability of wastewater composition and temperatures means that the reactors must be precisely dosed with appropriate auxiliary substances, which is the responsibility of wastewater composition sensors, control system, technologist, and dispatcher. Obtaining and maintaining the right bacterial flora is a very difficult and slow process, and mistakes can have long-term effects (the operation of the WWTP can be disturbed for months or even years, leading to additional costs, environmental pollution, and the need to pay penalties).
The high variability of many factors influencing the way the WWTP works means that even an advanced automatic control system must be supervised and supported by a technologist and dispatcher, and many technological installations and devices require partly manual operation by operators (e.g., cleaning grates from large objects that flowed with sewage).
The raw wastewater pumping station (RWPS) is part of the WWTP, located at the very beginning of the technological scheme. It is directly supplied by wastewater from the sewage system of a medium-sized city (population of approximately 170,000 people and many industrial plants producing wastewater with a different composition than typical household wastewater) in a Central European country with a temperate climate.
The main task of the RWPS is to lift sewage (inflowing by gravity through underground channels) to a height of approximately 6 m above ground (level 0), due to which in the next stages of sewage treatment, wastewater flow is gravitational-it does not have to be additionally forced. A complete shutdown of the pump station is practically impossible, as it would stop the wastewater treatment processes, which could lead not only to disturbances in biological treatment processes but also to an increase in the level of wastewater in sewer collectors (and in extreme cases, to the need to discharge wastewater and its pollution to the river). It can be seen that the availability of efficient, ready-to-work pumps is crucial for maintaining the continuous operation of WWTP so as not to lead to an ecological disaster and the need to pay high penalties.

Description of the Main Raw Sewage Pumping Station
Wastewater is supplied to the RWPS by three main sewage collectors: the main sewage collector with a diameter of 1.5 m, supplying wastewater from the city center and the industrial district; an additional collector with a diameter of 0.8 m, supplying wastewater from new districts; and a combined collector with a diameter of 1.5 m, supplying combined wastewater and rainwater from city districts without separate collectors. The supply collectors connect in the inlet chamber, where the flow is equalized and the kinetic energy of the wastewater is reduced. From the inlet chamber, wastewater flows to Chambers A and B of the RWPS through pipes with a diameter of 0.8 m through large mesh gratings (the gratings are designed to retain large impurities that may have entered the sewage system). These connections can be closed as needed. The main part of RWPS is a recessed below-ground-level, reinforced concrete covered tank with dimensions A × B × H 5 × 12 × 10 m.
The pump station is equipped with five main pumps, installed in two chambers, A and B (Figure 1). The design parameters of the pumps are flow Q = 1500 m 3 /h and water lifting height 16.6 m. Each pump pumps wastewater through an independent pipeline to the channel in the sieve station (which is the first stage of mechanical wastewater treatment). The collector supplying wastewater from the sewage system is separated to feed both tanks with submersible pumps, and the flow of wastewater into Chambers A and B is independently controlled by valves. At the sewage inlets to the tanks, there are also gratings, which stop large-sized solid impurities/objects that could damage valves or pumps.
gratings, which stop large-sized solid impurities/objects that could damage valves or pumps.
Chamber A has two pumps (Pumps #1 and #2) and Chamber B has three pumps (#3, #4, and #5). The subject of interest are the pump's motors (1 to 4) controlled by inverters, allowing for their speed to be set and, consequently, for efficiency. Pump #5 is an auxiliary, controlled by a softstarter (unable to control rotation speed of the pump), and is used in exceptional circumstances (e.g., control system failure and no other pumps available) and is, therefore, not included in the analysis. All pumps are submerged, and the level of wastewater in the tank must exceed the height of the pumps by a safe margin in order for the control system to be able to start them. The two chambers, A and B, equipped with independent wastewater level sensors are connected to each other under normal operating conditions, and the level of wastewater in them should be identical. If it is necessary to perform service work on one of the chambers, it is necessary to cut off the chamber in which the given pump is installed (closing the inflow from the collector and the connections between the tanks) and empty/dry the chamber completely. At this time, the WWTP operates on the basis of the remaining active tank and the pumps installed in it. In the event that Tank B needs to be emptied, the backup Pump #5 becomes unavailable, which reduces the safety level of the entire RWPS. The pump repair procedure is very troublesome and expensive, so there is a need to develop methods and procedures to minimize its occurrence.

Collecting channel
Pumps in RWPS are submersible pumps (wet well installation) designed for efficient pumping of clean water, surface water, and wastewater containing solids or long-fibered material [35]. The maximum depth of immersion is 20 m. Each pump consists of two main Chamber A has two pumps (Pumps #1 and #2) and Chamber B has three pumps (#3, #4, and #5). The subject of interest are the pump's motors (1 to 4) controlled by inverters, allowing for their speed to be set and, consequently, for efficiency. Pump #5 is an auxiliary, controlled by a softstarter (unable to control rotation speed of the pump), and is used in exceptional circumstances (e.g., control system failure and no other pumps available) and is, therefore, not included in the analysis. All pumps are submerged, and the level of wastewater in the tank must exceed the height of the pumps by a safe margin in order for the control system to be able to start them.
The two chambers, A and B, equipped with independent wastewater level sensors are connected to each other under normal operating conditions, and the level of wastewater in them should be identical. If it is necessary to perform service work on one of the chambers, it is necessary to cut off the chamber in which the given pump is installed (closing the inflow from the collector and the connections between the tanks) and empty/dry the chamber completely. At this time, the WWTP operates on the basis of the remaining active tank and the pumps installed in it. In the event that Tank B needs to be emptied, the backup Pump #5 becomes unavailable, which reduces the safety level of the entire RWPS. The pump repair procedure is very troublesome and expensive, so there is a need to develop methods and procedures to minimize its occurrence.
Pumps in RWPS are submersible pumps (wet well installation) designed for efficient pumping of clean water, surface water, and wastewater containing solids or long-fibered material [35]. The maximum depth of immersion is 20 m. Each pump consists of two main modules: centrifugal pump (Xylem Flygt N 3356 submersible self-cleaning pump for wastewater and water 350/350 mm, power rating 45-140 kW) and motor (Xylem Flygt 665.000 motor, 3-50 Hz, 90 kW, 985 rpm, current 180 A).
Pumps are equipped with sensors: thermal sensors in the stator windings (to prevent overheating), an analog temperature sensor (to monitor the lower bearing), and leakage sensors in the stator housing/leakage chamber and the junction box. The sensors decrease the risk of bearing and stator failure.
Under average operating conditions of the WWTP, the capacity of one or a maximum of two pumps is sufficient, but it is not always necessary to use their full capacity. Si- multaneous activation of all pumps may be necessary only in the event of the maximum design sewage inflow, which has been determined by taking into account the increase in the number of people and industrial plants discharging sewage.
Each of the main pumps (#1-#4) is connected to a separate inverter, enabling electronic regulation of its rotation and, consequently, its efficiency. Inverters allow the pumps to be supplied with AC of the appropriate frequency, current, and voltage and the continuous monitoring of these parameters. Each pump and inverter are also equipped with various sensors to detect unusual conditions and events, e.g., overloading or overheating. Activation of the protections stops the pump operation and sends an appropriate alarm to the control system. Stopping the pump is also possible after pressing the "emergency stop" button by the pump station operator or any person in its vicinity. The pumps can be started by the inverter in manual or automatic mode, depending on the control system of the entire WWTP (SCADA). In the structure of the control system, the inverters are connected to the local PLC.
For further analysis, in order to develop a methodology for forecasting pump failures, reports from the SCADA system were used because only such data can be obtained automatically and was allowed for us to use. More detailed maintenance data can be found in dispatchers' work reports, but they are not available in electronic form, are often incomplete, and the format is not fully standardized.
The next section presents a model for estimating the failure-free operation of the pump as well as the genetic algorithm for determining of the regression coefficients.

Failure-Free Time Estimation Model
The problem of estimating the failure-free time of the main pump taking into account the condition data of tank level (m), power (kW), voltage (V), frequency (Hz), current (A), and torque (Nm) is investigated. In order to establish the relationship between modes and failure-free times (min), historical data are collected both for dependent (failure-free time) and independent variables (health conditions of the pump). Multiple linear regression analysis is adopted from [32] to identify the mathematical relationship between modes and failure-free times. A multiple linear regression analysis is performed for maintenance time estimation in [36].
Using the least squares method [32,33,37], the objective function is achieved: Sensors 2023, 23, 1785 The collected data on m i , Pr i , Ve i , Fy i , Ct i , Te i , and FT i is substituted in the Equations (2)-(8) simultaneously. Regression coefficients are estimated using the improved GA and Excel. The objective of the model is the selection of variables that have the greatest impact on the failure-free time. The adequacy of the presented model is computed using: (1) a standard deviation (2) a multiple determination coefficient where: x i -actual value of failure-free time X in period i,x i -theoretical value of the X variable resulting from the Equation (1) in period i, x i -mean value of the variable X in the time series of length n, k-number of variables explaining the model, and k + 1-number of model parameters.
The multiple determination coefficient measures the compliance of the trend function with historical data on m i , Pr i , Ve i , Fy i , Ct i , Te i , and FT i . To estimate the mean time to failure, the function with the highest value of the multiple determination coefficient (10) and the lowest value of standard deviation (9) is selected.
In order to estimate mean time to failure, the GA was elaborated, including five modules: data interface, chromosomes coding, optimization, selection of chromosomes, and chromosomes decoding.
In GA, a chromosome is represented by a vector of genes, which codes the regression coefficient values for the problem of estimation of a failure-free time of a pump, while a fitness function is used to evaluate the chromosome. The fitness function measures the adequacy of the presented model to historical data describing the condition of the pump.
The fitness function of a chromosome (11) is the scalar function of the coefficient of multiple determination (10) and standard deviation (9): where s x * η -the maximal standard deviation achieved in a set of chromosomes in an iteration and R(x * η )-the maximal coefficient of multiple determination achieved for chromosome x η .
The pseudo code of the GA is presented in Figure 2. The steps of the GA are explained below.
In order to estimate mean time to failure, the GA was elaborated, including five modules: data interface, chromosomes coding, optimization, selection of chromosomes, and chromosomes decoding.
In GA, a chromosome is represented by a vector of genes, which codes the regression coefficient values for the problem of estimation of a failure-free time of a pump, while a fitness function is used to evaluate the chromosome. The fitness function measures the adequacy of the presented model to historical data describing the condition of the pump. The fitness function of a chromosome (11) is the scalar function of the coefficient of multiple determination (10) and standard deviation (9): where s x * -the maximal standard deviation achieved in a set of chromosomes in an iteration and R x * -the maximal coefficient of multiple determination achieved for chromosome x . The pseudo code of the GA is presented in Figure 2. The steps of the GA are explained below.
Step 1: DNA Library generation Step 2: Generation of an initial population of chromosomes η While number of iterations τ = (0 … Γ) is higher than 0 repeat Step 3: Evaluate individuals x from η using the fitness function F (11), Step 4: Copy population η in order to population of clones λ, Step 5: Mutate globally clones λ in order to create offspring population λ*, Step 6: x ← c * if ( * ) < in the elite selection process, Step 7: Local mutation in order to create offspring population λ**, Step 8: x ← c * * if F(c * * ) < F x in the elite selection process Step 9: The best individual is selected In the condition-based estimation problem of the failure-free time of the pump, a chromosome is represented by a binary and integer-based representation. The complexity of the problem presented requires sophisticated coding practice for a chromosome, including not only regression coefficient values but also a positive or negative sign before each coefficient. Each sub-chromosome is created by random generation of the six genes from the DNA library (Table 1). Each chromosome of the individual represents a binary gene for a sign, an integer gene for a number of zeros after the decimal point for a coefficient value, and a mixed binary and integer value for a coefficient value. Each chromosome is represented by a set of sub-chromosomes, one sub-chromosome for one regression coefficient ( Figure 3). By decoding each sub-chromosome from left to right, values of regression coefficients b0,..b6 are substituted into Equation (1). [ ]  In the condition-based estimation problem of the failure-free time of the pump, a chromosome is represented by a binary and integer-based representation. The complexity of the problem presented requires sophisticated coding practice for a chromosome, including not only regression coefficient values but also a positive or negative sign before each coefficient. Each sub-chromosome is created by random generation of the six genes from the DNA library (Table 1). Each chromosome of the individual represents a binary gene for a sign, an integer gene for a number of zeros after the decimal point for a coefficient value, and a mixed binary and integer value for a coefficient value. Each chromosome is represented by a set of sub-chromosomes, one sub-chromosome for one regression coefficient ( Figure 3). By decoding each sub-chromosome from left to right, values of regression coefficients b 0 ,..b 6 are substituted into Equation (1). The process of coding a chromosome into a multiple linear regression equation consists of coding each sub-chromosome, i.e., each regression coefficient. Consider the first row of the matrix x j,n ( Table 2), which encodes the regression coefficient b 0 . The genes of each sub-chromosome are presented in columns of the matrix x j,n . The genes are coded from left to right. The first gene of sub-chromosome b 0 encodes the "+" sign, the second gene encodes the multiplier for the value before the decimal point (0), and the third gene encodes the coefficient value before the decimal point, that is 12. Gene (C) encodes the "0 or 1" value (Table 1), A necessary condition is added in order to generate a feasible solution: if D = 0, Gene C can only obtain the value 1, otherwise, when D = 1, Gene C can receive 1 for Sub-chromosome b 0 ( Table 2). Gene E encodes the number of zeros after the decimal point, that is, 0.0001. Gene F encodes the coefficient value after zeros, that is, 161.
Sensors 2023, 23, x FOR PEER REVIEW 12 of 24 procedure in the solution differentiation process. Two sub-chromosomes and genes are randomly selected for the mutation procedure. The mutation procedure (Step 5 of the GA) begins with the selection of two sub-chromosomes from the parent chromosome ( Figure 3a). Next, one gene is selected in each subchromosome from V, D, J, C, E, or F. The selected genes are removed from the sub-chromosomes (Figure 3b). Offspring chromosome is produced by copying the remaining genes from the corresponding positions of the parent chromosome. The missing genes are randomly selected from the DNA library (Figure 3c). For example, Sub-chromosomes b5 and b6 are selected for the mutation procedure (Figure 3a), Genes J and D are selected for b5 and b6, respectively (Figure 3b). The selected genes are removed from the chromosome of the parent, and remaining genes are copied in the corresponding positions of the offspring's chromosome (Figure 3b). The missing offspring sub-chromosome genes are replaced with genes randomly selected from the DNA library (Figure 3c).  For each chromosome, the decoding procedure is applied by computing the multiple regression equation (2) under the constraints (3)(4)(5)(6)(7)(8). Then, in the decoding procedure in Steps 3, 6, and 8 of the GA (Figure 2), the fitness function (11) with the components of multiple determination (10) and standard deviation (9) is computed. In the elite selection procedures (Steps 6 and 8 of the GA), the best chromosome survives to the next generation of two: parent and child.
In Step 7, a point mutation is performed on the parent chromosome. Only the F gene on a randomly selected sub-chromosome is mutated in the point mutation procedure. The selected gene is removed from the offspring's sub-chromosome. The removed gene is replaced with a randomly selected gene from the F library. The elite selection procedure is presented in detail in Figure 4. The best chromosome survives to the next generation of the two: parent and offspring.
A termination condition is running a predefined number of iterations. The best chromosome, which is said to be the optimal or close to the optimal, is stored in the last iteration.
The next section presents the estimation of the regression coefficients, including the application of MS Excel Solver and the proposed GA.  For each sub-chromosome, the following decoding procedure is applied: Value 12.161 (13) is achieved for the first sub-chromosome and substituted into Equations (2)-(8) for the regression coefficient b 0 ; value -12 is achieved for the second sub-chromosome, that is, for the regression coefficient b 1 , etc.
x 0,n = (V) · (D · J + (C · (E · F))) = −1 · (1 · 12 + 1 · (0.001 · 161)) = 12.161 (13) An initial population is generated by a random generation of regression coefficients (Step 2 of the GA). The parent pool is generated by copying the initial set of regression coefficients (Step 4 of the GA). A new chromosome is generated by the two-point mutation procedure in the solution differentiation process. Two sub-chromosomes and genes are randomly selected for the mutation procedure.
The mutation procedure (Step 5 of the GA) begins with the selection of two subchromosomes from the parent chromosome ( Figure 3a). Next, one gene is selected in each sub-chromosome from V, D, J, C, E, or F. The selected genes are removed from the subchromosomes (Figure 3b). Offspring chromosome is produced by copying the remaining genes from the corresponding positions of the parent chromosome. The missing genes are randomly selected from the DNA library (Figure 3c). For example, Sub-chromosomes b 5 and b 6 are selected for the mutation procedure (Figure 3a), Genes J and D are selected for b 5 and b 6 , respectively (Figure 3b). The selected genes are removed from the chromosome of the parent, and remaining genes are copied in the corresponding positions of the offspring's chromosome (Figure 3b). The missing offspring sub-chromosome genes are replaced with genes randomly selected from the DNA library (Figure 3c).
For each chromosome, the decoding procedure is applied by computing the multiple regression equation (2) under the constraints (3)(4)(5)(6)(7)(8). Then, in the decoding procedure in Steps 3, 6, and 8 of the GA (Figure 2), the fitness function (11) with the components of multiple determination (10) and standard deviation (9) is computed. In the elite selection procedures (Steps 6 and 8 of the GA), the best chromosome survives to the next generation of two: parent and child.
In Step 7, a point mutation is performed on the parent chromosome. Only the F gene on a randomly selected sub-chromosome is mutated in the point mutation procedure. The selected gene is removed from the offspring's sub-chromosome. The removed gene is replaced with a randomly selected gene from the F library. The elite selection procedure is presented in detail in Figure 4. The best chromosome survives to the next generation of the two: parent and offspring.  A termination condition is running a predefined number of iterations. The best chromosome, which is said to be the optimal or close to the optimal, is stored in the last iteration.
The next section presents the estimation of the regression coefficients, including the application of MS Excel Solver and the proposed GA.

Regression Coefficient Estimation
The failure-free time estimation problem of the pump is investigated using computer software: MS Excel Solver and GA written in C++ language. The objective is to estimate the failure-free time of the main pump, taking into account the pump condition data.

Regression Coefficients Estimation Using MS Excel
The presented model is verified with MS Excel. The following hypotheses are tested: • the failure-free time is proportional to variables Pr i and Ve i taking into account data collected for the main pump and 9 events. Taking into account historical data on Pr i and Ve i collected for the main pump for n = 9 failures, the Solver estimates values of unknown regression coefficients b 0 , . . . , b 6 . The achieved regression formula is (14): Under the constraints (15,16): The estimated values of the parameters of Equation (14) and constraints (15,16) cannot be accepted since the multiple determination coefficient, R 2 = 0.000006, and the standard deviation is rather high, s = 371.128 min. The average failure-free time depends to the greatest extent on the voltage variable (Ve i ) since the value of b 2 is the highest (1.706199) compared with the value of b 1 .
Taking into account 54 historical observations on m i , Pr i , Ve i , Fy i , Ct i , Te i , and FT i for the main pump collected during 10 days, the Solver estimates values of regression coefficients (17): The estimated values of the parameters of Equation (17) are adequate for the main pump since the multiple determination coefficient R 2 equals 0.96, and the standard deviation is 29.1058 min. The high value of parameter b 6 (0.81196) means that the value of failure-free time is high relative to Te i (Nm). The average failure-free time depends to the greatest extent on the torque data. The second most related variable is Ve i since the value of b 3 is higher (0.379) compared with the values of b 1 , b 2 , b 4 , and b 5 .
Taking into account 30 historical observations on m i , Pr i , Ve i ,Ct i , Te i , and FT i for the main pump collected during 5 days, the Solver Appendix estimates values of regression coefficients (18): The estimated values of the parameters of Equation (18) are adequate for the main pump since the R 2 = 0.08 and the standard deviation s = 371.75 min. The high value of parameter b 6 (0.7268277) means that the value of failure-free time is high relative to Te i (m). The second most related variable is Ve i .
Taking into account 45 historical observations on m i , Pr i , Ve i , Fy i , Ct i , Te i , and FT i for the third pump collected during 5 days, the Solver Appendix estimates values of regression coefficients (19): The estimated values of the parameters of Equation (19) are not adequate for the third pump since the multiple determination coefficient R2 = 1.28, and the standard deviation s = 547.72 [mins]. The high value of parameter b 6 (1.01743) means that the value of failurefree time is most related to FT i (Nm). The average failure-free time depends to the greatest extent on the torque data. The second most related variable is Ct i since the value of b 3 is higher (0.45264) compared with the values of b 1 , b 2 , b 4 , and b 5 .
Let us consider 45 historical observations on M i , Pr i , Ve i , Ct i , Te i , and FT i for the third pump collected during 5 days, the Solver Appendix estimates values of regression coefficients (20): FT i (20) In addition, the estimated values of the parameters of Equation (20) are not adequate for the third pump since R 2 =1.29 and s = 548.022 min. The conclusions are the same as in the case of the third pump's condition data, including Fy i .
Finally, consider 75 historical observations for each data on m i , Pr i , Ve i , Ct i , Te i , and FT i for the first and third pumps collected during 5 days, the Solver Appendix estimates values of regression coefficients (21): The estimated values of the parameters of Equation (21) are adequate for the main pump since R 2 = 0.69 and s = 467.9 min. The value of failure-free time is highly related to Te i (Nm). Again, the second most related variable is Ve i .
The most adequate multiple linear regression Equation (17)

Regression Coefficients Estimation Using the GA
Historical data on both dependent (FT i ) and independent variables (m i , Pr i , Ve i , Fy i , Ct i , and Te i ) to establish the relationship between modes and failure-free times were used. A relationship between modes and failure-free times (2-8) (independent and dependent variables) was described by a chromosome-a vector of regression coefficients. The affinity function corresponded to the adequacy of the presented model to historical data on m i , Pr i , Ve i , Fy i , Ct i , Te i , and FT i .
The GA for estimating pump uptime was coded in Borland C++. Computer simulations were performed for a set of key parameters, namely initial population size First, the average quality of the population for computer simulations described by {chn, mn, in = 70} is examined ( Figure 6). The mean scalar function increases as soon as computer simulations reach approximately 20 iterations. After approximately 30 iterations, the mean scalar function decreases, except for the experiment described by {chn = 200, mn = 1, in = 70}. The mean scalar function improves as the number of mutation points increases. As can be seen, better results are not so much related to the size of chn as to mn.  (17), the predicted failure-free time equals 732.131644 mins. The real failure of the main pump occurred on the 12th at 1263 mins.

Regression Coefficients Estimation Using the GA
Historical data on both dependent (FTi) and independent variables (mi, Pri, Vei, Fyi, Cti, and Tei) to establish the relationship between modes and failure-free times were used. A relationship between modes and failure-free times (2-8) (independent and dependent variables) was described by a chromosome-a vector of regression coefficients. The affinity function corresponded to the adequacy of the presented model to historical data on mi, Pri, Vei, Fyi, Cti, Tei, and FTi.
The GA for estimating pump uptime was coded in Borland C++. Computer simulations were performed for a set of key parameters, namely initial population size (chn = First, the average quality of the population for computer simulations described by {chn, mn, in = 70} is examined ( Figure 6). The mean scalar function increases as soon as computer simulations reach approximately 20 iterations. After approximately 30 iterations, the mean scalar function decreases, except for the experiment described by {chn = 200, mn = 1, in = 70}. The mean scalar function improves as the number of mutation points increases. As can be seen, better results are not so much related to the size of chn as to mn.  Table 3 presents the best multiple determination coefficients obtained for the regression coefficients (b0, b1, …, b6) in the simulation described by {chn, mn and in = 70}. The values of R 2 should be in the range [0, 1]. Therefore, directed selection pressure with four conditions was developed (Figure 4). For example, a solution survives when R 2 approaches 1 from a pair (parent and offspring) with coefficients greater than or equal to 1 in the first condition. In other cases, a solution survives when R 2 approaches 1 from a pair (parent and offspring) with coefficients less than 1.  Table 3 presents the best multiple determination coefficients obtained for the regression coefficients (b 0 , b 1 , . . . , b 6 ) in the simulation described by {chn, mn and in = 70}. The values of R 2 should be in the range [0, 1]. Therefore, directed selection pressure with four conditions was developed (Figure 4). For example, a solution survives when R 2 approaches 1 from a pair (parent and offspring) with coefficients greater than or equal to 1 in the first condition. In other cases, a solution survives when R 2 approaches 1 from a pair (parent and offspring) with coefficients less than 1.
Next, the average quality of the population is examined for computer simulations described by {chn, mn, in = 140} (Figure 7). The same phenomenon is noticed: the mean scalar function increases as soon as computer simulations reach approximately 20 generations. After approximately 40 generations, the mean scalar function decreases. For the experiment described by {chn = 100, mn = 1, in = 140}, the GA works according to its idea of improving the genetic material from generation to generation only after 100 iterations. The phenomenon of improving the mean scalar function with the increase in mn was clearly visible for the computer simulations described by {chn, mn, in = 140}. The conclusion was proved that better results are not so much related to chn as to mn. Table 3. Multiple determination coefficient obtained for the best regression coefficients (b 0 , b 1 , . . . , b 6 ) in the simulation described by {chn, mn, in = 70}. Next, the average quality of the population is examined for computer simulations described by {chn, mn, in = 140} (Figure 7). The same phenomenon is noticed: the mean scalar function increases as soon as computer simulations reach approximately 20 generations. After approximately 40 generations, the mean scalar function decreases. For the experiment described by {chn = 100, mn = 1, in = 140}, the GA works according to its idea of improving the genetic material from generation to generation only after 100 iterations. The phenomenon of improving the mean scalar function with the increase in mn was clearly visible for the computer simulations described by {chn, mn, in = 140}. The conclusion was proved that better results are not so much related to chn as to mn.
A comparison of the average quality of the population for computer simulations described by both {chn, mn, in = 70} ( Figure 6) and {chn, mn, in = 140} (Figure 7) allows us to draw the following conclusion: a greater number of mutation points {mn = 3.4} and iterations {in = 140} produce better results.  Table 4 presents the best R 2 obtained for the regression coefficients (b0, b1, …, b6) in the simulation described by {chn, mn and in = 140}. R 2 approaches 1 with more mutation points. More iterations are required for GA to find the right solution, even using the directed selection pressure.   Table 4 presents the best R 2 obtained for the regression coefficients (b 0 , b 1 , . . . , b 6 ) in the simulation described by {chn, mn and in = 140}. R 2 approaches 1 with more mutation points. More iterations are required for GA to find the right solution, even using the directed selection pressure.
The next section contains further analyses and discussion related to the research on the application of the GA.

Discussion of the Results of the GA Experiments
The presented method is useful for managers to obtain an accurate forecast. A high selection procedure is given for the smallest s of the obtained forecasts (9). On the contrary, in Section 4, the results were obtained with R 2 (10) for selection pressure.
Six computer simulations were carried out for the values of key parameters: chn = 100, in = 100, and mn = {3, 4, 5} (Table 5). Coefficient R 2 is at a maximum for (23) obtained using the GA (Figure 8). Although the best R(x 6 ) 2 = 1, s is still very high (571). The average failure time of a main pump depends to the greatest extent on Ve i , Ct i , and Te i , according to (23):    Finally, we performed simulations for a limited range of the J gene from [0, 2] to reduce the solution space and, thus, increase the chances of achieving a better solution.
Two experiments were run for {chn = 100, mn = 3, in = 100}, and the GA achieved linear regression coefficients (24,25): 54 · 1.00056 + 0.00338 Coefficient R 2 achieved the maximum value in both experiments (Figure 9). The value s for the linear regression (24) is 223 min. The value s for the linear regression (25) is 157 min. The average failure time of the main pump depends to the greatest extent on Pr i in the case of (24). The average failure time of the main pump depends to the greatest extent on Ct i and Te i , according to (25). Managers of the sewage treatment plant may substitute new data for sewage level (m), power (kW), voltage (V), frequency (Hz), current (A), and torque (Nm) into Equation (25) in order to estimate the future failure-free time. The predicted failure-free time of the main pump is 626 mins.
As a result of the analysis of the obtained results (Figures 8 and 9), the following conclusions can be drawn:

•
The agreement of the regression line with the empirical data is very good (even 99-100%), while the GA obtained a slightly better fit (R 2 = 1) compared with Excel (R 2 = 0.9943).

•
In both graphs, the estimated repair time is very sensitive to an increase in historical time: a relatively small increase in historical time results in a stronger increase in estimated repair time.

•
Further research is needed taking into account more days of data collection to support the conclusion that the time between failures of the main pump depends on current and torque.

Conclusions
Planning the operation and repair of pumps in a wastewater treatment plant is a very important activity due to the serious consequences of potential failures, including environmental contamination and high costs. It was assumed that the tool for forecasting the time of failure-free operation can be based only on historical data collected from the SCADA system, without installing additional sensors and measuring instruments or interfering with SCADA and the current operation of the sewage treatment plant. Integra- Managers of the sewage treatment plant may substitute new data for sewage level (m), power (kW), voltage (V), frequency (Hz), current (A), and torque (Nm) into Equation (25) in order to estimate the future failure-free time. The predicted failure-free time of the main pump is 626 min.
As a result of the analysis of the obtained results (Figures 8 and 9), the following conclusions can be drawn:

•
The agreement of the regression line with the empirical data is very good (even 99-100%), while the GA obtained a slightly better fit (R 2 = 1) compared with Excel (R 2 = 0.9943).

•
In both graphs, the estimated repair time is very sensitive to an increase in historical time: a relatively small increase in historical time results in a stronger increase in estimated repair time.

•
Further research is needed taking into account more days of data collection to support the conclusion that the time between failures of the main pump depends on current and torque.

Conclusions
Planning the operation and repair of pumps in a wastewater treatment plant is a very important activity due to the serious consequences of potential failures, including