2.1.1. Business Understanding
The objective of business understanding is to define the project’s requirements. In the case study, the generator’s event and alarm lists are primarily designed to fulfill the cause-and-effect matrices. However, in many instances, this information alone does not give the operator a comprehensive diagnosis of a failure. For example, when the machine’s supervisory system notifies the operator that the machine tripped because a variable exceeded a trip limit, it does not identify the specific cause that led the variable to exceed that limit. In many cases, the only information available to the operator is whether a sensor value is very low, low, high, or very high, that is, the magnitude thresholds that generate alarms and trips.
With only this sensor information, the maintenance team must analyze what happened to the machine in order to diagnose the failure. In some cases, the troubleshooting sections of the machine’s manuals need to be consulted, or specialist maintenance teams must be involved. Thus, the time spent investigating the causes of a failure directly increases the Mean Time to Repair (MTTR), which in turn reduces equipment availability.
This case study was conducted during an observation period spanning from 1 January 2019 to 31 December 2022, during which all failures in the power system’s generators were recorded. The system has been in operation since February 2018; the first year of operation was disregarded for failure diagnosis because it was part of the commissioning phase of the project.
In 2019, Generator A experienced eighteen forced shutdowns due to failures. Notably, the mineral lube oil system, along with the fuel gas and turbine control systems, had the greatest impact on this generator’s reliability.
Aside from operator-reported failures, the maintenance records for Generator A were examined. This examination revealed that the mineral lube oil system, along with synthetic lube oil and turbine monitoring systems, accounted for the most frequent issues during the evaluation period.
Having identified the main systems that affected the selected generator’s reliability performance, the next step was to analyze how the turbogenerator’s systems and subsystems relate to each other, in order to establish an operational priority for developing the failure diagnosis model.
Figure 3 illustrates the turbogenerator system and subsystem breakdown. As can be seen, the mineral lube oil system influences several parts of the machine, such as the gas turbine, gearbox, and generator. As this system is vital for the turbogenerator operation and is one of the top three issues reported in operation and maintenance records, it was selected as the focus for the failure diagnostic model. The scope of the study expanded to encompass other generator systems as well.
To develop the diagnostic model, it is necessary to first understand the failure modes of the mineral lube oil system. This information can be accessed through the engineering documentation containing the Failure Modes, Effects, and Criticality Analysis (FMECA) [30], where all the failure modes and affected parts of this system are listed. The failure mechanisms and failure modes are then described and codified according to ISO 14224. The effects of the failures are evaluated, and the criticality is defined in order to establish the maintenance and spare parts strategy. Some maintenance strategies include parameter monitoring for failure detection.
Fuzzy models are widely known as a knowledge-based modeling method that uses semantic descriptions obtained from human operators and/or expert knowledge as model inputs with gradual boundaries rather than discrete numeric ones [31]. This method can be used to incorporate specialist knowledge, such as FMECA, into an AI model.
From the FMECA list, several failures that can be detected through online monitoring were identified, as presented in Table 1, serving as outputs for the fuzzy diagnostic model. Additionally, two other failures were added to the model: mineral lube oil tank vent failure and monitoring sensor failure.
The FMECA also suggests monitoring parameters for condition assessment. These parameters can be gathered through online sensor monitoring, operational route checks, predictive maintenance, or preventive maintenance monitoring. These sensor signals serve as inputs for the fuzzy model, which bridges the gap in the turbogenerator supervisory system by explaining why a generator shifted from “in service” to “forced shutdown”.
2.1.2. Data Understanding
Pressure, temperature, and level are the quantities directly measured by the sensors of the mineral lube oil system. These sensors monitor points of interest in the system and supply information to the turbogenerator’s control and protection systems, enabling safe operation of the machine.
For each potential failure mode, a meticulous analysis was conducted using FMECA to identify the sensors capable of detecting specific component-related failures. Notably, not all sensors within the mineral lube oil system were necessary inputs for constructing the model. In instances where FMECA did not cover certain failure modes, the expertise of system specialists was relied upon to select the appropriate set of sensors for detection.
Having established the sensor array based on the system’s mapped failure modes, the next task is to obtain the calibration range of each process-variable sensor in order to define the universe variables of the fuzzy inference system (FIS). Using project documentation such as instrument lists, piping and instrumentation diagrams, and electrical diagrams, it is possible to determine the lower and upper limits of each variable monitored by the system’s sensors.
The data from the monitoring sensors of the machines under study were acquired and processed by collecting them from the supervisory system of the production plant where the turbogenerators are installed. This was done using the Python library “pandaspi”, which serves as an interface between the Python language and the generator’s supervisory system and returns the raw data in a data frame format [32]. A data cleaning step was necessary to remove missing or unsampled values.
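The cleaning step can be illustrated with a minimal sketch in plain Python. It assumes that missing or unsampled readings appear as `None` or NaN and that an incomplete sample is simply discarded; the tag names and values are hypothetical, and the data frame returned by “pandaspi” is stood in for by a list of dictionaries.

```python
import math


def is_valid(value):
    """Return True for a usable numeric reading (not None, not NaN)."""
    return isinstance(value, (int, float)) and not math.isnan(value)


def clean_rows(rows, attributes):
    """Keep only the rows in which every listed attribute has a valid reading."""
    return [
        row for row in rows
        if all(is_valid(row.get(name)) for name in attributes)
    ]


# Hypothetical sensor tags and readings, for illustration only.
attributes = ["oil_pressure", "oil_temperature", "tank_level"]
raw_rows = [
    {"oil_pressure": 2.1, "oil_temperature": 55.0, "tank_level": 70.0},
    {"oil_pressure": None, "oil_temperature": 56.0, "tank_level": 70.0},
    {"oil_pressure": 2.0, "oil_temperature": float("nan"), "tank_level": 69.5},
    {"oil_pressure": 2.2, "oil_temperature": 54.5, "tank_level": 69.0},
]

cleaned = clean_rows(raw_rows, attributes)
print(len(cleaned))  # 2
```

Dropping incomplete samples is only one possible policy; interpolation or forward filling would be alternatives, depending on how the downstream model treats gaps.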
The total data acquisition consisted of 3,112,051 input patterns per attribute, with a total of 13 attributes and a sampling period of 60 s per variable.
2.1.3. Data Preparation and Modeling
Throughout the operation of the turbogenerators, an array of sensors monitors various components of the machines. These sensors are equipped with distinct operational thresholds categorized as normal operation, alarm, and trip ranges.
The normal operation range lies between the low and high alarm limits; it is the range in which the machine operates under normal conditions.
The alarm range is divided into low and high alarm ranges. When a parameter lies within either of these ranges, the machine remains in operation, but the parameter is no longer in a normal operating condition. Because it is now closer to the trip value, the next threshold on the scale, the machine may be heading toward a failure condition.
The trip range is defined analogously, with low and high trip ranges. The alarm and trip limits differ from each other and are established by the machine manufacturer in order to protect the machine.
The trip and alarm limits of each input variable were used as parameters for the limits of the membership functions in the FIS model. As not all sensors are designed exclusively for protective measures, but some also serve diagnostic purposes, the configuration of membership function limits requires the expertise of specialists to devise an optimal strategy for the model.
The trapezoidal function was chosen to represent the behavior of the inputs because it assigns full membership across a range of values for the low, medium, and high sensor readings. Equation (1) represents the trapezoidal membership function shown in Figure 4.
With breakpoints $a \le b \le c \le d$, the membership value for sensor readings below $a$ is zero, $\mu(x) = 0$. In the interval $a \le x < b$, the membership value rises from “0” to “1”, indicating a partial association with the fuzzy set. The maximum membership value, $\mu(x) = 1$, is reached when the sensor reading falls within the interval $b \le x \le c$, meaning that the crisp value fully belongs to the fuzzy set. Similar evaluations apply to the intervals $c < x \le d$ and $x > d$. In the first interval, the membership value falls from “1” to “0”, while in the second interval it is zero, $\mu(x) = 0$.
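Assuming the conventional breakpoint notation $a \le b \le c \le d$ (the symbol names are illustrative and may differ from those used in Equation (1)), a trapezoidal membership function of the kind described here takes the standard piecewise form:

```latex
\mu(x) =
\begin{cases}
0, & x < a \\[4pt]
\dfrac{x - a}{b - a}, & a \le x < b \\[8pt]
1, & b \le x \le c \\[4pt]
\dfrac{d - x}{d - c}, & c < x \le d \\[8pt]
0, & x > d
\end{cases}
```

When $a = b$ or $c = d$, the corresponding slope degenerates into a vertical edge, which is how the open-ended low and high sets can be expressed with the same function.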
As shown in Figure 1, three fuzzy sets were created for each sensor, where low readings of the sensors indicate that the sensor is measuring a magnitude below its normal operating range. In this case, the flat part of the trapezoidal function represents the readings below the trip value, and the slope of the trapezoidal function decreases from the low trip value to the low alarm value.
Medium readings of a sensor mean that the sensor is operating within its normal range. The flat part of the trapezoidal function represents the normal readings, while the slopes of the trapezoidal function increase from the low trip value to the low alarm value and decrease from the high alarm value to the high trip value.
High readings of a sensor mean that the sensor is measuring a magnitude above its normal operating range. The flat part of the trapezoidal function represents the readings above the trip value, and the slope of the trapezoidal function increases from the high alarm value to the high trip value.
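The construction of the low, medium, and high sets from the alarm and trip limits can be sketched in plain Python as follows. The sensor, its calibration range, and its limits are hypothetical values chosen for illustration, and the helper names `build_sets` and `fuzzify` are not taken from the original implementation.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership with breakpoints a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0  # flat top
    if a < x < b:
        return (x - a) / (b - a)  # rising slope
    if c < x < d:
        return (d - x) / (d - c)  # falling slope
    return 0.0


def build_sets(cal_min, low_trip, low_alarm, high_alarm, high_trip, cal_max):
    """Map calibration, trip, and alarm limits to three trapezoids."""
    return {
        # Flat below the low trip, falling from low trip to low alarm.
        "low": (cal_min, cal_min, low_trip, low_alarm),
        # Rising from low trip to low alarm, flat, falling from high alarm to high trip.
        "medium": (low_trip, low_alarm, high_alarm, high_trip),
        # Rising from high alarm to high trip, flat above the high trip.
        "high": (high_alarm, high_trip, cal_max, cal_max),
    }


def fuzzify(x, sets):
    """Return the membership of reading x in each fuzzy set."""
    return {label: trapmf(x, *params) for label, params in sets.items()}


# Hypothetical pressure sensor: 0-10 bar range, trips at 1 and 9, alarms at 2 and 8.
sets = build_sets(0.0, 1.0, 2.0, 8.0, 9.0, 10.0)
print(fuzzify(5.0, sets))  # {'low': 0.0, 'medium': 1.0, 'high': 0.0}
print(fuzzify(1.5, sets))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

A reading halfway between the low trip and low alarm limits belongs equally to the low and medium sets, which is the gradual boundary that the trapezoidal slopes are meant to capture.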
With the creation of each membership function for the three ranges of sensor readings, the fuzzy inputs for the FIS can be generated. The formulation of fuzzy rules is a crucial step that relies on the failure modes of the mineral lube oil system and the interrelationships among measured parameters. In order to establish these relationships between the system’s monitoring variables and identify the affected part in the event of a failure in the mineral oil lubrication system, logical operators, as described in Equations (2)–(5), are employed. This approach is followed by a diagnostic method consistent with FMECA analysis, where the failure classes are based on the machines’ failure modes.
where the operator in Equation (2) is the negation or the inverse of the truth value for a given proposition “P”. The operator in Equation (3) is the logical “or” operator, which is equivalent to the maximum truth value from the evaluations made between the membership functions of propositions “P” and “Q” for each fuzzy set. Equation (4) presents the “and” operator, which is equivalent to the minimum truth value from the evaluations made between the membership functions of propositions “P” and “Q” for each fuzzy set. Finally, Equation (5) presents the implication operator, which represents the “if”/“then” operation.
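The operators described for Equations (2)–(4) can be sketched in a few lines of Python. Negation is taken here as the standard complement, one minus the truth value. The form of the implication in Equation (5) is not stated in the text, so the Mamdani-style minimum used below is an assumption, and the membership values are hypothetical.

```python
def fuzzy_not(p):
    """Negation: complement of the truth value of P."""
    return 1.0 - p


def fuzzy_or(p, q):
    """Logical "or": maximum of the two truth values."""
    return max(p, q)


def fuzzy_and(p, q):
    """Logical "and": minimum of the two truth values."""
    return min(p, q)


def fuzzy_implies(strength, consequent):
    """If/then, ASSUMED Mamdani-style: clip the consequent at the rule strength."""
    return min(strength, consequent)


# Hypothetical memberships for one sample.
pressure_low = 0.75
temperature_high = 0.5

print(fuzzy_not(pressure_low))                   # 0.25
print(fuzzy_or(pressure_low, temperature_high))  # 0.75

# IF pressure is low AND temperature is high THEN <failure class>
strength = fuzzy_and(pressure_low, temperature_high)
print(strength)                                  # 0.5
print(fuzzy_implies(strength, 1.0))              # 0.5
```

With the minimum as implication, a rule can never assert its consequent more strongly than its antecedent is satisfied.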
The fuzzy rules were written in natural-language form, since the fuzzy inputs are defined as low, medium, and high sensor readings.
Table 2 lists each rule created for the FIS, relating the input variables to obtain the desired diagnosis based on the FMECA documentation.
With the fuzzy rules and inputs defined, the FIS engine makes inferences from the instantaneous fuzzy input values and generates fuzzy outputs, which must be defuzzified to obtain a crisp output. The defuzzification technique employed was the centroid method, according to Equation (6), where $z^{*}$ is the crisp output of the fuzzy control system (the centroid), $\mu(z)$ is the output membership function, and $z$ is the corresponding crisp output value.
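In its usual continuous form, the centroid is $z^{*} = \int \mu(z)\, z \, dz \,/ \int \mu(z)\, dz$. A minimal discrete sketch, assuming the output universe is sampled on a uniform grid, replaces the integrals with sums. The function name and the sample values are illustrative only.

```python
def centroid(zs, mus):
    """Discrete centroid: sum(mu * z) / sum(mu) over the sampled universe."""
    total = sum(mus)
    if total == 0:
        raise ValueError("empty fuzzy output: no rule fired")
    return sum(z * m for z, m in zip(zs, mus)) / total


# Symmetric triangular output sampled on a uniform grid (hypothetical values).
zs = [0.0, 0.5, 1.0, 1.5, 2.0]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(centroid(zs, mus))  # 1.0
```

For a symmetric aggregated output the centroid coincides with the peak; when several rules fire, the result is pulled toward the consequents with the larger area.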
For classification purposes, the fuzzy outputs were divided into nine classes, as presented in Table 3. Triangular membership functions were used to transform the fuzzy outputs into crisp outputs in the FIS. Each triangular membership function had its peak at the central value of the corresponding crisp output range, and the base of the triangle spanned that range.
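The output sets can be sketched as follows, assuming nine consecutive unit-width ranges. These ranges and the class labels are placeholders, not the values of Table 3.

```python
def trimf(z, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if z == b:
        return 1.0
    if a < z < b:
        return (z - a) / (b - a)
    if b < z < c:
        return (c - z) / (c - b)
    return 0.0


def class_triangle(low, high):
    """Triangle whose base spans [low, high] and whose peak is at the midpoint."""
    return (low, (low + high) / 2.0, high)


# Placeholder ranges: class k occupies [k - 1, k]; the real ranges are in Table 3.
class_ranges = {f"class_{k}": (k - 1.0, float(k)) for k in range(1, 10)}
triangles = {name: class_triangle(*rng) for name, rng in class_ranges.items()}

print(triangles["class_3"])                   # (2.0, 2.5, 3.0)
print(trimf(2.5, *triangles["class_3"]))      # 1.0
print(trimf(2.25, *triangles["class_3"]))     # 0.5
```

Because each triangle is symmetric about the midpoint of its range, the centroid of a single fired class falls at the center of that range, which keeps the crisp output well separated from neighboring classes.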
For the creation of the fuzzy inference system, an algorithm was developed in Python. The “scikit-fuzzy” library was used in this implementation because it provides the functions needed to develop fuzzy logic applications in the chosen programming language. The “Antecedent” class of the “control” module in this library was used to create the antecedent sets, which are the functions that are activated by the input values of each variable. Similarly, the “Consequent” class was used to create the consequent memberships of the fuzzy inference system.
The “Rule” function from the “Control” module was employed to establish the rules to be applied to the preceding data points. To enhance the classification method, two stages of fuzzy inference were developed. In the initial stage, an evaluation is conducted to determine if the system is operating in a healthy or faulty state. If a fault is detected, the subsequent step involves identifying the specific type of failure present in the system.
The “ControlSystem” and “ControlSystemSimulation” classes were used to define the relationships between the created rules and perform the calculation of the fuzzy output of the inference systems. The “compute” method was then used to calculate the output of the fuzzy inference system given a set of inputs.
The functions “trapmf” and “trimf”, which also belong to the library “scikit-fuzzy”, were used for creating the membership functions of the inference model, where the parameters for those functions were established in accordance with the defined limits of the inputs and outputs.
To enhance the performance of the classification model and minimize computational processing, the first step is to evaluate whether the system is in a healthy or faulty state. If the system is in a faulty state, the failures are classified based on Table 3. Since multiple failures can occur simultaneously, the algorithm is designed to handle this condition. The same set of rules was utilized for both detecting system faults and diagnosing failures.
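The two-stage logic can be sketched in plain Python, without reproducing the scikit-fuzzy implementation. The rules, failure names, and the 0.5 decision threshold are all hypothetical; the sketch only shows how one rule set can serve both to flag a faulty state and to list every failure that fires at the same time.

```python
def detect_and_diagnose(memberships, rules, threshold=0.5):
    """Stage 1: healthy/faulty decision. Stage 2: list all fired failures."""
    strengths = {name: rule(memberships) for name, rule in rules.items()}
    # Stage 1: the sample is healthy if no rule reaches the threshold.
    if max(strengths.values()) < threshold:
        return "healthy", []
    # Stage 2: report every failure at or above the threshold (may be several).
    failures = sorted(n for n, s in strengths.items() if s >= threshold)
    return "faulty", failures


# Hypothetical rules: "and" is the minimum of the antecedent memberships.
rules = {
    "failure_A": lambda m: min(m["pressure"]["low"], m["temperature"]["high"]),
    "failure_B": lambda m: m["level"]["low"],
}

faulty_sample = {
    "pressure": {"low": 0.8, "medium": 0.2, "high": 0.0},
    "temperature": {"low": 0.0, "medium": 0.3, "high": 0.7},
    "level": {"low": 0.6, "medium": 0.4, "high": 0.0},
}
healthy_sample = {
    "pressure": {"low": 0.0, "medium": 1.0, "high": 0.0},
    "temperature": {"low": 0.0, "medium": 1.0, "high": 0.0},
    "level": {"low": 0.0, "medium": 1.0, "high": 0.0},
}

print(detect_and_diagnose(faulty_sample, rules))   # ('faulty', ['failure_A', 'failure_B'])
print(detect_and_diagnose(healthy_sample, rules))  # ('healthy', [])
```

Returning a list rather than a single label is what allows simultaneous failures to be reported for the same sample.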
A flow diagram depicting the FIS algorithm is presented in Figure 5. The algorithm begins by acquiring data from the dataset and passing it through a data cleaning stage. Next, the algorithm creates the universe variables based on the calibration ranges of the sensors, followed by the creation of input membership functions based on the limits of the input variables. Rules are then created using logical operations to establish relationships between system variables, based on the failure modes defined in the FMECA. Subsequently, output memberships are created to facilitate the diagnosis of failures.
During the inference phase, the FIS engine iteratively processes each input variable and rule, taking into account their interactions and combining the results to generate the final output. If normal operation is detected, no classification is made, and the algorithm proceeds to process the next sample. However, if a failure is detected during the inference phase, a more detailed failure classification is performed to determine which failure(s) occurred at the given time. The failure(s) are then classified, and an output of the model is generated.