Diagnostics and Prognostics of Energy Conversion Processes via Knowledge-Based Systems †

This paper presents a critical and analytical description of an ongoing research program aimed at the implementation of an expert system capable of monitoring, through an Intelligent Health Control procedure, the instantaneous performance of a cogeneration plant. The expert system, implemented in the CLIPS environment and named PROMISA (an acronym for PROgnostic Module for Intelligent System Analysis), generates, in real time and in a form directly useful to the plant manager, information on the existence and severity of faults, forecasts of the future time history of both detected and likely faults, and suggestions on how to control the problem. The expert procedure, working where necessary with the support of a process simulator, derives from the available real-time data a list of selected performance indicators for each plant component. For a set of faults, pre-defined with the help of the plant operator (Domain Expert), proper rules are defined in order to establish whether the component is working correctly; in several instances, since one single failure (symptom) can originate from more than one fault (cause), complex sets of rules expressing the combination of multiple indices have been introduced in the knowledge base as well. Creeping faults are detected by analyzing the trend of the variation of an indicator over a pre-assigned interval of time. Whenever the value of this "discrete time derivative" becomes "high" with respect to a specified limit value, a "latent creeping fault" condition is prognosed. The expert system architecture is based on an object-oriented paradigm. The knowledge base (facts and rules) is clustered: the chunks of knowledge pertain to individual components. A graphic user interface (GUI) allows the user to interrogate PROMISA about its rules, procedures, classes and objects, and about its inference path. The paper also presents the results of some simulation tests.


Introduction
Modern energy conversion plants are very complex systems from a technological point of view. Any downtime or drop in the energy quality of the output often involves unacceptably high direct and indirect monetary losses, but even more important is the resource destruction that constitutes the end result of the fault. All modern design methods contain procedures that take into due account variable load conditions (off-design operation), availability losses due to scheduled and unscheduled maintenance, and performance degradation due to wear and fouling in the equipment.
To lessen the likelihood of plant failures, preventive maintenance is regularly performed: compared to reactive, unplanned maintenance, it reduces by a factor of three to nine the costs of lost production, parts and other overhaul work [1].
Among the different types of preventive maintenance, Condition-based Maintenance (CBM) presents several advantages when applied to energy conversion systems. CBM is a maintenance strategy that monitors how the plant's operational performance evolves in time and reports the cause(s) of any detected malfunction. Maintenance is therefore performed only when certain indicators show signs of decreasing performance or upcoming failure. These indicators include non-invasive measurements, performance data and scheduled tests.
Compared with preventive maintenance, CBM thus increases the time between maintenance interventions, because maintenance is done on an as-needed basis [2][3][4].
Condition data can be gathered either at certain intervals or continuously. The volume of data flowing from the plant rapidly becomes overwhelming and difficult to analyze. A powerful aid to this task is provided by an expert system. A knowledge-based expert system is preferred here to a deep learning solution because the lack of training data makes machine learning approaches fall short. Moreover, the expert system works toward explainable AI and expands its knowledge through collaborative interactions, whereas most modern AI algorithms behave like black boxes, producing answers and recommendations without any insight into how the system arrived at them or which parameters were most significant.
Intelligent process management tools (IPMTs) [5,6] are by definition capable not only of producing an intelligent diagnosis of the present state of the plant but also of enacting a prognostic action, i.e., of making intelligent estimates of the future state of the plant under the foreseen boundary conditions [7]. Finally, they can use design, operation and load-scheduling data, together with other relevant external information (for instance local weather forecasts or projected operating load curves of similar plants in the same "fleet"), to provide operators with valuable information about the "optimal" operating curve of the plant in some future period T [8].
The present paper describes the development of a diagnostic and prognostic tool specifically designed for a gas turbine-based cogeneration system; its development nevertheless constitutes a useful paradigm for other applications.
Let us define the plant availability factor PF as the ratio of the total equivalent full-load operating hours in a year to the total number of hours in the year. It is apparent that no energy conversion plant can operate with PF equal to 1, for three kinds of reasons: (a) plant shutdowns due to scheduled maintenance; (b) plant shutdowns due to unscheduled maintenance; (c) plant shutdowns due to sudden failures.
It is useful for our purposes to account separately for events of type "b", which imply the replacement of a component for which an early failure has been prognosed, and events of type "c", in which the replacement is done after the failure has forced a plant shutdown.
Our study specifically concentrates on plant shutdowns due to sudden failures. Strictly speaking, "sudden catastrophic failures" rarely happen as such, and when they do, they are by definition unforeseeable. But extensive field studies have conclusively shown that most of the failures we call "sudden" are in reality caused by a series of component-localised phenomena that lead to a (usually very small but still significant) deterioration of the component's performance.
Our efforts may thus be redirected to the early detection of these "performance degradation" warning signals. The method to follow is in principle straightforward: a sufficient number of "critical points" in the process are monitored in real time, and a specific series of performance decay indicators is computed. As soon as one of these indicators reveals a fault, the operator, working in close co-operation with the designer and the plant manager, decides whether to execute an immediate shutdown to fix the fault or to wait until the next scheduled maintenance intervention.
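As a small numerical illustration of the definition of PF, the sketch below assumes that the equivalent full-load hours are obtained by dividing the annual energy output by the rated power; this convention, the function name and the plant figures are assumptions made for the example and are not taken from the plant under study.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def availability_factor(energy_mwh: float, rated_power_mw: float,
                        hours_in_year: int = HOURS_PER_YEAR) -> float:
    """PF = equivalent full-load hours / hours in the year."""
    # Assumed convention: equivalent full-load hours = annual energy / rated power.
    equivalent_full_load_hours = energy_mwh / rated_power_mw
    return equivalent_full_load_hours / hours_in_year


# Hypothetical 10 MW plant delivering 70,080 MWh in one year.
pf = availability_factor(energy_mwh=70_080.0, rated_power_mw=10.0)
```

With these invented figures the plant accumulates 7008 equivalent full-load hours, so PF is about 0.8; the remaining fraction of the year is lost to the shutdowns of types (a), (b) and (c) and to part-load operation.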

The General Conceptual Layout of a Diagnostic/Prognostic System
In the language of artificial intelligence (AI), we say that a procedure is enacted by an "Agent". In the following description, the agent is our expert system, but it is easy to recognise a high degree of similarity between the individual steps of the procedure and the actions that a human operator would take when executing the same task. Our aim here is to show that both the procedure in its entirety and each of its single steps are feasible at the present level of AI technology. We shall describe the diagnostic and the prognostic procedures separately, but will show later that they both admit a meta-procedure, i.e., they can be embedded in a single code.

A Diagnostic System
A possible procedure for an automatic diagnostic system consists of the following steps:
1. The intelligent agent (IA) must identify in "real time" the operational state of the process. This requires that the IA be endowed with an efficient interface with a process data collection system that produces a vector of length N containing an ordered set of measurables, i.e., of process parameters that identify the state (mass flow rates, pressures, temperatures, etc.).
2. At each selected time step, the IA must compare the detected operational state with the expected one. To do this, the IA must have access either to a predetermined operational schedule of the process or, if the latter is not available, to a reliable process simulator that provides the IA with such a reference operating state.
3. If the value of the kth measurable differs from the corresponding design value by more than a preset tolerance, the IA activates a monitoring-and-control procedure on the component to which this measurable pertains.
4. The IA verifies whether the "failure" condition just detected appears in one of the "fault chains" contained in its knowledge base. If it does, the IA proceeds to step 5. If it does not, the IA activates a sub-procedure to monitor k for a prescribed period of time, and notifies the (human) plant operator of this action.
5. If the event "kth measurable out of range" belongs to one or more fault chains known to the IA, the agent launches a monitoring-and-control procedure on all measurables i, j, …, p that appear together with k in the detected fault chains.
6. If a fault chain is indeed identified as "active", the IA will: (a) notify the plant operator; (b) consult its knowledge base to search for remedial actions (e.g., adjustment of other process parameters to compensate for the derangement in k); (c) decide whether it is possible to wait for the next scheduled maintenance intervention or a repair/substitution is immediately necessary.
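The core of the diagnostic steps listed above (tolerance check, fault-chain lookup, activation) can be sketched in a few lines of Python. This is only an illustrative sketch: the relative tolerance, the rule that a chain is "active" when all of its measurables are out of range, and the chain name are assumptions of the example, not the criteria used in PROMISA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultChain:
    name: str
    measurables: frozenset  # indices of the measurables appearing in the chain


def out_of_range(measured, expected, tolerance):
    """Step 3: indices k whose relative deviation exceeds the tolerance."""
    return [k for k, (m, e) in enumerate(zip(measured, expected))
            if abs(m - e) > tolerance * abs(e)]


def diagnose(measured, expected, chains, tolerance=0.05):
    """Steps 3-6: flag deviations, then sort them into 'watch' and 'active'."""
    deviating = set(out_of_range(measured, expected, tolerance))
    report = {"watch": [], "active": []}
    for k in sorted(deviating):
        # Step 4: a deviation found in no known chain is only monitored.
        if not any(k in c.measurables for c in chains):
            report["watch"].append(k)
    for c in chains:
        # Assumed activation rule: every measurable of the chain deviates.
        if c.measurables <= deviating:
            report["active"].append(c.name)
    return report


# Hypothetical state vectors (N = 4) and a single hypothetical fault chain.
measured = [10.0, 5.6, 300.0, 1.0]
expected = [10.0, 5.0, 330.0, 1.2]
chains = [FaultChain("hypothetical chain on measurables 1 and 2", frozenset({1, 2}))]
report = diagnose(measured, expected, chains)
```

Here measurables 1, 2 and 3 are out of tolerance: the first two activate the hypothetical chain, while measurable 3, which belongs to no known chain, is placed under observation as in step 4.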

A Prognostic System
A possible procedure for an automatic prognostic system consists of the following steps:
1. The IA must identify, at pre-determined time steps, the operational state of the process.
2. The IA projects the detected operational state forward in time, founding this projection on the most recent time history (two or more previous time steps) of the process.
3. If the projected value of the kth measurable at t + Δt activates one of the known fault signatures, or if it shows an undesirable trend in the time history of xk (e.g., "dxk/dt too high" according to some norm), the IA activates a monitoring-and-control procedure on the component to which this measurable pertains.
4. The IA also launches a monitoring-and-control procedure on all measurables r, s, …, z that are related to k (i.e., whose values are known to be functionally linked to the value of xk).
5. Otherwise, the IA keeps monitoring xk for a pre-defined time interval, and notifies the plant operator of this action.
6. If the IA estimates that a fault chain may be "activated" by an excessive variation of xk, it will: (a) notify the plant operator; (b) consult its knowledge base to search for and recommend actions (e.g., adjustment of other process parameters to compensate for the derangement in xk); (c) decide whether it is possible to wait for the next scheduled maintenance intervention or a repair/substitution is immediately necessary.
Notice the remarkable analogy between the steps of the diagnostic and those of the prognostic procedure.
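The projection and trend test at the heart of the prognostic procedure can be illustrated as follows. The sketch assumes a simple linear extrapolation of the recent history of xk and a mean rate of change over the stored window as the "discrete time derivative"; both choices, as well as the limit values, are assumptions made for illustration only.

```python
def discrete_derivative(history, dt):
    """Mean rate of change over the stored window (oldest sample first)."""
    return (history[-1] - history[0]) / (dt * (len(history) - 1))


def project(history, dt, horizon):
    """Linear extrapolation of the latest value over `horizon` time units."""
    return history[-1] + discrete_derivative(history, dt) * horizon


def prognose(history, dt, horizon, limit, max_rate):
    """Flag a projected limit crossing or an excessive drift rate of x_k."""
    rate = discrete_derivative(history, dt)
    projected = project(history, dt, horizon)
    return {
        "projected": projected,
        "limit_exceeded": projected > limit,  # known signature reached at t + horizon
        "creeping": abs(rate) > max_rate,     # "dx_k/dt too high"
    }


# Hypothetical indicator history sampled every time unit.
result = prognose([1.00, 1.02, 1.04], dt=1.0, horizon=5.0,
                  limit=1.10, max_rate=0.05)
```

In this invented case the drift rate (about 0.02 per time unit) is below the assumed rate limit, yet the projected value of about 1.14 exceeds the assumed limit of 1.10, so the component would be placed under monitoring before the limit is actually reached.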

Theoretical and Practical Aspects of the Implementation of the Intelligent Agent
For the intelligent agent to be in fact "expert" and "intelligent" in performing its task, its knowledge base (KB) must be as "complete" and "exact" as possible:
• "Complete" means that there must exist a one-to-one mapping between all rules and information available to the human operator and this KB.
• "Exact" means that this mapping must be logically consistent, i.e., that no logical chain of induction correctly derived from the KB contradicts any of the rules and information available to the human operator.

The Meta-Rules of Failure Detection
It is known from AI theory [9,10] that it is convenient to re-organise, wherever possible, the knowledge bits acquired during the knowledge acquisition phase. Such a systematization favours the transparency and the accessibility of the "built-in logic" of the expert system. In the case in point, we are dealing with "failures" of a system, and we have found it useful to construct our KB on the basis of the following seven meta-rules:
1. There exists a finite number of possible types of failure, and for each one of them there exists at least one specific signature, i.e., a unique combination of the process parameters.
2. There are no sudden failures: every possible failure is "forewarned" by a drifting of the point representative of the operational state of the plant, on a path that leads to a specific attractor in the state space (the failure point).
3. Each one of these "drifting" processes has a characteristic time scale that depends both on the component and on the type of failure.
4. A convenient way to represent such a drifting is to employ a proper set of dimensionless indicators, each defined as the ratio of the instantaneous value of a measurable of interest to its "design" value. Notice that such a design value is in reality a time-dependent quantity: it is the value expected for the same instantaneous operative conditions but without any derangement.
5. The process of "failure formation" is described by at least one "fault chain", i.e., an ordered list of the immediate causes of the failure. There may be more than one chain (see point 6 below). Each chain, though, has at least two fuzzy aspects. First, the "causes" it contains are necessary but not sufficient (for example, for a creep failure in a first-row stator blade in a gas turbine, it is necessary that the gas temperature at turbine inlet be higher than a certain design limit; but once the temperature exceeds this limit, failure is not certain). Second, even this necessity is affected by some degree of uncertainty (for example, a blade failure may happen even if the gas temperatures are below the design limit).
6. Some of the fault chains may be concurrent. That is, the same failure may stem from one or the other or from a combination of two (or more) fault chains.
7. Many of the fault signatures are non-local: the values of measurables detected at locations physically remote from the point where the failure actually takes place may be affected by the drifting process mentioned in point 2. In this case, we say that these measurables (and the indicators constructed on them) are correlated with the ones immediately affected by the failure.
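The dimensionless indicators of meta-rule 4 can be illustrated with a short sketch. The reference curve below, which returns the "design" value of a compressor pressure ratio as a function of load, is entirely hypothetical; in practice it would come from the operational schedule or from the process simulator.

```python
def indicator(measured: float, design_value: float) -> float:
    """Meta-rule 4: instantaneous value divided by its 'design' value."""
    return measured / design_value


def design_pressure_ratio(load_fraction: float) -> float:
    """Hypothetical reference curve: expected pressure ratio at a given load."""
    return 8.0 + 4.0 * load_fraction


# The design value is evaluated at the current operating conditions,
# so the indicator stays near 1 at any load as long as nothing is deranged.
beta_measured = 11.4  # invented measurement
load = 0.9            # invented operating point
i_beta = indicator(beta_measured, design_pressure_ratio(load))
```

Because the reference value follows the load, an indicator slightly below 1 (here about 0.983) points to a derangement of the component rather than to a mere change of operating point.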

Formalisation of the Fault Signatures and Choice of the Fault Indicators
A very extensive database is available for the monitoring of all energy conversion plants (e.g., "efficiency", "mechanical output", "thermal output"), and especially for gas turbines and their derivatives (combined and cogeneration plants). There is a body of international industrial standards, often validated by public agencies, which regulates even the fine details of the type and tolerance of the measurables. Our approach here is, however, rather different: we are not interested in compliance with contractual specifications, but rather in a (continuous) monitoring of whether the system operates within a certain number of admissible states. Therefore, the various sets of measurables defined by international standards do not suffice for our purpose: in fact, our KB complements them with other knowledge bits derived from design handbooks, operators' manuals and interviews with field experts. Using as an example a standard gas turbine plant, Table 1 reports a list of failures to be diagnosed/prognosed and Table 2 the indicators adopted.
Define the "performance function" ΠP of an energy conversion process P as the deterministic mathematical relation between the instantaneous process output(s) and a set of N process parameters that we call the measurables: ΠP may be thought of as an operator that, applied to the vector X of measurables, generates the output vector Y, ΠP(X) = Y. Now, denote as X' a deranged operational state, in which some of the measurables have taken values slightly, but detectably, different from the design values. The new functional value assumed by the operator is, to first order:
ΠP(X') = ΠP(X) + (dΠ/dX) ΔX
where dΠ/dX represents the term-by-term derivative of a vector and not the total differential, and the new vector of measurables is:
X' = X + ΔX
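To make the role of the term-by-term derivative dΠ/dX more concrete, the sketch below estimates the output of a slightly deranged state with a first-order expansion about the design state. Both the first-order approach as written here and the toy performance function (two measurables, two outputs) are assumptions made for illustration; they are not the plant model used by the authors.

```python
def performance(x):
    """Hypothetical toy Pi_P: (mass flow, temperature rise) -> (power, heat)."""
    m, d_temp = x
    cp = 1.005  # assumed constant specific heat
    return [m * cp * d_temp, 0.5 * m * d_temp]


def jacobian(f, x, h=1e-6):
    """Term-by-term derivatives dY_i/dX_j by central finite differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def first_order_output(f, x, dx):
    """Estimate f(x + dx) as f(x) + (df/dx) dx."""
    y, jac = f(x), jacobian(f, x)
    return [y[i] + sum(jac[i][j] * dx[j] for j in range(len(x)))
            for i in range(len(y))]


x_design = [20.0, 300.0]  # invented design state
dx = [-0.2, 3.0]          # invented small derangement
y_linear = first_order_output(performance, x_design, dx)
y_exact = performance([a + b for a, b in zip(x_design, dx)])
```

For derangements of about 1% of the design values, as in this invented case, the linear estimate and the exact evaluation differ only by the second-order term, which is why small but detectable drifts can be treated with the derivative alone.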

The Knowledge Base Implementation
To complete the transition from the mathematical representation of component failure to the construction of the knowledge base, it is necessary to create the so-called fault-chain diagrams.
The first step is to define the ''performance indicators'' on which the derangement from ''standard operative conditions'' is measured. The plant operating manual and the design specifications provided by the designer and by the constructor define only a very limited set of operating points. We must add some form of ''logical extrapolation'' based on an intelligent comparison between the measured data and a set of proper theoretical operating curves (this step requires the assistance of a Domain Expert). At this point, we can create the fault chains for the knowledge base of the expert system.
Examples of failure detection criteria for compressors are presented in Table 3. The corresponding failure chain for compressor choking is presented in Figure 1.
The expert system (knowledge base and inference engine) has been written in CLIPS, an open-source programming environment. The inputs are fed to the knowledge base by a pre-processor that receives and elaborates data either from a data acquisition system or from a plant simulation. The code scans all fault indicators, establishes their progression at assigned time steps and feeds the results to the inference engine of the expert system, which consults the knowledge base to determine whether there is an incipient fault and what its possible causes are.
Figure 2 presents an example of the code for the compressor fault chain.
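As a rough illustration of how a rule-based inference engine may chain indicator facts into a diagnosis, the following Python sketch implements a naive forward-chaining loop. It is not the CLIPS code of PROMISA: the fact names, the limit values and the two rules are hypothetical and serve only to show the mechanism (pre-processor asserts facts, rules fire until nothing new can be derived).

```python
def assert_facts(indicators, limits):
    """Pre-processor step: turn indicator values into symbolic facts."""
    return {f"{name}-low" for name, value in indicators.items()
            if value < limits[name]}


def run_rules(facts, rules):
    """Fire rules until no new fact is added (naive forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical two-rule fault chain: (set of premises, conclusion).
rules = [
    (frozenset({"pressure-ratio-low", "mass-flow-low"}), "compressor-fault-suspected"),
    (frozenset({"compressor-fault-suspected", "efficiency-low"}), "notify-operator"),
]
indicators = {"pressure-ratio": 0.95, "mass-flow": 0.96, "efficiency": 0.97}
limits = {"pressure-ratio": 0.98, "mass-flow": 0.98, "efficiency": 0.98}
derived = run_rules(assert_facts(indicators, limits), rules)
```

The second rule can fire only after the first has added its conclusion to the fact base, which mirrors the ordered structure of a fault chain; keeping the derived facts also makes it possible to show the inference path to the user.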

Discussion
Devising and implementing an AI procedure that extends fault diagnosis into the realm of prognostics is, at the current state of the art, perfectly possible. It requires a shift from the "machine thinking" typical of ANN and GA procedures to the "propositional" and "fuzzy" thinking characteristic of real AI (Expert Systems). In the case in point, an application to a real compressor was implemented in [1]. The code performs satisfactorily also in the first stages of condition derangement.

Conclusions
The application of expert systems to the failure analysis of energy conversion systems has proven to be a reliable and effective aid in dealing with large amounts of data and complex fault chains, especially whenever it is essential to understand the consequential process and avoid expensive plant downtimes.