A Process to Implement an Artificial Neural Network and Association Rules Techniques to Improve Asset Performance and Energy Efficiency

Abstract: In this paper, we address the problem of asset performance monitoring, with the intention of both detecting any potential reliability problem and predicting any loss of energy consumption efficiency. This is an important concern for many industries and utilities with very intensive capitalization in very long-lasting assets. To overcome this problem, in this paper we propose an approach to combine an Artificial Neural Network (ANN) with Data Mining (DM) tools, specifically with Association Rule (AR) Mining. The combination of these two techniques can now be done using software which can handle large volumes of data (big data), but the process still needs to ensure that the required amount of data will be available during the assets' life cycle and that its quality is acceptable. The combination of these two techniques in the proposed sequence differs from previous works found in the literature, giving researchers new options to face the problem. Practical implementation of the proposed approach may lead to novel predictive maintenance models (emerging predictive analytics) that may detect with unprecedented precision any asset's lack of performance and help manage assets' O&M accordingly. The approach is illustrated using specific examples where asset performance monitoring is rather complex under normal operational conditions.


Introduction
Nowadays we are living through a digital transformation in maintenance and asset management, characterized by higher levels of asset performance information and enhanced management control possibilities, in real time and throughout the useful life of the assets.
Data is what really enables this transformation and is becoming critical in today's organizations [1]. Having useful data available is becoming paramount in order to make the best decisions in Asset Management (AM). However, transforming data into information and managing this information as knowledge are still challenges. Different researchers agree that an integrated asset information management strategy is required [2] in order to effectively enable AM decision-making.
Besides the long-term availability of data, the quality of decisions made is often constrained by the quality of the data available. Thus, data quality is another interesting and recent research topic for asset management practitioners [3]. Research has therefore also focused on developing frameworks for assessing and improving data availability and quality.
When considering which data are needed to support informed AM decision-making, the required data can vary from case to case. The major challenge is that required data is usually

Problem Description
In this paper, we address the problem of assets' performance monitoring, with the intention of both detecting any potential reliability problems and predicting any loss of energy consumption efficiency.
The idea of the paper is to add new capabilities to the Condition-Based Maintenance (CBM) programs that may already exist in energy plants and facilities. CBM is defined as "Preventive maintenance including a combination of condition monitoring and/or inspection and/or testing, analysis and subsequent maintenance actions" [17]. CBM monitors the condition of maintainable items in order to dynamically determine a suitable preventive schedule [18]. ISO 13372 [19] defines CBM as "Maintenance performed as governed by condition monitoring programs".
CBM can be presented in the literature as a system, program or solution. For instance, the US Army [20] defines the "CBM system" as a system including the analytical methods, sensors, data acquisition (DA) hardware, signal processing software, and data management standards necessary to support the use of CBM as a maintenance approach to sustain and maintain systems, subsystems, and components. A "CBM program" comprises the application of the different CBM solutions that have been adopted for a particular system, involving management and maintenance task planning [21]. A "CBM solution" can be understood as the application of a particular monitoring solution to a specific case (failure mode or element).
CBM is becoming increasingly common in engineered systems. In recent decades, we have seen the transition from maintenance approaches that combine run-to-fail and programmed preventive maintenance to more efficient maintenance approaches such as CBM [22]. As instrumentation and information systems become cheaper and more reliable, CBM becomes an important tool for running a plant or a factory [18]. In automated manufacturing or process plants, CBM is preferred wherever it is technically feasible and financially viable [23].
Energies 2019, 12, 3454
Condition monitoring techniques such as vibration analysis, thermography, acoustic emission or tribology [19], along with developments in Prognosis and Health Management (PHM), have provided powerful new capabilities for maintenance [24,25]. These capabilities allow us to deal efficiently with the new maintenance challenges in modern systems and applications [20,26].
However, asset performance monitoring can become a very complex problem in some operational situations, when:

• The assets can perform in very diverse operating conditions (sometimes also diverse environmental conditions) over time;
• Monitoring the assets' condition is not feasible, or doing so becomes a complex technical problem with a very troublesome and economically non-viable solution. This difficulty is often related to the asset's specific functional location;
• Altogether, this could cause a lack of asset performance control and a subsequent loss of expected performance efficiency.
To overcome these problems, we need practical tools that can be adapted to any potential current asset location and operating condition, having the possibility to control asset performance and reliability, ensuring life cycle expectations according to existing business plans.
This has become an important concern for many industries and utilities with very intensive capitalization in long-lasting assets [27]. These organizations demand innovative, practical solutions that can somehow replicate the benefit provided by emerging IoT and CPS, but in currently existing assets, contributing to their current assets' Total Cost of Ownership (TCO) and/or Total Value of Ownership (TVO) [28], the latter in case we can somehow model, and quantify, the value provided by the asset along its life cycle.
To this end, we will first introduce the ANN and DM techniques that will be used in the paper. ANN will be proposed to solve the problem of identifying when asset behavior abnormalities appear, while DM AR techniques will be used to determine in detail under which operating conditions behavior abnormalities are observed, and to what extent energy efficiency is impacted. Then we will present the proposed process, together with some samples showing how its implementation is deployed. Finally, we present some practical results, conclusions and further research opportunities.

Rationale for the ANN and DM Techniques Selection
Among the AI tools, Artificial Neural Networks (ANNs) are selected because of their ability to process information features in non-linear, highly parallel, faulty and noisy environments, with learning and generalization capabilities [29,30]. In comparison to other methods, ANNs are well suited for solving problems where explicit knowledge is difficult to specify or define, but where there are enough data [31,32]. For these reasons ANNs are a very popular tool for prediction and classification problems, but we agree that more research is needed to make their implementation more practical in real-life applications, taking advantage of existing maintenance engineering tools and management systems [33].
Concerning data analysis tools, several DM techniques exist and, according to the purpose of each study and to the characteristics of the variables under investigation, selecting the most appropriate one is the first step for the success of the analysis [34]. We propose a classification of the DM techniques, differentiating among models requiring or not a reference value for the analyzed variable. The former group is further detailed, discriminating between descriptive models, namely the ones oriented to defining relationships among objects (e.g., conceptual clustering), and associative models, addressing the relationships among variables (e.g., association rules, model-based reasoning, reason weighting). According to this, among the DM techniques, we selected Association Rules (ARs) mining for two main reasons. The first is its purpose, which is discovering relevant and non-trivial attribute-value relations not immediately recognizable in a massive amount of data [35]. The second is the intuitiveness of ARs mining, which makes the interpretation of results immediate even for non-domain experts. Also, ARs mining is very popular in several different fields of application.
In this paper, we will propose an innovative process for determining the proper combination of ANN tools with Data Mining (DM) techniques. In fact, the combination of these two techniques can now be done using software which can handle large volumes of data (big data), assuming that the data are available during the asset's life cycle and that their quality has been tested. If this process could be practically implemented, novel predictive maintenance models may emerge (predictive analytics), models that may detect any asset's lack of performance with unprecedented precision and therefore help to manage asset O&M.

State of the Art: ANNs Applications in AM
As already mentioned, ANNs represent a powerful methodology for prediction and classification issues. Specifically, they enable condition analysis of the asset and of the operating variables, with the aim of anticipating potential problems [36].
In the existing literature, there are several applications of ANNs to predictive maintenance and asset management. For instance, [37] studied the condition of gearbox bearings in wind turbines using a nonlinear autoregressive ANN, in order to estimate their condition and anticipate possible failures. Moreover, ANNs can be combined with other techniques, like decision trees, in order to improve the accuracy of the prediction obtained: examples of this combination are presented for the maintenance of heating, ventilating and air conditioning systems [38] and for engine gearbox fault diagnosis [39]. ANNs have also been applied to detect the health status of rotating components; the ANN's learning stage can be enhanced (e.g., by reducing the calculation time) by estimating weights and biases through a Genetic Algorithm, as suggested by [40]. In [41], an asset prognostic system, based on principal component analysis and a back-propagation ANN, is proposed and applied to the transformers of a large power system. The potential of ANNs in fault diagnosis is shown, among others, by [42], who analyzed vibration data to predict bearing failures, while [43] addressed engine cylinder temperatures to foresee potential malfunctioning. A similar approach is also applied for controlling the exhaust gas temperature of gas turbine engines and detecting wear problems: integrating data recorded through different sensors of the turbine allows the monitoring of the turbine's performance and fault detection [44]. Another application in the energy field concerns the analysis of temperature and irradiance values to predict the performance and failures of photovoltaic plants [45]; interestingly, as implemented in a power grid company, ANNs may also be integrated in ontological models to optimize fault diagnostics and repair processes [46].
The ANN technique can therefore be considered a mature technique with high technological readiness for immediate implementation, and many different applications can already be found in many companies. This makes the implementation of ANN algorithms for prediction rather practical in many industrial scenarios, provided that the accuracy and fit of the algorithm are appropriate.

State of the Art: AR Mining Applications in AM
ARs were first applied to determine the relationships among items frequently bought together [47]. With the growing popularity of this methodology, its applications to other fields increased as well, further highlighting its versatility. For example, in the manufacturing field, ARs mining is applied for relating the root cause to groups of machines in a production process [48] or for predicting process failures [49,50].
Other ARs mining applications concern the analysis of event co-occurrences or correlations. For instance, they can be used in defining the occurrence of pairs of events, as presented in the case of train diagnostics [51]: rules relating sequential events are taken into consideration, in order to monitor train fleets more efficiently. In a similar manner, process reliability and safety are addressed by several authors: [52] develop a procedure for failure detection on machines belonging to different production lines, by integrating maintenance data; [53], instead, apply ARs to deploy a Total Productive Maintenance strategy to increase the reliability of the production process. Maintenance policies based on ARs mining for the prediction of component breakages are also applied in existing contributions related to refinery plants, both involving an active participation of plant operating personnel in parameter setting [54] and automating the latter aspect through the resolution of an optimization model considering production constraints [55]. Conversely, some authors showed that the application of ARs mining in power plants has a positive impact for diagnostic purposes, but is not sufficient for predicting failure occurrences [56].
The ANNs and ARs mining approaches are promising methodologies for prediction and classification, but more research is needed on the combination of the two. A work integrating the ARs mining and ANN approaches for operational reliability issues already exists in the literature: specifically, the ARs are extracted to define the factors influencing power distribution reliability; then, such factors are given as input to an ANN for modelling the normal behavior of the distribution network [57]. The approach proposed in the current paper is the opposite: the normal behavior is first estimated through an ANN and, when a substantial deviation is noticed, ARs mining is applied to diagnose whether such a deviation impacts the efficiency and performance of the asset. In fact, while the ability of ANN models to predict the normal behavior of a system starting from the analysis of the process variables is well established, it is also useful to understand which factors influence the abnormal behaviors of the process. Such an analysis can be easily carried out through ARs mining. For this reason, the approach proposed in the current paper combines ANN and ARs mining.

Brief Background of the ANN-DM Techniques Selected
The ANN architecture selected consists of an input layer, an output layer and one or more hidden layers. Research has been carried out on evaluating the number of hidden layers (depth of the network) and the number of neurons in these layers, which normally depend on the problem complexity or on the accuracy needed [58]. Also, neural networks may use different available training algorithms to learn from data, with different features and efficiency [59]. The learning process consists of an iterative process to minimize a function (named loss function), which is normally a non-linear function containing the ANN parameters (biases, learning rate and weights). The process stops when a specified condition is reached.
In the literature, we can find many different training algorithms, among which gradient descent is recognized as the simplest one [60]. This is a first-order method that will be used in this paper to implement a backpropagation process to train the network, so that the ANN can learn to link arbitrary inputs to outputs. The backpropagation algorithm has been selected since it has provided excellent results compared to conventional linear and polynomial methods dealing with time series of data [61].
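As an illustration of the training loop described above, the following is a minimal sketch of backpropagation with plain batch gradient descent for a network with one hidden layer. It is not the model used in this paper: the function name `train_mlp`, the tanh activation, the mean squared error loss, the layer size, the learning rate and the toy data are all assumptions chosen only to keep the example small.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network with batch gradient descent.

    All hyperparameters here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass: hidden activations and network output
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))  # MSE loss
        # Backward pass: propagate the loss gradient layer by layer
        g_out = 2.0 * err / n                    # dLoss/dy_hat
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # First-order update: step against the gradient
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

# Toy data (hypothetical): learn y = x^2 on [-1, 1]
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = X ** 2
params, losses = train_mlp(X, y)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Here the stopping condition is simply a fixed number of epochs; a tolerance on the loss or on a validation error would be an equally valid choice.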
In this paper, the ANN with backpropagation prediction model selected is a continuous time simulation model that has proven easy to adapt for prediction accuracy and self-learning capabilities (as explained later in Section 4.2).
Concerning our data analysis tool, since the current study addresses the identification of the relationships among variables, an associative model is required and, specifically, Association Rules (ARs) are mined. A formal definition of an Association Rule and a description of the rule mining algorithm are provided in the following.
Given a set of Boolean data I = {i1, i2, ..., in} called items and a set of transactions T = {t1, t2, ..., tm}, a transaction ti is defined as a subset of items belonging to I (i.e., an item-set). An Association Rule (AR) is an implication of the form Γ → ∆, where Γ and ∆ are item-sets belonging to I whose intersection is empty (i.e., Γ ∩ ∆ = Ø). Γ and ∆ are referred to as the body and head of the rule, respectively. The quality of an AR can be assessed through different metrics; in this study, we apply the most popular ones, namely (1) the Support and (2) the Confidence.
(1) Support(Γ → ∆) = |Γ, ∆| / m: the support is the ratio between the number of transactions containing both item-sets Γ and ∆ (|Γ, ∆|) and the total number of transactions (m). Hence, it represents the probability of finding a transaction containing Γ and ∆.
(2) Confidence(Γ → ∆) = Support(Γ → ∆) / Support(Γ): the confidence can be considered as a measure of the strength of the rule. Indeed, it is calculated as the support of Γ → ∆ over the support of item-set Γ. In practice, confidence represents the probability of occurrence of ∆ given the occurrence of Γ, i.e., the conditional probability of ∆ given Γ.
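The two metrics above can be checked on a small hand-made example. The sketch below is only illustrative: the function names and the four toy transactions (with hypothetical item labels such as "flow=high") are assumptions and do not come from the case study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, body, head):
    """Support of (body and head) over support of body.

    Raises ZeroDivisionError if the body never occurs.
    """
    both = set(body) | set(head)
    return support(transactions, both) / support(transactions, body)

# Hypothetical transactions: each one is a set of Boolean items
transactions = [
    frozenset({"flow=high", "pressure=high", "error=large"}),
    frozenset({"flow=high", "pressure=low"}),
    frozenset({"flow=high", "pressure=high", "error=large"}),
    frozenset({"flow=low", "pressure=low"}),
]

s = support(transactions, {"flow=high", "error=large"})        # 2 of 4
c = confidence(transactions, {"flow=high"}, {"error=large"})   # 2 of 3
print(f"support={s:.2f}, confidence={c:.2f}")
```

In this toy data the rule {flow=high} → {error=large} holds in two of the four transactions (support 0.5) and in two of the three transactions that contain its body (confidence of about 0.67).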
The procedure for mining the ARs from a dataset starts from the generation of the Frequent Item-sets: the users define a minimum support according to the dataset's characteristics; then, the item-sets appearing in T with a frequency higher than the minimum support are generated through the FP-growth algorithm [62]. Starting from each frequent item-set F, all the rules of the form X → Y, such that X ∪ Y = F, are generated.
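The two-stage procedure just described (frequent item-sets first, then rules) can be sketched as follows. Note that this sketch deliberately replaces FP-growth with a brute-force enumeration, which yields the same frequent item-sets on small data but does not scale; it also treats an item-set as frequent when its support is greater than or equal to the threshold, which is an assumed convention. Function names, thresholds and transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force search for frequent item-sets (a stand-in for FP-growth)."""
    m = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t) / m
            if s >= min_support:
                frequent[frozenset(cand)] = s
                found = True
        if not found:
            break  # no frequent k-item-set, so no larger one can be frequent
    return frequent

def rules_from(frequent, min_confidence):
    """Generate every rule X -> Y with X union Y = F for each frequent F."""
    rules = []
    for F, s in frequent.items():
        for r in range(1, len(F)):
            for body in combinations(sorted(F), r):
                body = frozenset(body)
                head = F - body
                # Every subset of a frequent item-set is itself frequent,
                # so its support is already stored.
                conf = s / frequent[body]
                if conf >= min_confidence:
                    rules.append((body, head, s, conf))
    return rules

transactions = [
    frozenset({"flow=high", "pressure=high", "error=large"}),
    frozenset({"flow=high", "pressure=low"}),
    frozenset({"flow=high", "pressure=high", "error=large"}),
    frozenset({"flow=low", "pressure=low"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = rules_from(frequent, min_confidence=0.9)
for body, head, s, conf in rules:
    print(sorted(body), "->", sorted(head), f"support={s:.2f} confidence={conf:.2f}")
```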
Since asset performance monitoring involves the analysis of continuous variables, the discretization of such attributes should be taken into account. This issue is already addressed in the existing literature: some authors have provided guidelines on how to discretize such values. For example, [12] propose an entropy minimization-based heuristic to define multiple intervals, while [63] propose the application of an evolutionary algorithm to define ranges automatically. Data could also be clustered before extracting the ARs, even if some relationships can be disregarded (e.g., when the support of such rules does not reach the minimum support required). In order not to lose any relevant information, in the energy consumption monitoring field, data are often partitioned according to a specific strategy, e.g., in fixed ranges [64]. Hence, in our application, the discretization of the continuous variables is defined in accordance with the O&M department, so that all the different operating conditions can be mapped. The software RapidMiner (www.rapidminer.com), an open-source data mining platform, is selected to carry out the analysis. Among the various tools available on the market, RapidMiner is considered one of the most user-friendly, because of its graphical interface and the ease of modifying the parameters, since no programming skills are required [65].
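To make the fixed-range partitioning mentioned above concrete, the following sketch maps continuous readings to interval labels that can then be used as Boolean items in a transaction. The bin edges, variable names and helper names are hypothetical; in the application described here the ranges are agreed with the O&M department, and the analysis itself is carried out in RapidMiner rather than in code.

```python
from bisect import bisect_right

def discretize(value, edges, name):
    """Map a continuous reading to an item label such as 'flow=[100,200)'.

    edges must be sorted; values below the first edge or at/above the
    last edge fall into open-ended intervals.
    """
    i = bisect_right(edges, value)
    lo = "-inf" if i == 0 else edges[i - 1]
    hi = "+inf" if i == len(edges) else edges[i]
    return f"{name}=[{lo},{hi})"

def to_transaction(record, bins):
    """Turn one record of continuous readings into a set of items."""
    return frozenset(discretize(v, bins[k], k) for k, v in record.items())

# Hypothetical fixed ranges and one hypothetical reading
bins = {"flow": [100, 200], "pressure": [5, 10]}
record = {"flow": 150.0, "pressure": 12.3}
print(sorted(to_transaction(record, bins)))
```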

ANN-DM Combination Process and Sample Problem
The "Asset Performance Monitoring and Reliability Assessment Process" coordinates and integrates data, engineering tools and human resources (with knowledge, skills and experience) in order to achieve the best possible recommendation for assets' operations and maintenance actions. This process also analyzes the different renovation and reinvestment options which are available for the physical asset. All these activities are accomplished taking into account different operational and data-related constraints and controls, as presented using the Functional Modelling method IDEFØ diagram [66] of the process in Figure 1.
This process is a continuous improvement process responsible for leading and controlling the knowledge captured concerning the behavior of the asset (observed properly when integrating O&M data [I1] with process data [I2] and condition monitoring data [I3]), permanently adapting operations and maintenance actions. The process provides consistency to major maintenance actions and asset replacement options, ensuring proper AM orientation according to predictable and/or unpredictable circumstances that may take place.
The general process in Figure 1 can be split into four different subprocesses, named modules in this paper, as illustrated in Figure 2. The first module is Data Processing, responsible for the preparation of valid, quality-tested data (O1) for the subsequent processes, as well as for the normalization of this data (O2). The second module provides an algorithm that can identify deviations of the asset from its expected behavior under normal operating conditions. These deviations are presented in the form of prediction error (O3). The third module deals with the measurement of the deviation, obtaining the relationship between operational data and efficiency (O5) and then calculating the efficiency loss observed for a period (O6). Finally, the last module takes this information to adjust O&M actions properly (O7) and to reevaluate major asset interventions (O8) accordingly. In the following paragraphs, we explain each one of these processes in detail, going down to level 2 in the IDEFØ diagram presented (therefore we use the Functional Modeling method IDEFØ with three hierarchical series of diagrams). In addition, the application of each phase of the process is illustrated with a practical example.
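As a rough illustration of the outputs of the first two modules, the sketch below normalizes a data series (O2) and computes a prediction error against model output, flagging the samples that deviate (O3). The min-max scaling, the absolute-error criterion, the threshold value and the numbers are all assumptions made for the example; the predicted values would in practice come from the trained ANN.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a series to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def flag_deviations(measured, predicted, threshold):
    """Return the prediction error and a mask of samples beyond threshold."""
    error = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    return error, np.abs(error) > threshold

# Hypothetical normalized power readings versus model predictions
measured = [1.0, 1.5, 1.0]
predicted = [1.0, 1.0, 1.1]
error, is_abnormal = flag_deviations(measured, predicted, threshold=0.2)
print(error, is_abnormal)
```

Only the flagged samples would then be passed on to the association rule analysis of the third module.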

Sample Problem Description
For this paper we have selected the example of a process pump located inside a cryogenic LNG storage tank, with the following features:

• A unique design allowing the pump to be installed inside the tank in a vertical column, to remove the possibility of major tank leakage due to a pipe or connection problem.
• The pump and motor units are submerged, and the column acts as a guide to seat the pump during installation and performs as the discharge pipe from the pump.
• The plant information system shows operating conditions of the pump to detect any potential problem.
• The pump has very different possible operating conditions (resulting from very different tank levels, flow, pressure, LNG density, operating hours of the pump, etc.).
• It is very troublesome to establish a clear judgement about potential pump malfunctioning. Besides this, condition monitoring capabilities are limited to the previously mentioned operating variables (for instance, no vibration, bearing temperature or motor temperature signals are received).
In this example, the scope of the analysis was fixed as follows:
• To generate a process and tool for the prediction of anomalies in the operation of pumps with a complex operation and supervision regime.
• To establish the guidelines for the practical implementation of the models and methodology.
• To generate enough information to prepare a Business Case for the company to implement the process if there is a demonstrated payback to the business.
Some other secondary or complementary objectives were:
• To identify the operating regimes of the equipment so that anomalies are recorded.
• To obtain the loss of performance of the equipment when the detected anomalies occur.

Input Data Processing Module
The procedure described below is necessary to unify the data received from different information sources.For this, the guidelines for data capture, cleanup and subsequent standardization are listed.In this way, we can ensure that ANN models and association rules are as effective and precise as possible.
The steps listed below are shown in detail in the IDEFØ 1.1 to 1.5 diagram (Figure 3):
(1) Asset case study definition: The first step of the procedure (IDEFØ 1.1) is to choose the asset case study from the organization's asset portfolio (I1.1), so that it is representative and the results can be extended to other similar equipment or systems of the plant. For the equipment selected, the organization must have good and consistent operation and maintenance history data (C1.1) that allow a model of the equipment to be obtained successfully. In this step, it is convenient to collect all the technical information of the equipment (C1.3), including the manufacturer's maintenance and operation recommendations. Each operation variable extracted from the control system of the equipment will have to be standardized based on the design standards. Another restriction to consider is the operational context, in order to understand how the equipment works in its environment and how it can be affected by external conditions and different modes of operation (C1.2). A case study may be singular, yielding a single model for a single piece of equipment, or, alternatively, the model may be made flexible enough to be extended to similar equipment, thereby taking advantage of the work and effort already dedicated. In many cases, the operational context may be completely different because of location, altitude, external agents and other operational parameters that accelerate ageing and performance variation. So, it is advisable to choose equipment that does not share the same operational context, in order to obtain a more accurate model. The final decision to select the asset case study is approved by the O&M Manager ("R2"), based on the above considerations.
The following table shows all available variables and their origin (Table 1):
(2) Selection of variables: Once the case study equipment has been chosen and the operational context described, as indicated in the previous paragraphs, the next step (IDEFØ 1.2) is to select the variables that are used for modeling and studying the behavior by ANN and association rules.
All the variables must be part of the operation and maintenance history and must comply with the following requirements:

•
Data must be consistent and without significant temporal breaks in the series.

•
The frequency at which the information is extracted from different sources of information must be enough to capture the changes of state in each of the variables.

•
The variables must be independent of each other; if dependencies between variables are detected, the dependent variables for which less information is available will be discarded.

•
At least one variable must be an operation variable (for example, flow, pressure, electric consumption, power consumed, speed, etc.).

•
The other variables must be linked to the equipment condition, such as vibration, bearing temperature, etc. In addition, some variables must be related to the operational context and environment, such as outside temperature, required load, operation mode, etc.
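The independence requirement in the list above can be screened with correlation coefficients. The following is only a minimal sketch: the function names, the 0.9 threshold and the sample readings are illustrative assumptions, not values taken from the case study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def dependent_pairs(series, threshold=0.9):
    """Return (name_a, name_b, r) for pairs whose |r| exceeds the threshold."""
    names = list(series)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical hourly readings (illustrative numbers, not case-study data).
readings = {
    "flow": [300, 350, 400, 450, 500, 550],
    "output_pressure": [11.4, 11.1, 10.9, 10.8, 10.4, 10.3],
    "tank_level": [6200, 5900, 6800, 6100, 6600, 6000],
}
print(dependent_pairs(readings))
```

A flagged pair only signals linear dependence; which of the two variables to discard would still follow the criterion stated above (the one with less information available).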
Notice that, in order to improve the choice of variables, it is advisable to use statistical techniques to measure the degree of dependence (see R1 in the previous figures). For the example presented in this paper, the variables that have been selected and that will follow the next phases of the process are those shown in Table 2. This table is the result of applying statistical techniques; in particular, the dependence between variables has been studied using a covariance matrix and correlation coefficients [67]. The analysis of the results indicates that flow, output pressure, output temperature and hours since last overhaul are indispensable for the model, and that other variables such as fluid density, tank level and intake temperature are secondary and could be left out of the model if sufficient data is not available for power consumption prediction modeling.

(3) Period selection and data recording frequency: Once the variables have been selected, it is necessary to define the period in which the study will be carried out (IDEFØ 1.3). It is advisable to choose a period in which all possible scenarios have been registered, for example, one in which the system has operated in all its operating modes and has undergone overhauls, corrective maintenance, etc. In addition to the period, it is also necessary to select the data recording interval for the different information sources. As mentioned in step 2, this interval must capture the changes of state in the variables in enough detail for their interpretation and study, so that the network can be trained faithfully and anomalies in the behavior of the equipment can be identified (C1.4). At this point, it is convenient to weigh the effort required to obtain the data against the precision expected from the resulting model. A shorter sampling interval yields higher accuracy in the model prediction (for example, one data point per minute vs. one per hour). Table 3 shows the periods selected for the case study. The study is divided into three periods; the passage from one period to another is marked by an overhaul (OH) maintenance activity. Data has been recorded every calendar hour.

(4) Data validation and processing: This step involves the validation and processing of the information (IDEFØ 1.4). It is one of the most important steps and usually the one that consumes the most resources in the application of the methodology. Usually, the information contains errors that must be reviewed and, above all, the information from the different sources must be unified over the same selected period. This step is essential to achieve results that are as close as possible to reality. Another measure to consider at this point is checking the consistency of the database, filtering out the noise and erroneous information generated by the instrumentation installed in the assets. For this, it is necessary to use data validation tools (R1), scatter plots, etc. Once the data has been normalized, the next step is its use in the artificial neural network to obtain the ideal model of system behavior. The model obtained, together with the real-time reading of the variables, can be compared with the real behavior of the system, so that, in an ideal case, the deviation from the expected behavior can be detected.

(5) Data normalization: Normalization is applied as part of data preparation for machine learning (IDEFØ 1.5). The goal of normalization is to change the values of different variables in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset requires normalization for machine learning; it is required only when features have different ranges. Each variable is normalized between 0 and 1, making use of the technical information collected in the first step of the procedure. For example, if a variable is limited by design to between 250 and 700 m³/h, the values outside this range are not considered for the study, and the values within the range are normalized between 0 (values close to 250 m³/h) and 1 (values close to 700 m³/h). Table 4 shows the maximum (Vmax) and minimum (Vmin) values for each of the variables selected for the case study.
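The normalization rule of step 5 can be sketched as a min-max scaling against the design limits. In this sketch the function name and the pressure limits are hypothetical, and dropping the whole record when any variable is out of range is our assumption; only the 250 to 700 m³/h flow range comes from the example above.

```python
def normalize_records(records, limits):
    """Min-max scale each variable to [0, 1] using its design limits.

    records: list of dicts {variable: value}
    limits:  dict {variable: (v_min, v_max)}
    A record is discarded if any variable lies outside its design range
    (assumption: the text only says out-of-range values are not considered).
    """
    normalized = []
    for rec in records:
        if all(lo <= rec[name] <= hi for name, (lo, hi) in limits.items()):
            normalized.append(
                {name: (rec[name] - lo) / (hi - lo)
                 for name, (lo, hi) in limits.items()}
            )
    return normalized

# Flow limits from the example in the text; pressure limits are hypothetical.
limits = {"flow": (250.0, 700.0), "pressure": (8.0, 12.0)}
records = [
    {"flow": 250.0, "pressure": 10.0},
    {"flow": 720.0, "pressure": 9.0},   # flow above Vmax: discarded
    {"flow": 700.0, "pressure": 9.0},
    {"flow": 400.0, "pressure": 12.5},  # pressure above Vmax: discarded
]
print(normalize_records(records, limits))
```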

Prediction Model Module
As mentioned above, the prediction model selected in this paper is a continuous-time simulation model that has proven easy to adapt and offers prediction accuracy and self-learning capabilities. The model (Figure 4) implements an ANN with a backpropagation algorithm in the Vensim® simulation environment (Ventana Systems Inc., Harvard, MA, USA), to benefit from the optimization features of the software for fast training. The stock and flow diagram (SFD) of the simulation-based prediction model is presented in Figure 5.
Besides providing high rigor for writing model equations, Vensim helps to trace and understand the importance of the feedback loops existing in the model, and supports multiparametric optimization, which is essential for this process (IDEFØ 2.1 in Figure 4). Some interesting features of the model that has been implemented in this process are:

•
The ANN structure (number of neurons per layer) can be easily changed by modifying the number of elements of the different subscripts, which correspond to the number of neurons of the network (in our initial model we have considered only one hidden layer and 20 neurons).

•
The network has two bias parameters (hidden layer [Bias 1] and output layer [Bias 2]) and two learning coefficients or rates (hidden layer [LC 1 = Eta 1 during training time] and output layer [LC 2 = Eta 2 during training time]).

•
Input data (in our example, values of flow, temperatures, tank levels, densities and operating hours, used to predict the energy consumption of a pump) and target data (values of energy consumption) are imported from plant information systems and uploaded automatically to Vensim (O2 in Figure 4). Input data goes to the first-layer neurons, while target data is compared with the values predicted at the ANN output.

•
The ANN training time is introduced as a parameter. Once this time is reached in the simulation, LC 1 and LC 2 are set to zero and the learning process finishes, since the flows that adjust the weights are stopped.
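The features listed above can be illustrated outside Vensim with a small sketch of a one-hidden-layer network trained online by backpropagation, in which both learning coefficients drop to zero once the training time is reached. This is not the Vensim model: the sigmoid activations, the four hidden neurons, the function name `run_ann`, the initial weights and the synthetic data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def run_ann(inputs, targets, w1, w2, bias1=0.0, bias2=0.0,
            eta1=0.5, eta2=0.5, training_time=None):
    """Online backpropagation over a time series.

    w1[i][j]: initial input-to-hidden weights; w2[j]: initial hidden-to-output
    weights. Returns (squared error per step, final w1, final w2).
    """
    w1 = [row[:] for row in w1]   # work on copies of the initial weights
    w2 = w2[:]
    n_in, n_hidden = len(w1), len(w2)
    if training_time is None:
        training_time = len(inputs)
    sq_errors = []
    for t, (x, y) in enumerate(zip(inputs, targets)):
        # Learning coefficients are zero once the training time is reached,
        # so the weights stop changing and the network only predicts.
        lc1, lc2 = (eta1, eta2) if t < training_time else (0.0, 0.0)
        hidden = [sigmoid(sum(x[i] * w1[i][j] for i in range(n_in)) + bias1)
                  for j in range(n_hidden)]
        out = sigmoid(sum(hidden[j] * w2[j] for j in range(n_hidden)) + bias2)
        err = y - out
        sq_errors.append(err ** 2)
        delta_out = err * out * (1.0 - out)
        for j in range(n_hidden):
            # The hidden delta uses w2[j] before it is updated below.
            delta_h = delta_out * w2[j] * hidden[j] * (1.0 - hidden[j])
            w2[j] += lc2 * delta_out * hidden[j]
            for i in range(n_in):
                w1[i][j] += lc1 * delta_h * x[i]
    return sq_errors, w1, w2

# Synthetic, already normalized data (illustrative only).
inputs = [[(t * 37 % 100) / 100.0, (t * 53 % 100) / 100.0] for t in range(2000)]
targets = [0.8 + 0.1 * x[0] for x in inputs]
w1_init = [[0.1, -0.1, 0.05, -0.05], [-0.1, 0.1, -0.05, 0.05]]
w2_init = [0.1, -0.1, 0.05, -0.05]

errors, w1_final, w2_final = run_ann(inputs, targets, w1_init, w2_init,
                                     training_time=1500)
# Mean squared error at the start and at the end of the training interval.
print(sum(errors[:50]) / 50, sum(errors[1450:1500]) / 50)
```

After step 1500 the weights are frozen, so the squared errors recorded from that point on are pure prediction errors, which is the quantity whose trend is studied later in the process.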
To overcome potential training dynamic issues, we can rely on the tools provided by Vensim for optimization and calibration of model parameters. Vensim uses a direct-search method that does not evaluate the gradient (modified Powell method) to calibrate model parameters [68,69]. In our case, it is important to calibrate Bias 1, Bias 2, Eta 1 and Eta 2 (learning coefficients), and all initial weight parameters w10(i, j) and w20(j, k), so that the ANN offers its best possible fit as soon as possible. The optimization model implemented in the Vensim Powell optimizer in our process (IDEFØ 2.1) is as follows: Objective Function: where w10(i, j) and w20(j, k) are the initial weight values.
With Vensim, the analyst can choose to stop the algorithm after a maximum number of iterations or according to certain tolerance criteria for the solution. The error obtained is remarkably low for a relatively small number of iterations of the algorithm. Once the optimization algorithm has stopped, the resulting parameter values can be logged together with the performance of the model for the optimal solution found.
Proceeding in this way, we can easily select initial model parameter values and then, by running the model with the backpropagation algorithm, obtain the final weight values for the ANN algorithm that will be put into operation (IDEFØ 2.3 in Figure 4).
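To make the idea of a gradient-free calibration with the two stopping criteria concrete, the sketch below uses a compass (coordinate) search. This is deliberately a simpler technique than the modified Powell method used by Vensim, swapped in only for illustration; the function names and the quadratic objective are hypothetical stand-ins for a measure of the ANN fit as a function of the calibrated parameters.

```python
def compass_search(objective, x0, step=0.5, tol=1e-4, max_iter=1000):
    """Derivative-free minimization by probing +/- step along each coordinate.

    Stops when the step size falls below `tol` (tolerance criterion) or after
    `max_iter` sweeps (iteration limit). Returns (best point, best value, sweeps).
    """
    x = list(x0)
    best = objective(x)
    sweeps = 0
    while step > tol and sweeps < max_iter:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break  # keep this move and go on to the next coordinate
        if not improved:
            step /= 2.0  # no probe helped: refine the search resolution
        sweeps += 1
    return x, best, sweeps

def toy_objective(p):
    """Hypothetical stand-in for the fit error of the network."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

params, value, sweeps = compass_search(toy_objective, [0.0, 0.0])
print(params, value, sweeps)
```

In the process described above, the parameter vector would hold the biases, learning coefficients and initial weights, and the values found would then be refined by backpropagation.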
For our pump example, the considered ANN model structure, for which the authors initially had to iterate five times over the number of neurons in the hidden layer, can be characterized as:

•
Three layers

The implementation in Vensim required: (1) Connection with standard data in Excel provided by Plant Info Systems; (2) Training interval definition (first 70% of valid data points in Table 3, Period 1, in chronological order); [...] Table 3, Period 1, in chronological order) and calculation of the quadratic error of prediction; (7) The prediction error trend study.
Results of the prediction error (as in Figure 6) provide precise information about possible abnormal operating conditions of the assets for any operating regime (or, as in Figure 6, after an overhaul intervention carried out on the asset). Calibration can be done for each network configuration selected (i.e., for given numbers of neurons per layer: n, m, l), and each time input data conditions recommend retraining the network.

Sample additional regression results for the ANN algorithm (for the 20 neurons in the hidden layer) are presented in Table 5. Notice how step 4 of the Vensim model implementation process, described above, provides extremely good results for training (Table 5), since all initial values for the weights, the bias values and the learning coefficients are already optimized when the backpropagation algorithm starts the final run with optimal parameter values. Therefore, the backpropagation algorithm just refines the values for the weights set by the direct search technique. The precision of the algorithm results can be observed in Figure 7, where a histogram of absolute values of the prediction results before the overhaul of the pump is presented.

Therefore, by using the Vensim model and software, the algorithm to put into operation (final ANN model) will be the result of two optimization phases: (1) Direct search techniques using the modified Powell method, applied to optimize the initial values of weights, biases and learning rate coefficients (IDEFØ 2.1 in Figure 4). (2) Backpropagation algorithm for the optimization of the weights during the ANN training period (IDEFØ 2.2 in Figure 4), using the initial values of weights, biases and learning rate coefficients optimized in IDEFØ 2.1.
Before concluding this phase, and although our results are promising, we recognize that a given technique such as ANN should not be selected without discussing whether its results can be benchmarked against those provided by other available advanced machine learning techniques for predictive analytics. Such a comparison supports the tool selection and provides more elements for a well-informed asset management decision [33].
It is important to say that not all the techniques that we will now review have the same maturity level as ANNs in reliability analysis or maintenance. For instance, there are few examples of Support Vector Machines (SVM) or Random Forest (RF) being used to predict system performance and reliability; the use of these techniques in these fields will probably intensify in the coming years [70].
Regarding the application of SVM techniques compared to ANN, readers are referred to [71], where a study compares different neural network techniques (NNBR, NRBR, NRBR) and NSRV, based on the root mean square error (RMSE) and the mean relative error (MRE) obtained with each of them. In that study, the SVM-based model provides the best result, obtaining a high precision in the power consumption prediction of pumps and significantly improving on the results of the neural networks. Likewise, there are numerous references for the application of this technique to study energy prediction and system performance; for instance, some authors [72,73] compare SVR techniques (multi-scale support vector regression) with a multilayer perceptron neural network, obtaining better results with SVR due to its speed and robustness. Although it is a relatively recent technique, the results obtained are very promising and encourage further research in this field.
We have fitted the nonlinear SVM models with the svm function available in the e1071 library of the R system [74], which offers an interface to the award-winning C++ implementation LIBSVM, by Chang and Lin. The data set is described by n training vectors {x_i, y_i}, i = 1, 2, ..., n, where the p-dimensional vectors x_i contain the predictor features and y_i ∈ {−1, 1} are the responses of each vector. Among the several SVM variants available in the e1071 library, we have used ε-classification with the Gaussian radial basis kernel function.
Concerning the linear SVM prediction, the LIBLINEAR library [75] has been used, which is an open source library for large-scale linear prediction. It supports logistic regression and linear support vector machines. We have used the implementation in the LiblineaR package [76], which is based on the LIBLINEAR C/C++ library for machine learning. This package considers neither the nonlinear transformation φ(x) nor the kernel function. However, the estimation of the models is particularly fast compared to other libraries.
Regarding RF, this is one of the most recent techniques that we have decided to test for this case study, with very promising results. Some interesting examples in the literature use the RF technique to classify data, showing high accuracy compared to other techniques [77]. In other examples [78], RF is used to improve, for instance, the short-term prediction of wind energy production, which is a complicated problem due to the stochastic nature of the wind, by using the effects of seasonality. Other interesting examples can be found in [79], where two applications of decision tree techniques are presented: the planning of organized energy storage in microgrids and energy control within a PC through the optimal use of local energy resources. A complete case study demonstrates the feasibility of this technique.
We have used the R package randomForest [80], which builds 500 trees by default and randomly selects p/3 variables at each split, p being the number of predictors.
Table 6 presents the results obtained when applying the above-mentioned techniques to the prediction of the energy consumption of the pump for the periods before and after overhaul 1, and of course for similar data. As the reader can appreciate, the prediction accuracy is very similar across the different algorithms; for this particular case study, the ANN model performed better than the SVM, while the RF model results were definitely the benchmark.
This was a good exercise to test the prediction quality of the ANN algorithm. This result, together with the maturity of the technique within the reliability and asset management organization and some other practical implications, made the selection of the ANN process reasonable.
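A benchmark between prediction techniques needs a common error measure. The sketch below computes the two measures mentioned earlier in connection with [71], RMSE and MRE; we take MRE to mean the mean relative error, which is an assumption, and the sample numbers are purely illustrative. We do not claim these are the metrics behind Table 6.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equally long series."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mre(actual, predicted):
    """Mean relative error; assumes no actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative power readings and predictions from a hypothetical model.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(actual, predicted), mre(actual, predicted))
```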

Output Data Analysis and Control Module
When a substantial deviation of the error trend is noticed, the third step of the process is performed, as shown in Figure 8.
The prediction error (O3) of the power consumption prediction model acts as a trigger for this stage. Indeed, when the error of the ANN exceeds a tolerable threshold (C2) and this growth persists for a defined time interval, it is advisable to analyze the performance in terms of efficiency. Hence, the valid data (O1) determined in the input data processing unit and the prediction error are fed into the AR data mining activity. The aim of this step is to study the relations among the operating conditions and the efficiency of the system under investigation (O5) and to check for any loss of efficiency (O6).
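The trigger condition, an error above a tolerable threshold that persists for a defined time interval, can be sketched as a run-length check. The function name, the threshold and the minimum duration below are hypothetical; in practice they would be set through control C2.

```python
def sustained_exceedance(errors, threshold, min_duration):
    """Index at which the error has stayed above `threshold` for
    `min_duration` consecutive samples, or None if that never happens."""
    run = 0
    for t, e in enumerate(errors):
        run = run + 1 if e > threshold else 0
        if run >= min_duration:
            return t
    return None

# Illustrative hourly prediction errors.
hourly_error = [0.01, 0.02, 0.08, 0.01, 0.07, 0.09, 0.11, 0.12]
print(sustained_exceedance(hourly_error, threshold=0.05, min_duration=3))  # 6
```

Requiring a minimum duration keeps isolated spikes, such as the one at index 2, from launching the data mining stage.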
The steps to carry out the analysis, reported in detail in Figure 8, are the following:

(1) Time interval selection (IDEFØ 3.1): In order to check for any possible performance loss, we may want to compare situations belonging to different time intervals (C3), such as the efficiency before and after an overhaul or a failure of the equipment. For instance, we could select all the available data to develop as complete an analysis as possible, or select a symmetric time window, e.g., 1000 working hours before and after an overhaul. In our application, in agreement with the O&M manager (R2), we decided to consider the whole dataset, in order not to lose any information on the asset performance, and to study only the filtered data relevant for the aim of the Data Mining step. Specifically, we split the dataset into 3 periods (Table 7). The first one starts at the beginning of the system monitoring and lasts until the preventive overhaul; the entire period comprises 13,772 working hours but, due to missing and low-quality data, only 3223 can be considered in this study. The second time window ranges from the overhaul to the catastrophic failure of the system (4856 of the 5173 working hours can be analyzed). After the failure of the asset, only 92 working hours are recorded, of which 49 can be studied.

(2) Variable selection and discretization (IDEFØ 3.2): Depending on the depth of the analysis, we could filter some of the valid data determined in the first step, or limit the study to a certain number of variables. In addition, if the selected variables are measured through continuous values, a discretization is necessary, since the AR mining algorithm applies to discrete elements. In our application, the variables selected are pressure, flow rate, impulsion temperature, aspiration temperature, tank level and density; all of them are related to the efficiency of the system. The discretization process is performed in agreement with the O&M manager (R2). A tradeoff is needed in this step: on the one hand, the discretization ranges have to be small enough to represent a specific operating condition; on the other hand, we want the intervals to be as large as possible, so that the system frequently operates inside them. In other words, if the intervals are too small, the system could work inside each of them very rarely and, consequently, we may not profit from the data. Table 8 reports the minimum and maximum values of the valid data and the size chosen for each interval.

(3) AR mining (IDEFØ 3.3): The discretized variable ranges are the input of the AR mining stage; the quality of the results depends on data quality and availability (C1) and on the temporal constraints considered (C3). The resources necessary to deploy the data mining procedure are the O&M manager (R2) and the AI&DM analyst (R6), together with the AR RapidMiner model (R4). According to the methodology explained in Section 3, we have to analyze the rules having values of the efficiency as head (∆) and values of all the other selected variables as body (Γ). Each AR is associated with a support and a confidence. The former represents the fraction of time units, over the total, in which the operating conditions and the efficiency have assumed the values expressed by the rule. The latter indicates the probability of having the efficiency expressed by the rule, given the values of the operating variables. For example, Table 9 reports some of the rules extracted (O5). Columns 1 to 6 (Flow rate, LNG Density, Tank level, Pressure, Intake Temperature, Impulsion Temperature) contain the body of the rule, while the seventh is the head (Efficiency); columns 8 and 9 report, respectively, the support and the confidence associated with each rule. The rules exemplified in Table 9 should be interpreted as follows (consider, for instance, the first row): when the operating conditions of the system are characterized by a flow rate in the range 478-508 m³/h, a density between 450 and 455 kg/m³, a tank level between 5900 and 6900 mm, a pressure inside the tank between 10.5 and 11.5 kgf/cm², an aspiration temperature (TAsp) between −159 °C and −158 °C and an impulsion temperature (TImp) between −159 °C and −158 °C, then in 75% of the cases (confidence = 0.75) the efficiency of the system ranges between 59 and 60%; in the remaining 25% of the cases (see the second row of the table), the efficiency range is 60-61%. The support associated with the former relation is 0.0006: this value indicates the probability of having the operating conditions listed above together with an efficiency level of 59-60%. Similarly, the rule associating the same operating conditions with an efficiency of 60-61% has a joint probability of occurrence, namely a support, of 0.0002.

(4) Rules comparison (IDEFØ 3.4): The relations among the operating conditions and the efficiency (O5) mined in the previous step should be compared in order to verify the actual existence of an efficiency loss (O6) and the corresponding probabilities. To this end, we need to select rules presenting analogous values of the operating conditions but belonging to different time intervals, i.e., those identified in the first step of this stage. This activity has to be carried out by both the O&M manager (R2) and the AI&DM analyst (R6). In our study, three different time intervals (TI) have been selected and compared, as shown in Table 3. The highest number of comparable conditions characterizes TI 1 and TI 2: specifically, 58 common operating conditions have been identified in which the system under investigation works both before and after the overhaul (i.e., the milestone dividing TI 1 from TI 2). As shown in Table 10, almost all of these rules show a loss of efficiency. Comparing the remaining time intervals, there are fewer coincident operating conditions. In general, the efficiency in TI 3 is higher than in TI 2 and greater than or equal to that in TI 1. A more detailed example is reported in Table 11. The first row contains the operating conditions and the corresponding efficiency in TI 1, while the following three rows report the analogous conditions and efficiency values for TI 2. In all cases, a decrease of the efficiency is noticed: in TI 1 its value is between 64% and 66%, while in TI 2 efficiency values range between 60% and 62%. In TI 3, the efficiency increases, returning to the same range as in TI 1. The interpretation of these results should be related to the preventive overhaul executed at the end of TI 1. Possibly, an error occurred during the maintenance intervention or the re-installation, compromising the performance of the entire asset and leading to a catastrophic failure at the end of TI 2. Indeed, in TI 3, the normal level of efficiency was re-established.

It is important to notice that modifying the head of the rule, i.e., replacing the efficiency with another attribute, allows moving the focus of the investigation. For example, it could be interesting to compare the operational variables with the failure modes, in order to associate different operating conditions with the most likely failure mode and prevent it through specific actions.
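The discretization, rule mining and rule comparison steps can be sketched in a few lines. This is not the RapidMiner model (R4): the function names, the interval widths and the readings are hypothetical, and a single body variable (flow) is used to keep the example short.

```python
from collections import Counter

def discretize(value, start, width):
    """Map a continuous value to its interval (lower bound, upper bound)."""
    k = int((value - start) // width)
    lower = start + k * width
    return (lower, lower + width)

def mine_rules(records, body_keys, head_key):
    """Return {(body, head): (support, confidence)} for all observed rules."""
    n = len(records)
    bodies = [tuple(r[k] for k in body_keys) for r in records]
    body_counts = Counter(bodies)
    rule_counts = Counter((b, r[head_key]) for b, r in zip(bodies, records))
    return {rule: (count / n, count / body_counts[rule[0]])
            for rule, count in rule_counts.items()}

def best_head(rules):
    """For each body, keep the head with the highest confidence."""
    best = {}
    for (body, head), (_, confidence) in rules.items():
        if body not in best or confidence > best[body][1]:
            best[body] = (head, confidence)
    return best

def compare_intervals(rules_before, rules_after):
    """Most likely head before and after, for bodies present in both."""
    before, after = best_head(rules_before), best_head(rules_after)
    return {body: (before[body][0], after[body][0])
            for body in before if body in after}

def to_records(raw):
    # Hypothetical interval sizes: 30 m3/h for flow, 2 points for efficiency.
    return [{"flow": discretize(f, 478, 30), "efficiency": discretize(e, 60, 2)}
            for f, e in raw]

# (flow in m3/h, efficiency in %) pairs; illustrative values only.
ti1 = to_records([(490.0, 64.5), (495.0, 65.2), (500.0, 64.1), (480.0, 66.3)])
ti2 = to_records([(485.0, 60.5), (492.0, 61.3), (505.0, 61.9), (515.0, 60.2)])

rules_ti1 = mine_rules(ti1, ["flow"], "efficiency")
rules_ti2 = mine_rules(ti2, ["flow"], "efficiency")
print(rules_ti1)
print(compare_intervals(rules_ti1, rules_ti2))
```

In this toy data, the rule "flow in 478-508 implies efficiency in 64-66" holds in 3 of the 4 records of the first interval (support 0.75, confidence 0.75), and the comparison reports the shift of the most likely efficiency range for that operating condition from 64-66 to 60-62.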

Decision Support Module
This last module of the IDEFØ 1 diagram is presented in Figure 9. At this point of the process, the O&M manager can make decisions concerning possible changes in the normal operation and maintenance of the asset, but can also select a better strategy for the overhaul of, reinvestment in, or future replacement of the asset. These possibilities are sometimes constrained by the process flexibility, and may generate changes in equipment behavior that must be tracked and analyzed for possible re-training of the tool.

Implementation Process Results for the Pump Example
In our example, and concerning the scope of the project (as detailed in the introduction of Section 4), after a discussion with the General O&M Manager, the following was concluded:

•
Concerning the prediction of anomalies in the operation of equipment with a complex operating regime and poor condition monitoring through ANN AI-AR DM models: It was agreed that it is possible to model the behaviour of the pumps and to detect anomalies in their operation through deviations from the prediction. For instance, it could be appreciated how a pump that suffered a catastrophic failure after its first overhaul worked with anomalous behaviour throughout its second period of operation before the failure. The accumulation of error was anticipating this abnormal behaviour. It was also clear that error accumulated at specific points, when the asset reached certain operating regimes.

•
Concerning the guidelines for the practical implementation of the models in business systems: It was agreed that the systems are currently registering enough data to proceed with the implementation of this type of tool, and that it would be much more convenient to have all variables in a digital format and integrated in the same information system where the ANN algorithm is finally implemented. The algorithm can be implemented per asset, for greater precision, without a significant increase in costs.

• Concerning the possibility of obtaining information to build a business case: It was agreed that the integrated ANN-DM tool could avoid premature deterioration of equipment, verify the quality of major maintenance interventions, ensure operation at maximum energy efficiency, improve the current alarms and interlocks for equipment operation, generate new technical services to offer to potential clients, provide a thorough understanding of the behaviour of pumps throughout their life cycle, adjust the timing of major maintenance, and estimate the health of the asset better than currently available techniques.

• Concerning the identification and measurement of the asset's loss of performance when anomalies are detected: It was agreed that performance in similar operating regimes could be analysed properly. For instance, for the pump with the catastrophic failure, the measurement showed that after its first overhaul the pump always operated with 4% to 6% lower performance. After the second overhaul, which followed the catastrophic failure, the pump fully recovered its initial performance.
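The two measurements discussed in the points above (accumulation of the prediction error, and performance loss compared within similar operating regimes) can be illustrated with a minimal sketch. It is not the implementation used in the case study: the alarm rule (a multiple of a baseline error rate), the regime binning, the function names and all numbers are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import accumulate
from statistics import mean


def cumulative_squared_error(actual, predicted):
    """Running sum of squared differences between measured and predicted values."""
    return list(accumulate((a - p) ** 2 for a, p in zip(actual, predicted)))


def first_alarm_index(cum_error, baseline_rate, factor=2.0):
    """First index where the cumulative error exceeds `factor` times the error
    expected from a healthy asset (`baseline_rate` per sample), or None.
    This alarm rule is an assumption, not taken from the paper."""
    for i, total in enumerate(cum_error):
        if total > factor * baseline_rate * (i + 1):
            return i
    return None


def loss_by_regime(reference, current, bin_width):
    """Percentage performance loss per operating-regime bin.

    `reference` and `current` are lists of (regime_value, performance) pairs;
    samples are compared only when they fall in the same regime bin."""
    def bin_means(samples):
        bins = defaultdict(list)
        for regime_value, performance in samples:
            bins[int(regime_value // bin_width)].append(performance)
        return {b: mean(values) for b, values in bins.items()}

    ref, cur = bin_means(reference), bin_means(current)
    return {b: 100.0 * (ref[b] - cur[b]) / ref[b] for b in ref if b in cur}


# Made-up numbers, for illustration only.
measured_power = [10.0, 10.0, 10.0, 10.0]
predicted_power = [10.0, 11.0, 12.0, 13.0]
cum = cumulative_squared_error(measured_power, predicted_power)
alarm = first_alarm_index(cum, baseline_rate=1.0)

before = [(100.0, 0.80), (105.0, 0.80), (200.0, 0.70)]  # (regime value, performance)
after = [(102.0, 0.76), (205.0, 0.665)]
losses = loss_by_regime(before, after, bin_width=50.0)
```

Comparing only samples that share a regime bin avoids attributing a performance difference to degradation when it is simply due to the pump running at a different operating point.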
In general terms, the implementation of the model has been considered a positive learning experience that can now be deployed across the entire fleet of similar assets, once the business case is approved. Another important aspect of this project implementation was to estimate the time schedule for this type of project. Figure 10 presents the percentage of time spent per process module when studying the first pump. It is important to note the following:

• Input data processing took 50% of the total project time in this initial implementation, as a result of the dispersion, different formats and varying quality of the data processed. When the study was replicated on a similar asset, the total time was reduced by 30%. Most of this reduction came from the input data processing module (20% time reduction), with the remainder split equally between the next two modules (5% time reduction each).

• Data processing took longer than the prediction and data analysis modules together. This surprised the project team, even though it is in line with the Cross Industry Standard Process for Data Mining (CRISP-DM), which considers data processing to take around 70% of the total time [81]. Given the complexity of the AI-DM tools to be used, the problem analysis was expected to be more time consuming than the data processing. However, the two intermediate processes proved efficient and less dependent on circumstances outside the team's control.
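The time figures quoted above can be checked with a short calculation. The sketch below assumes that every reduction is expressed in percentage points of the first project's total time (which is consistent with 20% + 5% + 5% = 30%), and it lumps all modules other than input data processing together, since their individual shares are only given in Figure 10. The function name is hypothetical.

```python
def replicated_shares(first_shares, reductions):
    """Return the replicated project's total time (in % of the first project's
    total) and each module's share of that new total.

    Both arguments are in % of the first project's total time (an assumption)."""
    remaining = {m: first_shares[m] - reductions.get(m, 0.0) for m in first_shares}
    total = sum(remaining.values())
    return total, {m: 100.0 * t / total for m, t in remaining.items()}


first_shares = {"input data processing": 50.0, "all other modules": 50.0}
reductions = {"input data processing": 20.0, "all other modules": 10.0}  # 5% + 5%

total, shares = replicated_shares(first_shares, reductions)
print(f"replicated project takes {total:.0f}% of the first project's time")
for module, share in shares.items():
    print(f"{module}: {share:.1f}% of the replicated project")
```

Under these assumptions, input data processing would still account for roughly 43% of the replicated project, so it remains the largest single module even after the learning effect.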

Since the current approach is data-driven, the main success factor of the methodology is data quality; however, this aspect can be a major limitation for its implementation, since a substantial amount of data may not be immediately available in every company. At first, as noted before, the procedure may appear time-consuming, but once it is included in the organizational routine, the obstacle posed by the time needed for data validation and processing can be overcome. In addition, the application of ANN and AR mining requires expertise in data understanding and modeling, and a positive mindset towards changing current decision-making habits in the O&M field.

Conclusions and New Research Opportunities
In this paper we deal with the problem of asset performance monitoring, to detect potential reliability problems and to predict loss of efficiency in energy consumption. We concentrated on assets with multiple operating conditions and limited possibilities for component health monitoring.
The main contribution of this paper is a suggested method to tackle this problem, together with a detailed explanation of the process followed to implement the method in a complete case study, documented using the IDEFØ Functional Modeling method with three hierarchical series of diagrams. The resources and time needed by the project team to complete the work were recorded.
The technical solution proposed combines the ANN (to detect when abnormalities in asset behavior appear) with DM techniques (to detail the operating conditions under which the abnormalities were observed, and the extent of their impact on energy efficiency). The use of the AR mining tool after the ANN was tested and compared with alternative advanced methods to ensure its suitability and accuracy.
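As an illustration of how AR mining can relate operating conditions to detected abnormalities, the following minimal sketch discretizes a variable into intervals and computes the standard support, confidence and lift of a single rule. It is not the tool used in the case study; the variable name `flow`, the interval size and the sample rows are hypothetical.

```python
def discretize(value, minimum, width):
    """Index of the interval containing `value`, for intervals of size `width`
    starting at `minimum`."""
    return int((value - minimum) // width)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    `transactions` is a list of frozensets of items; `antecedent` and
    `consequent` are sets of items."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical rows: (flow value, was the sample flagged as anomalous by the ANN step?)
rows = [(120.0, True), (130.0, True), (60.0, False), (110.0, False), (20.0, False)]
transactions = [
    frozenset({("flow", discretize(flow, 0.0, 50.0)), ("anomalous", flagged)})
    for flow, flagged in rows
]

support, confidence, lift = rule_metrics(
    transactions,
    antecedent={("flow", 2)},          # flow in the interval [100, 150)
    consequent={("anomalous", True)},
)
```

A lift above 1 for such a rule would suggest that abnormal behavior is over-represented in that operating interval, which is the kind of information the O&M manager needs in order to localize a loss of efficiency.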
The main limitation of the proposed approach was the time needed for data processing. Under existing business conditions this activity was very time consuming, due to the dispersion, different formats and varying quality of the data processed.
The overall results of the case study showed that decision-making for the O&M manager could be improved by detecting any abnormal power consumption of the pump (through the accumulation of the squared prediction error) with the desired precision, and by triggering the corresponding process to measure the loss of energy efficiency when that happens. Additionally, it was shown how this process could serve to test the quality of major overhauls, to improve current alarms and interlocks, and generally to provide a thorough understanding of the behavior of pumps throughout their entire life cycle.
Finally, it is important to mention that the data processing for ANN training, as well as the initial selection of variable intervals and rules for DM, is intended to be done off-line. The information provided for O&M decision support could then be generated on-line for management.

Figure 2. Level-1 representation of the asset performance monitoring and reliability assessment process.

Figure 3. Level-2 representation of the Data Processing process.

Figure 4. Level-2 representation of the Artificial Neural Network (ANN) model stage.

Figure 5. Stock and Flow Diagram (SFD) of the ANN dynamic simulation model with backpropagation for power consumption prediction.

Figure 6. ANN algorithm results for power consumption prediction error in the example presented.

Figure 7. ANN results precision (X axis) after training, for the period before the OH 1.

Figure 8. Level-2 representation of the Association Rule (AR) mining step.

Figure 9. Level-1 representation of the O&M Decision support process.

Figure 10. Project timeline for the process implementation in the first asset (as a % of the total project makespan).

Table 1. Asset case study definition, data source, for the process pump example.

Table 2. Sample covariance matrix and correlation coefficients, to identify the dependence between variables.

Table 4. Values for data normalization.

Table 5. Artificial Neural Network (ANN) algorithm results for the regression problem (with 20 N/HL).

Table 6. Benchmarking the result of the different predictive analytics techniques.

Table 7. Time intervals analyzed in our application.

Table 8. List of variables, minimum values, maximum values and interval sizes.

Table 9. Example of the ARs extracted.

Table 10. Efficiency comparison among the identified time intervals.

Table 11. Rules comparison among the identified time intervals.