Development of a Methodology for Condition-Based Maintenance in a Large-Scale Application Field

This paper describes a methodology, developed by the authors, for condition monitoring and diagnostics of several critical components in large-scale machine applications. For industry, the main target of condition monitoring is to prevent sudden machine stoppages and thus avoid the economic losses caused by lost production. Once the target is reached at a local level, usually through an R&D project, the extension to a large-scale market gives rise to new goals, such as low computational costs for analysis, results that are easily interpretable by local technicians, collection of data from worldwide machine installations, and the development of historical datasets to improve the methodology. This paper details an approach to condition monitoring, developed together with a multinational corporation, that covers all the critical points mentioned above.


Introduction
The increasing demand for more complex automation machines endowed with high efficiency, reliability, safety, and product quality requires automation companies to develop and improve cyber-physical systems (CPS) and Internet of Things (IoT) systems. These technologies manage interconnected physical systems such as actuators and sensors with cyber-computational capabilities, for example in the case of computer networks, intelligent data management for Big Data, and analytical proficiency [1]. The increasing importance of machine reliability calls for more efficient methods of equipment maintenance [2]. As a matter of fact, time-based maintenance (TBM) is now flanked by condition-based maintenance (CBM) [3]. The TBM method consists of scheduled preventive maintenance based on an estimation of the mean time between failures (MTBF). This method is very conservative because maintenance is periodically executed without the certainty of preventing incipient and random failures. The CBM method allows for real-time diagnosis of machine health. With the CBM approach, it is possible to predict critical failures of the machines several weeks in advance and achieve zero-downtime performance [4,5]. In this way the machine uptime increases, the waste cost caused by unplanned stoppages is minimized, and the warehouse for the spare parts is optimized. In the industrial field, this approach also transforms human service work by improving the collaborative human-machine skills for decision-making with respect to maintenance. The collaborative actions between condition-monitoring systems and human service operation involve a socio-cyber-physical system (SCPS) [6]. These systems are linked in a global production network where the interaction of global and individual decision-makers acts in a different way for each sub-system [7]. The new decision architecture needs a high training level from the base (i.e., service engineers, stakeholders) to the top (i.e., management) and a very efficient communication network. Without one of these two elements there is a high probability of creating idiosyncrasies in the SCPS that will decrease the sustainability and competitiveness of the production system [8].

In recent decades, the scientific community has developed new technologies and methodologies for condition monitoring, in accordance with the hardware available and adopted by industry. In addition, cloud computing has become the symbol of so-called 4.0 technology. The direct result is big-data analysis, or data-driven analysis, which refers to the capability to analyze large datasets collected on the cloud, often through the use of expert systems. Diez-Olivan et al. [9] study an anomaly detection system by characterizing and modeling operational behaviors. The learning framework is based on a machine learning approach that combines constrained k-means clustering for outlier detection with fuzzy modeling of distances to normality. The proposed solution is deployed in a CBM platform for the on-line monitoring of assets. Zhang et al. [10] propose an adaptive discrete-state model to estimate the remaining lifetime of the system based on Bayesian belief network (BBN) theory, to be used in data-driven diagnostics. Boškoski et al. [11] focus on modeling feature trends for the on-line remaining useful life (RUL) of bearings. They propose an approach for bearing fault prognostics that employs Rényi entropy-based features. This exploits the idea that the progressing fault results in an increasing dissimilarity in the distribution of energies across the vibrational spectral band which is sensitive to the bearing faults. Youree et al. [12] propose a data-driven generalized multivariate statistical analysis technique for prediction of impending failures in electronic and electromechanical equipment. Statistical analysis algorithms, integrated into a predictive fault detection statistical analysis engine, operate on heterogeneous streams of data from sensors that monitor selected equipment structural and functional parameters. The statistical analysis engine applies the trending results to determine the most probable trend, which is related to the requirements for scheduling of equipment maintenance actions. Kruger et al. [13] propose an effective and easily adaptable multivariate data-driven method for wind turbine monitoring and fault diagnosis, which consists of three parts: (1) an off-line training process; (2) an on-line monitoring phase; and (3) an on-line diagnosis phase. Langone et al. [14] effectively use least squares support vector machines (LS-SVMs) for early fault detection in an on-line fashion. In particular, they are able to distinguish between normal operating conditions and abnormal situations in a vertical form fill and seal (VFFS) machine, and accurately predict the evolution of dirt accumulation in the sealing jaws. Yan and Lee [15] present a hybrid method for on-line assessment and performance prediction of remaining tool life in drilling operations based on vibration signals. Logistic regression analysis combined with the maximum likelihood technique is employed to evaluate tool wear conditions based on features extracted from vibration signals using the wavelet packet decomposition technique. The auto-regressive moving average model is then applied to predict remaining useful life based on tool wear assessment results. Alpay et al. [16] propose an on-line anomaly detection technique using a hybrid method which combines first-principles (physical) models with data-driven (empirical) models. The model output variance estimation technique is used in a statistical test to determine whether observed output measurements are statistically too far from expected output values (for given inputs) to declare that an anomaly has occurred. Park et al. [17] propose a conceptual framework on the use of IoT for condition-based monitoring of rolling stock. It integrates reliability, availability, maintainability, and safety (RAMS)-based maintenance methods and IoT. RAMS-centered maintenance provides powerful rules for deciding a failure management policy, based on the estimation of the probability distribution function for the real-time condition monitoring of components.
This paper presents the methodology of a modular condition-monitoring system (CMS) developed and used in real smart factories. The architecture defines a guideline for future implementation and enhancement in a large-scale market, taking into account the challenges of big data volume, data analysis, and collaborative work in an SCPS. This paper is organized as follows. In Section 2 the architecture of a CMS for a real industrial plant is explained. Section 3 is dedicated to measuring and data acquisition, and Section 4 is focused on the data processing methods. Section 5 concludes the paper.

Condition-Monitoring Architecture
The purpose of the condition-monitoring application in the packaging machine is to reduce its unexpected breakdowns in order to, in turn, increase machine up-time (avoiding unplanned stoppages by predicting failures), reduce waste (for the same reason), and optimize operational costs (with optimal maintenance tactics based on prediction). This is achieved through constant monitoring of critical functions to predict failures, with the possibility of initiating maintenance before the failure occurs through regular alerts and insights. With predictive maintenance, the commitment to reliability is taken to the next level by predicting failures before they occur. Real-time monitoring of critical areas of the equipment is used to find deviations in machine functions that could lead to pauses in machine activity or breakdowns. In this process, knowledge of critical functions, expert analysis, alerts, and skilled staff to act on the event are fundamental. The design of a CMS directly depends on the plant and the functions of each system, because failure causes and effects are different. Nevertheless, the high-level architecture of the condition-monitoring system presented in this paper can be extended to any manufacturing company.
At the basis of an effective condition-based monitoring system there is an initial failure mode, effects, and criticality analysis (FMECA). FMECA is a critical and powerful tool, developed by reliability engineers in the late 1950s, to highlight failure modes with relatively high probability and severity of consequences, allowing remedial efforts to be directed where they will produce the greatest benefits. These benefits are precisely quantified in terms of saved costs, and they are the best incentive to introduce condition-monitoring systems in industry. This paper does not cover FMECA, since it should already have been performed in order to identify the most critical components. The aim of this paper is to give the reader hints on how to build the condition-monitoring system, a process that ideally starts after the FMECA. All the mechanical components shown in the rest of the paper are the result of an in-depth FMECA that took more than one year to complete. Several books on reliability cover the foundations of FMECA; the interested reader could start, for example, with the work of Birolini [18].
At the end of the condition-monitoring process, every industrial plant must have a performance management center, i.e., a team of data scientists, managers, reliability engineers, and skilled service technicians, constantly updated on the status of the fleet of monitored components. The team schedules interventions on the basis of a preventive maintenance policy, builds a database of all the service actions, and computes statistics on the reliability of the components. The team is ready to intervene in the event of an alarm by the monitoring system, and the condition-monitoring algorithms are constantly updated based on new data from the field. The performance management center implements all that is necessary for the correct management of the reliability of the system. As an example, uncertainty quantification is fundamental for properly planning (preventive) maintenance policies. A list of these actions is out of the scope of the present paper; the interested reader can find details, for example, in the work of O'Connor and Kleyner [19].
This paper takes a packaging filling machine into account in order to prove the feasibility of the methodology. In particular, the industrial plant consists of one or more core functions (e.g., filler machines), infrastructure functions (e.g., packaging buffers and conveyors), accessory functions (e.g., cap and straw applicators), output functions (e.g., cardboard packers), and supervision functions (e.g., the packaging line monitoring system).
The processing of the collected data is divided into three parts: pre-processing that is carried out in the customer's factory, cloud-processing that is performed in the cloud, and post-processing for the management of the critical states.
The flowchart of data and information throughout the process is outlined in Figure 1.

Data Acquisition
The data acquisition (DAQ) step involves the setup of the sensors on the machine, the acquisition device, and a central unit that manages the data logging.The sampling of the data can be implemented in two ways:

•
Continuous Condition Monitoring; Sensors are recorded continuously. This sampling policy is recommended for those critical components with a high impact on costs and a short time-to-failure.

•
Periodic Condition Monitoring; Sensors are recorded at scheduled time intervals. This policy is particularly suitable for components with a medium-to-high time-to-failure.
In a condition-monitoring system with large-scale applications there are inevitable delays related to information management: the acquisition of the data, the local pre-processing, the transfer to the cloud, the subsequent post-processing, the feedback from the data analysts, and the service support in fixing the problem. As a consequence, a robust condition-monitoring project should work mainly on periodically sampled data, keeping a time margin for task processing. Nevertheless, critical components can be accommodated by reserving computational slots and priorities, or by managing the data collection on the spot for the prompt feedback of the service engineer. Periodic condition monitoring gives a margin of time to collect data from a fleet of sensors, one by one, limiting the stream of data and the computational resources. This sampling-policy architecture also makes it easy to extend the number of sensors. The acquisition setup, i.e., the sampling frequency and the acquisition time, depends on the specific sensor and must be determined on the basis of the specific processing defined in the development stage of the condition-monitoring system. For example, temperature changes within intervals of minutes, while vibrations need to be acquired thousands of times per second.
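The periodic, one-by-one sampling policy described above can be sketched as a simple round-robin scheduler. This is only an illustrative sketch: the sensor names, sampling rates, acquisition times, and priority weighting are hypothetical, not taken from the authors' system.

```python
from dataclasses import dataclass

@dataclass
class SensorTask:
    name: str          # sensor identifier (hypothetical)
    fs_hz: float       # sampling frequency for this sensor
    duration_s: float  # acquisition time per slot
    priority: int = 0  # higher value = acquired more often (critical component)

def build_schedule(tasks, n_slots):
    """Round-robin schedule: one sensor per time slot; critical sensors
    (higher priority) receive proportionally more slots."""
    pool = []
    for t in tasks:
        pool.extend([t] * (t.priority + 1))  # replicate by priority weight
    return [pool[i % len(pool)].name for i in range(n_slots)]

tasks = [
    SensorTask("accelerometer_shaft", fs_hz=25600, duration_s=10, priority=2),
    SensorTask("motor_current", fs_hz=1000, duration_s=10),
    SensorTask("bearing_temperature", fs_hz=1, duration_s=60),
]
schedule = build_schedule(tasks, n_slots=10)
```

With this scheme, only one sensor streams at a time, which limits both the data rate and the local computational load, and adding a sensor only requires appending a task to the list.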
The data can be divided into two main classes: on-line data and off-line data as detailed below.

On-Line Data
In this paper, the term "on-line data" refers to informative data acquired in the working conditions of the machine. This is in contrast to "off-line data", which is collected during events independent of the working conditions. For our purposes, the "on-line data" is data collected by specific sensors in order to measure the state variables of the system. The choice of the type of sensors, their placement, and the schedule of the data collection requires knowledge of the process, a bibliographic survey, and an analysis of the impact of sensor costs on the process. Generally, the sensors can be divided into three main classes:

•
Multi-purpose external sensors; They are the most used sensors for condition monitoring. They can be applied to different components (multi-purpose), measuring the effects of impacts or events in the time domain; examples include accelerometers and external temperature sensors. These sensors are not usually present in the machine and represent an extra cost for maintenance.

•
Specific external sensors; They are used for specific measurements in specific parts of the machine. Sometimes multi-purpose sensors cannot be used because installation is impossible, owing to environmental conditions or possible mechanical interference with moving parts during the process. Sometimes a specific measurement is needed in a very limited but critical part of the plant, for example chemical analysis. These sensors are not usually present in the machine and represent an extra cost for maintenance. Moreover, the specificity of the measurement implies a higher cost of the sensor with respect to a multi-purpose sensor.

•
Embedded sensors; They are already present in specific components of the machine, since they are used by the control logic for the correct operation of the machinery. They do not represent an extra cost for maintenance. For example, in modern servomotors there is always an encoder for position measurement, an embedded amperometer (often by means of two simple Hall sensors) for the measurement of the current absorbed from the mains, and a temperature sensor (often embedded in the encoder) for the measurement of the heat inside the motor (or at least a positive temperature coefficient (PTC) thermistor in the coils for over-temperature detection).
The main sensors used in the real case example are listed below:

•
Accelerometers; These measure the vibrations of the mechanical components (e.g., rotating shafts), giving a picture of the inner health of the machine. Every month hundreds of scientific papers on the use of accelerometers for diagnostics purposes are published (multi-purpose external sensors) [20][21][22][23].

•
Encoders; These measure the position of rotating parts (e.g., shafts), providing a flag at each complete rotation. In particular, encoders are increasingly present in electric motors, embedded in any servomotor with a high angular resolution (e.g., 4096 ticks per revolution).
Together with accelerometers, they allow the diagnostics of the components in the angle domain, that is, a reconstruction of the vibration signal based on the actual rotation of the component, providing immunity to the speed fluctuations which can degrade the signal-to-noise ratio [24][25][26] (embedded sensors).
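The angle-domain reconstruction mentioned above can be sketched with simple linear interpolation: the encoder's once-per-revolution flags give the shaft angle as a function of time, and the vibration signal is then resampled onto a uniform angle grid. The signal parameters in the example are hypothetical; this is a sketch of the technique, not the authors' implementation.

```python
import numpy as np

def angular_resample(t, vib, rev_times, samples_per_rev=256):
    """Resample a time-domain vibration signal onto a uniform angle grid.

    t         : sample timestamps [s]
    vib       : vibration samples
    rev_times : timestamps of the encoder's once-per-revolution flag [s]

    On the angle grid, fault-related components stay at fixed shaft
    orders even when the rotational speed fluctuates.
    """
    revs = np.arange(len(rev_times), dtype=float)
    angle_at_t = np.interp(t, rev_times, revs)          # shaft angle [rev] per sample
    angle_grid = np.arange(0.0, revs[-1], 1.0 / samples_per_rev)
    return angle_grid, np.interp(angle_grid, angle_at_t, vib)

# Example: a once-per-revolution component on an accelerating shaft
fs = 5000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
revolutions = 20.0 * t + 2.5 * t**2       # shaft position [rev], speed 20 -> 30 Hz
vib = np.sin(2.0 * np.pi * revolutions)   # 1x-per-revolution vibration
rev_times = np.interp(np.arange(0.0, revolutions[-1]), revolutions, t)
angle, resampled = angular_resample(t, vib, rev_times)
# In the angle domain the component becomes a clean sinusoid, sin(2*pi*angle),
# despite the speed fluctuation in the time domain.
```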

•
Current/torque sensors; These are embedded sensors necessary for the correct operation of an electric motor. The current absorbed by the motor is proportional to the torque load applied to the motor shaft. It is straightforward that any change in the working conditions of the motor (e.g., an increase of wear) increases the torque load and consequently the current requested (embedded sensors).

•
Pressure sensors; In order to avoid any possible interference between the moving parts of the package forming line and the cables of the sensors, it is necessary to introduce pressure sensors for the indirect measurement of the wear on cutting knives (specific external sensors).

•
Temperature sensors; These measure the temperature of specific components. In particular, servomotors can have an embedded temperature sensor to measure the heat inside the motor (embedded sensors).
The total number of the sensors depends on the size of the machine under control, the critical key points, and the budget available for the condition-monitoring area.

Off-Line Data
In this paper, the term "off-line data" refers to informative data asynchronous to the working conditions of the machine. This is the opposite of "on-line data", which is collected during events synchronous to the working conditions. For our purposes the "off-line data" is event data, i.e., the list of all technical interventions performed by service engineers. These events cover scheduled service interventions, unexpected service interventions, and production conditions of the machine. Examples of scheduled service interventions are firmware and software updates, and preventive maintenance of specific components. Examples of unexpected service interventions are breakdowns of mechanical or electronic components. De facto, the minimization of this type of intervention is the target of every condition-monitoring system. Examples of production condition events are the starting and stopping of production and the substitution of consumables. Some off-line data can be acquired and stored automatically, for example the stopping or starting up of the machine, but most of the off-line data is manually entered by the service engineers who perform the technical interventions or by the after-sales department which defines the scheduled operations. Off-line data is essential for condition-based maintenance, and even more so in the development step of the data-driven processing, where it marks the difference between supervised and unsupervised methods (for more details see [27,28]). The collection of off-line data can hardly be delegated to an automated system. Consequently, proper training of the service engineers is fundamental for all companies that want to implement condition-based maintenance. In our experience, not all events are reported, for various reasons (the periodicity of an event may not be of concern, or it is simply not recorded), although greater awareness of the consequences of one's work can minimize the number of missing events.

Data Pre-Processing
The data collected from a single machine must be pre-processed locally, before it is sent to the high-level storage structure described in Section 2.4. The main reasons are to reduce the amount of data to be sent to the cloud platform and to decrease the latency in the decision-making process. The cloud platform costs depend on the amount of data processed; therefore, aggregated data is preferable for cost reduction. Moreover, quick pre-processing can give fast alarms, since it is possible to detect a problem before the entire raw data record is logged in the cloud, and send the alert to the machine promptly.
The local architecture of the condition-monitoring system is made up of:

•
An industrial PC (iPC) for data manipulation;

•
Data logger hardware for the acquisition of external sensors;

•
Fieldbus (IEC 61158) network for data communication between the iPC and motor drives (or other embedded sensors).
The main functions of the pre-processing step are the following:

•
Removal of empty or incomplete files; The condition-monitoring system records data regularly. Only a few sensors at a time collect data, so as to reduce computational effort. It may happen that specific parts of the system are not working during the time frame in which the corresponding sensor is acquired, generating empty or incomplete files. These files must be removed to free memory space on the storage device.
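A minimal sketch of this cleanup step is shown below. The size threshold `min_bytes` is hypothetical: any file smaller than it is treated as an incomplete acquisition.

```python
import os
import tempfile

def clean_measurement_files(folder, min_bytes=1024):
    """Delete empty or too-small (incomplete) measurement files to free
    storage space; `min_bytes` is a hypothetical completeness threshold."""
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and os.path.getsize(path) < min_bytes:
            os.remove(path)
            removed.append(name)
    return removed

# Demonstration on a throw-away folder
folder = tempfile.mkdtemp()
open(os.path.join(folder, "empty.dat"), "wb").close()      # failed acquisition
with open(os.path.join(folder, "full.dat"), "wb") as f:
    f.write(b"\x00" * 4096)                                 # complete 4 kB record
removed = clean_measurement_files(folder)
```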

•
Checking of the sensors; The measurement files are checked for data inconsistencies. Especially in manufacturing machines, processes are repeated cyclically and the expected data from sensors must contain cyclic components too (e.g., at the productivity frequency of the machine). If the data recorded by a given sensor does not show cyclic components in the spectrum, this is due to a problem in the measurement chain: the sensor, the cable, or the acquisition system. The inconsistency of the data must generate an alarm to the service engineer, who will schedule a check of the sensor.
•
Calculation of statistics; The computational capacity of modern industrial personal computers allows statistical analysis of the acquired data, such as the root mean square (RMS) value, variance, kurtosis, quartiles, etc. The main advantage is data reduction; each statistic is a single scalar value compared to the thousands of points acquired by each sensor. Statistics are the features that the data-driven diagnostic method uses for the post-processing analysis.
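A minimal sketch of such a feature-extraction routine is shown below; the exact feature set is machine-specific and the one here is only illustrative.

```python
import numpy as np

def extract_features(x):
    """Condense one sensor acquisition into a handful of scalar features:
    each scalar replaces the thousands of raw samples sent to the cloud."""
    x = np.asarray(x, dtype=float)
    mean = np.mean(x)
    var = np.var(x)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "rms": float(np.sqrt(np.mean(x**2))),
        "variance": float(var),
        "kurtosis": float(np.mean((x - mean)**4) / var**2),  # normalized 4th moment
        "peak": float(np.max(np.abs(x))),
        "q1": float(q1), "median": float(median), "q3": float(q3),
    }

# Example: a sinusoidal vibration record of amplitude 2
x = 2.0 * np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)
features = extract_features(x)
```

For reference, a pure sinusoid of amplitude A has RMS A/√2 and kurtosis 1.5, while a developing bearing fault typically raises the kurtosis through its impulsive impacts.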

•
Selection of specific data; If the post-processing based on data-driven analysis reports an incipient fault, a more detailed model-based analysis is performed. The performance management center can ask the local unit for specific data useful for a targeted analysis. The local unit then sends this specific raw data to the cloud.

•
Storage of data; The data is locally stored for a limited period of time with a backup policy (when the storage space runs out, the newest file overwrites the oldest one). This storage is needed to provide selected raw data on request.

•
Sending of the data to the cloud; All relevant data, i.e., the statistics and the off-line data, is sent to the cloud for the post-processing step.

Data Cloud Processing
The data cloud processing mainly consists of cloud-computing data management. The statistics and off-line data for different machines are collected on the cloud and plotted with respect to time to constantly monitor the evolution of the data. Today, cloud-computation providers assure sufficient computational power to run complex algorithms, and most of them already implement a Python or R-language console. De facto, these free programming languages are the common languages of data scientists for statistical computing. More recently, some cloud providers have offered integration with well-known commercial software for mathematical computing. The main functions of the cloud-processing step are as follows:

•
Data-driven analysis; Statistics data from every monitored subsystem of the machine are analyzed by means of data-driven machine learning techniques, such as neural networks, support vector machines, and clustering. The machine learning system generates alarms to the performance management center, i.e., the data scientists, who can query the local system for a more detailed analysis of specific data.

•
Data transfer; The off-line data does not need further processing. In this case, the cloud acts as a simple storage device; the analysts pick up the off-line data collected from different machines for the off-line development of condition-monitoring techniques.
It must be noted that the development and the training of the machine learning techniques are not performed on the cloud, but at the performance management center.The software implemented in the data cloud-processing must be ready-to-run in order to avoid interruptions of the servers.

Data Post-Processing
The data post-processing mainly consists of reporting, decision support, and detailed analysis of the data. In particular, the main functions of the post-processing step are as follows:

•
Reporting; The condition-monitoring outputs are divided into several reports on the state of the sub-system components. The stakeholders of condition-monitoring reports are varied: service engineers, managers, consultants, external service providers, etc., and each of them needs different pieces of information.

•
Decision support; The reports are used by the performance management center, i.e., a structured support service, in order to update historical data for model upgrade development, analyze criticality, query advanced failure analyses of specific components, and manage the technical service.
•
Model-based analysis; Once an alarm is received from the cloud processing, more advanced signal processing tools can be used to assess details of the fault, for example, whether there is a fault in the inner or outer ring of a bearing.

•
Service; If problems are identified, a report of the situation is sent to the service engineers through an IoT device. In this way the service engineers can monitor the state of the plant at any time and, in case of alarm, they are warned promptly. Thanks to the analysis service, the service engineers are not only warned about an incipient failure but are also informed about the procedure necessary for the maintenance, whether it is necessary to order the broken part, and whether it is available in the warehouse.
A web platform with a custom-driven application programming interface (API) must be developed as an infrastructure in order to satisfy different requirements of both data scientists and service engineers.They can retrieve data by using query methods or read reports through PCs or smart-phones.

Condition-Monitoring Algorithms
Condition-monitoring algorithms are the foundation of the maintenance policy, since they allow a reliable and fast response to incipient faults. They can build the customer's confidence in condition monitoring or destroy it completely in the case of missed or false alarms. The definition of a proper algorithm requires a lot of time, and its importance cannot be overstated.
New algorithms are suggested in the scientific literature every day. Each component under test has its own fault modes, i.e., characteristic types of fault due to wear, determined by its geometry and dynamic conditions. For example, ball bearings are among the most common components in mechanical design and their fault modes are related to the working conditions. In particular, a bearing is made up of an outer ring, an inner ring, rolling elements, and a cage. Each part of the bearing can be subject to damage, and the damages differ in the periodicities of the impacts they produce. These differences allow the recognition of the damaged component. Despite the number of possible customized components, the most common components in mechanical design are standard ones, such as bearings, gears, shafts, and electric motors, regardless of the specific industrial field. As a consequence, an initial bibliographic survey of scientific journals is the starting point for the development of a proper condition-monitoring algorithm for the data processing.
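For bearings, the impact periodicities mentioned above are the classical kinematic fault frequencies, computable directly from the bearing geometry. The sketch below uses the standard formulas for a stationary outer ring; the geometry in the usage example is hypothetical, not a specific catalog bearing.

```python
from math import cos, pi

def bearing_fault_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Characteristic fault frequencies of a rolling-element bearing
    (stationary outer ring). Each damaged part produces impacts at its
    own periodicity, which is what allows the fault to be localized."""
    r = ball_d / pitch_d * cos(contact_deg * pi / 180.0)
    ftf = shaft_hz / 2.0 * (1.0 - r)                        # cage (train) frequency
    bpfo = n_balls * ftf                                    # outer-ring defect
    bpfi = n_balls * shaft_hz / 2.0 * (1.0 + r)             # inner-ring defect
    bsf = shaft_hz * pitch_d / (2.0 * ball_d) * (1.0 - r**2)  # rolling element
    return {"FTF": ftf, "BPFO": bpfo, "BPFI": bpfi, "BSF": bsf}

# Hypothetical geometry: 8 balls, ball diameter = 0.2 x pitch diameter,
# shaft rotating at 10 Hz, zero contact angle
freqs = bearing_fault_frequencies(10.0, 8, ball_d=0.2, pitch_d=1.0)
```

A useful sanity check on these formulas is that BPFO + BPFI equals the number of rolling elements times the shaft frequency.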
The data flow starts from the raw data acquired by the sensors and ends with the final output, usually of limited dimensionality, such as a binary or low-dimensional output. The data flow can be divided into three main classes:

•
Data cleaning; This includes all the procedures activated to remove inconsistent data, for example, empty measurement files, corrupted files, disconnected sensors, broken cables, etc. This is not a proper condition-monitoring technique but a preparatory process.

•
Fault detection; This includes all the procedures suitable for recognizing a fault in the system. It does not usually return the specific causes of the fault, only its presence. In most cases, anomaly detection techniques are sufficient for industrial purposes. If there is a faulty bearing in an electric motor, the motor must be completely replaced regardless of whether the fault is in the outer or the inner ring.

•
Fault diagnostics; This includes all the procedures suitable for characterizing the fault of a specific component and the level of damage of the component. It is also the starting point for the estimation of the residual life of the component (prognostics) [29]. Fault diagnostics techniques are useful for redesigning a component: the detailed knowledge of the fault can suggest a better design to reduce the loads in working conditions, extending the expected life of the component.
Focusing on fault detection and fault diagnostics techniques, the scientific literature can be divided into two main classes as well:

•
Data-driven techniques; For the purposes of this paper, data-driven techniques are only used for fault detection.

•
Model-based techniques; For the purposes of this paper, model-based techniques are used only for fault diagnostics.

Data-Driven Techniques
Data-driven techniques are not related to the physical system they model, but only to the input data, independent of the type of sensor. These techniques basically provide a metric of similarity among data. Common metrics are the Euclidean and the Mahalanobis distances. Machine learning techniques are an example of data-driven techniques. They require a training step and a testing step. The training step defines the expected dataset for faulty and healthy components. In this step, the off-line data defined in Section 2.1.2 has great importance, since it locates the time instants corresponding to the breakage of a component. Data before and after that time instant can provide a good example of faulty and healthy conditions to be used in training. The testing step is the application of the machine learning techniques to the new input data. A greater similarity between the recorded data and the faulty or healthy datasets determines the actual health status of the component. The machine learning techniques that need a training step are also known as "supervised" learning techniques. Conversely, "unsupervised" learning techniques may not need a training step, depending on the method that is used: for instance, one-class SVM needs a training phase [30], while artificial immune systems do not [31]. These techniques try to describe the data distribution of a healthy state (or a faulty one) in a complete way, so that any metric variation is an indicator of a faulty state (or a healthy one). It must be noted that machine learning techniques not only need a sufficient amount of historical data for training, but also training datasets that cover all possible fault events.
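The training/testing scheme just described can be sketched with the Mahalanobis distance as the similarity metric: the training step fits the healthy-state distribution, and the testing step measures how far each new observation lies from it. The feature names, synthetic dataset, and alarm threshold below are hypothetical.

```python
import numpy as np

def train_healthy_model(features):
    """Training step: fit the healthy-state distribution (mean vector and
    inverse covariance of the feature vectors)."""
    mu = features.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(features, rowvar=False))
    return mu, cov_inv

def mahalanobis(x, mu, cov_inv):
    """Testing step: similarity metric between a new observation and the
    healthy dataset (larger distance = less healthy)."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Hypothetical features per acquisition: (RMS, kurtosis deviation)
rng = np.random.default_rng(0)
healthy = rng.normal([1.0, 0.1], [0.05, 0.01], size=(200, 2))
mu, cov_inv = train_healthy_model(healthy)

threshold = 3.5  # hypothetical alarm threshold (in standard deviations)
d_ok = mahalanobis(np.array([1.02, 0.105]), mu, cov_inv)    # healthy-looking
d_fault = mahalanobis(np.array([1.50, 0.300]), mu, cov_inv)  # anomalous
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the scale and correlation of the features, so a fixed threshold has a consistent statistical meaning across features.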
Hundreds of data-driven techniques have been developed in the literature, and an exhaustive list is beyond the scope of this paper. Relevant review papers are already available [32][33][34][35], demonstrating the capability of these techniques in different fields of application. Based on the direct experience of the authors, three machine learning techniques are presented below:

• Artificial neural networks (ANNs); This technique tries to mimic biological neural networks and the way in which information is managed by the human brain. It builds a weight matrix, rewarding or penalizing input features based on the output error in the training step. One or more layers, i.e., weighting matrices, can be chosen. The key component of the ANN is the backpropagation algorithm, which distributes the error term back through the layers by modifying the weights at each node. The ANN technique has been used in several research fields [36][37][38][39][40][41].

• Support vector machines (SVMs); The SVM technique [42] computes a hyperplane that divides faulty and healthy data by maximizing the distance of the hyperplane from the datasets. The dimension of the hyperplane depends on the dimension of the input data features. The key component of the SVM is the choice of the kernel function, whose purpose is to project the data into a high-dimensional space where they can be separated by the hyperplane. Once defined, the hyperplane acts as a threshold, classifying new input data into the two classes. Examples of the application of SVMs to condition monitoring can be found in [23,[43][44][45][46][47].

• Autoassociative kernel regression (AAKR); This technique predicts the health status of a component thanks to historical data deriving from a healthy dataset. New inputs are compared to the prediction of the healthy state. The difference between the two signals, i.e., the residual, is used as a metric to assess the health status of the component. Examples of AAKR applications to condition monitoring can be found in [48][49][50][51][52].
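The residual idea behind AAKR can be made concrete with a minimal sketch (illustrative only; the Gaussian kernel, the bandwidth, and the synthetic data are assumptions, not the authors' implementation): each query vector is reconstructed as a kernel-weighted average of healthy "memory" vectors, and the residual grows when the component drifts away from healthy behavior.

```python
import numpy as np

def aakr_residual(memory, query, bandwidth=1.0):
    """Reconstruct each query vector as a kernel-weighted average of
    healthy memory vectors; the residual (query minus reconstruction)
    grows as the component degrades."""
    d2 = ((query[:, None, :] - memory[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))       # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)              # normalize per query
    reconstruction = w @ memory
    return query - reconstruction

rng = np.random.default_rng(1)
memory = rng.normal(0.0, 0.5, size=(100, 3))       # healthy historical data
healthy_q = rng.normal(0.0, 0.5, size=(10, 3))
faulty_q = healthy_q + np.array([3.0, 0.0, 0.0])   # drift on one channel
r_h = np.abs(aakr_residual(memory, healthy_q)).mean()
r_f = np.abs(aakr_residual(memory, faulty_q)).mean()
```

A healthy query lies inside the memory cloud and is reconstructed accurately (small `r_h`), while the drifted query cannot be reconstructed from healthy data, so its residual `r_f` is large.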
All machine learning techniques need, as input, a subset of the acquired data. Since the sampling frequencies of some sensors can exceed 10 kHz for more than 10 s, it is unthinkable to work with weighting matrices of 100,000 × 100,000 in size. Statistics are usually computed on the input data, reducing the weighting matrices to a 10 × 10 size (as an order of magnitude). The type of statistics and their number are the result of a trial-and-error process, depending also on the specific system under testing. Nevertheless, basic statistics, which describe the probability density function of a variable, are good starting values, and include:

• RMS; This is defined as the square root of the mean square;
• Variance; This is the second central moment of a real-valued random variable;
• Skewness; This is the third standardized moment of a real-valued random variable;
• Kurtosis; This is the fourth standardized moment of a real-valued random variable;
• Quartiles; These are the 25th, 50th, and 75th percentiles of the input variable.
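This reduction step can be sketched as follows (the function name and the synthetic signal are illustrative): a long raw acquisition is condensed into the handful of statistics listed above.

```python
import numpy as np

def feature_vector(x):
    """Condense a raw acquisition into a few descriptive statistics."""
    mu, sigma = x.mean(), x.std()
    return np.array([
        np.sqrt(np.mean(x ** 2)),          # RMS
        sigma ** 2,                         # variance (2nd central moment)
        np.mean(((x - mu) / sigma) ** 3),   # skewness (3rd standardized moment)
        np.mean(((x - mu) / sigma) ** 4),   # kurtosis (4th standardized moment)
        *np.percentile(x, [25, 50, 75]),    # quartiles
    ])

rng = np.random.default_rng(0)
raw = rng.normal(0.0, 1.0, size=100_000)   # e.g., ~10 s sampled at 10 kHz
features = feature_vector(raw)              # 7 numbers instead of 100,000 samples
```

The feature vector, rather than the raw signal, is what is sent to the cloud and fed to the machine learning techniques, which keeps both bandwidth and training matrices small.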
In some cases, even parameters linked to the dynamics of the machine are relevant, for example the hourly capacity of the machine during the acquisition of the sensors.
Once trained, machine learning techniques do not require high computational effort and return a fast classification of new input data. For these reasons, they are particularly suitable for cloud computing and can be used for the cloud processing described in Section 2.3.

Model-Based Techniques
In the introduction to their three-part paper on process fault detection and diagnosis, Venkatasubramanian et al. [53] give a clear and exhaustive description of model-based approaches. Model-based techniques require a priori knowledge of the set of failures and of the relationship between experimental data (observations) and failures (causes). This relationship is developed by using frequency-response models or dynamic models. Venkatasubramanian et al. divide model-based methods into two classes, qualitative and quantitative: "The model is usually developed based on some fundamental understanding of the physics of the process. In quantitative models this understanding is expressed in terms of mathematical functional relationships between the inputs and outputs of the system. In contrast, in qualitative model equations these relationships are expressed in terms of qualitative functions centered around different units in a process" [53]. In automatic control, the quantitative modeling of a physical system is the core of so-called system identification. This research field uses statistical methods to build mathematical models of dynamical systems from measured data; de facto, system identification determines the transfer function between input and output. By abstraction, the model of the system can be represented as a box connecting inputs (working conditions) and outputs (measured data). This box can be classified into three main classes:

• White-box model; This is a model based on first principles, e.g., the Newton-Lagrange equations. It requires a deep knowledge of the system: the geometry, external loads and torques, characteristics of the materials, the type of interactions among components (e.g., friction or impacts), masses, etc. In many cases such models are overly complex due to the complex nature of many systems and processes. It must be noted that the development of a white-box model is not a one-shot activity: the model must be continuously refined, adding more details if necessary. Examples of white-box modeling can be found in [54][55][56][57][58].

• Black-box model; No a priori model is available. The input/output relation of the system is statistically computed without considering the physics of the process at all. Most system identification algorithms focus on this type. The black-box model is similar to the data-driven approaches, which are not further considered in this paper.

• Gray-box model; This model is in between the white-box and the black-box models. Although the peculiarities of what goes on inside the system are not entirely known, a model based on both insight into the system and experimental data is constructed [59]. The resulting model still has a number of unknown free parameters, which can be estimated using system identification. An example of a gray-box is the modeling of the expected signal produced by a faulty system (i.e., of the output signal of the system). This particular case has been studied in depth in the literature (e.g., for ball bearings), and the model is used to simulate the expected output signal under different working conditions. The condition-monitoring analyst can use the simulated signal to develop and validate signal processing techniques. Examples of fault modeling can be found in [60][61][62][63][64][65][66][67].
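A classic gray-box model of the expected vibration from a bearing with a localized outer-race defect is a train of exponentially decaying resonance responses, one per ball passage over the defect, superimposed on background noise. The sketch below is illustrative only; all numerical values (fault frequency, resonance, decay, noise level) are assumptions, not taken from the paper:

```python
import numpy as np

def simulated_outer_race_fault(fs=20_000, T=1.0, fault_freq=87.0,
                               resonance=3_000.0, decay=800.0, noise=0.05):
    """Expected signal of an outer-race bearing fault: each passage of a
    ball over the defect excites a structural resonance, producing a train
    of exponentially decaying impulses (illustrative parameter values)."""
    t = np.arange(0, T, 1 / fs)
    signal = np.random.default_rng(0).normal(0.0, noise, t.size)
    for t0 in np.arange(0.0, T, 1.0 / fault_freq):  # one impulse per fault period
        mask = t >= t0
        tau = t[mask] - t0
        signal[mask] += np.exp(-decay * tau) * np.sin(2 * np.pi * resonance * tau)
    return t, signal

t, x = simulated_outer_race_fault()
```

Such a simulated signal lets the analyst verify, before touching field data, that a candidate signal processing technique actually highlights the impulsive fault signature (e.g., through its high kurtosis or its envelope spectrum).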
Based on the level of detail required, the development of a model-based technique requires more time than that of a data-driven one. The model of a physical system depends on the characteristics of the system itself; consequently, it is not possible to indicate a common development methodology that could be extended to a generic physical system. Analysis of the scientific literature is the first step in modeling. Further assistance may come from commercial software for the modeling of physical systems, but analysis of the physical process that takes place is unavoidable. Due to their complexity and demanding computational time, model-based techniques are particularly suitable for the off-line computing of specific subsets of data. Results are generally better than those obtained by means of data-driven techniques, since the cause of the fault is identified more precisely. As mentioned in Section 2.4, such in-depth analysis of the data is also useful for the redesign of components, in order to optimize their geometry and maximize their expected life.

Results
The condition-monitoring system described in Section 2 is the result of the experience of the authors, who have applied it to a fleet of industrial food packaging machines over the last decade.
Aseptic packaging machines are complex systems, with several electric motors, complex dynamics, and specific processes that guarantee the packaging of solid or liquid food in a sterilized chamber with high standards of safety for the customer. The condition-monitoring system is a further service offered to the customer in order to increase productivity and reliability. When applied to the packaging machines, the advantage of the proposed condition-monitoring framework lies in a strengthened relationship with the customer through mutual trust, thanks to a win-win situation where the client saves days of unplanned stoppages and the supplier delivers a digital service. The customer also obtains more reliable production planning.
Following the process steps detailed in Section 2, the results for the real case application are reported below. For the sake of clarity, only pertinent results are reported, to better highlight the potential of the methodology.

Data Acquisition
Several parts of the machine have been sensorized by means of external sensors, while other pieces of information are recorded directly by the programmable logic controllers (PLCs) and the drivers controlling the electric motors. The numbers and types of on-line sensors used are listed in Table 1. Off-line data refer to the list of services performed on the machine and to the timing of power-on and power-off for each subsystem of the machine. Data are acquired daily, with an acquisition setup that depends on the specific sensor used.

Data Pre-Processing
An iPC is responsible for the data acquisition, data manipulation, and logging of the data to the cloud. The functions managed in this step are detailed in Section 2.2: removal of empty or incomplete files, checking of the sensors, calculation of statistics, selection of specific data, storage of data, and sending of the data to the cloud. Figure 2 shows an example of sensor checking: the figure compares the signals acquired by an accelerometer mounted correctly (Figure 2a) and by an accelerometer mounted incorrectly (Figure 2b). The checking of the sensors is an easy task but a very useful one, as it reduces the processing time, saves money, helps to schedule technical services, and improves the reliability of the system. For example, since sensors were periodically disconnected because of vibration, the use of thread-locker in the mounting of the accelerometers was suggested; thanks to this simple solution, the number of disconnected sensors was dramatically reduced.

Data Cloud Processing
The statistics of the on-line data and the raw off-line data are stored in the cloud servers. Data-driven techniques elaborate the input data and monitor the health status of the components.
Among the statistics listed in Section 3.1, two parameters proved to be good indicators of the health status of different components:

• Root mean square (RMS); This returns a measure of the mechanical and environmental noise affecting the sensor in healthy conditions. A high RMS level is not necessarily related to a fault, as it could be the consequence of environmental conditions; the evolution of the RMS, rather than its absolute value, is the important indicator for condition monitoring;

• Kurtosis; It is well known in the literature [68,69] that a high kurtosis value is related to the presence of spikes in the vibration data, e.g., due to impacts between mechanical parts. A steady increase of the kurtosis value is symptomatic of possible faults.
The trends of RMS and kurtosis in the time domain are generated as reports in the cloud-processing step. Moreover, a support vector machine based on these two parameters has been developed for the anomaly detection of ball bearings [43]. In particular, the system under observation was a machine for the packaging of liquid products, whose brushless AC motors (MPL-B680B by Rockwell Automation) mount an NSK 6309 single-row bearing. Thirteen bearings were available: 7 of them were healthy and 6 were faulty at different levels of severity. The faulty bearings came from the field and were subsequently opened, verifying the presence of a fault. Eleven bearings were tested at three different hourly capacities, while two bearings, one faulty and one healthy, were put aside for a further test of the SVM on never-seen-before bearings. Due to the cyclic motion of the motor, the acquired vibration signal was split into single machine cycles, providing 1584 data-array samples: 1109 samples (70%) were used for the training of the SVM and the remaining 475 (30%) for the testing. Table 2a shows the confusion matrix for the resulting SVM using both the RMS and kurtosis values of the data as inputs: all the samples are correctly recognized. Indeed, anomaly detection, i.e., the classification between a healthy case and a faulty one, is a simpler task than the classification of the faulty case into subclasses (e.g., a fault in the outer rather than in the inner ring of the bearing). Nevertheless, the choice of the input parameters is crucial, requiring a trial-and-error approach during the development of the data-driven technique.
As an example, Table 2b shows the confusion matrix of the resulting SVM if only the RMS value is used as input.
Table 2. Support vector machine (SVM) confusion matrices: (a) using both root mean square (RMS) and kurtosis as inputs; and (b) using only the RMS.
The developed SVM was also tested on the two bearings never used during its training and testing: one bearing was healthy, while the other was faulty. The SVM correctly classified both bearings.
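A compact sketch of such an anomaly-detection SVM in the (RMS, kurtosis) plane is shown below. The feature values are synthetic stand-ins, not the measured data of the paper; only the two-feature setup and the 70/30 split mirror the experiment described above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic (RMS, kurtosis) feature pairs: illustrative values, not field data.
healthy = np.column_stack([rng.normal(1.0, 0.1, 500), rng.normal(3.0, 0.3, 500)])
faulty = np.column_stack([rng.normal(1.6, 0.2, 500), rng.normal(8.0, 1.5, 500)])
X = np.vstack([healthy, faulty])
y = np.array([0] * 500 + [1] * 500)      # 0 = healthy, 1 = faulty

idx = rng.permutation(len(y))
train, test = idx[:700], idx[700:]       # 70/30 split, as in the paper

clf = SVC(kernel="rbf").fit(X[train], y[train])
accuracy = clf.score(X[test], y[test])   # fraction of correctly classified samples
```

With two well-chosen features the classes are almost linearly separable, which is consistent with the perfect confusion matrix reported in Table 2a; dropping the kurtosis feature degrades the separation, as Table 2b shows.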

RMS and kurtosis have been successfully used for the data-driven condition monitoring of other components of the packaging machine. For example, they have been able to detect anomalies in the following cases:

• Loosening of a belt; For this type of failure mode, the system detects a variation in the working conditions of the machine. This generally ends with non-standard wear of the component, due to the changed working conditions, and with a failure of the application. The time between detection and functional failure can be weeks, depending on the application itself and the working conditions;

• Faulty ball bearing; Detection is strictly dependent on the application, the motion profiles, and the load condition of the bearing, but a fault is generally detected several weeks before catastrophic failure. This is sufficiently early to schedule the replacement intervention and avoid an unplanned stoppage of the machine;

• Poor lubrication; This depends on environmental conditions (e.g., humidity and temperature). In the case of completely missing lubrication, the degradation of the mechanical components is much faster than generic wear, and detection is less effective. Detection occurs as soon as the point of interest deviates from the standard working conditions, and in general this is sufficient to prevent damage to the component;

• Wear of surfaces; This is strictly dependent on the application, the motion profiles, the load condition, and the environmental conditions. In particular, the monitoring system detected the loosening between a bearing and its seat. The mean time between detection and failure is quantifiable as two months, but statistical evidence is still missing;

• Loosening of an elastic coupling; Detection depends greatly on the whole kinematic chain and on the stress conditions of the component. The mean time between detection and failure is quantifiable as a few days, but statistical evidence is still missing.
In case of an alarm from the data-driven algorithm, the performance management center can send a service engineer to fix the problem, or query the cloud infrastructure for specific raw data in order to perform an advanced analysis.

Data Post-Processing
The performance management center (PMC), i.e., a centralized service department, continuously analyzes the recorded data, checks the status of the fleet of machines, manages technical services, and develops future releases of the condition-monitoring system. The functions managed in this step are those detailed in Section 2.4: reporting, decision support, model-based analysis, and service management.
The ball-bearing diagnostics of servo-drive motors show the potential of a model-based condition-monitoring system. Even if ball bearings are commercial components, non-stationary working conditions make their diagnostics challenging and non-trivial. Most of the related literature is based on constant-speed applications, and only in recent years has the scientific community moved its focus to variable-speed applications. As a consequence, ball-bearing diagnostics have been studied both by modeling the physical system [70][71][72] and by modeling the expected vibration signal [61,73,74].
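The core difficulty of variable-speed diagnostics can be sketched with computed order tracking, the encoder-based resampling also mentioned in the conclusions (the code below is an illustration with assumed signal and parameter values, not the patented algorithm of the paper): the encoder angle maps the vibration signal from the time domain to the shaft-angle domain, so that a fault component locked to a shaft order stays at a fixed spectral line even while the speed varies.

```python
import numpy as np

def order_track(t, vibration, shaft_angle, samples_per_rev=256):
    """Resample a time-domain vibration signal onto a uniform shaft-angle
    grid using the encoder angle, so an FFT of the result yields orders
    (multiples of shaft speed) instead of fixed frequencies."""
    n_rev = int(shaft_angle[-1] // (2 * np.pi))
    uniform_angle = np.linspace(0, n_rev * 2 * np.pi,
                                n_rev * samples_per_rev, endpoint=False)
    # Map each target angle back to a time instant, then interpolate the signal
    t_at_angle = np.interp(uniform_angle, shaft_angle, t)
    return np.interp(t_at_angle, t, vibration)

# Shaft sweeping from 10 to 20 Hz with a vibration locked to the 3rd order
fs = 10_000
t = np.arange(0, 2.0, 1 / fs)
angle = 2 * np.pi * (10 * t + 2.5 * t ** 2)   # integral of the swept frequency
x = np.sin(3 * angle)                          # 3rd-order component
resampled = order_track(t, x, angle)
```

In the time-domain spectrum the swept component smears over 30-60 Hz, whereas the order spectrum of `resampled` concentrates it in a single line at order 3, which is what makes fault frequencies readable under varying speed.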
Figure 4 shows the output spectrum of the vibration data, which reveals the presence of an inner-ring fault, as experimentally verified subsequently. The specific algorithm used for the analysis of the bearings was published in Cocconcelli et al. [75] and is patent pending.

Conclusions
This paper presents a condition-monitoring methodology used to develop a condition-based maintenance program in a real industrial plant, which could be easily scaled by both small companies and multinational corporations with a fleet of installations. It defines the guidelines for a solid CMS architecture, suggesting a hybrid approach between the classical model-based maintenance used so far and the modern data-driven approach made available as an output of big-data technologies. The proposed methodology complies with the dictates of Industry 4.0, including the advantages of the IoT, cloud computing, and cognitive computing, while linking them with the solid foundations of physical modeling. The architecture of the condition-monitoring system is divided into four steps:

• Data-acquisition setup, i.e., the hardware infrastructure;
• Data pre-processing, responsible for data cleaning and quick alarm monitoring;
• Data cloud processing, responsible for data-driven analysis and high-level condition monitoring;
• Data post-processing, responsible for model-based analysis and decision support to the maintenance policy.
The resulting procedure can cope with various problems and different troubleshooting times. In particular, the second step allows quick feedback for local problems (e.g., disconnected sensors) or for the condition monitoring of critical components that need a short time to be fixed. The third step requires a low computational effort (for a single query) and can be extended to a large range of data, giving a wide-ranging vision of condition monitoring and multi-sensor fusion. The fourth step involves advanced signal processing techniques; this can require a high computational effort, but it can be applied to a limited set of key components. The suggested methodology is the result of the experience of the authors, who developed a condition-monitoring system for packaging machines. The methodology allows for a scalable number of points of interest and a scalable number of machines in the fleet. The procedure has been validated on real industrial applications, reporting few but significant results for each of the steps.
At the moment, the sensors used are not combined together, except in the monitoring of specific components. For example, the encoder signals are used to re-sample the accelerometer data from the motors in order to perform computed order tracking. In some cases, several sensors are checked in parallel in order to provide confirmation of an abnormal operating condition of the machine. As a future development step, "data fusion" approaches will be addressed, to obtain a reduction in uncertainty and more robust information from the sensors.

Figure 2.
Figure 2. Comparison of the vibration signal acquired by an accelerometer mounted correctly (a) and by an accelerometer mounted incorrectly (b).

Figure 3
Figure 3 shows the RMS-kurtosis map with the projection of the hyperplane that divides healthy and faulty training datasets.

Figure 3.
Figure 3. RMS-kurtosis map with the projection of the hyperplane that divides the healthy (red dots) and faulty (blue dots) training datasets. Toolbox developed by Prof. S.R. Gunn, University of Southampton.

Figure 4.
Figure 4. Example of the model-based technique output for condition monitoring. Order tracking and demodulation of the vibration signal reveal the presence of a fault in the inner ring of the bearing.

Table 1 .
On-line data acquired in a single packaging machine.