## 1. Introduction

The primary goal of the industrial Internet of Things (IoT) has been to link operations and information technology for insight into production dynamics. This flexibility rests on a layer of technologies made of distributed networks of physical devices embedded with sensors, edge computers, and actuators used to identify, collect, and transfer data among multiple environments. Such IoT-based cyber-physical systems (CPS) establish a direct integration of engineering systems into digital, computer-based ones, where measurement and sensing technologies play an essential role in capturing the real world. The collected data are then the main ingredient to lift efficiency, accuracy, and economic benefits, with the added merit of minimal human intervention [1,2].

A consequence of this transformation is that the sensor layer, customarily used to measure, is now the means to map the actual status of the process into the cyber-world: to derive process information, collect it in databases, and use it as a basis for models that can be adapted (ideally by self-optimization) to real situations. In this vein, CPS provide a holistic view of engineering systems and enable a bi-directional physical-to-digital interaction via multimodal interfaces [3,4]. Unlike classic concepts derived from control theory, CPS form the basis to describe even complex interactions and thus allow the anticipation of process deviations, the interpretation and prediction of system behavior, the diagnosis of exceptional events, the explanation of causal mechanisms, and the reactive response to urgent events [5].

On the other hand, CPS are disclosing large quantities of data to create augmented knowledge about process control. These high-dimensional datasets typically include heterogeneous measures acquired through a large variety of sensors, which may have different sampling rates and communication protocols [6]. Following this trend, in recent years there has been a rapid development of mathematical modeling techniques and analytical tools able to address the aforementioned problems while meeting the new IoT requirements to assist process decision makers.

In this scenario, Artificial Intelligence (AI) is playing a major role in supporting the development of intelligent systems in process control engineering [7]. Examples are data preprocessing tasks, such as outlier removal [8] and the replacement of missing values [9], and the definition of machine learning models for predictive and prescriptive maintenance [10,11,12]. Moreover, AI can be used to model a system without the computational complexity of a full simulation or a physical analysis, learning functional dependencies between system components from the data [13]. Even though most AI models are data-driven, overabundant high-dimensional data can have a negative impact on model behavior and efficiency, not only because of the computational complexity but also in terms of accuracy [6]. The existence of redundant and noisy data contributes to the degradation of the performance of learning algorithms, making them less scalable and reliable [14].

Remedial strategies must fundamentally select the most representative features of the original dataset by means of feature subset selection (FSS) techniques. FSS, part of the larger family of dimensionality reduction approaches, aims at extracting a smaller set of representative features that retains the salient characteristics of the data, especially in terms of global information content.

The dimensionality problem is particularly evident in data collected from control and monitoring systems in engineered processes and machines, due to the strong presence of redundant data related to physical quantities with similar trends monitored at different points of the system, or to parameters of different nature but strongly correlated [15]. By detecting and removing irrelevant, redundant, or noisy features, FSS can alleviate the curse of dimensionality in industrial sensor networks, improving model interpretability, speeding up the learning process, and enhancing model generalization properties [16,17].

FSS techniques can be either supervised or unsupervised, depending on the availability of the data class labels used to guide the search for discriminative features [18]. Recently, unsupervised FSS has been attracting ever-growing interest, especially in the control and monitoring field, because of the lack of ground truth data in many real-world applications [19].

Traditional unsupervised FSS methods based on dependence measures, such as correlation coefficients, linear dependence, or statistical redundancy, are already widely used [19]. Recently, feature clustering demonstrated its merit in terms of accuracy with respect to other unsupervised approaches [20]. In addition, clustering algorithms outperform state-of-the-art methods in detecting groups of similar features [21,22], as well as in selecting metrics (one or more features) out of every cluster to reduce the dimensionality of the data with more or less granularity based on the application.

In this paper, we propose an unsupervised FSS algorithm based on a novel feature clustering technique tailored to time series collected from a real industrial IoT sensor network. The clustering approach combines different tools from complex network theory, which are becoming promising in the field of nonlinear time series analysis for their ability to characterize the dynamical complexity of a system [23]. In particular, we used visibility graphs [24] to map time series into the network domain, node degree sequence extraction [25] to characterize them, and community detection algorithms [26] to perform time series clustering.

The proposed method was tested on the sensor network of the central monitoring system of a 1 MW Combined Heat and Power (CHP) plant. The heterogeneous dataset includes measurements from the engine, the auxiliaries, the generator, and the heat recovery subsystem. Finally, we compared the proposed approach with other traditional time series clustering methods in terms of redundancy and information content for FSS.

The rest of the paper is organized as follows. In Section 2, we describe the background and related works. In Section 3, we present the unsupervised FSS method. In Section 4, we describe the case study. In Section 5, we report experimental results to support the proposed approach. In Section 6, we summarize the present work and draw some conclusions.

## 3. Methods

This section discusses the proposed method, starting from the problem of time series clustering up to the task of unsupervised FSS. Given a set of N time series $y={y}_{1},{y}_{2},\dots ,{y}_{N}$, the main steps of the proposed clustering approach are here summarized.

(a) Remove time series noise through a low-pass filter.

(b) Segment time series ${y}_{n}$ into consecutive non-overlapping intervals ${s}_{n}^{1},{s}_{n}^{2},\dots ,{s}_{n}^{T}$ corresponding to a fixed time amplitude L, where T is the number of segments extracted for each time series.

(c) Transform every signal segment ${s}_{n}^{t}$ ($t=1,\dots ,T$ and $n=1,\dots ,N$) into a weighted natural visibility graph ${G}_{n}^{t}$.

(d) Construct a feature vector ${k}_{n}^{t}=({\left({k}_{n}^{t}\right)}_{1},{\left({k}_{n}^{t}\right)}_{2},\dots ,{\left({k}_{n}^{t}\right)}_{L})$ for each visibility graph ${G}_{n}^{t}$, where ${\left({k}_{n}^{t}\right)}_{i}$ is the degree centrality of the ith node in the graph and ${k}_{n}^{t}$ is the degree sequence of the graph.

(e) Define a distance matrix ${D}^{t}$ for every tth segment ($t=1,\dots ,T$), where the entry ${d}_{ij}^{t}$ is the Euclidean distance between the degree centrality vectors ${k}_{i}^{t}$ and ${k}_{j}^{t}$. Every matrix gives a measure of how different every pair of time series is in the tth segment.

(f) Compute a global distance matrix D that covers the full time period T, where the entry $(i,j)$ is computed as ${d}_{ij}=\frac{1}{T}{\sum}_{t=1}^{T}{d}_{ij}^{t}$, averaging the contributions of the individual distance matrices associated to every segment.

(g) Normalize D between 0 and 1, making it possible to define a similarity matrix as $S=1-D$, which measures how similar every pair of time series is over the full time period.

(h) Build a weighted graph C considering S as an adjacency matrix.

(i) Cluster the original time series by applying a community detection algorithm on the graph C and visualize the results through a force-directed layout.

Figure 2 illustrates the flowchart of the methodology.

After the initial stages of data filtering (Step a) and time series segmentation (Step b), for the transformation of every signal into the network domain (Step c) we used natural weighted visibility graphs. The natural variant was preferred to the horizontal one because it captures properties of the original time series in greater detail, avoiding simplified conditions. The weighted variant, on the other hand, is used to magnify the spatial distance between observations that have visibility, and thus favors weighted edges over binary edges in the visibility graph.
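As an illustrative sketch, a natural visibility graph can be built with NumPy and NetworkX as follows. The edge weight here is assumed to be the Euclidean distance between the two visible samples in the (time, value) plane, which is one common weighting choice and not necessarily the exact definition adopted in this work:

```python
import numpy as np
import networkx as nx

def natural_visibility_graph(y, weighted=True):
    """Map a 1-D time series segment onto a natural visibility graph.

    Samples i < j are linked if every intermediate sample lies strictly
    below the straight line connecting (i, y[i]) and (j, y[j]).
    Edge weights (an illustrative assumption) are the Euclidean
    distances between the two visible points in the (time, value) plane.
    """
    L = len(y)
    G = nx.Graph()
    G.add_nodes_from(range(L))
    for i in range(L - 1):
        max_slope = -np.inf  # steepest slope seen so far from sample i
        for j in range(i + 1, L):
            slope = (y[j] - y[i]) / (j - i)
            if slope > max_slope:  # samples i and j see each other
                if weighted:
                    w = float(np.hypot(j - i, y[j] - y[i]))
                    G.add_edge(i, j, weight=w)
                else:
                    G.add_edge(i, j)
            max_slope = max(max_slope, slope)
    return G
```

For a segment of length L, this construction runs in $O\left({L}^{2}\right)$ time, consistent with the complexity discussed in the conclusions.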

Since we used natural weighted visibility graphs to map time series into networks, for the extraction of a feature vector for each signal segment (Step d) we considered the weighted degree centrality sequence of the network, as suggested in [84], because it is able to fully capture the information content included in the original time series [25,85].
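Steps d–g can be sketched as follows. The min–max normalization of D is implemented here as division by its maximum entry (distances are non-negative with a zero diagonal), which is one plausible reading of the normalization step:

```python
import numpy as np
import networkx as nx

def degree_sequence(G, L):
    """Weighted degree sequence of an L-node visibility graph,
    ordered by node index (i.e., by position in time)."""
    deg = dict(G.degree(weight="weight"))
    return np.array([deg.get(i, 0.0) for i in range(L)])

def similarity_matrix(segment_graphs, L):
    """segment_graphs: list over T segments, each a list of N visibility
    graphs (one per signal).  Returns S = 1 - D after averaging the
    per-segment Euclidean distances between degree sequences."""
    T = len(segment_graphs)
    N = len(segment_graphs[0])
    D = np.zeros((N, N))
    for graphs in segment_graphs:
        K = np.stack([degree_sequence(G, L) for G in graphs])
        # pairwise Euclidean distances between degree sequences
        diff = K[:, None, :] - K[None, :, :]
        D += np.sqrt((diff ** 2).sum(axis=-1))
    D /= T
    if D.max() > 0:
        D /= D.max()  # scale distances into [0, 1]
    return 1.0 - D
```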

Then, after the construction of the segment distance matrices ${D}^{t}$ and the normalized global similarity matrix S together with its graph representation C (Steps e–h), we used the modularity-based Louvain method for community detection (Step i), since it is extremely fast and performs well in terms of modularity.

To achieve a modular visualization of the clusters detected by the discussed method and their mutual connections, we used a force-directed algorithm, namely the Fruchterman–Reingold layout, as a graphical representation.
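In NetworkX, the Fruchterman–Reingold algorithm is exposed as `spring_layout`; a minimal sketch on a stand-in weighted graph (not the plant's actual similarity graph):

```python
import networkx as nx

# Stand-in weighted graph; in the proposed method this would be the
# (pruned) similarity graph C between sensor signals
C = nx.Graph()
C.add_weighted_edges_from([(0, 1, 0.9), (0, 2, 0.85), (1, 2, 0.8), (2, 3, 0.1)])

# Fruchterman-Reingold force-directed layout: heavier (more similar)
# edges pull nodes closer together; a fixed seed makes it reproducible
pos = nx.spring_layout(C, weight="weight", seed=42)
# pos maps every node to 2-D coordinates suitable for plotting
```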

Finally, for specific unsupervised FSS purposes, we considered a representative parameter for each cluster. Such parameters were identified based on their importance within the communities, by selecting the signal with the highest total degree centrality in its respective group.
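The community detection and representative selection steps (Steps h–i plus the FSS choice) can be sketched with the Louvain implementation shipped with NetworkX (available since version 2.8). Whether the original work used this implementation or a standalone Louvain package is not stated, so treat the following as an illustrative equivalent:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

def cluster_and_select(S, labels, seed=0):
    """Build the similarity graph C from matrix S (Step h), detect
    communities with Louvain (Step i), and pick as representative of
    each community the signal with the highest weighted degree."""
    N = S.shape[0]
    C = nx.Graph()
    C.add_nodes_from(labels)
    for i in range(N):
        for j in range(i + 1, N):
            if S[i, j] > 0:
                C.add_edge(labels[i], labels[j], weight=float(S[i, j]))
    communities = louvain_communities(C, weight="weight", seed=seed)
    deg = dict(C.degree(weight="weight"))
    representatives = [max(comm, key=deg.get) for comm in communities]
    return communities, representatives
```

On a toy similarity matrix with two tightly connected groups of signals, the function returns the two groups and one high-degree representative per group.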

Every part of the proposed approach was developed in Python 3.6 [86], using the NumPy [87] and NetworkX [88] packages.

## 5. Results

This section provides a detailed discussion about the experimental results obtained by the proposed approach, followed by a comparison with two traditional time series clustering methods.

Figure 4 shows the plot of the 78 standardized signals during a representative period of about two months. Data were extracted during a total measuring time of almost 11 months.

The total dataset was then analyzed by applying the method described in Section 3. After the application of a low-pass filter for noise removal, in Steps b–d of the workflow the time series were segmented into non-overlapping intervals ${s}_{n}^{t}$, mapped into natural visibility graphs ${G}_{n}^{t}$, and finally feature vectors were extracted in terms of degree sequences ${k}_{n}^{t}$. Afterwards, in Steps e–g, a global distance matrix D was computed by combining the contributions of all the distance matrices ${D}^{t}$, followed by the definition of the similarity between all pairs of time series. The resulting similarity matrix S is shown in Figure 5.

As per Step h, the similarity matrix S is represented in the form of a weighted graph, also called similarity graph C, where each node corresponds to a specific signal and the edge weights quantify pairwise similarities between time series. To carry out the community detection phase, only the most important edges were taken into account. In particular, we performed edge pruning by filtering the pairwise similarities lower than the second quantile of their probability distribution.
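Assuming that "second quantile" denotes the median of the off-diagonal similarity distribution (an interpretation, since quantile conventions vary), the pruning step might be implemented as:

```python
import numpy as np

def prune_similarity(S, q=0.5):
    """Zero out off-diagonal similarities below the q-th quantile of
    the off-diagonal distribution (q=0.5 assumes the median)."""
    mask = ~np.eye(S.shape[0], dtype=bool)
    threshold = np.quantile(S[mask], q)
    P = np.where(S >= threshold, S, 0.0)
    np.fill_diagonal(P, 0.0)  # the similarity graph has no self-loops
    return P
```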

Then, as for Step i, by means of the Louvain algorithm (see Section 2.2.3), we identified 12 different communities within the filtered similarity graph, which globally cover 70 parameters. Table 2 provides the detail of the variables contained in each cluster with reference to the parameter IDs presented in Table 1.

The eight signals shown in Figure 6 were not clustered, since they were characterized by independent dynamics. This subset includes engine lube oil parameters, i.e., crankcase pressure, sump level, and pressure; generator parameters, i.e., power factor and reactive power; parameters in the fuel primary storage, i.e., tank level and pressure; and parameters in the exhaust gas treatment, i.e., urea tank level.

Time series clustering results are illustrated with reference to the functional groups shown in the block diagram in Figure 3.

Most of the fuel parameters were grouped into two distinct homogeneous clusters (see Figure 7). Fuel temperatures from the primary storage to the outputs of tanks 1 and 2 are included in Cluster 1 (Figure 7a), while Cluster 2 (Figure 7b) groups the fuel levels in the two tanks.

Engine sensor signals fall, together with other strictly related parameters, into two distinct clusters (see Figure 8). In particular, Cluster 3 (Figure 8a) includes all the cylinder temperatures and the exhaust temperatures, while Cluster 4 (Figure 8b) includes the casing temperatures, the supercharger temperatures, and the temperatures monitored at the engine auxiliaries, e.g., the cooling water, lube oil, and intercooler subsystems. Cluster 4 also contains some parameters affected by the heat exchange with the engine cooling, such as the water inlet temperatures of the process heat circuit and the inlet fuel temperature.

All the parameters of the high temperature heat recovery circuit (process steam demand) were, instead, separated into two distinct groups (see Figure 9). In detail, Cluster 5 (Figure 9a) includes the thermal power and hot water flow rate monitored at the boiler inlet, while, in Cluster 6 (Figure 9b), all the specific steam parameters are grouped together, such as steam flow rate, pressure, and thermal power, as well as the temperature of the condensed water.

As mentioned above, low temperature heat circuit sensor signals, measured at the plant inlet, are part of Cluster 4 together with other engine and auxiliaries signals (see Figure 8b), while the water temperatures at the plant outlet and the delta in–out temperature are in Cluster 7 (see Figure 10).

The two principal properties of the electric power supply, frequencies and voltages, were divided into two clusters (see Figure 11). Notably, in Figure 11a, it is possible to note how the engine speed was included in Cluster 8 together with the generator and grid frequencies. On the other hand, Cluster 9 includes all the generator and grid voltages.

Other electrical parameters, such as powers and currents, have instead been divided into three different clusters (see Figure 12). In particular, Cluster 10 (Figure 12a) and Cluster 11 (Figure 12b) distinguish, respectively, the generator powers from the generator currents, while Cluster 12 (Figure 12c) groups together the grid powers and currents. The latter refer only to the Phase 2 current, because the Phase 1 and 3 currents were removed in the preprocessing phase due to sensor malfunctions.

The clustering results show that the proposed approach is independent of the nature of the monitored parameters and their functionality within the system.

For example, Clusters 1, 2, 7, 9, 10, and 11 (Figure 7a,b, Figure 10, Figure 11b, and Figure 12a,b) include only homogeneous variables (e.g., temperatures) belonging to the same functional area (e.g., engine). Among those, it is interesting to note how the parameters within Cluster 2, i.e., the fuel levels in the tanks for primary storage, seem to be very different from the Euclidean point of view, but the method identified a similarity in their global trends.

On the other hand, Clusters 5, 6, and 12 (Figure 9a,b and Figure 12c) represent some examples of communities populated by heterogeneous physical parameters recorded in the same functional area.

Finally, a particular interest derives from the hidden relationships identified between parameters characteristic of different functional areas. Examples are Cluster 3 (Figure 8a), which includes the cylinder and exhaust temperatures; Cluster 4 (Figure 8b), which groups together temperatures referring to the engine external casing, the engine auxiliaries, the heat recovery and fuel pre-heating systems, and the inlet fuel; and Cluster 8 (Figure 11a), which is composed of frequencies and voltages related to both the generator and the grid.

After the identification of the clusters, exploratory network analysis was used to render a graphical representation of their degree of similarity (the higher the similarity between nodes, the smaller their spatial distance), thus improving the cluster visualization. The Fruchterman–Reingold layout applied to the similarity graph C, after edge pruning, provided the results shown in Figure 13.

The force-directed layout gives evidence of a central core of strongly connected parameters, which includes most of the fuel temperatures in the storage area (Cluster 1), all the temperatures of cylinders and exhaust (Cluster 3), all the process low temperature parameters (Cluster 7), and most of the generator and grid parameters (Clusters 8–11).

Notably, only two parameters of Cluster 1 are outside the central core, namely T29 and T34, measuring, respectively, the fuel temperature in the primary storage and in tank 2 (the latter being a backup tank). It is also possible to notice how the temperatures of the engine cooling water (T25–T27) and lube oil (T43) subsystems represent a key group in bridging the central core to the other variables of Cluster 4. Similarly, the steam parameters in the high temperature heat recovery (Cluster 6), although not directly included in the central core, appear to be strictly connected to it. As expected, no correlation is active between the rest of the network and the fuel levels inside the tanks (Cluster 2), the power and flow rate of the hot water at the boiler inlet (Cluster 5), or the grid power and currents (Cluster 12). To improve the interpretation of the results by adding quantitative information to the exploratory analysis, we calculated the cumulative percentage distribution of the average degree centrality of each cluster (see Figure 14).

The bar chart in Figure 14 attributes a specific ranking to the clusters according to their average contribution to the degree centrality of the network. Overall, the results confirm the considerations made so far in relation to the core communities (Clusters 1, 3, 7, and 8–11), the boundary communities (Clusters 4 and 6), and the communities totally unrelated to the network (Clusters 2, 5, and 12).

As for the communities included in the central core, it is possible to obtain a distinction between the roles played in the network. In detail, Cluster 8, which groups the engine speed and the generator and grid frequencies together, is the most influential on the control and stability of the global system, followed by Cluster 3, which includes the cylinder and exhaust gas temperatures.

Finally, after cluster identification and analysis, FSS was performed by selecting in each cluster the representative signal as the one with the highest degree contribution in its group.

Table 3 shows the selected variables associated to each cluster, together with their degree centrality in the similarity graph, and their share contribution to the sum of the degree centralities within the reference cluster.

The representative parameters shown in the table are visually confirmed by the force-directed layout in Figure 13. For example, variable T0 (condensate temperature) appears to be the most influential node of Cluster 6 (process high temperature user parameters), having a high number of connections not only with variables of its own cluster, but also with those belonging to the central core of strongly connected signals. Another example is parameter T43 (oil temperature) with respect to Cluster 4 (parameters strictly related to the engine).

As reported in the case study, the data matrix considered as input for the analysis has dimensions 30,240 × 78. After the application of the proposed method, by considering the 12 representative cluster variables listed in Table 3, together with the 8 independent signals shown in Figure 6, we obtained a final data matrix of size 30,240 × 20, thus reducing the dimensionality by 74.4%.
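The reported reduction follows directly from the feature counts; as a quick check:

```python
# 78 original signals -> 12 cluster representatives + 8 unclustered signals
original_features = 78
selected_features = 12 + 8
reduction_pct = 100.0 * (original_features - selected_features) / original_features
print(f"{reduction_pct:.1f}%")  # prints 74.4%
```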

## 6. Conclusions

With the advent of Industry 4.0, the increasing availability of sensor data is leading to a rapid development of models and techniques able to deal with it. In particular, data-driven AI models are becoming essential to conduct the analysis of complex systems based on large data streams.

State-of-the-art models tend to overfit the data and suffer from performance loss when variables are highly correlated with each other. Many FSS methods have been introduced to address these problems. Notably, it has been demonstrated that clustering-based methods for unsupervised FSS outperform traditional approaches in terms of accuracy.

The complexity of the nonlinear dynamics associated with data streams from sensor networks makes standard clustering methods unsuitable in this context. For these reasons, in this paper, we propose a new clustering approach for time series, useful for unsupervised FSS, that exploits different complex network tools. In particular, we mapped time series segments into the network domain through natural weighted visibility graphs, extracted their degree sequences as feature vectors to define a similarity matrix between signals, used a community detection algorithm to identify clusters of similar time series, and selected a representative parameter for each of them based on the variable degree contributions.

The analysis of the results highlights two advantages deriving from the proposed method. The first is the ability to group together both homogeneous and heterogeneous physical parameters, even when related to different functional areas of the system. This is obtained by capturing time series similarities not necessarily linked to the signal Euclidean distance. From the FSS perspective, by considering 12 representative variables for the identified clusters and 8 independent signals that were not clustered, the approach reduced the dimensionality of the dataset by 74.4%.

Second, as an additional advantage with respect to FSS purposes, the method allows the discovery of hidden relationships between system components, enriching the information content about the signal roles within the network.

Since the construction of a natural weighted visibility graph has time complexity $O\left({L}^{2}\right)$, where L is the number of samples in a time series interval, the proposed approach is intended as an offline filtering tool. In particular, since the visibility graph construction is the bottleneck of the algorithm, the global time complexity is in the order of $O\left(T{L}^{2}\right)$, where T is the number of consecutive non-overlapping segments. Running the algorithm on a dataset of 11 months with time windows of 24 h took approximately 15 min. The idea is to consider the whole dataset at disposal in order to identify the overall most relevant signals, by averaging the contributions of all intervals. Thus, the resulting reduction in the dimensionality of the data streams opens the possibility to simplify the condition monitoring system and its data.

If, instead, a real-time tool for FSS or time series clustering is of interest, the proposed algorithm could be integrated into sensor network now-casting models: on a sliding window of 24 h, the algorithm runs in less than 3 s.