A Novel Deep Clustering Method and Indicator for Time Series Soft Partitioning

The aerospace industry develops prognosis and health management algorithms to ensure better safety on board, particularly for in-flight controls where jamming is dreaded. For that purpose, vibration signals are monitored to predict future defect occurrences. However, time series are not labeled according to severity level, and the user can only assess the system health from the data mining procedure. To that extent, we developed a clustering algorithm using a deep neural network core. We encoded the time series into pictures to be fed to a trained artificial neural network: U-NET. From the segmented output, one-dimensional information on cluster frontiers is extracted and filtered without any parameter selection. Then, a kernel density estimation transforms the signal into an empirical density. Ultimately, a Gaussian mixture model extracts the latter's independent components. The method empowered us to reveal, without prior knowledge, different degrees of fault severity in the studied data, together with their respective likelihoods. We compared it to state-of-the-art machine learning algorithms. However, internal evaluation of clustering results for time series is an open question. As state-of-the-art indexes did not produce relevant results, we built a new indicator to fulfill this task. We applied the whole method to an actuator consisting of an induction machine linked to a ball screw. This study lays the groundwork for future training of diagnosis and prognosis structures in the health management framework.


Introduction
According to the Maintenance Cost Technical Group (MCTG) of the International Air Transport Association (IATA), US$69 billion was spent on maintenance, repair, and overhaul (MRO) in 2018 for a fleet of 27,535 aircraft from 54 airlines. The aerospace industry is shifting from hydraulic to electromechanical actuation, which can be prone to seizure. Hence, it is mandatory to prevent actuator jamming at any cost during in-flight control, where several ball screws are used.

The Prognosis and Health Management (PHM) framework can be used to reduce the amount of money spent on unscheduled MRO operations by developing methods to forecast structural defects. Further, it can better anticipate this hazardous jamming event by detecting any unusual behavior from either vibration or electrical signals.

Version July 26, 2021 submitted to Energies. https://www.mdpi.com/journal/energies

[...] group and a fault severity is clearly stated in [7]. We can assume that the patterns found in this approach are representative of several fault detections. We followed the same hypothesis: a group of data represents a certain fault severity. For fuzzy clustering algorithms, [8] presents another approach.

Further, finding a general indicator is challenging, as clustering quality indexes are specific to the fundamental hypotheses made in the research [9]. This article presents a new method and a new quality indicator for clustering time series data and quantifying the results. First, the method will be outlined, and then it will be applied to an actuator consisting of an induction machine linked to a ball screw.

The clustering method has been applied to time series in a broad sense, i.e., signals indexed by a timestamp. More generally, signals whose elements abide by a common precedence relation are considered. Let $T$ be such a signal. Practically, it represents the values of a given sensor on a monitored system, indexed by time. $n$ realizations of $T$ for the system can be gathered: $\{T_1, T_2, \ldots, T_n\}$. In order to assess system health, an ensemble of $p$ statistical and model-based parameters is extracted from each $T_i$, $i \in \llbracket 1; n \rrbracket$.
Let $X_j \in \mathcal{M}_{1,n}(\mathbb{R})$, $j \in \llbracket 1; p \rrbracket$, be such a parameter. Each one of them can be concatenated to form what is called a feature matrix $M \in \mathcal{M}_{n,p}(\mathbb{R})$.

In the following work, the studied parameters are the chosen columns of the feature matrix $M$. For $j \in \llbracket 1; p \rrbracket$, let $X_j = \begin{pmatrix} x^j_1 & x^j_2 & \ldots & x^j_n \end{pmatrix}$ denote the $j$th descriptor.

A clustering method deals with data classification without prior knowledge. To do so, it has to minimize an intra-class distance and maximize an inter-class one. In doing so, it aggregates objects with similar properties and repels objects with heterogeneous ones. A distance is used to compare the elements $x_i$, $i \in \llbracket 1; n \rrbracket$, with each other, allowing the extraction of relevant clustering information from each $X_j$, i.e., of groups with similar characteristics. For that, the Euclidean distance presented in Equation 1 has been chosen.

$$\forall (x, y) \in \mathbb{R}^2, \quad d(x, y) = \sqrt{(x - y)^2} \qquad (1)$$

By applying the distance to all the elements of $X_j$, a square distance matrix $G_j \in \mathcal{M}_n(\mathbb{R})$ can be obtained, as shown in Equation 2:

$$G_j = \big( d(x^j_k, x^j_l) \big)_{k,l \in \llbracket 1; n \rrbracket} \qquad (2)$$
The nearer the points are to each other, the smaller the value of the distance. [...] Microsoft's COCO dataset [11] and its $3 \times 10^6$ images.
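For concreteness, the distance matrix construction can be sketched as follows (a minimal NumPy illustration, not the authors' code; the toy descriptor values are hypothetical):

```python
import numpy as np

def distance_matrix(x):
    """Square matrix G_j of pairwise distances d(x_k, x_l) = sqrt((x_k - x_l)^2)."""
    x = np.asarray(x, dtype=float)
    # Broadcasting builds all pairwise differences at once
    return np.abs(x[:, None] - x[None, :])

# Toy descriptor X_j with a visible regime change between its two halves
X_j = np.array([0.1, 0.2, 0.15, 0.9, 1.0, 0.95])
G = distance_matrix(X_j)
```

Points belonging to the same regime produce small distances, so each cluster appears as a dark square block along the diagonal of `G` — the pattern the network is trained to segment.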

The U-NET network comprises two main parts, as seen in Figure 1. [...] is made of ConvTranspose2d, an inverse convolution operation used to up-sample its input, as implemented in PyTorch [12].
Figure 1. U-NET U-shaped architecture, represented using algorithms from [13]

U-NET's goal is to segment every distance matrix $G_j$, $j \in \llbracket 1; p \rrbracket$. As stated earlier, the network has to learn to recognize square patterns along each matrix diagonal.
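The up-sampling operation mentioned above can be illustrated in isolation (a minimal PyTorch sketch; the channel counts and tensor sizes are illustrative assumptions, not the network's actual configuration):

```python
import torch
import torch.nn as nn

# One decoder up-sampling step: ConvTranspose2d with kernel 2 and stride 2
# doubles the spatial resolution of its input feature map.
up = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=2, stride=2)

x = torch.randn(1, 64, 16, 16)   # (batch, channels, height, width)
y = up(x)                        # spatial size doubles to 32 x 32
```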

Consequently, it has to be trained to do so.

[Algorithm 1: Get Indices of Matrix — iterates first over $i \leftarrow 1$ to $n$ (with $j \leftarrow 1$ to $i$) and then over $i \leftarrow n-1$ down to $1$ (with $j \leftarrow i$ down to $1$) to collect the indices of both triangular parts of the matrix into indexArray; the calling routine then loops over $i \leftarrow 1$ to $2n - 1$ to build resArray.]

However, standard filtering methods require the non-trivial setting of parameters, such as the sliding window. The latter represents the number of points to which the filtering algorithm is applied simultaneously. Hence, it controls the sensitivity of the algorithm to local and global dynamics. An iterative procedure has been created to avoid such shortcomings. It consists of looping through the signal to be filtered with an increased window size for each loop. Figure 4 represents the algorithm visually.

[Algorithm 2: batch peak detection — for each window size $i \leftarrow 2$ to $n$, the signal array is reshaped into noWindow $= n \,//\, i$ windows, trimming the $n \bmod i$ remaining elements when needed; the indexes of the maxima of each window are tracked in indexStatValueList, and an index mask array of size $m \times l$ accumulates them.]
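The iterative windowed filtering described above can be sketched as follows (an illustrative reimplementation of the idea, not the paper's exact pseudocode): for every window size, the signal is split into contiguous windows, the index of each window's maximum is recorded, and indexes that win at many scales accumulate votes.

```python
import numpy as np

def batch_peak_filter(signal):
    """Vote-based peak detection: loop over all window sizes, mark the
    argmax inside each contiguous window, and accumulate marks per index."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    votes = np.zeros(n, dtype=int)
    for w in range(1, n + 1):
        n_windows = n // w
        # Trim the n % w trailing elements so the signal reshapes cleanly
        trimmed = signal[: n_windows * w].reshape(n_windows, w)
        winners = trimmed.argmax(axis=1) + w * np.arange(n_windows)
        votes[winners] += 1
    return votes

votes = batch_peak_filter([0.0, 0.1, 5.0, 0.2, 0.1, 0.0])
```

The isolated peak at index 2 wins at every scale, so it gathers the most votes, without any manually chosen window size.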
This compensates for the dilation generated by the method explained in Algorithm 1. It is assumed that an uncertainty of two cycles is tolerable for the health monitoring framework. The resulting signals are noted $f_j \in \mathcal{M}_{n,1}(\mathbb{R})$, $j \in \llbracket 1; p \rrbracket$.

At this stage of the method, $p$ time series have been transformed into $p$ distance matrices $G_j$, and the neural network has segmented each matrix. From each output picture, $p$ first candidate cluster-boundary signals are obtained, each of which is filtered through the batch peak detection of Algorithm 2. Hence, $p$ new time series $F_j$ have been created. These can be concatenated into a matrix $F$, as represented in Equation 3:

$$F = \begin{pmatrix} F_1 & F_2 & \ldots & F_p \end{pmatrix} \in \mathcal{M}_{n,p}(\mathbb{R}) \qquad (3)$$
The developed method assumes that each descriptor contributes to the clustering result. Therefore, this information has to be fused to get a global final frontier signal.

The principle of kernel density estimation (KDE) [14] is used to do so. [...] Although not a computationally optimized one, it is chosen for its theoretical properties.
The best property in this context is the ability to choose a simple bandwidth
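The KDE-plus-Gaussian-mixture fusion step can be sketched as follows (a minimal illustration using SciPy and scikit-learn on synthetic boundary candidates, not the paper's data; the two modes near $t = 3$ and $t = 8$ are assumptions of the toy example):

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic fused frontier candidates clustered around two time positions
samples = np.concatenate([rng.normal(3.0, 0.3, 200), rng.normal(8.0, 0.5, 200)])

# KDE turns the candidate positions into an empirical density over time
kde = gaussian_kde(samples)
grid = np.linspace(0.0, 11.0, 500)
density = kde(grid)

# A Gaussian mixture then extracts the density's independent components,
# whose means act as estimated cluster frontiers
gmm = GaussianMixture(n_components=2, random_state=0).fit(samples.reshape(-1, 1))
frontiers = np.sort(gmm.means_.ravel())
```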

In this second part, the clustering method developed in Section 2 is applied to vibration monitoring signals generated by an induction motor driving a ball screw.

After presenting the dataset used, several signals will be selected to test the previous theoretical part. Finally, our method will be compared to other state-of-the-art clustering algorithms. [...] the raw data measured on the monitored actuator. Every descriptor $X_j$ has 1130 samples.

Among all $p$ features, four were chosen. They will be referred to as $\{X_1, X_2, X_3, X_4\}$. Each has been standardized and normalized between 0 and 1. They are represented in Figure 6.
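The per-descriptor preprocessing can be sketched as follows (a plausible reading of "standardized and normalized between 0 and 1", not the authors' exact code):

```python
import numpy as np

def preprocess(x):
    """Standardize to zero mean / unit variance, then rescale to [0, 1]."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z - z.min()) / (z.max() - z.min())

x_scaled = preprocess([2.0, 4.0, 6.0, 10.0])
```

The relative ordering and dynamics of the descriptor are preserved; only its scale changes.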

The four descriptors were selected to evaluate the method on signals that exhibit various dynamics but remain coherent for extracting health monitoring information.

Obviously, from Figure 6, the features are rather noisy and not strictly monotonic.

However, a general ascending or descending trend is recognizable. [...] Finally, the loss function used was BCEWithLogitsLoss from the torch.nn package [12].

Note that the learning phase is done only once.

The results of Figure

The two independent components of the density estimation of Figure 11 are

In order to represent the clustering results on the four descriptors, we consider that each local maximum of the density function is a cluster frontier. Figure 13 is thus obtained.
Figure 13. Clustering results of the initial signals from Figure 6

In Figure 13, each cluster has a different color corresponding to the health stage of the actuator. The first cluster is green, the second cluster is orange, and the last cluster

In this subsection, we compare our clustering method to previous state-of-the-art algorithms. The latter are summarized in Table 1. We selected these methods according to two criteria. The first is the ease of implementation within our benchmark scope, as they are present in standard machine learning libraries in Python. The second is the representativeness of the diversity of clustering algorithms. Indeed, according to [20, Fig. 6], they can be separated into six different approaches. We tried to cover at least three of them: partitional, hierarchical, and model-based. Each method in Table 1 was fed a data structure with five columns and 1130 lines. Four of the columns were the concatenation of $X_i$, $i \in \llbracket 1; 4 \rrbracket$, and the fifth was an index array ranging from 0 to 1129 representing time. The same $X_i$ of Figure 6 were used. Thus, each algorithm is made aware of the time precedence relation between points. Each clustering method fits the data and then predicts the clustering labels. The labels are then stored in a matrix to represent the clustering results.
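The benchmark input described above can be sketched as follows (an illustrative setup with scikit-learn; random descriptors stand in for the real $X_i$, and KMeans stands in for the full set of compared methods):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 1130

# Four stand-in descriptors plus a time-index column (0..1129), as in the text
X = rng.random((n, 4))
time_index = np.arange(n, dtype=float)
data = np.column_stack([X, time_index])

# Each benchmark method fits the 1130 x 5 structure and predicts labels
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
```

Appending the index column is what gives distance-based methods access to the time precedence relation between points.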

Since the clustering methods use the four $X_i$ to produce one clustering result, we can only represent the algorithm outcome for one signal. Figure 14 plots the clustered first signal. We chose the number of groups in the data according to a new quality indicator.

Figure 14. Clustering results with state-of-the-art algorithms for $X_1$

In Figure 14, the majority of the methods detected two clusters, except for

MeanShift and our UNET clustering method. One detects four groups, whereas ours finds three groups in the data, as seen previously in Figure 13. Regarding the shape of signal $X_1$, three clusters seem to be the optimal number embedded in the data.

To evaluate the accuracy of our method against the state-of-the-art, we realized a [...] An invariance-guided criterion has been created by [22]. Nevertheless, these indexes can be separated into two main groups: the external and the internal ones [23]. Since we do not have class labels for our data, we can only count on information contained in the studied data, as stated in the [23] study, where eleven widely used indicators are compared. In our work, we tested the Silhouette [24], the Davies-Bouldin separation measure [25], the Calinski-Harabasz [26] indicator, the conjunction of an inertia and temporal consistency measure from [27], and finally the SD [28] and S_dbw [21] validity indexes. None of them managed to select the right clustering result in Figure 14. As those are not fitted to the [...]

With $m$ the number of clusters found in the data, $\sigma_j$ the standard deviation of the $j$th cluster, $\max y_j - \min y_j$ the range of the $j$th cluster, $y_j$ its elements, $t_k$ their respective indexes, and $n_j$ its number of points. Note that $J$ is the ensemble of the point indexes in cluster $j$. While the denominator of $a_j$ measures the temporal consistency, the numerator measures the shape of the clusters.
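The shape term of the indicator (the numerator of $a_j$, as described in the text) can be computed per cluster as follows (a sketch based on the textual definition only; the function and variable names are ours):

```python
import numpy as np

def shape_scores(y, labels):
    """Per-cluster shape score sigma_j * (max y_j - min y_j):
    compact, narrow clusters score low; spread-out ones score high."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    return {j: y[labels == j].std() * (y[labels == j].max() - y[labels == j].min())
            for j in np.unique(labels)}

scores = shape_scores([0.0, 0.0, 1.0, 2.0], [0, 0, 1, 1])
```

A constant cluster scores exactly zero, which matches the intuition that a flat, well-separated group is a good partition of a degrading signal.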

Finally, the quality index $\chi$ deals with the variation between two consecutive groups rather than with $a_j$ itself. Note that the denominator of $a_j$ can never be null for time series. Indeed, we can see in Figure 14 that, for our algorithm, the first and third data partitions exhibit similar dynamics. However, since the system cannot rejuvenate itself, those should be considered as two different groups. To recognize the best partitioning result, the value $I$ defined in Equation 7 has to be maximized.

The ideal number of clusters in Figure 14 was selected according to the results of the [...] To better grasp the idea of the indicator, we have separately given Table 3, which represents the contribution of the shape measure to $\chi$, and Table 4, which represents the temporal consistency one. Practically, Table 3 represents the values of $\sigma_j (\max y_j - \min y_j)$ and Table 4 the values of [...] Table 4 is greater than one. Thus, the numerator of $a_j$ is divided by an amount greater than one, which minimizes its value. This penalization is then propagated in the total calculus of $\chi$.
Table 3. Shape score $\sigma_j (\max y_j - \min y_j)$ depending on the clustering method and the number of clusters

Short Biography of Authors
Alexandre Eid was born in 1993 in France. He received his engineering degree in computer science/IT and control from Grenoble INP - Esisar in 2018. His final-year project in the aeronautics industry was a stepping stone to beginning a Ph.D. thesis in computer science applied to prognosis and health management. He currently works at the University Claude Bernard Lyon 1 and the Safran aeronautics group. His research focuses on data science and artificial intelligence algorithms applied to time series.