# Clustering at the Disposal of Industry 4.0: Automatic Extraction of Plant Behaviors

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Clustering

**K-means** [54]: one of the most common and simplest unsupervised methods. It is an iterative algorithm that identifies the clusters by estimating their respective means. The idea is to identify K clusters (K is a user-chosen parameter) by randomly drawing K points, taking them as the initial centers of the clusters, and then assigning every point of the database to the nearest of these K centers. Once the data are clustered, the K means are updated as the means of the newly built clusters. This procedure is repeated until a stopping criterion is satisfied (for instance, the means no longer move). Let $\mathcal{C}_k^{(t)}$ and $m_k^{(t)}$ be, respectively, the kth cluster and its mean at iteration t, with $\mathcal{D}$ the database and d a distance. The clusters are formed following (1), and their means are updated by (2).

$$\mathcal{C}_k^{(t)} = \left\{ x \in \mathcal{D} : d\left(x, m_k^{(t)}\right) = \min_{i \in [\![1,K]\!]} d\left(x, m_i^{(t)}\right) \right\} \tag{1}$$

$$m_k^{(t+1)} = \frac{1}{\left|\mathcal{C}_k^{(t)}\right|} \sum_{x_i \in \mathcal{C}_k^{(t)}} x_i \tag{2}$$

**Self-organizing maps (SOMs)** [55]: an unsupervised clustering method that has proven itself over time. It can be seen as a topological K-means: the means are linked to each other, forming a grid whose nodes are the K centers of the clusters. When a data point x is drawn, the nodes' patterns are updated in a graded manner: the pattern of the nearest node is updated the most (it is attracted by the data); this node is called the best matching unit, and is denoted as $k^*$ in the following. The update is then propagated through the whole grid, from one node to another; the farther a node is from the best matching unit on the grid, the less it is updated. If we ignore the notion of neighborhood, we recover the K-means. The neighborhood linkage is given by (3), where the neighborhood width $\sigma^{(t)}$ shrinks over time following (4) from its initial value $\sigma_0$, and the node's pattern update is given by (6). The learning stops after $T_{max}$ iterations.

$$h_k^{(t)}\left(k^*\right) = \exp\left(-\frac{d^2(k, k^*)}{2\,{\sigma^{(t)}}^2}\right) \tag{3}$$

$$\sigma^{(t)} = \sigma_0 \exp\left(-\frac{t}{T_{max}}\right) \tag{4}$$

The learning rate $\epsilon^{(t)}$ at iteration t decreases the learning of the grid through time (to avoid oscillations), and is defined by (5), where $\epsilon_0$ is its initial value ($t=0$).

$$\epsilon^{(t)} = \epsilon_0 \exp\left(-\frac{t}{T_{max}}\right) \tag{5}$$

Finally, the kth cluster's pattern, here represented by an attraction coefficient called weight and denoted $w_k^{(t)}$ at iteration t, is moved toward the drawn data x by (6).

$$w_k^{(t+1)} = w_k^{(t)} + \epsilon^{(t)} \times h_k^{(t)}\left(k^*\right) \times \left(x - w_k^{(t)}\right) \tag{6}$$
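
The K-means loop of Equations (1) and (2) can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used in the study: the function names `nearest_mean` and `kmeans`, the Euclidean distance, the fixed random seed, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np

def nearest_mean(data, means):
    """Eq. (1): index of the nearest mean for every point (Euclidean d assumed)."""
    dists = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(data, n_clusters, n_iter=100, seed=0):
    """Hypothetical helper: returns (labels, means) after at most n_iter updates."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial means: K distinct points drawn at random from the database.
    means = data[rng.choice(len(data), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = nearest_mean(data, means)
        # Eq. (2): each mean becomes the average of its cluster; an empty
        # cluster keeps its previous mean (a choice made for this sketch).
        new_means = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else means[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_means, means):  # stopping criterion: means are stable
            break
        means = new_means
    return nearest_mean(data, means), means
```

The stopping criterion used here (stable means or a maximal number of iterations) is one common choice among others.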
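
The SOM training loop of Equations (3) to (6) can be sketched in the same spirit. The following is a hedged illustration under stated assumptions: a rectangular grid, a Euclidean distance both in the data space and on the grid, weights initialized from random data points, and one data point drawn at random per iteration. The name `som_train` and its default parameters are hypothetical.

```python
import numpy as np

def som_train(data, grid_shape=(3, 3), t_max=500, sigma0=1.0, eps0=0.5, seed=0):
    """Hypothetical helper: returns the K = rows * cols node weights after t_max steps."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of the nodes, used for the grid distance d(k, k*).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initial weights: random points of the database (assumption of this sketch).
    weights = data[rng.choice(len(data), size=rows * cols)].copy()
    for t in range(t_max):
        x = data[rng.integers(len(data))]                      # draw one data point
        bmu = np.linalg.norm(weights - x, axis=1).argmin()     # best matching unit k*
        sigma = sigma0 * np.exp(-t / t_max)                    # Eq. (4)
        eps = eps0 * np.exp(-t / t_max)                        # Eq. (5)
        grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)  # d^2(k, k*)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))              # Eq. (3)
        weights += eps * h[:, None] * (x - weights)            # Eq. (6)
    return weights
```

Each update moves every node toward the drawn point by a fraction `eps * h` of the remaining gap, so the best matching unit (where `h = 1`) moves the most and distant nodes barely move.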

#### 2.2. Quantifiers

**KS test** [60,61]: the Kolmogorov–Smirnov (KS) test is an empirical test, based on the study of the distribution of real data, that compares the data distributions of two datasets. Assuming a real system $\mathcal{S}$ tends to follow a given cumulative distribution function (CDF) $F_{\mathcal{S}}$, the dataset $\mathcal{D} = \{x_i\}_{i \in [\![1,N]\!]}$ recorded during a given window of time should roughly follow the same distribution, with N being the number of samples. Since $F_{\mathcal{S}}$ is a CDF, it is the probability that a data point $x_i \in \mathcal{D}$ is lower than or equal to a value x (7).

$$\forall x \in \mathbb{R},\quad F_{\mathcal{S}}(x) = P(x_i \le x) \tag{7}$$

As a consequence, it is possible to build an empirical estimate $\widehat{F}_{\mathcal{S}}$ of this probability by counting the real data whose values are lower than or equal to x, and dividing by the size N of the database (to normalize the value), as defined by (8).

$$\forall x \in \mathbb{R},\quad \widehat{F}_{\mathcal{S}}(x) = \frac{1}{N} \sum_{x_i \in \mathcal{D}} H(x_i \le x) \qquad \text{with} \quad H(x_i \le x) = \begin{cases} 1 & \text{if } x_i \le x \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

This estimate is motivated by the law of large numbers [62]; indeed, if the duration of the recording and the number of samples are infinite, $\widehat{F}_{\mathcal{S}}$ will converge towards $F_{\mathcal{S}}$ (9).

$$\lim_{N \to \infty} \widehat{F}_{\mathcal{S}} = F_{\mathcal{S}} \tag{9}$$

The final KS score is defined as the maximal absolute distance between the real CDF $F_{\mathcal{S}}$ and its estimate $\widehat{F}_{\mathcal{S}}$, as defined by (10).

$$KS = \max_{x} \left| F_{\mathcal{S}}(x) - \widehat{F}_{\mathcal{S}}(x) \right| \tag{10}$$

The KS test was originally designed to measure how far an empirical distribution $\widehat{F}_{\mathcal{S}}$ is from the real distribution $F_{\mathcal{S}}$. This can be used to compare a prediction to the true distribution of the system, and to estimate its accuracy; unfortunately, this requires access to the analytical distribution $F_{\mathcal{S}}$, which is often hard to build, and prohibitive in a data mining context. As a consequence, it is barely applicable to our case in this form; nonetheless, it can be used to compare two empirical distributions, so as to statistically estimate how close to one another they are. Here, it is not a low KS score that interests us, but the reverse: the behaviors of a system should be very different, and follow very distinct data distributions; therefore, the KS score can estimate how different and clearly separated the clusters are from each other. In other words, if the clusters depict clear and unique behaviors, their distributions should be just as distinct, and their pairwise KS scores high; conversely, if a unique behavior has been split into several clusters, they should follow very close distributions, and their KS scores will be very low. One limitation of the KS test is that it is designed for univariate data (a single variable against its cumulative probability); since we are working in a multi-dimensional space, we will "simply" compute the KS score along every dimension, and then average over all of the dimensions to obtain a scalar score. Another drawback is that it is a binary comparator: the clusters must be compared pairwise, and the local scores must be merged in some fashion.

**Silhouettes** [63]: the first attempts to blindly assess cluster quality were based on a dynamic comparison of the data inside and between the different clusters. 
From the Dunn index [64] to the silhouette coefficient [63], by way of the Davies–Bouldin index [65], the main idea is the same: for any given data point, on the one hand, compare it to the other data of its own cluster, and on the other hand, contrast it with the data that do not belong to its cluster; only their respective formalisms differ slightly. For instance, the silhouette coefficient, the most recent of the three (although it dates back to 1987), proceeds in three steps: (1) for any data point $x_i$, compute the average distance $avg(x_i)$ between it and its neighbors within its own cluster $\mathcal{C}_k$ (11); (2) for every other cluster $\mathcal{C}_{k'}$, compute the mean distance between $x_i$ and the data belonging to $\mathcal{C}_{k'}$: the minimal value among them is the dissimilarity $dis(x_i)$ of $x_i$ (12); (3) compute the silhouette coefficient $sil(x_i)$ of $x_i$ by subtracting the average distance $avg(x_i)$ from the dissimilarity $dis(x_i)$, and dividing this value by the maximum of both measures (13).

$$\forall x_i \in \mathcal{C}_k,\quad avg(x_i) = \frac{1}{|\mathcal{C}_k| - 1} \sum_{x_j \in \mathcal{C}_k \setminus \{x_i\}} d(x_i, x_j) \tag{11}$$

$$\forall x_i \in \mathcal{C}_k,\quad dis(x_i) = \min_{\substack{k' \in [\![1,K]\!] \\ k' \ne k}} \left\{ \frac{1}{|\mathcal{C}_{k'}|} \sum_{x_j \in \mathcal{C}_{k'}} d(x_i, x_j) \right\} \tag{12}$$

$$\forall x_i \in \mathcal{C}_k,\quad sil(x_i) = \frac{dis(x_i) - avg(x_i)}{\max\{dis(x_i), avg(x_i)\}} \tag{13}$$

The silhouette coefficient takes into account the relation between the data, both inside and outside a cluster. The avg measure captures the compactness of the clusters, i.e., how close the data within are: the lower, the more compact, with value 0 as its best case (data overlay); the dis measure depicts how distant the clusters are from one another: the higher, the more distant and, thus, the better (clear borders). The final sil coefficient represents how well each point has been classified; it can be seen as a regulated dissimilarity measure, whose score is reduced by the average distance as a penalty: the coefficient is high when the data of a given cluster are very close (low avg) and the cluster in question is well separated from the others (high dis), and vice versa. Dividing by the maximum of the two local scores only aims to normalize the final score, so as to ease the interpretation of the silhouette coefficient. It takes a value between −1 and +1, with −1 meaning that data point $x_i$ is on average nearer to the points of a different cluster than to those of the cluster it is actually in, and that it has probably been wrongly classified; and +1 meaning that $x_i$ overlays the neighbors of its cluster (maximal compactness), and is far from the other clusters; as a consequence, the nearer to +1, the better. This measure is very representative of the classification, but its main drawback is that it is very heavy to compute: it is not suited for large databases. 
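
The three steps (11) to (13) can be sketched as follows. This is a minimal NumPy illustration, assuming a Euclidean distance and at least two clusters; the function name `silhouette_coefficients` is hypothetical, and the full pairwise distance matrix it builds makes the quadratic cost of the measure explicit.

```python
import numpy as np

def silhouette_coefficients(data, labels):
    """Hypothetical helper: per-point silhouette, Eqs. (11)-(13), Euclidean d assumed."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    # All pairwise distances: O(N^2) in time and memory.
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    clusters = np.unique(labels)
    sil = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            # Eq. (11) is undefined for a singleton cluster; avg = 0 is taken
            # here (assumption of this sketch), which yields sil = +1.
            avg = 0.0
        else:
            avg = dists[i, own].sum() / (n_own - 1)        # Eq. (11); d(x_i, x_i) = 0
        dis = min(dists[i, labels == k].mean()
                  for k in clusters if k != labels[i])      # Eq. (12)
        denom = max(dis, avg)
        sil[i] = (dis - avg) / denom if denom > 0 else 0.0  # Eq. (13)
    return sil
```

Averaging the returned values over a cluster, or over the whole database, gives a single score per cluster or per clustering.
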
Moreover, it does not consider the cluster size: a cluster comprising a single value, but distant from the others (i.e., an outlier), will obtain the optimal value +1: with no more attention paid to it, it might be taken for a real behavior, whereas it is in fact a "simple" outlier.

**AvStd** [66]: in statistics, one of the most common tools is the standard deviation, for it informs about data dispersion [67]. It is computed along a given axis; with n axes, there are n values, one per axis. It is generally harder to handle a feature vector than a single scalar, especially when it must be considered alongside other scalar metrics. As a consequence, reference [66] proposed to compute this metric along every dimension, and then to fuse the vector's components in a representative fashion. It investigated several ways of merging them (minimum, maximum, mean, etc.) and concluded that the most representative and the most universal is the average of all these standard deviations; hence the name of this metric: average standard deviation, or AvStd. Being based on a statistical study of the data, it has the advantage of diminishing the impact of the outliers. Its drawback is that it is essentially an indicator of data scattering. Let ${m}_{k}$ be the mean of the kth cluster, defined by (2), whose data are defined by (1). 
The standard deviation $\sigma_k^{(i)}$ of cluster k along feature i is defined by (14), and the AvStd, its average over the n features, by (15).

$$\sigma_k^{(i)} = \sqrt{\frac{1}{\left|\mathcal{C}_k\right|} \sum_{x \in \mathcal{C}_k} \left(m_k^{(i)} - x^{(i)}\right)^2} \tag{14}$$

$$\mathrm{AvStd}_k = \overline{\sigma}_k = \frac{1}{n} \sum_{i=1}^{n} \sigma_k^{(i)} \tag{15}$$

**Density** [31]: in physics [68,69,70], the density of an entity is defined as the ratio of the number of items contained within to the volume it occupies; more specifically, to be a true density, this value must be divided by a reference, since a density has no unit. The question is what a volume is; indeed, in dimension 3, there is no ambiguity, but this notion becomes more blurred when the dimension increases. To answer that, we propose using the hyper-volume theory, which provides N-dimension equivalents to the regular 3D volumes: for instance, a 3D sphere becomes an ND hypersphere. Therefore, we propose representing a cluster by a hypersphere and, thus, assimilating both volumes: the cluster's volume is that of the smallest hypersphere containing all of its data points. It remains to evaluate the cluster's span; the most natural estimate would be the maximal distance separating two of its points, but it is quite heavy to compute (complexity of $\mathcal{O}\left(N_k^2\right)$, with $N_k$ being the number of points within cluster $\mathcal{C}_k$). To reduce this cost, we decided to find the maximal distance between any point and a fixed reference (for instance the cluster's mean or its pattern), which greatly reduces the complexity ($\mathcal{O}\left(N_k\right)$), and to double it so as to estimate a hypersphere containing all of the points. 
The first solution, the maximal distance between two points, leads to the smallest hypersphere containing the cluster, whilst the second approach, twice the maximal distance from a reference, gives a higher estimate of the cluster's volume, but is easier to compute, and is centered around the cluster's pattern. Moreover, a cluster's pattern is mostly a region of influence, which only depends on the observed data (data belonging to a given class could lie outside the borders of the cluster identified from the available observations): a higher estimate is not so problematic in that case. The cluster's span is therefore given by (16). The volume of an n-dimensional hypersphere of radius r is given by (17), where $\mathsf{\Gamma}$ is the gamma function, defined by (18).

$$s_k = 2 \times \max_{x_i \in \mathcal{C}_k} \left\{ d(x_i, m_k) \right\} \tag{16}$$

$$v_s^{(n)}(r) = r^n \, v_s^{(n)}(r=1) = \frac{r^n \, \pi^{n/2}}{\mathsf{\Gamma}\left(\frac{n}{2} + 1\right)} \tag{17}$$

$$\forall z \in \left\{ \mathbb{C} : \mathfrak{Re}(z) > 0 \right\},\quad \mathsf{\Gamma}(z) = \int_0^{+\infty} x^{z-1} e^{-x} \, dx \tag{18}$$

The density $\rho_k^{(n)}$ of cluster k is finally computed by dividing the number $N_k$ of data instances contained within by its volume $v_s^{(n)}$; the density is given by (19).

$$\rho_k^{(n)} = \frac{N_k}{v_s^{(n)}\left(s_k\right)} \tag{19}$$

In [31], we tested this density-based metric on both academic and industrial datasets; it proved to be very reliable to characterize the clusters and to evaluate their quality. It is quite efficient, and sensitive to both the outliers and the number of data contained within. In our previous study, we compared the densities of the different clusters with one another, but in the present article, we will "normalize" them by the density of the database itself, since dividing by a reference is what makes the value a true, unitless density.
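
The two per-cluster quantifiers above, AvStd and density, can be sketched as follows. This is an illustrative sketch under assumptions: the standard deviation is taken as the usual population standard deviation (square root of the mean squared deviation), the distance is Euclidean, the reference point is the cluster's mean, and Equation (19) is followed literally, with the span $s_k$ passed as the radius argument of (17). The function names are hypothetical.

```python
import math
import numpy as np

def avstd(cluster):
    """Eqs. (14)-(15): standard deviation per feature, then averaged."""
    cluster = np.asarray(cluster, dtype=float)
    return float(np.std(cluster, axis=0).mean())

def hypersphere_volume(r, n):
    """Eq. (17): volume of an n-dimensional hypersphere of radius r."""
    return (r ** n) * math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def density(cluster):
    """Eqs. (16) and (19): number of points over the estimated hyper-volume."""
    cluster = np.asarray(cluster, dtype=float)
    n_k, n = cluster.shape
    m_k = cluster.mean(axis=0)                                       # reference point
    span = 2.0 * float(np.linalg.norm(cluster - m_k, axis=1).max())  # Eq. (16)
    return n_k / hypersphere_volume(span, n)                         # Eq. (19)
```

A normalized value can then be obtained as `density(cluster) / density(database)`, in line with the normalization by the database described above. A cluster reduced to a single point has a zero span and must be handled separately.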
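
Coming back to the KS test used as a separation score between two clusters, the per-dimension computation and averaging described earlier in this section can be sketched as follows. This is a hedged example, not the code of the study: both clusters are assumed to be arrays with one row per data point and the same number of columns, and the names `ks_distance_1d` and `ks_score` are hypothetical.

```python
import numpy as np

def ks_distance_1d(a, b):
    """Eq. (10) applied to two empirical CDFs, each built as in Eq. (8)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the maximum is reached at an observed value
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_score(cluster_a, cluster_b):
    """Average of the per-dimension KS distances between two clusters."""
    cluster_a = np.asarray(cluster_a, dtype=float)
    cluster_b = np.asarray(cluster_b, dtype=float)
    return float(np.mean([
        ks_distance_1d(cluster_a[:, j], cluster_b[:, j])
        for j in range(cluster_a.shape[1])
    ]))
```

A score near 1 indicates two clearly separated distributions, a score near 0 two nearly identical ones. How the pairwise scores of all cluster pairs are merged into a single value is left open here. For a single dimension, `scipy.stats.ks_2samp` computes the same statistic.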

#### 2.3. Industry 4.0 Datasets

^{®}, specifically their factory located at La Rochelle, France, a chemistry plant specialized in the manufacturing of rare earth (RE) specialty products.

## 3. Discussion of the Results

#### 3.1. Battery 1

#### 3.1.1. Preliminary Qualitative Study

#### 3.1.2. Kolmogorov–Smirnov Test

#### 3.1.3. Quantitative Assessment

#### 3.1.4. Local Conclusion about Battery 1

#### 3.2. Battery 2

#### 3.2.1. Quantitative Study

#### 3.2.2. Kolmogorov–Smirnov Test

#### 3.2.3. Qualitative Validation

#### 3.2.4. Local Conclusion about Battery 2

## 4. Conclusions

#### Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

^{®}, for his help and useful discussions concerning the data collection and understanding. The authors also wish to express their gratitude to the EU 2020 program for supporting the presented work.

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| MDPI | Multidisciplinary Digital Publishing Institute |
| ML | machine learning |
| DM | data mining |
| RE | rare earth |
| ND | N-dimension(al) |
| SOM | self-organizing map |
| SC | silhouette coefficient |
| AvStd | average standard deviation |
| CLT | cluster |

## References

1. Lezoche, M. Formalisation Models and Knowledge Extraction: Application to Heterogeneous Data Sources in the Context of the Industry of the Future; Habilitation à Diriger des Recherches, Université de Lorraine: Lorraine, France, 2021.
2. Soliman, M. Analyzing Failure to Prevent Problems. Ind. Eng. Mag. **2014**, 56, 10.
3. Li, X.; Li, D.; Wan, J.; Vasilakos, A.; Lai, C.F.; Wang, S. A review of industrial wireless networks in the context of Industry 4.0. Wirel. Netw. **2017**, 23, 23–41.
4. Uslu, B.; Eren, T.; Gür, S.; Özcan, E. Evaluation of the Difficulties in the Internet of Things (IoT) with Multi-Criteria Decision-Making. Processes **2019**, 7, 164.
5. Pagnier, L.; Jacquod, P. Inertia location and slow network modes determine disturbance propagation in large-scale power grids. PLoS ONE **2019**, 14, e0213550.
6. Yeshchenko, A.; Di Ciccio, C.; Mendling, J.; Polyvyanyy, A. Comprehensive Process Drift Detection with Visual Analytics. In Conceptual Modeling; Laender, A.H.F., Pernici, B., Lim, E.P., de Oliveira, J.P.M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 119–135.
7. Bose, R.P.J.C.; van der Aalst, W.M.P.; Žliobaitė, I.; Pechenizkiy, M. Handling Concept Drift in Process Mining. In Advanced Information Systems Engineering; Mouratidis, H., Rolland, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 391–405.
8. Ivanov, D.; Sethi, S.; Dolgui, A.; Sokolov, B. A survey on control theory applications to operational systems, supply chain management, and Industry 4.0. Annu. Rev. Control **2018**, 46, 134–147.
9. Dolgui, A.; Ivanov, D.; Sethi, S.P.; Sokolov, B. Scheduling in production, supply chain and Industry 4.0 systems by optimal control: Fundamentals, state-of-the-art and applications. Int. J. Prod. Res. **2019**, 57, 411–432.
10. Grantner, J.; Fodor, G. Fuzzy automaton for intelligent hybrid control systems. In Proceedings of the 2002 IEEE World Congress on Computational Intelligence. 2002 IEEE International Conference on Fuzzy Systems. FUZZ-IEEE’02. Proceedings (Cat. No.02CH37291), Honolulu, HI, USA, 12–17 May 2002; Volume 2, pp. 1027–1032.
11. Shiraishi, K.; Hamagami, T.; Hirata, H. Multi Car Elevator Control by using Learning Automaton. IEEJ Trans. Ind. Appl. **2005**, 125, 91–98.
12. Javadi, M.; Mostafaei, H.; Chowdhurry, M.U.; Abawajy, J.H. Learning automaton based topology control protocol for extending wireless sensor networks lifetime. J. Netw. Comput. Appl. **2018**, 122, 128–136.
13. Khazaee, M.; Sadedel, M.; Davarpanah, A. Behavior-Based Navigation of an Autonomous Hexapod Robot Using a Hybrid Automaton. J. Intell. Robot. Syst. **2021**, 102, 29.
14. Gozhyj, A.; Kalinina, I.; Nechakhin, V.; Gozhyj, V.; Vysotska, V. Modeling an Intelligent Solar Power Plant Control System Using Colored Petri Nets. In Proceedings of the 2021 11th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Cracow, Poland, 22–25 September 2021; Volume 2, pp. 626–631.
15. Baduel, R.; Bruel, J.M.; Ober, I.; Doba, E. Definition of states and modes as general concepts for system design and validation. In Proceedings of the 12th Conference Internationale de Modelisation, Optimisation et Simulation (MOSIM 2018), Toulouse, France, 27–29 June 2018; Open Archives HAL: Toulouse, France, 2018. Available online: https://hal.archives-ouvertes.fr/hal-01989427/ (accessed on 23 February 2022).
16. Pan, F.; Wang, W. Anomaly detection based-on the regularity of normal behaviors. In Proceedings of the 2006 1st International Symposium on Systems and Control in Aerospace and Astronautics, Harbin, China, 19–21 January 2006.
17. Yasami, Y.; Safaei, F. A Statistical Infinite Feature Cascade-Based Approach to Anomaly Detection for Dynamic Social Networks. Comput. Commun. **2016**, 100, 52–64.
18. Buchanan, B. Can Machine Learning Offer Anything to Expert Systems? In Knowledge Acquisition: Selected Research and Commentary; Springer: Boston, MA, USA, 1989; pp. 5–8.
19. Ben-David, A.; Frank, E. Accuracy of machine learning models versus “hand crafted” expert systems – A credit scoring case study. Expert Syst. Appl. **2009**, 36, 5264–5271.
20. Seifert, J.W. Data mining: An overview. In National Security Issues; Nova Science Publishers, Inc.: New York, NY, USA, 2004; pp. 201–217.
21. Thiaw, L. Identification of Non Linear Dynamical System by Neural Networks and Multiple Models. Ph.D. Thesis, University Paris-Est XII, Paris, France, 2008. (In French).
22. Cohen, Y.; Faccio, M.; Galizia, F.G.; Mora, C.; Pilati, F. Assembly system configuration through Industry 4.0 principles: The expected change in the actual paradigms. IFAC-PapersOnLine **2017**, 50, 14958–14963.
23. Chukalov, K. Horizontal and vertical integration, as a requirement for cyber-physical systems in the context of Industry 4.0. Int. Sci. J. Industry 4.0 **2017**, 2, 155–157.
24. Krenczyk, D.; Skolud, B.; Herok, A. A Heuristic and Simulation Hybrid Approach for Mixed and Multi Model Assembly Line Balancing. In Intelligent Systems in Production Engineering and Maintenance – ISPEM 2017; Burduk, A., Mazurkiewicz, D., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 99–108.
25. Pinto, M. End-Effector Tools Wear Prediction: A Multimodel Approach. Ph.D. Thesis, Politecnico di Torino, Turin, Italy, 2021.
26. Wang, S.; Wan, J.; Zhang, D.; Li, D.; Zhang, C. Towards smart factory for industry 4.0: A self-organized multi-agent system with big data based feedback and coordination. Comput. Netw. **2016**, 101, 158–168.
27. Ghadimi, P.; Wang, C.; Lim, M.K.; Heavey, C. Intelligent sustainable supplier selection using multi-agent technology: Theory and application for Industry 4.0 supply chains. Comput. Ind. Eng. **2019**, 127, 588–600.
28. Mateos, G.N. Multi-Agent System for Anomaly Detection in Industry 4.0 Using Machine Learning Techniques. ADCAIJ Adv. Distrib. Comput. Artif. Intell. J. **2019**, 8, 33–40.
29. Cao, Z.; Zhou, P.; Li, R.; Huang, S.; Wu, D. Multiagent Deep Reinforcement Learning for Joint Multichannel Access and Task Offloading of Mobile-Edge Computing in Industry 4.0. IEEE Internet Things J. **2020**, 7, 6201–6213.
30. Karnouskos, S.; Leitao, P.; Ribeiro, L.; Colombo, A.W. Industrial Agents as a Key Enabler for Realizing Industrial Cyber-Physical Systems: Multiagent Systems Entering Industry 4.0. IEEE Ind. Electron. Mag. **2020**, 14, 18–32.
31. Molinié, D.; Madani, K. Characterizing N-Dimension Data Clusters: A Density-based Metric for Compactness and Homogeneity Evaluation. In Proceedings of the 2nd International Conference on Innovative Intelligent Industrial Production and Logistics – IN4PL, INSTICC, SciTePress, Valletta, Malta, 25–27 October 2021; Volume 1, pp. 13–24.
32. Zolkipli, M.F.; Jantan, A. An approach for malware behavior identification and classification. In Proceedings of the 2011 3rd International Conference on Computer Research and Development, Shanghai, China, 11–13 March 2011; Volume 1, pp. 191–194.
33. Wakita, T.; Ozawa, K.; Miyajima, C.; Igarashi, K.; Itou, K.; Takeda, K.; Itakura, F. Driver identification using driving behavior signals. IEICE Trans. Inf. Syst. **2006**, 89, 1188–1194.
34. Filev, D.; Lu, J.; Prakah-Asante, K.; Tseng, F. Real-time driving behavior identification based on driver-in-the-loop vehicle dynamics and control. In Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, USA, 11–14 October 2009; pp. 2020–2025.
35. Lin, N.; Zong, C.; Tomizuka, M.; Song, P.; Zhang, Z.; Li, G. An Overview on Study of Identification of Driver Behavior Characteristics for Automotive Control. Math. Probl. Eng. **2014**, 2014, 1–15.
36. Ma, Y.; Xie, Z.; Chen, S.; Wu, Y.; Qiao, F. Real-Time Driving Behavior Identification Based on Multi-Source Data Fusion. Int. J. Environ. Res. Public Health **2022**, 19, 348.
37. Brown, A.; Catterson, V.; Fox, M.; Long, D.; McArthur, S. Learning Models of Plant Behavior for Anomaly Detection and Condition Monitoring. In Proceedings of the 2007 International Conference on Intelligent Systems Applications to Power Systems, ISAP, Kaohsiung, Taiwan, 4–8 November 2007; Volume 15, pp. 1–6.
38. Calvo-Bascones, P.; Sanz-Bobi, M.A.; Welte, T.M. Anomaly detection method based on the deep knowledge behind behavior patterns in industrial components. Application to a hydropower plant. Comput. Ind. **2021**, 125, 103376.
39. Calvo-Bascones, P.; Sanz-Bobi, M.A.; Álvarez Tejedo, T. Method for condition characterization of industrial components by dynamic discovering of their pattern behaviour. In Proceedings of the ESREL2020, Venice, Italy, 1–5 November 2020.
40. Wang, H.; Liu, X.; Ma, L.; Zhang, Y. Anomaly detection for hydropower turbine unit based on variational modal decomposition and deep autoencoder. Energy Rep. **2021**, 7, 938–946.
41. Maseda, F.J.; López, I.; Martija, I.; Alkorta, P.; Garrido, A.J.; Garrido, I. Sensors Data Analysis in Supervisory Control and Data Acquisition (SCADA) Systems to Foresee Failures with an Undetermined Origin. Sensors **2021**, 21, 2762.
42. Molinié, D.; Madani, K.; Amarger, C. Identifying the Behaviors of an Industrial Plant: Application to Industry 4.0. In Proceedings of the 11th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Cracow, Poland, 22–25 September 2021; Volume 2, pp. 802–807.
43. Niggemann, O.; Maier, A.; Vodencarevic, A.; Jantscher, B. Fighting the Modeling Bottleneck - Learning Models for Production Plants. In Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung Eingebetteter Systeme VII, Schloss Dagstuhl, Germany, 2011, Tagungsband Modellbasierte Entwicklung Eingebetteter Systeme; Giese, H., Huhn, M., Phillips, J., Schätz, B., Eds.; Fortiss GmbH: München, Germany, 2011; pp. 157–166.
44. Vodenčarević, A.; Bürring, H.K.; Niggemann, O.; Maier, A. Identifying behavior models for process plants. In Proceedings of the ETFA2011, Toulouse, France, 5–9 September 2011.
45. Dotoli, M.; Pia Fanti, M.; Mangini, A.M.; Ukovich, W. Identification of the unobservable behaviour of industrial automation systems by Petri nets. Control. Eng. Pract. **2011**, 19, 958–966.
46. Lee, I.S.; Lau, H.Y. Adaptive state space partitioning for reinforcement learning. Eng. Appl. Artif. Intell. **2004**, 17, 577–588.
47. Fan, X.; Li, B.; Sisson, S. Online binary space partitioning forests. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Online, 26–28 August 2020; pp. 527–537.
48. Giorginis, T.; Ougiaroglou, S.; Evangelidis, G.; Dervos, D.A. Fast Data Reduction by Space Partitioning via Convex Hull and MBR computation. Pattern Recognit. **2022**, 126, 108553.
49. Zou, B.; You, J.; Wang, Q.; Wen, X.; Jia, L. Survey on Learnable Databases: A Machine Learning Perspective. Big Data Res. **2022**, 27, 100304.
50. Cai, L.; Wang, H.; Jiang, F.; Zhang, Y.; Peng, Y. A new clustering mining algorithm for multi-source imbalanced location data. Inf. Sci. **2022**, 584, 50–64.
51. Pandey, K.K.; Shukla, D. Approximate Partitional Clustering Through Systematic Sampling in Big Data Mining. In Artificial Intelligence and Sustainable Computing; Springer: Singapore, 2022; pp. 215–226.
52. Paudice, A. Algorithms for Clustering and Robust Unsupervised Learning Problems. Ph.D. Thesis, Università degli Studi di Milano, Italy, 2022.
53. Ezugwu, A.E.; Ikotun, A.M.; Oyelade, O.O.; Abualigah, L.; Agushaka, J.O.; Eke, C.I.; Akinyelu, A.A. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng. Appl. Artif. Intell. **2022**, 110, 104743.
54. Jain, A.K. Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. **2010**, 31, 651–666.
55. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. **1982**, 43, 59–69.
56. Dunn, J.C. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. J. Cybern. **1973**, 3, 32–57.
57. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Kluwer Academic Publishers: Philip Drive Norwell, MA, USA, 1981.
58. Boser, B.; Guyon, I.; Vapnik, V. A Training Algorithm for Optimal Margin Classifier. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; Volume 5, p. 5.
59. Wang, S.; Yu, L.; Li, C.; Fu, C.W.; Heng, P.A. Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization. In Computer Vision – ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 159–176.
60. Simard, R.; L’Ecuyer, P. Computing the Two-Sided Kolmogorov-Smirnov Distribution. J. Stat. Softw. **2011**, 39, 1–18.
61. Hassani, H.; Silva, E.S. A Kolmogorov-Smirnov Based Test for Comparing the Predictive Accuracy of Two Sets of Forecasts. Econometrics **2015**, 3, 590–609.
62. DeGroot, M.H.; Schervish, M.J. Probability and Statistics, 4th ed.; Addison-Wesley: Boston, MA, USA, 2019.
63. Rousseeuw, P. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. **1987**, 20, 53–65.
64. Dunn, J. Well-Separated Clusters and Optimal Fuzzy Partitions. Cybern. Syst. **1974**, 4, 95–104.
65. Davies, D.; Bouldin, D. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. **1979**, 2, 224–227.
66. Rybnik, M. Contribution to the Modelling and the Exploitation of Hybrid Multiple Neural Networks Systems: Application to Intelligent Processing of Information. Ph.D. Thesis, University Paris-Est XII, Paris, France, 2004.
67. Wan, X.; Wang, W.; Liu, J.; Tong, T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. **2014**, 14, 135.
68. Kriegel, H.P.; Kröger, P.; Sander, J.; Zimek, A. Density-based clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. **2011**, 1, 231–240.
69. National Aeronautics and Space Administration (NASA). Gas Density. Last Update: 7 May 2021. Available online: https://www.grc.nasa.gov/WWW/BGH/fluden.html (accessed on 26 March 2022).
70. Encyclopedia Britannica. Density. Last Update: 2 February 2021. Available online: https://www.britannica.com/science/density (accessed on 26 March 2022).
71. Lawrence, A.E. The Volume of an n-Dimensional Hypersphere; University of Loughborough: Loughborough, UK, 2001.
72. Huertos, F.J.; Chicote, B.; Masenlle, M.; Ayuso, M. A Novel Architecture for Cyber-Physical Production Systems in Industry 4.0. In Proceedings of the 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, France, 23–27 August 2021; pp. 645–650.

**Figure 1.** Battery 1 (${\mathcal{D}}_{1}$) sensors over time. Two main motifs stand out, circled in red and green: the green one roughly corresponds to the steady state of the process, whereas the red one corresponds to the sensor readings when the plant is shut down. Together, one red and one green motif form a regular work week.

**Figure 2.** Battery 2 (${\mathcal{D}}_{2}$) sensors over time. As for battery 1 (Figure 1), two main motifs stand out, circled in green (steady state) and red (offline state); a pair of them again represents a regular work week. A third group of data, circled in pink, is also noticeable: it corresponds to an irregular event that occurred during the data recording.

**Figure 3.** Battery 1 (${\mathcal{D}}_{1}$) clustering comparison ($3\times 3$ SOM). When a SOM is applied to ${\mathcal{D}}_{1}$, one of three sets of clusters appears; each column shows one of these configurations, and the rows show two sensors of the database given as examples, sensor 2 (**top**) and sensor 8 (**bottom**).

**Figure 4.** The eight clustered sensors of battery 1 for the configuration in the rightmost column of Figure 3.

**Figure 5.** Battery 1 probability distributions. The solid colored lines are the empirical distributions of each cluster, and the black dashed line is that of the database ${\mathcal{D}}_{1}$. The abscissa shows the real normalized values of the corresponding sensor, and the ordinate shows their probabilities of appearance.

**Figure 6.** Battery 1 mean probability distributions computed along each dimension (sensor). The solid colored lines are those of the clusters, and the dashed black line is that of the database. The abscissa shows the normalized sensor values, and the ordinate shows their empirical probabilities of appearance.

**Figure 7.** Battery 2 (${\mathcal{D}}_{2}$) clustering comparison ($3\times 3$ SOM). The SOMs end up in one of three possible configurations, with four or five main clusters; each column shows one of these configurations, and the rows show two sensors of the database, sensor 9 (**top**) and sensor 12 (**bottom**).

**Figure 8.** Battery 2 probability distributions. The solid colored lines are the empirical distributions of each cluster, and the black dashed line is that of the database ${\mathcal{D}}_{2}$. The abscissa shows the real normalized values of the corresponding sensor, and the ordinate shows their probabilities of appearance.

**Figure 9.** Battery 2 mean probability distributions computed along each dimension (sensor). The solid colored lines are those of the clusters, and the dashed black line is that of the database. The abscissa shows the normalized sensor values, and the ordinate shows their empirical probabilities of appearance.

**Figure 10.** The twelve clustered sensors of battery 2 for the configuration in the rightmost column of Figure 7.

Name | Acronym | Description | Interpretation |
---|---|---|---|
Kolmogorov–Smirnov test | KS test | Statistical comparison of two datasets based on their empirical data distributions. | The lower the score, the higher the resemblance between the two datasets. |
Average standard deviation | AvStd | Mean of the standard deviations computed along all the dimensions. | Compactness of the cluster, i.e., how tightly the data are centered around their mean. |
Hyper-density | Density | Based on the smallest N-dimensional hypersphere covering all of the cluster’s data. | Density of the cluster, i.e., how full or scattered the hyper-volume is. |
Silhouette coefficients | SCs | Normalized differences between the intra-cluster distances of the data and the inter-cluster distances. | The higher the value, the nearer a data point is to the data of its own cluster and the farther it is from the data of the other clusters, and vice versa. |
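
The two geometric quantifiers above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The AvStd part follows the description directly, assuming the population standard deviation. The density part rests on two assumptions that the table does not state: that density is the number of points divided by the hypersphere volume, and that the sphere can be approximated by centering it on the cluster mean with a radius equal to the largest distance to that mean. The function names and the toy data are hypothetical.

```python
import math
from statistics import pstdev


def average_std(cluster):
    """AvStd: mean of the per-dimension (population) standard deviations."""
    dims = list(zip(*cluster))  # one tuple of values per sensor
    return sum(pstdev(values) for values in dims) / len(dims)


def hypersphere_volume(radius, n_dims):
    """Volume of an n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n_dims / 2) / math.gamma(n_dims / 2 + 1) * radius ** n_dims


def hyper_density(cluster):
    """ASSUMPTION: density = cardinality / volume of a covering hypersphere.

    The sphere is centered on the cluster mean, with a radius equal to the
    largest distance to that mean. It covers all the data, but it is not
    guaranteed to be the smallest enclosing sphere.
    """
    n_dims = len(cluster[0])
    center = [sum(values) / len(cluster) for values in zip(*cluster)]
    radius = max(math.dist(point, center) for point in cluster)
    return len(cluster) / hypersphere_volume(radius, n_dims)


# Hypothetical 2-D cluster: the four corners of the unit square.
toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
avstd = average_std(toy)      # every dimension has a standard deviation of 0.5
density = hyper_density(toy)  # 4 points in a disc of radius sqrt(0.5)
```

Under these assumptions, a compact cluster yields a small AvStd, and a cluster whose points fill a small hypersphere yields a large density.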

Tag | Card | AvStd Min | AvStd Mean | AvStd Max | Density |
---|---|---|---|---|---|
${\mathcal{D}}_{1}$ | 65,505 | 0.071 | 0.280 | 0.405 | 44,401.066 |
${\mathcal{D}}_{2}$ | 63,715 | 0.048 | 0.267 | 0.414 | 39,935.226 |

Clusters | clt1 | clt2 | clt3 | clt4 | clt5 | clt6 | clt7 | clt8 | clt9 |
---|---|---|---|---|---|---|---|---|---|
Colors | Blue | Orange | Green | Red | Purple | Pink | Gray | Olive | Cyan |

**Table 4.** Battery 1 pairwise KS test of the clusters. All probabilities are expressed as percentages ($\times 100$). See Table 3 for the details about the color scheme.

Tag | clt1 | clt2 | clt3 | clt4 | clt5 | clt6 | clt7 | clt8 | clt9 |
---|---|---|---|---|---|---|---|---|---|
clt1 | 0.00 | 62.50 | 62.40 | 6.40 | 43.80 | 16.80 | 54.20 | 43.80 | 12.90 |
clt2 | 62.50 | 0.00 | 12.00 | 60.40 | 45.90 | 57.10 | 37.60 | 27.40 | 49.90 |
clt3 | 62.40 | 12.00 | 0.00 | 60.40 | 38.00 | 51.00 | 27.60 | 21.30 | 49.90 |
clt4 | 6.40 | 60.40 | 60.40 | 0.00 | 41.10 | 18.90 | 51.50 | 41.10 | 10.90 |
clt5 | 43.80 | 45.90 | 38.00 | 41.10 | 0.00 | 37.00 | 12.50 | 20.80 | 41.90 |
clt6 | 16.80 | 57.10 | 51.00 | 18.90 | 37.00 | 0.00 | 41.70 | 44.50 | 12.80 |
clt7 | 54.20 | 37.60 | 27.60 | 51.50 | 12.50 | 41.70 | 0.00 | 12.50 | 42.80 |
clt8 | 43.80 | 27.40 | 21.30 | 41.10 | 20.80 | 44.50 | 12.50 | 0.00 | 46.90 |
clt9 | 12.90 | 49.90 | 49.90 | 10.90 | 41.90 | 12.80 | 42.80 | 46.90 | 0.00 |
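
A symmetric matrix with a zero diagonal, such as the one above, can be produced by a pairwise two-sample KS comparison. The following Python sketch assumes that the score is the KS statistic, i.e., the largest gap between two empirical cumulative distribution functions, and it handles a single sensor only; how scores are combined across sensors is not specified here and is deliberately left out. The function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the largest gap is reached at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def pairwise_ks(clusters):
    """Symmetric matrix of KS statistics, in percent, with a zero diagonal."""
    k = len(clusters)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            score = 100.0 * ks_statistic(clusters[i], clusters[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Hypothetical readings of one normalized sensor for three clusters.
clusters = [
    [0.10, 0.12, 0.15, 0.18],
    [0.11, 0.13, 0.16, 0.19],
    [0.80, 0.85, 0.90, 0.95],
]
ks_matrix = pairwise_ks(clusters)
```

In this toy case, the first two clusters overlap and obtain a low score, whereas the third one is disjoint from both and obtains the maximal score of 100.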

**Table 5.** Battery 1 quantification of the clustering: AvStd, density, and silhouettes. For the first two quantifiers, “Abs” is the absolute value of quantifier $\gamma $, “$\div\,{\gamma}_{\mathcal{D}}$” is this value divided by that of database ${\mathcal{D}}_{1}$, and “$\div\,{\gamma}_{max}$” is this value divided by the maximal value of $\gamma $ among the nine clusters. For the silhouettes, there are as many coefficients as data points in a cluster: “Min”, “Mean”, and “Max” are, respectively, the minimum, mean, and maximum values among all the silhouettes of a cluster. See Table 3 for the details about the color scheme.

Tag | Card | AvStd $\overline{\sigma}$ Abs $\left(\times {10}^{-2}\right)$ | AvStd $\div\,{\overline{\sigma}}_{\mathcal{D}}$ | AvStd $\div\,{\overline{\sigma}}_{max}$ | Density $\rho$ Abs | Density $\div\,{\rho}_{\mathcal{D}}$ $\left(\times {10}^{-2}\right)$ | Density $\div\,{\rho}_{max}$ $\left(\times {10}^{-2}\right)$ | SC Min | SC Mean | SC Max |
---|---|---|---|---|---|---|---|---|---|---|
clt1 | 21,178 | 3.477 | 0.124 | 0.401 | 26,066 | 58.7 | 89.3 | −0.060 | 0.496 | 0.633 |
clt2 | 21,210 | 4.784 | 0.171 | 0.553 | 29,186 | 65.7 | 100 | −0.165 | 0.314 | 0.541 |
clt3 | 15,034 | 4.662 | 0.167 | 0.539 | 14,365 | 32.4 | 49.2 | 0.028 | 0.374 | 0.556 |
clt4 | 7747 | 6.715 | 0.240 | 0.777 | 5548 | 12.5 | 19.0 | −0.289 | 0.291 | 0.439 |
clt5 | 2 | 7.250 | 0.259 | 0.838 | 5.286 | 0.0119 | 0.0181 | −0.588 | −0.211 | 0.166 |
clt6 | 102 | 3.487 | 0.125 | 0.403 | 129.2 | 0.291 | 0.443 | −0.202 | 0.666 | 0.781 |
clt7 | 3 | 5.606 | 0.200 | 0.648 | 11.94 | 0.0269 | 0.0409 | 0.192 | 0.277 | 0.381 |
clt8 | 6 | 8.648 | 0.309 | 1.000 | 10.12 | 0.0228 | 0.0347 | −0.257 | −0.012 | 0.192 |
clt9 | 223 | 7.900 | 0.282 | 0.913 | 147.8 | 0.333 | 0.506 | −0.616 | 0.085 | 0.442 |
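
The two relative columns can be recomputed from the absolute values, as the short Python sketch below shows for the densities of the four largest clusters. The numbers are copied from the table above, and the database density (44,401.066) is the one listed for ${\mathcal{D}}_{1}$; the helper name `normalize` is hypothetical. Since the published values are rounded, small discrepancies in the last digit are to be expected for other entries.

```python
# Density values copied from the tables: database D1 and clusters clt1 to clt4.
database_density = 44401.066
cluster_density = {"clt1": 26066, "clt2": 29186, "clt3": 14365, "clt4": 5548}


def normalize(values, database_value):
    """Per cluster: (value / database value, value / largest cluster value)."""
    largest = max(values.values())
    return {
        tag: (value / database_value, value / largest)
        for tag, value in values.items()
    }


ratios = normalize(cluster_density, database_density)

# Expressed in units of 10^-2, as in the two relative density columns.
as_percent = {
    tag: (round(100 * by_db, 1), round(100 * by_max, 1))
    for tag, (by_db, by_max) in ratios.items()
}
```

The cluster with the highest density among the nine, clt2, is part of this subset, so the maximum used here matches that of the full table.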

AvStd Abs | AvStd $\div\,{\overline{\sigma}}_{\mathcal{D}}$ | AvStd $\div\,{\overline{\sigma}}_{max}$ | Density Abs | Density $\div\,{\rho}_{\mathcal{D}}$ | Density $\div\,{\rho}_{max}$ | Silhouettes Abs | KS Test Stats | KS Test CDFs | KS Test Scores |
---|---|---|---|---|---|---|---|---|---|
3.796 | 3.885 | 0.010 | 6.787 | 7.452 | 0.012 | $4.001\times {10}^{5}$ | 0.330 | 98.877 | 0.868 |
7.691 ms | | | 14.251 ms | | | $4.001\times {10}^{5}$ ms | 99.745 ms | | |

**Table 7.** Battery 2 quantification of the clustering: AvStd, density, and SCs. “Abs” is the absolute value of the quantifier $\gamma $, “$\div\,{\gamma}_{\mathcal{D}}$” is this value divided by that of the database, and “$\div\,{\gamma}_{max}$” is this value divided by its maximum among the clusters. “Min”, “Mean”, and “Max” are the minimum, mean, and maximum of the silhouettes. See Table 3 for the details about the color scheme.

Tag | Card | AvStd $\overline{\sigma}$ Abs $\left(\times {10}^{-2}\right)$ | AvStd $\div\,{\overline{\sigma}}_{\mathcal{D}}$ | AvStd $\div\,{\overline{\sigma}}_{max}$ | Density $\rho$ Abs | Density $\div\,{\rho}_{\mathcal{D}}$ $\left(\times {10}^{-2}\right)$ | Density $\div\,{\rho}_{max}$ $\left(\times {10}^{-2}\right)$ | SC Min | SC Mean | SC Max |
---|---|---|---|---|---|---|---|---|---|---|
clt1 | 21,514 | 4.41 | 0.164 | 0.287 | 18,216 | 45.6 | 100 | −0.134 | 0.489 | 0.653 |
clt2 | 7199 | 7.05 | 0.261 | 0.458 | 6873 | 17.2 | 37.7 | −0.190 | 0.427 | 0.616 |
clt3 | 18,369 | 6.93 | 0.258 | 0.453 | 17,393 | 43.6 | 95.5 | −0.253 | 0.498 | 0.665 |
clt4 | 9091 | 5.68 | 0.208 | 0.365 | 7737 | 19.4 | 42.5 | −0.398 | 0.439 | 0.587 |
clt5 | 104 | 8.42 | 0.316 | 0.554 | 88.52 | 0.222 | 0.486 | −0.368 | 0.640 | 0.781 |
clt6 | 7345 | 7.01 | 0.262 | 0.460 | 6909 | 17.3 | 37.9 | −0.725 | 0.490 | 0.648 |
clt7 | 24 | 15.2 | 0.570 | 1.000 | 21.42 | 0.0536 | 0.118 | −0.362 | 0.128 | 0.388 |
clt8 | 6 | 11.6 | 0.434 | 0.760 | 8.387 | 0.0210 | 0.0461 | −0.182 | 0.153 | 0.307 |
clt9 | 63 | 10.7 | 0.400 | 0.701 | 48.55 | 0.122 | 0.267 | −0.273 | 0.363 | 0.530 |
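
The “Min”, “Mean”, and “Max” columns summarize one silhouette coefficient per data point. The Python sketch below uses the standard definition of the silhouette, $s=(b-a)/max(a,b)$, where $a$ is the mean distance of a point to the other points of its cluster and $b$ is the smallest mean distance to the points of another cluster. The Euclidean distance, the convention $s=0$ for singleton clusters, the function names, and the toy data are assumptions made for illustration; the sketch also assumes distinct points, so that $max(a,b)$ is never zero.

```python
import math


def silhouettes(clusters):
    """Silhouette coefficient of every point, grouped by cluster."""

    def mean_dist(point, others):
        return sum(math.dist(point, o) for o in others) / len(others)

    result = []
    for k, cluster in enumerate(clusters):
        scores = []
        for i, point in enumerate(cluster):
            rest = cluster[:i] + cluster[i + 1:]
            if not rest:  # singleton cluster: s = 0 by convention
                scores.append(0.0)
                continue
            a = mean_dist(point, rest)  # cohesion within the own cluster
            b = min(mean_dist(point, other)  # separation from the nearest cluster
                    for j, other in enumerate(clusters) if j != k)
            scores.append((b - a) / max(a, b))
        result.append(scores)
    return result


def summarize(scores):
    """Minimum, mean, and maximum of the silhouettes of one cluster."""
    return min(scores), sum(scores) / len(scores), max(scores)


# Two hypothetical, well-separated 2-D clusters.
toy = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 0.0), (10.0, 1.0)],
]
summary = [summarize(s) for s in silhouettes(toy)]
```

With this definition, a negative minimum means that at least one point of the cluster is, on average, closer to another cluster than to its own. The quadratic number of pairwise distances also makes silhouettes costly to compute on large databases.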

**Table 8.** Battery 2 pairwise KS test of the clusters. All probabilities are expressed as percentages ($\times 100$). See Table 3 for the details about the color scheme.

Tag | clt1 | clt2 | clt3 | clt4 | clt5 | clt6 | clt7 | clt8 | clt9 |
---|---|---|---|---|---|---|---|---|---|
clt1 | 0.00 | 58.00 | 58.70 | 13.90 | 49.80 | 57.90 | 37.20 | 50.10 | 13.90 |
clt2 | 58.00 | 0.00 | 22.20 | 65.50 | 23.10 | 9.50 | 37.50 | 21.80 | 46.60 |
clt3 | 58.70 | 22.20 | 0.00 | 68.20 | 38.60 | 19.30 | 45.50 | 32.70 | 55.20 |
clt4 | 13.90 | 65.50 | 68.20 | 0.00 | 57.20 | 65.60 | 44.30 | 57.70 | 21.30 |
clt5 | 49.80 | 23.10 | 38.60 | 57.20 | 0.00 | 19.50 | 25.10 | 16.80 | 36.80 |
clt6 | 57.90 | 9.50 | 19.30 | 65.60 | 19.50 | 0.00 | 34.40 | 19.70 | 46.60 |
clt7 | 37.20 | 37.50 | 45.50 | 44.30 | 25.10 | 34.40 | 0.00 | 28.10 | 23.30 |
clt8 | 50.10 | 21.80 | 32.70 | 57.70 | 16.80 | 19.70 | 28.10 | 0.00 | 38.70 |
clt9 | 13.90 | 46.60 | 55.20 | 21.30 | 36.80 | 46.60 | 23.30 | 38.70 | 0.00 |

AvStd Abs | AvStd $\div\,{\overline{\sigma}}_{\mathcal{D}}$ | AvStd $\div\,{\overline{\sigma}}_{max}$ | Density Abs | Density $\div\,{\rho}_{\mathcal{D}}$ | Density $\div\,{\rho}_{max}$ | Silhouettes Abs | KS Test Stats | KS Test CDFs | KS Test Scores |
---|---|---|---|---|---|---|---|---|---|
4.788 | 5.123 | 0.010 | 8.007 | 8.324 | 0.013 | $4.134\times {10}^{5}$ | 0.323 | 100.786 | 0.668 |
9.921 ms | | | 16.343 ms | | | $4.134\times {10}^{5}$ ms | 101.454 ms | | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Molinié, D.; Madani, K.; Amarger, V. Clustering at the Disposal of Industry 4.0: Automatic Extraction of Plant Behaviors. *Sensors* **2022**, *22*, 2939.
https://doi.org/10.3390/s22082939
