# Customized Knowledge Discovery in Databases methodology for the Control of Assembly Systems


## Abstract


## 1. Introduction

- Lack of well-defined and standardized data analysis procedures and guidelines for manufacturing datasets [8];
- Absence of a consolidated data analysis culture in the manufacturing field [9];
- Scarcity of well-established and appropriate data collection and saving systems; and
- Issues with data accessibility and sharing.

## 2. Research Statement and Objective

- Assembly datasets mainly focus on quality variables and process parameters, which generates discrete time series tied to processed items rather than continuous signals coming from equipment. The nature of the available data, discussed in detail in Section 3.2, therefore prevents building models based on sensor signals that describe machines' technical parameters, such as temperatures, absorbed powers or vibrations. This explains why the powerful and reliable techniques provided by ICPS, such as data-driven KPI estimators and predictors based on real or soft sensors [4], are not fully applicable to this specific context.
- Techniques typical of control theory [16,17], such as dynamic state space models or observers (e.g., the Kalman filter), could not be used because a single mathematical model cannot reliably track multiple different issues that arise from different physical causes. Moreover, even if these strategies were able to detect a deviation from the typical stable scenario, they would not identify the root cause of the problem, which makes any predictive action only partially effective.

#### Research Objective

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t - \vartheta_1 Z_{t-1} - \cdots - \vartheta_q Z_{t-q}$$

- Consistency. Each step will be motivated and justified from both a mathematical-analytical and a physical-industrial point of view.
- Generality. Even though the tool will be specifically designed and validated within the Bosch VHIT S.p.A. assembly system, it strives to be a standardized instrument that any manufacturing company can apply, independently of the nature and layout of the assembly process performed. Moreover, it aims to be easily tunable and customizable to the local field of application, so as to optimize its performance for the specific case study.

## 3. Background

#### 3.1. KDD Methodology

- Capability to reduce Big Data original size, decreasing computational cost and associated technical problems, by focusing on useful variables only.
- Capability to work in uncertain situations. Since the goal of KDD is to extract knowledge from raw data, it is naturally built to be flexible and to adapt its framework with respect to partial results obtained along its application.
- Popularity and generality. Academic literature suggests KDD as the most suitable methodology to be used for the sake of information extraction from raw Big Data [21,22]. The main reason is that it gives very general guidelines, leaving to the practitioner a sufficient number of degrees of freedom to develop and adapt it to the actual case.

#### 3.2. Bosch VHIT S.p.A. Assembly System

#### Collected Data

- General assembly layout information: codes for items’ traceability and model recognition, in case of multi-product lines; date and time of product processing; cycle time of each workstation.
- Single workstation information: physical variables referred to performed operation; station outcome (compliant or non-compliant piece).

## 4. Customized KDD for Assembly System

#### 4.1. Definition of Preliminary Points

#### 4.2. Data Pre-processing

Each entry, indexed by a superscript i and a subscript j, with i = 1, …, n and j = 1, …, p, contains the value of the j-th parameter (variable) for the i-th piece (item).

- Position data in chronological order. This operation helps later time-series analysis and forecasting activities.
- Consider the proper period only. The whole available history is not always the best choice, even though it may lead to a better validation of the stochastic models used. For example, immediately after the installation of an assembly line, series production starts only after the equipment has been properly regulated and prototype products have been made, so data gathered in that period are strongly affected by this extraordinary situation.
- Eliminate outliers. An outlier is defined as a measurement that is inconsistent with the rest of the dataset. It can be generated by different causes, such as poorly calibrated or faulty instruments, electrical issues, communication channel issues, and server issues. The management of outliers is particularly delicate: outliers treated as normal measurements can contaminate data behavior, compromising the identification of patterns and the functioning of the algorithms, whereas normal measurements treated as outliers can obstruct the capture of useful knowledge. In particular, removing all of them would remove a fundamental portion of the dataset, namely the non-compliant pieces. Since the final goal is to prevent issues on the basis of past knowledge, excluding these values from the database seems inappropriate. To verify whether a candidate outlier is a real one, one must check whether it corresponds to a compliant piece: if it does, the measurement is inconsistent and the point is discarded; if it does not, the point cannot be neglected. It is then important to decide how to manage the value of the retained points, which is completely different from the rest of the dataset. Since the difference can reach several orders of magnitude, a single anomaly can affect the behavior of the entire dataset and compromise prediction and regression analysis. Thus, their value is shifted to a position immediately outside the admissible range, so that these points are included in the analysis while keeping their original information (i.e., the piece is not compliant).
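The outlier rule in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a "candidate outlier" is taken to be any value outside the admissible range, and the names `Measurement`, `clean_outliers` and the `margin` parameter (how far outside the range a retained point may sit) are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    value: float
    compliant: bool  # station outcome recorded for the piece


def clean_outliers(data: List[Measurement], lower: float, upper: float,
                   margin: float = 0.05) -> List[Measurement]:
    """Discard inconsistent readings and pull extreme scrap values near the range.

    Assumption: any value outside [lower, upper] is a candidate outlier.
    """
    span = upper - lower
    high_cap = upper + margin * span   # position just above the admissible range
    low_cap = lower - margin * span    # position just below the admissible range
    cleaned: List[Measurement] = []
    for m in data:
        if lower <= m.value <= upper:
            cleaned.append(m)          # ordinary measurement, kept as is
        elif m.compliant:
            continue                   # out of range but compliant: inconsistent, discard
        elif m.value > upper:
            # non-compliant piece: keep it, but cap how far outside the range it lies
            cleaned.append(Measurement(min(m.value, high_cap), False))
        else:
            cleaned.append(Measurement(max(m.value, low_cap), False))
    return cleaned
```

With an admissible range of 0 to 10, a non-compliant reading of 1e6 would be moved to 10.5, while a compliant reading of -50 would be dropped as a faulty measurement.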

#### 4.3. Data Dimensionality Reduction

#### 4.4. Data Mining

#### 4.5. Development of Algorithm’s Selection Criteria

- Selection layer one deals with the mathematical nature of the problem. The statistical learning literature usually distinguishes two main objectives: inference problems, which involve the search for relationships between different variables and the assignment of individuals to specific categories based on the values of the parameters describing them; and prediction problems, which involve modeling variables and attempting to forecast their future behavior [31]. The first kind of problem should be tackled with classification techniques, the second with regression algorithms. Given the proposed analysis goal, which focuses on single process parameters, the present analysis has to solve a prediction problem.
- Selection layer two deals with the physical nature of the assembly system. Assembly systems behave in a very peculiar way, generating almost non-repeatable technical or qualitative issues on different subgroups, so the data take a very particular shape. Process parameter sequences show a generally stable and flat behavior, interrupted by local and isolated peaks caused by unique reasons. Each issue appears as an anomaly, namely a point in time where the behavior of the system is unusual and significantly different from its previous, normal behavior. Figure 4, which shows the time series of a process parameter collected in Bosch VHIT S.p.A., clearly illustrates this kind of behavior (see the area indicated by the arrow, highlighting the behavior that emerges after 3,100,000 pieces).

## 5. Data Mining via ARIMA Modeling

#### 5.1. Data Time Series Preparation

- Random fluctuations are absorbed within samples, which serves the purpose of the previously designed filters.
- The customized tuning of N frees the algorithm from the production rate of the assembly line. The choice of N is postponed to the SA procedure, because no specific rules have been found to set it. The reason is that almost all literature focuses on sampling techniques for continuous-time signals rather than discrete time series. For time series of discrete measurements, namely measurements referred to different individuals, literature focuses on sampling techniques for heterogeneous populations [36], while no specific criteria are provided for homogeneous measurements.
- The shift from physical process parameters to statistical moments frees the algorithm from the physical nature of the problem. In this way, it is effective for all peculiarities appearing in the assembly system, regardless of their qualitative or technical causes and of the process parameters involved.
- The inclusion of variables coming from the Statistical Process Control (SPC) world frees the algorithm from possible modifications of the process limits. Even after a manual modification of the boundaries, the algorithm is able to adapt itself automatically to the new conditions.
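The aggregation described in this list can be illustrated with a short Python sketch that turns a sequence of piece measurements into samples of size N, shifted by M pieces, and summarizes each sample with a few of the statistics listed in Table 2. The function name, the dictionary keys and the choice of which statistics to compute are assumptions made for illustration; the capability indices use the usual SPC form (bound minus mean, divided by three standard deviations).

```python
import math
import statistics
from typing import Dict, List


def sample_statistics(values: List[float], n: int, m: int,
                      b_low: float, b_up: float) -> List[Dict[str, float]]:
    """Summarize consecutive samples of n pieces, shifted by m pieces (n >= 2)."""
    rows: List[Dict[str, float]] = []
    x_mean = (n - 1) / 2                                   # mean of indices 0..n-1
    x_var = sum((i - x_mean) ** 2 for i in range(n))
    previous_slope = None
    for start in range(0, len(values) - n + 1, m):
        sample = values[start:start + n]
        mu = statistics.fmean(sample)
        sigma = statistics.stdev(sample)
        # least-squares slope of the measurement against its position in the sample
        slope = sum((i - x_mean) * (y - mu) for i, y in enumerate(sample)) / x_var
        rows.append({
            "mu": mu,
            "sigma": sigma,
            "cp_u": (b_up - mu) / (3 * sigma) if sigma > 0 else math.inf,
            "cp_l": (mu - b_low) / (3 * sigma) if sigma > 0 else math.inf,
            "theta": slope,
            "delta": 0.0 if previous_slope is None else slope - previous_slope,
            "s_plus": float(sum(y > b_up for y in sample)),   # pieces above the range
            "s_minus": float(sum(y < b_low for y in sample)), # pieces below the range
        })
        previous_slope = slope
    return rows
```

Because the bounds are passed in at every call, a manual change of the process limits only changes the arguments, which mirrors the adaptability argument made in the last item.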

**S**, and to apply Principal Component Analysis (PCA) on them, to reduce the analysis to a single variable again [37]. PCA is a data projection technique used to condense a dataset into a smaller set of artificial but uncorrelated variables, each chosen to capture as much of the remaining variance as possible. It is used to concentrate useful information in a small number of meaningful parameters. The number of artificial variables, called Principal Components (PCs), needed to represent the whole dataset with a sufficiently high degree of approximation is set through popular tools such as the Kaiser rule [29]. In this field of application, one should be able to combine the knowledge contained in each statistical moment and to detect the growth of all possible anomalies in one single artificial indicator. This strategy, with PCA applied to statistical moments related to a single physical variable, is supported both by the fault detection literature and by the alarm design literature [34]. In particular, the latter suggests that using PCA to generate warnings can be more efficient than traditional Qualitative Trend Analysis (QTA) methods, which are based on single variables [38]. The set of statistical tools used to perform PCA must be optimized through a SA process, to select a combination able to forecast all possible problems and to concentrate a sufficiently high percentage of variance in the first principal component, satisfying the Kaiser rule. The variables to be combined are described in Table 2. Each moment refers to a sample of size N. Again, the iterative nature of KDD is recalled: the selection of the ARIMA model constrains the analyst to move back to the data dimensionality reduction stage in order to achieve the best data shape for the specific situation.
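As a rough illustration of how the statistical moments of each sample could be collapsed into one indicator, the sketch below standardizes the columns, builds their correlation matrix and extracts the first principal component by power iteration. It uses the standard library only; the function name and return values are assumptions, and a numerical library would normally be preferred in practice. The Kaiser rule is usually stated as retaining components whose eigenvalue on the correlation matrix exceeds 1, so the eigenvalue is returned for that check.

```python
import math
from typing import List, Tuple


def first_principal_component(rows: List[List[float]],
                              iters: int = 500) -> Tuple[List[float], List[float], float, float]:
    """Return (scores, loadings, eigenvalue, explained fraction) of the first PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    if any(s == 0 for s in stds):
        raise ValueError("constant column: cannot standardize")
    # standardized data, so the covariance below is the correlation matrix
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)] for a in range(p)]

    # power iteration; the uneven start avoids an exactly orthogonal initial guess
    v = [1.0 + 0.1 * j for j in range(p)]
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed; try another start vector")
        v = [x / norm for x in w]

    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p))
    scores = [sum(zr[j] * v[j] for j in range(p)) for zr in z]  # one tracker value per sample
    explained = eigenvalue / p  # the trace of a correlation matrix equals p
    return scores, v, eigenvalue, explained
```

Each row passed in would hold the statistics of one sample, and the returned scores form a single time series with one value per sample.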

**t**.

#### 5.2. Application of Customized ARIMA Model

**t** to forecast its immediate future tendency. An ARIMA model is actually the sum of three models: AR stands for Autoregressive, I stands for Integrated and MA stands for Moving Average. The Integrated model can be seen just as an auxiliary tool used to satisfy the assumptions of the remaining two ones, and it will be treated later on. The core model is the ARMA one [39]. An ARMA(p,q) model is the sum of an Autoregressive model of order p (Equation (3)) and a Moving Average model of order q (Equation (4)), resulting in the global shape of (Equation (5)). In these equations, X is the modeled variable, meaning the tracker **t**, Z is a random variable characterized by a normal distribution (Equation (6)) and ϕ and ϑ are the coefficients of the learner.

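$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} \qquad \text{AR}(p)\ \text{model} \tag{3}$$

$$X_t = Z_t - \vartheta_1 Z_{t-1} - \cdots - \vartheta_q Z_{t-q} \qquad \text{MA}(q)\ \text{model} \tag{4}$$

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t - \vartheta_1 Z_{t-1} - \cdots - \vartheta_q Z_{t-q} \qquad \text{ARMA}(p,q)\ \text{model} \tag{5}$$

$$Z_t \sim N(0, \sigma^2) \tag{6}$$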
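To make the sign convention of Equation (5) concrete, the following sketch generates an ARMA(p,q) sequence term by term, with the moving-average coefficients subtracted as in the equations above. The function name, the seeding scheme and the choice to treat values before the start of the series as zero are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def simulate_arma(phi: Sequence[float], theta: Sequence[float], n: int,
                  sigma: float = 1.0, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Generate n values of X_t following Equation (5); returns (x, z)."""
    rng = random.Random(seed)
    x: List[float] = []
    z: List[float] = []
    for t in range(n):
        z_t = rng.gauss(0.0, sigma)            # Z_t ~ N(0, sigma^2), Equation (6)
        # autoregressive part: phi_1 X_{t-1} + ... + phi_p X_{t-p}
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # moving-average part: theta_1 Z_{t-1} + ... + theta_q Z_{t-q}
        ma = sum(theta[j] * z[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar + z_t - ma)                # note the minus sign on the MA terms
        z.append(z_t)
    return x, z
```

Calling `simulate_arma([0.5], [0.3], 200)` would produce an ARMA(1,1) sequence; setting `theta` to an empty list reduces the recursion to the autoregressive part plus noise.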

^{2} points and it is reasonable to model it in a single step, e.g., through the Box–Jenkins method [40]. On the other hand, the studied industrial databases can reach 10^{4}–10^{5} points and, despite data aggregation, the final length is definitely larger than the one mentioned previously. Therefore, a single model could never be accurate enough to describe the entire series. Moreover, since the general tendency of the assembly process is to keep a flat behavior, a model built on the entire data history will generate flat and, consequently, meaningless results, precluding the capability of forecasting any problem. To obtain effective and reliable forecasts, this work suggests modeling the series step by step, considering only the last L points of the whole data sequence, so that emerging anomalies weigh more heavily. Of course, for the sake of feasibility, each step must be automated, to avoid continuous manual tuning of the ARIMA orders.
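The step-by-step idea can be sketched as follows: at each iteration only the last L points are used, a small model is fitted, and the next F points are forecast and compared with the threshold T. The sketch deliberately swaps in a fixed ARIMA(1,1,0) with drift, fitted by least squares, in place of the automatic selection of p, d and q described in the paper; all function names, and the direction in which the threshold is crossed, are assumptions.

```python
from typing import List


def forecast_window(series: List[float], window: int, horizon: int) -> List[float]:
    """Forecast `horizon` points from the last `window` points (window >= 3)."""
    recent = series[-window:]
    diffs = [b - a for a, b in zip(recent[:-1], recent[1:])]   # d = 1: first differences
    drift = sum(diffs) / len(diffs)
    centered = [w - drift for w in diffs]
    den = sum(c * c for c in centered[:-1])
    phi = sum(a * b for a, b in zip(centered[:-1], centered[1:])) / den if den > 0 else 0.0
    phi = max(-0.99, min(0.99, phi))       # keep the AR(1) recursion stable
    forecasts: List[float] = []
    level, deviation = recent[-1], centered[-1]
    for _ in range(horizon):
        deviation = phi * deviation        # AR(1) forecast of the centered difference
        level = level + drift + deviation  # integrate back to the original scale
        forecasts.append(level)
    return forecasts


def crosses_threshold(forecasts: List[float], threshold: float, above: bool = True) -> bool:
    """Return True if any forecast point passes the threshold in the given direction."""
    if above:
        return any(v > threshold for v in forecasts)
    return any(v < threshold for v in forecasts)
```

Re-running `forecast_window` every time a new sample arrives reproduces the sliding behavior: old history drops out of the window, so a recent drift dominates the fit instead of being averaged away by a long flat past.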

#### 5.3. Parameters Optimization

## 6. Results

#### 6.1. Customized KDD Application

#### 6.2. Summary of Other Customized KDD Applications

## 7. Conclusions

- Exploitation of data analysis in the assembly environment. Literature research has revealed a strong imbalance between the research effort devoted to machining data and that devoted to assembly data, in favor of the former. This work could help reduce this gap.
- Standardized and flexible framework of the presented procedure. The KDD-based backbone of the process, together with the systematic rules provided in each section of the process, gives it a standardized nature, which is fundamental to allow its full-scale implementation in the manufacturing world. On the other hand, the iterative and parametric policies adopted on different layers of the procedure leave some degrees of freedom with respect to its general architecture, allowing it to be customized.
- Definition of rigorous and effective guidelines for the selection of data mining algorithms. The two-stage selection technique proposed in Section 4.5 appears as a powerful guideline for minimizing the user's effort in the choice and application of statistical learning algorithms.

#### Future Work

- Research into alternative algorithms satisfying the two-stage selection procedure discussed in Section 4.5. The target should be to compare and assess the proposed algorithm (ARIMA) against alternative ones, or possibly to replace it with more satisfactory solutions. This point is mainly driven by the extremely wide range of methods contained in the novelty detection literature, which must be investigated when applied to manufacturing environments.
- Research into tools able to work on entire assembly systems instead of single process parameters. Even though this work focuses on single variables, according to the needs of the company and to the physics of its processes, in other scenarios relationships between different parameters may exist, pushing towards the identification of hidden patterns. A possible target for this kind of system could be the creation of a “Digital Check”, namely an instrument able to predict the outcome of a certain workstation by analyzing the outcomes of previous ones.
- Research into architectures able to industrialize the presented methodology in a structured way. Tools such as a Decision Support System (DSS), with high degrees of automation and easy implementation on the company’s Manufacturing Execution System (MES), could be a reasonable solution for companies deciding to adopt the presented solution [43,44,45,46,47].

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Shin, J.H.; Jun, H.B. On condition based maintenance policy. J. Comput. Des. Eng. **2015**, 2, 119–127.
2. Goel, P.; Datta, A.; Sam Mannan, M. Industrial alarm systems: Challenges and opportunities. J. Loss Prev. Process Ind. **2017**, 50, 23–26.
3. Jiang, Y.; Yin, S. Recursive total principle component regression based fault detection and its application to Vehicular Cyber-Physical Systems. IEEE Trans. Ind. Inf. **2018**, 4, 1415–1423.
4. Jiang, Y.; Yin, S.; Kaynak, O. Data-driven Monitoring and Safety Control of Industrial Cyber-Physical Systems: Basics and Beyond. IEEE Access **2018**, 6, 47374–47384.
5. Bumbaluskas, D.; Gemmill, D.; Igou, A.; Anzengruber, J. Smart Maintenance Decision Support System (SMDSS) based on corporate data analytics. Expert Syst. Appl. **2017**, 90, 303–317.
6. Ge, Z.; Song, Z.; Ding, D.X.; Haung, A.B. Data Mining and analytics in the process industry: The role of machine learning. IEEE Access **2017**, 5, 20590–20616.
7. Mourtzis, D.; Vlachou, K.; Milas, N. Industrial Big Data as a result of IoT adoption in manufacturing. Procedia CIRP **2016**, 55, 290–295.
8. Kranz, M. Building the Internet of Things: Implement New Business Models, Disrupt Competitors, Transform Your Industry; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2016.
9. Tiwari, S.; Wee, H.M.; Daryanto, Y. Big data analytics in supply chain management between 2010 and 2016: Insights to industries. Comput. Ind. Eng. **2018**, 115, 319–330.
10. Piateski, G.; Frawley, W. Knowledge Discovery in Databases; MIT Press: Cambridge, MA, USA, 1991.
11. Gamarra, C.; Guerrero, J.M.; Montero, E. A knowledge discovery in databases approach for industrial microgrid planning. Renew. Sustain. Energy Rev. **2016**, 60, 615–630.
12. Cheng, G.Q.; Zhou, B.H.; Li, L. Integrated production, quality control and condition-based maintenance for imperfect production systems. Reliab. Eng. Syst. Saf. **2018**, 175, 251–264.
13. Mourtzis, D.; Vlachou, E. A cloud-based cyber-physical system for adaptive shop-floor scheduling and condition-based maintenance. J. Manuf. Syst. **2018**, 47, 179–198.
14. Kumar, S.; Goyal, D.; Dang, R.K.; Dhami, S.S.; Pabla, B.S. Condition based maintenance of bearings and gears for fault detection - A review. Mater. Today **2018**, 5, 6128–6137.
15. Bengtsson, M.; Kurdve, M. Machining Equipment Life Cycle Costing Model with Dynamic Maintenance Cost. Procedia CIRP **2016**, 48, 102–107.
16. Keliris, C.; Polycarpou, M.; Parisini, T. A distributed fault detection filtering approach for a class of interconnected continuous-time nonlinear systems. IEEE Trans. Autom. Control **2013**, 58, 2032–2047.
17. Mahmoud, M.; Shi, P. Robust Kalman filtering for continuous time-lag systems with markovian jump parameters. IEEE Trans. Circuits Syst. **2003**, 50, 98–105.
18. Nahmias, S.; Lennon Olsen, T. Production and Operations Analysis; Waveland Press Inc.: Long Grove, IL, USA, 2015.
19. Fayyad, U.; Stolorz, P. Data mining and KDD: Promise and challenge. Future Gener. Comput. Syst. **1997**, 13, 99–115.
20. Gullo, F. From Patterns in Data to Knowledge Discovery: What Data Mining can do. Phys. Procedia **2015**, 62, 18–22.
21. Galar, D.; Kans, M.; Schmidt, B. Big Data in Asset Management: Knowledge Discovery in Asset Data by the Means of Data Mining. In Proceedings of the 10th World Congress on Engineering Asset Management, Tampere, Finland, 28–30 September 2015.
22. Choudhary, A.K.; Harding, J.A.; Tiwari, M.K. Data Mining in manufacturing: a review based on the kind of knowledge. Adv. Eng. Inf. **2008**, 33, 501.
23. Qu, Z.; Liu, J. A new method of power grid huge data pre-processing. Procedia Eng. **2011**, 15, 3234–3239.
24. Bilalli, B.; Abellò, A.; Aluja–Banet, T.; Wrembel, R. Intelligent assistance for data pre-processing. Comput. Stand. Interfaces **2017**, 57, 101–109.
25. Munson, M.A. A study on the importance of and time spent on different modeling steps. ACM SIGKDD Explor. Newsl. **2011**, 13, 65–71.
26. Garces, H.; Sbarbaro, D. Outliers detection in industrial databases: An example sulphur recovery process. World Congr. **2011**, 18, 1652–1657.
27. Nisbet, R.; Miner, G.; Yale, K. Handbook of Statistical Analysis and Data Mining Applications; Academic Press: Burlington, MA, USA, 2018.
28. Gandomi, A.; Haider, M. Beyond the hype: Big Data concepts, methods and analytics. Int. J. Inf. Manag. **2014**, 35, 137–144.
29. Saporta, G.; Niang, N. Data Analysis; ISTE Ltd.: London, UK, 2009; pp. 1–23.
30. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013.
31. Hastie, T.; Tibshirani, R.; Friedman, J. The elements of Statistical Learning; Springer: New York, NY, USA, 2001.
32. Pimentel, M.A.F.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. **2014**, 99, 215–249.
33. Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing **2017**, 262, 134–147.
34. Baptista, M.; Sankararaman, S.; de Medeiros, I.P.; Nascimento, C.Jr; Prendiger, H.; Henriques, E.M.P. Forecasting fault events for predictive maintenance using data-driven techniques and ARMA modeling. Comput. Ind. Eng. **2018**, 115, 41–53.
35. Janouchová, E.; Kučerová, A. Competitive comparison of optimal designs of experiments for sampling-based sensitivity analysis. Comput. Struct. **2013**, 124, 47–60.
36. Barreiro, P.L.; Albandoz, J.P. Population and sample. Sampling techniques. Available online: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwi9kez10ebdAhWOyKQKHXmvCVMQFjAAegQIBxAC&url=https%3A%2F%2Foptimierung.mathematik.uni-kl.de%2Fmamaeusch%2Fveroeffentlichungen%2Fver_texte%2Fsampling_en.pdf&usg=AOvVaw2btopZugJaU8jsfUXEfm2l (accessed on 2 October 2018).
37. French, A.; Chess, S. Canonical Correlation Analysis & Principal Component Analysis. Available online: http://userwww.sfsu.edu/efc/classes/biol710/pca/CCandPCA2.htm (accessed on 2 October 2018).
38. Chen, K.; Wang, J. Design of multivariate alarm systems based on online calculation of variational directions. Chem. Eng. Res. Des. **2017**, 122, 11–21.
39. Neusser, K. Time Series Econometrics; Springer: New York, NY, USA, 1994.
40. Model Selection. In Econometrics Toolbox™ User’s Guide; The MathWorks Inc.: Natick, MA, USA, 2001.
41. Statistics Toolbox™ User’s Guide; The MathWorks Inc.: Natick, MA, USA, 2016.
42. Woods, D.C.; McGree, J.M.; Lewis, S.M. Model selection via Bayesian information capacity designs for generalized linear models. Comput. Stat. Data Anal. **2016**, 113, 226–238.
43. Prasad, D.; Ratna, S. Decision support systems in the metal casting industry: An academic review of research articles. Mater. Today Proc. **2018**, 5, 1298–1312.
44. Krzywicki, D.; Faber, L.; Byrski, A.; Kisiel-Dorohinicki, M. Computing agents for decision support systems. Future Gener. Comput. Syst. **2014**, 37, 390–400.
45. Li, H.; Pang, X.; Zheng, B.; Chai, T. The architecture of manufacturing execution system in iron & steel enterprise. IFAC Proc. Vol. **2005**, 38, 181–186.
46. Jiang, P.; Zhang, C.; Leng, J.; Zhang, J. Implementing a WebAPP-based Software Framework for Manufacturing Execution Systems. IFAC-PapersOnLine **2015**, 48, 388–393.
47. Itskovich, E.L. Fundamentals of Design and Operation of Manufacturing Executive Systems (MES) in Large Plants. IFAC Proc. Vol. **2013**, 46, 313–318.

**Figure 4.** Time series of the air volumetric flow rate measured within a vacuum pump to test for the presence of leakages. The typical behavior is flat, while the peculiar behavior rising at the end of the sequence corresponds to an issue observed in the pneumatic circuit of the bench.

**Figure 5.** Functioning of the proposed algorithm for a general k-th iteration. Model orders p, d, q are set automatically through statistical inference hypothesis tests, in order to avoid manual intervention.

**Figure 7.** Time series of measured air flow rate. The degradation of an o-ring, a round gasket meant to prevent air leakages, generates a decreasing path until the lower acceptable bound is crossed and the station starts discarding products. Each point represents a processed product.

**Figure 8.** Optimal tracker time series (green curve) vs. number of scraps per sample (red curve), referring to flow rate of Figure 10. The outcome of the first SA step suggests using N = 400, M = 40, T = −29.5237 and combining μ, σ, P_{u}, P_{l}, Cp_{u}, Cp_{l}, θ, Δ to build the tracker. Each point represents the tracker value for a sample of N products, while the black line stands for the admissible threshold computed for the tracker.

**Figure 9.** Optimization stage for tracker segment length L. Values from 20 to 100, with a step of 10 between two attempts, have been tested. L = 40 is selected since it maximizes BIC. Each value is repeated g = 25 times and the statistical distribution of the corresponding results is provided in the associated box plot.

**Figure 10.** Optimization stage for future window F. g = 25 values of F ensuring PMSE < 10% have been tested. F = 6 (minimum) has been selected to be conservative and to avoid false alarms.

**Figure 11.** Algorithm’s forecast (green line) against real tracker behavior (red line), in the case of nominal operating conditions (sample 83 of tracker time series).

**Figure 12.** Algorithm’s forecast (green line) against real tracker behavior (red line), when the valve starts wearing (sample 210 of tracker time series). The crossing of the tracker's acceptable threshold triggers the warning given by the code.

**Table 1.** Parameters to be tuned in the two macro-stages of the proposed procedure to optimize the algorithm’s performance. DET parameters are determined “a priori” by the algorithm itself, while OPT parameters are tuned, starting from a reasonable first attempt value, through a SA procedure.

Phase | Parameter | Nature | Description |
---|---|---|---|
1 - time series processing | N | OPT | Size of a sample of aggregated process parameters |
1 - time series processing | M | OPT | Shift between two consecutive samples |
1 - time series processing | S | OPT | Set of main statistical magnitudes describing each sample |
1 - time series processing | T | OPT | Tracker threshold: the system moves to a status of not-normality in case it is trespassed |
2 - ARIMA modeling | L | OPT | Historical time window of time series used to fit ARIMA model |
2 - ARIMA modeling | d | DET | Integrating model order |
2 - ARIMA modeling | p | DET | Autoregressive model order |
2 - ARIMA modeling | q | DET | Moving Average model order |
2 - ARIMA modeling | F | OPT | Future time window of time series forecast by ARIMA model |

**Table 2.** Parameters to be combined in PCA to build the tracker. The tracker time series can be considered the definitive output of the data preparation stage and the data shape able to maximize the performance of the ARIMA model.

Parameter | Description |
---|---|
μ | Mean value |
σ | Standard deviation |
P_{u} | Estimated probability of being above upper bound (B_{up}) |
P_{l} | Estimated probability of being below lower bound (B_{low}) |
Cp_{u} | System capability of being below B_{up}: (B_{up} − μ)/(3σ) |
Cp_{l} | System capability of being above B_{low}: (μ − B_{low})/(3σ) |
θ | Linear regression slope |
Δ | Difference between two consecutive linear regression slopes: θ_{i} − θ_{i−1} |
A^{+} | Number of pieces lying inside process admissible range but above a confidence interval of 95%, assuming normal data distribution (μ + 3σ) |
A^{−} | Number of pieces lying inside process admissible range but below a confidence interval of 95%, assuming normal data distribution (μ − 3σ) |
S^{+} | Number of pieces lying above process admissible range |
S^{−} | Number of pieces lying below process admissible range |

**Table 3.** Advance achievable with the proposed tool with respect to the original warning systems, on both technical and qualitative issues.

Problem | Nature of the Issue | Affected Process Parameter | Algorithm’s Performance | N | M | L | F |
---|---|---|---|---|---|---|---|
Obstruction of a pipe due to engine oil drops | Technical | Air volumetric flow rate | Identification in advance of 61 h | 800 | 100 | 20 | 8 |
Porosity in pump’s channels | Qualitative | Air volumetric flow rate | Identification in advance of 2.3 h | 100 | 10 | 40 | 4 |
Wear and sliding of a spindle | Technical | Torque absorbed by the pump | Identification in advance of 16 h | 400 | 40 | 40 | 6 |
Non-compliant material of screws | Qualitative | Screwing torque | Identification in advance of 1.3 h | 750 | 100 | 30 | 5 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Storti, E.; Cattaneo, L.; Polenghi, A.; Fumagalli, L. Customized Knowledge Discovery in Databases methodology for the Control of Assembly Systems. *Machines* **2018**, *6*, 45.
https://doi.org/10.3390/machines6040045
