# Using Ensembles for Accurate Modelling of Manufacturing Processes in an IoT Data-Acquisition Solution

## Abstract

## 1. Introduction

## 2. State of the Art

#### 2.1. Data Acquisition Platforms in IoT Solutions

#### 2.2. Machine Learning Techniques Applied to Machining Optimization

- Regarding the reduction of the number of expensive experimental tests, Oleaga et al. [15] demonstrated that some machine-learning algorithms, for example Random Forest ensembles, can provide accurate prediction models for critical depth of cut and chatter frequency in milling operations with fewer experimental tests than experimental or analytical models require.
- With respect to the extraction of useful information from imbalanced datasets, Bustillo and Rodriguez [16] demonstrated that some of those techniques, such as ensembles, can overcome unbalanced data in an extensive experimental dataset. They validated this approach to unbalanced data through industrial breakage detection of multitooth tools in real industrial datasets, showing successful detection of 59 insert breakages from a total dataset of 30,000 mechanized crankshafts.
- As for the reduction of the number of features without losing information, Grzenda et al. [17] demonstrated that Multilayer Perceptrons can make reliable predictions of surface roughness for face-milling operations, following dataset dimensionality reduction. They reduced the number of accelerometers needed in this case for reliable machine process monitoring. Grzenda and Bustillo [18] proposed the use of a genetic algorithm with neural networks to identify the best set of inputs to provide accurate prediction models for surface quality in high-torque face milling operations, reducing the data-acquisition costs within industrial environments.
- The capability of machine-learning techniques to complete missing attributes caused by sensor malfunction or data-transmission errors has also been studied in a few works. Grzenda et al. [19] demonstrated that Genetic Algorithms and Multilayer Perceptrons can complete damaged datasets in deep drilling operations of steel components to predict borehole roughness. In addition, machine-learning techniques have proved capable of creating specially designed visual models for real-time visual processing of many manufacturing processes. Teixidor et al. [20] used different machine-learning algorithms, including k-Nearest Neighbors, neural networks and decision trees, to model outputs of industrial interest in pulsed laser micromachining of micro geometries, such as dimensional accuracy, surface roughness and material removal rate. They demonstrated reliable models of immediate industrial application by means of decision trees, which can produce direct rules or 3D charts that optimize process parameters.
- The suitability of machine-learning techniques to build reliable prediction models for different machining-process cutting outcomes has been widely demonstrated. Bustillo et al. [21] proposed the use of Bayesian Networks for breakage detection of cutting tools in crankshaft machining. They showed the flexibility of these types of networks to extract process information of direct industrial use by means of inquiries forwarded to the Bayesian Network when evaluating tool wear in a discretized way: broken or normal. Karandikar et al. [22] proposed a Bayesian inference method to evaluate tool life for end milling of AISI 1018 in a continuous way, instead of with a discretized number of levels. Along the same lines, but using Multilayer Perceptrons, Mikołajczyk et al. [23] described the use of MLP-based automatic image analysis for assessing tool wear. Their results promised a good correlation between the new methods and the commonly used optically measured VB index for the entire life range of the tools. The same two strategies (continuous or discretized output) have been proposed for the two main workpiece-related quality indicators: surface roughness and dimensional accuracy. Grzenda et al. [17] built accurate Multilayer-Perceptron models for surface roughness prediction in cast-iron face-milling operations. Facing the same task, Rodríguez et al. [24] proposed surface roughness prediction in face milling through the use of decision trees, for their immediate implementation by process engineers in workshops, instead of black box models such as Multilayer Perceptrons (MLPs). Bustillo et al. [25] defended the advantages of Bayesian Networks to predict surface roughness in deep drilling operations with steel components, in this case by discretizing surface roughness using an industrial standard: ISO 4288:1996. Workpiece dimensional accuracy as a continuous output has also been modeled with neural networks and decision trees in pulsed laser micromachining of hardened AISI H13 tool steel by Teixidor et al. [20]. In contrast, Ferreiro et al. [26] demonstrated that machine-learning algorithms, especially Bayesian Networks and decision trees, were more accurate than mathematical models for the detection of burr during high-speed drilling in dry conditions on aluminum Al 7075-T6. Detection is classified, in this case, as admissible and non-admissible burr, considering industrial tolerances for this process.

#### 2.3. Machine Learning Techniques and Unbalanced Industrial Data

## 3. Data-Acquisition Set Up

## 4. Modeling

#### 4.1. Dataset Description

#### 4.2. Machine-Learning Techniques and Best Metrics

#### 4.2.1. Classification

- TP (i.e., true positives): number of times the classifier correctly predicts the minority class.
- FP (i.e., false positives): number of times the classifier incorrectly predicts instances of the majority class as instances of the minority class.
- FN (i.e., false negatives): number of times the classifier is wrong when predicting minority class instances as majority class instances.

1. When using Equation (2) and $TP+FP=0$, precision is undefined.
2. When using Equation (3) and $TP+FN=0$, recall is undefined.
3. When using Equation (5), $TP+FN=0$ (i.e., no positive instances in the test partition) and also $TP+FP=0$ (i.e., the classifier predicts no instance as positive).
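
The undefined cases listed above can be made explicit in code. The following is a minimal sketch, assuming that Equations (2), (3) and (5) are the usual precision, recall and F-measure definitions in terms of TP, FP and FN (the equations themselves are not reproduced in this section). Returning `None` for an undefined value is an illustrative choice, not necessarily the convention used in the experiments.

```python
from typing import Optional


def precision(tp: int, fp: int) -> Optional[float]:
    """TP / (TP + FP); None when the classifier predicts no positives."""
    denominator = tp + fp
    return tp / denominator if denominator > 0 else None


def recall(tp: int, fn: int) -> Optional[float]:
    """TP / (TP + FN); None when the partition holds no positive instances."""
    denominator = tp + fn
    return tp / denominator if denominator > 0 else None


def f_measure(tp: int, fp: int, fn: int) -> Optional[float]:
    """2TP / (2TP + FP + FN); None only when TP, FP and FN are all zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else None


if __name__ == "__main__":
    # Case 1: no instance predicted as positive, so precision is undefined.
    print(precision(tp=0, fp=0))
    # Case 3: no positives in the partition and none predicted.
    print(f_measure(tp=0, fp=0, fn=0))
```

Making the undefined value explicit forces the caller to decide how such partitions enter an average, rather than silently treating them as zero.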

1. Naïve Bayes [34], which classifies by assigning probabilities to each class using Bayes' theorem. It is taken as a baseline to compare with the other classifier methods, due to its simplicity, so methods with worse results than Naïve Bayes would not be acceptable.
2. kNN [35] calculates the distance to all instances from the instance to predict (Euclidean distance was used in the experiments). The majority class of the closest k instances is predicted. The value of k is a parameter of this algorithm. For each cross-validation partition in the experiments, the optimal k value from the integers 1 to 10 was used.
3. Decision trees. J48 was used, which is the WEKA implementation of the C4.5 decision tree [36]. The branching criterion of C4.5 is the information gain.
4. Function-based methods, such as Support Vector Machines (SVM) and Neural Networks:
- (a) The Multilayer Perceptron [37] is a neural network that has a number of hidden layers of neurons. The connections between the neurons have a weight which is obtained from a backpropagation algorithm. In the experiments, only one hidden layer was used. The number of neurons in this layer was given by the heuristic (number of independent variables + 1)/2. The learning rate and momentum parameters were optimized through internal cross validation, in order to maximize the F-Macro. The WEKA Multisearch package was modified for this purpose, as it does not support optimization by F-Macro.
- (b) Support vector machines or SVMs [38] are actually classifiers for binary problems. They calculate a hyperplane that separates the regions of space corresponding to two classes. This hyperplane is said to maximize the margin, which intuitively means that it is as far as possible from the points on the border of both regions. These points are the so-called support vectors. Therefore, for the SVM to work properly, the regions corresponding to both classes need to be linearly separable, which is often not the case. There are two strategies to adapt the algorithm to problems that are not linearly separable:
- A parameter C is introduced in the algorithm. It represents how much the incorrect classification of training instances (i.e., the ones falling on the wrong side of the hyperplane) is penalized by the margin optimization procedure.
- A transformation of the classification problem from the original feature space to another space by means of a non-linear transformation. In that new space, it is expected that the problem will be linearly separable, or at least, that the number of instances that cause the lack of linear separability will be reduced. To reach this goal, the concept of kernel is introduced. A kernel is a special type of function that computes the scalar products between instances in the new space. This scalar product is necessary for the calculation of the hyperplane. The most popular kernel is the Radial Basis Function kernel (SVM-RBF), which is given by Equation (6):

$$K(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\gamma^2}}$$

The letter $\gamma$ represents the bandwidth, and it is a parameter of the method. When an SVM does not use a kernel, it is said to be an SVM with a linear kernel. Whether or not a linear kernel is used, the C parameter is present and must be specified.
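
As an illustration of the kernel described above, the sketch below evaluates a Radial Basis Function kernel for two instances. It assumes the bandwidth form $e^{-\|x_i - x_j\|^2 / (2\gamma^2)}$; the function name and the example vectors are hypothetical. Libraries may parameterize the same kernel differently (LibSVM, for instance, uses $e^{-\gamma \|u - v\|^2}$), so the numeric value of $\gamma$ is not directly transferable between conventions.

```python
import math
from typing import Sequence


def rbf_kernel(x_i: Sequence[float], x_j: Sequence[float], gamma: float) -> float:
    """RBF kernel in bandwidth form: exp(-||x_i - x_j||^2 / (2 * gamma^2))."""
    if gamma <= 0:
        raise ValueError("gamma (bandwidth) must be positive")
    if len(x_i) != len(x_j):
        raise ValueError("instances must have the same number of features")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2 * gamma ** 2))


if __name__ == "__main__":
    # Identical instances give the maximum similarity of 1.0;
    # the value decays towards 0 as the instances move apart.
    print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=1.0))
    print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
```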

Adapting SVM to multi-class problems is done using the 1 vs. 1 strategy, whereby a problem of n classes becomes n(n−1)/2 binary problems. These problems result from confronting each class with each of the others. For each binary problem, an SVM is then trained, and the final prediction comes from the majority vote of all those SVMs.

Two implementations of SVM were used in the experiments: LibLinear [40] for the linear SVM and LibSVM [39] for the SVM-RBF. For LibLinear, the C parameter was optimized, and in the case of LibSVM, the $\gamma $ parameter was also optimized. In all cases, cross validation was used on the training set. The aim of both optimizations was to maximize the F-Macro, using the same modification of the WEKA Multisearch package described above for the Multilayer Perceptron.
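
The 1 vs. 1 decomposition can be sketched as follows: one binary problem is created per unordered pair of classes, and the final prediction is the most voted class. The class labels below are hypothetical placeholders, and the votes are supplied by hand instead of coming from trained SVMs.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def one_vs_one_pairs(classes: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Class that receives the most votes from the binary classifiers."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    classes = ["c1", "c2", "c3", "c4"]  # hypothetical class labels
    pairs = one_vs_one_pairs(classes)
    print(len(pairs), pairs)  # 4 classes yield 6 pairwise problems
    # Hypothetical winners of each of the 6 pairwise classifiers.
    print(majority_vote(["c1", "c3", "c1", "c3", "c2", "c3"]))
```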

5. Bagging-based methods [41], in which several decision trees are created. These trees are different because each one is trained on a resampling of the training set. The final prediction in these ensembles is the one obtained by the majority vote of these trees. The trees used for the ensemble are commonly referred to as base classifiers. The Bagging variants used in the experiment were:
- (a) Bagging using C4.5 trees as base classifiers.
- (b) Bagging using Random Balance of C4.5 trees [42]. Random Balance changes the number of instances of each class before they are used by each C4.5. Rather than balancing the number of instances of each class, this technique assigns a random number of instances to each. To do so, it resamples the training set when it needs fewer instances of a class, and it creates synthetic instances when it needs more. To create these synthetic instances, it uses SMOTE [27]. Random Balance is oriented towards unbalanced multiclass datasets, so it is a priori an approach that fits the nature of the problem under analysis very well.
- (c) Random Forest [43] can be considered a Bagging technique in which the base classifiers are “Random Trees”. In these decision trees, each time a branching node is to be built, the possible attributes to be considered are randomly restricted. In WEKA, Random Trees are implemented as a modification of REP-Trees (i.e., Reduced Error Pruning trees), which have a fast building process.
- (d) Rotation Forest [44]. In this method, before building each tree, and once the training set has been resampled, the features are grouped randomly. These groups are disjoint, and the union of all the groups contains all the attributes. All groups have the same number of attributes n, except perhaps one of the groups, when the number of attributes of the dataset is not divisible by n. In each group, the PCA projection is computed, taking only a number of principal components (i.e., n in most cases) to keep the size of the groups unchanged. In the experiments, Rotation Forest of C4.5 was used, as was Rotation Forest of Random Forest, because its training time is significantly shorter with the same number of trees.
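
The random grouping step of Rotation Forest can be sketched as below. This is only the partition of the attributes into disjoint groups of size n; the resampling and the PCA projection applied to each group are omitted. The function name, the seed parameter and the attribute names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def random_feature_groups(
    features: Sequence[str], group_size: int, seed: Optional[int] = None
) -> List[List[str]]:
    """Shuffle the attributes and cut them into disjoint groups of group_size.

    The last group is smaller when the number of attributes is not
    divisible by group_size.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    rng = random.Random(seed)
    shuffled = list(features)
    rng.shuffle(shuffled)
    return [
        shuffled[start:start + group_size]
        for start in range(0, len(shuffled), group_size)
    ]


if __name__ == "__main__":
    attributes = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]  # hypothetical names
    for group in random_feature_groups(attributes, group_size=3, seed=0):
        print(group)
```

Because a different grouping is drawn for every tree, each base classifier sees a differently rotated version of the same attributes, which is where the diversity of the ensemble comes from.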

All Bagging configurations have been computed using 100 trees. For this reason, Rotation Forest of Random Forest calculates 10 Random Forests, each containing 10 trees.

6. Finally, two Boosting-based ensembles were included for classification:
- (a) AdaBoost.M1 [45], because it is the most popular Boosting ensemble for classification. In this ensemble each base classifier is derived from the previous one, so that it gives more weight to the instances that the base classifier of the immediately preceding iteration has incorrectly classified. Unlike Bagging, the final prediction is not made by a majority vote of the base classifiers, but by a vote weighted by the individual error of each one. In the experiments, 100 C4.5 trees were taken as the base classifiers.
- (b) LogitBoost [46], because it is more suitable than AdaBoost.M1 for working with multi-class problems. This method makes a logistic transformation to convert the classification problem into a regression problem that predicts the probability of the instance belonging to each class. Therefore, LogitBoost uses regressors (i.e., continuous value predictors), instead of classifiers, as base predictors. LogitBoost uses an additive regression in each iteration by appending regressors that learn the residuals of the probability predictions (i.e., they learn the differences between the predicted values and the values of the training set). In each iteration, the probabilities of each training instance can be estimated by adding the predictions of the residuals. With that estimation, weights may be given to the instances, to add more importance to those where a higher prediction error occurs. The base regressors used in the experiments were REPTree regression trees. The size of the ensemble is 100 iterations as in the other cases.
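
The reweighting idea behind AdaBoost.M1 can be sketched for a single iteration as follows. The sketch follows the usual formulation of the algorithm (weighted error, factor $\beta = \epsilon/(1-\epsilon)$, vote weight $\log(1/\beta)$); it is not taken from the WEKA implementation used in the experiments, and the function name and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_m1_round(
    weights: Sequence[float], correct: Sequence[bool]
) -> Tuple[List[float], float]:
    """One AdaBoost.M1 iteration.

    weights: current instance weights (assumed to sum to 1).
    correct: whether the current base classifier got each instance right.
    Returns the renormalized weights and the vote weight of the classifier.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        # AdaBoost.M1 stops when the base classifier is perfect or too weak.
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    beta = error / (1.0 - error)
    # Correctly classified instances are down-weighted, so the misclassified
    # ones gain relative importance after renormalization.
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    new_weights = [w / total for w in updated]
    vote_weight = math.log(1.0 / beta)
    return new_weights, vote_weight


if __name__ == "__main__":
    new_weights, vote = adaboost_m1_round(
        [0.25, 0.25, 0.25, 0.25], [True, True, True, False]
    )
    print(new_weights, vote)
```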

#### 4.2.2. Regression

1. Function-based methods such as Linear Regression, Support Vector Machines (SVM), and Neural Networks:
- (a) Linear Regression creates a function that is a linear combination of the input variables, which minimizes the quadratic error of the training set. The Akaike criterion, as the WEKA implementation default, is used to select the attributes of the model.
- (b) The simplest version of SVM for regression is the linear SVM [47]. As in the case of linear regression, we also have a model that is a linear combination of the input variables (i.e., a hyperplane), but this time the optimization process ignores the instances that are less than $\epsilon$ away from that hyperplane (i.e., they are within the “margin”). The distance $\epsilon$ is a parameter of the method, and the penalty on instances outside the margin is weighted by a second parameter, denoted by C in most implementations. The implementation used in the experiment was LibLinear [40], for which the C parameter was optimized by cross validation, to minimize RRSE, using WEKA’s Multisearch package. The SVM for regression, like its version for classification, can use a kernel and generate the hyperplane in a different space, leading to a more complex geometry than in the original space, which is more appropriate if the behavior of the variable to be predicted is not linear. Therefore, in the experiment we have also used an SVM regressor with a Radial Basis Function kernel (SVM-RBF). In the case of this kernel, besides optimizing C, it is necessary to optimize the bandwidth parameter. As in other methods, Multisearch was also used, using internal cross validation and aiming at minimizing the RRSE. The SVM-RBF implementation used in the experiment is the regression version of LibSVM [39].
- (c) The neural network method used is the same as that described for classification (i.e., a Multilayer Perceptron [37]). As in the classification, only one hidden layer was used. The number of neurons in this layer is also given by the heuristic (number of independent variables + 1)/2. Likewise, the learning rate and momentum parameters were optimized with Multisearch, again through internal cross validation, to minimize the RRSE.

2. kNN [48] works similarly to kNN for classification. The difference is that, once the k instances closest to the test instance are found, the mean of the target values of those k instances is predicted. As for classification, the value of k was also optimized in the experiments for each cross-validation partition, with the best value obtained for k ranging between 1 and 10.
3. Regression trees. Two types of decision trees for regression were used:
- (a) M5P [49], which is a type of tree in which a linear regression is created with the instances in each leaf, so the prediction that is returned is the one of the linear regression corresponding to the leaf in which the test instance falls.
- (b) REP-Tree [33] is WEKA’s “native” decision tree for both classification and regression. When used as a regressor, it returns the mean value of the feature to be predicted for the instances of the leaf in which the test instance falls. REP-Tree uses variance reduction as a criterion for finding split points for each node. The tree is pruned using the Reduced Error Pruning technique.
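
RRSE is the metric minimized when tuning the regressors in this section. The sketch below uses the standard definition of the root relative squared error, which is assumed (not confirmed by this section) to match the one used in the experiments: the squared error of the model is divided by the squared error of a trivial predictor that always returns the mean of the actual values. Under that definition, a value of 1 means the model is no better than predicting the mean, and a value above 1 means it is worse.

```python
import math
from typing import Sequence


def rrse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root relative squared error against the mean-of-actuals baseline."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    if baseline_error == 0:
        raise ValueError("RRSE is undefined when all actual values are equal")
    return math.sqrt(model_error / baseline_error)


if __name__ == "__main__":
    temperatures = [1.0, 2.0, 3.0]  # hypothetical target values
    print(rrse(temperatures, [2.0, 2.0, 2.0]))  # always predicting the mean
    print(rrse(temperatures, [1.0, 2.0, 3.0]))  # perfect predictions
```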

M5P is not in the results tables of the next section, because values higher than 1 were always obtained for RRSE, both when it was used alone and when it was tested within some variants of ensembles for regression.

4. Methods based on ensembles [50], which combine the predictions of other simpler regressors called base regressors. The base regressors used in the experiments were REPTrees, since M5Ps always led to RRSE greater than 1. The ensembles used in the experiment were:
- (a) Bagging [41]. In this ensemble each base regressor is trained with a sample of the training set drawn with replacement. The sample is the same size as the training set. The prediction is the average of the predictions of the base regressors.
- (b) Iterated Bagging [51]. In this method the first iteration applies Bagging over the original training set. The differences (i.e., residuals) between the training values predicted by that initial Bagging and the actual values are then computed. In the next iteration, Bagging is done again, but this time predicting the residuals. At this point the ensemble prediction would be obtained from the sum of the predictions of both Baggings. With this prediction, a new set of residuals is calculated, which is used to train another new Bagging, and so on for n iterations. In the experiments, 10 iterations of Baggings of 10 regression trees were used.
- (c) Rotation Forest for regression [52] is a variant of Bagging in which the training data of the base regressors are transformed in the same way as in the Rotation Forest variant for classification. That is, random and disjoint groups of features are created and the PCA projection is applied to each group.

All the ensembles have 100 base regressors. That is the reason why Iterated Bagging was configured with 10 iterations of Baggings of 10 trees.
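
The Iterated Bagging scheme described above can be sketched with a toy base regressor. The code below is a simplified illustration under several assumptions: a one-dimensional regression stump stands in for the REPTree base regressor, residuals are computed directly on the training predictions as in the description above (Breiman's original method uses out-of-bag predictions, which is omitted here), and all function names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Regressor = Callable[[float], float]


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Regressor:
    """1-D regression stump: threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values equal: fall back to the mean
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_trees: int, rng: random.Random) -> Regressor:
    """Average of stumps, each trained on a bootstrap sample of the same size."""
    n = len(xs)
    models: List[Regressor] = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


def fit_iterated_bagging(xs, ys, n_iterations=10, n_trees=10, seed=0) -> Regressor:
    """Each stage is a Bagging fitted to the residuals left by earlier stages."""
    rng = random.Random(seed)
    stages: List[Regressor] = []
    residuals = list(ys)
    for _ in range(n_iterations):
        stage = fit_bagging(xs, residuals, n_trees, rng)
        stages.append(stage)
        residuals = [r - stage(x) for x, r in zip(xs, residuals)]
    # The ensemble prediction is the sum of the predictions of all stages.
    return lambda x: sum(stage(x) for stage in stages)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [0.0] * 5 + [10.0] * 5  # hypothetical step-shaped target
    model = fit_iterated_bagging(xs, ys)  # 10 iterations x 10 stumps = 100
    print(model(0.0), model(9.0))
```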

## 5. Results and Industrial Interpretation

#### 5.1. Classification: State and Mode Prediction

**V** and **D** respectively represent the number of these significant wins and losses in the two classification problems for the method in that row versus all other methods. Then, the difference “$\Delta $” is computed by subtracting D from V. This difference is taken as an indicator of what the best method is. In Table 3, the methods are ordered by $\Delta $. We can see from the table that the two best positioned methods are tied with $\Delta $ equal to 16. They win 16 times from the 24 matches, and they never lose.
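
The ranking by $\Delta$ can be sketched as follows. The input is assumed to be a list of (winner, loser) pairs, one per statistically significant difference found across the classification problems; the method names and outcomes below are hypothetical and are not the values of Table 3.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def delta_ranking(
    methods: Sequence[str], significant_wins: Sequence[Tuple[str, str]]
) -> List[Tuple[str, int, int, int]]:
    """Return (method, V, D, delta) rows ordered by delta = V - D, descending."""
    wins = Counter(winner for winner, _ in significant_wins)
    losses = Counter(loser for _, loser in significant_wins)
    rows = [(m, wins[m], losses[m], wins[m] - losses[m]) for m in methods]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    methods = ["method_a", "method_b", "method_c"]  # hypothetical names
    outcomes = [
        ("method_a", "method_b"),
        ("method_a", "method_c"),
        ("method_b", "method_c"),
    ]
    for row in delta_ranking(methods, outcomes):
        print(row)
```

Comparisons without a significant difference contribute to neither V nor D, which is why a method can have V + D smaller than its total number of matches.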

1. Very good results may be found with just one single C4.5 decision tree. It could, on the one hand, mean that the input variables adequately describe the output variables and, on the other hand, that the number of instances is also sufficient.
2. Neither problem is suitable for linear classification, in view of the results of the linear SVM.
3. Classifiers based on the optimization of a complex function, such as Neural Networks or Radial Basis Function SVM, do not obtain competitive results despite the computational cost involved in the parameter optimization process.
4. All the top ranked methods are ensembles, which hardly vary from each other in their performance.
5. kNN is the best of the non-ensemble methods. This is once again due to the low number of features (i.e., there is no curse of dimensionality) and the high number of instances.

1. The C4.5 tree reached very similar values to those of some ensembles. Usually, it is expected that the variance component of the classification error decreases as the training dataset increases [54]. It is known that Bagging-based ensembles reduce this error component, as Boosting-based ensembles also do in their later iterations [55]. Hence, these similar results of C4.5 vs. ensembles can be explained, at least in part, by the dataset size and the superiority of decision trees over the other non-ensemble methods.
2. It is also interesting to note that the F-Micro and MCC for State variable prediction almost reach one (i.e., the maximum possible value) in many of the classifiers that were tested, and in nearly all in the case of F-Micro. This indicates again that the State is well characterized by the attributes of this dataset.
3. However, the Mode variable figures are worse. For both F-Macro and MCC metrics, Random Balance Bagging, which is a specific method for imbalanced datasets, is the best choice. The “+” sign in this method in Table 5 indicates that the MCC obtained is significantly better than the first method in the ranking.

#### 5.2. Regression

1. Many outliers at low temperatures for $T_X$, $T_Y$ and $T_Z$, which may be due to the latency in the system’s heating curves at start-up.
2. Some outliers for the Z-axis at high temperatures. These points may indicate that this axis has been over-worked and strained at certain points during machining, and their identification may be important to avoid damage to the spindle motor or to increase the average life of the tool.
3. Some outliers for the milling head temperature $T_H$ at high temperatures, but not far away from the median temperature. These values may indicate some rotating efforts of the milling head during 5-axis machining, but not high enough to exceed the limit of 35 °C.

## 6. Conclusions

- A real data-extraction architecture connected to an IoT platform for small workshops has been described.
- The data that are extracted can be useful for solving industrial problems. High performance results can be achieved for industrial problems related to both imbalanced classification and regression.
- The best performance was obtained by machine-learning ensemble methods, which require no method optimization, yielding a straightforward and simple way for optimal exploitation of the data that were gathered for this study.

- The prediction of two discrete variables: State and Mode (i.e., two classification problems).
- The prediction of four continuous variables: $T_X$, $T_Y$, $T_Z$, $T_H$ (i.e., four regression problems).

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Schafer Tesch da Silva, F.; André da Costa, C.; Paredes Crovato, C.D.; da Rosa Righi, R. Looking at energy through the lens of Industry 4.0: A systematic literature review of concerns and challenges. CAIE
**2020**, 143, 106426. [Google Scholar] [CrossRef] - Mourtzis, D.; Vlachou, E.; Milas, N. Industrial big data as a result of IoT adoption in manufacturing. Procedia CIRP
**2016**, 55, 290–295. [Google Scholar] [CrossRef] [Green Version] - Gajate, A.; Bustillo, A.; Haber, R.E. Transductive neurofuzzy-based torque control of a milling process: Results of a case study. Int. J. Innov. Comput. Inf. Control
**2012**, 8, 3495–3510. [Google Scholar] - Ebrahimi, M.; Victory, J.L. Web-based machine tool condition monitoring. In Network Intelligence: Internet-based Manufacturing, Proceedings of the SPIE, Boston, MA, USA, 29 December 2000; International Society for Optics and Photonics: Bellingham, WA, USA, 2000. [Google Scholar]
- Benardos, P.G.; Vosniakos, G.-C. Predicting surface roughness in machining: A review. Int. J. Mach. Tool. Manuf.
**2003**, 43, 833–844. [Google Scholar] [CrossRef] - Urbikain, G.; Olvera, D.; Lopez de Lacalle, L.N.; Beranoagirre, A. Prediction methods and experimental techniques for chatter avoidance in turning systems: A review. Appl. Sci.
**2019**, 9, 4718. [Google Scholar] [CrossRef] [Green Version] - Bustillo, A.; Grzenda, M.; Macukow, B. Interpreting tree-based prediction models and their data in machining processes. Integr. Comput. Aided Eng.
**2016**, 23, 349–367. [Google Scholar] [CrossRef] [Green Version] - Ferraz, F.; Coelho, R. Data acquisition and monitoring in machine tools with CNC of open architecture using internet. Int. J. Adv. Manuf. Technol.
**2005**, 26, 90–97. [Google Scholar] [CrossRef] - Frumusanu, G.; Constantin, I.; Marinescu, V. Development of a stability intelligent control system for turning. Int. J. Adv. Manuf. Technol.
**2013**, 64, 643–657. [Google Scholar] [CrossRef] - Mourtzis, D.; Vlachou, E.; Milas, N.; Dimitrakopoulos, G. Energy consumption estimation for machining processes based on real-time shop floor monitoring via wireless sensor networks. Procedia CIRP
**2016**, 57, 637–642. [Google Scholar] [CrossRef] - Zhong, R.Y.; Dai, Q.Y.; Qu, T.; Hu, G.J.; Huang, G.Q. RFID-enabled real-time manufacturing execution system for mass-customization production. Robot. Comput. Integr. Manuf.
**2013**, 29, 283–292. [Google Scholar] [CrossRef] - Palasciano, C.; Bustillo, A.; Fantini, P.; Taisch, M. A new approach for machine’s management: From machine’s signal acquisition to energy indexes. J. Ind. Eng. Int.
**2016**, 137, 1503–1515. [Google Scholar] [CrossRef] - Chen, X.; Li, C.; Tang, Y.; Li, L.; Xiao, Q. A framework for energy monitoring of machining orkshops based on IoT. Procedia CIRP
**2018**, 72, 1386–1391. [Google Scholar] [CrossRef] - Chen, X.; Li, C.; Tang, Y.; Li, L.; Xiao, Q. An Internet of Things based energy efficiency monitoring and management system for machining workshop. J. Clean. Prod.
**2018**, 199, 957–968. [Google Scholar] [CrossRef] - Oleaga, I.; Pardo, C.; Zulaika, J.J.; Bustillo, A. A machine-learning based solution for chatter prediction in heavy-duty milling machines. Measurement
**2018**, 128, 34–44. [Google Scholar] [CrossRef] - Bustillo, A.; Rodriguez, J.J. Online breakage detection of multitooth tools using classifier ensembles for imbalanced data. Int. J. Syst. Sci.
**2014**, 45, 2590–2602. [Google Scholar] [CrossRef] - Grzenda, M.; Bustillo, A.; Quintana, G.; Ciurana, J. Improvement of surface roughness models for face milling operations through dimensionality reduction. Integr. Comput. Aided Eng.
**2012**, 19, 179–197. [Google Scholar] [CrossRef] - Grzenda, M.; Bustillo, A. The evolutionary development of roughness prediction models. Appl. Soft Comput.
**2013**, 13, 2913–2922. [Google Scholar] [CrossRef] - Grzenda, M.; Bustillo, A.; Zawistowski, P. A soft computing system using intelligent imputation strategies for roughness prediction in deep drilling. J. Intell.
**2012**, 23, 1733–1743. [Google Scholar] [CrossRef] [Green Version] - Teixidor, D.; Grzenda, M.; Bustillo, A.; Ciurana, J. Modeling pulsed laser micromachining of micro geometries using machine-learning Techniques. J. Intell. Manuf.
**2005**, 26, 801–814. [Google Scholar] [CrossRef] [Green Version] - Bustillo, A.; Correa, M.; Reñones, A. A virtual sensor for online fault detection of multitooth-tools. J. Sens.
**2011**, 11, 2773–2795. [Google Scholar] [CrossRef] [Green Version] - Karandikar, J.M.; Schmitz, T.L.; Abbas, A.E. Spindle speed selection for tool life testing using Bayesian inference. J. Manuf. Syst.
**2012**, 31, 403–411. [Google Scholar] [CrossRef] - Mikołajczyk, T.; Nowicki, K.; Kłodowski, A.; Yu, D. Pimenov, Neural network approach for automatic image analysis of cutting edge wear. Mech. Syst. Signal Process.
**2017**, 88, 100–110. [Google Scholar] [CrossRef] - Rodríguez, J.J.; Quintana, G.; Bustillo, A.; Ciurana, J. A decision-making tool based on decision trees for roughness prediction in face milling. Int. J. Comput. Integr. Manuf.
**2017**, 30, 943–957. [Google Scholar] [CrossRef] - Bustillo, A.; Correa, M. Using artificial intelligence to predict surface roughness in deep drilling of Steel Component. J. Intell. Manuf.
**2012**, 23, 1893–1902. [Google Scholar] [CrossRef] - Ferreiro, S.; Sierra, B.; Irigoien, I.; Gorritxategi, E. Data mining for quality control: Burr detection in the drilling process. CAIE
**2011**, 60, 801–810. [Google Scholar] [CrossRef] - Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res.
**2002**, 16, 321–357. [Google Scholar] [CrossRef] - Martin-Diaz, I.; Morinigo-Sotelo, D.; Duque-Perez, O.; Romero-Troncoso, R.D.J. Early fault detection in induction motors using AdaBoost with imbalanced small data and optimized sampling. IEEE Trans. Ind. Appl.
**2017**, 53, 3066–3075. [Google Scholar] [CrossRef] - GmbH, J.H. Python in Heidenhain Controls; Heidenhain: Traunreut, Germany, 2015. [Google Scholar]
- Zhong, D.; Zhu, Z.; Huang, R. Study on the IOT Architecture and Gateway Technology. In Proceedings of the 2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Guiyang, China, 18–24 August 2015; pp. 196–199. [Google Scholar]
- Gorodkin, J. Comparing two K-category assignments by a K-category correlation coefficient. Cumput. Biol. Chem.
**2004**, 28, 367–374. [Google Scholar] [CrossRef] - Forman, G.; Scholz, M. Apples-to-apples in cross-validation studies: Pitfalls in classifier performance measurement. ACM Sigkdd Explor. Newsl.
**2010**, 12, 49–57. [Google Scholar] [CrossRef] - Fran, E.; Hall, M.A.; Witten, I.H. Data Mining: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann: Burlington, NC, USA, 2016. [Google Scholar]
- John, G.H.; Langley, P. Estimating continuous distributions in bayesian classifiers. In UAI’95, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, Montréal, QC, Canada, 18–20 August 1995; Besnard, P., Hanks, S., Eds.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 338–345. [Google Scholar]
- Aha, D.; Kibler, D. Instance-based learning algorithms. Mach. Learn.
**1991**, 6, 37–66. [Google Scholar] [CrossRef] [Green Version] - Quinlan, R.J. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993. [Google Scholar]
- Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: New York, NY, USA, 1995. [Google Scholar]
- Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar]
- Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. **2011**, 2, 1–27.
- Fan, R.E.; Chang, K.W.; Hsieh, C.J.; Wang, X.R.; Lin, C.J. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. **2008**, 9, 1871–1874.
- Breiman, L. Bagging predictors. Mach. Learn. **1996**, 24, 123–140.
- Díez-Pastor, J.-F.; Rodríguez, J.J.; García-Osorio, C.; Kuncheva, L.I. Random Balance: Ensembles of variable priors classifiers for imbalanced data. Knowl. Based Syst. **2015**, 85, 96–111.
- Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
- Rodríguez, J.J.; Kuncheva, L.I.; Alonso, C. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. **2006**, 28, 1619–1630.
- Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In ICML’96, Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, 19–21 July 1996; Saitta, L., Ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1996; pp. 148–156.
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive Logistic Regression: A Statistical View of Boosting. Ann. Stat. **2000**, 28, 337–407.
- Drucker, H.; Burges, C.J.; Smola, A.J.; Vapnik, V. Support Vector Regression Machines. Adv. Neural Inf. Process. Syst. **1996**, 9, 155–161.
- Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. **1992**, 46, 175–185.
- Quinlan, J.R. Learning with Continuous Classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Singapore, 16–18 November 1992; pp. 343–348.
- Mendes-Moreira, J.; Soares, C.; Jorge, A.M.; Freire de Sousa, J. Ensemble Approaches for Regression: A Survey. ACM Comput. Surv. **2012**, 45, 1–40.
- Breiman, L. Using iterated bagging to debias regressions. Mach. Learn. **2001**, 45, 261–277.
- Pardo, C.; Diez-Pastor, J.-F.; Garcia-Osorio, C.; Rodríguez, J.J. Rotation forest for regression. Appl. Math. Comput. **2013**, 219, 9914–9924.
- Nadeau, C.; Bengio, Y. Inference for the Generalization Error. Mach. Learn. **2003**, 52, 239–281.
- Brain, D.; Webb, G. On the Effect of Data Set Size on Bias and Variance in Classification Learning. In Proceedings of the Fourth Australian Knowledge Acquisition Workshop, Sydney, Australia, 5–6 December 1999; pp. 117–128.
- Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; Wiley: Hoboken, NJ, USA, 2004.
- Dekking, F.M.; Kraaikamp, C.; Lopuhaä, H.P.; Meester, L.E. A Modern Introduction to Probability and Statistics; Springer: Berlin/Heidelberg, Germany, 2005.
- Tsoumakas, G.; Katakis, I. Multi-Label Classification: An Overview. Int. J. Data Warehous. Min. **2007**, 3, 1–13.

**Figure 1.** Outline of the design of the IoT solution. The schematic diagram shows how the IoT solution operates. Only read operations are performed, so the process never alters machine operation.

**Table 1.** Dataset attributes and output with their variation range. The inputs and outputs and their abbreviations in the dataset are summarized; the output variables are shown in bold.

Variable | Abbreviation |
---|---|
Machine axis programmed position (X, Y, Z, B and C-axis) | $\mathit{Axis}_{X,Y,Z,B,C}$ |
Cutting tool measured position (X, Y, Z, B and C-axis) | $\mathit{Tool}_{X,Y,Z,B,C}$ |
Machine speed (X, Y, Z, B and C-axis) | $\mathit{Speed}_{X,Y,Z,B,C}$ |
Motor temperature (X, Y, Z motors and milling head H) | $\mathbf{T}_{\mathbf{X},\mathbf{Y},\mathbf{Z},\mathbf{H}}$ |
Machining mode | **Mode** |
Machine state | **State** |

State | No. of Instances | % | Mode | No. of Instances | % |
---|---|---|---|---|---|
1 | 18,288 | 34.75% | 1 | 38,938 | 74.00% |
2 | 29,124 | 55.40% | 2 | 7,034 | 13.37% |
3 | 468 | 0.89% | 3 | 5,463 | 10.38% |
4 | 4,712 | 8.95% | 4 | 700 | 1.33% |
 | | | 5 | 487 | 0.93% |

**Table 3.** Ranking of F-Macro in the classification problems. V is the number of times the method has a statistically significantly better F-Macro when compared with the other 12 methods across the 2 predicted variables. D is the number of times the method has a statistically significantly worse F-Macro in the same comparisons. The ordering criterion, $\Delta$, is equal to V − D. The State and Mode columns give the average F-Macro of each method using $10\times 10$ CV. (*) indicates that the method's average F-Macro is significantly worse than that of the best-ranked method. The best F-Macro values for State and Mode are highlighted in bold.

$\Delta$ | V | D | Method | State | Mode |
---|---|---|---|---|---|
16 | 16 | 0 | Rotation Forest of Random Forest | 0.99342 | 0.92407 |
16 | 16 | 0 | Random Forest | 0.99283 | 0.92444 |
15 | 16 | 1 | LogitBoost of REPTree | **0.99370** | 0.92056 |
14 | 15 | 1 | AdaBoost.M1 of C4.5 | 0.99297 | 0.92297 |
10 | 15 | 5 | Bagging of Random Balance of C4.5 | 0.98857 * | **0.92785** |
8 | 13 | 5 | Rotation Forest of C4.5 | 0.99298 | 0.91275 * |
2 | 11 | 9 | Bagging of C4.5 | 0.99115 * | 0.91359 * |
−3 | 9 | 12 | C4.5 | 0.99056 * | 0.90944 * |
−6 | 8 | 14 | kNN | 0.98119 * | 0.90899 * |
−13 | 5 | 18 | Radial Basis Function SVM | 0.82836 * | 0.77250 * |
−15 | 4 | 19 | Multilayer Perceptron | 0.79272 * | 0.66775 * |
−21 | 1 | 22 | Naïve Bayes | 0.64369 * | 0.44891 * |
−23 | 0 | 23 | Linear SVM | 0.64819 * | 0.39074 * |
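
The V, D and $\Delta$ columns can be reproduced from a list of pairwise significance outcomes. The following is a minimal sketch of that bookkeeping, not the authors' implementation: the function name `rank_methods`, the `(method_a, method_b, result)` record format and the toy data are assumptions made for illustration, and the statistical test that produces each outcome is outside its scope.

```python
from collections import defaultdict


def rank_methods(comparisons):
    """Rank methods by delta = V - D.

    `comparisons` holds one (method_a, method_b, result) record per pair of
    methods and per predicted variable. `result` is +1 if method_a is
    significantly better, -1 if it is significantly worse, 0 otherwise.
    This record format is a hypothetical convention for the sketch.
    """
    wins = defaultdict(int)    # V: significant victories
    losses = defaultdict(int)  # D: significant defeats
    methods = set()
    for method_a, method_b, result in comparisons:
        methods.update((method_a, method_b))
        if result > 0:
            wins[method_a] += 1
            losses[method_b] += 1
        elif result < 0:
            wins[method_b] += 1
            losses[method_a] += 1
    rows = [(wins[m] - losses[m], wins[m], losses[m], m) for m in methods]
    # Highest delta first; ties are broken alphabetically by method name.
    return sorted(rows, key=lambda row: (-row[0], row[3]))


# Toy data: three methods compared on two predicted variables.
toy = [
    ("A", "B", 1), ("A", "C", 1), ("B", "C", 0),  # variable 1
    ("A", "B", 0), ("A", "C", 1), ("B", "C", 1),  # variable 2
]
ranking = rank_methods(toy)
for delta, v, d, method in ranking:
    print(f"{delta:>3} | {v} | {d} | {method}")
```

With 13 methods and 2 predicted variables, each method takes part in 24 comparisons, so V + D can never exceed 24; comparisons without a significant difference count toward neither column.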

**Table 4.** Ranking of F-Micro in the classification problems. V is the number of times the method has a statistically significantly better F-Micro when compared with the other 12 methods across the 2 predicted variables. D is the number of times the method has a statistically significantly worse F-Micro in the same comparisons. The ordering criterion, $\Delta$, is equal to V − D. The State and Mode columns give the average F-Micro of each method using $10\times 10$ CV. (*) indicates that the method's average F-Micro is significantly worse than that of the best-ranked method. The best F-Micro values for State and Mode are highlighted in bold.

$\Delta$ | V | D | Method | State | Mode |
---|---|---|---|---|---|
20 | 20 | 0 | Rotation Forest of C4.5 | 0.99756 | **0.94374** |
13 | 16 | 3 | Bagging of C4.5 | 0.99711 * | 0.94292 |
12 | 15 | 3 | LogitBoost of REPTree | **0.99759** | 0.94125 * |
11 | 14 | 3 | AdaBoost.M1 of C4.5 | 0.99755 | 0.93946 * |
10 | 15 | 5 | C4.5 | 0.99696 * | 0.94307 * |
10 | 13 | 3 | Rotation Forest of Random Forest | 0.99752 | 0.94007 * |
3 | 10 | 7 | Random Forest | 0.99710 * | 0.94029 * |
1 | 10 | 9 | Bagging of Random Balance of C4.5 | 0.99664 * | 0.93855 * |
−8 | 8 | 16 | kNN | 0.99198 * | 0.93582 * |
−12 | 6 | 18 | Multilayer Perceptron | 0.93112 * | 0.90334 * |
−16 | 4 | 20 | Radial Basis Function SVM | 0.88398 * | 0.88322 * |
−22 | 1 | 23 | Linear SVM | 0.80947 * | 0.71559 * |
−22 | 1 | 23 | Naïve Bayes | 0.81845 * | 0.51293 * |
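
F-Micro figures are consistently higher than the corresponding F-Macro figures because the two averages weight classes differently. The sketch below, written under the assumption of a single confusion matrix with rows as true classes and columns as predicted classes, shows one common way to compute both; the function name `f_scores` and the toy matrix are illustrative, and the exact aggregation across cross-validation folds used in the study is not reproduced here.

```python
def f_scores(confusion):
    """Return (f_macro, f_micro) for a square confusion matrix.

    confusion[i][j] is the number of instances whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    # Macro: unweighted mean of per-class F1, so rare classes count as much as frequent ones.
    f_macro = sum(per_class_f1) / n
    # Micro: F1 over pooled counts, dominated by the frequent classes.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    f_micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return f_macro, f_micro


# Toy imbalanced problem: 100 instances of class 0, 10 of class 1.
toy_confusion = [
    [90, 10],
    [5, 5],
]
f_macro, f_micro = f_scores(toy_confusion)
print(f"F-Macro = {f_macro:.4f}, F-Micro = {f_micro:.4f}")
```

In a single-label problem the pooled false positives equal the pooled false negatives, so F-Micro coincides with plain accuracy, which is why a poorly predicted minority class lowers F-Macro far more than F-Micro.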

**Table 5.** Ranking of MCC in the classification problems. V is the number of times the method has a statistically significantly better MCC when compared with the other 12 methods across the 2 predicted variables. D is the number of times the method has a statistically significantly worse MCC in the same comparisons. The ordering criterion, $\Delta$, is equal to V − D. The State and Mode columns give the average MCC of each method using $10\times 10$ CV. (*) indicates that the method's average MCC is significantly worse than that of the best-ranked method. (+) indicates that the method's average MCC is significantly better than that of the best-ranked method. The best MCC values for State and Mode are highlighted in bold.

$\Delta$ | V | D | Method | State | Mode |
---|---|---|---|---|---|
17 | 18 | 1 | Rotation Forest of C4.5 | **0.99596** | 0.85325 |
12 | 17 | 5 | Bagging of Random Balance of C4.5 | 0.99406 * | **0.86118** + |
12 | 14 | 2 | Bagging of C4.5 | 0.99504 * | 0.85069 |
10 | 14 | 4 | C4.5 | 0.99475 * | 0.85193 |
10 | 13 | 3 | LogitBoost of REPTree | 0.99591 | 0.84690 * |
9 | 13 | 4 | AdaBoost.M1 of C4.5 | 0.99579 | 0.84381 * |
8 | 12 | 4 | Rotation Forest of Random Forest | 0.99569 | 0.84476 * |
2 | 10 | 8 | Random Forest | 0.99483 * | 0.84530 * |
−8 | 8 | 16 | kNN | 0.98624 * | 0.83288 * |
−12 | 6 | 18 | Multilayer Perceptron | 0.87607 * | 0.76677 * |
−16 | 4 | 20 | Radial Basis Function SVM | 0.81216 * | 0.68454 * |
−21 | 2 | 22 | Naïve Bayes | 0.64746 * | 0.29481 * |
−23 | 0 | 23 | Linear SVM | 0.62066 * | 0.26261 * |
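
For problems with more than two classes, such as State and Mode, MCC is usually computed with the K-category generalization of the correlation coefficient. The sketch below uses the standard confusion-matrix form of that generalization; it is offered as an illustration under that assumption, with a hypothetical function name `multiclass_mcc` and toy data, rather than as the exact routine used to produce the table.

```python
import math


def multiclass_mcc(confusion):
    """K-category Matthews correlation coefficient from a confusion matrix.

    confusion[i][j] is the number of instances whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    true_counts = [sum(confusion[k]) for k in range(n)]
    pred_counts = [sum(confusion[i][k] for i in range(n)) for k in range(n)]

    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # By convention, return 0 when all instances fall in one row or one column.
    return numerator / denominator if denominator else 0.0


# Toy imbalanced problem: 100 instances of class 0, 10 of class 1.
toy_confusion = [
    [90, 10],
    [5, 5],
]
mcc = multiclass_mcc(toy_confusion)
print(f"MCC = {mcc:.4f}")
```

On a two-class matrix this expression reduces to the familiar binary MCC, which makes the sketch easy to check by hand.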

**Table 6.** Ranking of RRSE in the regression problems. V is the number of times the method has a statistically significantly better RRSE when compared with the other 9 methods across the 4 predicted variables. D is the number of times the method has a statistically significantly worse RRSE in the same comparisons. The ordering criterion, $\Delta$, is equal to V − D. The $T_H$, $T_x$, $T_y$ and $T_z$ columns give the average RRSE (%) of each method using $10\times 10$ CV. (*) indicates that the method's average RRSE is significantly worse than that of the best-ranked method. The best RRSE for each predicted variable is highlighted in bold.

$\Delta$ | V | D | Method | $T_H$ | $T_x$ | $T_y$ | $T_z$ |
---|---|---|---|---|---|---|---|
36 | 36 | 0 | Rotation Forest of REP-Tree | **55.072** | **61.526** | **60.957** | **64.021** |
28 | 32 | 4 | Bagging of REP-Tree | 55.582 * | 62.668 * | 62.208 * | 65.121 * |
20 | 28 | 8 | Iterated Bagging of REP-Tree | 56.124 * | 63.446 * | 62.892 * | 65.952 * |
11 | 23 | 12 | REP-Tree | 58.935 * | 67.659 * | 67.012 * | 70.949 * |
5 | 20 | 15 | k-NN | 64.560 * | 68.794 * | 67.817 * | 71.014 * |
−7 | 13 | 20 | Radial Basis Function SVM | 78.018 * | 85.287 * | 85.299 * | 88.811 * |
−9 | 12 | 21 | Multilayer Perceptron | 79.858 * | 87.524 * | 87.074 * | 90.586 * |
−22 | 6 | 28 | Linear Regression | 93.251 * | 95.243 * | 94.902 * | 95.927 * |
−26 | 4 | 30 | Linear SVM | 94.522 * | 95.681 * | 95.800 * | 97.122 * |
−36 | 0 | 36 | ZeroR (i.e., always predict the average) | 100.000 * | 100.000 * | 100.000 * | 100.000 * |
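
The ZeroR row is 100% by construction, because RRSE normalizes a model's squared error by the squared error of always predicting the mean. The following minimal sketch illustrates this with made-up temperature values; the function name `rrse` is hypothetical, and for brevity the mean is taken over the evaluated targets, whereas in cross-validation the baseline mean would normally come from the training folds.

```python
import math


def rrse(actual, predicted):
    """Root relative squared error, in percent.

    100% means the model is no better than always predicting the mean of
    `actual`; lower is better.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_actual = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    if baseline_error == 0:
        raise ValueError("RRSE is undefined when all actual values are equal")
    return 100.0 * math.sqrt(model_error / baseline_error)


# Illustrative (made-up) temperature readings.
actual = [40.0, 42.0, 44.0, 46.0]
zero_r = [sum(actual) / len(actual)] * len(actual)  # always predict the mean
better = [41.0, 42.0, 44.0, 45.0]

rrse_zero_r = rrse(actual, zero_r)
rrse_better = rrse(actual, better)
print(f"ZeroR RRSE = {rrse_zero_r:.3f}%, model RRSE = {rrse_better:.3f}%")
```

Read this way, an RRSE of about 55% for $T_H$ means the best ensemble leaves roughly 55% of the root squared error of the mean-only baseline.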

**Table 7.** Mean absolute error of the best method (Rotation Forest of REP-Tree) and the error that would be incurred if the average of $T_H$, $T_x$, $T_y$ and $T_z$ were always predicted (i.e., if ZeroR were used).

Variable | Rotation Forest of REP-Tree | ZeroR |
---|---|---|
$T_H$ | 0.68 | 1.43 |
$T_x$ | 1.40 | 2.49 |
$T_y$ | 1.47 | 2.63 |
$T_z$ | 1.99 | 3.31 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Garrido-Labrador, J.L.; Puente-Gabarri, D.; Ramírez-Sanz, J.M.; Ayala-Dulanto, D.; Maudes, J.
Using Ensembles for Accurate Modelling of Manufacturing Processes in an IoT Data-Acquisition Solution. *Appl. Sci.* **2020**, *10*, 4606.
https://doi.org/10.3390/app10134606
