This section presents the results obtained during the development of the work, showing them step by step, supported by graphs and tables, together with an analysis of them.
5.1. Cross-Validation
In this subsection, the results obtained during the stratified cross-validation process are presented, grouped by machine learning technique and algorithm. Due to the large number of models tested, the results presented refer only to the models with the most promising results.
5.1.1. Linear Discriminant Analysis
The results of the stratified cross-validation process for LDA models are shown in
Table 2. The most significant results are highlighted in bold.
The table shows the performance of two different configurations. Both use the same solver, LSQR, but different shrinkage values: 0.1 for one configuration and auto-calculated (based on the dataset) for the other.
In this case, a notable difference is observed in the overall performance of the models. Automatic shrinkage calculation achieves an accuracy of 78.8%, compared to 91.1% when using an explicit shrinkage value of 0.1, demonstrating the better generalization of the second model.

Regarding the models' performance for the Normal class, setting the regularization value explicitly results in a considerable improvement in recall and F1-score, which increase from 79.2% to 95.4% and from 87.6% to 95.4%, respectively. The detection of normal MQTT communications is accurate, especially when the shrinkage hyperparameter is set to 0.1, achieving a good balance between precision and recall: this configuration correctly identifies a higher proportion of genuine normal communications than automatic shrinkage calculation.

When analyzing the DoS class, mixed behavior is observed. Although applying a fixed value increased precision from 40.1% to 100%, this came at the expense of recall, which fell from 76.7% to 45.3%. This indicates that the model has a conservative bias when predicting this type of attack, generating numerous false negatives. Although the balance between the two metrics is 62.3%, the model's performance is inadequate; detecting all real DoS attacks would be preferable, even at the cost of reduced precision.

Detection of the Intrusion class performs well below the minimum required, with an F1-score of only 15.4% in the best-case scenario. This demonstrates the model's inability to correctly classify this attack, resulting in very poor precision and recall values and a concerning false positive rate; even with this overprediction, only about 50% of real attacks are correctly detected.
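As an illustrative sketch (not the paper's code or data), the two LDA configurations discussed above can be compared with scikit-learn using 10-fold stratified cross-validation. The synthetic, imbalanced three-class dataset below is a stand-in for the MQTT traffic records; all sizes and class proportions are assumptions.

```python
# Hedged sketch: comparing LDA with an explicit shrinkage value vs. automatic
# (Ledoit-Wolf) shrinkage estimation, both with the LSQR solver.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced stand-in dataset: ~85% Normal, ~12% DoS, ~3% Intrusion.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

for shrinkage in (0.1, "auto"):  # explicit value vs. auto-calculated
    lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=shrinkage)
    scores = cross_val_score(lda, X, y, cv=cv, scoring="accuracy")
    print(f"shrinkage={shrinkage}: accuracy {scores.mean():.3f} "
          f"+/- {scores.std():.3f}")
```

The same `cross_val_score` call with `scoring="f1_macro"` or `scoring="recall_macro"` would reproduce the macro-averaged metrics shown in the violin plots.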
Figure 3 shows, in a violin plot, the mean values of the different metrics obtained during the stratified cross-validation. Unlike Table 2, these mean values are calculated from a macro-averaged perspective.
The figure demonstrates the superiority, in three of the four metrics, of the configuration that sets the shrinkage to 0.1 over the configuration that determines it automatically from the input dataset. The automatic shrinkage configuration exhibits greater variability in all metrics, especially in recall, where the distribution is wider and has longer tails, indicating less consistent behavior across folds. In contrast, the fixed shrinkage configuration shows less variability, with more compact, symmetrical, and narrow distributions, demonstrating stable results.
5.1.2. Quadratic Discriminant Analysis
Table 3 shows the mean values of the different metrics for each class in the stratified cross-validation of the QDA models. The best results for each metric-class pair are highlighted in bold.
Table 3. Average QDA values during cross-validation.
Regularization of parameters: 0.0

| Class | acc | precision | recall | F1-score |
| Normal | 0.435 | 0.998 | 0.427 | 0.598 |
| DoS | 0.435 | 0.999 | 0.454 | 0.624 |
| Intrusion | 0.435 | 0.019 | 0.987 | 0.036 |

Regularization of parameters: 0.1

| Class | acc | precision | recall | F1-score |
| Normal | 0.914 | 0.954 | 0.957 | 0.955 |
| DoS | 0.914 | 0.999 | 0.453 | 0.624 |
| Intrusion | 0.914 | 0.099 | 0.447 | 0.162 |
The table shows the results of two different Quadratic Discriminant Analysis configurations, one without any regularization, and the other with a regularization of 0.1.
In general, the unregularized model exhibits extremely poor performance, with an accuracy of only 43.5%, which means that more than half of its predictions are incorrect. At the opposite extreme, the model with a regularization of 0.1 achieves an overall accuracy of 91.4%. Analyzing the performance of the models for each individual class allows this picture to be refined.

On the one hand, the detection of the Normal class (MQTT communications not under attack) is adequately performed by the regularized model, which achieves an F1-score of 95.5%. The unregularized model, despite showing higher precision than the regularized one, has low recall: although its positive classifications of standard communications are correct, less than half of the actual normal communications are detected.

When analyzing the DoS class, the same phenomenon is observed in both configurations: a precision of 99.9%, but a recall of less than 50%, implying failures in the detection of this cyberattack. Finally, performance worsens for the Intrusion class, where both configurations exhibit poor precision and recall metrics. Neither configuration correctly classifies this attack, failing to detect actual attacks or to predict them accurately. The difference between precision and recall suggests that the models tend to overpredict the attack.
The violin plots,
Figure 4, allow analysis of the variability of the results between folds of the stratified cross-validation for the two QDA configurations. All metrics are calculated from a macro-averaged perspective. The first configuration, without regularization, has distributions centered on low values, while the second, with a regularization of 0.1, shows distributions centered on higher values. However, for the recall and precision metrics, the latter configuration exhibits wider distributions, reflecting greater inconsistency and variation in the results between folds.
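A minimal sketch of the QDA comparison above (assuming scikit-learn and a synthetic, imbalanced stand-in dataset rather than the paper's data) contrasts no regularization with `reg_param=0.1`, reporting per-class F1-scores from out-of-fold predictions:

```python
# Hedged sketch: QDA with and without covariance regularization.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Imbalanced three-class stand-in (proportions are assumptions).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_redundant=0, n_classes=3,
                           weights=[0.85, 0.12, 0.03], random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for reg in (0.0, 0.1):  # reg_param shrinks each per-class covariance estimate
    qda = QuadraticDiscriminantAnalysis(reg_param=reg)
    y_pred = cross_val_predict(qda, X, y, cv=cv)
    per_class_f1 = f1_score(y, y_pred, average=None)
    print(f"reg_param={reg}: per-class F1 {per_class_f1.round(3)}")
```

Using `cross_val_predict` rather than per-fold scores yields a single set of per-class metrics over all folds, analogous to the table's class rows.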
5.1.3. Naive Bayes
The results of the stratified cross-validation process for the Naive Bayes models are presented below. In these results, shown in
Table 4, the highest values for each pair of metric-class are highlighted in bold. The table shows three different configurations: one with the Bernoulli implementation and the other two with the Gaussian implementation.
In general, all three configurations demonstrate a high capacity to correctly classify the type of MQTT communication being performed. Specifically, the model using the Gaussian implementation and a var_smoothing of 1 achieves the best accuracy, at 94.9%.

The models exhibit robust performance with respect to the Normal class, with a precision close to 95.4% and a recall greater than 95.6% in the Bernoulli implementation, the latter increasing to 99.8% in the Gaussian implementation. The F1-score, which reaches its maximum (97.5%) in the Gaussian implementation, reflects a good balance between precision and recall.

The configurations exhibit more variable behavior when dealing with the DoS class. In the Gaussian implementation, the recall remains low, at 45.4%, while the precision ranges from 75.9% to 99.9%, depending on the var_smoothing parameter. It is evident how this parameter affects the false positive rate, which improves as the parameter increases. However, recall remains at the same value, indicating poor detection of real DoS attacks: less than 50% of actual attacks are detected. The var_smoothing parameter only improved the quality of the positive predictions, reducing the number of false positives while maintaining the false negatives. The F1-score improves slightly in the second configuration, reaching 62.4%, but this is still limited. In the case of the Bernoulli implementation, the precision reaches 58.5%, but the recall remains at 45.4%, demonstrating poor performance in DoS detection.
Figure 5 represents the distribution of scores, calculated using a macro average, throughout the cross-validation process. The violin plots, representing the three Naive Bayes configurations in
Table 4, show similar performance among the three models, except for the precision metric, where the Gaussian implementation with a var_smoothing factor of 1 clearly stands out. In addition, the distribution of scores is quite concentrated, demonstrating the robustness of the models against different data distributions.
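The three Naive Bayes variants above can be sketched as follows (scikit-learn, synthetic stand-in data; the macro-averaged precision scoring mirrors the violin plots, and the default Gaussian smoothing value is an assumption, since the table's second smoothing value is not quoted in the text):

```python
# Hedged sketch: Bernoulli NB vs. Gaussian NB with two var_smoothing values.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

models = {
    "Bernoulli": BernoulliNB(),                    # features binarized at 0.0
    "Gaussian (default smoothing)": GaussianNB(),  # var_smoothing=1e-9
    "Gaussian (var_smoothing=1)": GaussianNB(var_smoothing=1.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="precision_macro")
    print(f"{name}: macro precision {scores.mean():.3f}")
```

`var_smoothing` adds a fraction of the largest feature variance to all variances, which is consistent with the text's observation that it regularizes the positive predictions rather than improving recall.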
5.1.4. K-Nearest Neighbors
Next, in this section, the results of the cross-validation process for the KNN configurations are presented and discussed.
Table 5 shows the average values of the metrics obtained by the best-performing models in stratified cross-validation. The best values for each of the classes and metrics are highlighted in bold.
Regarding model configuration, all three models assign weights based on neighbor distance, and all three use a moderate number of neighbors: 5, 10, and 20.
All three models achieve high performance in correctly classifying records, with accuracies greater than 98%. Analyzing the metrics individually for each class, it can be seen that the Normal and DoS classes achieve high performance in all metrics (precision, recall, and F1-score) regardless of the model. Although all models perform similarly, the model that uses five neighbors stands out in the classification of the Normal class and, compared to the one that uses 20 neighbors, also excels in the classification of DoS attacks, obtaining the best metrics for these classes. However, the classification of the Intrusion class does not achieve such excellent performance, with a decrease in precision for all three configurations. The recall metric continues to perform well, albeit with a slight decrease. Therefore, attacks are being detected, but this cyberattack is being overclassified, implying a bias in the models toward predicting this type of attack.
Figure 6 shows, in a violin plot, the distribution of the accuracy, precision, recall, and F1-score metrics for the three models during the cross-validation process, calculated from a macro-averaged perspective. These plots show how all models have low variability, with narrow violins, indicating the stability of the models across the different folds that make up the cross-validation.
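A sketch of the three KNN configurations above (scikit-learn, synthetic stand-in data) varies only the number of neighbors, keeping distance-based weighting fixed:

```python
# Hedged sketch: distance-weighted KNN with k in {5, 10, 20}.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=2)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

for k in (5, 10, 20):
    # weights="distance": closer neighbors contribute more to the vote.
    knn = KNeighborsClassifier(n_neighbors=k, weights="distance")
    scores = cross_val_score(knn, X, y, cv=cv, scoring="f1_macro")
    print(f"k={k}: macro F1 {scores.mean():.3f}")
```

Distance weighting tends to help minority classes such as Intrusion, since a single nearby minority sample can outweigh several distant majority samples.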
5.1.5. Decision Tree
Table 6 below shows the average values of the metrics obtained during cross-validation of the Decision Tree-based configurations. The best values for each metric-class pair are shown in bold.
The table presents three DT models that differ in the maximum depth of the diagram, impurity criterion, and splitting strategy. The three trees use different depths (20, 25, and 30) while sharing, in pairs, the impurity criterion and the splitting strategy.
From a general perspective, the three models exhibit identical behavior. Regardless of maximum depth, impurity criterion, and splitting strategy, an accuracy of 99.2% is achieved, indicating excellent overall communication detection and classification quality. Analyzing the performance of each configuration for individual classes provides a more detailed understanding of each model's behavior.

When considering normal communications, which are not under attack, the three models demonstrate excellent and similar performance. All three achieve precision and recall rates greater than 99%, which means that the detection of these communications is highly effective. The DoS class again demonstrates commendable performance across all three configurations, but with a greater difference between models. For detecting this type of attack, the model with a maximum diagram depth of 25 nodes stands out, achieving the best score across all three metrics, although all three configurations deliver similar performance.

The Intrusion class shows a change compared to the previous classes. In this case, although the detection of real attacks is adequate, with very high detection rates exceeding 93%, there is a reduction in the quality of the predictions, with an increased number of false positives compared to the previous classes. This increase in prediction failures is reflected in the precision metric, which drops to 71.4% in the best case, corresponding to the model with a maximum diagram depth of 35 nodes, the Gini impurity criterion, and a random splitting strategy. In any case, although the false positive rate increases, the balance between precision and recall (F1-score) remains adequate, at around 80% in all three cases. The model with the best F1-score uses 20 nodes as its maximum depth, entropy as the impurity criterion, and a splitting strategy based on the best split, although it does not achieve the best results in either precision or recall.
With a slightly lower F1-score (0.3 points lower), the model with a maximum depth of 35 nodes, Gini coefficient as impurity criterion, and a random splitting strategy exhibits the second-highest recall (0.977) and the highest precision.
Using another representation, shown in
Figure 7, it can be seen how the models maintain similar performance across all four metrics. The scores, calculated using a macro approach, exhibit a close distribution throughout the cross-validation process, demonstrating the robustness of the configurations.
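The Decision Tree configurations discussed above can be sketched as follows (scikit-learn, synthetic stand-in data; the depth/criterion/splitter combinations are taken from the prose and are assumptions as to the exact contents of Table 6):

```python
# Hedged sketch: three Decision Tree configurations varying depth,
# impurity criterion, and splitting strategy.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=3)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)

configs = [
    dict(max_depth=20, criterion="entropy", splitter="best"),
    dict(max_depth=25, criterion="entropy", splitter="best"),
    dict(max_depth=35, criterion="gini", splitter="random"),
]
for cfg in configs:
    tree = DecisionTreeClassifier(random_state=3, **cfg)
    scores = cross_val_score(tree, X, y, cv=cv, scoring="accuracy")
    print(f"{cfg}: accuracy {scores.mean():.3f}")
```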
5.1.6. Random Forest
Table 7 shows the average results of three Random Forest configurations obtained by stratified cross-validation. The highest results, calculated for each pair of metric-class, are highlighted in bold.
The three configurations shown in the table above share two of the four adjusted hyperparameters: the impurity calculation criterion, which is set to entropy, and the maximum number of features used by the models, which is half the number of features in the original dataset. The three configurations differ in the number of decision trees used for prediction, with values of 70 and 130, and in the maximum depth of the tree diagrams, with values of 15 and 20 nodes.

From an overall perspective, all three configurations demonstrate excellent performance. Accuracy is 99.2% for all three models, indicating a high classification capacity. Analyzing the performance and behavior of the models for each class reveals some differences. Evaluating the models' ability to correctly detect normal MQTT communications, all three achieve outstanding performance, with precision and recall of 99.8% and 99.4%, respectively. For this specific class, all three configurations exhibit the same behavior, showing no differences in the average value. The DoS class, like the Normal class, demonstrates very high performance. In this case, there is a slight variation in the results depending on the parameter configuration, although this variation is minimal. The F1-score values range from 98.4% to 98.8%, depending on the model, indicating a good balance between precision and recall.

However, when analyzing the performance of the configurations for the Intrusion attack, a reduction in model performance is observed. Although recall remains high, at 97.1% for the models with a maximum depth of 15 nodes, precision drops to 66.9% for the model with 130 estimators and a maximum depth of 15 nodes. This indicates that, while most real Intrusion attacks are detected, the models generate numerous false positives, thus reducing precision.
Even with this performance reduction, the balance between precision and recall is high, 81.9% for the model with the highest maximum depth in its diagrams, which achieves the best precision value at the cost of slightly lower recall compared to the other models.
Reviewing the results, it can be seen that the number of estimators has a minimal impact on model performance, indicating that the configuration with 130 trees does not offer a significant benefit compared to the configuration with fewer estimators (70). Increasing the maximum depth of the diagrams did have a greater impact, modifying the balance between precision and recall in the Intrusion class.
The violin diagrams,
Figure 8, represent the distribution of the metric scores for the three configurations listed in
Table 7. These scores, calculated using a macro average, do not vary significantly between the different folds of the cross-validation, as represented by the short tails of the violins.
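One of the Random Forest configurations described above can be sketched as follows (scikit-learn, synthetic stand-in data), reporting per-class precision, recall, and F1 from out-of-fold predictions:

```python
# Hedged sketch: Random Forest with entropy criterion and half of the
# features considered at each split (max_features=0.5).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=4)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=4)

rf = RandomForestClassifier(n_estimators=130, max_depth=20,
                            criterion="entropy", max_features=0.5,
                            random_state=4)
y_pred = cross_val_predict(rf, X, y, cv=cv)
prec, rec, f1, _ = precision_recall_fscore_support(y, y_pred)
for cls, p, r, f in zip(("class 0", "class 1", "class 2"), prec, rec, f1):
    print(f"{cls}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Passing `max_features` as a float makes each split consider that fraction of the total features, matching the "half the number of features" setting in the text.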
5.1.7. Gradient-Boosted Tree
Table 8 below shows the average results of the metrics for each of the three classes during the stratified cross-validation process for four different Gradient-Boosted Tree models. The best results for each pair of metric-class appear in bold.
The four configurations listed in the table above vary in the number of estimators, the maximum depth of the diagrams, and the learning rate.
All configurations, regardless of their hyperparameters, achieve high accuracy values, above 90%, demonstrating a high capacity to detect cyberattacks on IoT communications over MQTT. Analyzing the performance of the models for each individual class, the classification of standard communications, without any attack, shows outstanding performance. All models achieve a good balance between precision and recall, with an F1-score of 99.5%. The model with the highest recall, that is, the model that detects the highest percentage of real normal communications, is the one with 180 estimators, a maximum depth of 11, and a learning rate of 0.1.

The performance of the different configurations for detecting DoS cyberattacks is likewise outstanding. The model using 60 estimators, a maximum diagram depth of 7, and a learning rate of 0.1 achieves the best balance between precision and recall (F1-score), while more complex models, such as the one with 180 estimators, exhibit the best recall, detecting a greater number of genuine DoS attacks.
Classification performance decreases when considering communications under an Intrusion attack. This deterioration is widespread across all four models, which show an increase in the number of false positives in the prediction of these attacks. This is reflected in the precision metric, which drops to 68.2% even in the best-case scenario. This model, which uses 180 estimators, achieves an F1-score of 79.7% for this type of attack, considerably below the 99.5% and 97.9% achieved for normal communications and DoS attacks, respectively.
The violin plots, represented in
Figure 9, show the distribution of scores obtained during cross-validation for the GBT configurations listed in
Table 8. All metrics are calculated from a macro perspective.
The violin plot demonstrates the robustness of the four configurations throughout the 10-fold cross-validation process. The violins, for each metric and model, exhibit a short tail, representing a concentrated distribution of scores, which confirms the stability of the models. When comparing the models, considerable similarity is observed between them, with slight variations in the vertical position of the violins, a higher position corresponding to better performance on that metric.
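The best-recall GBT configuration named above can be sketched as follows (scikit-learn, synthetic stand-in data). A single stratified hold-out split is used here to keep the sketch fast; the paper uses 10-fold stratified cross-validation instead.

```python
# Hedged sketch: Gradient-Boosted Trees with 180 estimators, maximum depth
# of 11, and a learning rate of 0.1.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=5)

gbt = GradientBoostingClassifier(n_estimators=180, max_depth=11,
                                 learning_rate=0.1, random_state=5)
gbt.fit(X_tr, y_tr)
macro_f1 = f1_score(y_te, gbt.predict(X_te), average="macro")
print(f"macro F1 on hold-out set: {macro_f1:.3f}")
```

The learning rate scales each tree's contribution, so more estimators with a moderate rate (as in this configuration) trade training time for the higher recall noted in the text.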
5.1.8. Artificial Neural Networks
The average results of the metrics for each class during the stratified cross-validation process of the MLP-based models are shown in
Table 9. The best values for each pair of metric-class are highlighted in bold.
Table 9.
Average MLP values during cross-validation.
Hidden layers: 5, Hidden neurons: 20, Dropout: 0, Activation function: ReLU

| Class | acc | precision | recall | F1-score |
| Normal | 0.981 | 0.998 | 0.982 | 0.990 |
| DoS | 0.981 | 0.974 | 0.970 | 0.972 |
| Intrusion | 0.981 | 0.398 | 0.937 | 0.556 |

Hidden layers: 5, Hidden neurons: 25, Dropout: 0, Activation function: ReLU

| Class | acc | precision | recall | F1-score |
| Normal | 0.981 | 0.998 | 0.983 | 0.991 |
| DoS | 0.981 | 0.966 | 0.961 | 0.963 |
| Intrusion | 0.981 | 0.416 | 0.954 | 0.577 |
The table shows two MLP configurations that differ only in the number of neurons per hidden layer: one model uses 20 neurons, while the second increases this to 25. The remaining parameters (number of hidden layers, dropout, and activation function) remain constant at five hidden layers, no dropout regularization, and ReLU, respectively.
Both models exhibit a high overall classification rate, reaching 98.1% accuracy and correctly detecting the cyberattack present in the communications in most instances, regardless of the architecture. However, examining the remaining metrics in a class-by-class analysis reveals certain differences in the models' behavior.

Normal communications classification demonstrates outstanding performance, with 99.8% precision and recall rates exceeding 98% in both configurations. Increasing the number of neurons per hidden layer had minimal impact on model performance, increasing recall and F1-score by only 0.1% compared to the simpler model. A similar phenomenon occurs when analyzing the performance of the multiclassifiers for DoS attacks. Both configurations show excellent performance, although slightly lower than for the Normal class, reaching precision and recall values exceeding 96%. However, in contrast to the previous case, here increasing the complexity of the network architecture, by widening the hidden layers, resulted in a detriment to all three metrics.

Finally, analyzing the performance of the models for the Intrusion class, the decline in classification quality is evident. Although recall remains high, exceeding 95% for the model with the wider architecture, the precision value plummets to 41.6% for the same configuration; while the model correctly detects and classifies real Intrusion attacks, it tends to overdetect this attack, generating many false positives that reduce the precision metric. When comparing the two configurations for this class, the increase in neurons per layer led to improvements in precision, recall, and F1-score.
Figure 10 shows, in a violin plot, the distribution of scores obtained during cross-validation for the two ANN configurations discussed previously. The scores shown were calculated using macro averaging.
The violin plots reflect the similarity in the metrics, which was also observed in
Table 9. Both configurations exhibit similar performance, with little difference between them and high stability in each of the metrics throughout the 10 folds of the cross-validation.
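The two MLP architectures in Table 9 can be sketched as follows (scikit-learn, synthetic stand-in data). Note that scikit-learn's `MLPClassifier` has no dropout parameter, which matches the dropout-free configurations above; the feature scaling, 5-fold split, and iteration cap are assumptions made to keep the sketch quick, and a `ConvergenceWarning` may still be raised.

```python
# Hedged sketch: MLPs with five hidden layers of 20 vs. 25 ReLU neurons.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=6)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=6)

for width in (20, 25):
    mlp = make_pipeline(
        StandardScaler(),  # MLPs are sensitive to feature scale
        MLPClassifier(hidden_layer_sizes=(width,) * 5, activation="relu",
                      max_iter=150, random_state=6),
    )
    scores = cross_val_score(mlp, X, y, cv=cv, scoring="accuracy")
    print(f"{width} neurons per layer: accuracy {scores.mean():.3f}")
```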
5.2. Validation
The following subsection presents the results obtained during the validation process for the configurations described in the previous subsection. For the validation process, 75% of the data (the same data used for stratified cross-validation) was used to train the models, and the remaining 25% was used to validate the models. The oversampling process, performed using SMOTE, was carried out only during the training phase. As in the previous section, tables and graphs will be used to present the results obtained in this process.
In summary, these are the configurations selected to represent each of the machine learning techniques and algorithms:
Linear Discriminant Analysis: LSQR solver and a shrinkage factor of 0.1
Quadratic Discriminant Analysis: 0.1 as regularization parameter
Naive Bayes: Gaussian implementation and a variable smoothing of 1
K-Nearest Neighbors: use of five neighbors to make the prediction and a weight assignment based on distance.
Decision Tree: Maximum depth of 35, Gini impurity criterion and a random splitter strategy
Random Forest: 130 estimators, maximum depth of 20, Shannon entropy criterion and 50% of features
Gradient-Boosted Trees: 180 estimators, a maximum diagram depth of 11 nodes, and a learning rate of 0.1
Artificial Neural Network: five hidden layers, 25 hidden neurons and ReLU activation function
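The validation protocol described above can be sketched as follows (scikit-learn, synthetic stand-in data). SMOTE itself lives in the separate imbalanced-learn package (`imblearn.over_sampling.SMOTE`); here a plain random oversampler stands in for it so the sketch needs only scikit-learn, and, as in the paper, oversampling is applied to the training portion only.

```python
# Hedged sketch: stratified 75/25 hold-out validation with training-only
# oversampling and the selected Random Forest configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, weights=[0.85, 0.12, 0.03],
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=7)

# Oversample minority classes in the TRAINING set only (stand-in for SMOTE).
rng = np.random.default_rng(7)
target = np.bincount(y_tr).max()
parts_X, parts_y = [], []
for cls in np.unique(y_tr):
    idx = np.flatnonzero(y_tr == cls)
    resampled = rng.choice(idx, size=target, replace=True)
    parts_X.append(X_tr[resampled])
    parts_y.append(y_tr[resampled])
X_bal, y_bal = np.concatenate(parts_X), np.concatenate(parts_y)

rf = RandomForestClassifier(n_estimators=130, max_depth=20,
                            criterion="entropy", max_features=0.5,
                            random_state=7).fit(X_bal, y_bal)
acc = accuracy_score(y_te, rf.predict(X_te))
print(f"hold-out accuracy: {acc:.3f}")
```

Keeping the test set untouched by oversampling is the essential point: synthetic or duplicated minority samples in the validation split would leak information and inflate the reported metrics.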
Table 10 collects the metrics of the different techniques obtained during the validation process. From the table, it can be seen that the performance of the models remained essentially unchanged compared to that obtained during the stratified cross-validation process. These results indicate that the models do not overfit and generalize adequately to unseen data.
When analyzing model performance, the Random Forest technique stands out for its high accuracy (99.3%), demonstrating a strong ability to correctly classify ongoing communications. The Decision Tree and Gradient-Boosted Tree techniques, although 0.1 points lower, also show a strong capacity to correctly detect communications. At a somewhat lower level, K-Nearest Neighbors and Artificial Neural Network also generally perform well.

Examining the performance of the techniques by class, Random Forest again excels, achieving the highest values in all three metrics for the Normal class. In terms of precision, it achieves 99.8%, tied with Decision Tree and Artificial Neural Network, indicating a high true positive rate. In addition to correctly predicting positive results for normal communications, it detects the vast majority of actual normal communications, with a recall rate of 99.4%, surpassing any other model. Finally, the balance between these two metrics is high, with an F1-score of 99.6%, tied with Decision Tree. Decision Tree's performance is virtually identical to that of Random Forest, with a recall rate only 0.1% lower. The Gradient-Boosted Tree model exhibits similar behavior to Random Forest, accurately classifying normal communications.

Regarding the DoS class, Linear Discriminant Analysis and Random Forest stand out as the best-performing techniques. The Linear Discriminant Analysis model demonstrates the highest precision, at 100%, indicating that all positive predictions were correct. However, the false negative rate for this model is high, with a recall of 45.7%. Random Forest, although with a lower precision of 98.5%, presents the highest recall and F1-score values of all models. Both metrics show a value of 98.6%, tied with Decision Tree in the case of recall. Random Forest's performance in the DoS class, as in the Normal class, is excellent, correctly classifying the majority of DoS samples.
In this case, Gradient-Boosted Trees shows similar performance, with a precision 1.3 points lower, indicating that a greater proportion of its positive predictions are incorrect.

Finally, when considering the Intrusion class, a decline in performance is observed across all techniques; none of them maintains for this class the performance achieved on the others, and generalization is considerably worse. The best-performing techniques are Decision Tree, with a precision of 72.3%; Gradient-Boosted Trees, with a recall of 95.8%; and Random Forest, with an F1-score of 81.4%. Models generally exhibit a high recall rate, which means that they correctly detect real Intrusion attacks; however, they also have a high false positive rate, which reduces precision. Nevertheless, although the performance of the models is poorer and the detection of this type of attack is not as accurate as for the previous classes, the results are still acceptable, with a balance between precision and recall of 81.4%. For the Random Forest model, the recall is 94.9% compared to 71.2% precision. Despite the model's tendency to overpredict this type of attack, its performance is adequate.
The results shown in Table 10, with the best results for each metric-class pair highlighted in bold, collect the metrics of the validation process for the best configuration of each Machine Learning technique selected to develop the shallow-learning-based multiclassifier for MQTT communications. Together with the review and analysis in the previous paragraphs, they highlight Random Forest as the best algorithm for this purpose. Of all the validated models, a total of eight different algorithms, Random Forest exhibits the highest accuracy, at 99.3% (0.1% higher than the next-best algorithm), making RF the technique with the highest absolute number of correct detections. However, given the significant class imbalance in the problem, additional metrics are necessary for the evaluation. Evaluating the classes independently reveals that RF achieves the highest F1-score in each of the three classes to be detected, indicating that it is the model with the best balance between precision and recall. This balance is 99.6%, 98.6%, and 81.4% for the Normal, DoS, and Intrusion classes, respectively.

A strong overall classification capability, along with the best balance between precision and recall, positions Random Forest as the leading method for multiclassifying cyberattacks on MQTT networks. Its classification performance for both Normal and DoS communications is excellent, achieving the best precision for the Normal class and the best recall for DoS. It is surpassed only by Naive Bayes in detecting real normal communications (by 0.5%) and by Linear Discriminant Analysis in the precision of DoS predictions (by 1.5%). Performance is compromised in the correct detection of the Intrusion class, where Random Forest (along with the other models) does not exhibit the generalization capability present in the first two classes: precision drops to 71.2%, and recall, although higher, also drops to 94.9%.
The sensitivity of the model remains high, an aspect that is especially valuable in a cybersecurity context, as it ensures the correct detection of most real attacks in exchange for a higher rate of false positives. Although other models achieve better individual metrics for this class, such as Decision Tree in precision and Gradient-Boosted Tree in recall, Random Forest again shows the highest F1-score, with a suitable balance between both metrics that neither of the previously mentioned methods achieves.
Examining the confusion matrix,
Figure 11 yields several conclusions. First, the model's detection capabilities for normal communications and DoS attacks are noteworthy. Of the 39,937 predictions made for normal communications, 39,868 were correct, while of the 3271 DoS predictions, 3223 were accurate. Similarly, of the 40,097 actual samples of normal communications, 39,868 were correctly classified, while in the case of DoS attacks, 3223 out of 3269 attacks were detected. However, the drop in detection performance is evident in the Intrusion class. Analyzing the distribution of the model's predictions, it can be seen that the model incorrectly classified 182 samples belonging to the Normal class as Intrusions, compared to 450 correct predictions for this class. Of the 474 real Intrusion samples, the model classified 23 as normal communications, 1 as a DoS attack, and the remaining 450 as Intrusions. The model thus tends to overpredict the Intrusion category, erroneously assigning this label to Normal communications.
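The per-class metrics quoted above follow directly from the confusion matrix counts. The sketch below reconstructs the matrix from the figures in the text; the off-diagonal split of the Normal and DoS rows is inferred from the quoted prediction totals (39,937 Normal and 3271 DoS predictions) and is therefore an assumption.

```python
import numpy as np

# Confusion matrix for the validated Random Forest model (rows = actual
# class, columns = predicted class; order: Normal, DoS, Intrusion).
cm = np.array([[39868,   47,  182],   # Normal:    40,097 actual samples
               [   46, 3223,    0],   # DoS:        3269 actual samples
               [   23,    1,  450]])  # Intrusion:   474 actual samples

precision = cm.diagonal() / cm.sum(axis=0)  # correct / total predicted
recall    = cm.diagonal() / cm.sum(axis=1)  # correct / total actual
for name, p, r in zip(("Normal", "DoS", "Intrusion"), precision, recall):
    print(f"{name:9s} precision={p:.3f} recall={r:.3f}")
```

These counts reproduce the figures discussed above: Intrusion precision 450/632 ≈ 71.2% and recall 450/474 ≈ 94.9%, against Normal precision 39,868/39,937 ≈ 99.8% and recall 39,868/40,097 ≈ 99.4%.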
This methodological approach prioritizes the detection of structural vulnerabilities over the particularities of any specific implementation, granting the results broad applicability across diverse domains. By focusing on inherent communication behaviors, the proposed solution becomes a versatile tool for the various technological ecosystems that share these foundations, preserving its integrity and relevance in the face of the heterogeneity and noise characteristic of large-scale industrial deployments.