Fault Diagnosis for IP-Based Networks Using Incremental Learning Algorithms and Data Stream Methods

Vargas-Arcila, Angela María; Rodríguez-Vivas, Angela; Corrales, Juan Carlos; Sanchis, Araceli; Rendón Gallón, Álvaro

doi:10.3390/technologies14020132


by Angela María Vargas-Arcila ¹,*, Angela Rodríguez-Vivas ², Juan Carlos Corrales ¹, Araceli Sanchis ³ and Álvaro Rendón Gallón ¹

¹ Department of Telematics Engineering, Faculty of Electronic and Telecommunications Engineering, Universidad del Cauca, Popayán 190003, Colombia

² Faculty of Engineering, Corporación Universitaria Comfacauca - Unicomfacauca, Popayán 190003, Colombia

³ Department of Computer Science and Engineering, Universidad Carlos III de Madrid, 28911 Leganés, Spain

* Author to whom correspondence should be addressed.

Technologies 2026, 14(2), 132; https://doi.org/10.3390/technologies14020132

Submission received: 15 December 2025 / Revised: 6 February 2026 / Accepted: 7 February 2026 / Published: 19 February 2026

(This article belongs to the Special Issue Artificial Intelligence for Smart Fault Diagnosis and Fault Tolerant Control)


Abstract

Network fault diagnosis has evolved in response to the needs of modern networks, transitioning from traditional methods, such as passive and active monitoring, to advanced learning techniques. While conventional methods often introduce invasive traffic and control overhead, newer approaches face challenges such as increased internal processes and the need for extensive knowledge of network behavior. Learning-based methods offer an advantage by not requiring a complete network model, allowing the use of statistical and Machine Learning techniques to process historical data. However, existing learning methods face limitations, such as the need for extensive data samples and extended retraining periods, which can leave systems vulnerable to failures, particularly in dynamic environments. This work addresses these issues by proposing an incremental learning approach for continuous fault diagnosis in IP-based networks. The approach utilizes online learning to process symptoms in real-time, adapting to network changes while managing data imbalance through drift detection and rebalancing strategies, such as ADWIN and SMOTE. We evaluated the performance of this method using 25 incremental algorithms on the SOFI dataset. The results, assessed using metrics such as recall, G-mean, kappa, and MCC, demonstrated high performance over time, indicating the potential for resilient, adaptive fault detection processes in dynamic network environments. Additionally, a non-invasive process can be ensured through peripheral observation of failure symptoms, provided that data collection does not increase network traffic, control overhead, or internal network processes.

Keywords:

fault diagnosis; machine learning; incremental learning; data stream; imbalanced data streams; concept drift; network management; network monitoring

1. Introduction

Network fault diagnosis has been a constant research topic that has evolved in response to advancements and changing needs. The traditional methods include passive monitoring, active monitoring, decentralized probabilistic management approaches, and temporal correlation methods [1]. However, they are invasive because of the increase in network traffic and control overhead. On the other hand, methods for overlay and virtual networks [2] increase the network’s internal processes by installing monitoring agents on all overlay nodes. This last problem also applies to decentralized management methods because of the need for an embedded management process on all networking devices.

All the above methods require in-depth knowledge of network connectivity and operations, as well as an extensive understanding of network behavior. To address this, fault diagnosis based on learning methods has emerged. Learning methods do not require a complete dependency model of a network [1]. For example, a network management system that monitors complex networks can generate a comprehensive history of Simple Network Management Protocol (SNMP) requests or many log files. So, statistical and machine learning (ML) methods can be used to process this information and obtain empirical data.

Several studies have been developed to detect network faults using ML methods, which can be divided into two main groups: unsupervised and supervised learning methods. Unsupervised learning approaches generally focus on detecting anomalies in wireless networks (including cellular networks) and optical networks. This is the case with the work of [3], which examines ML techniques for anomaly detection in wireless community networks, with a focus on hardware fault detection. Using a dataset created by the authors, which includes both traffic and non-traffic features, the study addresses the challenge of obtaining realistic data due to confidentiality issues. Four unsupervised ML methods were tested for detecting a specific gateway failure, finding that non-traffic features improve detection performance. The study highlights the value of feature selection and plans to explore additional anomaly detection methods and feature selection techniques in future work.

The authors of [4] use ML techniques for anomaly detection in cloud-native Beyond 5G (B5G) systems, employing ElasticSearch’s built-in anomaly detection function. This tool categorizes log data and identifies anomalies based on event rates and rare occurrences. The ML process involves tokenizing text fields, clustering similar data, and classifying them, which is effective for log analysis due to the limited variety of log messages. Additionally, the research incorporates Commercial Off-The-Shelf (COTS) ML algorithms for diagnosing anomalies. The study identifies significant limitations, particularly in log message structuring, which affects anomaly detection precision and efficiency. The research advocates for improved log structuring to provide real-time actionable insights and facilitate rapid issue resolution.

Ref. [5] discusses using probabilistic modeling and ML to diagnose faults in gigabit-capable passive optical networks (GPON). It introduces a Bayesian inference tool named PANDA, which effectively diagnoses network faults using real operational data, even when monitoring information is missing or incomplete. The tool’s accuracy is enhanced through ML techniques, specifically an Expectation Maximization (EM) algorithm, which refines the conditional dependencies within the model. EM implements maximum likelihood estimation for an incomplete data set, which allows learning conditional dependencies without labels in the data. The findings emphasize the benefits of probabilistic approaches for fault diagnosis in complex network environments.

Ref. [6] details an experimental setup where various neural network methods, including LSTNet (Long- and Short-term Time-series Network), RNN (Recurrent Neural Network), and LSTM (Long Short-Term Memory), were evaluated for their effectiveness in forecasting and detecting failures in optical networks. The study highlights the advantages of using LSTNet, which significantly outperforms traditional models in terms of accuracy and efficiency, while also addressing scalability and fault management challenges in complex network environments. The proposed approach employs unsupervised learning, as it relies on a model that only requires data from normal operating conditions for training, without needing data from fault conditions during this phase.

On the other hand, works focusing on supervised learning are also oriented towards fault diagnosis in wireless or mobile networks, such as [7], and optical networks, such as [8]. Additionally, IP-based computer networks are also considered. For example, ref. [9] describes a proactive failure detection system designed for large-scale IP-based networks, which processes network log messages by transforming them into structured log templates to identify potential failures before they occur. The system employs a supervised machine-learning approach, extracting key features such as frequency, periodicity, and burstiness from these templates. Specifically, it uses Support Vector Machines (SVM) to predict future failures based on historical log data. The method’s performance is evaluated using metrics such as Area Under the ROC Curve (AUC) and F1-score, demonstrating improved accuracy compared to existing methods. Currently, the system learns and detects abnormal logs offline; however, features and models need to be updated automatically for production networks. Therefore, future work will focus on enabling proactive failure detection in real-time.

The work [10] discusses the development and evaluation of PIQoS, a programmable and intelligent Quality of Service (QoS) framework that utilizes ML for network monitoring and management. It aims to improve network error detection and recovery, particularly in the context of link failures and congestion, by employing supervised ML models for error and policy prediction. The study compares supervised and unsupervised learning approaches, highlighting the potential for future exploration of semi-supervised algorithms to strike a balance between accuracy and complexity. Decision trees are a supervised ML model that provides the best accuracy for error detection and cause prediction. Moreover, k-means is identified as the best algorithm among the unsupervised models. Other algorithms evaluated include Naive Bayes, SVM, Random Forests (RF) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Finally, ref. [11] presents a comparative study of two feature extraction techniques, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), in conjunction with SVM algorithms for early fault classification in a Network Function Virtualization (NFV) environment. The study demonstrates that LDA significantly improves classification accuracy, achieving 90% accuracy with SVM One-vs-All, whereas PCA has a negative impact on performance. The findings indicate that robust feature extraction is crucial for effective fault classification in complex virtualized network systems. The authors suggest that future research should focus on hybrid feature extraction methods and testing in a 5G core platform.

Despite advancements in network fault diagnosis techniques, several limitations persist with existing learning methods. Firstly, unsupervised methods can be complex because they require a large dataset to obtain the desired results [12] or must be combined with supervised approaches to achieve good outcomes. Secondly, the supervised methods discussed in the reviewed studies utilize traditional ML algorithms, which often require more data samples, resulting in lengthy retraining periods whenever the system’s behavior changes significantly. Moreover, the current models must be discarded during the retraining process, leaving systems vulnerable to potential failures. This vulnerability is particularly concerning in dynamic network environments, where changes in network topology or network behavior require the development of new hypotheses and models adapted to the new symptom–fault relationships. Furthermore, an imbalance in supervised learning requires specialized strategies; however, no related work has proposed a plan to address this issue effectively. In summary, there is no resilient method capable of addressing new failures with different symptoms due to dynamic changes in the network, while managing data imbalance.

Thus, the question arises: How can we learn from the behavior of an IP-based network to continuously and resiliently diagnose its failures and adapt to possible changes in network behavior over time? In previous work, we hypothesized that if failures propagate, failures’ symptoms can be observed at levels higher than their origin [13]; then, we could, in a noninvasive and continuous way, learn from those symptoms using incremental learning techniques to classify on the fly those that represent the failure state of the network.

This work aims to diagnose failures of an IP-based network by classifying network behavior online to determine whether the network’s internal structure is in a failure or normal state. We implemented an incremental learning approach (also known as online learning) for this purpose, that is, to process symptoms on the fly and continuously (as streaming data) during the network’s operation to detect faults that occurred in internal network elements. This type of learning is resilient to dynamic network changes because the incremental learning models ensure continuous learning from new symptoms, allowing the model to adapt on its own; that is to say, as the network changes, so does the model.

The implemented diagnosis method uses the incremental learning approach together with drift detection and rebalancing methods to enable the diagnostic process to deal with the inherent problem of network data: data imbalance (the time in the normal state of a network is greater than the time in a failure state).

The main results of this study demonstrate that incorporating incremental rebalancing together with drift detection significantly improves fault diagnosis in IP-based networks under imbalanced and dynamic conditions. However, under severe imbalance, incremental learners tend to favor the majority class, indicating that these mechanisms alone are insufficient in extreme cases. Furthermore, the findings indicate that an external rebalancing and drift-handling mechanism is essential even for algorithms that natively address concept drift, enabling a resilient, non-invasive, and adaptive fault diagnosis process suitable for real-world IP network environments.

The remainder of this paper is organized as follows. Section 2 describes the proposed fault diagnosis methodology, including the incremental learning framework and the mechanisms adopted to handle class imbalance and concept drift, as well as the dataset used for evaluation. Section 3 presents the experimental setup, evaluation methodology, and a detailed discussion of the obtained results. Finally, Section 4 concludes the paper by summarizing the main findings and outlining directions for future research.

2. Materials and Methods

This section first presents the proposed fault diagnosis method, detailing its overall workflow, design rationale, and main components for learning from streaming network data under class imbalance and concept drift. It then introduces the dataset used in the experimental evaluation, which is employed to emulate a continuous stream of network management data and assess the behavior of the proposed method under realistic operating conditions.

2.1. Fault Diagnosis Method

2.1.1. Overview and Design Rationale

In this work, we propose a non-invasive fault diagnosis method based on data collected through peripheral observation of failure symptoms. The data used for fault diagnosis are network management parameters, which are generally collected at fixed intervals for as long as the network operates. Consequently, the method must operate indefinitely and handle an unbounded stream of management data.

This situation poses a challenge to the use of ML techniques. Traditional ML approaches assume that all the data needed to generate a classification has been collected and is available in a training set [14]; furthermore, they have limitations in processing large amounts of data, particularly when dealing with continuously growing datasets. In contrast, the proposed fault diagnosis method requires updating the ML model without storing an ever-growing volume of new data. Incremental learning algorithms address these and other issues explained throughout this paper.

Algorithms with incremental learning (also known as online learning) can update their learning model using new data without reprocessing previously used data. Then, they discard the training data if it is no longer required. These algorithms are used when the data to process is not all available at the beginning of the process [14]. They are ideal when the input to a learning process is streaming data; in other words, when data arrives in real-time as an ongoing stream, these methods can make predictions in real-time while continuously refining the model based on the evolving input stream [15,16]. Two fundamental aspects characterize incremental learning. First, it can incorporate information from new experiences not previously available in the dataset into the model. Second, it can evolve the model so that it increasingly represents more complex concepts. So, the term “any-time learning” emerges [17].
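As a minimal sketch of this idea, the Python class below updates a Gaussian naive Bayes classifier one instance at a time using running means and variances, so no past instance needs to be stored. It is an illustration only and not one of the algorithms evaluated in this work; the class name OnlineGaussianNB and the method names learn_one and predict_one are assumptions introduced here.

```python
import math
from collections import defaultdict


class OnlineGaussianNB:
    """Gaussian naive Bayes updated one instance at a time (Welford's method).

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self):
        self.count = defaultdict(int)  # instances seen per class
        self.mean = {}                 # per-class running feature means
        self.m2 = {}                   # per-class running sums of squared deviations

    def learn_one(self, x, y):
        """Fold a single labeled instance into the model; x itself is not stored."""
        if y not in self.mean:
            self.mean[y] = [0.0] * len(x)
            self.m2[y] = [0.0] * len(x)
        self.count[y] += 1
        n = self.count[y]
        for j, v in enumerate(x):
            delta = v - self.mean[y][j]
            self.mean[y][j] += delta / n
            self.m2[y][j] += delta * (v - self.mean[y][j])

    def predict_one(self, x):
        """Return the class with the highest log-posterior, or None if untrained."""
        total = sum(self.count.values())
        best, best_score = None, -math.inf
        for y, n in self.count.items():
            score = math.log(n / total)  # log prior from class frequencies
            for j, v in enumerate(x):
                var = self.m2[y][j] / n + 1e-9  # small floor avoids division by zero
                score -= 0.5 * math.log(2 * math.pi * var)
                score -= (v - self.mean[y][j]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best


model = OnlineGaussianNB()
toy_stream = [([0.1, 0.2], "NE"), ([0.0, -0.1], "NE"), ([5.1, 4.9], "F"),
              ([-0.2, 0.1], "NE"), ([4.8, 5.2], "F")]
for features, label in toy_stream:
    model.learn_one(features, label)  # each instance can be discarded afterwards
```

The model state grows with the number of classes and features, not with the number of instances, which is the property that makes this family of algorithms suitable for unbounded streams.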

In this regard, this work addresses the fault diagnosis issue in an IP-based network using incremental learning approaches, which process the network’s continuously generated monitoring data as a data stream.

Moreover, in networking, the time a network spends in the normal state is much greater than the time it spends in a fault state. Therefore, the number of instances classified as failures will be significantly smaller than the number representing normal network behavior. This phenomenon is known as class imbalance and is a problem generally associated with concept drift [18]. Concept drift refers to the characteristic of evolving data streams, or continuously updated real-time data, whereby the data distribution changes over time [18,19].

Incremental algorithms handle the frequent arrival of new data, but additional methods are required to support learning under imbalance and concept drift. Dealing with imbalance is already challenging for static datasets and is even harder for data streams, which has given rise to an emerging paradigm within ML.

The works of [20,21] comprehensively study several approaches proposed to address the double challenge of imbalance and concept drift. Notably, there is no widely adopted algorithm to tackle this issue, prompting authors to adapt existing techniques for static datasets to contexts where data arrive sequentially and indefinitely. The field of learning from imbalanced and drifting data streams is still recent, and many related issues remain to be appropriately analyzed, understood, categorized, and addressed [20].

2.1.2. Proposed Fault Diagnosis Method (Core Method)

In light of these challenges, we propose a network fault diagnosis method based on creating a ML model built from three essential elements, as illustrated in Figure 1: a drift detector, a rebalancing method, and an incremental learning algorithm. These components must work with data streams and communicate with one another to obtain information on the current status of the data stream, enabling the fault diagnosis method to rebalance the stream and train the model accordingly.

The concept drift detector aims to detect changes in the data distribution and to report the current class imbalance status. It identifies which classes are the majority and the minority and the current class ratio, and thus which classes require more attention from the classification algorithm. It focuses on the minority class and warns about possible changes in the skew, so that the rebalancing method can effectively perform its balancing task.

When a drift is detected, the rebalancing method rebalances the data that have arrived up to that point, and the incremental learning algorithm retrains the model using the rebalanced data. Thus, the diagnosis model can classify the incoming data to make real-time predictions.

This workflow can be applied uniformly across different incremental learning algorithms, allowing the proposed method to be evaluated independently of classifier-specific mechanisms. As such, the contribution of this work lies in the diagnosis strategy itself rather than in the optimization of a particular algorithm.

2.2. Dataset

We used the processed version of the SOFI (symptom–fault relationship for IP-network) dataset [22] to validate the diagnosis process. It contains both healthy and in-failure network data, where the in-failure data corresponds to impact link-failures. A link failure occurs when an event causes the connection between two devices to be down or results in excessive packet discards on a link [23]. A link failure with impact occurs when the traffic significantly decreases during the failure [24].

The SOFI dataset was collected in an emulated large campus network through the periodic collection of SNMP parameters polled only on the two peripheral network elements [13] shown in Figure 2. The collected management parameter set comprises two types: traffic descriptors and status descriptors. The first type contains traffic parameters and is collected from each device’s interfaces (bits received, bits sent, inbound packets discarded, inbound packets with errors, outbound packets discarded, outbound packets with errors, and operational status). The second data type is related to the device (device uptime, SNMP availability, Internet Control Message Protocol (ICMP) response time, ICMP loss, and ICMP ping). The structured version of SOFI groups the parameters into three interface sets (internal, peripheral, and external) to ensure the same fixed structure for both monitored network devices. Six hundred forty-nine hours (649) of network monitoring data were collected, ten (10) of which correspond to artificially induced faults [22]. Therefore, the class ratio in the SOFI dataset is approximately 1:70.

The dataset consists of two files (one for each monitored network device) containing 12,971 instances labeled with NE and F classes and 34 numeric features. NE indicates normal behavior, and F represents faulty behavior of the monitored network [22].

Utilizing the SOFI dataset as the incoming data stream for the diagnosis process enables the fault diagnosis model to detect failures originating from the network’s internal layers by analyzing information collected exclusively from the network elements that interconnect the network with other networks (the peripheral network elements depicted in Figure 2).

3. Experiments

Consistent with the fault diagnosis method, we implemented several incremental learning algorithms for data streams in conjunction with a rebalancing and drift detection method. We then compared their performance to that of the basic implementations of these algorithms.

3.1. Experimental Setup

The interaction between the three elements of the network fault diagnosis method is based on the meta-strategy proposed by [25] called RebalanceStream. RebalanceStream uses Adaptive Sliding Window (ADWIN) [26] to detect if a concept drift is in the stream. When that detection is successful, it rebalances the data that has arrived up to that point using the Synthetic Minority Over-sampling Technique (SMOTE) [27]. Then, it trains a new model using the rebalanced data. The two models (the current and the new one) are evaluated through the kappa-statistic performance evaluation methodology [28]. The best-trained model is chosen to continue.
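The control flow just described can be sketched as follows. This is an illustrative Python outline and not the RebalanceStream Java code: the hooks make_model, detect_drift, and rebalance are hypothetical placeholders for an incremental learner, a drift detector such as ADWIN, and an oversampler such as SMOTE, and comparing both models on the batch that triggered the drift is a simplifying assumption.

```python
def cohen_kappa(pairs):
    """Cohen's kappa for a list of (true_label, predicted_label) pairs."""
    n = len(pairs)
    if n == 0:
        return 0.0
    labels = {t for t, _ in pairs} | {p for _, p in pairs}
    p_obs = sum(t == p for t, p in pairs) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_exp = sum(
        (sum(t == c for t, _ in pairs) / n) * (sum(p == c for _, p in pairs) / n)
        for c in labels
    )
    return 0.0 if p_exp == 1.0 else (p_obs - p_exp) / (1.0 - p_exp)


def run_rebalance_loop(stream, make_model, detect_drift, rebalance):
    """Test-then-train loop with drift-triggered rebalancing and model selection.

    All three hooks are hypothetical and supplied by the caller:
      make_model()      -> object with learn_one(x, y) and predict_one(x)
      detect_drift(err) -> True when the 0/1 error stream signals a change
      rebalance(batch)  -> list of (x, y) pairs with the minority class oversampled
    """
    model, batch, predictions = make_model(), [], []
    for x, y in stream:
        y_hat = model.predict_one(x)   # test first ...
        predictions.append((y, y_hat))
        model.learn_one(x, y)          # ... then train
        batch.append((x, y))
        if detect_drift(0.0 if y_hat == y else 1.0):
            candidate = make_model()
            for xb, yb in rebalance(batch):  # fresh model trained on rebalanced data
                candidate.learn_one(xb, yb)
            # Assumption: both models are compared on the batch seen since the last drift.
            k_cur = cohen_kappa([(yb, model.predict_one(xb)) for xb, yb in batch])
            k_new = cohen_kappa([(yb, candidate.predict_one(xb)) for xb, yb in batch])
            if k_new > k_cur:
                model = candidate      # keep whichever model scores the higher kappa
            batch = []                 # start collecting a new batch
    return model, predictions
```

Passing the components as callables keeps the loop independent of any particular classifier, which mirrors the way the diagnosis method is applied uniformly to different incremental algorithms.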

The RebalanceStream strategy comprises the basic functional elements required in the current work; however, its source code only receives a synthetic data stream, is configured for a single online algorithm, and provides performance results based on a single metric. We modified the code to support any real data stream and introduced four additional performance measures. The implementation was written in Java using the Massive Online Analysis (MOA) [29].

Each implemented element of the network fault diagnosis method is described below.

Concept drift detector: ADWIN [26] is a change detection method that uses a variable-length window of recent instances to monitor the prediction errors for those instances. ADWIN is widely embedded in incremental algorithms, such as the Concept-adapting Very Fast Decision Tree (${CVFDT}_{NBC}$) [30], the Hoeffding adaptive tree, which leverages bagging [31], and adaptable diversity-based online boosting (ADOB) [32], which supports its utility for data stream analysis. Therefore, although many other drift detectors exist [33], ADWIN is sufficient for the scope of this research.
Rebalancing method: Techniques are needed to ensure that the algorithms do not neglect learning from the minority class, i.e., the instances that represent network failures. Oversampling is the most widely used and efficient rebalancing approach. Thus, we incorporate SMOTE [27] to achieve incremental rebalancing learning [34], generating synthetic instances of the in-failure network state.
Incremental learning algorithm: To obtain a diverse set of results for comparison, we applied the diagnosis method to 25 incremental algorithms. We aim to assess whether the proposed fault diagnosis method can enhance the classification process for imbalanced data streams. Table 1 lists all the algorithms, including their names and types.
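To illustrate the first two components, the following Python sketch gives simplified versions of both. SimpleAdwin checks every split of its window against the ADWIN cut threshold but omits the bucket compression that makes the original algorithm efficient, and it drops the whole older sub-window at once; smote interpolates between a minority instance and one of its nearest minority neighbours. The class, function, and parameter names (including the delta, max_window, and k values) are assumptions, and the MOA implementations used in the experiments differ from this sketch.

```python
import math
import random


class SimpleAdwin:
    """Naive ADWIN-style detector: O(window) work per update, no bucket compression."""

    def __init__(self, delta=0.002, max_window=1000):
        self.delta = delta            # confidence parameter (assumed value)
        self.max_window = max_window  # hard cap on memory (not part of ADWIN itself)
        self.window = []

    def update(self, value):
        """Add a value in [0, 1] (e.g., a 0/1 error); return True if the window was cut."""
        self.window.append(value)
        if len(self.window) > self.max_window:
            self.window.pop(0)
        drift, cut_found = False, True
        while cut_found and len(self.window) >= 10:
            cut_found = False
            n, total, head = len(self.window), sum(self.window), 0.0
            for i in range(1, n):  # split: window[:i] (older) | window[i:] (newer)
                head += self.window[i - 1]
                n0, n1 = i, n - i
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(head / n0 - (total - head) / n1) > eps:
                    self.window = self.window[i:]  # drop the older sub-window
                    drift = cut_found = True
                    break
        return drift


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating minority points with neighbours."""
    rng = rng or random.Random(0)  # fixed seed by default, for reproducibility
    if len(minority) < 2:
        return []                  # interpolation needs at least two points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(others)
        gap = rng.random()         # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Feeding the detector a run of correct predictions followed by a run of errors causes it to cut its window shortly after the change, while a stable error rate leaves the window untouched.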

In order to make the implementation details explicit, the experimental implementation builds upon the codebase originally proposed by [25]. To adapt this framework to the objectives of the present study, the following controlled modifications were introduced: (i) the use of a real-world streaming dataset instead of synthetic streams, and (ii) the integration of additional evaluation metrics tailored to highly imbalanced data. All other components, including learning algorithms, parameter configurations, and evaluation protocol, were preserved as originally implemented, with default settings provided by the MOA framework.

It is noteworthy that several online classifiers embed strategies to deal with concept drift (Table 1 marks these algorithms with an asterisk). Nevertheless, the independent concept drift detector of our diagnosis method is still necessary to achieve class rebalancing.

As mentioned, SOFI was collected through the monitoring of two devices (PeripheralDevice-I and II, to present the results), resulting in two dataset files. Each of the 25 versions of the implemented diagnosis method was executed for these two data sets. It is also important to note that all data were normalized before being processed by the diagnosis method to avoid memory overflows. The timestamp attribute of the SOFI dataset was not included in this experiment since this parameter only indicates the instances’ arriving order.

3.2. Evaluation Method and Imbalanced Stream Evaluation Metrics

The diagnosis method performs online classification of network failures, making it a learning model that must be evaluated in a streaming context. Prequential evaluation is the most commonly used method for this task [28], so we adopted this approach.

This evaluation method uses each incoming instance first to test and then to train the model, ensuring that the evaluation is always performed with previously unseen instances. Each evaluation step updates the confusion matrix, so we can read the matrix every n incoming instances to compute a set of five performance metrics. This yields a performance history over time, that is, a record of how the fault classification has adapted. In this experiment, n is equal to 200 samples.
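A minimal Python sketch of this test-then-train procedure is given below. It is not the MOA evaluator used in the experiments: the function name prequential, the placeholder learner MajorityClass, and the choice to accumulate a single confusion matrix over the whole stream are assumptions made for illustration.

```python
class MajorityClass:
    """Placeholder learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict_one(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


def prequential(stream, model, positive, n=200):
    """Test-then-train evaluation; returns confusion-matrix snapshots every n instances."""
    cm = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    history = []
    for i, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict_one(x)  # 1) test on an instance not seen before
        if y == positive:
            cm["tp" if y_hat == positive else "fn"] += 1
        else:
            cm["fp" if y_hat == positive else "tn"] += 1
        model.learn_one(x, y)         # 2) only then use it for training
        if i % n == 0:
            history.append((i, dict(cm)))  # copy, so later updates do not alter it
    return history


# Toy imbalanced stream: one "F" (failure) instance in every 50.
toy_stream = [([0.0], "F" if i % 50 == 0 else "NE") for i in range(400)]
snapshots = prequential(toy_stream, MajorityClass(), positive="F", n=200)
```

With this toy stream the placeholder learner never predicts a failure correctly, so every snapshot has zero true positives, which is the kind of majority-class bias the evaluation is meant to expose.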

For comparison purposes, this evaluation method was applied to the proposed diagnosis method, which performs incremental rebalancing learning, and to the online base algorithm without any rebalancing strategy. Although no separate section is explicitly labeled as an ablation study, this experimental design inherently performs a component-level ablation analysis by evaluating each incremental learning algorithm under two controlled configurations: (i) operating independently on the original data stream, and (ii) operating within the proposed diagnosis pipeline that incorporates incremental rebalancing and explicit drift detection. By keeping the learning algorithm, data stream, and evaluation protocol identical across both configurations, this comparison isolates the contribution of the rebalancing and drift-handling components to the observed performance differences.

We deal with an imbalanced data stream, so traditional metrics are inadequate for evaluating the performance of the fault diagnosis method. Prequential accuracy is the most commonly used measure [53], but accuracy can be misleading in an imbalanced context. Thus, ref. [53] suggests using the prequential AUC [54], G-mean, and recall or sensitivity metrics for imbalanced data streams.

In addition, RebalanceStream uses the standard kappa statistic within the rebalance stream process; to remain faithful to the base code, we kept the kappa statistic as one of the performance measures. Further, the kappa statistic is widely used when dealing with imbalanced data [55]. Nevertheless, some studies prefer the Matthews Correlation Coefficient (MCC) over the kappa statistic [56], so that measure was also used.

The above reasons led us to select these five metrics (recall or sensitivity, G-mean, kappa, MCC, and prequential AUC) as the most informative to evaluate the fault diagnosis method. Prequential AUC is a new metric suitable for data stream scenarios proposed by [54], and MOA provides its calculation; thus, our experiment uses this implementation. The other four metrics applied in a streaming context are measured following the prequential evaluation process.
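For reference, four of these metrics can be computed from a binary confusion matrix as in the Python sketch below, which uses the standard textbook definitions and treats the failure class F as the positive class. The function name imbalance_metrics and the returned dictionary layout are assumptions; prequential AUC is not included because it is not derived from the confusion matrix.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Recall, G-mean, kappa, and MCC from a binary confusion matrix.

    Undefined quantities (zero denominators) are returned as NaN.
    """
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    nan = float("nan")
    recall = tp / (tp + fn) if tp + fn else nan       # sensitivity
    specificity = tn / (tn + fp) if tn + fp else nan
    g_mean = math.sqrt(recall * specificity)          # NaN propagates
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_exp) / (1 - p_exp) if p_exp != 1 else nan
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {"accuracy": accuracy, "recall": recall, "g_mean": g_mean,
            "kappa": kappa, "mcc": mcc}


# A classifier that always answers "NE" on a stream with a 1:70 class ratio.
always_normal = imbalance_metrics(tp=0, fn=10, fp=0, tn=700)
```

In this example accuracy exceeds 0.98 even though recall and G-mean are 0 and kappa is approximately 0, which illustrates why accuracy alone is misleading under imbalance; MCC is undefined (NaN) because no instance is predicted as a failure.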

3.3. Experimental Results and Discussion

The prequential evaluation provides a set of five values, one for each metric, representing the fault diagnosis performance as the data arrive. Each prequential metric is plotted in a line chart, resulting in 250 performance graphs in the experiment (five metrics for each of the twenty-five algorithms and both dataset files).

Each graph represents the performance curve of the tested algorithm according to the corresponding metric. Figure 3a illustrates the format of the obtained curves. The vertical axis indicates the metric values, and the horizontal axis shows the number of incoming instances that have arrived up to the measurement time. The black series shows the results of the diagnosis method, that is, with the rebalance stream strategy. The red series shows the performance of the online base algorithm without rebalancing or concept drift treatment.

Due to the large number of graphs obtained in the experiment, it was necessary to condense the results and represent them visually to facilitate comparison of the algorithms’ performance according to each metric and to determine whether the rebalancing strategy for fault classification represents an improvement over the base algorithm. Therefore, heat maps were created for each of the metrics.

Three measurements were obtained to create the heatmaps and their calculation was also incorporated into the code of RebalanceStream:

The mean of the rebalanced curve (black curve),
The mean of the base curve, without balancing (red curve),
The weighted difference between them.

The weighted difference is a measure between 0 and 1, indicating the difference between the two performance lines (black and red curve), considering a weighting function that assigns some importance to the performance obtained at each measurement.

The authors of [31] argue that each instance will become increasingly less significant to the overall average. Then, our weighting function is a convex exponential function that assigns a high valuation to differences in the initial intervals and then decreases in value. For example, Figure 3b shows the weighting function for a prequential evaluation of 20 measurements. The weighted difference is the summation of the weighted differences at each point, and the result is normalized to obtain a value between 0 and 1.
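One possible form of such a measure is sketched below in Python. The exact weighting function and normalization used in the experiments are not specified in the text, so the decay rate, the function name weighted_difference, and the normalization by the sum of the weights are all assumptions; this version also keeps the sign of the difference, so that a negative value indicates that the base curve was better.

```python
import math


def weighted_difference(rebalanced, base, rate=0.2):
    """Weighted mean of pointwise differences with exponentially decaying weights.

    rebalanced, base: metric values at each measurement (black and red curves).
    rate: hypothetical decay rate; larger values emphasize early measurements more.
    """
    if len(rebalanced) != len(base) or not rebalanced:
        raise ValueError("curves must be non-empty and of equal length")
    # Convex, decreasing weights: exp(-rate * i) for measurement index i.
    weights = [math.exp(-rate * i) for i in range(len(rebalanced))]
    total = sum(weights)
    # Dividing by the weight sum bounds the magnitude by the largest pointwise gap.
    return sum(w * (r - b) for w, r, b in zip(weights, rebalanced, base)) / total


early_gain = weighted_difference([1.0, 0.0], [0.0, 0.0])
late_gain = weighted_difference([0.0, 1.0], [0.0, 0.0])
```

Under these assumptions an improvement at the first measurement contributes more than the same improvement at a later one, so early_gain is larger than late_gain.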

For each performance metric (recall or sensitivity, G-mean, kappa, MCC, and prequential AUC), three heatmaps were obtained representing the three measures mentioned above (mean of the rebalanced curve, mean of the base curve, and weighted difference) for the 25 algorithms in Table 1. Figure 4a shows the color settings used in the heatmaps. In the color configuration for the mean measurements (rebalanced and base), a redder color indicates a better measurement as it is closer to 1. Conversely, a lighter color signifies that the value is closer to 0, indicating worse performance. In the color configuration for the weighted difference, a positive difference is colored green; the more saturated the green, the more significant the improvement. Gray indicates no difference, while blue signifies a negative difference.

Figure 4b–f present the resulting heatmaps. Some cells are marked “NaN”, which means that the metric could not be calculated at some measurement points (because of very poor classifier behavior), so the developed tool could not compute the mean of the corresponding metric.

Sensitivity during fault diagnosis is higher with the rebalancing and concept drift approach than with the base incremental learning algorithm alone. The diagnosis method therefore learns from the minority class despite the severe class imbalance. This behavior is attributed to the fact that SMOTE could be applied during online learning, which positively affected failure classification.

The prequential measurements of G-mean, kappa, and MCC also confirm that the diagnosis method does not neglect the learning of either of the two network states (failure and healthy).
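For reference, the four imbalance-aware metrics can be computed from the confusion counts of the fault (positive) and healthy (negative) classes using their standard definitions. The helper names below are illustrative, and the cumulative counts are assumed to be available at each measurement point.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def imbalance_metrics(tp, fn, fp, tn):
    """Recall, G-mean, Cohen's kappa, and MCC from binary confusion counts."""
    n = tp + fn + fp + tn
    recall = _ratio(tp, tp + fn)          # sensitivity on the fault class
    specificity = _ratio(tn, tn + fp)     # accuracy on the healthy class
    g_mean = math.sqrt(recall * specificity)
    p_o = _ratio(tp + tn, n)              # observed agreement
    p_e = _ratio((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn), n * n)  # chance agreement
    kappa = _ratio(p_o - p_e, 1 - p_e)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)
    return {"recall": recall, "g_mean": g_mean, "kappa": kappa, "mcc": mcc}
```

With 10 faults among 1000 instances, a classifier that always predicts the healthy state reaches 99% accuracy, yet `imbalance_metrics(0, 10, 0, 990)` gives a recall, G-mean, and kappa of 0 and an undefined (NaN) MCC.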

The intensity of the color in the weighted difference heatmap for the Sensitivity, G-mean, kappa, and MCC metrics suggests that using an incremental algorithm without the components of the proposed diagnosis method is not enough. On the other hand, most of the base algorithms incorporate concept drift (as indicated in Table 1 of the experiment setup); however, they perform poorly, highlighting the need for a rebalancing and concept drift detection procedure external to the algorithm.

Meanwhile, the proposed diagnosis method and the online base classifier had similar prequential AUC. If this is contrasted with the results mentioned above, it is safe to say that this metric does not provide reliable information to compare the two scenarios, nor to evaluate the classification performance of imbalanced data streams.

If an online base algorithm had to be selected for the diagnosis method, some clearly do not perform well even with the rebalancing process, but these are few. Twenty-one of the twenty-five algorithms yield excellent results when coupled with the concept drift detector and the rebalancing method, as the proposed diagnosis method prescribes. Hence, this proposal offers a valuable approach to network fault diagnosis and is particularly suitable for network scenarios where monitoring data is received in real-time.

To summarize numerically the performance of the 21 algorithms with good results, the ranges obtained for each metric are as follows. For Recall, the diagnosis method obtained a range of 0.8033 to 0.95426, while the range for the base algorithms is 0.22496 to 0.73097. For G-mean, our method achieved a range of 0.86902 to 0.97121, while the range for the base algorithms is 0.41878 to 0.8527. For Kappa, the proposed diagnosis method obtained a range of 0.77976 to 0.94918, while the range for the base algorithms is 0.22223 to 0.75529. For MCC, the diagnosis method obtained a range of 0.79648 to 0.94929, while the range for the base algorithms is 0.2591 to 0.76025.

When considering the ranges, it becomes evident that the diagnosis method, as presented in this article, outperforms the individual use of the learning algorithms it incorporates.

When highlighting the best algorithms, we found that for each individual metric there is a specific algorithm with the best performance. The diagnosis method achieved the highest recall value with the BOLE algorithm, the highest G-mean value with the ADOB algorithm, the highest Kappa value with SAM-kNN, and the highest MCC value with the SAM-kNN algorithm.

When comparing the metric values for the algorithms implemented within the diagnosis method and those for the individual algorithms, the SAM-kNN algorithm consistently showed the most significant difference. The models built based on this algorithm differed by 0.73097 for Recall, 0.8527 for G-mean, 0.75529 for Kappa, and 0.76025 for MCC.

The experimental results show that integrating drift detection with incremental rebalancing significantly improves minority-class detection, particularly in terms of recall and G-mean. This behavior can be explained by the interaction between class imbalance and concept drift in streaming environments. Under severe imbalance, incremental learners tend to favor the majority class, causing minority patterns to be underrepresented or forgotten over time.

When concept drift occurs, this bias is further amplified, as newly emerging minority patterns are often misclassified during adaptation phases. The application of incremental rebalancing mitigates this effect by increasing the representation of minority-class instances during training, enabling the classifier to better adapt to evolving concepts. Drift detection further supports this process by triggering timely model updates, preventing performance degradation caused by outdated decision boundaries.

Although some incremental classifiers incorporate internal drift-handling mechanisms, the results indicate that these mechanisms alone are insufficient under extreme imbalance conditions. The proposed method complements such classifiers by addressing imbalanced data explicitly, leading to more stable and robust performance across different learning algorithms.

For clarity, Tables 2–6 present a representative subset of 10 algorithms, selected based on performance ranking, sensitivity to rebalancing, and algorithmic diversity. Although Figure 4 reports the metric results for all evaluated algorithms, the tables focus on this subset to facilitate interpretation. The selected algorithms cover the main algorithmic families and illustrate both positive and negative effects of the rebalancing and concept drift approach. In all tables, results summarize the average prequential performance over both datasets, comparing the base configuration and the rebalance stream strategy. The reported qualitative effects are derived from the weighted performance differences illustrated in Figure 4, averaged over both datasets.

Table 2 reports the comparative Sensitivity/Recall performance of the selected algorithms under the base configuration and the proposed method. The results show that incorporating incremental rebalancing consistently improves minority-class detection for most learning paradigms, confirming that the proposed approach effectively mitigates the bias toward the majority class commonly observed in incremental learners. Ensemble-based methods benefit the most from rebalancing, whereas some margin-based and instance-based algorithms exhibit limited gains or slight degradation. Overall, the table highlights that rebalancing plays a critical role in improving sensitivity in highly imbalanced data streams, without contradicting the broader trends observed in the remaining evaluation metrics.

Table 3 reports the comparative G-Mean performance of the selected algorithms. Unlike Recall, G-Mean reveals that not all methods benefiting from rebalancing achieve balanced classification. Ensemble-based approaches such as ARF, AWE, and Online Bagging maintain or improve G-Mean, indicating robustness to class imbalance. In contrast, boosting and margin-based methods show performance degradation, suggesting that incremental rebalancing may introduce noise that negatively affects majority-class accuracy.

Table 4 reports the Kappa Statistic comparison. Unlike Recall and G-Mean, Kappa reveals that the rebalancing and concept drift approach does not consistently improve chance-corrected agreement. Ensemble-based methods such as ARF and AWE maintain stable Kappa values, indicating robustness to class imbalance. In contrast, boosting and margin-based learners exhibit noticeable degradation, suggesting that incremental rebalancing may introduce inconsistencies that negatively affect global agreement.

Table 5 reports the MCC comparison. In contrast to Recall and G-Mean, MCC reveals that rebalancing rarely leads to substantial improvements, as the metric penalizes both false positives and false negatives simultaneously. Only robust ensemble-based methods such as ARF and AWE maintain stable or slightly improved MCC values, whereas boosting, margin-based, and instance-based methods experience noticeable degradation.

Table 6 reports the AUC comparison. Unlike Recall, G-Mean, Kappa, and MCC, AUC shows limited sensitivity to the stream rebalancing and concept drift strategy, with most methods exhibiting nearly identical performance under both configurations. This behavior confirms that AUC mainly captures ranking capability rather than class-specific performance, reinforcing the need to jointly analyze imbalance-aware metrics when evaluating fault diagnosis in data streams.
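The distinction between ranking capability and class-specific performance can be illustrated with a small, hypothetical example. The scores and labels below are invented for illustration and are not taken from the SOFI dataset, and the batch AUC computed here is the plain pairwise-ranking form rather than the prequential AUC used in the experiment.

```python
def auc(scores, labels):
    """AUC as the fraction of (fault, healthy) pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(scores, labels, threshold=0.5):
    """Recall of the fault class when a fault is predicted for score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


# Eight healthy instances followed by two faults (made-up values).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
confident = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.9, 0.7]
# Same ordering of instances, but every score is halved (biased toward "healthy").
timid = [s / 2 for s in confident]

print(auc(confident, labels), recall_at(confident, labels))  # 1.0 1.0
print(auc(timid, labels), recall_at(timid, labels))          # 1.0 0.0
```

Both score vectors rank every fault above every healthy instance, so their AUC is identical, yet the second one detects no faults at the 0.5 decision threshold.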

The comparative tables summarizing Sensitivity/Recall, G-Mean, Kappa Statistic, MCC, and AUC provide a consolidated view of the qualitative and quantitative effects of integrating incremental rebalancing with concept drift detection across representative learning algorithms. The results indicate that the proposed diagnosis method consistently improves or preserves performance in imbalance-aware metrics, particularly Sensitivity and G-Mean, while maintaining stable behavior in Kappa and MCC, which account for agreement beyond chance and balanced error distribution. In contrast, the use of base incremental algorithms without explicit rebalancing generally results in weaker performance across these metrics, even when internal drift-handling mechanisms are present. The tables further show that AUC remains largely unchanged between configurations, supporting the observation that ranking-based metrics alone are insufficient to capture the impact of imbalance and drift in streaming scenarios. Overall, the tabular analysis complements the heatmap-based visualization by clarifying which learning paradigms benefit from the proposed approach and by highlighting its consistent advantages under severe class imbalance.

4. Conclusions and Future Work

This research aimed to diagnose faults in IP-based networks by learning from monitored parameters of peripheral network elements. This was possible because a fault propagation phenomenon allows failure symptoms to be observed at a higher network level than the fault origin [13].

The implemented diagnosis method respects the dynamic nature of networks (potential changes in network behavior over time) because it is based on data stream learning methods. It comprises three elements: a concept drift detector, a data rebalancing strategy, and an incremental learning algorithm.
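How the three elements fit together can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the RebalanceStream implementation: a fixed two-window error comparison stands in for ADWIN, a random interpolation between buffered fault instances stands in for SMOTE (no nearest-neighbor search), and a small online logistic regression stands in for the incremental learner. All class names, function names, and parameter values are assumptions.

```python
import math
import random
from collections import deque


class OnlineLogistic:
    """Toy incremental learner: logistic regression trained by SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * (n_features + 1)  # w[0] is the bias
        self.lr = lr

    def _prob(self, x):
        z = self.w[0] + sum(w * v for w, v in zip(self.w[1:], x))
        z = max(-30.0, min(30.0, z))       # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return int(self._prob(x) >= 0.5)

    def learn(self, x, y):
        g = self._prob(x) - y              # gradient of the log-loss w.r.t. z
        self.w[0] -= self.lr * g
        for j, v in enumerate(x):
            self.w[j + 1] -= self.lr * g * v


class WindowDriftDetector:
    """Simplified stand-in for ADWIN: compares error rates of two fixed windows."""

    def __init__(self, size=30, threshold=0.3):
        self.old, self.new = deque(maxlen=size), deque(maxlen=size)
        self.threshold = threshold

    def update(self, error):
        if len(self.new) == self.new.maxlen:
            self.old.append(self.new[0])   # oldest recent error becomes reference
        self.new.append(error)
        if len(self.old) < self.old.maxlen:
            return False
        rise = sum(self.new) / len(self.new) - sum(self.old) / len(self.old)
        if rise > self.threshold:
            self.old.clear()
            self.new.clear()
            return True
        return False


def smote_like(x, minority_buffer, rng):
    """Synthetic fault instance on the segment between x and a buffered fault."""
    other = rng.choice(list(minority_buffer))
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, other)]


def diagnose_stream(stream, n_features, seed=0):
    """Test-then-train loop combining the three elements; returns predictions."""
    rng = random.Random(seed)
    learner, detector = OnlineLogistic(n_features), WindowDriftDetector()
    minority = deque(maxlen=50)            # recent fault instances
    seen = {0: 0, 1: 0}                    # per-class training counts
    predictions = []
    for x, y in stream:
        y_pred = learner.predict(x)
        predictions.append(y_pred)
        if detector.update(int(y_pred != y)):
            learner = OnlineLogistic(n_features)  # drift: drop the outdated model
            seen = {0: 0, 1: 0}
        learner.learn(x, y)
        seen[y] += 1
        if y == 1:
            minority.append(x)
            while seen[1] < seen[0]:       # oversample faults up to parity
                learner.learn(smote_like(x, minority, rng), 1)
                seen[1] += 1
    return predictions
```

Resetting the learner on drift and oversampling to class parity are the simplest possible choices and were made only to keep the sketch short.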

To test the method, 25 different incremental learning algorithms were used in combination with ADWIN as the drift detector and SMOTE (adapted to the streaming scenario) as the rebalancing strategy. For comparison purposes, each incremental algorithm was also run without the additional methods. Performance was measured through five metrics for data streams: recall or sensitivity, G-mean, kappa, MCC, and prequential AUC. Since this is a continuous learning process with data streams, the measurements of these metrics consist of multiple historical records, and therefore, each represents a performance curve. Consequently, each curve was translated into a normalized value to create heatmaps and facilitate comparison.

The results indicate that incremental learning is a suitable approach to diagnosing internal network faults by learning from network behavior manifested through management parameters. However, while incremental learning is effective for this purpose, it must be complemented with drift detection and rebalancing processes. This is because the network undergoes constant changes, and monitoring data consists primarily of normal behaviors with only a few instances of network faults. These findings were confirmed by comparing the performance of the model built with incremental learning combined with ADWIN and SMOTE with that of the model built with only the incremental learning algorithm.

Learning methods for data streams offer advantages for network fault detection by enabling timely diagnosis and resilience to dynamic network changes. Additionally, a noninvasive process can be ensured when combined with the concept of peripheral observation of failure symptoms (as proposed in the collection of the SOFI dataset), as the data collection does not increase network traffic, control overhead, or internal network processes.

Future studies have significant potential to delve into the normalization process for data streams. While the experimentation in this paper assumed all data were normalized, calculating statistical measures from a non-static dataset remains an area ripe for exploration and innovation. Also, all experiments rely solely on the SOFI dataset. The generalizability of the proposed method across different network topologies, fault types, and traffic patterns remains to be demonstrated.
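One possible starting point for stream normalization is to maintain running statistics per feature. The sketch below uses Welford's algorithm to standardize each incoming instance with the mean and standard deviation observed so far. It is only an illustration of the open problem, not part of the evaluated method: the class name `RunningStandardizer` is hypothetical, and the sketch does not forget old data, so it would itself need adapting under concept drift.

```python
import math


class RunningStandardizer:
    """Per-feature running mean and variance (Welford's algorithm) for z-score scaling."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features   # sum of squared deviations from the mean

    def update(self, x):
        """Fold one instance into the running statistics."""
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x):
        """Standardize x with the statistics seen so far (0.0 if variance is zero)."""
        out = []
        for j, v in enumerate(x):
            std = math.sqrt(self.m2[j] / self.n) if self.n > 1 else 0.0
            out.append((v - self.mean[j]) / std if std > 0 else 0.0)
        return out
```

In a test-then-train loop, `transform` would be applied to each instance before prediction and `update` afterwards, so that no instance is scaled with statistics that already include it.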

In addition, to optimize the obtained results, it would be of interest to carry out a comparative evaluation with other drift detection approaches, such as HDDM [57] and FTDD [58].

Additionally, future work could focus on analyzing the computational complexity and memory overhead of the proposed fault diagnosis method, particularly in the context of deployment on real network devices. This includes evaluating time and memory requirements under different streaming conditions and implementation settings, as well as assessing the feasibility of integrating the approach into operational network monitoring systems.

Finally, researchers must recognize that fault detection, when treated as a classification process, presents challenges stemming from streaming and severely imbalanced data. This underscores the importance of carefully selecting the metrics used for performance evaluation, and we recommend that the prequential AUC not be used as a performance metric for imbalanced data streams.

Author Contributions

Conceptualization, A.M.V.-A. and A.S.; methodology, A.M.V.-A. and A.S.; software, A.M.V.-A.; validation, A.M.V.-A.; formal analysis, A.M.V.-A. and Á.R.G.; investigation, A.M.V.-A.; resources, A.M.V.-A. and A.S.; data curation, A.M.V.-A.; writing—original draft preparation, A.M.V.-A.; writing—review and editing, A.R.-V.; visualization, Á.R.G.; supervision, J.C.C.; project administration, J.C.C.; funding acquisition, Á.R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received institutional support from Universidad del Cauca, Corporación Universitaria Comfacauca—Unicomfacauca, and Universidad Carlos III de Madrid. The Spanish Government has also supported this work under projects TRA2016-78886-C3-1-R and PID2019-104793RB-C31.

Data Availability Statement

The dataset generated and analyzed during this study is the SOFI dataset, whose collection and structuring process is described in [22]. The dataset is openly available in Mendeley Data at https://data.mendeley.com/datasets/tc6ysmh5j8/2 (accessed on 6 February 2026).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

ADAGRAD	Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
ADOB	Adaptable Diversity-based Online Boosting
ADWIN	Adaptive Sliding Window

ARF	Adaptive Random Forest
ASHT	Adaptive-Size Hoeffding Tree
AUC	Area Under the ROC Curve
AUE2	Accuracy Updated Ensemble
AWE	Accuracy Weighted Ensemble
B5G	Beyond 5G
BOLE	Boosting-like Online Learning Ensemble
COTS	Commercial Off-The-Shelf
CVFDT	Concept-adapting Very Fast Decision Tree
DACC	Dynamic Adaptation to Concept Changes
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
EM	Expectation Maximization
FTTH	Fiber To The Home
G-Mean	Geometric Mean
GPON	Gigabit-capable Passive Optical Network
HAT	Hoeffding Adaptive Tree
HOT	Hoeffding Option Tree
ICMP	Internet Control Message Protocol
IP	Internet Protocol
kNN	k Nearest Neighbor
LDA	Linear Discriminant Analysis
LSTM	Long Short-Term Memory
LSTNet	Long- and Short-term Time-series Network
MCC	Matthews Correlation Coefficient
ML	Machine Learning
MOA	Massive Online Analysis
NFV	Network Function Virtualization
ONDM	Optical Network Design and Modeling
OSBoost	Online Smooth Boost
PCA	Principal Component Analysis
PEGASOS	Primal Estimated sub-Gradient SOlver for SVM
PIQoS	Programmable and Intelligent Quality of Service
QoS	Quality of Service
RCD	Recurring Concept Drifts
RF	Random Forest
RNN	Recurrent Neural Network
ROC	Receiver Operating Characteristic
SAM-kNN	Self Adjusting Memory model for the k Nearest Neighbor
SDM	SIAM International Conference on Data Mining
SIGKDD	Special Interest Group on Knowledge Discovery and Data Mining
SMOTE	Synthetic Minority Over-sampling Technique
SNMP	Simple Network Management Protocol
SVM	Support Vector Machine
VNF	Virtual Network Function

References

Dusia, A.; Sethi, A.S. Recent Advances in Fault Localization in Computer Networks. IEEE Commun. Surv. Tutor. 2016, 18, 3030–3051.
Yan, C.; Wang, Y.; Qiu, X.; Li, W.; Guan, L. Multi-layer fault diagnosis method in the Network Virtualization Environment. In The 16th Asia-Pacific Network Operations and Management Symposium; IEEE: Piscataway, NJ, USA, 2014; pp. 1–6.
Cerdà-Alabern, L.; Iuhasz, G.; Gemmi, G. Anomaly detection for fault detection in wireless community networks using machine learning. Comput. Commun. 2023, 202, 191–203.
Zanatta Bruno, G.; Chaves Rodrigues, K.B.; Vieira Cardoso, K.; Luz Correa, S.; Bonato Both, C. Anomaly Detection in Cloud-native B5G Systems using Observability and Machine Learning COTS Solutions. J. Internet Serv. Appl. 2023, 14, 189–199.
Gosselin, S.; Courant, J.L.; Tembo, S.R.; Vaton, S. Application of probabilistic modeling and machine learning to the diagnosis of FTTH GPON networks. In 2017 International Conference on Optical Network Design and Modeling (ONDM); IEEE: Piscataway, NJ, USA, 2017; pp. 1–3.
Silva, M.F.; Pacini, A.; Sgambelluri, A.; Valcarenghi, L. Learning Long- and Short-Term Temporal Patterns for ML-Driven Fault Management in Optical Communication Networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2195–2206.
Fernandes, N.F. Diagnosing Failures in the Mobile Network Operation using Ensemble of Classifiers. In Proceedings of the 9th Latin American Network Operations and Management Symposium, LANOMS, Niterói, Brazil, 25–27 September 2019; Available online: https://dl.ifip.org/db/conf/lanoms/lanoms2019/196395_1.pdf (accessed on 3 December 2025).
Patri, S.K.; Dick, I.; Kaeval, K.; Müller, J.; Pedreno-Manresa, J.J.; Autenrieth, A.; Elbers, J.P.; Tikas, M.; Mas-Machuca, C. Machine Learning enabled Fault-Detection Algorithms for Optical Spectrum-as-a-Service Users. In 2023 International Conference on Optical Network Design and Modeling (ONDM); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
Kimura, T.; Watanabe, A.; Toyono, T.; Ishibashi, K. Proactive failure detection learning generation patterns of large-scale network logs. IEICE Trans. Commun. 2019, 102, 306–316.
Lekhala, U.; Haque, I. PIQoS: A Programmable and Intelligent QoS Framework. In IEEE INFOCOM 2019—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS); IEEE: Piscataway, NJ, USA, 2019; pp. 234–239.
Elmajed, A.; Faucheux, F. Comparing Feature Extraction techniques using SVM for Early Fault Classification in NFV context. In 2021 24th Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN); IEEE: Piscataway, NJ, USA, 2021; pp. 57–61.
Delua, J. Supervised vs. Unsupervised Learning: What’s the Difference? IBM Think. 2025. Available online: https://www.ibm.com/think/topics/supervised-vs-unsupervised-learning (accessed on 2 December 2025).
Vargas-Arcila, A.; Corrales, J.C.; Sanchis, A.; Rendon Gallon, A. Peripheral diagnosis for propagated network faults. J. Netw. Syst. Manag. 2021, 29, 14.
Bramer, M. Principles of Data Mining; Springer: Berlin/Heidelberg, Germany, 2007.
Petrella, L.; Gomes, M.; Perdigão, F.; Santos, M.; Fernandes, P.; Pinto, C.; Nunes, S.; Morgado, M.; Caixinha, M.; Santos, J. Eye Scan Ultrasound System for Automatic Cataract Detection: From a Preclinical to a Clinical Prototype. In XV Mediterranean Conference on Medical and Biological Engineering and Computing—MEDICON 2019; Henriques, J., Neves, N., de Carvalho, P., Eds.; Springer: Cham, Switzerland, 2020; pp. 811–819.
Utgoff, P.E. Incremental learning. In Encyclopedia of Machine Learning; Springer: New York, NY, USA, 2011; pp. 515–518.
Del-Campo-Ávila, J. Nuevos Enfoques en Aprendizaje Incremental. Ph.D. Thesis, Universidad de Málaga, Málaga, Spain, 2019. Available online: https://riuma.uma.es/xmlui/handle/10630/6902 (accessed on 3 December 2025).
Li, Z.; Huang, W.; Xiong, Y.; Ren, S.; Zhu, T. Incremental learning imbalanced data streams with concept drift: The dynamic updated ensemble algorithm. Knowl.-Based Syst. 2020, 195, 105694.
Gunasekara, N.; Pfahringer, B.; Gomes, H.M.; Bifet, A.; Koh, Y.S. Recurrent concept drifts on data streams. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI ’24; International Joint Conferences on Artificial Intelligence Organization: Palo Alto, CA, USA, 2024; pp. 8029–8037.
Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Streams. In Learning from Imbalanced Data Sets; Springer International Publishing: Cham, Switzerland, 2018; pp. 279–303.
Suárez-Cetrulo, A.L.; Quintana, D.; Cervantes, A. A survey on machine learning for recurring concept drifting data streams. Expert Syst. Appl. 2023, 213, 118934.
Vargas-Arcila, A.; Corrales, J.C.; Sanchis, A.; Rendon, A. SOFI dataset: Symptom-fault relationship for IP-network. Comput. Netw. 2022, 216, 109233.
Potharaju, R.; Jain, N. When the network crumbles: An empirical study of cloud network failures and their impact on services. In SOCC ’13: Proceedings of the 4th Annual Symposium on Cloud Computing; Association for Computing Machinery: New York, NY, USA, 2013.
Gill, P.; Jain, N.; Nagappan, N. Understanding network failures in data centers: Measurement, analysis, and implications. SIGCOMM Comput. Commun. Rev. 2011, 41, 350–361.
Bernardo, A.; Della Valle, E.; Bifet, A. Rebalancing learning on evolving data streams. arXiv 2019, arXiv:1911.07361.
Bifet, A.; Gavaldà, R. Learning from Time-Changing Data with Adaptive Windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM); Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2007; pp. 443–448.
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Bifet, A.; de Francisci Morales, G.; Read, J.; Holmes, G.; Pfahringer, B. Efficient Online Evaluation of Big Data Stream Classifiers. In KDD ’15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2015; pp. 59–68.
Bifet, A.; Holmes, G.; Pfahringer, B.; Kranen, P.; Kremer, H.; Jansen, T.; Seidl, T. Moa: Massive online analysis, a framework for stream classification and clustering. In Proceedings of the First Workshop on Applications of Pattern Analysis, Windsor, UK, 1–3 September 2010; pp. 44–50.
Nishimura, S.; Terabe, M.; Hashimoto, K.; Mihara, K. Learning higher accuracy decision trees from concept drifting data streams. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2008; pp. 179–188.
Bifet, A.; Gavalda, R.; Holmes, G.; Pfahringer, B. Machine Learning for Data Streams: With Practical Examples in MOA; MIT Press: Cambridge, MA, USA, 2018.
Santos, S.G.T.d.C.; Goncalves Junior, P.M.; Silva, G.D.d.S.; de Barros, R.S.M. Speeding Up Recovery from Concept Drifts. In Machine Learning and Knowledge Discovery in Databases; Calders, T., Esposito, F., Hüllermeier, E., Meo, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 179–194.
Barros, R.; Santos, R.S.M. A large-scale comparison of concept drift detectors. Inf. Sci. 2018, 451–452, 348–370.
Bernardo, A.; della Valle, E.; Bifet, A. Incremental Rebalancing Learning on Evolving Data Streams. In 2020 International Conference on Data Mining Workshops (ICDMW); IEEE: Piscataway, NJ, USA, 2020; pp. 844–850.
Hulten, G.; Spencer, L.; Domingos, P. Mining time-changing data streams. In KDD ’01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2001; pp. 97–106.
Oza, N.C.; Russell, S.J. Online Bagging and Boosting. In Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA, 4–7 January 2001; Volume R3, pp. 229–236. Available online: https://proceedings.mlr.press/r3/oza01a.html (accessed on 11 November 2025).
Wang, H.; Fan, W.; Yu, P.S.; Han, J. Mining concept-drifting data streams using ensemble classifiers. In KDD ’03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2003; pp. 226–235.
Kolter, J.Z.; Maloof, M.A. Dynamic weighted majority: An ensemble method for drifting concepts. J. Mach. Learn. Res. 2007, 8, 2755–2790.
Pfahringer, B.; Holmes, G.; Kirkby, R. New Options for Hoeffding Trees. In AI 2007: Advances in Artificial Intelligence; Orgun, M.A., Thornton, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 90–99.
Shalev-Shwartz, S.; Singer, Y.; Srebro, N. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML ’07: Proceedings of the 24th International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2007; pp. 807–814.
Bach, S.H.; Maloof, M.A. Paired Learners for Concept Drift. In 2008 Eighth IEEE International Conference on Data Mining; IEEE: Piscataway, NJ, USA, 2008; pp. 23–32.
Pelossof, R.; Jones, M.; Vovsha, I.; Rudin, C. Online coordinate boosting. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops; IEEE: Piscataway, NJ, USA, 2009; pp. 1354–1361.
Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
Elwell, R.; Polikar, R. Incremental Learning of Concept Drift in Nonstationary Environments. IEEE Trans. Neural Netw. 2011, 22, 1517–1531.
Chen, S.T.; Lin, H.T.; Lu, C.J. An Online Boosting Algorithm with Theoretical Justifications. arXiv 2012, arXiv:1206.6422.
Brzezinski, D.; Stefanowski, J. Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 81–94.
Jaber, G.; Cornuéjols, A.; Tarroux, P. A New On-Line Learning Method for Coping with Recurring Concepts: The ADACC System. In Neural Information Processing; Lee, M., Hirose, A., Hou, Z.G., Kil, R.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 595–604.
Gonçalves, P.M., Jr.; de Barros, R.S.M. RCD: A recurring concept drift framework. Pattern Recognit. Lett. 2013, 34, 1018–1025.
Barros, R.; Garrido, T.; de Carvalho Santos, S.; Gonçalves Júnior, P.M. A Boosting-like Online Learning Ensemble. In 2016 International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2016; pp. 1871–1878.
Barddal, J.P.; Murilo Gomes, H.; Enembreck, F.; Pfahringer, B.; Bifet, A. On Dynamic Feature Weighting for Feature Drifting Data Streams. In Machine Learning and Knowledge Discovery in Databases; Frasconi, P., Landwehr, N., Manco, G., Vreeken, J., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 129–144.
Losing, V.; Hammer, B.; Wersing, H. KNN Classifier with Self Adjusting Memory for Heterogeneous Concept Drift. In 2016 IEEE 16th International Conference on Data Mining (ICDM); IEEE: Piscataway, NJ, USA, 2016; pp. 291–300.
Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfharinger, B.; Holmes, G.; Abdessalem, T. Adaptive random forests for evolving data stream classification. Mach. Learn. 2017, 106, 1469–1495.
Fernandez, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Cham, Switzerland, 2018; Volume 10.
Brzezinski, D.; Stefanowski, J. Prequential AUC: Properties of the area under the ROC curve for data streams with concept drift. Knowl. Inf. Syst. 2017, 52, 531–562.
Bifet, A.; Frank, E. Sentiment Knowledge Discovery in Twitter Streaming Data. In Discovery Science; Pfahringer, B., Holmes, G., Hoffmann, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–15.
Delgado, R.; Tibau, X.A. Why Cohen’s Kappa should be avoided as performance measure in classification. PLoS ONE 2019, 14, e0222916.
Frías-Blanco, I.; Campo-Ávila, J.d.; Ramos-Jiménez, G.; Morales-Bueno, R.; Ortiz-Díaz, A.; Caballero-Mota, Y. Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds. IEEE Trans. Knowl. Data Eng. 2015, 27, 810–823.
de Lima Cabral, D.R.; de Barros, R.S.M. Concept drift detection based on Fisher’s Exact test. Inf. Sci. 2018, 442–443, 220–234.

Figure 1. Elements of the network fault diagnosis method.

Figure 2. Elements of the monitored network.

Figure 3. Performance curves format and weighted difference elements. (a) Example of curves and how to compute the weighted difference of a point. The vertical axis indicates the metric values and the horizontal axis indicates the number of the incoming instances that have arrived until the measurement time. (b) Example of the weighting function for a prequential evaluation of 20 measurements. The vertical axis indicates the values for the weighting function and the horizontal axis indicates the quantity of measurements for prequential evaluation.

Figure 4. Heatmaps of prequential results. (a) The color scale for evaluation metrics. (b) Sensitivity/Recall. (c) G-mean. (d) Kappa Statistic. (e) MCC. (f) AUC.

Table 1. Online Classifiers for Evaluating the Diagnosis Method.

No.	Algorithm	Type	Reference
1	CVFDT	Decision Trees	[35]
2	Online Bagging	Ensemble method	[36]
3	Online Boosting	Ensemble method	[36]
4 *	Accuracy Weighted Ensemble (AWE)	Ensemble method	[37]
5 *	Dynamic Weighted Majority (DWM)	Ensemble method	[38]
6	Hoeffding Option Tree (HOT)	Ensemble method	[39]
7	Stochastic variant of PEGASOS (Primal Estimated sub-Gradient SOlver for SVM)	Function classifier	[40]
8 *	CVFDT_NB	Decision Trees	[31]
9 *	Paired learning	Ensemble method	[41]
10 *	ADWIN Bagging	Ensemble method	[31]
11 *	ASHT (Adaptive-Size Hoeffding Tree) Bagging	Ensemble method	[42]
12	Hoeffding Adaptive Tree	Decision Trees	[31]
13	Online Coordinate Boosting	Ensemble method	[42]
14 *	Leveraging Bagging	Ensemble method	[31]
15	ADAGRAD	Function classifier	[43]
16 *	Learn++NSE	Ensemble method	[44]
17	QSBoost (Online Smooth Boost)	Ensemble method	[45]
18 *	Accuracy Updated Ensemble (AUE2)	Ensemble method	[46]
19 *	Dynamic Adaptation to Concept Changes (DACC)	Ensemble method	[47]
20 *	Recurring Concept Drifts (RCD)	Ensemble method	[48]
21 *	ADOB	Ensemble method	[32]
22 *	Boosting-like Online Learning Ensemble (BOLE)	Ensemble method	[49]
23 *	Hoeffding Adaptive Tree (HAT) with feature weighted kNN (HAT-kNN-FW) and NB (HAT-kNN-NB)	Decision Trees	[50]
24 *	SAM-kNN (Self Adjusting Memory model for the k Nearest Neighbor)	Lazy learning	[51]
25 *	Adaptive Random Forest (ARF)	Ensemble method	[52]

* Embeds strategies to deal with concept drift.

Table 2. Comparative analysis of Sensitivity/Recall for representative algorithms.

Algorithm	Base	Rebalance Stream	$Δ$	Effect	Interpretation
ARF	High	Higher	Large	Strong improvement	Ensemble diversity significantly improves minority-class detection after rebalancing.
Online Bagging	High	Slightly higher	Moderate	Improvement	Increased exposure to minority samples enhances recall.
AWE	Medium–High	Higher	Moderate	Improvement	Window-based ensemble adapts effectively to rebalanced streams.
AUE2	High	High	Small	Marginal	Robust weighting limits sensitivity to sampling changes.
Online Boosting	Medium	Lower	Negative	Degradation	Boosting amplifies noise introduced by artificial resampling.
Hoeffding Tree	Medium	Medium	Small	Marginal	Split criteria are weakly affected by stream rebalancing.
Naive Bayes	Medium	Medium	Small	Marginal	Probabilistic smoothing mitigates imbalance effects.
kNN-based	Low	Low	Negative	Degradation	Local neighborhood imbalance persists despite rebalancing.
PEGASOS	Low–Medium	Lower	Negative	Degradation	Margin optimization is distorted by synthetic samples.
DWM	Medium	Medium	≈0	No effect	Weight adaptation partially compensates for class imbalance.

$Δ$ denotes the weighted performance difference between the Base and Rebalance Stream configurations, as illustrated in Figure 4b.

Table 3. Comparative analysis of G-Mean for representative algorithms.

Algorithm	Base	Rebalance Stream	$Δ$	Effect	Interpretation
ARF	High	High	Small	Marginal improvement	Balanced ensemble voting preserves performance on both classes.
Online Bagging	Medium–High	Higher	Moderate	Improvement	Rebalancing enhances minority detection without harming majority accuracy.
AWE	Medium	Higher	Moderate	Improvement	Adaptive windowing improves class balance handling.
AUE2	High	High	≈0	No effect	Robust weighting makes the method insensitive to rebalancing.
Online Boosting	Medium	Lower	Negative	Degradation	Noise amplification affects majority-class performance.
Hoeffding Tree	Medium	Medium	Small	Marginal improvement	Tree structure mildly benefits from more balanced splits.
Naive Bayes	Medium	Medium	Small	Marginal improvement	Probabilistic estimates compensate for moderate imbalance.
kNN-based	Low–Medium	Low	Negative	Degradation	Local decision boundaries are distorted by resampling.
PEGASOS	Low–Medium	Lower	Negative	Degradation	Margin optimization struggles to balance both classes.
DWM	Medium	Medium	≈0	No effect	Dynamic weighting partially offsets imbalance effects.

$Δ$ denotes the weighted performance difference between the Base and Rebalance Stream configurations, as illustrated in Figure 4c.
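For reference, Sensitivity/Recall (Table 2) and G-mean (Table 3) can both be derived from binary confusion counts. The sketch below uses the standard definitions; the function name `g_mean` and the counts are hypothetical and are not taken from the SOFI experiments.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical stream with 100 fault instances and 900 normal instances.
# A classifier that always predicts "normal" reaches 90% accuracy
# but never detects a fault, so its G-mean is zero.
always_normal = g_mean(tp=0, fn=100, tn=900, fp=0)

# A classifier that detects 80% of faults and 90% of normal instances.
balanced = g_mean(tp=80, fn=20, tn=810, fp=90)

print(always_normal, round(balanced, 4))
```

Because the two rates are multiplied, G-mean drops to zero whenever one class is entirely missed, which is why it is a common choice for imbalanced streams.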

Table 4. Comparative analysis of Kappa Statistic for representative algorithms.

Algorithm	Base	Rebalance Stream	$Δ$	Effect	Interpretation
ARF	High	High	Small	Marginal improvement	Ensemble regularization preserves agreement beyond chance under rebalancing.
Online Bagging	Medium–High	Medium–High	≈0	No effect	Variance reduction balances gains and losses across classes.
AWE	Medium	Medium–High	Moderate	Improvement	Adaptive weighting improves global agreement.
AUE2	High	High	≈0	No effect	Robust ensemble weighting makes the method insensitive to resampling.
Online Boosting	Medium	Lower	Negative	Degradation	Error amplification reduces agreement consistency.
Hoeffding Tree	Medium	Medium	Small	Marginal improvement	Slightly more balanced splits improve chance-corrected agreement.
Naive Bayes	Medium	Medium	Small	Marginal improvement	Probabilistic calibration stabilizes predictions.
kNN-based	Low–Medium	Low	Negative	Degradation	Local decisions increase disagreement under imbalance.
PEGASOS	Low–Medium	Lower	Negative	Degradation	Margin optimization degrades global agreement after rebalancing.
DWM	Medium	Medium	≈0	No effect	Dynamic weights compensate for class skew without improving agreement.

$Δ$ denotes the weighted performance difference between the Base and Rebalance Stream configurations, as illustrated in Figure 4d.
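The Kappa statistic compares observed accuracy with the accuracy expected by chance given the class and prediction frequencies. A minimal sketch of the standard Cohen's kappa for the binary case follows; the function name `cohen_kappa` and the counts are hypothetical.

```python
def cohen_kappa(tp, fn, tn, fp):
    """Cohen's kappa from binary confusion counts."""
    n = tp + fn + tn + fp
    if n == 0:
        return 0.0
    observed = (tp + tn) / n
    # Chance agreement: product of actual and predicted marginals per class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical stream with 100 fault instances and 900 normal instances.
kappa_detector = cohen_kappa(tp=80, fn=20, tn=810, fp=90)
kappa_majority = cohen_kappa(tp=0, fn=100, tn=900, fp=0)

print(round(kappa_detector, 4), round(kappa_majority, 4))
```

In this example the majority-class predictor has 90% accuracy but a kappa of zero, since all of its agreement is explained by chance.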

Table 5. Comparative analysis of Matthews Correlation Coefficient (MCC) for representative algorithms.

Algorithm	Base	Rebalance Stream	$Δ$	Effect	Interpretation
ARF	High	High	Small	Marginal improvement	Ensemble diversity preserves balanced correlation between classes.
Online Bagging	Medium–High	Medium–High	≈0	No effect	Variance reduction balances gains and losses across classes.
AWE	Medium	Medium–High	Moderate	Improvement	Adaptive weighting improves correlation consistency.
AUE2	High	High	≈0	No effect	Robust ensemble weighting limits sensitivity to resampling.
Online Boosting	Medium	Lower	Negative	Degradation	Noise amplification degrades joint class correlation.
Hoeffding Tree	Medium	Medium	Small	Marginal improvement	Slightly more balanced splits improve MCC marginally.
Naive Bayes	Medium	Medium	Small	Marginal improvement	Probabilistic calibration stabilizes correlation estimates.
kNN-based	Low–Medium	Low	Negative	Degradation	Local decision errors strongly affect MCC.
PEGASOS	Low–Medium	Lower	Negative	Degradation	Margin distortion leads to poor joint class correlation.
DWM	Medium	Medium	≈0	No effect	Dynamic weighting compensates for imbalance without improving MCC.

$Δ$ denotes the weighted performance difference between the Base and Rebalance Stream configurations, as illustrated in Figure 4e.
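MCC uses all four cells of the confusion matrix, which makes it informative under class imbalance. The following sketch applies the standard binary formula; the function name `mcc` and the counts are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from binary confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical stream with 100 fault instances and 900 normal instances.
mcc_detector = mcc(tp=80, fn=20, tn=810, fp=90)
mcc_majority = mcc(tp=0, fn=100, tn=900, fp=0)
mcc_perfect = mcc(tp=100, fn=0, tn=900, fp=0)

print(round(mcc_detector, 4), mcc_majority, mcc_perfect)
```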

Table 6. Comparative analysis of Area Under the ROC Curve (AUC) for representative algorithms.

Algorithm	Base	Rebalance Stream	$Δ$	Effect	Interpretation
ARF	High	High	≈0	No effect	Ranking capability remains stable under rebalancing.
Online Bagging	High	High	Small	Marginal improvement	Slight improvement in score ordering after rebalancing.
AWE	Medium–High	Medium–High	≈0	No effect	Ensemble weighting preserves ranking robustness.
AUE2	High	High	≈0	No effect	Insensitivity to sampling changes explains stable AUC.
Online Boosting	Medium–High	Medium	Negative	Marginal degradation	Noise affects score calibration but not ranking severely.
Hoeffding Tree	Medium–High	Medium–High	≈0	No effect	Decision boundaries remain largely unchanged.
Naive Bayes	Medium–High	Medium–High	≈0	No effect	Probabilistic ranking is robust to class imbalance.
kNN-based	Medium	Medium	≈0	No effect	Neighborhood-based ranking is weakly affected by rebalancing.
PEGASOS	Medium–High	Medium	Negative	Marginal degradation	Margin distortion slightly alters score ordering.
DWM	Medium–High	Medium–High	≈0	No effect	Dynamic weighting maintains stable ranking performance.

$Δ$ denotes the weighted performance difference between the Base and Rebalance Stream configurations, as illustrated in Figure 4f.
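AUC can be read as the probability that a randomly chosen positive (fault) instance receives a higher score than a randomly chosen negative one. The sketch below computes this pairwise form over a fixed list of scores. It is a simplification: it does not implement a prequential or windowed AUC for streams, and the function name `pairwise_auc` and the sample scores are hypothetical.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Illustrative classifier scores for three fault and three normal instances.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc_value = pairwise_auc(scores, labels)
print(round(auc_value, 4))
```

Because only the ordering of scores matters, changes in class proportions alone do not alter this value, which is consistent with the mostly stable AUC rows in Table 6.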

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
