Article | Open Access

28 April 2019

Efficient Ensemble Classification for Multi-Label Data Streams with Concept Drift

1 School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
2 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
3 Henan Key Lab of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang 464000, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Classifier Ensembles: Efficient Techniques to Define Robust System Structures

Abstract

Most existing multi-label data stream classification methods focus on extending single-label stream classification approaches to the multi-label case, without considering the special characteristics of multi-label stream data, such as label dependency, concept drift, and recurrent concepts. Motivated by these challenges, we devise an efficient ensemble paradigm for multi-label data stream classification. The algorithm deploys a novel change detection mechanism based on the Jensen–Shannon divergence to identify different kinds of concept drift in data streams. Moreover, our method exploits label dependency by pruning away infrequent label combinations to enhance classification performance. Empirical results on both synthetic and real-world datasets demonstrate its effectiveness.

1. Introduction

In recent years, applications such as sensor networks [1], spam filtering [2], intrusion detection [3], and credit card fraud detection [4] have generated continuously arriving data known as data streams [5]. In the data stream model, instances arrive at a high rate, and algorithms must process them under strict constraints of time and memory [6].
However, traditional methods focus on classifying data streams under single-label scenarios, where each instance belongs to a single label. In practice, many real-world applications involve multi-label data streams. A multi-label data stream shares the same properties as multi-label data [7]; typical examples include news articles, e-mails, RSS feeds, and medical text. Multi-label stream classification is a non-trivial task, because traditional multi-label classification approaches work in batch settings. An important feature of multi-label data streams is concept drift, i.e., the underlying distribution of the data may change over time. Such changes can deteriorate the predictive accuracy of classifiers.
In brief, concept drifts can be categorized by speed into gradual and abrupt drifts [8]. Several approaches have been proposed to handle concept drift; they fall into three general groups: window-based, weight-based, and ensemble-based approaches [9]. Among them, ensemble classifiers are particularly effective and popular, but they require generalization to accommodate multi-label data stream environments.
Most existing methods can only deal with one type of concept drift. However, real-world data streams may contain several types of concept drift, so an effective classifier must be able to respond to both sudden and gradual changes.
It is common in real-world data streams for previously seen concepts to reappear in the future. For example, weather forecasting models change with the seasons, and a popular topic may appear in a social network at a particular time of the year. This illustrates a distinct kind of drift known as recurring concepts. However, only a few approaches take recurring concepts into consideration in the context of multi-label data streams.
One important advantage of ensemble techniques for streaming data is that they can handle recurring concepts. Since ensembles contain models built from different parts of the data stream, such models can be reused to classify new instances that come from a previously seen concept. Another advantage is the ability to utilize traditional classification algorithms in concept drift scenarios. Following these motivations, we devise an efficient scheme named Multi-Label ensemble with Adaptive Windowing (MLAW) for multi-label data stream classification. The ensemble maintains a weighted set of component classifiers and predicts the class of incoming instances by aggregating the predictions of the components with a weighted voting mechanism [10]. In summary, this paper contributes a simple and effective method for multi-label data stream scenarios. The key contributions of our algorithm are threefold:
(1)
To address the issue of concept drift, a change detection mechanism is employed to offer fast reactions to sudden concept changes. Moreover, a periodic weighting mechanism is adopted to cope with gradual concept drift.
(2)
To deal with the problem of recurrent concept drift, the Jensen–Shannon divergence is selected as the metric for measuring the distributions of two consecutive windows, which represent the older and the more recent instances, respectively. Moreover, a novel ensemble classification framework is presented that maintains a pool of classifiers (each representing one of the existing concepts) and predicts the class of incoming instances using a weighted majority voting rule. Once a change occurs, a new concept is identified, and a new classifier is learned and added to the pool.
(3)
Label dependency is taken into account by pruning away infrequent label combinations to enhance classification performance.
The rest of this paper is arranged as follows. In Section 2, we discuss existing work on multi-label stream classification. The details of the proposed algorithm are elaborated in Section 3. Experimental results and discussions are presented in Section 4. Finally, Section 5 concludes the paper.

3. Proposed Method

3.1. Change Detection Based on Jensen–Shannon Divergence

Most previous change detection methods [31,32,33] focus on detecting changes in the error rate of the classifiers, without considering changes in the distribution of the data. In contrast, we adopt the two-window scheme, the most extensively used approach, which compares the data distributions of two consecutive windows.
Let W1 and W2 denote the reference window and the current window, respectively. As shown in Figure 2, both windows have size n. The problem of change detection in data streams is to decide between the null hypothesis H0 and the alternative hypothesis H1:
$$ \begin{cases} H_0: & d(W_1, W_2) \le \varepsilon \\ H_1: & d(W_1, W_2) > \varepsilon \end{cases} $$
where d(W1, W2) is a distance function that measures the dissimilarity of the two windows and ε is a distance-based threshold: a change is declared when the dissimilarity between the windows exceeds ε.
Figure 2. Two-window change detection model.
The Kullback–Leibler divergence, also known as relative entropy, is the most popular and robust metric of distance between two distributions [34]. For two discrete distributions P and Q with probability functions p(x) and q(x), the Kullback–Leibler divergence is defined as
$$ KL(P \,\|\, Q) = \sum_{x \in \chi} p(x) \log \frac{p(x)}{q(x)} $$
where χ is the space of events. This measure is always non-negative and is zero if and only if the distributions are identical. However, it is not symmetric and does not satisfy the triangle inequality.
To overcome this defect, the Jensen–Shannon divergence, a symmetric and bounded variant of the Kullback–Leibler divergence, is adopted as the measure, defined in Equation (2):
$$ JS(P \,\|\, Q) = \sum_{x \in \chi} \left( p(x) \log \frac{2\,p(x)}{p(x)+q(x)} + q(x) \log \frac{2\,q(x)}{p(x)+q(x)} \right) $$
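To make this concrete, the following minimal sketch (our illustration, not the authors' implementation) estimates the divergence from the empirical event frequencies of two windows. Note that Equation (2) as written omits the conventional 1/2 factor, so identical windows give 0 and windows with disjoint supports give 2 ln 2:

```python
# Minimal sketch: empirical Jensen-Shannon divergence between two windows,
# following Equation (2). Windows are sequences of hashable events (an
# assumption on our part; the paper does not fix the event space).
import math
from collections import Counter

def empirical_dist(window, events):
    """Relative frequency of each event in a window."""
    counts = Counter(window)
    return {e: counts[e] / len(window) for e in events}

def js_divergence(w1, w2):
    """Equation (2): symmetric, bounded variant of the KL divergence."""
    events = set(w1) | set(w2)
    p, q = empirical_dist(w1, events), empirical_dist(w2, events)
    js = 0.0
    for e in events:
        if p[e] > 0:
            js += p[e] * math.log(2 * p[e] / (p[e] + q[e]))
        if q[e] > 0:
            js += q[e] * math.log(2 * q[e] / (p[e] + q[e]))
    return js

print(js_divergence([0, 1, 1, 0], [0, 1, 1, 0]))  # 0.0: identical distributions
print(js_divergence([0, 0, 0, 0], [1, 1, 1, 1]))  # 2*ln(2): disjoint supports
```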
Theorem 2.
Let W1 and W2 denote two consecutive windows, where W2 contains the most recent instances. With probability 1 − δ, we have $|\mu_{W_1} - \mu_{W_2}| \le 2\varepsilon$, where ε is the Hoeffding bound and $\mu_{W_1}$ and $\mu_{W_2}$ are the means of the two windows [35].
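Theorem 2 invokes the Hoeffding bound ε without stating its form. As an assumed point of reference (reference [35] fixes the exact constants, which may differ), a standard Hoeffding bound for the mean of a window of n values in [0, 1] at confidence 1 − δ is:

```latex
% Standard Hoeffding bound (an assumption; [35] may use other constants):
% with probability at least 1 - \delta, the empirical mean of a window of
% n values in [0,1] deviates from its expectation by at most
\varepsilon = \sqrt{\frac{1}{2n} \ln \frac{2}{\delta}}
```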
In general, three types of concept drift can be detected: sudden, gradual, and recurring. The input of the change detector consists of a data stream S and a window size n; the output is an alarm whenever a change occurs. At the initial stage, the reference window is filled with the first batch of the stream. The current window then moves along the stream and captures the next batch. If no change occurs, the current window W2 slides step by step until a change is detected, at which point the detector raises an alarm. A concept drift is detected when the measure exceeds a threshold, and a recurring concept is recognized when the measure equals zero. The threshold itself is obtained with bootstrap, a general data-driven approach. The pseudo-code of the concept drift detection is presented in Algorithm 1.
Algorithm 1 Two-windows-based change detection.
input: data streams S, window size n;
output: ChangeAlarm;
01: initialize t = 0;
02: set reference window W1 = {xt+1, …, xt+n};
03: set current window W2 = {xt+n+1, …, xt+2n};
04: for each instance in S do
05:   calculate d(W1, W2) according to Equation (2);
06:   calculate ε according to Theorem 2;
07:   if d(W1, W2) > ε then
08:     t ← current time;
09:     report a change alarm at time t;
10:     clear all the windows and goto 02;
11:   else if d(W1, W2) == 0 then
12:     alarm a recurring concept;
13:   else
14:     slide W2 by 1 point;
15:   end if
16: end for
17: end
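Algorithm 1 can be rendered as the following runnable sketch. It is an interpretation under assumptions: ε is a fixed constant rather than the bootstrap estimate, and a recurring concept is flagged by a zero distance between the post-drift window and a stored historical window (following the concept repository of Section 3.2.1). It reuses `js_divergence` from the sketch above:

```python
# Runnable interpretation of Algorithm 1 (our sketch, not the authors' code).
def detect_changes(stream, n, epsilon):
    buf, alarms, concepts = list(stream), [], []
    s, t = 0, n                          # starts of reference W1 and current W2
    while t + n <= len(buf):
        w1, w2 = buf[s:s + n], buf[t:t + n]
        if js_divergence(w1, w2) > epsilon:
            alarms.append(('change', t + n))
            concepts.append(w1)          # store the outgoing concept for reuse
            if any(js_divergence(c, w2) == 0 for c in concepts):
                alarms.append(('recurring', t + n))
            s, t = t, t + n              # clear the windows and restart (step 02)
        else:
            t += 1                       # no drift detected: slide W2 by 1 point
    return alarms

# Zeros, then ones, then the zeros concept recurs. This simple sketch may fire
# more than one alarm per drift, but it flags the returning concept as recurring.
stream = [0] * 2000 + [1] * 2000 + [0] * 2000
print(detect_changes(stream, n=500, epsilon=0.1))
```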

3.2. Ensemble Classifier for Multi-Label Data Streams Using Change Detection

3.2.1. The Framework of Our Method

Ensembles are popular approaches for enhancing classification accuracy in single-label learning, but they must be generalized for data stream environments. Change detectors are often part of online classifiers and ensure quick reactions to sudden changes; unfortunately, they are not capable of detecting gradual changes, whereas block-based algorithms can outperform online approaches in slowly drifting environments.
Based on these findings, we propose an efficient ensemble with internal drift detection. The framework integrates a change detection technique with ensemble learning for data streams; the internal drift detector decides when to build and add new component classifiers.
We first give the general framework, described in Figure 3, and then describe the particular model and its analysis. A series of component classifiers are built from different portions of the data stream; the method maintains a pool of classifiers (each representing one of the existing concepts) and predicts the class of incoming instances using a weighted voting scheme. Furthermore, a change detector monitors the distributions of two consecutive windows, which represent the older and the more recent instances, respectively. Once a change occurs, a new concept is identified, and a new classifier is learned and added to the pool. Meanwhile, the repository of stored historical concepts is checked for reuse.
Figure 3. The framework of our method.

3.2.2. Exploiting Label Dependency

To exploit dependencies among the labels, the algorithm uses the pruned sets (PS) method, which prunes away infrequently occurring label sets from the training data. A frequent label combination is one that occurs at least m (m > 0) times the average value. Let f(c) denote the number of occurrences of a label combination c in the whole data stream; the frequency criterion is then
$$ f(c) \ge \frac{m \cdot n}{2^{|L|} - 1} $$
where |L| represents the size of the label set, n is the number of instances seen, and $2^{|L|} - 1$ is the total number of possible (non-empty) label combinations, so that $n / (2^{|L|} - 1)$ is the average number of occurrences per label combination.
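A small sketch of this pruning criterion (our illustration on hypothetical label sets; the authors apply PS inside Multi-label Hoeffding Tree leaves, as noted in Section 3.2.3):

```python
# Keep only label combinations occurring at least m times the average count
# n / (2^|L| - 1), per the frequency criterion above.
from collections import Counter

def frequent_labelsets(labelsets, num_labels, m=1.0):
    """Return the label combinations that meet the frequency criterion."""
    counts = Counter(map(frozenset, labelsets))
    n = len(labelsets)
    threshold = m * n / (2 ** num_labels - 1)   # m times the average occurrence
    return {ls for ls, c in counts.items() if c >= threshold}

# Example with |L| = 3: the average count is 8/7, so with m = 2 the
# threshold is ~2.3 and only combinations seen at least 3 times survive.
data = [{'a'}, {'a'}, {'a', 'b'}, {'a', 'b'}, {'a', 'b'}, {'b'}, {'c'}, {'a'}]
print(frequent_labelsets(data, num_labels=3, m=2.0))
# {frozenset({'a'}), frozenset({'a', 'b'})}
```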

3.2.3. Weighting Mechanism

MLAW uses weighted ensemble classifiers to handle concept drift. The idea is to train a group of classifiers from sequential chunks of the data stream. The weights of the classifiers are updated dynamically based on their performance on the current data, and only the top k classifiers are kept. The final decision is made by the weighted majority vote of the classifiers.
For each incoming instance x, each ensemble member Ci ∈ E is weighted according to Equation (4) when concept drift is detected.
$$ w_i = \frac{1}{MSE_r + MSE_i + \varepsilon} $$
A very small positive value ε is used to avoid division by zero. MSEr is the mean square error of a randomly predicting classifier, which serves as a reference point for the current class distribution.
$$ MSE_r = \sum_{y} p(y)\,\big( 1 - p(y) \big)^2 $$
MSEi is the prediction error of classifier Ci on the instances of the most recent window W, where $f_y^i(x)$ denotes the probability given by classifier Ci that x is an instance of class y.
$$ MSE_i = \frac{1}{|W|} \sum_{(x, y) \in W} \big( 1 - f_y^i(x) \big)^2 $$
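A sketch of how these quantities might be computed (our illustration; `predict_proba(x, y)` is a hypothetical accessor returning the probability a component assigns to the true label set of x, not an API from the paper):

```python
# Weighting mechanism of Equations (4)-(6), applied per ensemble member
# when drift is detected.
from collections import Counter

EPS = 1e-10   # small constant to avoid division by zero

def mse_random(labelsets):
    """MSE_r of a classifier that predicts according to the class distribution."""
    counts = Counter(map(frozenset, labelsets))
    n = len(labelsets)
    return sum((c / n) * (1 - c / n) ** 2 for c in counts.values())

def mse_classifier(classifier, window):
    """MSE_i of a classifier on the most recent window of (x, y) pairs."""
    return sum((1 - classifier.predict_proba(x, y)) ** 2
               for x, y in window) / len(window)

def weight(classifier, window):
    """Equation (4): better-than-random components receive higher weight."""
    ys = [y for _, y in window]
    return 1.0 / (mse_random(ys) + mse_classifier(classifier, window) + EPS)
```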
At the same time, a new classifier C′ is built on the instances in W2, weighted, and added to the ensemble. If the ensemble is full, the weakest classifier, i.e., the component with the lowest weight, is replaced by the new one.
In our experiments, we use Multi-label Hoeffding Trees as component classifiers, but any online learning algorithm could serve as a component learner. PS is employed at each leaf node of a Hoeffding Tree. The pseudo-code of MLAW is listed in Algorithm 2.
Algorithm 2 MLAW Algorithm.
input: S: multi-label data stream, k: number of ensemble members
output: E: ensemble of k weighted classifiers
01: begin
02: E ← Ø;
03: for all instances xt ∈ S do
04:   W ← W ∪ {xt};
05:   if change detected then
06:     create a new classifier C′;
07:     update the weights of all classifiers in the ensemble;
08:     if |E| < k then E ← E ∪ C′;
09:     else prune the worst classifier;
10:     end if
11:   else if concept is recurring then
12:     reuse the corresponding classifier in E;
13:   end if
14: end for
15: end
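The ensemble-maintenance step of Algorithm 2 can be sketched as follows (an interpretation, not the authors' MOA code; `train` is a hypothetical builder of a new component from the current window, the boolean flags come from the Section 3.1 detector, and `weight` is the function sketched earlier in this subsection):

```python
# Compact runnable sketch of Algorithm 2's ensemble maintenance.
def update_ensemble(ensemble, window, k, train, change_detected, recurring):
    if change_detected:
        new_clf = train(window)                       # learn the new concept
        weights = {c: weight(c, window) for c in ensemble}
        if len(ensemble) < k:
            ensemble.append(new_clf)                  # room left: just add
        else:
            worst = min(ensemble, key=weights.get)    # lowest-weight member
            ensemble[ensemble.index(worst)] = new_clf
    elif recurring:
        pass   # the matching stored classifier is already in E; reuse it as-is
    return ensemble
```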

4. Experiments

The experiments were performed on 3.0 GHz Pentium PC machines with 16 GB of memory, running Microsoft Windows 8, and were implemented in Java with the help of Massive Online Analysis (MOA), a software environment for implementing algorithms and running experiments for online learning [36]. The software is available at http://moa.cms.waikato.ac.nz/.

4.1. Datasets

The multi-label datasets come from different domains and can be found on the Mulan website (http://mulan.sourceforge.net/datasets-mlc.html). The datasets are described in Table 1, where |D| is the number of instances, |X| is the number of attributes (n: numeric, b: binary), and |L| is the number of possible labels.
Table 1. The characteristics of datasets.
Label Cardinality (LC) is a standard measure of how multi-label a dataset is; it is simply the average number of labels associated with each instance.
$$ LC = \frac{1}{|D|} \sum_{i=1}^{|D|} |y_i| $$
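As a quick illustration of the definition (hypothetical label sets, not from Table 1):

```python
# Label cardinality: average number of labels per instance.
def label_cardinality(labelsets):
    return sum(len(y) for y in labelsets) / len(labelsets)

print(label_cardinality([{'a'}, {'a', 'b'}, {'b', 'c', 'd'}]))  # 2.0
```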
We adopted three real-world datasets: TMC2007-500, MediaMill, and IMDB, which were converted into streams using MOA. The synthetic datasets Random Tree and Radial Basis Function (RBF) were also generated by MOA: Random Tree consists of 100,000 instances with sudden concept drift, and RBF contains 100,000 instances with gradual concept drift.

4.2. Evaluation Metrics

We adopt four popular performance metrics for multi-label data stream classification: Hamming loss, Subset accuracy, F1, and Log-Loss.
(1)
Hamming loss: The Hamming loss averages the standard 0/1 classification error over the m labels and hence corresponds to the proportion of labels whose relevance is incorrectly predicted. Hamming loss is defined as Equation (6):
$$ \text{Hamming loss} = \frac{1}{N} \sum_{i=1}^{N} \frac{|y_i \oplus \hat{y}_i|}{m} $$
where ⊕ represents the symmetric difference between two sets. For Hamming loss, smaller values indicate better performance; for the remaining measures, greater values indicate better performance.
(2)
Subset accuracy is defined in [12], on a window of N instances, as:
$$ \text{Subset accuracy} = \frac{1}{N} \sum_{i=1}^{N} I(y_i = \hat{y}_i) $$
(3)
The F1 measure is the harmonic mean of recall and precision, calculated as follows:
$$ F_1 = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} $$
(4)
Log-Loss differs from the other measures in that it punishes worse errors more harshly, and thus provides a good contrast to the other measures [20].
$$ \text{Log-Loss} = \frac{1}{N L} \sum_{i=1}^{N} \sum_{j=1}^{L} \min\!\left( \text{log loss}\big(\hat{y}_j^{(i)}, y_j^{(i)}\big),\ \ln(N) \right) $$
where
$$ \text{log loss}(\hat{y}, y) = -\big( \ln(\hat{y})\, y + \ln(1 - \hat{y})\,(1 - y) \big) $$
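For concreteness, here are illustrative implementations of the four metrics (our sketch, not the MOA code used in the experiments). Y and Y_hat are lists of label sets; P_hat and Y_bin hold per-instance lists of per-label probabilities in (0, 1) and binary relevances; m is the number of labels; for F1 we assume the common micro-averaged instantiation:

```python
import math

def hamming_loss(Y, Y_hat, m):
    """Average size of the symmetric difference, normalized by the m labels."""
    return sum(len(y ^ yh) for y, yh in zip(Y, Y_hat)) / (len(Y) * m)

def subset_accuracy(Y, Y_hat):
    """Fraction of instances whose label set is predicted exactly."""
    return sum(y == yh for y, yh in zip(Y, Y_hat)) / len(Y)

def f1_score(Y, Y_hat):
    """Micro-averaged F1 over all instance-label pairs (one common variant)."""
    tp = sum(len(y & yh) for y, yh in zip(Y, Y_hat))
    pred = sum(len(yh) for yh in Y_hat)
    true = sum(len(y) for y in Y)
    if tp == 0 or pred == 0 or true == 0:
        return 0.0
    precision, recall = tp / pred, tp / true
    return 2 * precision * recall / (precision + recall)

def log_loss_metric(P_hat, Y_bin, N):
    """Per-label log-loss with each term capped at ln(N), as in the equation."""
    terms = [min(-(math.log(p) * y + math.log(1 - p) * (1 - y)), math.log(N))
             for p_row, y_row in zip(P_hat, Y_bin)
             for p, y in zip(p_row, y_row)]
    return sum(terms) / len(terms)
```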

4.3. Methods and Results

4.3.1. Parameter Sensitiveness

The experimental design considered two independent variables: the size of the ensemble and the size of the window. First, the ensemble size k was varied from 5 to 20 (k = 5, 7, 10, 12, 15, 18, 20) to see how it affects the performance of the algorithms. Figure 4 presents the accuracy of MLAW and MLOzaBag on TMC2007-500 for different ensemble sizes; each curve shows the relationship between the ensemble size and the subset accuracy of the classification.
Figure 4. Varying the size of ensemble.
MLOzaBag is an online bagging ensemble of k models. Compared to MLOzaBag, MLAW is not very sensitive to the ensemble size, although accuracy increases as the ensemble gains more base classifiers. As there is no strong dependency between ensemble size and accuracy, we chose k = 10 for the comparison experiments.
Next, we fixed k = 10 and further analyzed the effect of the window size on the performance of our algorithm. The window size w was varied from 500 to 2000 in steps of 250. Table 2 and Table 3 present the Hamming loss and Subset accuracy of our method on five datasets for different window sizes.
Table 2. Hamming loss using different sizes of windows.
Table 3. Subset accuracy using different sizes of windows.
We observe that if the window size is too small, it does not provide enough data for a new classifier to be accurate; if it is too large, the window may contain more than one concept, slowing adaptation to new concepts. MLAW is not very sensitive to the window size. As there is no strong dependency between window size and Subset accuracy or Hamming loss, we set w = 1000 in the comparison experiments.

4.3.2. Comparative Performance Study

This part presents experimental results on the effectiveness and efficiency of the proposed method. The proposed algorithm was evaluated against three algorithms: ML-SAM-knn, Multi-label Hoeffding Tree (MHT), and a bagging ensemble of Multi-label Hoeffding Trees (MLOzaBag).
The methods are evaluated using the prequential evaluation method with a window of w = 1000 instances. All algorithms use their default parameters. We chose Multi-label Hoeffding Trees as the base classifier and used k = 10 for all ensembles. Performance was evaluated in terms of Hamming loss, Subset accuracy, F1, and Log-Loss, as shown in Table 4, Table 5, Table 6 and Table 7 (the best results for each dataset are indicated in bold).
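A sketch of the prequential (test-then-train) protocol with a sliding evaluation window of w = 1000 instances (`classifier.predict` and `classifier.learn` are hypothetical method names, not the MOA API):

```python
from collections import deque

def prequential(stream, classifier, metric, w=1000):
    """Yield the metric over the last w instances, testing before training."""
    window = deque(maxlen=w)
    for x, y in stream:
        window.append((y, classifier.predict(x)))   # test on the instance first
        classifier.learn(x, y)                      # then train on it
        if len(window) == w:
            truths = [t for t, _ in window]
            preds = [p for _, p in window]
            yield metric(truths, preds)             # e.g., subset_accuracy
```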
Table 4. Hamming loss of different algorithms.
Table 5. Subset accuracy of different algorithms.
Table 6. Log-Loss of different algorithms.
Table 7. F1 of different algorithms.
As seen, MLAW outperformed all the other methods under most measures of predictive performance, particularly Subset accuracy, F1, and Log-Loss, and achieved the overall best performance on most datasets except IMDB. This is partly due to the recurrent change detection mechanism, which can reuse previous concepts and thus perform well in different situations, particularly under concept drift. Moreover, when a new frequent label combination is recognized, the learning model is updated with the instances carrying this combination, so the updated model can classify efficiently before the next concept drift takes place.
Table 8 reports the running time (training + testing) of the compared algorithms. ML-SAM-knn consumed the least time, followed by MHT, while MLOzaBag consumed the most. Because our method reuses stored classifiers for recurring concepts instead of retraining, it consumes less time than MLOzaBag.
Table 8. Running time of different algorithms (CPU second).
To summarize, single classifiers such as MHT and ML-SAM-knn performed worse than the ensemble methods, particularly on datasets with concept drift, because ensemble classifiers combine models to obtain better predictive performance than a single model. MHT and ML-SAM-knn also lack a detection mechanism and therefore adapt ineffectively to concept drift. These results show that MLAW consistently boosts performance in different data stream scenarios.
Figure 5 shows the ability of MLAW to cope with concept drift. When the sudden drift occurs at the 50K time stamp, all methods except ours react to the change with a sharp drop in subset accuracy. The proposed algorithm promptly detects the concept change and builds a new classifier in a timely manner, so MLAW suffers the smallest drop in subset accuracy. The curves of MHT and ML-SAM-knn also fluctuate more than MLAW's, since they are not suited to non-stationary data stream environments.
Figure 5. Subset accuracy on Random Tree.

5. Conclusions

In this paper, we focused on classifying data streams under multi-label scenarios and proposed a novel ensemble paradigm. The ensemble takes label dependency into account while simultaneously dealing with concept drift, and proves more efficient than its competitors. Experimental results on both synthetic and real-world data streams show that MLAW is capable of constructing a satisfactory model for handling different types of concept drift.
In our future work, we plan to investigate the possibility of adapting the proposed algorithm for tackling concept drift in partially labeled data streams.

Author Contributions

Y.S. wrote the manuscript. H.S. commented on the manuscript. S.W. collected and analyzed the references.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61672086, No. 61572417, and No. 61702550), the Innovation Team Support Plan of University Science and Technology of Henan Province (No. 19IRTSTHN014), and the Excellent Young Teachers Program of XYNU (No. 2016GGJS-08).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cohen, L.; Avrahami-Bakish, G.; Last, M.; Kandel, A.; Kipersztok, O. Real-time data mining of non-stationary data streams from sensor networks. Inf. Fusion 2008, 9, 344–353. [Google Scholar] [CrossRef]
  2. Bhuiyan, H.; Ashiquzzaman, A.; Juthi, T.I. A Survey of existing E-mail spam filtering methods considering machine learning techniques. Glob. J. Comput. Sci. Technol. 2018, 18, 21–29. [Google Scholar]
  3. Costa, K.A.; Papa, J.P.; Lisboa, C.O.; Munoz, R.; Albuquerque, V.H.C. Internet of Things: A survey on machine learning-based intrusion detection approaches. Comput. Netw. 2019, 151, 147–157. [Google Scholar] [CrossRef]
  4. Livieris, I.E.; Kiriakidou, N.; Kanavos, A.; Tampakas, V.; Pintelas, P. On Ensemble SSL Algorithms for Credit Scoring Problem. Informatics 2018, 5, 40. [Google Scholar] [CrossRef]
  5. Gama, J. Knowledge Discovery from Data Streams; Chapman & Hall/CRC: London, UK, 2010. [Google Scholar]
  6. Domingos, P.; Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), Boston, MA, USA, 20–23 August 2000; pp. 71–80. [Google Scholar]
  7. Read, J.; Bifet, A.; Holmes, G.; Pfahringer, B. Scalable and efficient multi-label classification for evolving data streams. Mach. Learn. 2012, 88, 243–272. [Google Scholar] [CrossRef]
  8. Tsymbal, A. The Problem of Concept Drift: Definitions and Related Work; Technical Report; Department of Computer Science, Trinity College: Dublin, Ireland, 2004. [Google Scholar]
  9. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 231–238. [Google Scholar] [CrossRef]
  10. Livieris, I.E.; Kanavos, A.; Tampakas, V.; Pintelas, P. A Weighted Voting Ensemble Self-Labeled Algorithm for the Detection of Lung Abnormalities from X-Rays. Algorithms 2019, 12, 64. [Google Scholar] [CrossRef]
  11. Webb, G.I.; Hyde, R.; Cao, H.; Nguyen, H.L.; Petitjean, F. Characterizing concept drift. Data Mining Knowl. Discov. 2016, 30, 964–994. [Google Scholar] [CrossRef]
  12. Zhang, M.L.; Zhou, Z.H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837. [Google Scholar] [CrossRef]
  13. Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; Chapman and Hall/CRC: London, UK, 2012. [Google Scholar]
  14. Tsoumakas, G.; Katakis, I.; Vlahavas, I. Mining Multi-Label Data. Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2010; pp. 667–685. [Google Scholar]
  15. Clare, A.; King, R.D. Knowledge discovery in multi-label phenotype data. In Proceedings of the Fifth European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2001), Freiburg, Germany, 3–5 September 2001; pp. 42–53. [Google Scholar]
  16. Zhang, M.; Zhou, Z.H. Ml-knn: A lazy learning approach to multi-label learning. Patt. Recogn. 2007, 40, 2038–2048. [Google Scholar] [CrossRef]
  17. Zhang, M.L.; Zhou, Z.H. Multi-label neural networks with applications to functional genomics and text categorization. IEEE Trans. Knowl. Data Eng. 2006, 18, 1338–1351. [Google Scholar] [CrossRef]
  18. Schapire, R.E.; Singer, Y. Boostexter: A boosting-based system for text categorization. Mach. Learn. 2000, 39, 135–168. [Google Scholar] [CrossRef]
  19. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains for multi-label classification. Mach. Learn. 2011, 85, 333–359. [Google Scholar] [CrossRef]
  20. Cheng, W.; Hullermeier, E. Combining instance-based learning and logistic regression for multilabel classification. Mach. Learn. 2009, 76, 211–225. [Google Scholar] [CrossRef]
  21. Zhang, M.; Zhang, K. Multi-label learning by exploiting label dependency. In Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), Washington, DC, USA, 25–28 July 2010; pp. 999–1008. [Google Scholar]
  22. Dembczynski, K.; Waegeman, W.; Cheng, W.; Hüllermeier, E. On label dependence and loss minimization in multi-label classification. Mach. Learn. 2012, 88, 5–45. [Google Scholar] [CrossRef]
  23. Qu, W.; Zhang, Y.; Zhu, Y.J. Mining multi-label concept-drifting data streams using dynamic classifier ensemble. In Proceedings of the First Asian Conference on Machine Learning (ACML 2009, LNCS 5828), Nanjing, China, 2–4 November 2009; pp. 308–321. [Google Scholar]
  24. Kong, X.; Yu, P.S. An ensemble-based approach to fast classification of multi-label data streams. In Proceedings of the Seventh International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2011), Orlando, FL, USA, 15–18 October 2011; pp. 95–104. [Google Scholar]
  25. Read, J.; Bifet, A.; Pfahringer, B. Efficient Multi-Label Classification for Evolving Data Streams; Technical Report; University of Waikato: Hamilton, New Zealand, 2011. [Google Scholar]
  26. Xioufis, E.S.; Spiliopoulou, M.; Tsoumakas, G. Dealing with concept drift and class imbalance in multi-label stream classification. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, 16–22 July 2011; pp. 1583–1588. [Google Scholar]
  27. Shi, Z.; Wen, Y.; Feng, C. Drift detection for multi-label data streams based on label grouping and entropy. In Proceedings of the Fourteenth International Conference on Data Mining Workshop (ICDM 2014), Shenzhen, China, 14 December 2014; pp. 724–731. [Google Scholar]
  28. Osojnik, A.; Panov, P.; Džeroski, S. Multi-label classification via multi-target regression on data streams. Mach. Learn. 2017, 106, 745–770. [Google Scholar]
  29. Roseberry, M.; Cano, A. Multi-label kNN Classifier with Self Adjusting Memory for Drifting Data Streams. In Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, Dublin, Ireland, 10–14 September 2018; pp. 23–37. [Google Scholar]
  30. Büyükçakir, A.; Bonab, H.; Can, F. A novel online stacked ensemble for multi-label stream classification. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1063–1072. [Google Scholar]
  31. Gama, J.; Medas, P.; Castillo, G.; Rodrigues, P.P. Learning with drift detection. In Proceedings of the Seventeenth Brazilian Symposium on Artificial Intelligence (SBIA 2004, LNCS 3171), São Luis, Maranhão, Brazil, 29 September–1 October 2004; pp. 286–295. [Google Scholar]
  32. Bifet, A.; Gavaldà, R. Learning from time-changing data with adaptive windowing. In Proceedings of the Seventh SIAM International Conference on Data Mining (SDM 2007), Minneapolis, MN, USA, 26–28 April 2007; pp. 443–448. [Google Scholar]
  33. Ross, G.J.; Adams, N.M.; Tasoulis, D.K. Exponentially weighted moving average charts for detecting concept drift. Patt. Recogn. Lett. 2012, 33, 191–198. [Google Scholar] [CrossRef]
  34. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Statist. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  35. Sun, Y.; Wang, Z.; Liu, H. Online Ensemble Using Adaptive Windowing for Data Streams with Concept Drift. Int. J. Distrib. Sens. Netw. 2016, 1–9. [Google Scholar] [CrossRef]
  36. Bifet, A.; Holmes, G.; Kirkby, R.; Pfahringer, B. MOA: Massive online analysis. J. Mach. Learn. Res. 2010, 11, 1601–1604. [Google Scholar] [CrossRef]
