The rapid growth in data makes the quest for highly scalable learners a popular one. To achieve the trade-off between structure complexity and classification accuracy, the k
-dependence Bayesian classifier (KDB) allows to represent different number of interdependencies for different data sizes. In this paper, we proposed two methods to improve the classification performance of KDB. Firstly, we use the minimal-redundancy-maximal-relevance analysis, which sorts the predictive features to identify redundant ones. Then, we propose an improved discriminative model selection to select an optimal sub-model by removing redundant features and arcs in the Bayesian network. Experimental results on 40 UCI datasets demonstrate that these two techniques are complementary and the proposed algorithm achieves competitive classification performance, and less classification time than other state-of-the-art Bayesian network classifiers like tree-augmented naive Bayes and averaged one-dependence estimators.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited