Classification algorithms are widely taken into account for clinical decision support systems. However, it is not always straightforward to understand the behavior of such algorithms on a multiple disease prediction task. When a new classifier is introduced, we, in most cases, will ask ourselves whether the classifier performs well on a particular clinical dataset or not. The decision to utilize classifiers mostly relies upon the type of data and classification task, thus making it often made arbitrarily. In this study, a comparative evaluation of a wide-array classifier pertaining to six different families, i.e., tree, ensemble, neural, probability, discriminant, and rule-based classifiers are dealt with. A number of real-world publicly datasets ranging from different diseases are taken into account in the experiment in order to demonstrate the generalizability of the classifiers in multiple disease prediction. A total of 25 classifiers, 14 datasets, and three different resampling techniques are explored. This study reveals that the classifier that is likely to become the best performer is the conditional inference tree forest (cforest), followed by linear discriminant analysis, generalize linear model, random forest, and Gaussian process classifier. This work contributes to existing literature regarding a thorough benchmark of classification algorithms for multiple diseases prediction.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited