Clinical decision support systems provide automatic diagnosis of human diseases by using machine learning techniques to analyze patient features and classify patients according to disease. An analysis of real-world electronic health record (EHR) data has revealed that a patient can be diagnosed with more than one disease simultaneously. Therefore, to suggest a list of possible diseases, the patient classification task is transformed into a multi-label learning task. For most multi-label learning techniques, the class imbalance present in EHR data may degrade performance. Cross-Coupling Aggregation (COCOA) is a typical multi-label learning approach that aims to leverage label correlation while addressing class imbalance. For each label, COCOA aggregates the prediction of a binary class-imbalance classifier for that label with the predictions of several multi-class imbalance classifiers, each corresponding to the pair formed by that label and another (coupling) label. However, class imbalance may still affect a multi-class imbalance learner when the number of instances of a coupling label is too small. To improve the performance of COCOA, this paper presents COCOA-RE, a regularized ensemble approach integrated into the multi-class classification process of COCOA. To provide disease diagnosis, COCOA-RE learns from the available laboratory test reports and essential patient information and produces a multi-label predictive model. Experiments were performed to validate the effectiveness of the proposed multi-label learning approach, and the approach was implemented in a system prototype.
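The cross-coupling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a three-class encoding for each pair of a target label and a coupling label (2 if the target label is present, 1 if only the coupling label is present, 0 if neither), and it averages the learners' scores rather than reproducing the exact aggregation rule. The function names `tri_class_targets`, `aggregate_score`, and `predict_label`, and the 0.5 threshold, are hypothetical. The regularized ensemble step of COCOA-RE is not shown; the scores are assumed to come from already trained imbalance-aware classifiers.

```python
from typing import List, Sequence


def tri_class_targets(y_j: Sequence[int], y_k: Sequence[int]) -> List[int]:
    """Merge target label j and coupling label k into one 3-class target.

    Assumed encoding: 2 if the target label is present (whatever k is),
    1 if only the coupling label is present, 0 if neither is present.
    """
    if len(y_j) != len(y_k):
        raise ValueError("label vectors must have the same length")
    return [2 if a == 1 else (1 if b == 1 else 0) for a, b in zip(y_j, y_k)]


def aggregate_score(binary_score: float, pair_scores: Sequence[float]) -> float:
    """Average the binary learner's score for the target label with the
    scores that each pairwise multi-class learner gives to class 2."""
    scores = [binary_score, *pair_scores]
    return sum(scores) / len(scores)


def predict_label(binary_score: float,
                  pair_scores: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Return 1 if the aggregated score reaches the (assumed) threshold."""
    return int(aggregate_score(binary_score, pair_scores) >= threshold)


# Four patients: target disease j and coupling disease k (1 = diagnosed).
y_j = [1, 0, 0, 1]
y_k = [0, 1, 0, 1]
targets = tri_class_targets(y_j, y_k)

# Hypothetical scores for one new patient: one binary learner for label j
# and two pairwise learners built from two different coupling labels.
decision = predict_label(0.9, [0.6, 0.3])
```

In this sketch each coupling label contributes one three-class learner, so a rare coupling label leaves its middle class with very few training instances. That is the residual imbalance the abstract points to as the motivation for COCOA-RE.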
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.