Ensemble Algorithms and/or Explainability

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (10 October 2022) | Viewed by 24122

Special Issue Editors

Guest Editor
Department of Mathematics, University of Patras, GR 265-00 Patras, Greece
Interests: software engineering; AI in education; intelligent systems; decision support systems; machine learning; data mining; knowledge discovery

Guest Editor
Department of Mathematics, University of Patras, GR 265-00 Patras, Greece
Interests: artificial intelligence; machine learning; neural networks; deep learning; optimization algorithms

Special Issue Information

Dear Colleagues,

We invite you to submit your latest research in the area of “Ensemble Algorithms and/or Explainability” to this Special Issue.

Over the last several decades, ensemble methods have been a state-of-the-art choice in machine learning and data mining for developing powerful and robust prediction models. These models combine the individual predictions of a variety of constituent learning algorithms to obtain better prediction performance, an advantage that has been demonstrated both theoretically and experimentally. Consequently, many ensemble learning algorithms have been proposed in the literature and applied to various real-world problems, ranging from face and emotion recognition through text classification and medical diagnosis to financial forecasting, to mention only a few.

Recently, the European Union General Data Protection Regulation (GDPR) demanded a “right to explanation” for decisions made by automated and artificially intelligent algorithmic systems. This demand, combined with the need to interpret, explain, and justify the decisions or predictions of a classifier or ensemble, which many researchers have already recognized, led to the development of “interpretable/explainable machine learning and artificial intelligence”, which has gained great attention from the scientific community.

Given that ensembles and deep-learning models produce more accurate predictions, we need to develop new methods and algorithms in order to create explainable ML and AI models which are nearly as accurate as the non-explainable ones.

The aim of this Special Issue is to present the recent advances related to all kinds of ensemble learning algorithms and methodologies and investigate the impact of their application in a diversity of real-world problems. At the same time, the need to research the explainability issues involved in theory and practice has become of paramount importance for all kinds of daily and industrial applications.

Prof. Dr. Panagiotis Pintelas
Dr. Ioannis E. Livieris
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Implementation of ensemble learning algorithms
  • Ensemble learning methodologies for handling imbalanced data
  • Ensemble methods in clustering
  • Homogeneous and heterogeneous ensembles
  • Black, white, and gray box models
  • Distributed ensemble learning algorithms
  • Ensemble methods in agent and multi-agent systems
  • Explainable artificial intelligence (XAI)
  • Human-understandable machine learning
  • Transparency
  • Interpretability and explainability
  • Graph neural networks for explainability
  • Interpretable machine learning
  • Machine learning and knowledge-graphs
  • Fuzzy systems and explainability
  • Interactive data mining and explanations
  • Black-box model interpretation

Published Papers (9 papers)

Editorial

4 pages, 175 KiB  
Editorial
Special Issue on Ensemble Learning and/or Explainability
by Panagiotis Pintelas and Ioannis E. Livieris
Algorithms 2023, 16(1), 49; https://doi.org/10.3390/a16010049 - 11 Jan 2023
Viewed by 1537
Abstract
This article will summarize the works published in a Special Issue of Algorithms, entitled “Ensemble Learning and/or Explainability”(https://www [...] Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

Research

27 pages, 4121 KiB  
Article
Ensembles of Random SHAPs
by Lev Utkin and Andrei Konstantinov
Algorithms 2022, 15(11), 431; https://doi.org/10.3390/a15110431 - 17 Nov 2022
Cited by 7 | Viewed by 1811
Abstract
The ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify the SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate the SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called the ER-SHAP, several features are randomly selected many times from the feature set, and the Shapley values for the features are computed by means of “small” SHAPs. The explanation results are averaged to obtain the final Shapley values. According to the second modification, called the ERW-SHAP, several points are generated around the explained instance for diversity purposes, and the results of their explanation are combined with weights depending on the distances between the points and the explained instance. The third modification, called the ER-SHAP-RF, uses the random forest for a preliminary explanation of the instances and determines a feature probability distribution which is applied to the selection of the features in the ensemble-based procedure of the ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for a local explanation. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)
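
The averaging idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of exact Shapley enumeration for each "small" explanation, and the choice to hold unselected features at a fixed baseline are all assumptions made for illustration only.

```python
import itertools
import math
import random


def exact_shapley(f, x, baseline, subset):
    """Exact Shapley values of f at x, restricted to the features in `subset`.

    Assumption: features inside a coalition take their value from x, and every
    other feature (including those outside `subset`) stays at `baseline`.
    """
    k = len(subset)
    values = {}
    for i in subset:
        others = [j for j in subset if j != i]
        phi = 0.0
        for r in range(k):
            # Standard Shapley weight for coalitions of size r among k players.
            weight = math.factorial(r) * math.factorial(k - r - 1) / math.factorial(k)
            for coalition in itertools.combinations(others, r):
                without_i = list(baseline)
                for j in coalition:
                    without_i[j] = x[j]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi += weight * (f(with_i) - f(without_i))
        values[i] = phi
    return values


def ensemble_random_shapley(f, x, baseline, subset_size, n_rounds, seed=0):
    """Average many small Shapley explanations over random feature subsets."""
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    counts = [0] * n
    for _ in range(n_rounds):
        subset = rng.sample(range(n), subset_size)
        for i, phi in exact_shapley(f, x, baseline, subset).items():
            totals[i] += phi
            counts[i] += 1
    # A feature that was never sampled gets 0.0 (an illustrative convention).
    return [totals[i] / counts[i] if counts[i] else 0.0 for i in range(n)]


# Hypothetical black-box model used only for this demonstration.
def linear_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2] + 0.0 * z[3]


phi = ensemble_random_shapley(
    linear_model, x=[1.0, 1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0, 0.0],
    subset_size=2, n_rounds=50,
)
```

Each round enumerates only the coalitions of the sampled subset, so the cost grows with the subset size rather than with the full number of features, which is the saving the abstract refers to. For a linear model such as the one above, every small explanation already equals the full one, so the sketch only checks the mechanics; it says nothing about approximation quality on nonlinear models.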

20 pages, 1288 KiB  
Article
Extreme Learning Machine Enhanced Gradient Boosting for Credit Scoring
by Yao Zou and Changchun Gao
Algorithms 2022, 15(5), 149; https://doi.org/10.3390/a15050149 - 27 Apr 2022
Cited by 3 | Viewed by 2831
Abstract
Credit scoring is an effective tool for banks and lending companies to manage the potential credit risk of borrowers. Machine learning algorithms have made great progress in the automatic and accurate discrimination of good and bad borrowers. Notably, ensemble approaches are a group of powerful tools to enhance the performance of credit scoring. Random forest (RF) and Gradient Boosting Decision Tree (GBDT) have become the mainstream ensemble methods for precise credit scoring. RF is a Bagging-based ensemble that realizes accurate credit scoring and enriches the diversity of base learners by modifying the training object. However, the optimization pattern that works on invariant training targets may increase the statistical independence of base learners. GBDT is a boosting-based ensemble approach that reduces the credit scoring error by iteratively changing the training target while keeping the training features unchanged. This may harm the diversity of base learners. In this study, we incorporate the advantages of the Bagging ensemble training strategy and the boosting ensemble optimization pattern to enhance the diversity of base learners. An extreme learning machine-based supervised augmented GBDT is proposed to enhance the discriminative ability for credit scoring. Experimental results on four public credit datasets show a significant improvement in credit scoring and suggest that the proposed method is a good solution for accurate credit scoring. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

32 pages, 4929 KiB  
Article
A Seed-Guided Latent Dirichlet Allocation Approach to Predict the Personality of Online Users Using the PEN Model
by Saravanan Sagadevan, Nurul Hashimah Ahamed Hassain Malim and Mohd Heikal Husin
Algorithms 2022, 15(3), 87; https://doi.org/10.3390/a15030087 - 08 Mar 2022
Cited by 3 | Viewed by 2553
Abstract
There is a growing interest in topic modeling to decipher the valuable information embedded in natural texts. However, there are no studies training an unsupervised model to automatically categorize social network (SN) messages according to personality traits. Most of the existing literature relied on the Big 5 framework and psychological reports to recognize the personality of users. Furthermore, collecting datasets for other personality themes is an inherent problem that requires unprecedented time and human effort, and it is bound by privacy constraints. Alternatively, this study hypothesized that a small set of seed words is enough to decipher the psycholinguistic states encoded in texts, and that this auxiliary knowledge could help the unsupervised model categorize the messages according to human traits. Therefore, this study devised a dataless model called Seed-guided Latent Dirichlet Allocation (SLDA) to categorize SN messages according to the PEN model, which comprises the Psychoticism, Extraversion, and Neuroticism traits. Intrinsic evaluations were conducted to determine the performance and disclose the nature of the texts generated by SLDA, especially in the context of Psychoticism. Extrinsic evaluations were conducted using several machine learning classifiers to assess how well the topic model has identified the latent semantic structure that persists over time in the training documents. The findings show that SLDA outperformed other models by attaining a coherence score of up to 0.78, whereas the machine learning classifiers achieved a precision of up to 0.993. We will also share the corpus generated by SLDA for further empirical studies. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

16 pages, 4605 KiB  
Article
Prediction of Injuries in CrossFit Training: A Machine Learning Perspective
by Serafeim Moustakidis, Athanasios Siouras, Konstantinos Vassis, Ioannis Misiris, Elpiniki Papageorgiou and Dimitrios Tsaopoulos
Algorithms 2022, 15(3), 77; https://doi.org/10.3390/a15030077 - 24 Feb 2022
Cited by 4 | Viewed by 3311
Abstract
CrossFit has gained recognition and interest among physically active populations being one of the most popular and rapidly growing exercise regimens worldwide. Due to the intense and repetitive nature of CrossFit, concerns have been raised over the potential injury risks that are associated with its training including rhabdomyolysis and musculoskeletal injuries. However, identification of risk factors for predicting injuries in CrossFit athletes has been limited by the absence of relevant big epidemiological studies. The main purpose of this paper is the identification of risk factors and the development of machine learning-based models using ensemble learning that can predict CrossFit injuries. To accomplish the aforementioned targets, a survey-based epidemiological study was conducted in Greece to collect data on musculoskeletal injuries in CrossFit practitioners. A Machine Learning (ML) pipeline was then implemented that involved data pre-processing, feature selection and well-known ML models. The performance of the proposed ML models was assessed using a comprehensive cross validation mechanism whereas a discussion on the nature of the selected features is also provided. An area under the curve (AUC) of 77.93% was achieved by the best ML model using ensemble learning (Adaboost) on the group of six selected risk factors. The effectiveness of the proposed approach was evaluated in a comparative analysis with respect to numerous performance metrics including accuracy, sensitivity, specificity, AUC and confusion matrices to confirm its clinical relevance. The results are the basis for the development of reliable tools for the prediction of injuries in CrossFit. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

16 pages, 2467 KiB  
Article
Precision-Based Weighted Blending Distributed Ensemble Model for Emotion Classification
by Gayathri Soman, M. V. Vivek, M. V. Judy, Elpiniki Papageorgiou and Vassilis C. Gerogiannis
Algorithms 2022, 15(2), 55; https://doi.org/10.3390/a15020055 - 06 Feb 2022
Cited by 5 | Viewed by 2210
Abstract
Focusing on emotion recognition, this paper addresses the task of emotion classification and its performance with respect to accuracy, by investigating the capabilities of a distributed ensemble model using precision-based weighted blending. Research on emotion recognition and classification refers to the detection of an individual’s emotional state by considering various types of data as input features, such as textual data, facial expressions, vocal, gesture and physiological signal recognition, electrocardiogram (ECG) and electrodermography (EDG)/galvanic skin response (GSR). The extraction of effective emotional features from different types of input data, as well as the analysis of large volumes of real-time data, have become increasingly important tasks in order to perform accurate classification. Taking into consideration the volume and variety of the examined problem, a machine learning model that works in a distributed manner is essential. In this direction, we propose a precision-based weighted blending distributed ensemble model for emotion classification. The suggested ensemble model can work well in a distributed manner using the concepts of Spark’s resilient distributed datasets, which provide quick in-memory processing capabilities and also perform iterative computations effectively. On the model validation set, weights are assigned to the different classifiers in the ensemble model, based on their precision value. Each weight determines the importance of the respective classifier in terms of its predictive performance, while a new model is built upon the derived weights. The produced model performs the task of final prediction on the test dataset. The results show that the proposed ensemble model is sufficiently accurate in differentiating between primary emotions (such as sadness, fear, and anger) and secondary emotions. The suggested ensemble model achieved accuracy of 76.2%, 99.4%, and 99.6% on the FER-2013, CK+, and FERG-DB datasets, respectively. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)
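
The weighting step described in this abstract can be sketched in a few lines of Python. This is a simplified, single-machine illustration rather than the authors' Spark-based method: it assumes macro-averaged precision as the weighting metric and replaces the blended model with a plain weighted vote, and all function names and data are hypothetical.

```python
from collections import defaultdict


def macro_precision(y_true, y_pred):
    """Macro-averaged precision; a class that is never predicted scores 0."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        per_class.append(correct / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)


def precision_weights(y_val, val_predictions):
    """Normalize each classifier's validation precision into a weight."""
    raw = [macro_precision(y_val, preds) for preds in val_predictions]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def weighted_vote(weights, test_predictions):
    """Pick, per sample, the label with the largest summed classifier weight."""
    blended = []
    for s in range(len(test_predictions[0])):
        scores = defaultdict(float)
        for w, preds in zip(weights, test_predictions):
            scores[preds[s]] += w
        # Ties are broken in favor of the alphabetically first label.
        blended.append(max(sorted(scores), key=scores.get))
    return blended


# Hypothetical validation labels and predictions from three base classifiers.
y_val = ["anger", "fear", "sadness", "fear"]
val_predictions = [
    ["anger", "fear", "sadness", "fear"],   # classifier A
    ["anger", "anger", "sadness", "fear"],  # classifier B
    ["fear", "fear", "fear", "anger"],      # classifier C
]
weights = precision_weights(y_val, val_predictions)

# Hypothetical test-set predictions from the same three classifiers.
test_predictions = [
    ["sadness", "fear"],
    ["anger", "fear"],
    ["anger", "anger"],
]
blended = weighted_vote(weights, test_predictions)
```

In this toy data, classifier A is outvoted two to one on the first test sample, yet its higher validation precision gives it more weight than B and C combined, so the blended prediction follows A. That is the behavior precision-based weighting is meant to produce, as opposed to an unweighted majority vote.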

22 pages, 459 KiB  
Article
A Rule Extraction Technique Applied to Ensembles of Neural Networks, Random Forests, and Gradient-Boosted Trees
by Guido Bologna
Algorithms 2021, 14(12), 339; https://doi.org/10.3390/a14120339 - 23 Nov 2021
Cited by 9 | Viewed by 2576
Abstract
In machine learning, ensembles of models based on Multi-Layer Perceptrons (MLPs) or decision trees are considered successful models. However, explaining their responses is a complex problem that requires the creation of new methods of interpretation. A natural way to explain the classifications of the models is to transform them into propositional rules. In this work, we focus on random forests and gradient-boosted trees. Specifically, these models are converted into an ensemble of interpretable MLPs from which propositional rules are produced. The rule extraction method presented here allows one to precisely locate the discriminating hyperplanes that constitute the antecedents of the rules. In experiments based on eight classification problems, we compared our rule extraction technique to “Skope-Rules” and other state-of-the-art techniques. Experiments were performed with ten-fold cross-validation trials, with propositional rules that were also generated from ensembles of interpretable MLPs. By evaluating the characteristics of the extracted rules in terms of complexity, fidelity, and accuracy, the results obtained showed that our rule extraction technique is competitive. To the best of our knowledge, this is one of the few works showing a rule extraction technique that has been applied to both ensembles of decision trees and neural networks. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

22 pages, 6303 KiB  
Article
Ensembling EfficientNets for the Classification and Interpretation of Histopathology Images
by Athanasios Kallipolitis, Kyriakos Revelos and Ilias Maglogiannis
Algorithms 2021, 14(10), 278; https://doi.org/10.3390/a14100278 - 26 Sep 2021
Cited by 20 | Viewed by 2667
Abstract
The extended utilization of digitized Whole Slide Images is transforming the workflow of traditional clinical histopathology to the digital era. The ongoing transformation has demonstrated major potential for the exploitation of Machine Learning and Deep Learning techniques as assistive tools for specialized medical personnel. While the performance of the implemented algorithms is continually boosted by the mass production of generated Whole Slide Images and the development of state-of-the-art deep convolutional architectures, ensemble models provide an additional methodology towards the improvement of the prediction accuracy. Despite the earlier belief related to deep convolutional networks being treated as black boxes, important steps for the interpretation of such predictive models have also been proposed recently. However, this trend is not fully unveiled for the ensemble models. The paper investigates the application of an explanation scheme for ensemble classifiers, while providing satisfactory classification results of histopathology breast and colon cancer images in terms of accuracy. The results can be interpreted by the hidden layers’ activation of the included subnetworks and provide more accurate results than single network implementations. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)

14 pages, 620 KiB  
Article
The Study of Multiple Classes Boosting Classification Method Based on Local Similarity
by Shixun Wang and Qiang Chen
Algorithms 2021, 14(2), 37; https://doi.org/10.3390/a14020037 - 26 Jan 2021
Cited by 5 | Viewed by 2132
Abstract
Boosting in ensemble learning has made great progress, but most methods boost a single modality. For this reason, the simple multiclass boosting framework that uses local similarity as a weak learner is extended to multimodal multiclass boosting. First, based on the local similarity as a weak learner, the loss function is used to find the basic loss, and the logarithmic data points are binarized. Then, we find the optimal local similarity and the corresponding loss. This loss is compared with the basic loss, and the smaller one is kept as the best so far. Second, the local similarity of the two points is calculated, and then the loss is calculated from this local similarity. Finally, the text and image are retrieved from each other, and the correct rates of text and image retrieval are obtained, respectively. In the experiments, the multimodal multiclass boosting framework with local similarity as the weak learner is evaluated on the standard data set and compared with other state-of-the-art methods, showing the proficiency of this method. Full article
(This article belongs to the Special Issue Ensemble Algorithms and/or Explainability)
