WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.2. Machine Learning Models
2.3. Interpretability Algorithms
2.4. Consensus Functions
2.4.1. Arithmetic Mean
2.4.2. Harmonic Mean
2.4.3. Geometric Mean
2.4.4. Voting
2.4.5. Relative Position
2.4.6. Other Functions
2.5. Development of a Novel Consensus Function
2.5.1. Identified Challenges
- Based on attributions. Some consensus functions, such as voting or relative position, rely on the relative order of the features rather than on their attribution values, and therefore ignore how much each input feature contributes to the decision. Although they may be less computationally expensive, they discard the magnitude of the contributions reported by the interpretability algorithms.
- Scale attributions. The interpretability approaches used in this work do not produce attributions in the same range, which makes comparing them difficult and unfair. Consequently, the proposed function normalizes all attributions to the range [0, 1] using min-max scaling (Equation (5)). To preserve the sign of the attributions, each scaled value is multiplied by the sign of the original attribution (attr_sign), represented by 1 or −1, so that positive and negative attributions remain distinguishable.
- Distinguish global and local explanations. Global interpretability methods return a single attribution value per feature, inferred from the attributions of all individual samples. In contrast, local methods return one attribution per feature for each sample. Therefore, if both types were combined in a single equation, local methods would carry more weight simply because they contribute more values. To overcome this problem, the attributions of local methods are divided by the number of samples, so that each local method contributes in the same proportion as a global one.
- Weight the errors. Local interpretability methods explain each sample individually, but not all samples are predicted with the same certainty. For example, in a binary classification problem, a sample predicted with a probability of 0.5 is essentially the result of a random decision, whereas a sample classified with a probability of 0.99 is a very reliable prediction whose explanation is of great interest. Similarly, in a regression problem, the difference between the predicted and actual values measures how reliable the prediction is. Prediction certainty is therefore an essential parameter when assessing local methods, and the impact of class probability or regression error on model explanations should be represented by a correction factor.
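The scaling, sign restoration, per-sample weighting, and local-to-global balancing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not necessarily the exact WISCA formulation: min-max scaling is assumed to be applied to absolute attribution values over everything a method returns, local attributions are assumed to be multiplied by an externally supplied per-sample correction factor and divided by the number of samples, and the per-method scores are assumed to be averaged. The functions `signed_minmax`, `local_to_feature_scores`, and `consensus` and the input values are hypothetical.

```python
import numpy as np


def signed_minmax(attr):
    """Min-max scale |attr| to [0, 1], then restore each value's original sign.

    Assumption: scaling is applied to absolute values over the whole array,
    so multiplying by the sign yields values in [-1, 1].
    """
    attr = np.asarray(attr, dtype=float)
    mag = np.abs(attr)
    span = mag.max() - mag.min()
    if span == 0:
        # Min-max is undefined when all magnitudes are equal; return zeros.
        scaled = np.zeros_like(mag)
    else:
        scaled = (mag - mag.min()) / span
    # np.sign gives 1 or -1 (and 0 for attributions that are exactly zero).
    return np.sign(attr) * scaled


def local_to_feature_scores(local_attr, weights=None):
    """Collapse an (n_samples, n_features) local matrix to one value per feature.

    Each sample's scaled attributions are multiplied by a per-sample correction
    factor (hypothetical `weights`, default 1) and the weighted sum is divided
    by n_samples, so the local method weighs the same as a global one.
    """
    scaled = signed_minmax(local_attr)
    n_samples = scaled.shape[0]
    w = np.ones(n_samples) if weights is None else np.asarray(weights, dtype=float)
    return (w[:, None] * scaled).sum(axis=0) / n_samples


def consensus(global_attrs, local_attrs, local_weights):
    """Average the per-feature scores of all global and local methods (assumed rule)."""
    scores = [signed_minmax(g) for g in global_attrs]
    scores += [local_to_feature_scores(loc, w) for loc, w in zip(local_attrs, local_weights)]
    return np.mean(scores, axis=0)


# Illustrative inputs: one global method and one local method (2 samples, 3 features).
global_attrs = [[0.2, -0.8, 0.5]]
local_attrs = [[[1.0, 0.0, -2.0], [3.0, -1.0, 0.0]]]
local_weights = [[1.0, 0.0]]  # second sample fully down-weighted (e.g., an uncertain prediction)
print(consensus(global_attrs, local_attrs, local_weights))
```

With these illustrative inputs, feature 2 obtains the largest-magnitude consensus score (-0.5), and the zero weight removes the second sample's contribution entirely. Dividing by `n_samples` is what keeps a local method from outweighing a global one merely because it returns more values.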
2.5.2. WISCA Formulation
2.5.3. Correction Factor in Regression
2.5.4. Alternatives for the Classification Correction Factor
- Quadratic. This function represents a smooth and symmetrical curve. It shows a rapid drop towards the minimum at p = 0.5. It reaches its maximum at p = 0 and p = 1.
- Power. This function behaves similarly to the parabola but does not reach exactly 1 at the ends. When n = 2, it shows a sharper shape than the parabola. Because its behavior is nearly identical to that of the quadratic function, it can be omitted.
- Cosine. The cosine function is smooth and wave-shaped. It reaches its maxima exactly at p = 0 and p = 1 and its minimum at p = 0.5. However, its transition is much slower than that of the parabola.
- Exponential. This function has the shape of an inverted bell. Its rate of fall is adjustable through a parameter (a): the higher a, the steeper the fall. At probabilities close to 0 and 1, it behaves more smoothly. However, we want our function to have a more constant curvature.
- Negative entropy. It behaves very similarly to the quadratic function, but its curve is slightly steeper. As a result, the factor takes longer to grow: for intermediate probabilities in (0, 0.5) and (0.5, 1), the attributions become more similar because the correction factor varies less.
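The candidate shapes above can be compared numerically with the following Python sketch. The paper's exact equations are not given in this section, so the closed forms below are assumptions chosen only to reproduce the stated properties (a minimum at p = 0.5, high values near p = 0 and p = 1, and a steepness controlled by a for the exponential); the power variant is omitted, as in the text. All function names and the default a = 10 are hypothetical, and p is assumed to be the predicted probability in a binary problem.

```python
import numpy as np

# Illustrative closed forms for the candidate classification correction factors.
# p is assumed to be a predicted probability in a binary problem. These are
# assumptions matching the qualitative shapes described, not the paper's equations.


def quadratic_factor(p):
    """Parabola: 1 at p = 0 and p = 1, minimum 0 at p = 0.5."""
    p = np.asarray(p, dtype=float)
    return (2.0 * p - 1.0) ** 2


def cosine_factor(p):
    """Raised cosine: 1 at p = 0 and p = 1, 0 at p = 0.5, flat near the ends."""
    p = np.asarray(p, dtype=float)
    return (1.0 + np.cos(2.0 * np.pi * p)) / 2.0


def exponential_factor(p, a=10.0):
    """Inverted Gaussian bell; a larger `a` gives a steeper fall around p = 0.5.

    At the ends it reaches 1 - exp(-a / 4), which is close to, but below, 1.
    """
    p = np.asarray(p, dtype=float)
    return 1.0 - np.exp(-a * (p - 0.5) ** 2)


def neg_entropy_factor(p):
    """One minus the binary entropy in bits: 0 at p = 0.5, 1 at p = 0 and p = 1."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    # 0 * log2(0) evaluates to nan; its limit is 0, so replace nan with 0.
    return 1.0 - np.nan_to_num(h, nan=0.0)


probs = np.linspace(0.0, 1.0, 11)
candidates = [("quadratic", quadratic_factor), ("cosine", cosine_factor),
              ("exponential", exponential_factor), ("neg. entropy", neg_entropy_factor)]
for name, f in candidates:
    print(f"{name:>12}: {np.round(f(probs), 3)}")
```

With these particular forms, the negative-entropy factor stays below the quadratic one around p = 0.5 (about 0.029 versus 0.040 at p = 0.6), which matches the observation that it takes longer to grow. The raised cosine is flat at p = 0 and p = 1, so it leaves its maxima more slowly than the parabola does.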
3. Results
3.1. Model Training
3.2. Consensus
4. Discussion
4.1. Assessment of Consensus Functions
4.2. Evaluation of WISCA
4.3. Linear Models vs. WISCA
4.4. Real-World Datasets
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| ANN | Artificial Neural Network |
| AUC | Area Under the Curve |
| DL | Deep Learning |
| EF | Expected Feature |
| KNN | K-Nearest Neighbors |
| LIME | Local Interpretable Model-agnostic Explanations |
| LR | Linear/Logistic Regressor |
| MAE | Mean Absolute Error |
| MDI | Mean Decrease in Impurity |
| ML | Machine Learning |
| MSE | Mean Squared Error |
| NEF | Non-Expected Feature |
| RF | Random Forest |
| SHAP | SHapley Additive exPlanations |
| SVM | Support Vector Machine |
| WISCA | WeIghted Scaled Consensus Attributions |
| XAI | eXplainable Artificial Intelligence |
| XGB | eXtreme Gradient Boosting |
References
- Qu, K.; Guo, F.; Liu, X.; Lin, Y.; Zou, Q. Application of machine learning in microbiology. Front. Microbiol. 2019, 10, 827.
- Jones, D.T. Setting the standards for machine learning in biology. Nat. Rev. Mol. Cell Biol. 2019, 20, 659–660.
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- Ekins, S.; Puhl, A.C.; Zorn, K.M.; Lane, T.R.; Russo, D.P.; Klein, J.J.; Hickey, A.J.; Clark, A.M. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 2019, 18, 435–441.
- Elbadawi, M.; Gaisford, S.; Basit, A.W. Advanced machine-learning techniques in drug discovery. Drug Discov. Today 2021, 26, 769–777.
- Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538.
- Pathak, J.; Subramanian, S.; Harrington, P.; Raja, S.; Chattopadhyay, A.; Mardani, M.; Kurth, T.; Hall, D.; Li, Z.; et al. FourCastNet: A data-driven model for high-resolution weather forecasts using adaptive Fourier neural operators. 2022.
- Stirnberg, R.; Cermak, J.; Kotthaus, S.; Haeffelin, M.; Andersen, H.; Fuchs, J.; Kim, M.; Petit, J.E.; Favez, O. Meteorology-driven variability of air pollution (PM1) revealed with explainable machine learning. Atmos. Chem. Phys. 2021, 21, 3919–3948.
- Waljee, A.K.; Higgins, P.D.R. Machine learning in medicine: A primer for physicians. Am. J. Gastroenterol. 2010, 105, 1224–1226.
- Sidey-Gibbons, J.A.M.; Sidey-Gibbons, C.J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 2019, 19, 64.
- Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358.
- Yang, C.C. Explainable artificial intelligence for predictive modeling in healthcare. J. Healthc. Inform. Res. 2022, 6, 228–239.
- Payrovnaziri, S.N.; Chen, Z.; Rengifo-Moreno, P.; Miller, T.; Bian, J.; Chen, J.H.; Liu, X.; He, Z. Explainable artificial intelligence models using real-world electronic health record data: A systematic scoping review. J. Am. Med. Inform. Assoc. 2020, 27, 1173–1185.
- Zhang, Y.; Weng, Y.; Lund, J. Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics 2022, 12, 237.
- Chen, X.Q.; Ma, C.Q.; Ren, Y.S.; Lei, Y.T.; Huynh, N.Q.A.; Narayan, S. Explainable artificial intelligence in finance: A bibliometric review. Fin. Res. Lett. 2023, 56, 104145.
- Demajo, L.M.; Vella, V.; Dingli, A. Explainable AI for interpretable credit scoring. arXiv 2020, arXiv:2012.03749.
- Černevičienė, J.; Kabašinskas, A. Review of multi-criteria decision-making methods in finance using explainable artificial intelligence. Front. Artif. Intell. 2022, 5, 827584.
- Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018; pp. 80–89.
- Bennetot, A.; Donadello, I.; El Qadi El Haouari, A.; Dragoni, M.; Frossard, T.; Wagner, B.; Sarranti, A.; Tulli, S.; Trocan, M.; Chatila, R.; et al. A Practical Tutorial on Explainable AI Techniques. ACM Comput. Surv. 2024, 57, 1–44.
- Pavlidis, G. Unlocking the black box: Analysing the EU artificial intelligence act’s framework for explainability in AI. Law Innov. Technol. 2024, 16, 293–308.
- Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A review of machine learning interpretability methods. Entropy 2020, 23, 18.
- Molnar, C. Interpretable Machine Learning. 2020. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 18 February 2026).
- Rainio, O.; Teuho, J.; Klén, R. Evaluation metrics and statistical tests for machine learning. Sci. Rep. 2024, 14, 6086.
- Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 2021, 10, 593.
- Nauta, M.; Trienes, J.; Pathak, S.; Nguyen, E.; Peters, M.; Schmitt, Y.; Schlötterer, J.; van Keulen, M.; Seifert, C. From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI. ACM Comput. Surv. 2023, 55, 1–42.
- Hoffman, R.R.; Mueller, S.T.; Klein, G.; Litman, J. Measures for explainable AI: Explanation goodness, user satisfaction, mental models, curiosity, trust, and human-AI performance. Front. Comput. Sci. 2023, 5, 1096257.
- Holzinger, A.; Carrington, A.; Müller, H. Measuring the quality of explanations: The system causability scale (SCS) comparing human and machine explanations. KI-Künstl. Intell. 2020, 34, 193–198.
- Islam, S.R.; Eberle, W.; Ghafoor, S.K. Towards quantification of explainability in explainable artificial intelligence methods. In Proceedings of the Thirty-Third International Florida Artificial Intelligence Research Society Conference (FLAIRS 2020), North Miami Beach, FL, USA, 17–20 May 2020; pp. 75–81.
- Rosenfeld, A. Better metrics for evaluating explainable artificial intelligence. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, Virtual, 3–7 May 2021; pp. 45–50.
- Rudin, C.; Chen, C.; Chen, Z.; Huang, H.; Semenova, L.; Zhong, C. Interpretable machine learning: Fundamental principles and 10 grand challenges. Stat. Surv. 2022, 16, 1–85.
- Krishnan, M. Against interpretability: A critical examination of the interpretability problem in machine learning. Philos. Technol. 2020, 33, 487–502.
- Vowels, M.J. Trying to outrun causality with machine learning: Limitations of model explainability techniques for identifying predictive variables. Stat 2022, 1050, 22.
- Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 2023, 263, 110273.
- Sarlette, A.; Sepulchre, R. Consensus optimization on manifolds. SIAM J. Contr. Optim. 2009, 48, 56–76.
- Zhang, H.; Kou, G.; Peng, Y. Soft consensus cost models for group decision making and economic interpretations. Eur. J. Oper. Res. 2019, 277, 964–980.
- Bajusz, D.; Racz, A.; Heberger, K. Comparison of data fusion methods as consensus scores for ensemble docking. Molecules 2019, 24, 2690.
- Burgos-Mellado, C.; Llanos, J.J.; Cárdenas, R.; Saez, D.; Olivares, D.E.; Sumner, M.; Costabeber, A. Distributed control strategy based on a consensus algorithm and on the conservative power theory for imbalance and harmonic sharing in 4-wire microgrids. IEEE Trans. Smart Grid 2019, 11, 1604–1619.
- Ayad, H.G.; Kamel, M.S. Cumulative voting consensus method for partitions with variable number of clusters. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 160–173.
- Ayad, H.G.; Kamel, M.S. On voting-based consensus of cluster ensembles. Pattern Recogn. 2010, 43, 1943–1953.
- Fischman, J.B. Estimating preferences of circuit judges: A model of consensus voting. J. Law Econ. 2011, 54, 781–809.
- Lekadir, K.; Frangi, A.F.; Porras, A.R.; Glocker, B.; Cintas, C.; Langlotz, C.P.; Weicken, E.; Asselbergs, F.W.; Prior, F.; Collins, G.S.; et al. FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025, 388, e081554.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 1135–1144.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Strumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inform. Syst. 2014, 41, 647–665.
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3319–3328.
- Mothilal, R.K.; Sharma, A.; Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 607–617.
- Zamani, M.G.; Nikoo, M.R.; Niknazar, F.; Al-Rawas, G.; Al-Wardy, M.; Gandomi, A.H. A multi-model data fusion methodology for reservoir water quality based on machine learning algorithms and Bayesian maximum entropy. J. Clean. Prod. 2023, 416, 137885.
- Röcken, S.; Zavadlav, J. Accurate machine learning force fields via experimental and simulation data fusion. npj Comput. Mater. 2024, 10, 69.
- Steyaert, S.; Pizurica, M.; Nagaraj, D.; Khandelwal, P.; Hernandez-Boussard, T.; Gentles, A.J.; Gevaert, O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat. Mach. Intell. 2023, 5, 351–362.
- Singh, A.; Gaurav, K. Deep learning and data fusion to estimate surface soil moisture from multi-sensor satellite images. Sci. Rep. 2023, 13, 2251.
- Banegas-Luna, A.; Pérez-Sánchez, H. SIBILA: Automated Machine-Learning-Based Development of Interpretable Machine-Learning Models on High-Performance Computing Platforms. AI 2024, 5, 2353–2374.
- Fernandes, K.; Cardoso, J.; Fernandes, J. Transfer learning with partial observability applied to cervical cancer screening. In Iberian Conference on Pattern Recognition and Image Analysis; Springer International Publishing: Cham, Switzerland, 2017; pp. 243–250.
- Aeberhard, S.; Forina, M. Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recogn. 1994, 27, 1065–1077.
- Fanaee-T, H. Event labeling combining ensemble detectors and background knowledge. Lect. Notes Artif. Int. 2014, 2, 113–127.











| Synthetic Datasets | Type | Number of Samples | Number of Features | Expected Explanation ¹ |
|---|---|---|---|---|
| Dataset 1 | Binary | 2000 | 20 | F2, F3, F9, F17 |
| Dataset 2 | Binary | 1500 | 75 | F5, F25, F55 |
| Dataset 3 | Regression | 2500 | 60 | F1, F56, F58, F60 |
| Dataset 4 | Regression | 2000 | 30 | F19, F21, F24, F26 |
| Dataset 5 | Multiclass | 1000 | 10 | F3, F4, F7, F10 |
| Dataset 6 | Multiclass | 2500 | 30 | F12, F16, F22, F27 |
| Dataset | Formula |
|---|---|
| Dataset 1 | if then 0 else 1 |
| Dataset 2 | if then 0 else 1 |
| Dataset 3 | |
| Dataset 4 | |
| Dataset 5 ¹ | if then 0 elsif then 1 elsif then 2 |
| Dataset 6 ² | if then 0 elsif then 1 elsif then 2 elsif then 3 elsif then 4 elsif then 5 |
| Method | Mean Hit Rate | 95% CI | Median | IQR | p-Value vs. WISCA | Holm-Adjusted p |
|---|---|---|---|---|---|---|
| WISCA | 0.991 | [0.981, 1.000] | 1.000 | 0.000 | - | - |
| Arithmetic | 0.954 | [0.919, 0.989] | 1.000 | 0.023 | 0.005 | 0.009 |
| Voting | 0.956 | [0.908, 1.000] | 1.000 | 0.000 | 0.053 | 0.053 |
| Ranking | 0.674 | [0.564, 0.783] | 0.716 | 0.480 | <0.001 | <0.001 |
| Geometric | 0.408 | [0.290, 0.527] | 0.556 | 0.695 | <0.001 | <0.001 |
| Harmonic | 0.218 | [0.122, 0.315] | 0.067 | 0.157 | <0.001 | <0.001 |
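The last column of the table above reports p-values adjusted with the Holm step-down procedure for multiple comparisons against WISCA. A minimal Python sketch of that adjustment is shown below for illustration only; the function name `holm_adjust` is hypothetical, and the input values are made up rather than the raw p-values behind the table.

```python
import numpy as np


def holm_adjust(pvals):
    """Holm step-down adjustment of a family of p-values.

    Sort the p-values in ascending order, multiply the i-th smallest
    (0-based) by (m - i), enforce monotonicity with a running maximum,
    cap at 1, and return the results in the original order.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Illustrative raw p-values (not the study's data).
print(holm_adjust([0.01, 0.04, 0.03]))
```

In practice, `statsmodels.stats.multitest.multipletests(pvals, method="holm")` provides the same correction; the running maximum is what keeps an adjusted p-value from falling below that of a smaller raw p-value.
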
| Dataset | LR | WISCA |
|---|---|---|
| Dataset 1 | 1.00 | 1.00 |
| Dataset 2 | 1.00 | 1.00 |
| Dataset 3 | 1.00 | 1.00 |
| Dataset 4 | 1.00 | 1.00 |
| Dataset 5 | 1.00 | 1.00 |
| Dataset 6 | 0.91 | 0.98 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Banegas-Luna, A.J.; Pérez-Sánchez, H.; Martínez-Cortés, C. WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets. Mach. Learn. Knowl. Extr. 2026, 8, 97. https://doi.org/10.3390/make8040097
Banegas-Luna AJ, Pérez-Sánchez H, Martínez-Cortés C. WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets. Machine Learning and Knowledge Extraction. 2026; 8(4):97. https://doi.org/10.3390/make8040097
Chicago/Turabian Style: Banegas-Luna, Antonio Jesús, Horacio Pérez-Sánchez, and Carlos Martínez-Cortés. 2026. "WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets" Machine Learning and Knowledge Extraction 8, no. 4: 97. https://doi.org/10.3390/make8040097
APA Style: Banegas-Luna, A. J., Pérez-Sánchez, H., & Martínez-Cortés, C. (2026). WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets. Machine Learning and Knowledge Extraction, 8(4), 97. https://doi.org/10.3390/make8040097

