Proceeding Paper

Interpretable Artificial Intelligence Empowering Economic Diversification in Smart Manufacturing †

by Meshari A. Al-Ebrahim 1,*, Sami Asaad 2, Mostafa Abdrabboh 3, Abdalrahman Alajmi 4 and Amro A. Nour 5

1 Mechanical Engineering Department, State Audit Bureau (SAB), Kuwait City 71661, Kuwait
2 Electrical & Electronics Engineering Department, College of Engineering, Australian University, Kuwait City 13015, Kuwait
3 College of Engineering & Energy, Abdullah Al Salem University, Kuwait City 72100, Kuwait
4 Department of Mechanical & Aerospace Engineering, College of Engineering, University of Strathclyde, Glasgow G1 1XQ, UK
5 Electrical and Computer Engineering, College of Engineering and Applied Science, American University of Kuwait, Kuwait City 13034, Kuwait
* Author to whom correspondence should be addressed.
† Presented at the International Conference on Digital Transformation, Sustainability and AI, Kuwait, 4–5 February 2026.
Proceedings 2026, 142(1), 4; https://doi.org/10.3390/proceedings2026142004
Published: 4 June 2026

Abstract

Economic diversification strategies across the Gulf region, including Kuwait Vision 2035, position smart manufacturing as a key enabler of sustainable growth. Yet industrial datasets in the region are typically small, heterogeneous, and incomplete, limiting the performance and trustworthiness of conventional AI models. This paper introduces a Scalable Random Forest (SRF) framework enhanced with a Decision Path Search (DPS) mechanism to address these challenges through both technical robustness and practical interpretability. The SRF pipeline incorporates leakage-safe preprocessing, mixed-type imputation, and small-data augmentation to improve prediction stability under real industrial constraints, while DPS transforms model internals into actionable operational causal knowledge that identifies optimal and avoidance parameter ranges. Case studies, including investment casting, demonstrate that SRF + DPS not only outperforms established baselines such as Random Forest (RF), XGBoost, LightGBM, and CatBoost but also delivers transparent insights that engineers can directly apply to reduce defects and enhance process control. The findings highlight how interpretable AI frameworks can accelerate industrial modernization, strengthen regional manufacturing competitiveness, and support national economic diversification strategies.

1. Introduction

Achieving sustainable economic diversification is a central priority across the Gulf region, reflected in strategic frameworks such as Kuwait Vision 2035, which emphasize industrial modernization, digital transformation, and the adoption of advanced manufacturing technologies [1]. Broader regional assessments further highlight digital transformation as a key enabler of long-term diversification and sustainable growth in GCC economies [2,3]. Smart manufacturing, which integrates automation, data-driven optimization, and artificial intelligence (AI), is increasingly recognized as a critical enabler of competitiveness and value creation [4]. However, unlocking its potential in the Gulf remains challenging due to structural characteristics of regional industries, including limited historical data, heterogeneous process conditions, and restricted digital maturity [3]. As a result, many organizations struggle to deploy AI solutions that are robust, interpretable, and suitable for real operational environments.
Manufacturing processes are inherently complex and often governed by nonlinear relationships, multivariate interactions, and context-dependent effects that cannot be captured adequately by linear models or traditional statistical tools [4]. These challenges are further intensified in the Gulf manufacturing context, where datasets are typically small due to low production volumes, short data-logging histories, or fragmented data collection practices [2]. Missing values, mixed-type variables, and inconsistent measurement frequencies are common, undermining the effectiveness of conventional machine-learning (ML) approaches that typically require large, clean, and homogeneous datasets [5]. Without appropriate treatment, these limitations can lead to biased predictions, overfitting, and unreliable outcomes that directly impact quality control, defect reduction, and operational stability [6].
Although significant progress has been made in applying ML to industrial systems, much of the literature focuses on large-scale datasets or assumes high levels of data completeness and sensor coverage—conditions that do not represent the reality of many Gulf industries [4]. Furthermore, widely used black-box models such as gradient boosting, deep learning, or ensemble tree methods often provide limited transparency [7], making it difficult for engineers and decision-makers to interpret model behavior, validate predictions, or translate outputs into actionable process adjustments. This lack of interpretability is a critical barrier to industrial adoption, as practitioners require clear, reliable, and explainable recommendations before integrating AI-driven decision-support tools into production workflows [8].
Interpretable AI has emerged as a promising direction for addressing these concerns [6,9], offering methodologies that enhance transparency without sacrificing predictive performance. Approaches such as SHAP [10], LIME [11], and attention-based mechanisms contribute valuable explanatory insights, yet they often operate post hoc and do not provide structured guidance for adjusting process parameters within safe or optimal ranges. For manufacturing practitioners seeking to optimize complex, small-data systems, what is required is not merely an explanation of why a prediction was made, but a clear indication of how to adjust controllable factors to reduce defects, stabilize variability, and improve efficiency [6].
This paper addresses these gaps by introducing a Scalable Random Forest (SRF) framework integrated with a Decision Path Search (DPS) mechanism designed specifically for small and heterogeneous industrial datasets [12]. The SRF framework incorporates a leakage-safe preprocessing pipeline that handles missing values through mixed-type imputation with missForest [5] and performs small-data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) [13], with the aim of preserving statistical validity and reducing overfitting risk. Building on this foundation, DPS extracts and aggregates decision paths from the ensemble to derive interpretable “causal knowledge” that specifies both optimal and avoidance ranges for individual process parameters [12]. Unlike conventional interpretability tools, DPS leverages the internal logic of the model to generate practitioner-ready insights that directly support process optimization.
Case studies, including investment casting and additional industrial datasets, demonstrate the framework’s ability to improve prediction accuracy relative to widely adopted models such as Random Forest (RF) [14], XGBoost [15], LightGBM [16], and CatBoost [17]. More importantly, the interpretable causal knowledge generated through DPS enables engineers to identify controllable factors that influence defect occurrence, understand nonlinear interactions, and make informed adjustments grounded in data-driven reasoning.
By integrating robust ML, rigorous preprocessing, and actionable interpretability, this paper contributes to advancing smart manufacturing capabilities within the Gulf region. The proposed framework aligns with national development goals by supporting improved industrial performance, reduced waste, enhanced product quality, and greater operational resilience. Overall, this work highlights how interpretable AI can serve as a catalyst for economic diversification, empowering regional industries to transition toward more efficient, knowledge-driven, and globally competitive manufacturing ecosystems.

2. Methodology

The proposed methodology, illustrated in Figure 1, integrates an SRF modeling pipeline with a DPS interpretability mechanism to address the challenges of small, heterogeneous, and nonlinear manufacturing datasets common in Gulf-region industries. The overall workflow consists of four key components: data preprocessing, leakage-safe imputation and augmentation, model development and optimization, and interpretable decision-path extraction. Each stage is designed to preserve statistical integrity while generating reliable and actionable insights for process optimization.

2.1. Data Preprocessing and Validation Framework

Industrial manufacturing datasets frequently contain missing entries, mixed numerical and categorical variables, inconsistent sampling, and imbalance in defect classes. To address these issues, the methodology adopts a structured validation framework that prevents information leakage between training and testing splits, ensuring realistic performance estimates. All preprocessing operations (including scaling, imputation, encoding, and augmentation) are fitted and applied strictly within cross-validation folds, so that no statistic is learned from held-out data. This design reflects real-world deployment conditions, where the model must handle future data whose values and statistics were unavailable during training.
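To make the fold-contained principle concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes a single hypothetical feature (`mold_temp`) and swaps in simple mean imputation and standardization as stand-ins for the missForest and scaling steps of the SRF pipeline. The helper names `kfold_indices`, `fit_preprocessor`, and `apply_preprocessor` are illustrative only. The point it shows is that every statistic is learned from the training fold and then reused, unchanged, on the held-out fold.

```python
import statistics

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def fit_preprocessor(train_values):
    """Learn imputation and scaling statistics from the training fold only."""
    observed = [v for v in train_values if v is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed) or 1.0  # guard against zero variance
    return {"mean": mean, "std": std}

def apply_preprocessor(values, params):
    """Impute missing entries with the training mean, then standardize."""
    filled = [params["mean"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

# Hypothetical readings of one feature; None marks a missing measurement.
mold_temp = [950.0, None, 970.0, 990.0, 1010.0, None, 1030.0, 960.0]

for train_idx, test_idx in kfold_indices(len(mold_temp), n_folds=4):
    params = fit_preprocessor([mold_temp[i] for i in train_idx])
    x_train = apply_preprocessor([mold_temp[i] for i in train_idx], params)
    # The held-out fold is transformed with statistics it never influenced.
    x_test = apply_preprocessor([mold_temp[i] for i in test_idx], params)
```

Fitting the preprocessor on the full dataset before splitting would let held-out values shape the imputation mean and the scale, which is the leakage this section is designed to rule out.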

2.2. Mixed-Type Imputation and Small-Data Augmentation

A robust imputation strategy is essential for handling incomplete manufacturing data without discarding valuable information. The SRF pipeline employs missForest, a nonparametric tree-based imputation technique capable of handling nonlinear dependencies between mixed-type variables. Its iterative structure refines imputation quality while maintaining compatibility with the underlying SRF model.
Because many Gulf-region manufacturing processes generate relatively small datasets, the methodology incorporates small-data augmentation to improve generalization. SMOTE is used to rebalance datasets affected by class imbalance, especially in defect-prediction tasks. It generates synthetic samples by interpolating between neighboring minority-class observations, reducing bias toward the majority class and improving the model’s ability to recognize rare defect patterns. Applied within each training fold, the combination of missForest and SMOTE provides a statistically consistent and leakage-safe approach to preparing industrial data for ML analysis.
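The interpolation step at the core of SMOTE [13] can be sketched in a few lines of Python. This is a simplified illustration for numeric features only, not the pipeline used in the paper. The function name `smote_oversample`, the two-feature samples (read here as mold temperature and pouring speed), and the choice of `k=2` neighbours are assumptions made for the example.

```python
import math
import random

def smote_oversample(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between the base and its neighbour.
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical defect-class samples: (mold temperature, pouring speed).
defect_samples = [(1045.0, 2.6), (1050.0, 2.4), (1060.0, 2.8), (1042.0, 2.5)]
new_points = smote_oversample(defect_samples, n_synthetic=6)
```

Because each synthetic point lies between two real minority observations, it stays inside the region those observations span. In a leakage-safe setting the function would be called on the training fold only, never on held-out samples.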

2.3. Scalable Random Forest Development and Optimization

The SRF model extends the conventional RF framework through enhanced scalability, automatic hyperparameter tuning, and built-in diagnostics to monitor overfitting and learning behavior. Hyperparameter spaces (including tree depth, number of estimators, feature selection strategies, and split criteria) are optimized through cross-validation, ensuring reliable performance estimation. In addition, the SRF integrates learning-curve analysis to assess the marginal benefit of additional data and to detect instability arising from small sample sizes.
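The learning-curve diagnostic described above can be illustrated with a short Python sketch. It is an assumption-laden stand-in: a one-feature threshold classifier (`fit_stump`) replaces the forest, the data are synthetic, and the repeated random splits and the function name `learning_curve` are illustrative choices rather than the SRF procedure. What it shows is the shape of the analysis, namely mean held-out accuracy and its spread as the training set grows.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Pick the threshold on one feature that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(threshold, xs, ys):
    """Share of samples whose label matches the rule 'x > threshold'."""
    return sum((x > threshold) == y for x, y in zip(xs, ys)) / len(xs)

def learning_curve(xs, ys, train_sizes, n_repeats=20, seed=0):
    """Return {train_size: (mean held-out accuracy, spread across repeats)}."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    curve = {}
    for size in train_sizes:
        scores = []
        for _ in range(n_repeats):
            rng.shuffle(idx)
            train, test = idx[:size], idx[size:]
            t = fit_stump([xs[i] for i in train], [ys[i] for i in train])
            scores.append(accuracy(t, [xs[i] for i in test],
                                   [ys[i] for i in test]))
        curve[size] = (statistics.fmean(scores), statistics.pstdev(scores))
    return curve

# Synthetic data: a defect is likelier above 1000 degrees, with 10% label noise.
rng = random.Random(42)
temps = [rng.uniform(900.0, 1100.0) for _ in range(40)]
defect = [(t > 1000.0) != (rng.random() < 0.1) for t in temps]

curve = learning_curve(temps, defect, train_sizes=[5, 10, 20, 30])
for size, (mean_acc, spread) in curve.items():
    print(f"n_train={size:2d}  accuracy={mean_acc:.2f}  spread={spread:.2f}")
```

A curve that is still rising at the largest training size suggests that more data would help, while a large spread at small sizes is the kind of instability the diagnostics are meant to flag.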
A central motivation for selecting an RF-based architecture is its compatibility with heterogeneous, nonlinear, and noisy industrial datasets. Ensemble-based tree models naturally support mixed data types, capture complex interactions, and resist overfitting when properly tuned. Furthermore, tree structures offer inherent transparency that can be leveraged for interpretability, making them well suited for operational environments where explainability is essential for adoption.

2.4. Decision Path Search for Interpretable Causal Knowledge

While RF models provide high predictive accuracy, their ensemble nature complicates direct interpretation. To overcome this limitation, the methodology incorporates a DPS mechanism designed to extract interpretable knowledge from the internal decision paths of the forest. DPS identifies frequently traversed decision routes that lead to desirable or undesirable outcomes, such as low-defect or high-defect predictions.
These decision paths are aggregated to produce causal knowledge—parameter ranges associated with optimal process performance—and avoidance zones, which indicate conditions likely to generate defects or instability. This transforms the opaque ensemble structure into actionable insights that manufacturing practitioners can apply directly on the factory floor. Unlike post hoc interpretability methods such as SHAP or LIME, DPS derives insights from the internal logic of the model, preserving fidelity and ensuring that recommendations reflect true model behavior.
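The general idea of turning tree paths into parameter ranges can be sketched as follows. This is a hypothetical illustration and not the DPS algorithm of [12]. The nested-dictionary tree layout, the feature names, thresholds, and support counts, and the support-weighted averaging of bounds are all assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict

def leaf(label, support):
    """Terminal node: predicted outcome and number of training samples in it."""
    return {"label": label, "support": support}

def extract_paths(node, bounds=None):
    """Yield (bounds, label, support) for every root-to-leaf path of one tree."""
    bounds = bounds or {}
    if "label" in node:
        yield dict(bounds), node["label"], node["support"]
        return
    feat, thr = node["feature"], node["threshold"]
    lo, hi = bounds.get(feat, (-math.inf, math.inf))
    # Left branch: feature <= threshold tightens the upper bound.
    yield from extract_paths(node["left"], {**bounds, feat: (lo, min(hi, thr))})
    # Right branch: feature > threshold tightens the lower bound.
    yield from extract_paths(node["right"], {**bounds, feat: (max(lo, thr), hi)})

def aggregate_ranges(forest, target_label):
    """Support-weighted mean of finite path bounds, per feature, for one outcome."""
    sums = defaultdict(lambda: {"lo": [0.0, 0.0], "hi": [0.0, 0.0]})
    for tree in forest:
        for bounds, label, support in extract_paths(tree):
            if label != target_label:
                continue
            for feat, (lo, hi) in bounds.items():
                for side, value in (("lo", lo), ("hi", hi)):
                    if math.isfinite(value):
                        sums[feat][side][0] += support * value
                        sums[feat][side][1] += support
    return {
        feat: tuple(sides[s][0] / sides[s][1] if sides[s][1] else None
                    for s in ("lo", "hi"))
        for feat, sides in sums.items()
    }

# Two hand-written trees standing in for a trained forest (all values hypothetical).
forest = [
    {"feature": "mold_temp", "threshold": 980.0,
     "left": leaf("high_defect", 6),
     "right": {"feature": "mold_temp", "threshold": 1040.0,
               "left": leaf("low_defect", 14),
               "right": leaf("high_defect", 5)}},
    {"feature": "pour_speed", "threshold": 2.0,
     "left": {"feature": "mold_temp", "threshold": 1000.0,
              "left": leaf("high_defect", 4),
              "right": leaf("low_defect", 12)},
     "right": leaf("high_defect", 9)},
]

# Aggregated range for the desirable outcome; None means "no bound on that side".
optimal = aggregate_ranges(forest, "low_defect")

# Avoidance zones may be disjoint, so they are listed path by path instead.
avoid = [(bounds, support) for tree in forest
         for bounds, label, support in extract_paths(tree)
         if label == "high_defect"]
```

In this toy forest the low-defect paths agree on a mold-temperature window whose upper bound is 1040 and on a pouring speed of at most 2.0, while the high-defect paths fall on either side of that window. Any real aggregation rule would also need to handle conflicting paths and categorical splits, which this sketch ignores.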

2.5. Evaluation Strategy and Baseline Comparisons

The SRF + DPS framework is evaluated using real industrial datasets, including an investment casting case study that exemplifies nonlinear process behavior and small-data challenges. Model performance is assessed using metrics such as accuracy, F1-score, and cross-validation error. Baseline comparisons are conducted against widely used ML models. These comparisons highlight the improvements achieved through leakage-safe preprocessing, optimized ensemble structure, and interpretability enhancements.
In addition to quantitative performance, qualitative evaluation of DPS-generated causal knowledge is performed with domain experts to validate the practical relevance of the extracted insights. This dual evaluation approach (predictive and interpretive) ensures both technical rigor and operational usefulness.
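For reference, the accuracy and F1-score used in the quantitative evaluation can be computed from confusion counts as in the Python sketch below. The labels are hypothetical and serve only to show the arithmetic; `1` marks a defective part.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical labels for eight parts (1 = defective, 0 = sound).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))
```

Here two defects are caught, one is missed, and one sound part is flagged, so accuracy is 6/8 while F1 is 4/6. On imbalanced defect data the two metrics can diverge much further, which is why both are reported.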

3. Results

The evaluation of the SRF + DPS framework demonstrates notable improvements in predictive accuracy, robustness, and interpretability compared with widely used ML models such as standard RF, XGBoost, LightGBM, and CatBoost. Across all industrial datasets examined, including investment casting, cooling-system components, and additional mixed-type manufacturing processes, SRF consistently achieved superior performance under leakage-safe cross-validation. These gains were most prominent in small or imbalanced datasets, where conventional models often suffered from instability or overfitting. Improvements in accuracy and F1-score typically ranged between 5% and 12% relative to baselines, with SRF exhibiting smoother learning curves and smaller variance across folds. This indicates greater reliability when deployed in operational environments where data are limited or noisy.
Beyond predictive performance, one of the most significant outcomes of this paper is the enhanced interpretability provided by the DPS mechanism. DPS successfully extracted meaningful decision paths that translated into causal knowledge (parameter ranges associated with low-defect predictions) and avoidance zones linked to high-defect outcomes. For example, in the investment casting case study, DPS revealed nonlinear interactions between mold temperature, pouring speed, and alloy composition that were not evident through conventional analysis. The extracted causal knowledge provided engineers with actionable insights that could be directly applied to adjust process parameters, thereby reducing defect rates and improving process stability.
The interpretability outcomes were validated through discussions with domain experts, who confirmed that the identified causal knowledge aligned with empirical observations and, in some cases, offered new insights not previously recognized due to the complexity of variable interactions. The actionable nature of DPS output distinguishes the proposed approach from other explainability tools such as SHAP or LIME, which provide useful feature-importance explanations but do not specify operational parameter ranges. By contrast, DPS bridges the gap between model explainability and practical decision-making, offering engineers clear guidance on how to adjust controllable factors.
Industrial relevance was further demonstrated by the framework’s ability to support capability improvement, defect mitigation, and cost reduction. In one observed case, applying DPS-derived causal knowledge led to an estimated 8–15% reduction in defect incidence, depending on the defect category. These improvements translate directly into reduced material waste, lower rework rates, and enhanced production efficiency—outcomes that align with manufacturing modernization goals across the Gulf region. Additionally, the framework’s compatibility with heterogeneous and incomplete datasets makes it particularly suitable for regional industries still progressing toward full digital maturity.
Overall, the results highlight the dual strength of the SRF + DPS framework: it not only outperforms established predictive models but also provides interpretable and actionable insights that empower practitioners to optimize complex manufacturing processes. These capabilities demonstrate the framework’s potential to support smart manufacturing initiatives and contribute meaningfully to national strategies aimed at economic diversification and industrial transformation.

4. Conclusions

This paper presented an interpretable and scalable AI framework designed to address the specific challenges of manufacturing systems characterized by nonlinear interactions, mixed data types, and limited sample sizes—conditions commonly observed in industries across the Gulf region. By integrating the SRF model with the DPS interpretability mechanism, the methodology overcomes key shortcomings of existing ML approaches, which often depend on large, clean datasets and provide limited transparency into model behavior. The results demonstrate that SRF + DPS offers a robust and explainable solution that advances both the scientific understanding of data-driven manufacturing optimization and its practical applicability within real industrial environments.
From a theoretical perspective, the proposed framework contributes to the expanding field of interpretable AI by moving beyond post hoc explanation techniques toward a model-integrated approach that extracts structural insights directly from decision paths. The leakage-safe pipeline (including mixed-type imputation, data augmentation, and cross-validation) ensures that findings remain statistically sound and resistant to common pitfalls such as overfitting or inflated performance estimates. By coupling rigorous preprocessing with ensemble-based modeling and structured interpretability, the framework provides a transparent, generalizable, and repeatable methodology suitable for nonlinear industrial problems where traditional tools may fail.
Practically, the SRF + DPS combination delivers insights that manufacturing practitioners can implement directly. The causal knowledge and avoidance zones derived from DPS offer clear, parameter-specific recommendations that enable more informed process adjustments. Unlike standard feature-importance metrics, which provide useful but abstract indications of influential variables, DPS highlights specific ranges where parameters should be maintained or avoided to minimize defects and enhance product quality. This form of interpretability was validated through discussions with domain engineers, who confirmed both the credibility of the extracted insights and their potential to support day-to-day decision-making. Such actionable outputs represent a significant advancement in bridging the gap between complex AI models and real-world engineering practice.
The industrial impact demonstrated through case studies (such as investment casting) highlights the framework’s potential to enhance capability indices, reduce variability, and decrease defect rates. In observed scenarios, DPS-driven adjustments achieved reductions in defect incidence of 8–15%, leading to measurable cost savings through lowered rework, scrap, and material consumption. These improvements illustrate how interpretable AI, when properly designed for small-data environments, can complement existing quality systems and accelerate the adoption of data-driven process control. The scalability of the approach further ensures that it can be extended across diverse industrial sectors where data limitations and complex process behaviors are prevalent.
At the regional level, this paper aligns closely with national development priorities such as Kuwait Vision 2035 and broader Gulf strategies focused on economic diversification and industrial transformation. Many manufacturing enterprises in the region remain in early stages of digitalization, with limited historical datasets and heterogeneous operational structures. The proposed methodology is intentionally designed to operate effectively under these conditions, enabling industries to harness AI capabilities without requiring highly instrumented, fully automated environments. By providing interpretable and trustworthy insights, the framework supports the cultural and organizational shift necessary for widespread AI adoption, fostering greater confidence among engineers, managers, and policymakers.
Furthermore, the paper contributes to the broader economic agenda by offering a pathway to enhance local manufacturing competitiveness, reduce dependency on imported technologies, and promote knowledge-driven production systems. As Gulf countries continue to expand renewable energy, logistics, petrochemicals, and advanced manufacturing initiatives, the need for data-driven optimization and transparent AI systems will increase. The SRF + DPS framework provides a foundation for such advancements, demonstrating how scientifically rigorous and practically interpretable AI can underpin smarter, more resilient industrial ecosystems.
In conclusion, the proposed interpretable AI framework offers a comprehensive solution that integrates technical robustness, practical relevance, and strategic regional value. It advances the scientific discourse on interpretable ML, delivers actionable insights for industrial practitioners, and supports national objectives toward sustainable economic diversification. Future work may explore domain adaptation strategies, integration with digital-twin environments, and extensions to time-series or sensor-based datasets, further strengthening the role of interpretable AI in shaping the next generation of smart manufacturing in the Gulf region.

Author Contributions

Conceptualization, M.A.A.-E. and S.A.; methodology, M.A.A.-E. and S.A.; software, M.A.A.-E.; validation, M.A.A.-E., M.A. and A.A.; formal analysis, M.A. and A.A.; investigation, S.A. and A.A.N.; resources, M.A. and A.A.N.; data curation, M.A.A.-E. and A.A.; writing—original draft preparation, S.A. and A.A.; writing—review and editing, M.A. and A.A.N.; visualization, S.A. and A.A.N.; supervision, M.A.A.-E. and A.A.; project administration, M.A.A.-E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request. The data are not publicly available due to research and institutional considerations.

Acknowledgments

During the preparation of this manuscript, the authors used Gemini 3.1 to support the creation of Figure 1 and to improve language clarity, grammar, and sentence flow. The authors carefully reviewed and edited all AI-assisted outputs and take full responsibility for the accuracy, originality, and integrity of the content presented in this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest. The State Audit Bureau has no commercial conflict of interest with this work.

References

  1. General Secretariat of the Supreme Council for Planning and Development (SCPD). New Kuwait Vision 2035: National Development Plan; SCPD: Kuwait City, Kuwait, 2018. Available online: https://media.gov.kw/assets/img/Ommah22_Awareness/PDF/NewKuwait/Revised%20KNDP%20-%20EN.pdf (accessed on 13 January 2026).
  2. World Bank. The Gulf’s Digital Transformation: A Powerful Engine for Economic Diversification (Gulf Economic Update, Chapter 3). Available online: https://documents1.worldbank.org/curated/en/099703212012571773/pdf/IDU-25e71676-54fb-43fd-b763-66a5b6b568fa.pdf (accessed on 13 January 2026).
  3. Asmyatullin, R.R.; Glavina, S.G. Digitalization as a driver for sustainable development in the GCC economies. Unconv. Resour. 2025, 8, 100231.
  4. Mudgal, P. A Data-Centric Framework for Implementing Artificial Intelligence in Smart Manufacturing. Electronics 2025, 14, 3304.
  5. Stekhoven, D.J.; Bühlmann, P. missForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 2012, 28, 112–118.
  6. Puthanveettil Madathil, A.; Luo, X.; Liu, Q.; Walker, C.; Madarkar, R.; Qin, Y. A review of explainable artificial intelligence in smart manufacturing. Int. J. Prod. Res. 2025, 63, 8654–8697.
  7. Rudin, C. Stop explaining black box machine learning models for high-stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  8. Gunning, D.; Aha, D.W. DARPA’s Explainable Artificial Intelligence (XAI) Program. AI Mag. 2019, 40, 44–58.
  9. Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. 2022. Available online: https://originalstatic.aminer.cn/misc/pdf/Molnar-interpretable-machine-learning_compressed.pdf (accessed on 13 January 2026).
  10. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 5–7 December 2017.
  11. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  12. Al-Ebrahim, M.; Ransing, R.S. A scalable random forest (SRF) approach for non-linear predictive modelling using small manufacturing datasets. J. Intell. Manuf. 2026, 1–47.
  13. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  14. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  15. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016.
  16. Szczepanek, R. Daily Streamflow Forecasting in Mountainous Catchment Using XGBoost, LightGBM and CatBoost. Hydrology 2022, 9, 226.
  17. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for Big Data: An Interdisciplinary Review. J. Big Data 2020, 7, 94.
Figure 1. Integrated SRF + DPS framework for leakage-safe predictive modelling and operating windows discovery in small, mixed-type manufacturing datasets. The pipeline illustrates fold-contained preprocessing (missForest imputation and SMOTE augmentation), SRF training with stability diagnostics, and DPS aggregation of tree paths into optimal and avoidance parameter ranges.
