1. Introduction
Recommender systems (RS) have become integral to modern digital platforms, guiding user discovery across e-commerce, media streaming, and social networks [
1,
2,
3]. The evolution of these systems has been marked by a drive for greater predictive accuracy, progressing from traditional techniques to complex deep learning architectures like Neural Collaborative Filtering (NCF) and attention-based networks [
1,
3,
4]. These advanced models excel at capturing intricate user-item interactions to improve recommendation relevance [
3]. However, this enhanced predictive power frequently comes at the cost of transparency, as their internal decision-making processes often operate as opaque “black boxes,” inaccessible to users, developers, and auditors alike [
5,
6]. This opacity introduces significant concerns regarding trust, accountability, and fairness, particularly as recommendations increasingly influence critical business outcomes and user behaviours [
7,
8].
The need for transparency is especially acute in high-stakes domains where biased or erroneous recommendations can have severe consequences [
9,
10]. In e-commerce, for instance, opaque models could perpetuate systemic biases, unfairly limit the visibility of certain products, or promote harmful items [
7,
8]. In finance, an unauditable system recommending investment products could expose users to unsuitable risks or violate regulatory requirements for fairness [
9,
10,
11]. Similarly, a black-box system suggesting medical content in healthcare could provide misleading advice without a clear, auditable rationale [
5,
12]. Explainable Artificial Intelligence (XAI) methods such as LIME [
13] and SHAP [
14,
15] provide post hoc feature attributions for model predictions. However, research indicates these surrogate-based explanations can be unstable and potentially unfaithful to the true model logic, limiting their reliability in policy-sensitive contexts [
16,
17,
18]. This limitation has motivated a shift toward inherently interpretable, or “glass-box,” models, where transparency is an intrinsic property of the model’s architecture, rather than a post hoc reconstruction [
5,
15].
The motivation behind this work is to develop a recommender system framework that is inherently transparent and auditable by design, without sacrificing the high predictive performance expected of modern systems [
5]. To achieve this, the proposed approach combines three core components: interpretable Machine Learning (ML) tree ensemble models (e.g., a Random Forest (RF) classifier or XGBoost) [
19], human-readable temporal engineered features, and an NLP sub-model that learns tag sentiment from user quality signals [
20,
21,
22]. Beyond technical interpretability, the framework aims to align with established paradigms of Trustworthy AI and algorithmic accountability. It does so by employing policy-aligned objectives, providing exact, audit-ready explanations, incorporating fairness-aware training, and conducting sensitivity analysis. By embedding governance constraints directly into the forecasting objective rather than applying them as a post hoc element, this methodology aligns with the principles of ‘Safe-by-Design’ AI, ensuring that safety is an intrinsic property of the system architecture [
23].
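To make the intended coupling of governance rules and interpretable learning concrete, the following minimal sketch (in Python, using scikit-learn) illustrates how a policy-aligned label and human-readable features could feed a tree ensemble. The feature names, thresholds, and label rule are illustrative assumptions rather than the exact specification used in this work.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def make_policy_label(future_window: pd.DataFrame,
                      min_ratings: int = 50,
                      min_avg_rating: float = 3.8) -> pd.Series:
    # Governance rule encoded as the prediction target: an item "complies" if,
    # in the future window, it attracts sufficient engagement at high quality.
    # The thresholds are hypothetical policy parameters, not the paper's values.
    return ((future_window["future_rating_count"] >= min_ratings) &
            (future_window["future_avg_rating"] >= min_avg_rating)).astype(int)

def train_glass_box(observed: pd.DataFrame, future_window: pd.DataFrame):
    # Human-readable features computed strictly from the observation window,
    # so no future information leaks into the inputs.
    feature_cols = ["rating_count", "avg_rating", "rating_velocity",
                    "days_since_release", "tag_sentiment"]
    X = observed[feature_cols]
    y = make_policy_label(future_window)
    # Interpretable tree ensemble: every split is a threshold on a named feature.
    model = RandomForestClassifier(n_estimators=300, max_depth=8,
                                   class_weight="balanced", random_state=42)
    model.fit(X, y)
    return model, feature_cols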
The reason for adding this multi-layered governance audit workflow is that, although tree-based models such as Random Forests and gradient-boosted ensembles are often considered more interpretable than deep neural networks [
24], their interpretability is nuanced. Individual decision trees are inherently transparent because their structure—comprising hierarchical splits on human-understandable features—can be visualised and traced from root to leaf, enabling a clear explanation of a single prediction [
19]. However, when hundreds or thousands of trees are aggregated into an ensemble, the global logic becomes substantially more complex, making full comprehension by a human auditor challenging [
19,
24]. In a strict regulatory or legal sense, such as compliance with the “right to explanation” under the EU GDPR [
9,
10] or similar frameworks, Random Forests may not qualify as fully interpretable because their collective decision process cannot be easily summarised without approximation [
5,
11]. Nevertheless, these models remain more amenable to human inspection than deep learning architectures because each constituent tree is human-readable, and global interpretability can be supported through structured visualisation and additive explanation techniques such as TreeSHAP [
15]. Therefore, while tree ensembles are not perfectly transparent in high-stakes contexts, they represent a pragmatic compromise between predictive performance and interpretability, particularly when combined with rigorous documentation and explanation protocols [
5,
15,
24].
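As an illustration of the two inspection modes discussed above, the following sketch pairs exact TreeSHAP attributions for a fitted ensemble with a root-to-leaf rule listing for one constituent tree. It assumes a fitted RandomForestClassifier (model) and a held-out feature table (X_test), as in the earlier sketch.

import shap
from sklearn.tree import export_text

# Exact, additive Shapley attributions for tree ensembles (TreeSHAP).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # one attribution per feature per case

# Global view: mean absolute attribution per feature, rendered as a bar chart.
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Local transparency of a single constituent tree: its splits are readable rules.
print(export_text(model.estimators_[0], feature_names=list(X_test.columns)))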
This research is evaluated on the MovieLens dataset [
25] and validated via a cross-domain pilot on the Amazon Reviews dataset [
26]. Unlike conventional rule-mimicry approaches, the design enforces a rigorous temporal separation between observed features and future items [
5,
27]. This ensures the model forecasts the success of “emerging items”—candidates with low current visibility—rather than simply replicating current popularity thresholds [
6]. The model’s performance is benchmarked by evaluating tree ensemble models, Random Forest (RF) and Gradient Boosting (XGBoost), against a black-box Multi-Layer Perceptron (MLP) [
3,
4] and Logistic Regression [
28]. The results demonstrate superior discrimination capabilities for the glass-box architectures, with the Random Forest and XGBoost models achieving ROC-AUC scores of 0.92 and 0.91, respectively. These models notably outperformed the neural baseline (AUC 0.86) and Logistic Regression (AUC 0.89). A layer of explainability is delivered through exact attribution methods, primarily TreeSHAP [
15], complemented by partial dependence plots and decision-path visualisations, thereby enabling consistent local and global explanations for a robust audit trail [
13,
16].
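The benchmarking protocol can be summarised by the hedged sketch below, in which all models receive the same temporally separated features and are scored on ROC-AUC. The hyperparameters shown are placeholders rather than the tuned values behind the reported results, and X_train/X_test denote the observation-window and evaluation-window splits.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss"),
    "MLP (black box)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# X_train/y_train come from the observation window; X_test/y_test from the later
# evaluation window, preserving the temporal separation described above.
for name, clf in models.items():
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC-AUC = {auc:.2f}")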
We acknowledge that in real-world governance scenarios, algorithmic transparency must be complemented by data integrity mechanisms. To test the predictive transferability of the framework, we conducted a training pilot in a high-friction e-commerce environment using Amazon product data [
26]. While the full multi-layer audit was reserved for the primary case study, this pilot demonstrated that the underlying learning mechanism remains robust even in sparse, review-based domains. As the reliability of policy-aligned features (e.g., rating density, sentiment) depends on authentic user feedback, this glass-box framework is designed to operate downstream of robust opinion spam and fraud detection systems [
29,
30]. Explainability is delivered through exact attribution methods (TreeSHAP) [
15] and local surrogates (LIME) to create a verifiable audit trail [
13,
16]. Additionally, automated fairness audits and technical governance reports are integrated directly into the pipeline, aligning the system with the documentation requirements of algorithmic accountability [
31].
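A minimal sketch of the automated fairness audit is given below: it computes per-group positive-recommendation rates and flags groups whose impact ratio falls below the common four-fifths rule of thumb. The column names and the 0.8 threshold are illustrative assumptions rather than the exact audit configuration.

import pandas as pd

def disparate_impact_report(items: pd.DataFrame,
                            group_col: str = "genre",
                            pred_col: str = "predicted_positive") -> pd.DataFrame:
    # Positive-recommendation rate per group, compared against the best-served
    # group; ratios below 0.8 are flagged for review (four-fifths rule of thumb).
    rates = items.groupby(group_col)[pred_col].mean()
    report = pd.DataFrame({
        "positive_rate": rates,
        "impact_ratio": rates / rates.max(),
    })
    report["flagged"] = report["impact_ratio"] < 0.8
    return report.sort_values("impact_ratio")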
Despite significant advancements, modern recommender systems, particularly those based on deep learning, have become increasingly complex and opaque, making their internal logic inaccessible and difficult to audit [
3,
4]. In response, post hoc explanation techniques such as LIME and SHAP have been widely adopted to provide insights into these black-box models [
13,
14,
15]. However, a growing body of evidence indicates that such methods can produce unstable or unfaithful explanations that do not accurately reflect the model’s true decision-making process [
16,
17,
18]. This limitation renders them inadequate for high-stakes, regulation-sensitive domains like finance, healthcare, legal services, and critical infrastructure, where accountability and verifiable compliance are paramount [
9,
10,
12].
Employing inherently interpretable “glass-box” models has historically been hindered by the perception of a mandatory trade-off between interpretability and predictive accuracy [
5]. Stakeholders have often been forced to choose between high-performing but opaque models and transparent but less accurate ones. This dilemma has left a critical research gap: there is a lack of end-to-end frameworks for building recommender systems that are transparent by design [
32], encode governance policies directly into their learning objective, and can provably match the performance of their black-box counterparts (see
Section 4.1). Consequently, there is a clear need for further research into the accuracy–interpretability trade-off, demonstrating that auditability and high performance can be co-designed as foundational principles rather than treated as competing objectives [
32].
Unlike post hoc explanation strategies applied to opaque models [
32], this work integrates policy-aligned objectives directly into the recommendation task, ensuring governance constraints persist from feature engineering through to explanation [
5]. While this approach shares conceptual similarities with scorecard methods and rule-distillation, its contribution lies in the end-to-end design of an auditable pipeline that combines interpretable tree-based ensembles with exact attribution techniques such as TreeSHAP [
15], stability checks, and bias audits. It is important to clarify that tree ensembles, including Random Forests and gradient-boosted models, are not fully interpretable in a strict regulatory sense, as their aggregated decision logic cannot be easily summarised for compliance with frameworks like the EU GDPR “right to explanation” [
9,
11]. However, they remain more transparent than deep neural networks because each constituent tree is human-readable: its hierarchical structure can be visualised and traced from root to leaf, enabling case-level reasoning [
19,
24]. Global interpretability can be further supported through structured visualisation and additive explanation methods [
15], making these models a pragmatic compromise between predictive performance and auditability in high-stakes domains [
5,
24]. The study provides the following significant contributions:
Design of a policy-forecasting framework: A glass-box pipeline that predicts future compliance with governance rules. We introduce a “Reality Check” to ensure the model learns from latent content and sentiment signals rather than simple threshold imitation [
5,
32]. This mechanism also mitigates cold-start challenges by leveraging engineered features and sentiment cues, enabling robust predictions for emerging items with limited historical data [
33].
Demonstration of superior performance: Empirical evidence comparing the proposed tree ensemble approaches—Random Forest and XGBoost—against Logistic Regression and Neural Networks (MLP). Results show the glass-box models achieve AUC scores of 0.92 and 0.91, respectively, effectively outperforming the black-box baseline (AUC 0.86) without the risk of overfitting observed in the neural network design [
34,
35].
A governance stack for auditing: Going beyond TreeSHAP explanations, the design aims to implement a multi-layered audit workflow including Fairness Audits [
8] (measuring disparate impact [
7] across genres), sensitivity analysis, and automated governance reporting templates to operationalise algorithmic accountability [
16,
19].
Integration of an Interpretable NLP Feature: An NLP sub-model that learns tag sentiment using a supervisory signal extracted from user-behavioural data (ratings) [
20], providing a transparent, domain-specific feature that enhances both performance and interpretability [
21].
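To illustrate the last contribution, the following sketch derives a transparent tag-sentiment score from ratings used as the supervisory signal. The aggregation scheme, support threshold, and column names are assumptions chosen for clarity rather than the exact sub-model.

import pandas as pd

def learn_tag_sentiment(tags: pd.DataFrame, ratings: pd.DataFrame,
                        neutral: float = 3.0, min_support: int = 20) -> pd.Series:
    # Ratings act as the supervisory signal: each tag is scored by the mean
    # deviation of its items' ratings from a neutral midpoint.
    joined = tags.merge(ratings, on="item_id")           # columns: tag, item_id, rating
    scores = joined.groupby("tag")["rating"].agg(["mean", "count"])
    supported = scores[scores["count"] >= min_support]   # suppress rare, noisy tags
    return (supported["mean"] - neutral).rename("tag_sentiment")

def item_tag_sentiment(tags: pd.DataFrame, tag_scores: pd.Series) -> pd.Series:
    # Aggregate per-tag scores into a single, human-readable feature per item.
    return (tags.merge(tag_scores.reset_index(), on="tag")
                .groupby("item_id")["tag_sentiment"].mean())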
The remainder of this paper is structured as follows:
Section 2 provides a review of relevant literature on explainable and trustworthy recommender systems.
Section 3 outlines the methodology, including temporal data partitioning, feature engineering, and model training with governance audits.
Section 4 presents comparative performance results, sensitivity analysis, and interpretability evaluation using SHAP and LIME, along with cross-domain validation.
Section 5 includes a discussion of the findings, methodological contributions, and limitations. Finally,
Section 6 concludes the study by summarising its impact and proposing directions for future research.
2. Related Work
The literature on Explainable AI (XAI) for RS reveals a fragmented landscape that lacks a consensus on what constitutes interpretability or how it should be evaluated. Early analyses outlined core explanation challenges—distinguishing model explanation from outcome explanation and retrofitted inspection from inherently transparent approaches—while highlighting the absence of standardised evaluation protocols [
36]. This lack of rigorous standards complicates method comparison and hinders deployment in compliance-sensitive environments [
37]. These foundational gaps underscore a persistent need for approaches that are inherently interpretable and auditable against clear governance criteria. More recent work frames this challenge within the broader context of AI alignment, arguing that the societal scale of RS necessitates design-time controls for diversity, user agency, and auditability to align system objectives with democratic values [
38].
Early RS architectures achieved strong accuracy with matrix factorisation (MF), but the latent factors that drive predictions provide little semantic insight for users or auditors, which limits their suitability in governance-sensitive deployments [
1]. Subsequent deep learning approaches—such as NCF and attention-based architectures—extended predictive capacity and top-K ranking quality by modelling non-linear user–item interactions and multimodal context; however, the internal reasoning of these models typically remains opaque [
4,
39]. The resulting lack of transparency has direct implications for trust and accountability when recommendations influence behaviour and organisational decisions [
6].
Recent surveys have consolidated the role of deep learning in recommender systems, highlighting both its transformative potential and persistent limitations. Zhou et al. [
4] provide a comprehensive taxonomy of deep learning-based RS, covering content-based, sequential, cross-domain, and social recommendation paradigms. Their review underscores the superior predictive performance of neural architectures such as MLPs, CNNs, RNNs, and attention-based models, while acknowledging critical challenges in interpretability, fairness, and governance readiness. Importantly, the authors call for future research on explainability and compliance mechanisms, reinforcing the need for transparent-by-design alternatives in high-stakes domains. This aligns with our motivation to propose a glass-box framework that addresses these gaps through policy-aligned label engineering and exact attribution methods.
Post hoc explainability methods were introduced to mitigate opacity. Techniques like LIME and SHAP offer local feature attributions that help rationalise individual predictions and have seen widespread use in RS pipelines [
13,
40]. Nevertheless, empirical evidence shows that surrogate explanations and saliency-style rationales can be unstable and, at times, unfaithful to the true model logic, properties that are problematic in high-stakes and policy-sensitive settings [
17,
18]. This tension has motivated a turn toward inherently interpretable or “glass-box” models, which emphasises aligning the modelling structure with human reasoning so that explanation is intrinsic rather than retrofitted [
5].
Within this glass-box trajectory, tree-based learners paired with TreeSHAP have become a pragmatic option. TreeSHAP provides theoretically grounded, exact Shapley values for tree ensembles, enabling consistent local and global explanations while preserving competitive predictive performance [
15]. Still, its underlying assumptions, such as feature independence in its interventional formulation, must be surfaced in audits, as real-world RS features can be correlated [
16,
41]. Concurrently, research into other intrinsically transparent methods has advanced. Recent models have augmented collaborative filtering with genre-aware weighting and information entropy to improve accuracy under sparsity while retaining simple, explainable computations [
42]. Graph-based recommenders have been enhanced to compress side information into probabilistic latent classes that can be re-expanded for human-readable path justifications, achieving substantial training-time savings without sacrificing interpretability [
39].
Text-centric designs provide additional intrinsic explanation modalities. An attention-inspired, language-only recommender has been shown to yield signed, word-level contribution scores that directly support “why-this” narratives across multilingual datasets [
43]. Hybrid architectures that jointly predict ratings and generate natural-language reasons likewise demonstrate gains in both predictive error and explanation quality by aligning a contrastive graph encoder with a Transformer decoder [
44,
45,
46,
47]. While such systems enrich the expressiveness of explanations, they also underscore the importance of faithfulness checks to ensure that generated rationales reflect the actual scoring evidence [
18].
Beyond algorithmic transparency, the field is increasingly focused on aligning RS with broader objectives. A systematic review on value-aware RS classifies methods into post-processing re-ranking, in-objective value formulations, and reinforcement learning for long-term goals like profit or engagement, documenting tensions between offline accuracy proxies and online business impact [
48]. Similarly, recent surveys on fairness synthesise multi-stakeholder concerns and report persistent gaps between accuracy-centric development and equitable outcomes [
7,
8]. From an operational perspective, class imbalance is common in RS classification tasks; while remedies like SMOTE remain effective [
49], they are rarely integrated systematically within explainability frameworks, limiting the assessment of trade-offs [
50]. Finally, requirements-engineering research provides structured models to translate high-level ethics guidance into actionable, stakeholder-specific explanation needs (who requires what, when, and by whom), bridging principles and testable system behaviours [
32].
2.1. Auditability, Ethics, and Algorithmic Accountability
Recent scholarship emphasises that algorithmic accountability extends beyond technical interpretability to include mechanisms for auditing, oversight, and redress. Foundational work by Diakopoulos [
51] and Kroll et al. [
52] frames accountability as a socio-technical construct requiring transparency and contestability. Auditing practices for AI systems have been explored by Raji et al. [
53] and Madaio et al. [
54], who advocate for structured evaluation protocols and documentation artefacts such as Model Cards [
31] and Datasheets for Datasets [
55]. Risk management frameworks, including NIST AI RMF 1.0 [
56] and ISO/IEC 23894 [
57], provide operational guidance for embedding governance into system design. Existing scholarship highlights the need for integrating policy-aligned objectives with exact attribution methods to enable audit-ready decision trails in compliance-sensitive environments [
5]. While this study addresses foundational aspects of transparency and fairness, future research will focus on operationalising these mechanisms within a fully modular glass-box framework. To support this trajectory, a functional template for a Model Governance Card has been developed as part of this work, designed to document policy thresholds, performance metrics, fairness audits, and explainability artefacts. This template can be instantiated on any appropriate dataset to facilitate reproducibility and regulatory alignment [
31].
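As an indication of what such a template could contain, the following hedged sketch represents a Model Governance Card as a small machine-readable record bundling policy thresholds, performance metrics, fairness results, and explainability artefacts. The field names and example values are illustrative assumptions, not the finalised template.

from dataclasses import asdict, dataclass, field
import json

@dataclass
class ModelGovernanceCard:
    model_name: str
    policy_thresholds: dict        # e.g., {"min_ratings": 50, "min_avg_rating": 3.8}
    performance: dict              # e.g., {"roc_auc": 0.92, "recall": 0.81}
    fairness_audit: dict           # e.g., per-genre disparate impact ratios
    explainability_artefacts: list = field(default_factory=list)  # report paths/URIs

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

card = ModelGovernanceCard(
    model_name="glass-box-rf-v1",
    policy_thresholds={"min_ratings": 50, "min_avg_rating": 3.8},
    performance={"roc_auc": 0.92},
    fairness_audit={"flagged_genres": []},
    explainability_artefacts=["reports/treeshap_summary.png"],
)
print(card.to_json())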
2.2. Data Integrity and Opinion Spam
A critical dimension of trustworthiness in recommender systems, particularly in governance-sensitive scenarios, is the mitigation of opinion spam and review manipulation. As noted by Jindal and Liu [
29], fake reviews and opinion spam distort the rating distributions and sentiment signals that policy-aligned models rely upon. While technical XAI focuses on verifying model logic, governance requires verifying the integrity of the input data itself. Mukherjee et al. [
30] demonstrate that interpretability is often essential for exposing anomalous reviewer behaviour patterns. Although this study focuses on model transparency, we acknowledge that in real-world deployment, glass-box architectures must be coupled with robust spam detection to prevent “dependence on data quality” vulnerabilities in the governance layer [
41,
55].
In summary, the literature spans accuracy-driven baselines such as matrix factorisation and deep learning-based recommender systems [
1,
4], influential post hoc explanation techniques with recognised limitations [
17,
18], and promising glass-box approaches leveraging TreeSHAP for exact attribution [
15,
16], alongside growing research on intrinsic transparency, fairness, and value alignment. However, a clear gap persists for audit-ready, end-to-end frameworks that embed governance constraints from the outset. This work aims to address that gap by using a glass-box architecture that integrates policy constraints in the learning objective, employs tree-based ML models compatible with exact attribution methods, and incorporates transparent strategies for handling class imbalance.
A comparative summary of related approaches and the proposed framework is presented in
Table 1.
6. Conclusions
This study introduced a glass-box recommender architecture that prioritises both predictive performance and auditability, addressing a critical gap where accuracy often overshadows interpretability [
5,
36]. By leveraging interpretable temporal features [
27,
58] and a tree ensemble backbone comprising Random Forest and XGBoost, the framework ensures audit-grade transparency, making every decision verifiable. Unlike black-box models that rely on unstable, post hoc proxies, the proposed approach emphasises faithfulness and operational utility without sacrificing accuracy [
17]. This commitment was validated through local-level audits, which revealed discrepancies between exact TreeSHAP attributions and approximate LIME explanations—underscoring the risk of unfaithful rationales from post hoc surrogate methods and reinforcing the necessity of glass-box designs for governance-sensitive deployments [
68,
69].
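The local-level audit referred to above can be sketched as follows: for a single case, exact TreeSHAP attributions are placed next to the weights of LIME's local surrogate so that divergent rationales can be flagged. The snippet assumes a fitted gradient-boosted classifier (model), training features (X_train), and one test row (x); it is an illustrative sketch rather than the exact audit procedure.

import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer

# Exact attributions derived from the tree structure (TreeSHAP).
tree_explainer = shap.TreeExplainer(model)
shap_vals = tree_explainer.shap_values(x.to_frame().T)        # shape: (1, n_features)
shap_attr = sorted(zip(x.index, np.ravel(shap_vals)),
                   key=lambda kv: -abs(kv[1]))

# Approximate attributions from LIME's local linear surrogate.
lime_explainer = LimeTabularExplainer(X_train.values,
                                      feature_names=list(X_train.columns),
                                      mode="classification")
lime_exp = lime_explainer.explain_instance(x.values, model.predict_proba,
                                           num_features=len(x))

print("TreeSHAP (exact):", shap_attr[:5])
print("LIME (surrogate):", lime_exp.as_list()[:5])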
While glass-box architectures enhance auditability and fairness, they introduce computational overhead and scalability challenges compared to black-box models. The integration of TreeSHAP for exact attribution and multi-layered governance audits increases resource requirements, and real-time deployment may require latency profiling and optimisation strategies to maintain user experience [
3,
5,
24,
41,
59]. These trade-offs highlight the ethical and practical tension between transparency and efficiency, which future work will address through adaptive optimisation and approximate attribution methods.
A key finding of this work is the empirical demonstration that strong predictive performance can be achieved without sacrificing transparency. The glass-box architecture demonstrated superior discrimination, with the Random Forest achieving a ROC-AUC of 0.92 and XGBoost reaching 0.91. Both models notably outperformed the neural baseline, which attained an AUC of 0.86, as well as the linear benchmark. Furthermore, the error analysis revealed distinct operational roles: the Random Forest prioritised broader discovery with higher recall, while XGBoost minimised false positives, offering a high-precision alternative (
Section 4.1). This result challenges the long-held assumption of a mandatory accuracy–interpretability trade-off in structured prediction tasks [
5,
36]. Cross-domain validation on the Amazon dataset [
26] further confirmed the framework’s robustness, identifying emerging hits with an AUC of 0.89. While this represents a slight performance attenuation attributed to the higher data sparsity of the review-based environment, it validates that the governance logic remains transferable across distinct domains.
Future research will focus on strengthening the framework against adversarial threats and expanding governance to align with evolving regulations. The results from the high-friction e-commerce pilot emphasise the need to integrate upstream opinion-spam detection to sanitise user-generated signals before they reach the governance layer, thereby mitigating “shilling” attacks and preserving signal authenticity [
29,
30]. To address allocative disparities observed between genres, future iterations will move beyond post hoc auditing to incorporate dynamic feedback loops and online learning for real-time adaptability [
41], while embedding multi-stakeholder fairness constraints directly into the optimisation objective [
8]. These enhancements will operationalise Trustworthy AI principles by ensuring that fairness, accountability, and adaptability are intrinsic properties of the system architecture rather than retrospective add-ons.