1. Introduction
As education continues to evolve in response to new technologies and learning environments, the ability of students to adapt has become increasingly important [1,2,3,4]. Whether navigating digital classrooms, adjusting to varied instructional styles, or managing changing academic expectations, adaptability plays a vital role in student success. To support this need, adaptive learning systems have emerged as critical tools for personalizing education in dynamic settings [5,6,7,8].
Machine learning (ML) offers promising capabilities for identifying patterns in student behavior and predicting adaptability. These models can help educators proactively support students. However, their complexity often makes them difficult to interpret, particularly in education, where understanding the “why” behind a prediction is just as important as the prediction itself [9,10]. Educators and decision makers need transparency to make informed decisions and promote trust in automated systems [1,8,11,12].
Explainable Artificial Intelligence (XAI) methods aim to address this challenge by providing insight into how ML models make decisions. One widely used approach, Local Interpretable Model-agnostic Explanations (LIME), helps to explain individual predictions by approximating the model locally using simpler interpretable models [13,14,15]. While LIME is effective at the instance level, it falls short in capturing broader patterns that span groups or institutional contexts [14,16].
To overcome this limitation, this study introduces Hierarchical Local Interpretable Model-agnostic Explanations (H-LIME). This novel framework extends LIME by aggregating local explanations across multiple levels of a data hierarchy. With H-LIME, it becomes possible to understand how feature importance varies not only for individuals, but also across schools, locations, and education levels. The method supports:
Multi-level insights that go beyond the individual to reveal trends across institutions and regions;
Context-sensitive explanations aligned with the information needs of different stakeholders;
Actionable findings that can inform policy, resource allocation, and personalized interventions.
Using a real-world student adaptability dataset with institutional and demographic hierarchy, H-LIME reveals patterns that would remain hidden with instance-level methods alone. For example, while class duration may consistently affect students across contexts, the influence of financial background or internet connectivity may depend on specific subgroups. Our key contributions are summarized as follows:
Methodological Framework: We introduce H-LIME, a model-agnostic framework that aggregates local explanations across user-defined hierarchies (Institution→Location→Education Level). Crucially, we incorporate a new group-level stability metric ($\sigma_{G,\ell}$) to detect feature polarization, distinguishing between universally irrelevant features and those with high intra-group variance.
Empirical Evaluation: We perform a rigorous comparative analysis of explanation stability, demonstrating that H-LIME explanations derived from Random Forest models are approximately 4.5 times more stable than those from Decision Trees, thereby establishing a standard for trustworthy educational insights.
Domain-Specific Discovery: We uncover hierarchical dependencies in student adaptability, revealing that while Class Duration is a consistent global predictor, features like Financial Condition and Network Type exhibit significant variation across rural versus urban subgroups.
Practical Utility: We provide a practical deployment scenario illustrating how H-LIME’s multi-level insights can guide differentiated policy interventions, such as prioritizing digital infrastructure in rural regions versus curriculum adjustments in urban centers, that would be overlooked by standard global interpretation methods.
2. Related Works
Explainable Artificial Intelligence (XAI) addresses the growing need to make machine learning (ML) models more transparent, especially in high-stakes domains such as education, where trust and accountability are critical [9,17]. As models increase in complexity, understanding their internal logic becomes challenging. Techniques like LIME attempt to solve this by approximating complex models locally using interpretable surrogates [14].
2.1. Hierarchical Interpretability
Although LIME and SHAP provide useful explanations at the instance and global levels, they overlook hierarchical relationships often present in structured datasets, such as education systems. These methods fail to offer aggregated insights across levels like school type, location, or education stage [14,16]. Hierarchical interpretability bridges this gap by generating explanations at multiple levels of abstraction. Apley and Zhu’s Accumulated Local Effects (ALEs) [18] enable global analysis while mitigating feature correlation effects, but do not support localized or group-specific interpretations. This limitation is significant in education, where student adaptability is influenced by both individual and institutional factors. Prior work on data aggregation and fairness-aware modeling [19,20,21] has demonstrated the importance of summarizing data without losing key subgroup insights. Recent studies have further demonstrated the versatility of data-driven frameworks in higher education, ranging from sustainability benchmarking in Saudi Arabian institutions [22] and assessing online learning experiences [23] to analyzing the impact of demographic factors on organizational commitment [24].
H-LIME addresses this by extending LIME to support multilevel aggregation. It begins with instance-level explanations and systematically groups them by hierarchical attributes (institution type, location), enabling educators and policymakers to detect patterns that are not visible at the individual level. This hierarchical structure aligns with calls for interpretable ML frameworks that balance local detail with broader applicability [25,26].
Hierarchical Explainability in Other Domains
While hierarchical aggregation is novel in educational data mining, similar concepts have emerged in other high-stakes domains. For instance, Hierarchical Shapley values [27] have been utilized in high-dimensional biological data to attribute importance to correlated feature groups (gene clusters) rather than individual SNPs, thereby improving explanation stability. Similarly, in natural language processing, Hierarchical Attention Networks (HANs) [28] have been widely adopted to provide multi-level explanations, identifying which specific words, sentences, and document sections drive a classification decision. In the medical imaging domain, frameworks like MIMIC-EYE [29] layer clinical features (X-ray regions and ECG signals) to build interpretable clinical decision support systems. However, these methods typically rely on inherent feature hierarchies (pixel-to-object or word-to-sentence) or complex model-specific architectures. H-LIME distinguishes itself by offering a model-agnostic aggregation framework specifically designed for the user-defined, categorical nested structures (Institution→Location) that are prevalent in social and educational policy research.
2.2. XAI in Educational Applications
The integration of Machine Learning and Explainable AI (XAI) has gained traction across high-stakes domains beyond education. For instance, recent frameworks have successfully applied XAI to optimize renewable energy systems [29,30] and enhance clinical decision support through multi-modal deep learning [31], highlighting the growing demand for model transparency in complex decision-making environments. XAI in education has expanded as institutions adopt data-driven tools to support student learning and adaptation. Models are increasingly being applied to predict academic performance, assess adaptability, and guide personalized interventions. However, for these predictions to be useful, they must be interpretable by human stakeholders [32,33,34,35,36,37]. XAI methods such as LIME and SHAP have been widely used to analyze educational data. Beyond online learning and pandemic contexts, researchers have applied LIME to interpret models predicting student engagement, dropout risk, and learning outcomes across various settings [37,38,39,40,41]. These studies highlight important predictors like internet access, financial stability, and self-directed learning behavior.
Although LIME provides helpful local explanations, its focus remains at the instance level. However, most educational decisions are made at higher levels: classrooms, schools, or regions. Educators and administrators often require group-level insights to inform systemic policy. Previous studies have emphasized this need for hierarchical analysis in educational XAI [40,42]. Some early efforts have explored hierarchical interpretability, but these are often tied to specific domains or model types, limiting their adaptability to education [43]. As a response, H-LIME introduces a model-agnostic framework capable of aggregating local explanations into meaningful insights across multiple levels of data hierarchy. This bridges the gap between personalized feedback and institutional decision-making, offering a more holistic understanding of student adaptability.
2.3. Gaps and Opportunities in Educational XAI
Despite progress in applying XAI techniques to educational data, significant limitations remain. Most existing methods are designed for flat data structures and offer explanations only at the individual or model-wide level. As a result, they do not capture the hierarchical nature of educational systems, where insights are needed at multiple levels: students, classes, schools, and regions [14,40,42,44]. While fairness-aware aggregation and group-based modeling have been explored in other domains [20,45], such techniques have not been widely adapted for educational contexts. Moreover, hierarchical explainability tools that do exist are often tied to specific model architectures or constrained to particular use cases, limiting their general applicability [43].
In educational decision-making, the ability to interpret model predictions across levels is essential for targeted interventions and resource allocation. Without hierarchical insights, stakeholders risk overlooking systemic issues or misinterpreting individual-level patterns. This gap highlights the need for general-purpose, model-agnostic frameworks that can bridge local and aggregated explanations. H-LIME responds to this challenge by enabling interpretability across hierarchical structures, offering stakeholders a more comprehensive view of feature importance at both micro and macro levels.
2.4. Comparison of XAI Techniques
Interpretable machine learning techniques such as LIME, SHAP, and ALE offer varying advantages. LIME approximates local model behavior by perturbing inputs around an instance and fitting a simple interpretable model, usually linear [14]. It provides intuitive, instance-specific insights but lacks mechanisms for generalizing across groups or hierarchical levels. SHAP, based on Shapley values, supports both local and global explanations and is valued for its theoretical consistency [46,47]. However, it is computationally intensive and does not natively support aggregation across user-defined hierarchies. Similarly, ALE focuses on global effects and addresses feature correlations [18], but does not offer instance-specific or hierarchical explanations. Additionally, standard tree-based feature importance metrics (Gini impurity or Information Gain in XGBoost) provide a global summary of feature utility but fail to capture the directionality of influence (positive vs. negative) or how these effects vary across specific subgroups.
H-LIME addresses these limitations by extending LIME to generate explanations at individual, subgroup, and global levels within structured data hierarchies (e.g., institution type, location, education level). It retains LIME’s model-agnostic nature while enabling context-aware aggregation. Beyond LIME and SHAP, recent XAI methods such as TreeSHAP [27], ProtoDash [48], and Anchors [49] provide model-specific or prototype-based explanations, but typically operate on flat data and do not support hierarchical reasoning. A summary of these methods and their relative strengths is provided in Table 1.
3. Methodology
This section outlines the general methodology used to analyze and improve student adaptability using machine learning models and hierarchical explainability techniques.
Figure 1 presents the structure of the proposed H-LIME framework.
3.1. Dataset Preparation and Hierarchical Structure
This study employed a publicly available dataset from Kaggle [50], containing 1205 instances and 14 attributes capturing student demographics, socio-economic conditions, and learning environments. Features include Gender, Age, Education Level, Institution Type, IT Student status, Location, Load-shedding, Financial Condition, Internet Type, Network Type, Class Duration, LMS usage, and Device Type. The target variable, Adaptivity Level, is categorized into Low, Moderate, and High. Table 2 provides a comprehensive summary of these features, including valid counts, unique categories, and frequency statistics for the most common values.
To support hierarchical interpretability, the dataset was organized into three levels:
Institution Type: Government vs. Non-Government
Location: Urban vs. Rural
Educational Level: Primary (School), Secondary (College), and Tertiary (University)
These levels represent systemic, geographic, and curricular dimensions of the educational landscape (see Figure 2). They enable multilevel insight into which features influence adaptability at the individual and group levels.
Before training and interpretation, the dataset was subjected to several preprocessing steps. Categorical variables such as Gender, Internet Type, and Network Type were encoded using one-hot encoding, and numerical features such as age and class duration were normalized to ensure uniform feature scaling. To address class imbalance, particularly among underrepresented groups like rural tertiary students, Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training set. This preserved the original distribution in validation and test sets while ensuring adequate representation of minority classes during training.
The dataset was then split into training (70%), validation (15%), and test (15%) subsets using hierarchical stratified sampling to maintain proportional representation across all levels (Institution Type, Location, and Education Level). Preprocessing transformations were learned from the training set and applied consistently across all splits to avoid data leakage, ensuring fair and valid performance evaluation during model training and H-LIME explanation.
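To make these preprocessing steps concrete, the following minimal sketch uses pandas, scikit-learn, and imbalanced-learn; the file name, composite stratification key, and seeds are illustrative assumptions rather than the study’s exact code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

df = pd.read_csv("students_adaptability.csv")              # hypothetical file name
X = pd.get_dummies(df.drop(columns=["Adaptivity Level"]))  # one-hot encode categoricals
y = df["Adaptivity Level"]

# Composite key approximating hierarchical stratification across the three levels;
# very sparse leaf groups (see Table 3) may need merging before stratifying.
strata = df["Institution Type"] + "|" + df["Location"] + "|" + df["Education Level"]

# 70/15/15 split, stratified on the hierarchy key.
X_train, X_tmp, y_train, y_tmp, s_train, s_tmp = train_test_split(
    X, y, strata, test_size=0.30, stratify=strata, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=s_tmp, random_state=42)

# SMOTE on the training split only, so validation/test keep the original distribution.
# Any numeric scaling would likewise be fitted on the training split alone.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```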
To guarantee the statistical integrity of our aggregated explanations, we scrutinized the data distribution across the hierarchy’s most granular tiers (Institution→Location→Education Level).
Table 3 delineates the instance counts for the leaf nodes within the test set. Although the majority of subgroups exhibit robust sample densities, we acknowledge a pronounced sparsity in specific intersections, most notably at the tertiary level within rural jurisdictions (Non-Government Rural Universities). This scarcity provides a strong rationale for incorporating the standard deviation metric into the H-LIME framework, as it serves as a critical diagnostic for identifying instability in underrepresented strata.
3.2. H-LIME Framework Overview
The Hierarchical Local Interpretable Model-agnostic Explanations (H-LIME) framework extends traditional LIME by introducing a multilevel interpretability mechanism. While LIME provides local explanations for individual predictions, H-LIME aggregates these explanations across predefined hierarchical levels to generate both micro- and macro-level insights. The framework operates in three main stages:
Local Explanation Generation: For each test instance, LIME is applied to generate feature importance scores using a local surrogate model (typically a sparse linear regressor) around the prediction neighborhood.
Hierarchical Aggregation: Instances are grouped according to the defined hierarchical levels (Institution Type, Location, and Educational Level). Feature importances from LIME are averaged within each group to obtain subgroup-level interpretability. This allows for the detection of consistent patterns that would not be evident from single-instance explanations alone.
Global Trend Extraction: Aggregated group-level explanations are further summarized across the full hierarchy to identify features with persistent influence on predictions across the entire population. These global patterns help inform broader educational policy or curriculum interventions.
This multiscale approach enables stakeholders to trace explanations from the individual level (e.g., a student’s prediction) to institutional or regional levels (e.g., all students in rural colleges). It also addresses the limitations of flat interpretability methods, which often overlook structural context in grouped data.
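The three stages can be expressed compactly in code. The sketch below assumes a user-supplied explain_fn that maps one instance to a dense feature-weight vector, and a meta frame holding each instance’s hierarchy attributes; concrete LIME and aggregation details follow in Section 3.5.

```python
import pandas as pd

def run_hlime(explain_fn, X_test, meta, levels):
    # Stage 1: one local explanation (dense weight vector) per test instance.
    phi = pd.DataFrame([explain_fn(row) for _, row in X_test.iterrows()],
                       index=X_test.index)
    # Stage 2: mean feature importance per group at each hierarchical level
    # (meta must share X_test's index for the group-by alignment to hold).
    group_phi = {lvl: phi.groupby(meta[lvl]).mean() for lvl in levels}
    # Stage 3: global trend, averaged over all groups across all levels.
    global_phi = pd.concat(group_phi.values()).mean(axis=0)
    return phi, group_phi, global_phi
```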
Figure 3 illustrates the overall H-LIME pipeline, showing how LIME-generated explanations flow into subgroup and hierarchical aggregations. This allows for interpretable outcomes at all levels of the educational structure.
3.3. Model Training
The model training process is crucial for developing robust and accurate machine learning models capable of predicting student adaptability levels. This subsection outlines the steps involved in the selection, training, and evaluation of the various machine-learning models used in this study. Various machine learning models were selected to ensure a comprehensive analysis of their predictive capabilities and interpretability. The models chosen for this study included Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), Neural Network (NN), and AdaBoost. Each model was trained using a preprocessed training set. The training process involved fitting the models to the training data and tuning the hyperparameters using a validation set to optimize the performance.
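A minimal sketch of this training loop, assuming the balanced split from Section 3.1; the hyperparameter values shown are illustrative defaults, not the tuned settings.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

le = LabelEncoder()                       # XGBoost expects integer class labels
y_tr = le.fit_transform(y_train_bal)
y_v, y_te = le.transform(y_val), le.transform(y_test)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "GB": GradientBoostingClassifier(random_state=42),
    "XGB": XGBClassifier(eval_metric="mlogloss", random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=42),
    "NN": MLPClassifier(max_iter=500, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}
for name, clf in models.items():
    clf.fit(X_train_bal, y_tr)            # fit on the SMOTE-balanced split
    print(name, clf.score(X_val, y_v))    # validation accuracy guides tuning
```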
3.4. Model Performance Metrics
To assess the predictive capability of the machine learning models, we utilized standard classification metrics: Precision, Recall, and F1-score. These metrics provide a robust evaluation of each model’s ability to correctly classify student adaptability levels, balancing the need for accuracy with the cost of misclassification.
3.4.1. Precision
Precision measures the proportion of correctly identified positive instances among all instances predicted as positive:
\[ \text{Precision} = \frac{TP}{TP + FP} \]
where:
$TP$ (True Positive): The number of correctly predicted instances.
$FP$ (False Positive): The number of instances incorrectly predicted.
In the educational context, high precision ensures that when the model identifies a student as “High Adaptability”, the prediction is reliable, minimizing false positives that could lead to incorrect assumptions about a student’s independence.
3.4.2. Recall
Recall evaluates the model’s ability to identify all relevant positive instances:
\[ \text{Recall} = \frac{TP}{TP + FN} \]
where $FN$ (False Negative) represents adaptable students incorrectly classified as non-adaptable. High recall is crucial to ensure that the system does not overlook students who possess high adaptability, ensuring they are correctly recognized.
3.4.3. F1-Score
The F1-score is the harmonic mean of Precision and Recall:
\[ F1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
The F1-score balances the trade-off between precision and recall. It is particularly useful in this study to ensure the model performs well across all adaptability levels without being biased by class imbalances.
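All three metrics can be computed with scikit-learn. The snippet below assumes the label-encoded splits from the training sketch and uses macro averaging so that each adaptability level contributes equally:

```python
from sklearn.metrics import precision_recall_fscore_support

y_pred = models["RF"].predict(X_test)     # Random Forest as the example model
precision, recall, f1, _ = precision_recall_fscore_support(
    y_te, y_pred, average="macro")        # macro: equal weight per class
print(f"Precision={precision:.3f}  Recall={recall:.3f}  F1={f1:.3f}")
```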
3.4.4. Log-Loss Analysis and Model Selection
While precision and recall measure the model’s ability to make correct hard classifications, Log-Loss (Logarithmic Loss) evaluates the uncertainty of the model’s predictions. In this study, we employ the multi-class formulation of Log-Loss to account for the three adaptability levels (Low, Moderate, High):
\[ \text{Log-Loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}) \]
where $N$ is the total number of student observations, $M$ is the number of classes ($M = 3$), $y_{ij}$ is the binary indicator (0 or 1) denoting whether class $j$ is the correct label for observation $i$, and $p_{ij}$ is the predicted probability that observation $i$ belongs to class $j$. A lower Log-Loss indicates that the model’s predicted probabilities are closer to the actual class labels, reflecting better probability calibration, which is essential for the stability of post-hoc explanation methods like LIME.
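The equation above translates directly into NumPy. The sketch below clips probabilities for numerical stability and is equivalent to sklearn.metrics.log_loss:

```python
import numpy as np

def multiclass_log_loss(y_true_idx, proba, eps=1e-15):
    # y_ij is 1 only for the true class, so the double sum reduces to picking
    # each row's predicted probability for its true label.
    proba = np.clip(proba, eps, 1.0 - eps)            # avoid log(0)
    return -np.mean(np.log(proba[np.arange(len(y_true_idx)), y_true_idx]))

proba_val = models["RF"].predict_proba(X_val)         # N x M probability matrix
print("Validation Log-Loss:", multiclass_log_loss(y_v, proba_val))
```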
3.5. LIME and H-LIME
This section delves into the Local Interpretable Model-agnostic Explanations technique and its hierarchical extension, Hierarchical Local Interpretable Model-agnostic Explanations. These methods were employed to provide interpretability to the machine learning models used in this study, with a specific focus on the Random Forest model, which was selected for its superior explanation stability.
3.5.1. LIME
LIME explains the individual predictions of any classifier by learning an interpretable model locally around the prediction. The key idea is to perturb the input data and observe the changes in the predictions to build a simple, interpretable model that approximates the complex model locally [14,51], as summarized in Algorithm 1.
Algorithm 1: LIME
1. Input: An instance $x$ to be explained, black-box model $f$, number of perturbations $N$, and interpretable model $g$ (e.g., linear model).
2. Generate Perturbations: Create $N$ perturbed samples $z_i$ around $x$.
3. Predict: Use the black-box model $f$ to predict outcomes $f(z_i)$ for the perturbed samples.
4. Weight Perturbations: Assign weights $\pi_x(z_i)$ to the perturbed samples based on their proximity to $x$. A common weighting function is the exponential kernel:
\[ \pi_x(z) = \exp\left( -\frac{D(x, z)^2}{\sigma^2} \right) \]
where $D(x, z)$ is the distance between $x$ and $z$, and $\sigma$ is a kernel width parameter.
5. Fit Interpretable Model: Fit the interpretable model $g$ to the weighted perturbed samples $\{(z_i, f(z_i), \pi_x(z_i))\}_{i=1}^{N}$.
6. Output: The local explanation from the fitted interpretable model $g$.
For a given instance $x$, LIME solves the following optimization problem:
\[ \phi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g) \]
where:
$\phi(x)$: The local explanation for the instance $x$, representing the interpretable model’s feature importance values.
$g$: A candidate interpretable model selected from a class of interpretable models ($G$).
$\mathcal{L}(f, g, \pi_x)$: A loss function measuring the fidelity of the interpretable model $g$ in approximating the prediction of the original model $f$.
$\pi_x$: A locality-aware weighting function that gives higher importance to perturbed samples closer to $x$.
$\Omega(g)$: A complexity penalty for the interpretable model $g$, ensuring simplicity and interpretability.
This equation minimizes the weighted loss between the predictions of the complex model $f$ and the interpretable model $g$, with locality emphasized using $\pi_x$. The complexity term $\Omega(g)$ prevents overfitting by ensuring the interpretable model remains simple. This trade-off between fidelity and simplicity makes LIME ideal for generating interpretable feature importance values for each instance.
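In practice, Algorithm 1 is available through the Python lime package. The sketch below applies it to the trained Random Forest from Section 3.3; the perturbation count and sparsity settings are illustrative:

```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train_bal.to_numpy(),
    feature_names=list(X_train_bal.columns),
    class_names=list(le.classes_),        # keep class order consistent with encoding
    mode="classification")

x = X_test.iloc[0].to_numpy()
exp = explainer.explain_instance(
    x, models["RF"].predict_proba,
    num_features=10,                      # sparsity of the surrogate (the Omega(g) term)
    num_samples=5000,                     # N perturbations around x
    top_labels=1)                         # explain the model's predicted class
label = exp.available_labels()[0]
print(exp.as_list(label=label))           # (feature condition, weight) pairs
```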
3.5.2. H-LIME
H-LIME extends LIME by aggregating local explanations at multiple levels of a predefined hierarchy, providing insights that are both instance- and group-specific. This hierarchical approach allows for a more comprehensive understanding of model predictions across different levels of the dataset hierarchy. Standard LIME explanations often rely on contrastive logic (Why Class A and not Class B?). To resolve multiclass confusion, H-LIME explicitly defines the explanation vector for a specific instance $x$ as $\phi_c(x)$, containing only the coefficients corresponding to the predicted class of interest $c$ (High Adaptability). Unlike standard outputs, which might conflate class probabilities, this filtering step ensures that the subsequent aggregation represents the direct contribution of features to the specific outcome, rather than a rejection of alternative classes.
Since LIME generates sparse explanations in which different instances may have different top features selected, we enforce a consistent feature space for aggregation. For any feature $j$ that is not selected in the local explanation of an instance $x$, we assign a coefficient $\phi_j(x) = 0$. This ensures that the global aggregation reflects the population-wide prevalence of a feature’s influence, effectively penalizing features that appear only sporadically. The complete hierarchical aggregation procedure is detailed in Algorithm 2.
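Before turning to Algorithm 2, the following sketch shows one way to implement this class-filtering and zero-fill step on top of the lime package; the helper name is ours, not the study’s:

```python
import numpy as np

def dense_class_explanation(explainer, model, x, n_features, num_features=10):
    """phi_c(x): surrogate coefficients for the predicted class c, zero-filled
    so every instance shares one feature space (phi_j(x) = 0 if unselected)."""
    c = int(np.argmax(model.predict_proba(x.reshape(1, -1))[0]))
    exp = explainer.explain_instance(x, model.predict_proba,
                                     labels=(c,), num_features=num_features)
    weights = dict(exp.as_map()[c])       # {feature index: coefficient}
    return np.array([weights.get(j, 0.0) for j in range(n_features)])
```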
Algorithm 2: H-LIME
1. Input: Dataset $X$ with hierarchical structure, black-box model $f$, number of perturbations $N$, interpretable model $g$, and hierarchical levels $\ell \in \{\ell_1, \ldots, \ell_L\}$.
2. Local Explanations with LIME: For each instance $x \in X$, apply LIME to generate local explanations $\phi(x)$.
3. Aggregation at Each Level: For each hierarchical level $\ell$: group instances based on their values at level $\ell$, then aggregate local explanations to obtain a group-level explanation $\phi_{G,\ell}$.
4. Global Explanation: Aggregate group-level explanations to produce a global explanation $\Phi$ that provides an overall view of the factors influencing the model predictions.
5. Output: Hierarchical explanations offering insights at different levels of the hierarchy, and a global explanation $\Phi$ summarizing feature importance across the entire dataset.
For a given hierarchical level $\ell$ and group $G$ at that level, the group-level explanation $\phi_{G,\ell}$ can be defined as:
\[ \phi_{G,\ell} = \frac{1}{|G|} \sum_{x \in G} \phi(x) \]
where:
$\phi_{G,\ell}$: Group-level aggregated feature importance for group $G$ at hierarchical level $\ell$.
$G$: A group of instances within the dataset that share common characteristics, as defined by the hierarchical level $\ell$.
$\ell$: A specific hierarchical level in the dataset.
$|G|$: The number of instances in group $G$.
$\phi(x)$: The local explanation for instance $x$.
$x \in G$: Denotes that the instance $x$ belongs to group $G$.
This equation aggregates the local feature importance values across all instances within a group G, taking the mean to ensure group-level representativeness. The choice of mean aggregation is justified because it preserves the overall contribution of features within the group while mitigating the effect of outliers. Alternative aggregation methods (median or weighted mean) were considered, but may introduce biases, particularly in hierarchical datasets with imbalanced group sizes.
To capture the stability of feature influence within a group and to address the risk of feature cancellation (where positive and negative contributions offset each other), we compute the group-level standard deviation, denoted as $\sigma_{G,\ell}$:
\[ \sigma_{G,\ell} = \sqrt{\frac{1}{|G|} \sum_{x \in G} \left( \phi(x) - \phi_{G,\ell} \right)^2} \]
A low value of $\sigma_{G,\ell}$ indicates high agreement among instances within the group, whereas a high value suggests polarization, implying that the feature exerts differing influences across subgroup members.
The global explanation $\Phi$ is then derived by aggregating the group-level explanations, as follows:
\[ \Phi = \frac{1}{\sum_{\ell} |\mathcal{G}_{\ell}|} \sum_{\ell} \sum_{G \in \mathcal{G}_{\ell}} \phi_{G,\ell} \]
where:
$\Phi$: Global explanation summarizing feature importance across all hierarchical levels.
$G \in \mathcal{G}_{\ell}$: Denotes that the group $G$ belongs to the set of groups $\mathcal{G}_{\ell}$ at hierarchical level $\ell$.
$\sum_{\ell} |\mathcal{G}_{\ell}|$: Sum of the sizes (number of groups) across all hierarchical levels.
This equation calculates the weighted mean of the feature importance values across all the hierarchical levels and their constituent groups. The normalization factor ensures that the levels with more groups do not dominate the global explanation. By combining insights across levels, this equation captures the hierarchical structure of the data, providing actionable, system-wide explanations. Hierarchical aggregation design is critical for datasets in which the feature importance varies significantly across levels.
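These aggregation equations map directly onto pandas group-by operations. The sketch below reuses the densification helper from Section 3.5.2 and assumes a meta frame carrying each test instance’s hierarchy attributes:

```python
import pandas as pd

levels = ["Institution Type", "Location", "Education Level"]
meta = df.loc[X_test.index, levels]       # hierarchy attributes per test instance

phi = pd.DataFrame(
    [dense_class_explanation(explainer, models["RF"], row, X_test.shape[1])
     for row in X_test.to_numpy()],
    index=X_test.index, columns=X_test.columns)

def aggregate_hierarchy(phi, meta, levels):
    group_means, group_stds = {}, {}
    for lvl in levels:
        grouped = phi.groupby(meta[lvl])
        group_means[lvl] = grouped.mean()       # phi_{G,l}: group-level importance
        group_stds[lvl] = grouped.std(ddof=0)   # sigma_{G,l}: polarization signal
    # Phi: mean over all groups at all levels, i.e., normalized by total group count.
    global_phi = pd.concat(group_means.values()).mean(axis=0)
    return group_means, group_stds, global_phi

group_means, group_stds, global_phi = aggregate_hierarchy(phi, meta, levels)
```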
Table 4 provides a detailed description of the variables and parameters used in the proposed framework. It includes definitions for key metrics such as true positives (TP), false positives (FP), and false negatives (FN), as well as mathematical notations like likelihood functions, local explanations, and hierarchical group-level explanations, ensuring clarity and consistency in the methodology.
3.5.3. Steps for Implementing H-LIME
The equations described in the H-LIME framework were implemented as part of a structured computational pipeline to generate local, group-level, and global explanations. This structured approach allows for seamless integration of various equations within the H-LIME framework. Their roles within the framework are as follows:
Local Explanations ($\phi(x)$): LIME was used to generate the feature importance values for each instance. These values were used for further analysis. The Python LIME library was utilized to compute the instance-specific feature importances, which were then standardized for compatibility with subsequent computations.
Group-Level Aggregates ($\phi_{G,\ell}$): The aggregation equation was directly applied to compute the mean feature importance values for all instances $x$ within a group $G$. This was performed for each hierarchical level $\ell$. NumPy was used to efficiently perform the aggregation. For large datasets, parallel processing was leveraged to accelerate this step across hierarchical levels.
Global Explanation ($\Phi$): The global explanation formula was applied across all hierarchical levels, combining group-level explanations to generate a summary of feature importance. This ensured that the global explanation reflected contributions from all levels. A Python function aggregated the outputs from groups and levels, normalizing the contributions, as described previously. This step was optimized to handle large datasets.
The H-LIME framework executes the equations sequentially, starting with local explanations and moving on to group-level and global explanations. The outputs are stored in hierarchical data structures for visualization and analysis. The framework includes additional functions to validate and interpret these computations, ensuring that the results align with the model behavior and stakeholder expectations.
3.6. Assumptions in Model Development and Mathematical Formulation
The model development process and equation in this study rely on several key assumptions to ensure the applicability and interpretability of the results. These assumptions are fundamental to the methodological framework but also introduce potential limitations that must be considered when interpreting the findings.
Feature Independence: LIME, and by extension H-LIME, assumes that features are independent when perturbations are applied to generate local explanations $\phi(x)$, simplifying the interpretation process. However, this assumption may oversimplify complex feature interactions, such as the correlation between socioeconomic status and technology access. To address this, aggregated insights were interpreted cautiously, and domain knowledge was used to account for potential interactions between features, ensuring meaningful and contextually accurate explanations.
Linearity in Local Models: LIME’s surrogate models assume a linear decision boundary near each instance, thus simplifying local explanations. However, in nonlinear regions, this assumption may not accurately capture the model’s behavior. To address this issue, H-LIME uses hierarchical aggregation to combine multiple local explanations across hierarchical levels. This reduces the biases introduced by the local linearity assumption, thereby providing a more comprehensive understanding of feature importance and context awareness.
Human-Defined Hierarchical Structure: We assume that the dataset can be meaningfully structured into hierarchical levels such as institution type, geographic location, and education level. These groupings are based on domain knowledge and follow the practice of human-defined concepts in explainable AI [14,25]. The utility of H-LIME relies on these definitions being relevant and interpretable for stakeholders.
These assumptions are fundamental to the methodological framework of this study and provide the necessary context for the application of H-LIME to explain student adaptability predictions. However, they also introduce potential limitations that may affect the interpretation and generalization of the findings.
3.7. Experimental Setup
For our experiments, we used a Lenovo X1 Yoga (ThinkPad) personal computer (Lenovo Group Ltd., Hong Kong, China) running Windows 10 Pro, equipped with an Intel Core i7-6600U CPU, 16 GB of RAM, and a 256 GB SSD. Python 3.11.5, packaged by Anaconda, Inc., was used to implement the machine learning models. This hardware specification provides context for the computational performance of the methods employed in this study. Table 5 summarizes the experimental setup used for model training.
4. Evaluation Results and Discussion
In this section, we evaluate machine learning models for predicting student adaptability levels, emphasizing performance metrics such as precision, recall, F1-score, and Log-Loss. We present hierarchical explanations using H-LIME, demonstrating their effectiveness in providing comprehensive and actionable insights.
4.1. Models Performance Comparison
We compared the performance of the models using precision, recall, F1-score, and Validation Log-Loss. The calculated values for each model are illustrated in Table 6. The Decision Tree model exhibits the highest F1-score (0.946), precision, and recall, indicating a strong ability to classify student adaptability levels based on hard predictions. Random Forest and XGBoost also demonstrate strong performance, with F1-scores (0.928) close to those of the Decision Tree model. KNN, Gradient Boosting, AdaBoost, and Neural Network models show moderate performance, while SVM has the lowest scores among all models evaluated.
4.2. Evaluating Random Forest Model with H-LIME
To investigate the factors shaping student adaptability, we employed the H-LIME method in conjunction with the Random Forest model. The hierarchical aggregation was conducted at three distinct levels: Institution Type, Location, and Educational Level, followed by the aggregation of group-level explanations to produce a global explanation that provides an overall view. This multi-level analysis offers insights into how various factors influence student adaptability across different detail scales.
4.2.1. Result for Local Explanation with LIME
Before applying H-LIME, we utilized LIME to interpret individual predictions. LIME helps approximate the decision-making process of complex models by locally fitting an interpretable surrogate model. To validate the trustworthiness of these local approximations, we quantified the fidelity of the surrogate linear models via the coefficient of determination ($R^2$). The mean $R^2$ across the test set suggests that the generated explanations offer a faithful reflection of the Random Forest’s complex decision dynamics within the perturbed local neighborhoods.
Figure 4 shows the explanation generated by LIME for a student whose adaptability level was predicted as “High” with 100% confidence. However, due to the behavior of LIME in multiclass settings, the explanation plot contrasts the predicted class (“High”) with an alternative class, in this case “Moderate”. Thus, the plot visualizes how each feature contributes for or against the “Moderate” class, rather than directly explaining the “High” prediction.
The model confidently predicted “High” adaptability (probability = 1.00), while the probabilities for “Moderate” and “Low” were 0.00. This indicates strong model certainty in the chosen class.
By examining the feature contributions away from “Moderate”, we infer the support for the “High” class as follows:
Class Duration (0): This has a strong negative influence on “Moderate,” suggesting that lack of scheduled class duration is less aligned with that class, and may contribute more to a “High” or “Low” classification.
Self LMS Usage (No): Shows a notable negative effect on “Moderate”, reinforcing the importance of self-driven LMS engagement for adaptability.
Education Level (School) and Age (1–5): These features contribute negatively to “Moderate”, indicating their relevance in separating age and education-level subgroups.
Network Type (2G): Slight negative contribution, highlighting digital infrastructure as a factor in adaptability.
Although the visualization contrasts with “Moderate”, the inverse logic reveals which features pull the model’s decision away from that class and toward “High”. This indirect interpretability underscores the limitations of LIME in multiclass tasks, where it does not directly explain why the predicted class was selected.
Note: H-LIME resolves this issue by allowing aggregation of explanations for the actual predicted class across hierarchical groups, rather than relying on contrastive logic.
4.2.2. Institution Type Aggregation
Figure 5 presents feature importance aggregated separately for government and non-government institutions. The results highlight systemic differences in factors influencing adaptability.
Government Institutions (Figure 5a): Older students (e.g., Age 21–25), access to mobile data, and 2G network usage appear as positive contributors to adaptability, suggesting that older learners benefit from flexible digital resources. In contrast, rural location and reliance on mobile devices negatively influence adaptability, emphasizing the need for infrastructure investment and access to better hardware in underserved regions.
Non-Government Institutions (Figure 5b): Negative contributors include mid-level financial condition and gender (boys), indicating socio-economic and gender-related challenges. Technological features such as 4G access and age groups show weaker associations, suggesting that while tech access helps, it does not fully mitigate deeper equity gaps. Support strategies like targeted financial aid and inclusive mentoring may be more effective in this context. H-LIME enables actionable insights into institutional disparities, supporting policies that align interventions with structural and demographic needs.
4.2.3. Location-Based Aggregation Within Institutions
Figure 6 shows how feature importance varies by location (urban vs. rural) within institution types.
(a) Government–Urban (Figure 6a): Younger students (Age 21–25) show lower adaptability, while higher education level (School) contributes positively. Targeted mentorship and educational support may help address early-stage challenges.
(b) Government–Rural (Figure 6b): Education remains a key positive factor, but rural students face greater infrastructural challenges. Investments in teaching quality, scholarships, and rural digital access are essential to mitigate these gaps.
(c) Non-Government–Urban (Figure 6c): Technological constraints like 4G access appear less influential, suggesting better infrastructure. Digital literacy and engagement initiatives could further support younger learners.
(d) Non-Government–Rural (Figure 6d): Barriers are more pronounced. Younger students and those with low LMS engagement are most affected. Personalized learning and increased digital platform use can help improve outcomes.
Across all subgroups, educational attainment consistently drives higher adaptability, while rural environments and younger learners face greater risks. These insights support policies focused on infrastructure equity, early-stage learning support, and LMS engagement.
4.2.4. Educational Level Aggregation Within Urban Government Institutions
Figure 7 shows how adaptability factors differ across primary (school), secondary (college), and tertiary (university) education levels within urban government institutions.
(a) Primary Level (Figure 7a): A low score for Education Level_School and negative contributions from LMS usage and mobile devices indicate difficulties younger students face with self-learning and technology engagement. Improving early education quality and digital literacy at this level is crucial.
(b) Secondary Level (Figure 7b): A strong primary education background (Education Level_School > 1.10) positively affects adaptability. Class duration and rural location also influence outcomes, suggesting the need for structured learning and better rural infrastructure.
(c) Tertiary Level (Figure 7c): Flexible class schedules emerge as the most significant positive predictor, especially for older students. However, early education gaps continue to be associated with reduced adaptability, emphasizing the long-term importance of foundational learning. Younger university students may show higher adaptability with targeted academic support. The findings emphasize tailoring interventions to each educational stage: reinforcing foundational education, adapting schedules to student maturity, and investing in equitable technological access.
4.2.5. Global Feature Importance
Figure 8 presents the global feature importance derived from H-LIME, summarizing adaptability predictors across all hierarchical levels. The strongest negative contributor is Education Level_School, indicating foundational gaps in early education. Socio-economic (e.g., Financial Condition_Rich) and gender-based (e.g., Gender_Boy) disparities also reduce adaptability, highlighting the need for inclusive support strategies. Poor digital access (e.g., Network Type_2G, Device_Mobile) consistently impairs adaptability, reaffirming infrastructure’s role.
Age-related variation shows that students aged 21–25 are particularly at risk, potentially due to academic or career transitions. Students in rural areas (Location_No) and government institutions face structural disadvantages, while non-IT students demonstrate lower adaptability, underscoring the value of technical literacy. In contrast, higher education levels (College, University) show milder negative effects, suggesting improved resilience and access over time. This global view reinforces the importance of addressing early-stage inequalities while supporting digital access and skill development across the educational spectrum.
4.2.6. Comparison with Related Studies
Table 7 presents a comparison of the proposed H-LIME framework with recent studies on student adaptability prediction [8] and explainable AI in education [52]. The comparison highlights predictive performance, interpretability scope, and methodological contributions.
As shown in Table 7, H-LIME demonstrates robust performance with an F1-score of 0.93, making it highly competitive with state-of-the-art approaches. While Nnadi et al. [8] achieved a marginally higher F1-score (0.94) using a similar Random Forest architecture, their study utilized standard “flat” XAI methods (SHAP, LIME, ALE). These techniques provide excellent instance-level or global summaries but fail to capture the structural dependencies inherent in educational data, such as institution-specific or regional trends. Similarly, Guleria et al. [52] focused on individual predictions for career counseling but lacked mechanisms for global or group-level aggregation. The key advantage of H-LIME lies in its hierarchical interpretability. By bridging the gap between local explanations and global trends, H-LIME offers unique “middle-layer” insights, such as identifying that digital infrastructure gaps are critical in rural subgroups but less relevant in urban ones, which remain hidden in the flat analysis models used by related studies.
4.2.7. Interpretability in Practice
Beyond these quantitative metrics, we also qualitatively compared LIME and H-LIME. Although LIME provides individual explanations, these are often fragmented and vary across similar instances. H-LIME, by aggregating local explanations into hierarchical levels, reveals stable group-specific patterns (e.g., how feature importance differs by institution type or education level). This structure enables stakeholders to reason across individual, group, and global scales, facilitating more informed interventions in educational settings.
5. Limitations and Practical Implications
While H-LIME offers detailed, multilevel explanations that support educational decision-making, several limitations and usability concerns remain.
Usability: Non-expert users may find hierarchical outputs overwhelming, especially when concise summaries are preferred. There is a trade-off between interpretability and specificity: while H-LIME provides granular insights, these may obscure the overall model logic.
Technical Limitations: H-LIME depends on human-defined hierarchies, making it sensitive to poorly structured or imbalanced groupings. Like LIME, it assumes feature independence and local linearity, which may oversimplify complex relationships. Furthermore, this independence assumption may introduce bias when aggregating interlinked educational features (Financial Condition correlating with Device Type). H-LIME aggregates the attributed influence, but if the local surrogate assigns weight to a proxy feature due to collinearity, this misattribution will persist in the group summary.
To address the risk of feature cancellation, where positive and negative contributions within a group offset each other, we incorporated the standard deviation metric ($\sigma_{G,\ell}$) into the aggregation logic. While mean aggregation alone might obscure polarized features (whose mean importance approaches zero), the inclusion of $\sigma_{G,\ell}$ allows H-LIME to distinguish between truly irrelevant features and those with high intra-group variance, ensuring that highly polarized features are explicitly signaled. Although interaction-aware methods like Shapley values (SHAP) offer theoretical advantages for nonlinear features, they are computationally intensive and often produce dense explanations that are harder for non-technical stakeholders to interpret than LIME’s sparse linear approximations. Future work will explore efficient Hierarchical Shapley implementations to address this limitation.
Practical Use: H-LIME provides actionable insights at multiple levels. For example, identifying financial hardship in rural schools can justify aid programs, while recognizing digital access gaps can guide infrastructure upgrades. A hypothetical deployment at a rural tertiary institution showed how institution-type, location, and education-level explanations guided interventions in financial aid, connectivity, and curriculum design, ultimately improving adaptability and satisfaction. These insights reinforce H-LIME’s utility for educators and policymakers seeking scalable, interpretable frameworks to support student-centered strategies.
5.1. Assumptions About Hierarchical Structure
The hierarchical structure of the framework may not accurately reflect the underlying relationships in the data. This could introduce bias or lead to overfitting. Additionally, aggregating explanations across levels may skew the results if certain groups dominate the dataset, leading to biased conclusions that may not be generalizable.
It is important to note that the hierarchical structure (Institution→Location→Education Level) serves as a user-defined analytical lens rather than a rigid model constraint. Because H-LIME aggregates strictly from the instance level upward, the underlying feature contributions for a specific student remain constant regardless of the aggregation order. Consequently, changing the hierarchy order, such as grouping first by Location and then by Institution Type, does not alter the numerical validity of the insights, but instead shifts the policy focus. This flexibility ensures that the findings are not artifacts of a static sequence, but rather adaptable perspectives that enable stakeholders to prioritize different dimensions (addressing regional infrastructure gaps before institutional differences).
5.2. Scalability Challenges
Although the multilevel aggregation approach of H-LIME can increase computational complexity, especially for larger datasets or real-time applications, the following scalability challenges were identified:
Computational Overhead: Generating local explanations for individual instances using LIME and aggregating them across multiple hierarchical levels requires substantial computational resources. This complexity grows as the dataset expands and as the number of hierarchical levels within the data increases.
Real-Time Applicability: In dynamic educational applications that require rapid decision-making, the time required for perturbations, model predictions, and hierarchical aggregation may exceed acceptable thresholds.
To quantify the actual computational overhead, we performed a runtime benchmark comparing the base LIME generation process against the H-LIME aggregation step. The results indicate that while generating LIME explanations required an average of 0.129 s per instance, the hierarchical aggregation process for the entire test set required only 0.016 s. Thus, the aggregation step introduces negligible latency (<1% of total runtime), confirming that the primary bottleneck remains the underlying LIME generation rather than the hierarchical summarization.
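A comparable benchmark can be reproduced with the building blocks sketched earlier; absolute timings will of course vary with hardware:

```python
import time

t0 = time.perf_counter()
phi = pd.DataFrame(
    [dense_class_explanation(explainer, models["RF"], row, X_test.shape[1])
     for row in X_test.to_numpy()],
    index=X_test.index, columns=X_test.columns)
lime_secs = time.perf_counter() - t0                  # dominant cost: per-instance LIME

t0 = time.perf_counter()
aggregate_hierarchy(phi, meta, levels)
agg_secs = time.perf_counter() - t0                   # one-off hierarchical summary

print(f"LIME: {lime_secs / len(X_test):.3f} s/instance; "
      f"aggregation: {agg_secs:.3f} s total")
```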
5.3. Mitigation Strategies for Scalability
To address the scalability challenges and enhance the applicability of H-LIME, the following strategies are proposed:
Parallel Processing: Employing parallel computing frameworks can significantly expedite the generation and aggregation of explanations, thereby enhancing the suitability of the framework for larger datasets (see the sketch after this list).
Sampling Techniques: Leveraging representative sampling at each hierarchical level can reduce the computational load without compromising the integrity of insights.
Incremental Aggregation: Adopting an incremental approach to dynamically aggregate explanations as new data arrive can enhance the practicality of H-LIME for real-time applications.
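As an example of the first strategy, per-instance LIME calls are independent and therefore map naturally onto joblib, assuming the explainer and model are picklable:

```python
from joblib import Parallel, delayed

# Per-instance LIME calls are embarrassingly parallel across CPU cores.
phi_rows = Parallel(n_jobs=-1)(
    delayed(dense_class_explanation)(explainer, models["RF"], row, X_test.shape[1])
    for row in X_test.to_numpy())
```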
5.4. Broader Applicability and Impacts of H-LIME
One of the key strengths of the H-LIME framework is its inherent adaptability, allowing for the integration of additional attributes into the hierarchical structure. While this study focuses on three specific dimensions, Institution Type, Location, and Education Level, H-LIME is designed to accommodate a diverse range of features across various domains.
For instance, in healthcare, the framework could be applied to hierarchical factors such as patient age groups, hospital departments, and geographic regions to evaluate clinical outcomes. Similarly, environmental studies could leverage hierarchical layers like climate zones, seasonal variations, and pollution levels to derive actionable insights. In the marketing domain, H-LIME could evaluate customer engagement patterns across regional markets and product categories. This flexibility ensures that H-LIME can be adapted to datasets of varying complexity, establishing it as a versatile tool for researchers seeking interpretable, multi-level insights in high-stakes fields.
Societal and Operational Impact
Beyond its domain versatility, H-LIME offers significant operational utility by bridging the gap between technical model outputs and human decision-making. By translating complex, non-linear model behaviors into hierarchical summaries, the framework democratizes access to AI insights, enabling non-technical stakeholders, such as school administrators, public health officials, or business strategists, to interpret data without requiring deep machine learning expertise.
Furthermore, H-LIME enhances accountability in high-stakes applications. By explicitly quantifying the stability and variance of features within subgroups (via the standard deviation metric $\sigma_{G,\ell}$), the framework prevents the ecological fallacy, where broad aggregate statistics might otherwise mask disparities affecting minority populations. This capability is critical for fostering fairness and trust, as it compels decision-makers to acknowledge the heterogeneity of the populations they serve rather than relying on oversimplified global averages.
6. Conclusions and Future Work
This study introduced the Hierarchical Local Interpretable Model-agnostic Explanations (H-LIME) framework, an innovative extension of the widely used LIME technique, aimed at addressing the need for more comprehensive interpretability in educational machine learning models. By aggregating local explanations across different hierarchical levels, such as institution type, location, and educational level, H-LIME provides a multi-layered understanding of the factors influencing student adaptability. Crucially, this study introduced a group-level stability metric ($\sigma_{G,\ell}$) to distinguish between universally irrelevant features and those with high intra-group variance, ensuring that polarized predictors are explicitly identified.
Our findings demonstrate that H-LIME not only maintains the interpretability of individual predictions but also offers valuable insights at higher levels of aggregation. Empirical evaluation confirmed the robustness of the framework, with the Random Forest model yielding explanations that were approximately 4.5 times more stable than those of decision trees. The application of H-LIME revealed significant predictors of student adaptability, such as educational level and class duration, which retained their importance across hierarchical contexts. Our results align with and extend existing research by providing a more detailed and context-aware understanding of adaptability in educational settings.
This research contributes valuable knowledge to the field of educational data science, paving the way for more effective and personalized educational strategies. Future research will focus on:
Validating H-LIME’s utility through longitudinal pilot studies with educational institutions to test its effectiveness in guiding real-world interventions and refine the framework based on stakeholder feedback.
Optimizing computational efficiency using parallel computing to support real-time dashboards for educators.
Addressing feature interactions by exploring Hierarchical Shapley values (H-SHAP) to capture non-linear dependencies that linear surrogates might miss.
Expanding the application of H-LIME to other high-stakes domains, such as healthcare, finance, cybersecurity, and workforce development, where hierarchical data structures are prevalent.
These efforts aim to establish H-LIME as a robust and accessible tool for driving data-informed educational strategies and improving student adaptability.