1. Introduction
In the new era of high demand for talent, traditional curriculum assessments for environmental students, hindered by subjectivity and simplistic frameworks, fail to foster holistic growth. Environmental engineers now need practical skills, innovation, problem-solving ability, agility, and critical thinking for emerging industries [1,2]. Universities are advancing teaching and research reforms to meet these demands [3]. In 2018, China’s Ministry of Education launched “Emerging Engineering Education,” urging upgrades in environmental engineering to cultivate technological talent [4]. The course performance evaluation system, as the “baton” of reform, should be the priority. The purpose, design, implementation, and feedback of course assessment are key elements of university evaluation, promoting student autonomy and standardizing teacher evaluation [5]. However, traditional exam-focused assessments fail to fully reflect student learning: unclear criteria increase uncertainty and subjectivity, while their narrow scope stresses rote learning over innovation and teamwork. The system therefore urgently needs scientific refinement to foster autonomous learning and practical innovation. In recent years, scholars have proposed and applied diverse methods to optimize student achievement evaluation, including comprehensive, qualitative, data-driven, and structured approaches. Comprehensive evaluation assesses learning abilities via multiple forms and criteria, integrating project assessments, process-oriented evaluations, and multi-rater reviews. Gratchev replaced exams with project-based tasks, revealing that students’ average project scores exceeded exam averages [6]. Qualitative evaluation assesses students’ learning processes and abilities through subjective judgment and observation, including forms such as self-evaluation [7,8] and peer evaluation [9]. Data-driven methods use data analytics and technology tools to collect and analyze student performance and to provide data-based feedback and decision making. These approaches rely on learning analytics and predictive tools to identify student needs and potential problems. Accurate predictive modeling can be achieved by techniques such as regression, classification, and clustering, including artificial neural networks, decision trees, support vector machines, K-means clustering, K-nearest neighbors, Naive Bayes, and linear regression [10,11]. For example, Kukkar et al. [12] combined RNN, LSTM, and Random Forest techniques and achieved approximately 97% accuracy. Shoaib et al. [13] developed a Convolutional Neural Network feature-learning block to extract hidden patterns in student data, achieving a commendable 93% accuracy for student grade prediction and student risk prediction. Shi et al. [14] conducted a cluster analysis of students based on three behavioral attributes (effort regulation, self-assessment, and learner participation), identifying distinct learning strategy patterns that offer new insights for teaching.
Among the various methods, the Analytic Hierarchy Process (AHP) converts the decision-making process into a hierarchical structure of related criteria and is widely used to address complex decision-making problems across educational fields [15,16,17,18]. AHP is a structured decision-making approach that organizes complex problems into a hierarchical framework. The method begins by decomposing a decision problem into levels comprising the overall goal, subordinate objectives, evaluation criteria, and specific alternatives. Once the hierarchy is established, a judgment matrix is constructed, allowing decision-makers to compare the elements at each level according to their relative importance [19]. However, decision-making problems often involve many uncertainties, such as the subjective preferences of experts, incomplete information, and challenges of quantification. AHP is therefore often combined with classical uncertainty methods such as fuzzy sets [20], the Delphi method [21], interval numbers [19], and grey system theory [22]. Interval AHP (I-AHP) enhances traditional AHP by using interval numbers instead of single-point values when constructing pairwise judgment matrices. This approach mitigates the impact of subjective bias and more accurately captures the inherent uncertainty in the decision-making process. Qin et al. [19] used I-AHP to conduct a comprehensive evaluation of teaching quality in the mathematics classroom, and the results showed that the method handles uncertainty well. A more scientific and rational student course performance evaluation system can therefore be established with the help of I-AHP. However, I-AHP cannot easily mine latent information from the final evaluation data, and the current evaluation system lacks scientific methods to exploit such data. These data are of great significance in guiding teachers and students in their future learning and work and in improving the quality of education.
Cluster analysis is an important and active research field in data mining [23]. Clustering is an unsupervised learning method that groups similar objects in a dataset, aiming to make objects within the same group similar to each other while being distinctly different from objects in other groups. Commonly used clustering methods include partition-based, hierarchy-based, model-based, and density-based approaches [24]. The K-means clustering algorithm is simple, easy to understand, and computationally efficient, making it suitable for processing large-scale student performance data. For example, Kim et al. [25] conducted a statistical assessment of student engagement in online learning using K-means clustering, examining differences in attendance, assignment completion, discussion participation, and perceived learning outcomes. Their results divided students into two groups with significant differences in instructor–student interaction, student–student interaction, and perceived learning outcomes. Tuyishimire et al. [26] categorized 703 students into three clusters (more encouraged, encouraged, and less encouraged students) and analyzed them in conjunction with their proportional distribution. Owing to its simple operation, timely feedback, and enhanced specificity, K-means clustering effectively compensates for the shortcomings of I-AHP. However, these works cannot explain how much each feature contributes to the clustering result. To address this interpretability gap, Shapley Additive Explanations (SHAP) values derived from cooperative game theory are integrated into the clustering framework [27]. SHAP provides mathematically rigorous attribution values that quantify each feature’s marginal contribution to cluster assignments while maintaining consistency with the original model’s outputs.
Table 1 synthesizes the strengths and limitations of existing student performance evaluation methods. While these methods demonstrate promising results, they struggle to address the specific challenges of course evaluation, particularly in simultaneously handling multiple evaluation components, managing uncertainty, and mitigating subjective grading. On one hand, comprehensive evaluation tends to be time-consuming and complex, requiring multiple raters and criteria, a significant drawback for large-scale implementation. Qualitative evaluation, despite its effectiveness in capturing behavioral nuances through self- and peer-assessments, suffers from high subjectivity and potential bias. In contrast, data-driven methods such as regression, classification, and neural networks excel in predictive accuracy through objective data analysis [28]. However, their overreliance on quantifiable metrics means they not only overlook non-quantifiable aspects but also fail to account for the relative importance of different evaluation dimensions, ultimately compromising assessment precision. Therefore, this study adopts I-AHP to evaluate student course performance by deriving criterion weights for pedagogical sub-criteria from expert judgment, and uses K-means clustering to stratify students into performance cohorts based on multidimensional metrics. Random Forest classification with SHAP value analysis is employed to identify key discriminators of cluster membership and interpret decision boundaries, enabling attribution-guided interventions that address cohort-specific deficiencies. Implemented within a dual-channel ecosystem spanning pre-class, in-class, and post-class phases, this integrated framework aims to provide a comprehensive and accurate understanding of students’ performance patterns and learning differences. The approach gives teachers a tool for analyzing student differences: teachers can systematically assess data on different student groups, including knowledge mastery in final exams, participation in classroom activities, accuracy and timeliness of assignment completion, learning outcomes in chapter tests, self-directed learning during chapter self-study, and technical application and collaboration in online assignments, and then develop personalized learning support plans based on students’ specific needs and characteristics. The framework’s process is illustrated in Figure 1.
3. Case Study
This chapter presents a comprehensive case study to validate the proposed integrated assessment framework, combining I-AHP, K-means clustering, Random Forest classification, and SHAP value analysis, within the context of an undergraduate Environmental Monitoring course. Section 3.1 begins by outlining the course structure, pedagogical objectives, and existing challenges in student performance evaluation. Building on this foundation, Section 3.2 details the formulation of the I-AHP model, including the construction of the hierarchical indicator system, the expert-driven interval judgment matrices, and the derivation of interval weights. Section 3.3 then operationalizes these indicators through multi-source data collection, alongside rigorous preprocessing steps to ensure data integrity. Section 3.4 applies K-means clustering to categorize students based on normalized performance scores, followed by Random Forest classification and SHAP analysis to identify key discriminators of cluster membership and inform attribution-guided interventions.
3.1. Overview of Environmental Monitoring Course
The data presented in this paper are sourced from the Environmental Monitoring course at the School of Future Technology, Xi’an University of Architecture and Technology. Environmental Monitoring is a compulsory course for environmental majors, and its content framework is shown in Figure 3. The course is divided into five major chapters: Introduction, Monitoring of Water and Wastewater, Monitoring of Air and Exhaust Gas, Automatic Monitoring of Environmental Pollution, and Quality Assurance. This chapter structure is intended to help students systematically grasp both theoretical and practical skills in environmental monitoring. In addition, the course format is diverse, incorporating ideological and political education, flipped classrooms, and blended online and offline teaching to enhance student engagement and learning outcomes. Traditionally, student course performance in Environmental Monitoring has been evaluated through a final exam combined with online and offline score components.
However, as the student-centered teaching philosophy gradually took root, the traditional evaluation system increasingly proved unfavorable for students’ knowledge accumulation and skill development, making it difficult to meet the diverse talent cultivation needs of universities in the new era. First, scores carry high uncertainty. The content of the Environmental Monitoring course is extensive and highly integrative, so examinations contain few questions with a single definitive answer and a higher proportion of open-ended analytical questions. Grading therefore relies heavily on instructors’ subjective judgment, which further increases uncertainty. Second, scores are predominantly based on summative examinations, which fail to comprehensively reflect students’ learning attitudes, thought processes, and developmental capacities. This focus may lead students to prioritize exam results over daily learning, decrease their initiative, and ultimately leave knowledge insufficiently mastered. Third, it is difficult for teachers to adjust or develop scientific and effective learning strategies for students in a timely manner based on assessment results. Since each student’s learning situation differs, especially in large classes, teachers struggle to conduct an in-depth analysis and diagnosis of each student’s learning process and thus cannot accurately identify individual problems and needs; they often resort to a one-size-fits-all approach rather than targeted guidance and support. In addition, courses such as Environmental Monitoring not only teach professional knowledge but also cultivate industry ethics and a strong sense of social responsibility, which traditional assessment methods struggle to reflect. Therefore, this study optimizes course assessment through I-AHP and proposes an assessment model that spans “online and offline” dual channels and “pre-class, in-class, and post-class” stages. Finally, final grades were analyzed in depth using clustering analysis to facilitate personalized and precise instruction.
3.2. Formulation and Solution of I-AHP Model
In order to evaluate students’ learning performance comprehensively and accurately, the evaluation system should be designed from diverse angles: learning attitude, learning preparation, learning process, exam results, etc. The comprehensive evaluation indicator system for Environmental Monitoring course performance is divided by AHP into the goal level, the criteria level, and the indicator level. The criteria level consists of two factors: process assessment (B1) and final exam (B2). The indicator level is further subdivided into classroom performance (C1), assignment completion (C2), online study (C3), chapter self-study (C4), etc. The resulting four-level hierarchical structure is shown in Table 4, as well as in Figure 4 and Figure 5. Process assessment (B1) reflects students’ performance throughout the learning process, going beyond mere reliance on examination results; it effectively showcases students’ learning attitudes, participation, and effort, and allows educators to keep abreast of students’ learning conditions and provide guidance and assistance. Classroom performance (C1) includes signing in, discussion interactions, and thoughts and insights; this indicator reflects students’ engagement in the learning process and showcases their professional cognition. Assignment completion (C2) represents students’ self-discipline and consistency in learning, aiding the absorption and consolidation of newly acquired knowledge. Online study (C3), such as MOOCs, is not constrained by time and space, allowing students to organically integrate their interests and fragmented time; through online learning, students can enhance their self-directed learning abilities and improve in weaker areas. The assessment criteria for online learning are divided into chapter tests (D1) and online assignments (D2). The Environmental Monitoring course encompasses numerous knowledge points, and the teaching progress is relatively rapid; through chapter self-study (C4), students can promptly preview essential concepts, facilitating a rapid transition into an active learning state during class. Finally, the final exam (B2) is a comprehensive assessment at the end of the semester, reflecting students’ mastery of the knowledge points and their ability to apply them. The original expert judgment data, which comprise the judgment results of two specified experts, are presented in Table A1 of Appendix A.
After the hierarchy is established, the factors at each level must be compared pairwise to construct the judgment matrix, determine relative importance, and quantitatively solve for the weight of each index using an appropriate scale. Solving for the level-factor weights objectively reflects the importance of the different factors in the overall performance evaluation. The pairwise comparisons between factors were obtained by issuing a questionnaire to an expert committee. This study invited nine experts in the field of environmental engineering to participate in the questionnaire survey, three of whom had previously taught environmental monitoring courses. Each expert was required to meet at least one of three criteria: having served as chief editor of a textbook in the field of environmental engineering; having published papers on educational themes in core journals; or having presided over research projects related to curriculum reform at the provincial level or above. The expert group has rich experience in vocational education and significant research output on vocational education teaching and textbooks, which effectively ensures the authenticity and rationality of the evaluation index system. Finally, a questionnaire validity rate of 100% indicated a high level of engagement from the participating experts.
Saaty’s 1–9 importance scale was adopted by the experts to assess the significance of factors across levels B to D. The evaluation results are presented in Table 5. The rating of B1 relative to B2 is [1, 2], indicating that the importance of B1 compared to B2 lies between 1 and 2, i.e., B1 is slightly more important than B2. A detailed analysis of the weights for the other hierarchical levels is given in Section 4.1. Before this, it is essential to validate the weights through a consistency test to ensure the coherence of the judgment matrix. The interval reciprocal judgment matrices $\tilde{A}^{(x)}$ (x = B, C, D) for levels B to D provided by the experts are listed as follows:
Taking the interval judgment matrix $\tilde{A}^{(B)}$ as an example, $\tilde{A}^{(B)}$ can be divided into the lower bound matrix $A^{(B)-}$ and the upper bound matrix $A^{(B)+}$:
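As an illustration of this decomposition (a sketch built only from the B-level rating [1, 2] in Table 5, not the experts’ full C- or D-level matrices):

$$ \tilde{A}^{(B)} = \begin{pmatrix} [1,1] & [1,2] \\ [1/2,1] & [1,1] \end{pmatrix} \;\Rightarrow\; A^{(B)-} = \begin{pmatrix} 1 & 1 \\ 1/2 & 1 \end{pmatrix}, \quad A^{(B)+} = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, $$

where reciprocity links the bounds via $a_{ij}^{-} = 1/a_{ji}^{+}$.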
Since different evaluation criteria often have different dimensions, which can distort comparisons in the data analysis, weight normalization is required to eliminate the dimensional influence between criteria. The specific formula is given in Equation (5).
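Equation (5) is not reproduced here; a standard sum normalization of the following form is assumed:

$$ w_i = \frac{w_i'}{\sum_{j=1}^{n} w_j'}, \qquad i = 1, \ldots, n, $$

where $w_i'$ denotes the unnormalized weight of criterion $i$.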
Next, the basic weights of the matrices $A^{(B)-}$ and $A^{(B)+}$ are calculated: the feature vectors of $A^{(B)-}$ and $A^{(B)+}$ can be computed with the arithmetic mean method shown in Equation (7). The resulting feature vectors $w^{(B)-}$ and $w^{(B)+}$ of $\tilde{A}^{(B)}$ are listed as follows:
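Equation (7) is referenced but not shown; the classical arithmetic (row-)mean approximation of the principal eigenvector is assumed:

$$ w_i = \frac{1}{n} \sum_{j=1}^{n} \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}, \qquad i = 1, \ldots, n, $$

applied separately to $A^{(B)-}$ and $A^{(B)+}$.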
As stated in Section 2.2, it is necessary to calculate the maximum eigenvalue and the consistency index. The calculation of the maximum eigenvalue $\lambda_{\max}$ is a crucial step in AHP; its primary purpose is to assess the consistency of the pairwise comparison matrix. Taking the upper bound matrix $A^{(B)+}$ as an example, $\lambda_{\max}$ can be calculated with Equation (9). Note that if the matrix is of order n ≤ 2, no consistency check is required, because such small matrices are inherently consistent. The consistency index (CI) is determined with Equation (10), and the consistency ratio (CR) of the comparison matrix, which measures how far it departs from perfect consistency, is calculated with Equation (11).
Since the computed value was below the limit (CR < 0.1), the experts’ comparisons were judged consistent and dependable. After confirming the reliability of the weights, students’ performance on the various indicators can be analyzed from their actual results, and their comprehensive scores can be calculated using the weight system.
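As a minimal sketch of this consistency check (assuming the standard Saaty random index table and the arithmetic-mean weight estimate described above; the example matrix is hypothetical, not the experts’ actual data):

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix orders 1-9
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(A):
    """Consistency ratio CR = CI / RI of a pairwise comparison matrix."""
    n = A.shape[0]
    if n <= 2:                                  # order <= 2 is always consistent
        return 0.0
    w = (A / A.sum(axis=0)).mean(axis=1)        # arithmetic-mean weight vector, cf. Eq. (7)
    lam_max = ((A @ w) / w).mean()              # maximum eigenvalue estimate, cf. Eq. (9)
    CI = (lam_max - n) / (n - 1)                # consistency index, cf. Eq. (10)
    return CI / RI[n]                           # consistency ratio, cf. Eq. (11)

# Hypothetical upper-bound matrix for the four C-level indicators
A_plus = np.array([[1.0, 1/4, 1/4, 1/2],
                   [4.0, 1.0, 1.0, 3.0],
                   [4.0, 1.0, 1.0, 3.0],
                   [2.0, 1/3, 1/3, 1.0]])
print(f"CR = {consistency_ratio(A_plus):.4f}")  # accept the matrix if CR < 0.1
```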
Figure 5 illustrates the weight changes obtained after removing the evaluation of individual experts within the I-AHP. Data analysis shows that the magnitude of the weight changes in the graph does not exceed 0.02, indicating that the derived weights are insensitive to any single expert’s judgment.
3.3. Case Study Data Collection and Preprocessing
The evaluation of student performance under the proposed I-AHP framework required multi-source data aligned with the hierarchical indicators defined in Section 3.2 (Table 3). The performance evaluation of Environmental Monitoring for 98 undergraduates majoring in environmental science across the 2020–2025 academic years was taken as an example for comparative analysis. B1 was operationalized through four hierarchical indicators: C1, C2, C3, and C4. Data for C1 were derived from in-class activities, including verbal participation records, interactive Q&A sessions, and mini-quizzes of multiple-choice or true/false questions administered during lectures. C2 integrated objective scores from six routine assignments (e.g., problem-solving exercises) and instructor evaluations of four project reports, such as the formulation of a campus ambient air quality monitoring plan, a water quality monitoring scheme for rivers, and an environmental impact analysis of urban noise. C3 was assessed through two components: D1, administered via the course’s dedicated online learning platform, and D2, calculated from platform-generated logs documenting students’ interactions with instructional audio-visual materials, including quantitative measures of viewing frequency and duration. C4 relied on self-assessment surveys evaluating students’ autonomous study habits and conceptual understanding. B2, serving as a summative assessment, encompassed multiple-choice questions, conceptual analyses, computational problems, and open-ended essays, all rigorously graded by the course instructor. Following data collection, the raw dataset underwent a series of preprocessing steps to address quality issues and ensure analytical robustness. Platform-generated behavioral logs underwent sanity checks to remove implausible entries, such as anomalously prolonged activity durations exceeding 24 h.
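A minimal sketch of this sanity check and the D2 aggregation, assuming hypothetical column names (`student_id`, `resource_id`, `duration_min`) rather than the platform’s actual export schema:

```python
import pandas as pd

# Load platform-generated behavioral logs (source data for D2)
logs = pd.read_csv("platform_logs.csv")

# Sanity check: drop implausible sessions longer than 24 h
logs = logs[logs["duration_min"] <= 24 * 60]

# Aggregate viewing frequency and total viewing duration per student
d2_metrics = (logs.groupby("student_id")
                  .agg(view_count=("resource_id", "count"),
                       total_viewing_min=("duration_min", "sum"))
                  .reset_index())
```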
3.4. Educational Data Analysis
3.4.1. K-Means Clustering Analysis
Considering the practical needs of teachers and aiming to gain a deeper understanding of students’ performance patterns in the course, K-means clustering analysis was further employed. By clustering students’ performance across the factors in the I-AHP model, teachers can more effectively identify the needs of different student groups and implement personalized teaching interventions. Initial feature standardization employed Z-score normalization on six educational assessment dimensions (Classroom performance, Assignment completion, Online assignments, Chapter tests, Chapter self-study, and Final exam) to eliminate scale variance. Cluster cardinality was optimized through multi-metric validation across K = 2–5, systematically evaluating the silhouette coefficient, inertia elbow detection, and the Calinski-Harabasz index. The multi-metric evaluation results are shown in Figure 6 and Table 6: tripartite partitioning (K = 3) emerges as the optimal configuration by consensus of all metrics. This process groups students into cohorts with similar performance characteristics, enabling educators to identify distinct learning patterns and tailor instructional strategies. Figure 7 demonstrates the stability of this result across 100 restarts, in which tripartite partitioning (K = 3) again emerges as the optimal solution.
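A minimal sketch of this selection procedure, assuming a `scores` array of shape (98, 6) holding the six assessment dimensions (the variable name is illustrative):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X = StandardScaler().fit_transform(scores)    # Z-score normalization

for k in range(2, 6):                         # candidate K = 2..5
    # n_init=100 mirrors the 100-restart stability check
    km = KMeans(n_clusters=k, n_init=100, random_state=42).fit(X)
    print(f"K={k}  silhouette={silhouette_score(X, km.labels_):.3f}  "
          f"inertia={km.inertia_:.1f}  "      # inspect across K for the elbow
          f"CH={calinski_harabasz_score(X, km.labels_):.1f}")
```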
3.4.2. Random Forest Classification and SHAP Analysis
To enhance model interpretability and support the precise implementation of personalized interventions, this study employs the Random Forest classification algorithm to model cluster membership, with careful hyperparameter tuning: the number of decision trees (n_estimators) is set to 100, the maximum tree depth (max_depth) is unlimited (i.e., None), the minimum number of samples required to split a node (min_samples_split) is 2, the minimum number of samples required at a leaf node (min_samples_leaf) is 1, and the square-root criterion (max_features = ‘sqrt’) is adopted for feature selection during node splitting. Additionally, to ensure reproducibility, a fixed random seed is set (random_state = 42). During model construction, SHAP (Shapley Additive Explanations) values are used to quantify the contribution of each feature. Regarding the stability concern that could arise from differences in the number of SHAP samples, the shap.TreeExplainer algorithm provides exact SHAP value calculations for tree-based models without relying on random sampling for approximate estimation; all calculated SHAP values are therefore deterministic, and their stability is not influenced by the sample size parameter, guaranteeing the uniqueness and reproducibility of the attribution results. In a robustness check across 10 retraining runs with different seeds, ‘assignment completion’ was identified as the most important feature for Cluster 1 in 9 out of the 10 runs, while for Cluster 2 ‘final exam’ consistently emerged as the most significant negative driving factor. The ranking of key features by SHAP value remained highly consistent across all runs, with a standard deviation below 0.01, demonstrating the robustness of the attribution conclusions. The Random Forest classifier, selected for its robust predictive accuracy, is trained on the clustered student data to predict cohort affiliation from the six assessment dimensions. SHAP values, grounded in cooperative game theory [32], quantify each feature’s marginal contribution to the classification outcome, identifying key discriminators of cluster membership (e.g., low engagement or weak test performance) and interpreting decision boundaries. SHAP analysis uses 100 samples per explanation, with mean absolute SHAP values computed for each cluster to quantify feature importance, as described in Section 2.3.2. These SHAP insights inform attribution-guided interventions, enabling educators to design targeted teaching strategies for cohort-specific deficiencies, such as enhancing online engagement or strengthening self-study skills.
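Under these settings, the model and its attribution can be sketched as follows (assuming `X` holds the six assessment dimensions and `y` the K-means cluster labels; with older SHAP releases, `shap_values` is a list of one array per class):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters exactly as specified above
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            max_features="sqrt", random_state=42)
rf.fit(X, y)

# TreeExplainer computes exact, deterministic SHAP values for tree ensembles
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature, for each cluster (class)
for c, sv in enumerate(shap_values, start=1):
    print(f"Cluster {c}:", np.abs(sv).mean(axis=0).round(3))
```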
4. Results Analysis
This study presents an analysis of the evaluation results derived from the integrated framework combining I-AHP, K-means clustering, Random Forest classification, and SHAP value analysis, applied within a dual-channel ecosystem across pre-class, in-class, and post-class phases. Section 4.1 outlines the weighting distribution of the pedagogical sub-criteria to highlight their significance in student course performance evaluation. Section 4.2 conducts a comparative analysis of student scores before and after implementing the evaluation framework to assess its impact. Section 4.3 applies K-means clustering to categorize students based on normalized performance scores, followed by Random Forest classification and SHAP analysis to identify key discriminators of performance levels and inform attribution-guided interventions.
4.1. Analysis of Interval Weights
After the calculations in Section 3.2, the weights of all level factors are presented in Table 7. Taking B1 as an example, its upper weight is 0.67, ranking first, and its lower weight is 0.50. Regarding B1 and B2, some assessors stated that “evaluation of learning outcomes should be diversified, with process assessment and final grades being equally important”. The weights of C2 and C3 are the most significant, with lower and upper limits between 0.30 and 0.35 and consistently top rankings, indicating that these two factors are deemed highly important within the C level. Meanwhile, C1 ranks fourth, with upper and lower weights of 0.10 and 0.08, lower than the other factors. Some assessors noted that “students’ performance in the classroom may be related to their personalities; C2, C3 and C4 are more reflective of students’ learning states”. Regarding C4, some assessors remarked that “the teaching process should emphasize pre-studying of chapters, which helps students develop good study habits”, while others countered that “this factor is assessed in a single way, which makes it difficult to accurately reflect students’ pre-studying, with a high degree of uncertainty”. The weight of D1 is significantly higher than that of D2: the upper and lower weights of D1 are 0.80 and 0.75, while those of D2 are 0.25 and 0.20. Nearly all assessors stated that “factor D1 is more challenging than factor D2, testing students’ understanding, application skills, and the solidity of foundational knowledge”. Based on the weight distribution obtained from the I-AHP, students’ comprehensive scores can be calculated to assess their overall course performance: incorporating each student’s performance data and applying the weighting system to each indicator yields the final comprehensive evaluation result.
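A plausible form of this aggregation, consistent with the interval scores reported in Section 4.2 (the paper’s exact formula is not reproduced here), is:

$$ [S^-, S^+] = \Big[ \sum_{i} w_i^{-} x_i ,\; \sum_{i} w_i^{+} x_i \Big], $$

where $x_i$ is a student’s normalized score on indicator $i$ and $[w_i^-, w_i^+]$ is that indicator’s global interval weight, obtained by multiplying the interval weights along its branch of the hierarchy (e.g., D1 inherits the weights of B1 and C3).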
4.2. Analysis of Scores from the I-AHP Model
The implementation of the I-AHP model has provided a more comprehensive and nuanced assessment of students’ scores, while also facilitating the diversification of assessment functions in the teaching process. The comparison of student score distributions is shown in Figure 8. All students met the passing standard (scores of 60 points or above), among whom [50, 53] students fall within the middle score range (between 70 and 79 points). Compared to the previous assessment based on a single score point, the average score assessed by the I-AHP model increased by [1.88, 5.65] points. A detailed analysis of individual student scores reveals that the majority of students achieved higher scores than under the single-score-point assessment, suggesting that the I-AHP method may offer a more comprehensive evaluation of students’ abilities. The I-AHP method refines score estimates by providing a score range instead of a single score point, enabling a more detailed assessment of student performance. In past teaching practice, the assessment function was not fully realized: single examinations often devolve into summative evaluations, which amplify the evaluative function of exams while overshadowing and neglecting other crucial assessment functions. Course assessment should play a pivotal role in evaluation, differentiation, prediction, diagnosis, teaching feedback, and motivation. During the teaching process, educators can formulate more targeted instructional strategies based on existing assessment results, thereby enhancing the enthusiasm and initiative of both teachers and students.
Although the introduction of interval numbers in I-AHP enhances decision-making flexibility, it also presents certain limitations. First, the construction of judgment matrices in I-AHP relies heavily on the subjective judgments of experts; given that experts vary in professional background and experience, assessment results are prone to bias. Second, the incorporation of interval numbers significantly increases computational complexity, and when the number of indicators is large, the efficiency of iterative solution declines markedly. Third, the determination of interval widths requires a careful balance between information retention and decision-making flexibility: setting the interval too wide may obscure key information, while setting it too narrow may restrict flexibility, ultimately undermining the effectiveness of decisions. To address these issues, this study introduces a method that integrates the Random Forest algorithm with K-means clustering analysis to further enhance the accuracy and robustness of the decision analysis.
4.3. Results and Analysis of K-Means Clustering
An evaluation was carried out to examine how retraining the Random Forest model affects the SHAP attribution results: the model was retrained with 10 distinct random seeds, and the mean absolute SHAP values for each feature were computed. The results revealed that, for students in Cluster 1, ‘assignment completion’ stood out as the most important feature in 9 of the 10 retraining runs, while for Cluster 2, ‘final exam’ consistently acted as the most significant negative driving factor across all runs. Moreover, the ranking of SHAP values for the key features remained highly consistent throughout, with a standard deviation below 0.01, providing solid statistical support for the robustness of the attribution conclusions. The integration of SHAP interpretability with K-means clustering delineates three distinct student cohorts characterized by quantifiable divergence in learning patterns, as evidenced by the cluster centroids and Shapley value decomposition. The cluster centers are presented in Table 8, while Figure 9 visually summarizes the SHAP-based feature importance across all clusters. The student distribution reveals Cluster 1 as the largest cohort with 39 students (39.58% of the total), followed by Cluster 2 with 37 students (37.50%), and Cluster 3 as the smallest cohort with 24 students (24.38%).
As shown in Table 9, Cluster 1 displays a balanced performance profile with moderate scores across most indicators, achieving a final exam score of 65.84 and a classroom performance score of 80.11. SHAP analysis reveals that assignment completion is the primary driver of membership in this cluster, overshadowing the influence of final exam scores despite their moderate level. This indicates that Cluster 1 students rely heavily on completing assignments, likely compensating for weaker autonomous learning, as reflected in their chapter self-study score of 86.86. The dominance of assignment completion in Cluster 1’s SHAP profile suggests these students disproportionately rely on task compliance over genuine comprehension, as evidenced by their exam scores relative to their moderate performance in other domains. Cluster 2, with 37 students, demonstrates the lowest final exam performance despite relatively competent scores in other domains, such as chapter tests and online assignments. SHAP attribution identifies the final exam score as the primary differentiator for this cohort, with a high SHAP magnitude, underscoring critical underperformance in knowledge integration during high-stakes assessments. Their chapter self-study score (80.84) is the lowest among the clusters, suggesting deficiencies in independent learning that may contribute to their exam struggles. This cohort’s SHAP profile suggests a reliance on incremental task completion that does not translate effectively into exam outcomes, exposing a gap in knowledge retention. Cluster 3 performed well on all indicators, especially chapter self-study, showing that these students are very good at independent learning; they effectively mastered the course content and achieved better grades in the final exam (73.50). This high level of self-study likely gave them an advantage in preparing for the exam and, therefore, better overall learning outcomes. Notably, while self-study scores are numerically dominant, SHAP attribution ranks the final exam as the second-most influential feature, with exam scores exhibiting stronger differentiation power than chapter tests. This prioritization implies that exam performance better captures this cohort’s knowledge integration than incremental self-study gains, highlighting the model’s capacity to surface non-intuitive feature relationships.
In summary, Cluster 1 students, with average grades and engagement levels, could adopt a simplified flipped classroom model: pre-class micro-lectures (≤10 min) paired with reflection worksheets, in-class structured peer debates using scenario-based prompts, and post-class journals consolidating insights from both phases. Students in Cluster 2 exhibit weaker self-learning abilities and poorer final grades. To address this, instructors may implement a phased approach beginning with structured review tasks (e.g., annotated concept maps or summary tables) to scaffold independent learning, followed by biweekly peer-led study groups guided by instructor-provided discussion templates, with progress reinforced through periodic self-assessment checklists and personalized feedback. For Cluster 3 students, who demonstrate strong academic performance with high assignment completion and online assignment scores but a relatively lower final exam score, brief synthesis-oriented questions can be embedded directly into post-lecture assignments, with subsequent in-class discussions analyzing these questions to emphasize connections between assignment content and broader course concepts.
Table 10 presents the Kruskal–Wallis test results for six academic performance features across the three K-means clusters (k = 3), revealing statistically significant differences (p < 0.05) for all features. The H-statistic values, ranging from 82.65 (online assignments) to 95.25 (classroom performance), indicate varying degrees of discriminatory power among the features, with classroom performance demonstrating the strongest capacity to differentiate student groups, while online assignments exhibit relatively weaker but still significant clustering effects. These findings validate the effectiveness of the K-means clustering approach in identifying distinct student subgroups based on academic behaviors, suggesting that classroom performance serves as a critical indicator for educational differentiation, whereas online assignments may require additional contextual analysis to fully interpret their role in student stratification. The results collectively support the use of these features for targeted teaching interventions, with high-H-statistic features (e.g., classroom performance, assignment completion) warranting priority in designing personalized learning strategies.
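A minimal sketch of this test, assuming a `features` DataFrame with the six dimensions and a `labels` array of cluster assignments (names illustrative):

```python
from scipy.stats import kruskal

for col in features.columns:
    # Split the feature's values by cluster, then run the Kruskal-Wallis H-test
    groups = [features.loc[labels == c, col] for c in sorted(set(labels))]
    H, p = kruskal(*groups)
    print(f"{col}: H = {H:.2f}, p = {p:.4g}")
```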
4.4. Research Limitations
A significant research limitation of this study lies in the inadequate sample size. Specifically, the total number of samples included in this study is 98. A relatively small sample size may undermine the generalizability and robustness of the research findings. It could reduce the reliability of statistical results and limit the ability to accurately represent a broader population and draw far-reaching conclusions. When interpreting the research findings and considering their applicability, this limiting factor of sample size should be taken into account.
5. Conclusions
In this paper, I-AHP was used to evaluate student course performance, and differences in student performance factors were analyzed with K-means clustering and Random Forest classification with SHAP value analysis. On the one hand, I-AHP, based on pedagogical sub-criteria, reduced the influence of subjective factors in the evaluation process and clarified the weight of each evaluation criterion. On the other hand, K-means clustering, combined with Random Forest and SHAP analysis, revealed the internal structure of student performance and helped teachers identify groups of students with similar learning characteristics and achievement levels. Based on the analysis results, teachers can take appropriate measures to enhance students’ learning outcomes.
Based on the analysis, the following conclusions can be drawn: (1) Utilizing K-means clustering to categorize students into three cohorts, combined with Random Forest classification and SHAP analysis to explain cluster membership, helps educators develop targeted teaching strategies. The clustering results allow for the identification of differences in students’ learning abilities, enabling the design of tailored instructional activities that cater to varying ability levels. (2) The clustering results indicate that students in Cluster 1 demonstrate average final grades and classroom performance; Cluster 2 exhibits weaker self-learning abilities and lower final grades; and students in Cluster 3 achieve strong final grades and excel in chapter tests and online assignments, but may benefit from improved synthesis of knowledge for comprehensive assessments.
Currently, there is still a lack of empirical research on the application of this framework in disciplines other than the Environmental Monitoring course, but its potential for cross-disciplinary application is considerable. The interval analytic hierarchy process (I-AHP), with its mechanism of quantifying teaching criteria to reduce subjective evaluation bias, can flexibly cater to the teaching needs of courses in other disciplines, such as medicine and engineering. Through clustering and classification analysis, it can precisely identify differences in ability characteristics among students from different disciplines, in areas such as programming and algorithm skills or humanistic critical thinking. The study found that three types of student groups commonly occur: the balanced-development type, the type with weak autonomous learning abilities, and the type with high grades but insufficient knowledge integration. These findings provide a solid basis for implementing stratified teaching strategies.
In practical applications, the data collection methods must be adjusted to the specific characteristics of each course. For language courses, for example, text analysis can be added and the corresponding evaluation dimensions optimized; for engineering courses, the assessment of drawing and drafting abilities should be strengthened, while ensuring a sufficient amount of data to support in-depth analysis. In the future, multi-disciplinary comparative experiments will be conducted to further verify the framework’s adaptability across disciplines, with the aim of developing a widely applicable teaching optimization plan.