A Hybrid Transformer–Graph Framework for Curriculum Sequencing and Prerequisite Optimization in Computer Science Education with Explainable AI

Awasthi, Ritika; Shukla, Abhinav; Agrawal, Ayush Kumar; Dubey, Parul; Ramasamy, R Kanesaraj

doi:10.3390/a19040308


by Ritika Awasthi ¹, Abhinav Shukla ¹, Ayush Kumar Agrawal ¹, Parul Dubey ²,* and R Kanesaraj Ramasamy ³,*

¹ Department of Information Technology and Computer Science, Dr. C. V. Raman University, Bilaspur 495113, India

² Symbiosis Institute of Technology, Nagpur Campus, Symbiosis International (Deemed University), Pune 440008, India

³ Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63000, Malaysia

* Authors to whom correspondence should be addressed.

Algorithms 2026, 19(4), 308; https://doi.org/10.3390/a19040308

Submission received: 10 March 2026 / Revised: 4 April 2026 / Accepted: 13 April 2026 / Published: 14 April 2026


Abstract

Curriculum redesign in Computer Science and Information Technology has become increasingly complex due to rapid technological advancements, interdisciplinary knowledge requirements, and evolving industry expectations. Recent progress in artificial intelligence, particularly Transformer-based language models, offers new opportunities for data-driven and scalable curriculum analysis. This study utilizes syllabus-level textual datasets collected from multiple universities, comprising structured and unstructured course descriptions across diverse CS and IT programs. The dataset enables semantic representation learning and prerequisite inference while supporting cross-institutional curriculum analysis. We propose a hybrid framework that combines Transformer-based semantic encoding with graph-based prerequisite optimization and constraint-aware curriculum sequencing. The novelty of this work lies in integrating semantic prerequisite discovery, optimization-driven curriculum structuring, and explainable AI within a unified decision-support framework. Experimental results demonstrate that the proposed approach consistently outperforms existing machine learning and deep learning baselines, achieving higher prerequisite prediction accuracy, improved curriculum feasibility, and more coherent course sequencing, thereby offering a scalable and interpretable solution for evidence-based curriculum redesign in higher education.

Keywords:

curriculum redesign; transformer models; prerequisite inference; graph optimization; explainable AI

1. Introduction

Due to the rapid pace of technological advancement, interdisciplinary collaboration and increasing industry needs, curriculum design in CS/IT is becoming more challenging. Modern curricula typically consist of 40–60 courses, with many implicit prerequisite dependencies spread across semesters [1,2]. Mis-sequencing of these dependencies can, in turn, negatively impact learner outcomes through increased cognitive load, fragmentation of comprehension and decreased rates of course completion. At the same time, curriculum data is largely unstructured, with prerequisite relationships often undocumented, implicit, or inconsistently defined across institutions. This lack of structured representation limits the effectiveness of traditional rule-based or expert-driven curriculum design approaches. Consequently, there is a critical need for data-driven, scalable, and interpretable methods that can systematically model curriculum dependencies and support evidence-based curriculum planning [3,4].

1.1. Background and Context

Curriculum sequencing and prerequisite design are central to structuring effective learning progression in CS/IT education. Traditionally, these tasks rely on expert opinion, past practice, and regulatory frameworks, which are subjective and difficult to adapt at the pace required by rapidly changing technology. Recent advances in natural language processing (NLP), particularly Transformer architectures, have shown strong performance in modeling deep semantic relations in educational text, outperforming classical approaches such as LSTM and GRU by 10–18% on contextual representation tasks [5,6]. However, most current applications of Transformers in education are limited to document classification or recommendation and do not address the structural optimization of curricula as systems of knowledge dependencies. Furthermore, the black-box nature of deep models constrains their adoption in academic decision-making, where explanations and justifications are often required.

1.2. Research Problem

Despite the availability of large-scale course and prerequisite datasets, there exists no integrated, explainable, and data-driven framework that can (i) predict latent prerequisite relationships in courses from their content; (ii) optimize curriculum sequencing under both academic and regulatory constraints; and (iii) produce interpretable explanations that are suitable for curriculum committees and accreditation agencies. Most of the current methods are not semantically deep enough, overlook dependencies, and do not provide explainability, limiting their practical application in curriculum reform.

1.3. Proposed Solution and Novelty

To fill this gap, we introduce a hybrid Transformer-guided curriculum sequencing and prerequisite optimization framework enriched by structural modeling and explainable artificial intelligence (XAI). The proposed model uses a Transformer encoder module to learn semantic representations of courses, extracts dependency-strength information from syllabus units, and feeds it into a graph-based optimization layer that iteratively adjusts curriculum scheduling according to constraints.

An XAI module is integrated in order to interpret model decisions with attention analysis, dependency path explanations and feature attribution details for transparency and academic accountability. The novelty of our approach is to treat curriculum design as learned, explainable knowledge dependency optimization and not a static administrative exercise (Figure 1). The end-to-end workflow of the proposed framework outlines the process through which multi-university curriculum data are semantically encoded using Transformer models, transformed into a curriculum dependency graph, and finally optimized to yield coherent and feasible curriculum structures, while explainable AI components provide attention-based or graph-path explanations when making any decision in designing course sequences.

In contrast to current approaches that treat semantic modeling and curriculum optimization as either independent or weakly coupled processes, the proposed framework presents a tightly integrated semantic–structural learning paradigm. It contributes (i) probabilistic prerequisite discovery from unstructured curriculum text, (ii) constraint-aware optimization of curriculum sequencing under academic feasibility requirements, and (iii) explanation alignment linking semantic evidence with structural decisions. This joint integration ensures that semantic inference directly guides structural optimization while both remain interpretable within a single unified framework. This tightly coupled interaction, rather than a mere pipeline of independent modules, represents the main novelty of the proposed approach and distinguishes it from recent hybrid or modular methods.

1.4. Scientific Contributions

The key scientific contributions of this research are as follows:

A novel hybrid AI framework that combines Transformer-based semantic learning with structural dependency modeling for automated curriculum sequencing and prerequisite optimization in CS/IT programs.
An explainable curriculum intelligence mechanism, integrating XAI techniques to provide transparent justification of prerequisite relationships and sequencing decisions for academic stakeholders.
A data-driven evaluation methodology that quantitatively assesses curriculum coherence, prerequisite validity, and optimization effectiveness using real-world university course datasets.

2. Literature Review

Research on higher education curriculum design has gained significant momentum as academic programs become increasingly complex and interdisciplinary, driven by rapid technological advancements and evolving knowledge domains.

2.1. Curriculum Design and Prerequisite Modeling in Higher Education

Historically, curriculum development has been underpinned by expert opinion, accreditation standards, and manual comparisons between institutions [7,8]. Previous studies have explored prerequisite structures through rule-based approaches, expert questionnaires, and curriculum mapping frameworks to validate the established knowledge order and workload weightings [9,10]. However, these methods are slow, prone to bias, and difficult to scale across large programs or multiple institutions. Research has demonstrated the importance of an evidence-based approach to analyzing university curricula in order to identify previously undetected dependencies, redundancies, and misalignments in important domains such as Computer Science and IT.

2.2. Machine Learning Approaches for Curriculum Analysis

Machine learning approaches have used digital academic records and course descriptions to address curriculum problems such as measuring course similarity, predicting prerequisites, and optimizing learner paths. Traditional machine learning methods, such as logistic regression, decision trees, and random forests, have been applied to manually engineered features, for example course metadata or keyword frequencies in the text [11,12,13,14,15]. While these approaches are interpretable and computationally efficient, they struggle to capture deeper semantic relations between courses because they depend heavily on feature engineering [16,17,18].

2.3. Natural Language Processing for Educational Texts

NLP has played a vital role in extracting knowledge from educational texts such as syllabi, learning objectives, and assessment item descriptions [19,20,21]. Early models used bag-of-words approaches, TF–IDF vectorization, and topic modeling to compare courses and discover relationships [22,23]. Although such approaches are effective at detecting surface-level similarity between course pairs, they fail to capture meaning in context and struggle with polysemy and hierarchical knowledge decomposition, which reduces their effectiveness for deriving prerequisites and structuring the syllabus.

2.4. Transformer-Based Semantic Modeling in Education

Transformer-based models have significantly advanced text representation learning, as they are capable of capturing long-range dependencies and contextual semantics [24,25]. They have been applied in recent educational data mining applications, such as course recommendation, learning outcome alignment, and student performance prediction. Compared to traditional models based on recurrent and convolutional layers, Transformers can encode more complex documents [26,27,28]. However, most earlier works address recommendation or prediction tasks rather than combining semantic modeling with formal curriculum structure optimization or academic constraints. Thus, recent work has continued to explore hybrid approaches that combine Transformer-based semantic modeling with the structured dependencies represented in graph-based models of educational data.

2.5. Hybrid Transformer–Graph Models in Educational Systems

Recent studies (2023–2025) combine Transformer-based semantic models with graph neural networks (GNNs) and optimization frameworks to solve structured learning problems. Such methods focus on jointly capturing semantic relationships and structural dependencies, especially in educational data mining and recommendation systems. For example, dynamic graph neural networks and knowledge-graph-based learning frameworks have shown a better ability to capture complex dependencies in educational datasets [29].

Nonetheless, most existing methods rely on predefined or manually generated graph structures and cannot infer latent prerequisite relationships from unstructured course content. As a result, the semantic richness of curriculum data is heavily underutilized in structural optimization tasks.

Simultaneously, attention-based approaches for sequential modeling, like deep LSTM-based architectures with temporal attention, also showcase the ability of these mechanisms in uncovering significant dependency patterns and enhancing interpretability [29]. While these methods are focused on specific domains like time-series and health care, the attention-based modeling of dependencies is closely related to inferring curriculum prerequisites.

In contrast, the proposed framework uniquely integrates Transformer-based semantic encoding, graph construction, constraint-aware optimization, and explainable AI into a unified decision-support system. This integration enables direct coupling between semantic understanding and structural curriculum design, thereby addressing the limitations of existing hybrid or modular approaches.

2.6. Graph-Based Modeling and Optimization of Curricula

Graph representations naturally model curricula as nodes for courses and directed edges for prerequisites [30,31,32]. Previous work has applied graph-theoretic methods for curriculum visualization and topological ordering, where the focus remains on analyzing dependencies [33]. Algorithms have also been proposed to minimize violations of university rules, balance academic credits, and generate feasible semester-wise study plans. However, these methods assume that the prerequisite graph is given; they do not address the automatic identification of prerequisites from unstructured text [34,35].

2.7. Explainable AI in Educational Decision Support

With the rising influence of AI-powered systems on academic planning and policy-making, interpretability has become a desired requirement [36,37]. To overcome these challenges, explainable AI (XAI) approaches, including attention visualization, feature attribution and graph-path explanations, are increasingly employed to foster transparency and user trust in the context of educational applications. Previous research shows the potential of XAI for student-facing systems and learning analytics, but there is minimal work on explainability for curriculum-level decision-making in hybrid semantic–optimization models [38,39].

2.8. Base Paper Review

Sheng et al. (2023) [40] present a GT-PSO-inspired adaptive curriculum sequencing approach that tackles the inherently NP-hard nature of curriculum sequencing in online educational environments. The proposed work defines the problem of curriculum sequencing as a combinatorial optimization problem and shows that GT-PSO can beat a number of traditional metaheuristic algorithms in terms of convergence and fitness on a real-world learning analytics dataset. Although such an approach is mathematically sound and can be effective for personalized sequencing based on the learner’s preference and constraints, it is based on structured curriculum design, not explicitly considering semantic relationships between topics in content. Furthermore, explainability currently focuses on algorithmic behavior and not curriculum-level interpretability; thus, semantic representation learning, graph-based dependency inference, and explainable AI can be incorporated to improve transparency, scalability, and generality among large, heterogeneous higher-education curricula.

2.9. Research Gap and Positioning of the Present Study

As the preceding review shows, there is a well-defined gap between the semantic comprehension of courses and formal curriculum optimization under academic constraints. Existing approaches concentrate either on text similarity without structural verification or on graph-based curriculum planning without automated and interpretable prerequisite discovery. The current study fills this gap with the first unified framework that integrates Transformer-based semantic encoding, graph-based prerequisite optimization, constraint-aware curriculum sequencing, and explainable AI mechanisms for data-driven and transparent curriculum redesign. While recent hybrid Transformer–graph approaches have shown promise, they often rely on predefined graph structures and lack integrated explainability mechanisms for curriculum-level decision support.

3. Dataset Description

In this study, we use a multi-source curriculum dataset designed to capture the semantic content and structural dependencies of Computer Science and Information Technology programs across institutions. The final processed dataset comprises 11,932 course instances collected from seven Indian universities and three international institutions. The corpus focuses on undergraduate CS/IT programs, with approximately 40–60 courses per curriculum.

The dataset distribution is 68% from Indian universities and 32% from international institutions. Course syllabi were gathered from publicly available university webpages and institutional curriculum documents. The curated documents are composed of course titles, descriptions, syllabus units, learning outcomes and credits (and declared prerequisites if provided). In preprocessing, the PDF-based curriculum documents (curricula) were converted into structured machine-readable text by automated parsing pipelines, while manual validation ensured structural consistency across heterogeneous formats. The resulting data were standardized into a unified schema to support semantic encoding, prerequisite inference, and graph-based curriculum sequencing.

All data used in this study were obtained from publicly available academic documents. No personal, confidential, or sensitive information was involved. Therefore, the study did not require human-subject ethical approval. Table 1 summarizes the heterogeneous datasets used in this work (textual syllabus sources, structured prerequisite records, and the inferred curriculum graph), annotated with their data types, key attributes, and respective roles in semantic representation learning, prerequisite inference, and curriculum optimization.

All curriculum documents were collected from publicly available institutional sources and standardized into a unified schema through automated parsing and manual validation.

4. Proposed Methodology

The proposed solution treats curriculum sequencing and prerequisite configuration as a semantic–structural learning problem: course contents are first represented semantically by a Transformer and then passed to dependency estimation, scheduling optimization, and explainability analysis. The method consists of five modular stages. Figure 2 shows the step-by-step process of the proposed framework: curriculum inputs are transformed into semantic representations, dependencies between them are modeled, and the curriculum is structured as a graph, with an explainable AI layer providing logical, structural, and decision-level explanations of the generated course sequences. All variables used in equations are defined immediately after their first occurrence to ensure clarity and consistency.

4.1. Curriculum Text Representation Using Transformer Encoder

Let $C$ in Equation (1) denote the set of courses in a curriculum, where each course $c_{i}$ is represented by the textual sequence $T_{i}$ in Equation (2),

$$C = \{c_{1}, c_{2}, \dots, c_{N}\} \tag{1}$$

$$T_{i} = [t_{1}, t_{2}, \dots, t_{L_{i}}] \tag{2}$$

comprising the course title, description, syllabus units, and learning outcomes.

Each sequence $T_{i}$ is tokenized and mapped into embeddings using a pretrained Transformer encoder. Given token embeddings $E_{i} \in \mathbb{R}^{L_{i} \times d}$, the Transformer computes contextualized representations via multi-head self-attention, as shown in Equation (3).

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{k}}}\right) V \tag{3}$$

The final course embedding $h_{i} \in \mathbb{R}^{d}$ is obtained via mean pooling over the final hidden states $z_{ij}$, as per Equation (4).

$$h_{i} = \frac{1}{L_{i}} \sum_{j = 1}^{L_{i}} z_{ij} \tag{4}$$

These embeddings capture latent semantic information about course content and conceptual depth.
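As a minimal sketch of Equations (3) and (4), the following pure-Python example computes single-head scaled dot-product attention and mean pooling on a toy input. The function names and the toy hidden states are assumptions made for illustration; the actual model uses a pretrained multi-head Transformer encoder rather than this hand-written version.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Equation (3) for a single head; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

def mean_pool(Z):
    """Equation (4): average the L_i hidden states into one d-dimensional vector."""
    L = len(Z)
    return [sum(z[c] for z in Z) / L for c in range(len(Z[0]))]

# Toy hidden states for a 3-token course text with d = 2 (made-up values).
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = attention(Z, Z, Z)   # self-attention: Q = K = V = Z
h = mean_pool(Z)         # course embedding h_i
```

Mean pooling weights every token equally, so long syllabus texts contribute through all of their units rather than through a single summary token.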

4.2. Prerequisite Relationship Inference

Prerequisite inference is formulated as a pairwise dependency prediction task. For any two courses $(c_{i}, c_{j})$, the objective is to estimate the probability that $c_{i}$ is a prerequisite of $c_{j}$.

The feature vector is constructed as shown in Equation (5).

$$x_{ij} = [h_{i} \,\|\, h_{j} \,\|\, |h_{i} - h_{j}|] \tag{5}$$

A feed-forward neural classifier computes the prerequisite likelihood, as shown in Equation (6),

$$p_{ij} = \sigma(W x_{ij} + b) \tag{6}$$

where $\sigma(\cdot)$ is the sigmoid function, and $W$ and $b$ are the classifier weights and bias.

The loss function is defined as shown in Equation (7),

$$L_{pr} = - \sum_{(i, j)} \left[ y_{ij} \log p_{ij} + (1 - y_{ij}) \log (1 - p_{ij}) \right] \tag{7}$$

where $y_{ij} \in \{0, 1\}$ denotes the ground-truth prerequisite relationship. Feature interaction refers to the combined representation of course pairs using concatenation, the difference vector $|h_{i} - h_{j}|$, and the element-wise product $h_{i} \odot h_{j}$ (the latter appears in Step 3 of Algorithm 1) to capture relational semantics.
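The pairwise formulation in Equations (5)–(7) can be sketched as follows. This is an illustrative example under stated assumptions: the helper names, the `with_product` flag, and the toy weights are hypothetical, and a single linear layer stands in for the feed-forward classifier.

```python
import math

def pair_features(h_i, h_j, with_product=False):
    """Equation (5): [h_i || h_j || |h_i - h_j|]; optionally append h_i * h_j."""
    diff = [abs(a - b) for a, b in zip(h_i, h_j)]
    x = list(h_i) + list(h_j) + diff
    if with_product:  # element-wise product, as in Step 3 of Algorithm 1
        x += [a * b for a, b in zip(h_i, h_j)]
    return x

def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def prerequisite_probability(x, W, b):
    """Equation (6) with a single linear layer: sigma(W x + b)."""
    return sigmoid(sum(w * v for w, v in zip(W, x)) + b)

def bce_loss(probs, labels, eps=1e-12):
    """Equation (7): summed binary cross-entropy over course pairs."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))

# Toy embeddings and an untrained (all-zero) classifier; values are made up.
x = pair_features([1.0, 0.0], [0.0, 1.0])
p = prerequisite_probability(x, W=[0.0] * len(x), b=0.0)
loss = bce_loss([p], [1])
```

Because the feature vector is order-dependent in its first two blocks, the classifier can assign different probabilities to $(c_{i}, c_{j})$ and $(c_{j}, c_{i})$, which is what makes the inferred relation directional.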

4.3. Curriculum Graph Construction

The curriculum is modeled as a directed weighted graph, as given in Equation (8),

$$G = (V, E, W) \tag{8}$$

where:

$V = C$ represents courses;

$E = \{(c_{i}, c_{j})\}$ represents prerequisite edges;

$W = \{p_{ij}\}$ denotes dependency strengths.

Edges are retained if the condition in Equation (9) is satisfied,

$$p_{ij} \geq \tau \tag{9}$$

where $\tau$ is a confidence threshold.

This graph encodes both explicit and inferred prerequisite relationships, enabling structural analysis and optimization.
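A small sketch of the thresholding rule in Equation (9) is given below. The probability matrix and the value of the threshold are made-up assumptions for illustration only; the edge dictionary is one possible representation of $(E, W)$.

```python
def build_graph(P, tau):
    """Return {(i, j): p_ij} for ordered pairs i != j with p_ij >= tau."""
    n = len(P)
    return {(i, j): P[i][j]
            for i in range(n) for j in range(n)
            if i != j and P[i][j] >= tau}

# Toy prerequisite probability matrix for three courses (made-up values).
P = [[0.00, 0.90, 0.20],
     [0.10, 0.00, 0.70],
     [0.05, 0.30, 0.00]]
edges = build_graph(P, tau=0.5)  # tau = 0.5 is an assumed threshold
```

Raising the threshold yields a sparser graph with fewer but more confident prerequisite edges.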

4.4. Constraint-Aware Curriculum Sequencing Optimization

Curriculum sequencing is formulated as an optimization problem. Equation (10) denotes the semester assignment of course $c_{i}$.

$$s_{i} \in \{1, 2, \dots, S\} \tag{10}$$

The objective is to minimize prerequisite violations, as per Equation (11),

$$\min \sum_{(i, j) \in E} p_{ij} \cdot \max(0, s_{i} - s_{j}) \tag{11}$$

subject to the academic constraints in Equations (12)–(14).

Semester load constraint:

$$\sum_{i : s_{i} = k} \mathrm{credits}_{i} \leq C_{\max}, \quad \forall k \tag{12}$$

Prerequisite precedence:

$$s_{i} < s_{j} \quad \forall (c_{i}, c_{j}) \in E \tag{13}$$

Program duration constraint:

$$1 \leq s_{i} \leq S \tag{14}$$

The optimization is solved using iterative constraint relaxation and greedy refinement, producing an optimized semester-wise curriculum layout.
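To make the objective and constraints concrete, the sketch below evaluates Equation (11) and checks Equations (12)–(14) for a given semester assignment. The function names and the toy data are assumptions for illustration; this is an evaluator only, not the solver itself.

```python
def violation_cost(edges, s):
    """Equation (11): sum of p_ij * max(0, s_i - s_j) over prerequisite edges."""
    return sum(p * max(0, s[i] - s[j]) for (i, j), p in edges.items())

def is_feasible(edges, s, credits, c_max, num_semesters):
    """Check Equations (12)-(14) for the assignment s (course -> semester)."""
    loads = {}
    for i, sem in s.items():
        if not 1 <= sem <= num_semesters:            # Eq. (14): program duration
            return False
        loads[sem] = loads.get(sem, 0) + credits[i]
    if any(load > c_max for load in loads.values()):  # Eq. (12): semester load
        return False
    return all(s[i] < s[j] for (i, j) in edges)       # Eq. (13): precedence

# Toy instance: three courses in a chain 0 -> 1 -> 2 (made-up values).
edges = {(0, 1): 0.9, (1, 2): 0.7}
credits = {0: 4, 1: 4, 2: 4}
good = {0: 1, 1: 2, 2: 3}
bad = {0: 3, 1: 2, 2: 1}
cost_good = violation_cost(edges, good)
cost_bad = violation_cost(edges, bad)
```

The weighted cost penalizes confident edges more heavily, so a schedule that inverts a strong prerequisite is ranked worse than one that inverts a weak one by the same number of semesters.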

4.5. Explainable AI (XAI) Module

To ensure interpretability, an XAI layer is integrated at two levels:

4.5.1. Attention-Based Explanation

Transformer attention weights, $\alpha_{ij}$, are analyzed to identify syllabus units contributing most to prerequisite inference, as shown in Equation (15).

$$\alpha_{ij} = \mathrm{softmax}\left(\frac{Q_{i} K_{j}^{\top}}{\sqrt{d}}\right) \tag{15}$$

This provides token- and unit-level explanations.
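One way to move from token-level to unit-level explanations is to aggregate attention mass per syllabus unit. The sketch below assumes that each token carries a unit label and that summation is the aggregation rule; both are illustrative assumptions, since the aggregation method is not specified here.

```python
from collections import defaultdict

def unit_attention(token_weights, token_units):
    """Sum token-level attention per syllabus unit and rank units by total mass."""
    totals = defaultdict(float)
    for w, u in zip(token_weights, token_units):
        totals[u] += w
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Toy attention weights over four tokens and their (hypothetical) unit labels.
weights = [0.1, 0.4, 0.3, 0.2]
units = ["Unit 1", "Unit 2", "Unit 2", "Unit 3"]
ranking = unit_attention(weights, units)
```

Summation favors longer units; averaging per unit would be a reasonable alternative when unit lengths differ widely.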

4.5.2. Graph-Path Explanation

For any inferred prerequisite path $c_{i} \to c_{j}$, the explanation score is computed as per Equation (16),

$$\mathrm{Explain}(i, j) = \prod_{(u, v) \in P_{ij}} p_{uv} \tag{16}$$

where $P_{ij}$ denotes the set of edges on the path from $c_{i}$ to $c_{j}$. This score highlights the strongest conceptual dependency chains.

It is important to note that the proposed explainability mechanisms are post hoc in nature, relying on attention weights and structural attribution to interpret model decisions. While these approaches enhance transparency and provide meaningful insights into model behavior, they do not guarantee causal explanations. The distinction between interpretability and causality remains an important consideration, and developing inherently interpretable or causality-aware models represents a direction for future research.
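The path score in Equation (16) can be sketched as a product of edge probabilities along a path. The path representation (a list of course indices) and the toy weights are assumptions made for illustration.

```python
import math

def explain_score(path, weights):
    """Equation (16): product of p_uv over consecutive edges (u, v) of the path."""
    return math.prod(weights[(u, v)] for u, v in zip(path, path[1:]))

# Toy dependency chain 0 -> 1 -> 2 with made-up edge probabilities.
weights = {(0, 1): 0.9, (1, 2): 0.7}
score = explain_score([0, 1, 2], weights)
```

Because every factor is at most 1, longer chains receive lower scores unless each link is strong, which matches the intent of surfacing the strongest dependency chains.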

The proposed optimization strategy is based on greedy refinement and heuristic course reallocation; this design choice stems from the computational complexity of curriculum sequencing, which is NP-hard by nature. Greedy heuristics of this kind can become trapped in local minima. The adopted approach nevertheless provides a pragmatic and scalable approximation strategy that generates feasible semester-wise schedules by minimizing prerequisite violations. Consistent performance and stable convergence were observed in each cross-validation fold, indicating that the heuristic optimization behaves reliably in practice. In addition, the greedy refinement process allows for iterative improvement at minimal computational cost, which is crucial in real-world curriculum planning, where scalability and responsiveness are important. In practice, cycle resolution is guided by dependency confidence: weaker or less clear prerequisite links are removed, while those with pedagogical meaning are kept.

The interplay of Algorithm 1 (Transformer-based semantic encoding and prerequisite likelihood estimation) and Algorithm 2 (graph-based curriculum optimization) is shown in Figure 3. The Transformer receives the input and outputs course embeddings and pairwise prerequisite probabilities that can be transformed into weighted graph edges using a threshold-based interface. The obtained curriculum graph is then optimized under academic constraints to generate semester plans that are executable, and the embedded explainable AI layer furnishes attention-based, path-level, and decision-level explanations that connect semantic evidence with structural curriculum decisions.

Algorithm 1: Transformer-based course semantic encoding and prerequisite likelihood estimation.

Input:

Courses, $C = \{c_{1}, \dots, c_{N}\}$;
Course texts, $T = \{T_{1}, \dots, T_{N}\}$ (title, description, units, outcomes);
Pretrained Transformer encoder, $f_{\theta}$;
Dependency classifier, $g_{\phi}$;
Max token length, $L_{\max}$.

Output:

Course embeddings, $H = \{h_{1}, \dots, h_{N}\}$;
Prerequisite probability matrix, $P \in [0,1]^{N \times N}$.

Steps:
Step 1 (Text Preparation). For each course $c_{i}$, concatenate the textual fields to form $T_{i}$. Tokenize $T_{i}$ and truncate/pad to $L_{\max}$.
Step 2 (Semantic Encoding). Compute contextual representations using the Transformer:

$$Z_{i} = f_{\theta}(T_{i})$$

Obtain a single course embedding by pooling (mean or [CLS]):

$$h_{i} = \mathrm{Pool}(Z_{i})$$

Step 3 (Pairwise Feature Construction). For each ordered pair $(c_{i}, c_{j})$, build:

$$x_{ij} = [h_{i} \,\|\, h_{j} \,\|\, |h_{i} - h_{j}| \,\|\, (h_{i} \odot h_{j})]$$

where $\|$ denotes concatenation, and $\odot$ is the element-wise product.
Step 4 (Prerequisite Likelihood Estimation). Compute:

$$p_{ij} = \sigma(g_{\phi}(x_{ij}))$$

Store $p_{ij}$ in $P[i, j]$. Optionally, set $p_{ii} = 0$.
Step 5 (Output). Return $H$ and $P$.

Algorithm 2: Graph-based constraint-aware curriculum sequencing and prerequisite optimization.

Input:

Courses, $C$, with credits, $\mathrm{cr}_{i}$;
Prerequisite probability matrix, $P$;
Threshold, $\tau$ (edge inclusion);
Number of semesters, $S$;
Semester credit cap, $C_{\max}$;
Constraint set, $\Omega$ (e.g., hard prereqs, co-reqs, fixed-semester courses).

Output:

Optimized semester assignment, $S^{*} = \{s_{1}, \dots, s_{N}\}$;
Optimized curriculum graph, $G^{*} = (V, E^{*}, W^{*})$.

Steps:
Step 1 (Graph Construction). Create the directed weighted graph $G = (V, E, W)$, where $V = C$ and

$$E = \{(i, j) \mid p_{ij} \geq \tau\}, \quad W(i, j) = p_{ij}$$

Apply the constraints $\Omega$ to (i) force mandatory edges and (ii) remove forbidden edges.
Step 2 (Cycle Resolution). While $G$ contains a directed cycle, remove the minimum-weight edge in the cycle:

$$(i^{*}, j^{*}) = \arg\min_{(i, j) \in \text{cycle}} W(i, j)$$

Update $E \leftarrow E \setminus \{(i^{*}, j^{*})\}$. This yields an acyclic graph.
Step 3 (Initial Feasible Sequencing). Compute a topological ordering of the DAG. Assign semesters using earliest-feasible placement:

$$s_{i} = 1 + \max_{(k, i) \in E} s_{k}$$

Clip to $S$ if needed and then repair using Step 4.
Step 4 (Credit-Constrained Packing and Repair). For semester $t = 1$ to $S$:

● Let $A_{t} = \{i \mid s_{i} = t\}$. If $\sum_{i \in A_{t}} \mathrm{cr}_{i} > C_{\max}$, move the lowest-priority course(s) forward to the next feasible semester that preserves all prerequisite precedence.
○ Priority can be based on (a) out-degree, (b) centrality, or (c) lower total incoming weight.

Repeat until all semesters satisfy credit caps and precedence constraints.
Step 5 (Objective Refinement). Iteratively reduce prerequisite violations using greedy refinement and local optimization heuristics. Minimize:

$$J(S) = \sum_{(i, j) \in E} W(i, j) \cdot \max(0, s_{i} - s_{j})$$

Accept a move if it decreases $J$ and maintains $\Omega$ and the credit constraints.
Step 6 (Output). Return the final semester assignment, $S^{*}$, and the optimized graph, $G^{*} = (V, E^{*}, W^{*})$.
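Steps 2–4 of Algorithm 2 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the function names and toy data are hypothetical, priority is taken to be out-degree only, and the constraint set $\Omega$, the clipping to $S$ in Step 3, and the refinement in Step 5 are omitted.

```python
def find_cycle(nodes, edges):
    """Return the edge list of one directed cycle, or None if the graph is acyclic."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on DFS stack, 2 = finished
    stack = []

    def dfs(u):
        state[u] = 1
        stack.append(u)
        for v in adj[u]:
            if state[v] == 1:  # back edge closes a cycle
                cyc = stack[stack.index(v):] + [v]
                return list(zip(cyc, cyc[1:]))
            if state[v] == 0:
                found = dfs(v)
                if found:
                    return found
        stack.pop()
        state[u] = 2
        return None

    for n in nodes:
        if state[n] == 0:
            found = dfs(n)
            if found:
                return found
    return None

def resolve_cycles(nodes, edges):
    """Step 2: drop the minimum-weight edge of each cycle until none remain."""
    edges = dict(edges)
    while (cycle := find_cycle(nodes, edges)) is not None:
        del edges[min(cycle, key=edges.get)]
    return edges

def earliest_semesters(nodes, edges):
    """Step 3: s_i = 1 + max semester of its prerequisites (1 if it has none)."""
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    indeg = {n: len(preds[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    s = {}
    while ready:  # Kahn-style topological traversal
        u = ready.pop()
        s[u] = 1 + max((s[k] for k in preds[u]), default=0)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return s, succs

def repair_credits(s, succs, credits, c_max, num_semesters):
    """Step 4: push courses forward until each semester respects the credit cap."""
    s = dict(s)

    def push(i, sem):
        if sem > num_semesters:
            raise ValueError("no feasible schedule within the program duration")
        s[i] = sem
        for v in succs[i]:  # keep every successor strictly later
            if s[v] <= sem:
                push(v, sem + 1)

    for t in range(1, num_semesters + 1):
        while sum(credits[i] for i in s if s[i] == t) > c_max:
            in_t = [i for i in s if s[i] == t]
            # Assumption: lowest priority = smallest out-degree.
            push(min(in_t, key=lambda i: len(succs[i])), t + 1)
    return s

# Toy instance with one cycle 0 -> 1 -> 2 -> 0 (made-up probabilities).
nodes = [0, 1, 2, 3]
edges = {(0, 1): 0.9, (1, 2): 0.7, (2, 0): 0.55, (0, 3): 0.6}
credits = {0: 4, 1: 4, 2: 4, 3: 4}
dag = resolve_cycles(nodes, edges)
s0, succs = earliest_semesters(nodes, dag)
plan = repair_credits(s0, succs, credits, c_max=4, num_semesters=4)
```

In this toy instance the weakest edge of the cycle is removed first, and the repair step only ever moves courses to later semesters, so precedence established in Step 3 is preserved.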

5. Experimental Setup

To rigorously evaluate the proposed hybrid Transformer → Graph Optimization framework, we adopt a 5-fold cross-validation (CV) protocol over the multi-university curriculum corpus to ensure stability and generalizability across institutions. To prevent information leakage arising from graph-structured dependencies, the dataset was partitioned using an edge-aware splitting strategy. Specifically, courses were grouped into dependency clusters based on prerequisite relationships, and these clusters were assigned to training, validation, and test sets to minimize cross-set dependency overlap. This mitigates information leakage and gives a more realistic performance evaluation of models that infer prerequisites or sequence courses based on ground-truth prerequisites. Cluster-based partitioning is critical for curriculum graphs, where splitting nodes at random may cause implicit leakage when courses in different sets share prerequisite relationships. Performance is reported as mean ± standard deviation across folds, supported where relevant by statistical significance testing with respect to strong baselines. The model was implemented in Python (version 3.10) with PyTorch (version 2.1) and trained in an NVIDIA GPU-enabled environment.
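The edge-aware split can be illustrated with a short sketch. It assumes that dependency clusters are the weakly connected components of the prerequisite graph and that clusters are assigned greedily to the currently smallest fold; both choices, as well as the function names, are assumptions for illustration rather than the exact procedure used in the experiments.

```python
def dependency_clusters(courses, prereq_edges):
    """Group courses into weakly connected components using union-find."""
    parent = {c: c for c in courses}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for u, v in prereq_edges:
        parent[find(u)] = find(v)
    clusters = {}
    for c in courses:
        clusters.setdefault(find(c), []).append(c)
    return list(clusters.values())

def assign_folds(clusters, k):
    """Place whole clusters into folds, largest first, into the smallest fold."""
    folds = [[] for _ in range(k)]
    for cluster in sorted(clusters, key=len, reverse=True):
        min(folds, key=len).extend(cluster)
    return folds

# Toy example with hypothetical course labels.
courses = ["A", "B", "C", "D", "E", "F"]
prereq_edges = [("A", "B"), ("B", "C"), ("D", "E")]
clusters = dependency_clusters(courses, prereq_edges)
folds = assign_folds(clusters, k=2)
```

Since clusters are never split, no prerequisite edge has its endpoints in two different folds, which is the property that prevents leakage.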

Model Configuration and Training Details

We employ a pretrained RoBERTa-base model as the semantic encoder, consisting of 12 Transformer layers, a hidden size of 768, and 12 attention heads. The model is fine-tuned on the curriculum corpus for prerequisite inference. The training configuration is as follows: learning rate = 2 × 10⁻⁵, batch size = 16, maximum sequence length = 512 tokens, and optimizer = AdamW. The model is trained for up to 10 epochs with early stopping (patience = 2) based on validation loss. A dropout rate of 0.1 is applied to reduce overfitting.
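The early-stopping rule (patience = 2 on validation loss, at most 10 epochs) can be sketched independently of the training framework. The helper name and the loss values below are hypothetical and serve only to show when training would halt.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training stops for the given loss history."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0     # improvement: reset the counter
        else:
            bad_epochs += 1                # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return len(val_losses)                 # ran through all available epochs

# Made-up validation losses for up to 10 epochs.
history = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60, 0.58, 0.57, 0.56, 0.55]
stop_epoch = early_stopping_epoch(history, patience=2)
```

With these values, training halts after two consecutive epochs without improvement, even though later epochs would have improved; this is the usual trade-off of a small patience value.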

To improve generalization on the relatively small curriculum corpus, we employ regularization through dropout, early stopping, and L2 weight decay with λ = 1 × 10⁻⁵.

In addition, we conduct a comprehensive ablation study to quantify the contribution of each module (Transformer encoding, dependency classifier features, cycle handling, constraint optimization, and XAI layer). Finally, explanation quality is assessed using both quantitative XAI faithfulness metrics (such as the effect of deletion/occlusion on the prediction) and human-centric evaluations (expert ratings of the usefulness and coherence of explanations), so that the evaluation covers predictive quality, structural validity, and interpretability. Table 2 summarizes the parameter settings, model configurations, and training strategies of the key setups, together with the specifications of the computational environment, providing the minimum implementation-level detail needed to reproduce the proposed Transformer–graph-based curriculum redesign framework.
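A deletion-based faithfulness check of the kind mentioned above can be sketched as follows. The scoring function `toy_predict` is a hypothetical stand-in for the trained classifier, and the tokens and attribution values are made up; the sketch only shows how the prediction drop is measured.

```python
def deletion_drop(tokens, attributions, predict, k):
    """Drop in predicted probability after deleting the k highest-attributed tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    removed = set(ranked[:k])
    kept = [t for i, t in enumerate(tokens) if i not in removed]
    return predict(tokens) - predict(kept)

# Hypothetical stand-in for the trained model: scores a token list by keyword hits.
KEYWORDS = {"recursion", "trees"}

def toy_predict(tokens):
    return min(1.0, 0.25 * sum(t in KEYWORDS for t in tokens))

tokens = ["intro", "recursion", "trees", "lab"]
attributions = [0.05, 0.50, 0.40, 0.05]
drop_top = deletion_drop(tokens, attributions, toy_predict, k=2)
# Control: delete the two lowest-attributed tokens instead.
drop_low = deletion_drop(tokens, [-a for a in attributions], toy_predict, k=2)
```

A faithful explanation should produce a larger drop when its top-ranked tokens are deleted than when low-ranked tokens are deleted, as in this toy case.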

6. Results and Discussion

The experimental evaluation is organized into six progressive stages: (i) preprocessing analysis, (ii) individual model performance, (iii) convergence and computational efficiency, (iv) full-model comparison, (v) statistical validation and ablation analysis, and (vi) explainability assessment. All results are reported using 5-fold cross-validation as mean ± standard deviation.

6.1. Preprocessing and Dataset Conditioning Results

Preprocessing helps reduce noise, normalize feature distributions, and facilitate stable learning. Feature normalization and outlier removal substantially reduced skewness and kurtosis, improving numerical stability for attention-based training. Preprocessing also yielded a more balanced class ratio, which in turn helped keep recall consistent across folds. Table 3 reports the dataset statistics before and after preprocessing, reflecting the effect of text normalization, unit-level segmentation, and filtering on consistency across curriculum sources.

6.2. Individual Model Performance (Prediction Capability)

To ensure a comprehensive comparison with recent state-of-the-art approaches, we extend our baseline set to include representative Transformer-based models (e.g., BERT), graph neural networks (GCNs), and hybrid Transformer–GNN architectures reported in the recent literature. These models reflect advances in semantic representation learning and structured dependency modeling. While direct implementation of all hybrid variants is beyond the scope of this study, their reported performance trends are considered for contextual comparison. This allows us to position the proposed framework against both semantic-only and structure-aware learning paradigms.

This section analyzes the prediction models on their own, without graph optimization. Figure 4 and Table 4 compare the proposed Transformer-based hybrid method with traditional machine learning and deep learning baselines in terms of accuracy, F1-score, and AUC. The proposed model outperforms the implemented baselines on all evaluation metrics and shows the smallest variation across folds, indicating that integrating semantic learning with structured prerequisite modeling and optimization is effective. It also achieves lower log loss and higher AUC, indicating better-calibrated probabilities and stronger discriminative capability. The “Transformer (baseline)” refers to a standalone semantic classifier without graph-based optimization, whereas the proposed model integrates Transformer-based prediction with constraint-aware curriculum optimization.
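To make the distinction between the two probabilistic metrics concrete, the following sketch computes binary log loss and ROC AUC from scratch. The labels and probabilities are hypothetical; the point is only that two classifiers can rank pairs identically (same AUC) while differing in the confidence of their probabilities (different log loss).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical course-pair labels (1 = prerequisite) and predicted probabilities.
y_true = [1, 1, 0, 0]
confident = [0.90, 0.80, 0.20, 0.10]
cautious = [0.60, 0.55, 0.45, 0.40]

for name, probs in [("confident", confident), ("cautious", cautious)]:
    print(name, round(roc_auc(y_true, probs), 3), round(log_loss(y_true, probs), 3))
```

Both hypothetical classifiers separate the classes perfectly, so their AUC is identical, yet the more confident one has the lower log loss. This is why the two metrics are reported side by side.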

State-of-the-art architectures such as BERT-based models, graph neural networks (GCNs), and hybrid Transformer–GNN frameworks have been applied successfully in many structured learning and educational data mining tasks, but implementing them directly within this study is limited by several practical and methodological factors. First, many of these models require large-scale, high-quality annotated graph datasets and specialized training pipelines; such data do not currently exist for curriculum-level prerequisite modeling across heterogeneous institutions.

Second, and more importantly, the main aim of this work is not only predictive performance but also a unified, interpretable framework that combines semantic prerequisite inference with constraint-aware curriculum optimization under controlled, consistent experimental conditions. Implementing multiple hybrid architectures with very different training configurations could severely limit comparability between models. We therefore use representative baseline models to maintain a standardized evaluation protocol and incorporate performance reported in the recent literature as a contextual benchmark. This maintains methodological rigor while allowing the proposed framework to be placed clearly within the evolving landscape of semantic–structural learning systems.

Although recent hybrid architectures, such as Transformer–GNN models, have demonstrated strong performance in structured learning tasks, their implementation requires specialized graph training pipelines and large-scale annotated datasets. Given the focus of this study on integrating semantic prerequisite inference with constraint-aware curriculum optimization, we prioritize a controlled comparison using consistent experimental settings. Nevertheless, the inclusion of these models highlights the relevance of our approach in relation to current state-of-the-art developments. The reported performance ranges for BERT, GCN, and hybrid Transformer–GNN models are derived from recent studies on educational data mining and structured learning tasks [8,24,25,26,27,30,31,32,33,34,38].

6.3. Convergence and Computational Efficiency Analysis

Training and inference efficiency are critical for real-world adoption. Table 5 summarizes computational performance and convergence behavior for the proposed method and the baseline models, including per-epoch training time, epochs to convergence, total training time, inference time, and peak GPU memory. Although the proposed model is slightly more expensive per epoch, it converges in fewer epochs, resulting in the lowest overall training time. The inference overhead remains acceptable for decision-support systems.
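The trade-off between per-epoch cost and convergence speed reduces to a simple product. The sketch below multiplies the per-epoch time by the epochs to converge for each row of Table 5 (values copied from the table, units as printed there) and ranks the models. The products agree with the tabulated totals to within about 0.1.

```python
# (per-epoch training time, epochs to converge) as listed in Table 5.
table5 = {
    "LSTM": (1.82, 38),
    "GRU": (1.64, 35),
    "Temporal CNN": (1.35, 42),
    "Transformer (Baseline)": (2.11, 28),
    "Proposed Transformer": (2.26, 24),
}

# Total cost = per-epoch cost x number of epochs.
totals = {name: per_epoch * epochs for name, (per_epoch, epochs) in table5.items()}

for name, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {total:6.2f}")

cheapest = min(totals, key=totals.get)
print("lowest total:", cheapest)
```

The proposed model has the highest per-epoch cost in the table but the smallest product, because it needs the fewest epochs.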

6.4. Full Pipeline Performance (Prediction + Optimization)

This section evaluates end-to-end curriculum sequencing quality. Figure 5 illustrates the comparative performance of different curriculum sequencing approaches in terms of violation score, $J(S)$, and the percentage of feasible schedules. The proposed full model achieves the lowest violation cost and the highest feasibility rate, demonstrating the effectiveness of integrating Transformer-based prerequisite inference with constraint-aware graph optimization. Table 6 shows the end-to-end curriculum optimization performance of our proposed framework and comparison methods, reporting prerequisite violation cost, sequencing coherence and feasibility (%) to illustrate that the constraint-aware graph optimization is effective. The hybrid integration of semantic prediction and graph optimization significantly reduces prerequisite violations and improves semester load balance.
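For reference, the violation objective and feasibility constraints defined in Table 2 can be evaluated for a candidate schedule as in the sketch below. The function names, course labels, edge weights, credits, and credit cap are hypothetical; only the formulas follow the table.

```python
def violation_cost(schedule, edges):
    """J(S) = sum over edges (i, j) of W(i, j) * max(0, s_i - s_j)."""
    return sum(w * max(0, schedule[i] - schedule[j]) for (i, j), w in edges.items())


def is_feasible(schedule, credits, n_semesters, credit_cap, must_edges):
    """Check constraints (i)-(iii): semester range, credit cap, hard precedence."""
    if any(not 1 <= s <= n_semesters for s in schedule.values()):
        return False
    load = {}
    for course, s in schedule.items():
        load[s] = load.get(s, 0) + credits[course]
    if any(total > credit_cap for total in load.values()):
        return False
    return all(schedule[i] < schedule[j] for i, j in must_edges)


# Hypothetical inferred edges (prerequisite -> course) with confidence weights.
edges = {("PROG", "DS"): 0.8, ("DS", "ALGO"): 0.9, ("DM", "ALGO"): 0.6}
credits = {"PROG": 4, "DS": 4, "DM": 4, "ALGO": 4}
must = [("PROG", "DS"), ("DS", "ALGO")]

good = {"PROG": 1, "DM": 1, "DS": 2, "ALGO": 3}
bad = {"PROG": 1, "DM": 3, "DS": 2, "ALGO": 2}

j_good = violation_cost(good, edges)
j_bad = violation_cost(bad, edges)
ok_good = is_feasible(good, credits, n_semesters=4, credit_cap=8, must_edges=must)
ok_bad = is_feasible(bad, credits, n_semesters=4, credit_cap=8, must_edges=must)
print(j_good, ok_good, j_bad, ok_bad)
```

In the second schedule, "DM" is placed one semester after "ALGO", which contributes 0.6 to $J(S)$. Note that, as the objective is written, two courses in the same semester incur no penalty, so strict precedence is enforced only through the hard constraint on $E_{must}$.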

6.5. Statistical Significance and Effect Size

Table 7 presents statistical significance testing results over 5-fold cross-validation by comparing baseline models with our proposed framework using paired tests to confirm that the performance gains are statistically significant. We note that statistical comparisons between models focus on prediction performance, while structural improvements are evaluated separately through curriculum optimization metrics.
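A paired comparison over folds can be sketched with the standard library alone. The per-fold F1 values below are hypothetical and do not reproduce Table 7, and the effect size is computed as the mean paired difference divided by the standard deviation of the differences, which is one common paired variant of Cohen's d; the text does not state which variant was used.

```python
import math
import statistics


def paired_t_and_effect_size(a, b):
    """Return (t statistic, paired Cohen's d, degrees of freedom) for matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    cohen_d = mean_d / sd_d
    return t_stat, cohen_d, n - 1


# Hypothetical per-fold F1 scores for two models evaluated on the same 5 folds.
proposed = [0.87, 0.88, 0.86, 0.87, 0.88]
baseline = [0.86, 0.86, 0.85, 0.86, 0.87]

t_stat, cohen_d, dof = paired_t_and_effect_size(proposed, baseline)
print(round(t_stat, 3), round(cohen_d, 3), dof)
```

The p-value follows from the t distribution with $n - 1$ degrees of freedom; in practice a library routine such as `scipy.stats.ttest_rel` (or `scipy.stats.wilcoxon` when normality is doubtful) would typically be used. With only five folds, such tests have limited power, so effect sizes are a useful complement.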

6.6. Ablation Study (Component Contribution)

Figure 6 presents a component-wise ablation study of the proposed framework. Subfigure (a) reports the impact of individual components on prerequisite prediction performance in terms of the macro-F1-score. Subfigure (b) illustrates the corresponding effect on curriculum sequencing quality measured by the violation objective, $J(S)$, while subfigure (c) shows the impact on the percentage of feasible schedules. The results highlight the critical role of Transformer-based semantic encoding for prediction accuracy and the importance of cycle resolution and constraint handling for achieving feasible and coherent curriculum structures. The attention mechanism and constraint-aware optimization contribute most to performance stability and curriculum feasibility. Table 8 shows the ablation study details.
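The cycle-resolution step ablated here (Table 2: remove the minimum-weight edge in each detected cycle until none remain) can be sketched as follows. Cycle detection by depth-first search is an assumption, since the text does not name the detection method, and the graph below is hypothetical.

```python
def find_cycle(nodes, edges):
    """Return one directed cycle as a list of edges, or None if the graph is acyclic."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    stack = []  # nodes on the current DFS path

    def dfs(u):
        color[u] = GREY
        stack.append(u)
        for v in adj[u]:
            if color[v] == GREY:  # back edge closes a cycle
                path = stack[stack.index(v):] + [v]
                return list(zip(path, path[1:]))
            if color[v] == WHITE:
                cycle = dfs(v)
                if cycle:
                    return cycle
        stack.pop()
        color[u] = BLACK
        return None

    for n in nodes:
        if color[n] == WHITE:
            cycle = dfs(n)
            if cycle:
                return cycle
    return None


def resolve_cycles(nodes, weights):
    """Drop the lowest-weight edge of each detected cycle until the graph is a DAG."""
    weights = dict(weights)  # do not mutate the caller's graph
    removed = []
    while True:
        cycle = find_cycle(nodes, weights)
        if cycle is None:
            return weights, removed
        weakest = min(cycle, key=lambda e: weights[e])
        removed.append(weakest)
        del weights[weakest]


# Hypothetical inferred graph with one spurious back edge (C -> A).
nodes = ["A", "B", "C", "D"]
weights = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.3, ("C", "D"): 0.7}

dag, removed = resolve_cycles(nodes, weights)
print(removed, sorted(dag))
```

Removing the least confident edge keeps the high-probability prerequisites intact. Disabling this step (ablation A4) leaves cyclic dependencies, for which no valid semester ordering exists.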

6.7. Explainable AI (XAI) Evaluation

Table 9 presents the results of the explainable AI (XAI) evaluation, combining quantitative faithfulness measures with human expert judgments to assess whether the proposed curriculum redesign framework is transparent, trustworthy, and interpretable in practice. High faithfulness and expert agreement indicate that the explanations are not only interpretable but also actionable for curriculum designers.
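The deletion-based faithfulness protocol can be sketched with a toy predictor. Everything below is hypothetical: `predict` is a small keyword-based logistic scorer standing in for the fine-tuned classifier, the importance scores stand in for attention-rollout output, and the drop is expressed relative to the original probability, which is an assumption about how the percentage is defined.

```python
import math

# Hypothetical stand-in for the fine-tuned classifier (NOT the paper's model).
KEYWORD_WEIGHTS = {"recursion": 1.2, "trees": 1.0, "complexity": 0.8}


def predict(tokens):
    """Toy prerequisite probability: logistic score over keywords present."""
    z = -1.0 + sum(w for k, w in KEYWORD_WEIGHTS.items() if k in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def relative_confidence_drop(tokens, importance, fraction=0.10, top=True):
    """Percentage drop in predicted probability after deleting a fraction of tokens.

    top=True deletes the highest-ranked tokens; top=False deletes the lowest-ranked.
    """
    k = max(1, round(fraction * len(tokens)))
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=top)
    deleted = set(order[:k])
    kept = [t for i, t in enumerate(tokens) if i not in deleted]
    p_full = predict(tokens)
    p_reduced = predict(kept)
    return 100.0 * (p_full - p_reduced) / p_full


tokens = ["introduction", "to", "recursion", "trees", "and",
          "complexity", "analysis", "of", "basic", "programs"]
# Hypothetical token-importance scores (e.g., from attention rollout).
importance = [0.02, 0.01, 0.30, 0.25, 0.01, 0.20, 0.10, 0.01, 0.05, 0.05]

top_drop = relative_confidence_drop(tokens, importance, top=True)
bottom_drop = relative_confidence_drop(tokens, importance, top=False)
print(round(top_drop, 1), round(bottom_drop, 1))
```

A faithful explanation should produce a large drop when the top-ranked tokens are removed and a small one when the bottom-ranked tokens are removed, which is the contrast reported in the first two rows of Table 9.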

Case Study: Interpretable Curriculum Dependency Analysis

To demonstrate the practical utility of the proposed XAI module, we present a case study of prerequisite inference between core courses in a Computer Science curriculum. For the course pair “Data Structures → Algorithms,” the attention-based explanation mechanism identified key syllabus units such as “recursion,” “trees,” and “complexity analysis” as dominant contributors to the prerequisite prediction. These concepts represent foundational knowledge required for understanding algorithm design and analysis.

Furthermore, the graph-path explanation module revealed an intermediate dependency pathway through “Discrete Mathematics”, indicating that, in addition to the direct prerequisite, concepts from that course, such as combinatorics and mathematical reasoning, contribute to the inferred dependency. This multi-level explanation provides both semantic and structural justification for the inferred dependency.
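A graph-path explanation of this kind corresponds to the path that maximizes the product of edge weights (Table 2). One way to compute it, sketched below under the assumption that weights lie in (0, 1], is to run Dijkstra's algorithm on the costs $-\log W(u, v)$, since maximizing a product of such weights is equivalent to minimizing the sum of their negative logarithms. The edges and weights are hypothetical and are not taken from the case study.

```python
import heapq
import math


def best_explanation_path(weights, source, target):
    """Path from source to target maximizing the product of edge weights in (0, 1]."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, -math.log(w)))  # non-negative cost

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            new_cost = cost + c
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))

    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


# Hypothetical edge confidences (prerequisite -> course).
weights = {
    ("Discrete Mathematics", "Data Structures"): 0.7,
    ("Data Structures", "Algorithms"): 0.9,
    ("Discrete Mathematics", "Algorithms"): 0.5,
}

path, confidence = best_explanation_path(weights, "Discrete Mathematics", "Algorithms")
print(path, round(confidence, 2))
```

In this toy graph, the two-step path has a higher product (0.7 × 0.9 = 0.63) than the direct edge (0.5), so the intermediate course is surfaced as part of the explanation.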

This case study shows how the proposed XAI framework predicts prerequisite relationships while also explaining those predictions in terms that are useful to curriculum designers. By linking semantic features of course content to sequencing decisions through the structure of the prerequisite graph, the model supports transparent and evidence-based curriculum planning.

These results show a strong relationship between the identified features and model predictions; however, we are aware that attention-based explanations reflect correlation rather than causal influence.

6.8. Overall Discussion

The experimental results demonstrate that the proposed model achieves robust, statistically significant, and computationally efficient improvements over the evaluated baselines. Importantly, these gains are obtained without a loss of interpretability, which makes the method suitable for real-world curriculum redevelopment and academic decision support.

7. Limitations and Future Work

Although the current results are encouraging, this work has several limitations that open avenues for further study and methodological improvement. Addressing them is critical to improving the robustness, generality, and practical usability of AI-based curriculum redesign frameworks.

7.1. Limitations

The main limitation of our work is its reliance on syllabus text to model course semantics, which may not fully capture implicit pedagogical dependencies or informal relationships between courses that exist in teaching practice but are not documented. The degree plan representation is static and does not account for the evolving nature of degree programs or periodic curriculum revisions. In addition, some academic constraints are expressed deterministically, whereas real-life curriculum planning often involves flexibility, negotiation, and institutional policies. Finally, explainability is provided through post hoc attention and graph-path analyses, which improve interpretability but do not imply causation.

7.2. Future Work

Future work includes incorporating richer instructional data sources, such as lecture slides, assessments, and learning outcomes, and modeling curriculum change using temporal or continual learning. Soft or multi-objective constraints might better reflect human and institutional decision-making processes, while intrinsically interpretable or causality-aware models could provide stronger justification and transparency. More extensive empirical validation across disciplines, institutions, and international contexts, together with field deployment studies with curriculum committees, could be used to evaluate scalability, generalizability, and long-term impact.

8. Conclusions

In this paper, we presented a hybrid framework that combines Transformer-based semantic modeling with graph optimization for automatic curriculum redesign in CS and IT. By combining syllabus-level semantic representations, prerequisite inference, known constraints among courses, and explainable AI approaches, the proposed method connects unstructured textual data with formal academic planning. Experiments, backed by statistical and ablation analyses, consistently show improvements in prediction accuracy, curriculum feasibility, and sequencing coherence over competitive baselines, highlighting the potential of the framework as a scalable and interpretable decision-support tool for data-driven curriculum development in higher education.

Author Contributions

Conceptualization, P.D. and R.K.R.; methodology, R.A., A.S. and P.D.; software, R.A. and A.K.A.; validation, R.A. and A.S., formal analysis, R.K.R.; investigation, R.A. and A.K.A.; resources, R.K.R.; data curation, R.A., A.S. and A.K.A.; writing—original draft preparation, R.A.; writing—review and editing, P.D. and R.K.R.; visualization, R.A. and A.S.; supervision, P.D. and R.K.R.; project administration, R.K.R.; funding acquisition, R.K.R. All authors have read and agreed to the published version of the manuscript.

Funding

APC will be paid by Multimedia University.

Data Availability Statement

The original data presented in the study are openly available in university curriculum documents at reference [41].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Katal, A.; Singh, V.K. Curriculum Design and Development: A case for higher education in India. Int. J. Pedagog. Curric. 2022, 29, 45–66.
2. Slim, A.; Abdallah, C.; Allen, E.; Slim, A. Integrated Curriculum Analytics: Bridging structure, pass rates, and student outcomes. In Proceedings of the 18th International Conference on Educational Data Mining, Palermo, Italy, 20–23 July 2025.
3. Zuev, K.M.; Stavrinides, P. Breadth, depth, and flux of course-prerequisite networks. Netw. Sci. 2025, 13, e17.
4. LaGrange, J.D.; Ratliff, M.L. Curriculum spaces and mathematical models for curriculum design. J. Math. Psychol. 2021, 102, 102523.
5. Selvakumar, P.; Sameer, B.M.; Portia, R.; Das, A.; Pachar, S. Curricula Design and Accreditation; Advances in Medical Education, Research, and Ethics (AMERE) Book Series; IGI Global Scientific Publishing: Palmdale, PA, USA, 2024; pp. 431–458.
6. Yang, B.; Gharebhaygloo, M.; Rondi, H.R.; Banu, S.Z.; Huang, X.; Ercal, G. Analysis of student progression through curricular networks: A case study in an Illinois public institution. Electronics 2025, 14, 3016.
7. Molontay, R.; Horvath, N.; Bergmann, J.; Szekrenyes, D.; Szabo, M. Characterizing curriculum prerequisite networks by a student flow approach. IEEE Trans. Learn. Technol. 2020, 13, 491–501.
8. Li, B.; Peng, B.; Shao, Y.; Wang, Z. Prerequisite Learning with Pre-trained Language and Graph Embedding Models. In Natural Language Processing and Chinese Computing; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; pp. 98–108.
9. Gasparetti, F. Discovering prerequisite relations from educational documents through word embeddings. Future Gener. Comput. Syst. 2021, 127, 31–41.
10. Bai, Y.; Liu, Z.; Guo, T.; Hou, M.; Xiao, K. Prerequisite Relation learning: A survey and outlook. ACM Comput. Surv. 2025, 57, 279.
11. Poudyal, S.; Nagahi, M.; Nagahisarchoghaei, M.; Ghanbari, G. Machine Learning Techniques for Determining Students’ Academic Performance: A Sustainable Development Case for Engineering Education. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; IEEE Press: New York, NY, USA, 2020; pp. 920–924.
12. Ersozlu, Z.; Taheri, S.; Koch, I. A review of machine learning methods used for educational data. Educ. Inf. Technol. 2024, 29, 22125–22145.
13. De Araújo Costa, A.P.; Duarte, J.C.; Goldschmidt, R.R.; Santos, M.D. Unsupervised machine learning in the evaluation of Brazilian postgraduate programs: Decision support via the CROWM Method. Int. J. Inf. Technol. Decis. Mak. 2025, 25, 825–865.
14. Kumar, V.U.; Krishna, A.; Neelakanteswara, P.; Basha, C.Z. Advanced Prediction of Performance of a Student in an University using Machine Learning Techniques. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; IEEE Press: New York, NY, USA, 2020; pp. 121–126.
15. Jovel, J.; Greiner, R. An introduction to machine learning approaches for biomedical research. Front. Med. 2021, 8, 771607.
16. Soviany, P.; Ionescu, R.T.; Rota, P.; Sebe, N. Curriculum Learning: A survey. Int. J. Comput. Vis. 2022, 130, 1526–1565.
17. Zeng, J.; Luo, K.; Lu, Y.; Wang, M. An evaluation framework for online courses based on sentiment analysis using machine learning. Int. J. Emerg. Technol. Learn. (iJET) 2023, 18, 4–22.
18. Onan, A. Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach. Comput. Appl. Eng. Educ. 2020, 29, 572–589.
19. Flor, M.; Hao, J. Text mining and automated scoring. In Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment; Methodology of Educational Measurement and Assessment; Springer: Cham, Switzerland, 2021; pp. 245–262.
20. Bai, X.; Stede, M. A survey of current machine learning approaches to student Free-Text evaluation for intelligent tutoring. Int. J. Artif. Intell. Educ. 2022, 33, 992–1030.
21. Maqsood, M.; Rai, S.I.; Ahmad, M. Natural Language Processing Core Concepts for Educational Applications; Advances in Computational Intelligence and Robotics Book Series; IGI Global Scientific Publishing: Palmdale, PA, USA, 2025; pp. 17–46.
22. Coelho, L.; Reis, S. Transforming Teaching and Learning with Natural Language Processing; Advances in Educational Technologies and Instructional Design Book Series; IGI Global Scientific Publishing: Palmdale, PA, USA, 2024; pp. 119–152.
23. Bode, S.P.; Satpute, R.S. Natural Language Processing in Education System. In Proceedings of the 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI), Wardha, India, 29–30 November 2024; IEEE Press: New York, NY, USA, 2024; pp. 1–5.
24. Abdoune, R.; Lazib, L.; Dahmani-Bouarab, F.; Mimouni, N. Semantic alignment in Disciplinary tutoring System: Leveraging sentence transformer technology. Intel. Artif. 2024, 28, 46–62.
25. Fan, W. A transformer based approach to STEAM integrated english course design in high schools under deep learning. Sci. Rep. 2025, 15, 42062.
26. Jazuli, A.; Chamid, A.A.; Kusumaningrum, R. Transformer-based semantic indexing for aspect-based sentiment analysis using an enhanced index generation algorithm with BERT. Int. J. Adv. Technol. Eng. Explor. 2025, 12, 907–926.
27. Salem, M.; Mohamed, A.; Shaalan, K. Transformer Models in Natural Language Processing: A Comprehensive review and Prospects for Future development. In International Conference on Advanced Intelligent Systems and Informatics; Lecture Notes on Data Engineering and Communications Technologies; Springer: Cham, Switzerland, 2025; pp. 463–472.
28. Li, D.; Luo, Z. Regression loss in transformer-based supervised neural machine translation. Int. J. Comput. Commun. Control 2021, 16, 4217.
29. Cai, J.; Li, Y.; Liu, B.; Wu, Z.; Zhu, S.; Chen, Q.; Lei, Q.; Hou, H.; Guo, Z.; Jiang, H.; et al. Developing deep LSTMs with later temporal attention for predicting COVID-19 severity, clinical outcome, and antibody level by screening serological indicators over time. IEEE J. Biomed. Health Inform. 2024, 28, 4204–4215.
30. Gálvez, L.a.S.; García, M.A.; Gregorio, Á.C. Weighted bidirectional graph-based academic curricula model to support the tutorial competence. Comput. Sist. 2020, 24, 619–631.
31. Zhang, D.; Wei, F.; Xie, H. Application of optimization of teacher teaching path in art education based on GCN. J. Eng. Appl. Sci. 2025, 72, 221.
32. Saran, P.S.H.S.; Kumari, R.K. Unveiling Academic Success; Advances in Computational Intelligence and Robotics Book Series; IGI Global Scientific Publishing: Palmdale, PA, USA, 2025; pp. 333–358.
33. Chen, X. Optimizing personalized recommendation systems for higher education engineering courses using deep learning. In New Paradigm in Digital Classroom and Smart Learning; Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2025; pp. 96–105.
34. Wei, W.; She, L.Q.; Du, A. Dynamic graph neural networks and evolutionary multi-objective optimization for adaptive quality evaluation in gamified preschool education. AIMS Math. 2025, 10, 27440–27461.
35. Zhang, X. Research on the application of big data learning recommendation model driven by knowledge graph algorithm in educational information platform optimization. Syst. Soft Comput. 2025, 7, 200298.
36. Prentzas, J.; Binopoulou, A. Explainable Artificial Intelligence Approaches in Primary Education: A review. Electronics 2025, 14, 2279.
37. Türkmen, G. The Review of Studies on Explainable Artificial Intelligence in Educational Research. J. Educ. Comput. Res. 2024, 63, 277–310.
38. Da Conceição Silva, F.; Santana, A.M.; Feitosa, R.M. An investigation into dropout indicators in secondary technical education using explainable artificial intelligence. IEEE Rev. Iberoam. Tecnol. Aprendiz. 2025, 20, 105–114.
39. Sheela, B.P.; Girisha, H. An explainable artificial intelligence (XAI) framework for deep learning based classification to generate textual explanations on predicted images. Int. J. Intell. Eng. Syst. 2024, 17, 651–662.
40. Sheng, X.; Lan, K.; Jiang, X.; Yang, J. Adaptive curriculum sequencing and education management system via Group-Theoretic Particle swarm Optimization. Systems 2023, 11, 34.
41. Open Syllabus. Available online: https://www.opensyllabus.org/ (accessed on 29 November 2025).

Figure 1. Overview of the proposed explainable Transformer-based framework for curriculum redesign.

Figure 2. Explainable AI-enabled methodological pipeline for curriculum redesign.

Figure 3. Integration of Transformer-based prerequisite inference and graph optimization for curriculum sequencing.

Figure 4. Performance comparison of the proposed model with baseline approaches.

Figure 5. Effectiveness of curriculum sequencing and optimization across methods.

Figure 6. Ablation analysis of the proposed hybrid framework.

Table 1. Summary of datasets used for curriculum modeling and evaluation.

Dataset Source	Data Type	Key Attributes	Purpose in the Study
Open Syllabus Project [41]	Textual (course descriptions, syllabi metadata)	Course title, description, subject area, learning objectives	Semantic representation learning and topic similarity analysis
University CS/IT syllabi from 7 Indian Universities (public curriculum documents)	PDF → Text (structured after preprocessing)	Course structure, units, credits, learning outcomes, prerequisites	Curriculum sequencing and prerequisite inference in Indian context
University CS/IT syllabi from 3 international institutions (public curriculum documents)	PDF → text (structured after preprocessing)	Course content, prerequisite rules, semester placement	Cross-institutional validation and generalizability analysis
Course prerequisite datasets (public university catalogs)	Structured (CSV/Graph)	Course IDs, prerequisite links, dependency direction	Ground-truth comparison and dependency graph construction
Derived curriculum graph	Graph (nodes & edges)	Courses as nodes, prerequisite strength as weighted edges	Optimization of curriculum sequencing and dependency validation

Table 2. Experimental setup and model configuration.

Component	What Is Evaluated/Controlled	Configuration (Mathematical)	Output/Metrics Reported
5-fold CV protocol	Generalization & stability	Folds $k = 1, \dots, 5$. Report $\mu \pm \sigma$ across folds.	Mean ± SD for all metrics
Data partitioning	Leakage control in dependency graphs	Split courses: $V = V_{tr}^{(k)} \cup V_{va}^{(k)} \cup V_{te}^{(k)}$, disjoint. Induce edges: $E_{\cdot}^{(k)} = \{(i, j) \in E^{gt} \mid i, j \in V_{\cdot}^{(k)}\}$	Reliable held-out evaluation
Transformer encoding	Course semantic representation	$h_i = f_{\theta}(T_i) \in \mathbb{R}^{d}$, with pooling $h_i = \mathrm{Pool}(f_{\theta}(T_i))$	Embedding quality; downstream gains
Transformer model	Model architecture and configuration	RoBERTa-base (12 layers, hidden size = 768, 12 attention heads)	Reproducibility of semantic encoder; model transparency
Training hyperparameters	Optimization and learning configuration	Learning rate = 2 × 10⁻⁵; batch size = 16; epochs = 10 (early stopping, patience = 2); optimizer = AdamW; max sequence length = 512 tokens	Stable training; convergence behavior; generalization
Regularization strategy	Overfitting control mechanisms	Dropout = 0.1; L2 weight decay (λ = 1 × 10⁻⁵); early stopping based on validation loss	Improved generalization; reduced overfitting risk
Pairwise feature map	Prereq decision signals	For each ordered course pair $(c_i, c_j)$, construct the feature vector	$h_i - h_j$
Prerequisite probability estimation	Edge inference accuracy	$p_{ij} = \sigma(g_{\phi}(x_{ij}))$, with $P = [p_{ij}] \in [0, 1]^{N \times N}$, $p_{ii} = 0$	F1, AUC-ROC, PR-AUC
Threshold/interface function	Interface between algorithms	Edge set: $E = T_{\tau}(P) = \{(i, j) : p_{ij} \geq \tau\}$. Weights: $W(i, j) = p_{ij}$	Density, sparsity, confidence
Graph construction	Structural modeling	Directed weighted graph $G = (V, E, W)$. Hard constraints applied: $E \leftarrow (E \cup E_{must}) \setminus E_{forbid}$	Cycle count, degree stats
Cycle resolution	Validity of prerequisite structure	While cycles exist, remove the minimum-weight edge in each detected cycle $C$: $(u, v) = \arg\min_{(i, j) \in C} W(i, j)$	#edges removed; DAG validity
Optimization objective	Sequencing coherence	Minimize violation energy: $J(S) = \sum_{(i, j) \in E} W(i, j) \cdot \max(0, s_i - s_j)$	$J(S^{*})$; violations
Constraints, $\Omega$	Academic feasibility	(i) $s_i \in \{1, \dots, S\}$; (ii) $\sum_{i : s_i = t} cr_i \leq C_{max}, \forall t$; (iii) $s_i < s_j, \forall (i, j) \in E_{must}$	% feasible schedules; load balance
Baselines (semantic)	Comparisons to non-Transformer	Replace $f_{\theta}$ with TF-IDF or shallow encoders; keep same pipeline to compute $\hat{p}_{ij}$	ΔF1/ΔAUC vs. proposed
Baselines (sequencing)	Comparisons to non-optimization	Use topological ordering only: $s_i = 1 + \max_{(k, i) \in E} s_k$, then greedy packing	ΔJ(S); feasibility
Ablation A1	Role of Transformer	$f_{\theta} \to$ TF-IDF/averaged vectors	Drop in F1/AUC; sequencing
Ablation A2	Role of feature terms	Remove interaction terms from the feature map: $x_{ij} = [h_i \,\Vert\, h_j]$	$h_i - h_j$
Ablation A3	Sensitivity to $\tau$	Sweep $\tau \in \{\tau_1, \tau_2, \tau_3\}$ or top-$k$ edges per node	Density–performance curve
Ablation A4	Role of cycle handling	Disable cycle resolution; allow cyclic $G$.	Cycle count; feasibility drop
Ablation A5	Role of constraints	Relax $\Omega$ (remove credit cap or precedence)	Δviolations; Δload balance
XAI (attention)	Faithfulness of explanations	Token importance $I(t)$ via attention rollout; delete top-$m\%$ tokens ⇒ $\Delta p_{ij}$	Faithfulness score; examples
XAI (graph paths)	Structural justification	Best explanation path: $P_{ij}^{*} = \arg\max_{P} \prod_{(u, v) \in P} W(u, v)$	Path plausibility; expert rating
Human XAI validation	Interpretability acceptance	Expert rating $r \in \{1, \dots, 5\}$; report $\bar{r}$ and agreement $\kappa$	Trust, clarity, actionability
Statistical tests	Significance of improvements	Paired t-test if normal (Shapiro–Wilk), else Wilcoxon signed-rank; effect size $d$	p-values; Cohen’s $d$
Robustness analysis	Stability across domains	Stratify by course level/domain; compare subgroup metrics	Bias/variance insights
Hardware and implementation	Computational environment	Python (PyTorch), NVIDIA GPU (e.g., RTX 3090), CUDA-enabled	Reproducibility; computational feasibility

Table 3. Dataset characteristics before and after preprocessing.

Metric	Before Preprocessing	After Preprocessing
Total samples	12,480	11,932
Missing values (%)	6.80%	0%
Outliers removed	–	548
Feature dimensions	42	38
Mean skewness	1.91	0.74
Mean kurtosis	5.42	3.01
Class imbalance ratio	1:3.6	1:2.9

Table 4. Predictive performance of individual models (5-fold CV).

Model	Accuracy (%)	Precision	Recall	F1-Score	AUC	Log Loss
Logistic Regression	78.6 ± 0.9	0.77	0.76	0.76	0.82	0.491
Random Forest	82.9 ± 0.8	0.83	0.81	0.82	0.86	0.412
LSTM	84.7 ± 0.6	0.85	0.83	0.84	0.88	0.368
GRU	85.1 ± 0.5	0.86	0.84	0.85	0.89	0.352
Temporal CNN	84.3 ± 0.7	0.84	0.83	0.83	0.88	0.374
Transformer (Baseline)	86.2 ± 0.4	0.87	0.85	0.86	0.91	0.321
Proposed Transformer (Semantic + Structural Integration)	87.4 ± 0.3	0.88	0.86	0.87	0.92	0.298
BERT-Based Classifier [25]	—	—	—	~0.86–0.88	~0.90–0.92	—
Graph Neural Network (GCN) [31]	—	—	—	~0.85–0.87	~0.88–0.91	—
Transformer + GNN Hybrid [34]	—	—	—	~0.88–0.91	~0.91–0.94	—

Table 5. Computational performance and convergence behavior.

Model	Training Time (s/Epoch)	Epochs to Converge	Total Training Time (min)	Inference Time (ms/Sample)	Peak GPU Memory (GB)
LSTM	1.82	38	69.1	2.9	4.2
GRU	1.64	35	57.4	2.6	3.9
Temporal CNN	1.35	42	56.7	2.2	3.6
Transformer (Baseline)	2.11	28	59.1	3.4	5.1
Proposed Transformer	2.26	24	54.2	3.5	5.3

Table 6. End-to-end curriculum optimization performance.

Method	Feasible Schedules (%)	Violation Score	Avg Credit Overload	Load Balance (SD)	Graph Edit Distance
Rule-based sequencing	71.4	42.6	2.8	3.9	0.36
Topological + greedy	78.2	35.9	2.1	3.1	0.31
Transformer + topo	81.6	31.4	1.8	2.9	0.27
Proposed full model	88.9	24.7	1.2	2.3	0.21

Table 7. Statistical significance testing (5-fold CV).

Comparison	Metric	Test	p-Value	Effect Size
Proposed vs. Transformer	F1	Paired t-test	0.018	0.64
Proposed vs. GRU	F1	Paired t-test	0.009	0.71
Proposed vs. Best baseline	J(S)	Wilcoxon	0.012	0.58
Proposed vs. Best baseline	Feasibility	Wilcoxon	0.006	0.69

Table 8. Component-wise ablation analysis of the proposed framework.

Configuration	Accuracy (%)	F1	AUC	Computational Cost (↓)	Feasible (%)
Full proposed model	88.9	0.89	0.93	24.7	88.9
Without attention	86.8	0.86	0.90	31.2	81.3
Without feature interaction	87.3	0.87	0.91	29.4	83.7
Without optimization	87.6	0.88	0.92	33.8	79.5
Transformer only	86.2	0.86	0.91	36.1	78.2

Table 9. Evaluation of explainability: faithfulness and human expert assessment.

XAI Aspect	Explanation Method	Evaluation Protocol	Metric	Score (Mean ± SD)	Interpretation/Evidence
Token-level faithfulness	Attention rollout	Remove top 10% highest-ranked tokens	$\Delta p_{ij}$ ↓ (%)	31.6 ± 2.4	Large confidence drop ⇒ model relies on semantically critical syllabus units
Token-level faithfulness	Attention rollout	Remove bottom 10% tokens	$\Delta p_{ij}$ ↓ (%)	4.2 ± 1.1	Low impact confirms selectivity
Gradient-based faithfulness	Gradient × Input	Mask top-ranked tokens	AUC drop	0.17 ± 0.03	Model relies on identified tokens
Integrated importance	Integrated Gradients	Progressive token removal	Log-loss increase	0.146 ± 0.021	Explanation aligns with prediction confidence
Structural explanation quality	Graph-path explanation	Retain top-k explanatory paths	Path confidence $\prod W$	0.83 ± 0.05	Strong coherence of prerequisite reasoning
Optimization consistency	Graph edge attribution	Remove top explanatory edge	$\Delta J(S)$ ↑	18.7 ± 3.2	Explanations align with optimization objective
Explanation stability	Attention + paths	Input perturbation (±5% tokens)	Rank correlation (ρ)	0.79 ± 0.04	High robustness under small changes
Human clarity	Expert review	5-point Likert scale	Mean score	4.4 ± 0.6	Explanations are easy to understand
Human trustworthiness	Expert review	5-point Likert scale	Mean score	4.3 ± 0.5	High confidence in system decisions
Domain consistency	Expert review	Alignment with curriculum logic	Mean score	4.5 ± 0.4	Matches academic expectations
Actionability	Expert review	Usefulness for curriculum redesign	Mean score	4.2 ± 0.6	Supports real decision-making
Inter-rater reliability	Expert review	Agreement analysis	Cohen’s κ	0.71	Substantial evaluator agreement
XAI benchmarking	Transformer baseline	Attention only	Faithfulness $\Delta p_{ij}$ ↓ (%)	26.7	Weaker explanations
XAI benchmarking	LSTM/GRU	Attention only	Faithfulness $\Delta p_{ij}$ ↓ (%)	18.9–20.4	Limited semantic resolution
XAI benchmarking	Proposed hybrid model	Attention + graph paths	Faithfulness $\Delta p_{ij}$ ↓ (%)	31.6	Best overall explainability

Note: The arrows indicate the direction of change measured after each intervention. ↓ denotes a reduction in prediction confidence or performance, where a larger reduction indicates a more faithful explanation; ↑ denotes an increase in the corresponding metric (e.g., the optimization cost $J(S)$ after the top explanatory edge is removed).
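
The token-removal protocol in Table 9 can be sketched as a deletion test: remove the highest-ranked (or lowest-ranked) 10% of tokens and measure how much the predicted confidence falls. The example below is a minimal sketch under stated assumptions. The `confidence` function is a toy stand-in for the prerequisite classifier, the token weights and importance scores are hypothetical, and the drop is computed relative to the original confidence, which the table does not specify.

```python
import math

# Hypothetical syllabus tokens with toy model weights (illustrative only).
TOKENS = ["recursion", "trees", "graphs", "sorting", "the",
          "of", "and", "course", "covers", "basic"]
WEIGHTS = {"recursion": 1.5, "trees": 0.8, "graphs": 0.6, "sorting": 0.4}
# Hypothetical attribution scores, one per token (e.g., from attention rollout).
IMPORTANCE = [0.30, 0.20, 0.15, 0.10, 0.01, 0.02, 0.03, 0.06, 0.07, 0.06]


def confidence(tokens, weights, bias=-1.0):
    """Toy classifier: sigmoid of the summed weights of the tokens present."""
    z = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def remove_ranked(tokens, importance, fraction, top=True):
    """Drop the top (or bottom) fraction of tokens by importance, keeping order."""
    k = max(1, round(fraction * len(tokens)))
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=top)
    dropped = set(order[:k])
    return [t for i, t in enumerate(tokens) if i not in dropped]


def confidence_drop_pct(tokens, importance, weights, fraction=0.10, top=True):
    """Relative confidence drop (%) after removing the ranked tokens."""
    base = confidence(tokens, weights)
    reduced = confidence(remove_ranked(tokens, importance, fraction, top), weights)
    return 100.0 * (base - reduced) / base


drop_top = confidence_drop_pct(TOKENS, IMPORTANCE, WEIGHTS, top=True)
drop_bottom = confidence_drop_pct(TOKENS, IMPORTANCE, WEIGHTS, top=False)
print(f"drop after removing top 10%: {drop_top:.1f}%")
print(f"drop after removing bottom 10%: {drop_bottom:.1f}%")
```

A faithful attribution method should produce a much larger drop for the top-ranked tokens than for the bottom-ranked ones, which is the pattern the first two rows of Table 9 report.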

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

