1. Introduction
In higher education, understanding student academic progression is a central institutional challenge. Universities analyse academic pathways and the factors associated with different academic outcomes in order to improve advising strategies and student support. Predicting student success has therefore become a prominent research topic in educational data mining and learning analytics. Recent OECD reports indicate that student attrition and delayed completion remain significant challenges in tertiary education systems across OECD countries [1].
Academic programs typically consist of sequences of courses taken across multiple semesters. However, most predictive approaches treat student records as independent observations summarised by aggregated indicators such as grades, number of completed courses, or cumulative performance measures. While informative, these indicators ignore the structural and temporal nature of academic progression. Students do not simply accumulate credits: they follow pathways, repeat courses, change workload across semesters, and combine subjects in ways that may influence later outcomes. Academic progression is intrinsically sequential, since achievement in prior semesters may affect future enrollment decisions, prerequisite completion, course repetition, and workload distribution. Standard tabular models compress this history into fixed-length feature vectors and therefore have limited ability to describe course sequencing and semester-to-semester transitions. As a result, trajectory patterns that carry predictive information may go unnoticed.
Graph-based modelling offers a natural representation for such structured educational data. Graph Neural Networks (GNNs) are capable of learning from relational dependencies and sequential interactions and have been successfully applied to domains involving complex structured data [2,3]. Graph-based models can simultaneously capture temporal transitions and relational dependencies within academic trajectories, unlike sequential approaches such as RNNs or LSTMs, which primarily model linear ordered sequences. GNNs provide a flexible representation framework for modelling academic progression because student development frequently involves irregular pathways, repeated courses, co-enrollment relationships, and varied enrollment patterns. In the educational context, students and courses form an interconnected system in which relationships between successive enrollments can influence later performance. Modelling these relationships may therefore provide additional predictive signal beyond conventional tabular representations.
In this work, we introduce the Academic Trajectory Graph (ATG), a graph-based representation designed to capture semester-to-semester course transitions and temporal relationships between enrollments within academic records. Rather than focusing solely on final academic indicators, the ATG models how students progress through the curriculum. Unlike prior educational graph representations based on static bipartite student-course interactions, the ATG integrates temporal progression directly into graph construction by modelling semester-to-semester transitions as directed relational structures, so that progression dynamics are represented within a unified graph framework. We evaluate whether this trajectory-aware structure provides predictive information for student success beyond conventional tabular features, and we analyse the types of academic pathways associated with different outcomes.
This paper is structured as follows. We first review prior work on student success prediction and educational data mining. Next, we describe the dataset, the proposed ATG, and the graph-based predictive modelling framework. We then present the experimental setup and evaluation methodology, followed by an analysis of the experimental results and their implications for academic analytics and student support.
2. Related Work
Recommendation systems have garnered significant interest in educational environments due to their potential to support academic decision-making, improve student satisfaction, and optimise enrollment strategies. Data mining and artificial intelligence approaches have shown promise in assisting students with course selection and academic planning [
4]. These systems are closely related to student success prediction, as both research areas aim to model academic progression, enrollment behaviour, and trajectory patterns across semesters.
In the broader field of Educational Data Mining, predictive analytics is commonly used to support early academic guidance and intervention policies. In many institutional settings, student performance prediction has traditionally been formulated as a supervised learning problem over tabular data, where each student is represented through aggregated academic and demographic attributes such as GPA history, credit load, and enrollment information. Classical machine learning models, including logistic regression, decision trees, random forests, and multilayer perceptrons, have demonstrated strong performance in this context due to the structured nature of educational records [
5]. However, these approaches typically treat observations as independent samples and compress a student’s academic history into fixed-length feature vectors, which may obscure temporal dependencies, progression dynamics across semesters, and relationships between courses. As a result, trajectory-level patterns associated with academic outcomes may remain difficult to capture using conventional tabular representations.
The remainder of this section reviews representative approaches to course recommendation and student success prediction in educational data mining and learning analytics.
In [6], the C2C (Course to Choose) group recommendation model is introduced, utilising students’ collective interests and group characteristics to generate individual recommendations. The work in [7] designs a personalised online education platform that improves the accuracy and efficiency of course recommendation through an enhanced collaborative filtering algorithm and offers both pre-recorded and interactive live teaching options. Ref. [8] applied sequential pattern mining to identify course sequences followed by high-achieving students, considering factors such as enrollment percentages, time to graduation, and course requirements. A similar approach was adopted in [9], which recommends courses based on predicted learning outcomes for each student. These recommendation-oriented approaches highlight the importance of modelling academic progression and enrollment behaviour. However, many of them focus primarily on recommendation accuracy rather than explicitly representing the structural evolution of academic trajectories across semesters.
Deep learning has been widely used in recommendation systems. For example, in [10], the authors introduce a deep neural network method for e-learning platforms that combines synchronous sequences and heterogeneous features to improve the accuracy of course recommendations amid a surge in online educational resources.
The UniNet method [11] employs deep learning to advise on course order, combination, and quantity. Ref. [12] focused on aligning course recommendations with students’ backgrounds and preferences, particularly considering their best-performing courses. Clustering algorithms have also been utilised to group students for similarity-based recommendations, as in [13]. Graph representations have also been used in educational recommendation systems. For example, ref. [14] proposes a recommendation system for e-learning based on a knowledge model that combines an ontology with a two-layer knowledge graph (Rela-KG) to enhance student access to online learning resources in IT courses.
Graph-based approaches attempt to address the limitations of tabular models by representing educational data as relational structures. In such representations, students, courses, and semesters are interconnected entities rather than independent records. Learning outcomes may therefore depend not only on individual attributes but also on course ordering and progression pathways across semesters. These relational and temporal dependencies are difficult to capture through aggregated tabular features because they require modelling interactions and transitions over time. Recent work on temporal graph learning and dynamic relational modelling further suggests that graph-based representations may provide useful mechanisms for capturing evolving dependencies in sequential systems [
15,
16].
In addition, Graph Neural Networks have been used in educational recommendation scenarios. For example, in [17], the authors proposed Top-N personalised Recommendation with Graph Neural Networks (TP-GNN) for Massive Open Online Courses (MOOCs). The method explored two different aggregate functions to handle the user’s sequence neighbours and then used an attention mechanism to generate the final item representations. Furthermore, ref. [18] proposed the Meta-Relationship Course Recommendation (MRCRec) model to enrich the expression of relational information, focusing on the complex semantic information of multi-entity relationships and entity associations. It constructed two graph structures, the multi-entity relational self-symmetric meta-path (MSMP) and the associative relational self-symmetric meta-graph (ASMG), which are jointly referred to as meta-relationships (MR).
GNNs have also been used for predicting student performance. In [19], the authors propose a pipeline for student performance prediction based on multi-topology graph neural networks (MTGNNs). They designed an MTGNN module for semi-supervised node classification, where each node represents a student and its label corresponds to the student’s performance, and demonstrated the effectiveness of GNNs in handling complex educational data.
Finally, the work of [20] proposed a grade prediction model based on a graph neural network. The model was applied to the Ningbo Xiaoshi High School dataset and achieved high accuracy in predicting the grades of senior high school students.
Previous studies have demonstrated the potential of graph-based learning for modelling relational educational data and predicting student performance. Building on these findings, the objective of this work is to investigate how a temporally structured student-level representation performs across heterogeneous real-world curricula, compared with strong tabular baselines.
In this work, ATG models academic progression as a directed temporal graph that captures semester-level course transitions within a unified relational structure. Rather than introducing a fundamentally new graph-learning paradigm, the proposed representation aims to preserve the sequential structure of academic progression and evaluate whether trajectory-aware relational information provides complementary insights beyond conventional tabular representations across heterogeneous curricular settings.
3. Materials and Methods
The proposed approach represents each student’s trajectory as a directed graph using the ATG representation, facilitating the analysis of both individual and group academic behaviour. Building on this foundation, this work introduces three graph-based approaches for analysing academic progression patterns: DGCNN, which focuses on hierarchical structural extraction; GCN, which incorporates attention mechanisms to weight relational neighbourhood information; and a random-walk-based approach using node2vec embeddings to capture global structural similarities among trajectories. These models generate embedded representations of each student’s ATG, which are subsequently processed through a fully connected neural network to predict the likelihood of success for a given academic trajectory.
This section first details the construction of each student’s ATG and describes the data and the relationships between courses that it encapsulates. The internal structures and characteristics of the proposed models are then described.
3.1. Graph Construction
ATGs are constructed from semester-wise course data, enabling the encapsulation of both the temporal sequence and the interconnections among the courses taken by a student. A separate ATG is constructed for each individual student. The data for each student are sorted chronologically by semester in order to arrange the courses in the order they were taken. The sorted data is divided by unique semester dates to segregate the courses into individual semesters. An iteration is performed over each student’s semesters. For each pair of consecutive semesters, we identify the courses taken in each semester.
Figure 1 illustrates an example ATG where nodes correspond to course instances and edges capture transitions between consecutive semesters.
A directed edge is created between every pair of courses where the source course is from the current semester, and the target course is from the next semester. This process effectively models the transition of a student from one course to another between consecutive semesters. The resulting graph for each student is a temporal network in which nodes correspond to courses and directed edges represent the sequential order of courses across semesters.
Each course instance is represented as a distinct node for a specific semester, so the same course taken in different semesters appears as distinct nodes in the graph.
Each edge from course A to course B signifies that course A was taken in the semester preceding that of course B. By constructing the graph in this manner, we maintain the chronological progression of a student’s academic journey.
Although courses belonging to the same semester become implicitly related through shared transitions toward subsequent semesters, the current ATG implementation does not explicitly incorporate intra-semester co-enrollment edges.
The current ATG formulation intentionally focuses on modelling semester-level temporal transitions through a relatively simple directed graph structure. More complex edge semantics, such as weighted prerequisite dependencies or grade-aware relationships, are left for future extensions of the framework.
Each node $i$ is associated with a feature vector $x_i \in \mathbb{R}^{d}$. Course node features are numeric descriptors derived from institutional metadata, including credit hours and timetable-related weights (day-of-week and time-of-day indicators when available). In addition, a single student node is included per graph and is represented using structural summary features such as the number of enrollments, the number of unique courses, and the number of observed semesters. To enable processing with homogeneous GNN libraries, student and course nodes are embedded in a common feature space by padding missing attributes with zeros, yielding a feature matrix $X \in \mathbb{R}^{N \times d}$, where $N$ is the number of nodes in the graph and $d$ is the dimension of the common feature space. The selected node attributes were intentionally kept relatively simple in order to evaluate the contribution of the trajectory-aware graph structure itself, rather than relying on extensive feature engineering.
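The zero-padding step can be illustrated with a minimal sketch. The column layout (course attributes first, student summary attributes last), the function name `build_feature_matrix`, and all numeric values are assumptions made for illustration and are not taken from the released code.

```python
def build_feature_matrix(course_features, student_features):
    """Stack course rows and one student row into a single zero-padded matrix.

    Assumed layout (not specified in the text): course attributes occupy the
    first columns and student summary attributes occupy the last columns.
    """
    n_course = len(course_features[0]) if course_features else 0
    n_student = len(student_features)
    # Course nodes: real course attributes, zeros in the student columns.
    rows = [list(f) + [0.0] * n_student for f in course_features]
    # Student node: zeros in the course columns, real summary attributes.
    rows.append([0.0] * n_course + list(student_features))
    return rows


# Hypothetical values: [credit hours, day-of-week weight, time-of-day weight]
courses = [[3.0, 0.2, 0.5], [4.0, 0.4, 0.1]]
# Hypothetical values: [enrollments, unique courses, observed semesters]
student = [2.0, 2.0, 1.0]
X = build_feature_matrix(courses, student)
```

Under this layout every row has the same width, so a homogeneous GNN layer can process course and student nodes with a single weight matrix.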
Only information available up to semester t is used as input. No grades, cumulative GPA, or performance indicators from any future semester are included in the node features. This prevents information leakage during training and evaluation.
A collection of ATGs is produced during the graph-building process, one for each student. These graphs serve as the foundational data structures for our deep learning models that predict student performance based on their academic trajectory. The prediction task consists of estimating the binary success label ($y$) from the observed academic trajectory. The graph is constructed using only records available up to t, while the label is defined from the final GPA at program completion. The process of constructing the ATG is shown in Algorithm 1.
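Algorithm 1 Student ATG Creation
1: procedure CreateGraph(records)
2:   G ← empty directed graph
3:   data ← records of the student
4:   sort data by semester
5:   semesters ← unique semesters in data
6:   prev ← first element of semesters
7:   for each sem in semesters starting from second element do
8:     C_prev ← courses in data taken in prev
9:     C_curr ← courses in data taken in sem
10:    for each a in C_prev do
11:      for each b in C_curr do
12:        add nodes (a, prev) and (b, sem) to G
13:        add directed edge (a, prev) → (b, sem) to G
14:      end for
15:    end for
16:    prev ← sem
17:  end for
18:  return G
19: end procedure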
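The construction procedure described in this subsection can be sketched in a few lines of dependency-free Python. This is an illustrative sketch rather than the released implementation: the function name `create_atg`, the `(semester, course)` record format, and the course codes are assumptions, and the additional student node and the node features are omitted.

```python
from collections import defaultdict


def create_atg(records):
    """Build one student's ATG from (semester, course) enrolment records.

    Returns (nodes, edges). Nodes are (semester, course) course instances and
    edges are directed pairs from every course in one observed semester to
    every course in the next observed semester.
    """
    by_semester = defaultdict(list)
    for semester, course in records:
        by_semester[semester].append(course)

    semesters = sorted(by_semester)  # chronological order
    nodes = [(s, c) for s in semesters for c in by_semester[s]]
    edges = []
    for prev, curr in zip(semesters, semesters[1:]):
        for src in by_semester[prev]:
            for dst in by_semester[curr]:
                edges.append(((prev, src), (curr, dst)))
    return nodes, edges


# Hypothetical records; CS101 is repeated in semester 3.
records = [(2, "CS201"), (1, "CS101"), (1, "MA101"), (3, "CS101"), (2, "MA102")]
nodes, edges = create_atg(records)
```

Because nodes are keyed by semester and course together, the repeated course appears as two distinct nodes, and no edges are created between courses of the same semester.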
3.2. Tabular Feature Representation
In addition to the graph-based representation, we construct a conventional tabular representation of each student’s academic history in order to compare relational and non-relational modelling approaches. For each student, records from all semesters up to time t are aggregated into a fixed-length feature vector.
The tabular representation summarises academic progression using cumulative and temporal indicators, including the number of enrolled courses, accumulated credits, semester index, and performance-related measures. This representation corresponds to the standard formulation in educational data mining, where each student is treated as an independent instance described by aggregated academic attributes.
Unlike the Academic Trajectory Graph, this representation does not explicitly preserve the relationships between courses or the ordering of transitions between semesters. It therefore serves as a baseline to evaluate whether modelling academic trajectories as relational structures provides additional predictive information beyond aggregated features.
3.3. DGCNN Model
In this work, a Deep Graph Convolutional Neural Network (DGCNN) [21] is employed to learn structural representations from graph-structured academic trajectories. DGCNN is particularly suitable for modelling relational educational data because it can capture structural dependencies and progression patterns within the proposed Academic Trajectory Graph (ATG).
Figure 2 illustrates the overall architecture of the proposed framework.
Let $A$ denote the adjacency matrix of the graph and $X$ the node-feature matrix. A graph convolutional layer can be expressed as
$$Z = F\left(\tilde{D}^{-1} \tilde{A} X W\right), \tag{1}$$
where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops, $\tilde{D}$ is the corresponding degree matrix, $W$ represents the learnable weights, and $F$ is the activation function. The model stacks four graph convolutional layers, allowing node representations to iteratively aggregate structural information from neighbouring nodes and progressively learn higher-level trajectory patterns.
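A single propagation step of this kind can be sketched as follows. The sketch assumes the row-normalised form used in the original DGCNN formulation, with self-loops added to the adjacency matrix. The helper names, the dependency-free matrix product, and the toy values are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(adj, features, weights, activation=math.tanh):
    """One layer: activation(D^-1 (A + I) X W), row-normalised with self-loops."""
    n = len(adj)
    # Self-loops let each node keep its own features and avoid zero degrees.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    norm = []
    for row in a_tilde:
        degree = sum(row)
        norm.append([v / degree for v in row])
    z = matmul(matmul(norm, features), weights)
    return [[activation(v) for v in row] for row in z]


# Toy graph with one directed edge 0 -> 1, one feature per node, identity weights.
adj = [[0, 1], [0, 0]]
X = [[1.0], [3.0]]
W = [[1.0]]
Z = graph_conv(adj, X, W, activation=lambda v: v)
```

In this toy case node 0 averages its own feature with that of its successor, while node 1, which has no outgoing edge, keeps its own value.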
After the graph convolution operations, a SortPooling layer [22] is applied to transform graphs with variable numbers of nodes into fixed-size representations suitable for subsequent convolutional processing. The SortPooling operation retains the most informative node representations and produces a fixed-size matrix for each graph, enabling the extraction of both local and global structural patterns.
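The effect of SortPooling on graphs of different sizes can be shown with a simplified sketch. It sorts nodes by their last feature channel only and ignores the tie-breaking on earlier channels used in the original method. The function name `sort_pool` and the sample values are assumptions.

```python
def sort_pool(node_embeddings, k):
    """Sort nodes by their last channel (descending), keep the top k rows,
    and zero-pad graphs that have fewer than k nodes."""
    width = len(node_embeddings[0])
    ordered = sorted(node_embeddings, key=lambda row: row[-1], reverse=True)
    kept = ordered[:k]
    kept += [[0.0] * width for _ in range(k - len(kept))]
    return kept


# A three-node graph is truncated to k = 2 rows.
large = sort_pool([[1.0, 0.2], [2.0, 0.9], [3.0, 0.5]], k=2)
# A one-node graph is padded to k = 3 rows.
small = sort_pool([[1.0, 0.2]], k=3)
```

Both calls return a matrix with exactly k rows, which is what allows the subsequent 1D convolutional layers to operate on graphs with different numbers of course instances.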
Following the graph convolution and SortPooling operations, the resulting graph representations are processed using stacked 1D convolutional and pooling layers to extract local structural features and reduce dimensionality. Because SortPooling arranges nodes in a consistent order before these layers are applied, the resulting representation does not depend on the arbitrary node ordering of the input graph.
The extracted representations are subsequently flattened and passed through a Dense layer with 128 units and a rectified linear unit (ReLU) activation function. A dropout layer is employed as a regularisation mechanism to reduce overfitting. The final classification layer consists of a Dense layer with a sigmoid activation function [23], producing the probability that a student belongs to the “good” academic performance category, defined as achieving a GPA greater than 4.
The fully connected classification layers are shared across all three proposed graph-based models, while the primary distinction between them lies in the mechanism used to generate the ATG embeddings.
3.4. Graph Convolutional Network (GCN) Model
In the second model, Graph Convolutional Networks (GCNs) [
24] are employed.
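Consider an ATG characterised by an adjacency matrix $A$ and a node feature matrix $X$, where each node signifies a course instance (a course undertaken in a particular semester) and each directed edge denotes a temporal transition between successive semesters. A standard GCN layer updates node representations by aggregating information from adjacent nodes. According to the commonly utilised normalised formulation, the node embedding propagation at layer $l$ can be articulated as
$$H^{(l+1)} = F\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right), \tag{2}$$
where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops, $\tilde{D}$ is its degree matrix, $H^{(l)}$ is the matrix of node embeddings at layer $l$ (with $H^{(0)} = X$), $W^{(l)}$ is the learnable weight matrix of layer $l$, and $F$ is the activation function.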
In the ATG, neighbouring nodes correspond to courses taken in adjacent semesters. Therefore, the convolution operation propagates information along the temporal dimension of the academic trajectory. After multiple layers, each node representation encodes not only the course itself but also its context within the student’s prior academic progression. After the convolutional layers, node-level embeddings are obtained. These embeddings are aggregated using a graph-level pooling operation to obtain a trajectory-level representation of the student.
The output of GCNs is a graph-level embedding matrix that encodes information about a student’s knowledge and skills acquired in prior courses. The knowledge acquired from different prior courses has different importance for the target course.
An attention mechanism is incorporated to account for the heterogeneous influence of prior courses within academic trajectories. Since not all previously completed courses contribute equally to future academic performance, the attention layer allows the model to assign different importance weights to neighbouring node representations during trajectory encoding. The attention scores are computed using a learnable function obtained through an MLP, as defined in Equations (3) and (4), where $\alpha_i$ represents the attention weight associated with node representation $h_i$.
Finally, the graph embeddings are weighted by attention scores to form a weighted graph embedding matrix Z (Equation (5)), and the pooling layer adapts the weighted graph embedding matrix into a latent vector v.
The embedding vector v is processed by a fully connected neural network, similar to the previous model, which classifies based on features extracted by the preceding layers. The proposed model architecture is shown in Figure 3.
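The attention-weighted pooling step can be illustrated with a small sketch. It assumes that the per-node scores (which an MLP would produce in the model) are normalised with a softmax and that the latent vector is the weighted sum of node embeddings. Both choices, the function name `attention_pool`, and the sample numbers are assumptions rather than details confirmed by the text.

```python
import math


def attention_pool(node_embeddings, scores):
    """Softmax-normalise per-node scores and return (weights, pooled vector)."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(node_embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, node_embeddings))
              for d in range(dim)]
    return alphas, pooled


# Two course-instance embeddings with equal (hypothetical) attention scores.
alphas, v = attention_pool([[1.0, 2.0], [3.0, 4.0]], scores=[0.7, 0.7])
```

With equal scores the weights are uniform and the pooled vector reduces to the mean of the node embeddings. Unequal scores shift the trajectory representation towards the more heavily weighted course instances.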
3.5. Random Walks Model
The third proposed approach is based on random walks over the ATG structure to capture global relational patterns within academic trajectories (Figure 4). The embedding generation process is inspired by Graph2Vec [25] and employs Node2Vec [26] to learn low-dimensional node representations from stochastic graph exploration.
In this framework, random walks are performed over the directed ATG structure to generate sequences of connected nodes representing local trajectory contexts. Node2Vec then learns embeddings by modelling co-occurrence relationships between nodes that appear in similar walk contexts, enabling the representation of structural similarities between academic pathways.
The resulting node embeddings are aggregated to obtain a graph-level representation for each student trajectory.
The generated graph embeddings are subsequently processed using the same fully connected classification network employed in the previous models to predict student academic performance.
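The walk-generation step can be sketched as follows. The sketch uses uniform transitions, which correspond to Node2Vec with both bias parameters set to 1, and the walk length and number of walks per node reported in the parameter setup. The adjacency-dictionary input, the function name `random_walks`, and the toy graph are assumptions. The skip-gram training that turns walks into embeddings is not shown.

```python
import random


def random_walks(successors, walk_length=8, walks_per_node=5, seed=0):
    """Generate uniform random walks over a directed graph.

    `successors` maps each node to the list of nodes it points to.
    """
    rng = random.Random(seed)
    walks = []
    for start in successors:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                candidates = successors.get(walk[-1], [])
                if not candidates:
                    break  # nodes in the last semester have no outgoing edges
                walk.append(rng.choice(candidates))
            walks.append(walk)
    return walks


# Toy ATG: a -> {b, c} -> d across three semesters.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
walks = random_walks(graph)
```

Since ATG edges only point forward in time, each walk ends when it reaches a course instance in the last observed semester, so walks are often shorter than the configured maximum length.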
4. Experimental Setup
4.1. Datasets
This study used datasets containing student, course, and teacher features, to which the three previously described models were applied. The data was sourced from a subset of students at King Abdul Aziz University in Saudi Arabia as part of a research study examining the correlation between student attributes and academic achievement [27].
The datasets, collected from the university’s system, encompass different faculties, each with varying numbers of students. The data collection was designed to capture information on demographics, academic behaviours, and family backgrounds of both students and teachers. This data can be leveraged to explore the relationship between student characteristics and academic performance.
Since the data cover 13 faculties of this university, 13 datasets were used for the experiments: Faculty of Arts and Humanities, Faculty of Business Rabeg, Faculty of Communication and Media, Faculty of Computing and Information Technology Rabeg, Faculty of Designs and Arts, Faculty of Economics and Administration, Faculty of Engineering, Faculty of Engineering Rabeg, Faculty of Home Economics, Faculty of Information Technology, Faculty of Law, Faculty of Sciences, and Faculty of Sciences and Arts Rabeg.
Each dataset contained a variety of fields about the departments, courses, and students. The number of instances varied across faculties, with the highest being 192,595 for the Faculty of Arts and Humanities and the lowest being 6426 for the Faculty of Engineering Rabeg. Each instance in the dataset represents a specific course taken by a student during a particular semester. Initially collected in numerical, character, and string formats, the data is subsequently converted to integers for further analysis.
Table 1 summarises the main characteristics of the datasets, including the class distribution across faculties. The reported success rates reveal heterogeneous distributions of academic outcomes across curricular settings, which were intentionally preserved in order to evaluate the proposed models under realistic institutional conditions. To ensure methodological consistency across all datasets, missing or incomplete records were removed during preprocessing, and the same preprocessing, feature engineering, and evaluation pipeline was applied across all faculties.
The institutional grading system follows a bounded GPA scale ranging from 0 to 5, where a GPA of 4.0 corresponds to a high academic standing (approximately equivalent to a B+/A− grade level). Therefore, the success label does not represent graduation or completion, but rather sustained high academic performance. The selected threshold was used as a consistent and interpretable criterion across faculties in order to distinguish between different levels of academic achievement within a unified prediction setting.
The proportion of successful students varies substantially across faculties, ranging from approximately 25% to nearly 90%. This variability reflects differences in grading practices, cohort characteristics, and curriculum difficulty, and motivates the use of class-imbalance-robust evaluation metrics such as balanced accuracy, MCC, and PR-AUC.
We formulate the task as a binary classification problem rather than GPA regression in order to distinguish different levels of academic performance while maintaining a simplified and interpretable prediction setting for academic progression analysis.
All experiments were carried out using 10-fold cross-validation on each faculty dataset. To prevent information leakage, cross-validation was performed at the student level: all records for a given student were assigned to a single fold, ensuring that no student appeared in both the training and testing sets within a given iteration. A new model instance was initialised for every fold and trained from scratch. No parameters were carried over between folds, and all preprocessing steps were fitted using only the training partition and subsequently applied to the corresponding test partition.
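A student-level split of this kind can be sketched without external libraries. The round-robin assignment of students to folds, the function name `student_level_folds`, and the identifiers are illustrative assumptions. The actual pipeline may shuffle or stratify students differently.

```python
def student_level_folds(student_ids, n_folds=10):
    """Yield (train_idx, test_idx) pairs so that all records of a given
    student fall into exactly one test fold."""
    unique = sorted(set(student_ids))
    fold_of = {sid: i % n_folds for i, sid in enumerate(unique)}
    for fold in range(n_folds):
        test_idx = [i for i, sid in enumerate(student_ids) if fold_of[sid] == fold]
        train_idx = [i for i, sid in enumerate(student_ids) if fold_of[sid] != fold]
        yield train_idx, test_idx


# Hypothetical data: 12 students with 3 enrolment records each.
student_ids = ["s%d" % (i // 3) for i in range(36)]
folds = list(student_level_folds(student_ids, n_folds=4))
```

Splitting on student identifiers rather than on individual records is what keeps a student’s courses from appearing on both sides of a fold.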
4.2. Performance Metrics
In this work, several classification metrics were used to evaluate model performance under both balanced and imbalanced class distributions. The primary metrics included accuracy, precision, recall, F1-score, balanced accuracy, Matthews Correlation Coefficient (MCC), ROC-AUC, and PR-AUC. The inclusion of MCC, balanced accuracy, ROC-AUC, and PR-AUC was intended to provide a more robust evaluation in faculties exhibiting heterogeneous class distributions, where accuracy and F1-score alone may produce overly optimistic interpretations.
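Accuracy measures the proportion of correctly classified instances:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Precision and recall are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}.$$
The F1-score corresponds to the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
Balanced accuracy compensates for class imbalance by averaging recall across both classes:
$$\mathrm{Balanced\ Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right).$$
The Matthews Correlation Coefficient (MCC) provides a robust single-score evaluation by considering all entries of the confusion matrix:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$
In addition, ROC-AUC and PR-AUC were used to evaluate discriminative performance independently of a fixed decision threshold:
$$\mathrm{ROC\text{-}AUC} = \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR}), \qquad \mathrm{PR\text{-}AUC} = \int_{0}^{1} \mathrm{Precision}\; d(\mathrm{Recall}),$$
where $\mathrm{TPR} = TP / (TP + FN)$ and $\mathrm{FPR} = FP / (FP + TN)$.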
Collectively, these metrics provide a comprehensive evaluation framework for assessing predictive performance under heterogeneous class distributions and varying decision thresholds. In particular, balanced accuracy, MCC, and PR-AUC help reduce the risk of misleading performance estimates in faculties exhibiting substantial class imbalance.
4.3. Parameter Setup
The hyperparameter tuning process involved adjusting the number of neurons, activation functions, dropout rates, and optimisation settings for each model. The final hyperparameter configuration was selected through preliminary exploratory experiments aimed at obtaining stable training behaviour across faculties while maintaining manageable computational cost. To ensure comparability between curricular settings, the same hyperparameter configuration was applied across all faculties.
All neural network-based models were optimised using the Adam optimiser [28] with a learning rate of 0.001 and binary cross-entropy loss. Model selection was performed using early stopping based on validation loss, restoring the best weights obtained during training. Training was carried out using mini-batches of size 50 for graph-based models and 32 for tabular neural networks, with a validation split of 10% within each cross-validation fold.
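The early-stopping rule can be sketched independently of any deep learning framework. The sketch assumes the common convention that training stops once the validation loss has not improved for `patience` consecutive epochs and that the weights from the best epoch are the ones retained. The function name and the loss values are hypothetical.

```python
def early_stopping_epochs(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops after `patience` consecutive epochs without improvement;
    the best epoch is the one whose weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses for five epochs.
stop_epoch, best_epoch = early_stopping_epochs([1.0, 0.8, 0.9, 0.85, 0.95], patience=2)
```

In this example the loss last improves at epoch 1, so training halts at epoch 3 and the epoch-1 weights are kept.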
The fully connected classification network shared across the proposed graph-based approaches consisted of dense layers with ReLU activations [29], dropout regularisation, and a final sigmoid output layer [23] producing the probability associated with the binary academic success label.
For the Deep Graph Convolutional Neural Network (DGCNN), the graph convolutional block used four convolutional layers with sizes [32, 32, 32, 1], followed by a SortPooling layer that retains a fixed number of nodes ($k$) and two 1D convolutional layers with 16 and 32 filters, respectively. The resulting representations were processed through dense layers with dropout regularisation. The model was trained for up to 150 epochs using early stopping with patience 10.
The attention-based graph convolution model employed two graph attention layers with 32 hidden units and four attention heads per layer, ELU activation functions, and dropout regularisation. Node embeddings were aggregated through mean pooling to obtain graph-level trajectory representations, which were subsequently classified using the shared fully connected network. The model was trained for up to 150 epochs with early stopping (patience 20).
For the random walk baseline, node embeddings were generated using a node2vec strategy with random walks of length 8 and five walks per node. A skip-gram Word2Vec model was trained with an embedding dimension of 128 and a context window size of 10. Graph-level embeddings were obtained by averaging node embeddings within each trajectory graph and subsequently processed using a multilayer perceptron with dropout regularisation and sigmoid output classification. The model was trained using Adam optimisation with early stopping.
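The walk-generation and averaging steps of this baseline can be sketched as below. The sketch makes several assumptions that are not stated in the text: walks are uniform (equivalent to node2vec with both bias parameters set to 1), the walk length of 8 counts nodes, and the skip-gram training step is omitted, so the node vectors passed to the averaging function are hypothetical placeholders.

```python
import random
from typing import Dict, Hashable, List, Sequence


def generate_walks(
    adjacency: Dict[Hashable, Sequence[Hashable]],
    walk_length: int = 8,
    walks_per_node: int = 5,
    seed: int = 0,
) -> List[List[Hashable]]:
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end, e.g. a course in the final semester
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def mean_graph_embedding(node_vectors: Dict[Hashable, List[float]]) -> List[float]:
    """Average node embeddings to obtain one vector per trajectory graph."""
    vectors = list(node_vectors.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical directed trajectory graph: edges point to later courses.
adjacency = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
walks = generate_walks(adjacency)
print(len(walks), walks[0])
print(mean_graph_embedding({"A": [1.0, 0.0], "B": [3.0, 2.0]}))
```

In a full pipeline the walks would be passed as "sentences" to a skip-gram model, and the learned node vectors would replace the placeholder dictionary used here.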
All preprocessing, graph construction, and embedding generation procedures were recomputed independently inside each cross-validation fold using only training students and subsequently applied to the corresponding test students, ensuring a fully leakage-free evaluation protocol.
All source code used to construct the Academic Trajectory Graph, train the models, and reproduce the experiments is publicly available in an open repository [30].
4.4. Tabular Machine Learning Baselines
To provide a fair comparison with non-graph approaches, we constructed a student-level tabular representation derived from the same academic records used to build the ATGs. For each student, historical features were aggregated across all semesters prior to the last observed semester in order to avoid information leakage. The extracted attributes included the number of completed courses, the number of attended semesters, ratios of passed and failed courses, historical average grade, attempted and passed credits, and available demographic and admission information (e.g., age, high-school GPA, admission year, and program). The prediction label was defined from the final cumulative GPA, while features were computed only from the preceding academic history.
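The aggregation step can be illustrated with a short sketch. The record layout (semester index, grade, credits, pass flag) and the feature names are hypothetical; the essential point is that every record from the last observed semester is excluded before any statistic is computed.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (semester index, grade, credits, passed).
Record = Tuple[int, float, int, bool]


def aggregate_history(records: List[Record]) -> Dict[str, float]:
    """Summarise a student's history using only semesters before the last one."""
    last_semester = max(sem for sem, _, _, _ in records)
    history = [r for r in records if r[0] < last_semester]
    if not history:
        raise ValueError("student has no semesters before the last observed one")
    n = len(history)
    passed = [r for r in history if r[3]]
    return {
        "n_courses": n,  # counts course attempts in this simplified sketch
        "n_semesters": len({r[0] for r in history}),
        "pass_ratio": len(passed) / n,
        "fail_ratio": (n - len(passed)) / n,
        "mean_grade": sum(r[1] for r in history) / n,
        "attempted_credits": sum(r[2] for r in history),
        "passed_credits": sum(r[2] for r in passed),
    }


# Hypothetical student: the single semester-3 record is never used as a feature.
records = [(1, 3.0, 4, True), (1, 1.0, 3, False), (2, 4.0, 4, True), (3, 2.0, 3, True)]
print(aggregate_history(records))
```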
The tabular baselines and the graph-based models, therefore, use different representations of the same underlying academic history. While the tabular models operate on aggregated descriptors summarising cumulative student performance, the graph-based approaches use the ATG. Consequently, the comparison evaluates different representational paradigms rather than identical feature spaces.
Prior to model training, numerical attributes were standardised using z-score normalisation, and categorical attributes were encoded using one-hot encoding. All preprocessing operations were fit exclusively on the training split within each fold and then applied to the corresponding test split.
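The fit-on-train, apply-to-test pattern described above can be sketched with the standard library alone. The helper names are hypothetical, and the handling of categories that appear only in the test split (an all-zero vector) is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and (population) standard deviation on the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against constant columns
    return mu, sigma


def apply_zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Standardise any split with statistics learned from the training split."""
    return [(v - mu) / sigma for v in values]


def fit_one_hot(train_categories: Sequence[str]) -> List[str]:
    """Build the category vocabulary from the training split only."""
    return sorted(set(train_categories))


def apply_one_hot(categories: Sequence[str], vocabulary: Sequence[str]) -> List[List[int]]:
    """Encode categories; values unseen during training map to an all-zero vector."""
    return [[1 if c == v else 0 for v in vocabulary] for c in categories]


# Hypothetical fold: statistics come from the training students only.
mu, sigma = fit_zscore([2.0, 4.0, 6.0])
print(apply_zscore([4.0, 10.0], mu, sigma))
vocabulary = fit_one_hot(["CS", "Law", "CS"])
print(apply_one_hot(["Law", "Arts"], vocabulary))
```

Fitting the scaler or the vocabulary on the full dataset would let test-split statistics influence the training features, which is the leakage the protocol is designed to avoid.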
We evaluated several conventional baselines, including Logistic Regression (max iterations = 2000), a linear Support Vector Machine with probabilistic outputs, and a Random Forest classifier with 500 trees. In addition, a multilayer perceptron (MLP) was implemented as a neural network baseline. The MLP consisted of three fully connected layers of 32 neurons with ReLU activation followed by a 1024-unit dense layer, each regularised with dropout (rate 0.2), and a sigmoid output layer. The network was trained using the Adam optimiser (learning rate 0.0005), binary cross-entropy loss, batch size 32, and 150 training epochs.
All tabular baselines were evaluated using the same 10-fold student-level cross-validation protocol as the graph-based models. Hyperparameter tuning for the traditional machine learning baselines was conducted under the same validation strategy and experimental conditions adopted for the proposed graph-based approaches in order to ensure fair comparison across all evaluated models.
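A student-level split of this kind can be sketched as follows. The function names, the shuffling scheme, and the seeds are assumptions made for illustration; the sketch only shows that folds are formed over student identifiers, so that no student contributes to both the training and the test side, and that 10% of the training students are held out for validation inside each fold.

```python
import random
from typing import List, Sequence, Tuple


def student_level_folds(student_ids: Sequence[str], n_folds: int = 10, seed: int = 0) -> List[List[str]]:
    """Partition unique student identifiers into n_folds disjoint folds."""
    unique = sorted(set(student_ids))
    random.Random(seed).shuffle(unique)
    return [unique[i::n_folds] for i in range(n_folds)]


def split_for_fold(
    folds: List[List[str]], test_index: int, val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, validation, test) student lists for one fold."""
    test = list(folds[test_index])
    train = [s for i, fold in enumerate(folds) if i != test_index for s in fold]
    random.Random(seed).shuffle(train)
    n_val = max(1, round(val_fraction * len(train)))
    return train[n_val:], train[:n_val], test


# Hypothetical cohort of 100 students.
ids = [f"s{i}" for i in range(100)]
folds = student_level_folds(ids)
train, val, test = split_for_fold(folds, test_index=0)
print(len(train), len(val), len(test))
```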
5. Results and Discussion
All models were evaluated using student-level 10-fold cross-validation across the 13 faculties. Results are reported as the mean ± standard deviation and include accuracy, AUC, F1-score, balanced accuracy, MCC, and PR-AUC. These metrics were selected to provide a robust evaluation under potential class imbalance.
Table 2, Table 3 and Table 4 present the detailed performance of the graph-based models for each faculty. Overall, DGCNN achieved the strongest and most consistent performance among the graph-based approaches across most faculties, whereas GCN showed higher variability and noticeably weaker results in several cases. Random Walks/node2vec exhibited moderate performance but with substantial variability across faculties. To complement the descriptive comparison of predictive metrics, a statistical comparison based on Friedman and Nemenyi tests is presented in Section 5.1 to assess whether the observed differences between models are statistically significant across faculties.
Performance variability across faculties is observed for all models, suggesting that the predictive difficulty depends on faculty-specific characteristics (e.g., cohort size, class balance, and curriculum structure). In particular, Engineering yields lower performance for multiple approaches, indicating that this faculty may present more challenging or less separable patterns under the current feature representation. Some faculties also exhibit relatively large standard deviations across cross-validation folds, particularly for MCC and balanced metrics. This behaviour is more pronounced in faculties with smaller cohort sizes and stronger class imbalance, where small changes in the train/test partition can substantially affect the learned decision boundaries. Consequently, faculty-level results should be interpreted cautiously, and the main conclusions of the study are based on the overall behaviour observed across faculties and the aggregated statistical analysis. These results reinforce the importance of using imbalance-aware metrics when evaluating predictive performance across heterogeneous curricular settings.
A plausible explanation for this variability is that the usefulness of graph-based learning depends on the structural diversity of academic pathways within each faculty. Graph neural networks rely on neighbourhood message passing to learn discriminative representations; when many students follow highly standardised curricula with nearly identical semester-by-semester course sequences, their local trajectory subgraphs become structurally similar and provide limited additional signal beyond aggregated performance indicators. Conversely, in faculties with more elective flexibility and heterogeneous progression patterns, the ATG captures richer relational variation (e.g., alternative course orders, repeated courses, and different temporal course transition patterns), which can provide additional discriminative information.
In addition, several faculties also exhibit substantial class imbalance, which affects the interpretation of some performance metrics. In highly imbalanced cases such as Home Economics, some models achieve very high recall while simultaneously obtaining low MCC and balanced accuracy values, indicating a tendency toward majority-class prediction behaviour.
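The effect described above can be reproduced with a small worked example. The class counts are hypothetical and do not correspond to any faculty in the study; the sketch simply evaluates a classifier that always predicts the majority (positive) class.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced cohort: 90 positive and 10 negative students,
# scored by a model that always predicts the positive class.
y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(recall(tp, fn), balanced_accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
```

Recall is perfect in this setting, while balanced accuracy stays at chance level and MCC is zero, which is the pattern that signals majority-class prediction.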
The baseline results (Table 5) show that traditional models achieve strong predictive performance across most metrics, indicating that aggregated academic indicators, particularly those related to cumulative academic progression, capture a substantial portion of the predictive signal.
Academic success prediction is strongly influenced by prior academic performance. Consequently, models based on aggregated tabular features achieve high predictive accuracy. Indicators such as completed credits, course load, and historical performance summarise the student’s accumulated academic state and are naturally correlated with final outcomes. For this reason, the strong performance of Logistic Regression, Support Vector Machines, Random Forests, and multilayer perceptrons is expected and does not contradict the usefulness of the proposed representation.
However, aggregated representations remove structural information contained in academic records. When student histories are converted into fixed-length feature vectors, the ordering of courses, semester-to-semester transitions, course repetitions, and, in general, temporal relationships are not preserved. Students with similar cumulative statistics may therefore follow substantially different academic pathways. These structural properties cannot be analysed using flat tabular representations even when predictive performance is high. While this does not necessarily translate into large improvements in predictive accuracy, it enables the analysis of trajectory-level patterns that are not accessible through aggregated representations.
The ATG is specifically designed to capture this missing structural and temporal information, enabling the modelling of academic progression beyond static summaries. The ATG preserves the progression of the curriculum and allows learning algorithms to capture relational dependencies between successive enrollments by representing courses as temporally connected nodes. Rather than describing only the student’s academic state, the representation describes the student’s academic progression. The graph-based models, therefore, exploit a complementary source of information to that used by conventional tabular predictors.
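The difference between a flat summary and a trajectory representation can be illustrated with a deliberately simplified sketch. It is not the ATG construction used in this work: here each node is a hypothetical (semester, course) pair, and a directed edge is assumed between every course in one semester and every course in the next.

```python
from typing import List, Sequence, Set, Tuple

Node = Tuple[int, str]


def build_trajectory_graph(semesters: Sequence[Sequence[str]]) -> Tuple[List[Node], Set[Tuple[Node, Node]]]:
    """Build a simplified trajectory graph from per-semester course lists.

    Assumption for illustration only: every course taken in semester t is
    linked to every course taken in semester t + 1.
    """
    nodes = [(t, c) for t, courses in enumerate(semesters) for c in courses]
    edges = {
        ((t, a), (t + 1, b))
        for t in range(len(semesters) - 1)
        for a in semesters[t]
        for b in semesters[t + 1]
    }
    return nodes, edges


# Two hypothetical students with the same set of courses in a different order.
student_a = [["MATH1"], ["CS1"]]
student_b = [["CS1"], ["MATH1"]]
_, edges_a = build_trajectory_graph(student_a)
_, edges_b = build_trajectory_graph(student_b)

flat_a = sorted(c for sem in student_a for c in sem)
flat_b = sorted(c for sem in student_b for c in sem)
print(flat_a == flat_b)    # the flat course sets coincide
print(edges_a == edges_b)  # the trajectory graphs do not
```

The two students are indistinguishable by their course sets but differ in their edges, which is the kind of ordering information a fixed-length summary discards.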
This distinction clarifies the interpretation of the experimental results. The ATG complements tabular predictors by modelling aspects of academic structure that cannot be represented through aggregated features. While cumulative performance is highly predictive of final success, understanding how students progress through a curriculum requires modelling course sequences and transitions. The ATG enables the identification of pathway patterns, atypical progressions, and sequential variations in academic pathways, which are not accessible from flat feature vectors.
The benefit of the representation depends on the diversity of academic trajectories. In curricula where most students follow nearly identical course sequences, the relational structure adds limited predictive information beyond accumulated performance, and tabular models may be sufficient. Conversely, in programs with flexible enrollment choices, varied course combinations, or frequent course repetition, trajectory structure becomes more informative and graph-based modelling becomes more valuable.
The main contribution of this work lies in providing a structured representation that enables the analysis of sequential and relational patterns in real-world academic records. The ATG offers a reusable framework for analysing academic pathways, progression patterns, and trajectory variability while maintaining competitive predictive performance, with potential value for educational analytics and curriculum-level analysis beyond final outcome prediction.
Despite these advantages, the proposed approach also presents limitations. When curricula follow highly standardised and nearly identical course sequences, most students share very similar structural patterns. In such settings, the relational structure provides limited additional information beyond aggregated performance indicators, and simpler tabular models may already capture the dominant predictive signals. Therefore, the benefit of the ATG representation depends on the diversity of academic trajectories present in the dataset.
In addition, graph construction and graph-based learning introduce higher computational complexity than conventional tabular approaches, particularly when processing large-scale academic records and multiple trajectory graphs. Although the proposed framework remained computationally feasible for the evaluated faculties, scalability may become more challenging in larger institutional settings or when incorporating richer relational information. Furthermore, the experiments were conducted using datasets from a single institutional environment. Additional external validation across universities with different curricular structures and academic regulations would be necessary to further assess the generalizability of the proposed representation.
5.1. Statistical Analysis
The statistical significance of differences among the evaluated methods is assessed using the Friedman test [31]. The analysis is performed across the 13 faculties using the student-level cross-validation protocol, with AUC as the primary performance indicator and including both the proposed graph-based models and tabular baselines.
The Friedman test yields a p-value well below the chosen significance level. Consequently, the null hypothesis ($H_0$) that all models have equivalent performance is rejected, indicating that at least one method differs significantly from the others.
The average ranks obtained by the compared methods are shown in Table 6. Lower rank values indicate better performance. These results show that classical tabular models achieve the best overall predictive performance, whereas the performance of the graph-based methods depends on the architecture; the graph-based methods nevertheless provide structural information not captured by aggregated baselines.
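For reference, the average ranks and the Friedman statistic can be computed as in the sketch below. It uses the standard formulation, in which each model is ranked within each dataset (rank 1 for the highest score, tied scores sharing the mean rank) and the statistic is 12N / (k(k + 1)) multiplied by the sum of squared average ranks minus k(k + 1)^2 / 4. No tie correction is applied to the statistic, the p-value (from a chi-square distribution with k - 1 degrees of freedom) is not computed, and the scores are hypothetical and not the AUC values of this study.

```python
from typing import List, Sequence, Tuple


def rank_row(scores: Sequence[float]) -> List[float]:
    """Rank scores within one dataset: rank 1 is the highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman(score_table: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (average rank per model, Friedman chi-square statistic).

    Rows are datasets (here: faculties) and columns are models.
    """
    n, k = len(score_table), len(score_table[0])
    rank_rows = [rank_row(row) for row in score_table]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2


# Hypothetical scores for three models on three datasets.
scores = [[0.90, 0.80, 0.70], [0.95, 0.85, 0.60], [0.88, 0.80, 0.75]]
avg_ranks, chi2 = friedman(scores)
print(avg_ranks, chi2)
```

When every dataset produces the same ordering of models, as in this toy example, the statistic reaches its maximum value of N(k - 1).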
The Nemenyi post hoc test [31] is subsequently applied to determine which pairs of algorithms differ significantly. The results, shown in Figure 5 and Table 7, indicate that SVM and Logistic Regression significantly outperform node2vec (p = 0.0026 and p = 0.0187, respectively) and GCN (p = 0.001 for both). The MLP also significantly outperforms GCN (p = 0.010). In contrast, the differences among the tabular methods themselves are not statistically significant.
Importantly, DGCNN does not significantly differ from the strongest tabular baselines (p > 0.05), although it ranks worse on average. This indicates that while graph-based learning does not consistently improve pointwise classification accuracy, it remains competitive with several traditional machine learning approaches.
Overall, the statistical analysis supports the interpretation that the primary contribution of the proposed approach is representational: the ATG-based models capture relational and temporal information not available in aggregated feature vectors, enabling trajectory-level analysis while maintaining competitive predictive performance.
5.2. Trajectory Diversity Analysis
To evaluate the analytical value of the ATG, we quantified the structural diversity of academic trajectories across faculties. Table 8 summarises several trajectory-level statistics derived directly from the ATG representation.
The results show that most faculties exhibit a very high number of unique trajectory signatures, with normalised trajectory diversity values ranging from 0.88 to 0.99. In addition, the dominant trajectory pattern represents only a very small fraction of students in most faculties. These findings indicate that academic progression is highly heterogeneous rather than concentrated around a small number of fixed curricular pathways.
The average pairwise semester-level Jaccard distance between trajectories is also relatively high across most faculties, reaching values above 0.70 in several cases. This suggests that students frequently differ not only in the courses they complete, but also in the temporal ordering and semester-level organisation of their enrollments.
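The statistics discussed above can be sketched under explicitly assumed definitions, since the exact formulas are not restated here: a trajectory signature is taken to be the ordered tuple of per-semester course sets, normalised diversity is the number of unique signatures divided by the number of students, and the semester-level Jaccard distance averages the per-semester distances between two trajectories, treating a missing semester as an empty set.

```python
from collections import Counter
from itertools import combinations, zip_longest
from typing import Dict, FrozenSet, Sequence, Tuple

Signature = Tuple[FrozenSet[str], ...]


def signature(trajectory: Sequence[Sequence[str]]) -> Signature:
    """Ordered tuple of per-semester course sets (assumed signature definition)."""
    return tuple(frozenset(semester) for semester in trajectory)


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def semester_jaccard(t1: Signature, t2: Signature) -> float:
    """Mean Jaccard distance over aligned semesters; missing semesters count as empty."""
    pairs = list(zip_longest(t1, t2, fillvalue=frozenset()))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_stats(trajectories: Sequence[Sequence[Sequence[str]]]) -> Dict[str, float]:
    """Compute the three diversity statistics for a cohort of at least two students."""
    sigs = [signature(t) for t in trajectories]
    counts = Counter(sigs)
    n = len(sigs)
    distances = [semester_jaccard(a, b) for a, b in combinations(sigs, 2)]
    return {
        "normalised_diversity": len(counts) / n,
        "dominant_fraction": counts.most_common(1)[0][1] / n,
        "mean_jaccard_distance": sum(distances) / len(distances),
    }


# Hypothetical cohort: two students share a trajectory, one reorders the same courses.
cohort = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["C"]],
    [["A"], ["B", "C"]],
]
print(diversity_stats(cohort))
```

In this toy cohort all three students complete the same courses, yet the third student has a non-zero distance to the others because the semester-level organisation differs.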
These observations provide quantitative evidence that the ATG captures substantial structural variability in academic progression. While conventional tabular models achieve stronger predictive performance for final outcome classification, aggregated feature vectors discard this trajectory-level information because they summarise academic history into fixed-length indicators. In contrast, the ATG preserves sequential and relational properties of student progression, enabling structural analyses that are not directly accessible through tabular representations.
The quantitative diversity analysis demonstrates that academic trajectories are highly heterogeneous across faculties. In addition to this global characterisation, the ATG representation also enables a qualitative analysis of individual academic pathways, which is not possible using aggregated tabular features.
Figure 6, Figure 7 and Figure 8 show examples of student trajectories represented as graphs. Figure 6 corresponds to a student with high academic performance. The trajectory shows a relatively coherent sequence of courses across semesters, with consistent progression and stable course load. In contrast, Figure 7 and Figure 8 correspond to students with lower final GPA. Although some courses overlap, their trajectories differ in the early semesters, where course selection and progression patterns diverge.
These examples illustrate that students with similar final outcomes may follow structurally different academic pathways, and conversely, students with superficially similar course sets may differ in the ordering and transitions between semesters. Such structural differences cannot be captured by flat feature vectors, since tabular representations discard ordering and relational dependencies between courses.
Therefore, beyond predictive performance, the ATG provides an interpretable representation of academic progression. This facilitates the identification of trajectory patterns, atypical course sequences, and structural variations in academic pathways, supporting educational analysis and academic advising.
6. Conclusions and Future Work
This study presented a structured graph representation for predicting student success with Graph Neural Networks (GNNs) across various faculties by introducing the Academic Trajectory Graph (ATG), an approach that explicitly models semester-to-semester temporal course transitions. Rather than focusing solely on predictive accuracy, the proposed framework emphasises the importance of preserving structural and temporal information in academic records.
The tested models leverage the capability of GNNs to learn from complex relational structures in educational data, demonstrating strong performance in terms of accuracy, AUC, precision, and recall across most faculties. However, comparison with conventional tabular baselines shows that aggregated academic indicators already provide a strong predictive signal, indicating that the primary contribution of the ATG lies not only in prediction but also in representing academic progression as a structured process. The statistical analysis indicates that DGCNN achieved the strongest overall performance among the graph-based approaches, while GCN obtained lower average rankings across faculties.
However, the performance of the models is not uniform across all faculties, emphasising that predicting academic outcomes is not a one-size-fits-all task. Each faculty exhibits distinctive characteristics, challenges, and data distributions that must be considered when designing predictive models. In particular, faculties with more constrained curricula and highly standardised academic pathways show smaller benefits from relational modelling, whereas heterogeneous trajectories provide richer structural information that can be exploited by graph representations. Additionally, some faculties exhibit substantial class imbalance, which may affect model stability and contribute to majority-class prediction behaviour in certain cases. While the proposed approach establishes a structured foundation for analysing academic progression, further refinement and customisation are necessary to address faculty-specific characteristics and improve robustness across diverse academic contexts. The present study does not incorporate imbalance-aware optimisation strategies such as class weighting, focal loss, or resampling techniques. Future work should investigate whether these approaches improve robustness and stability in faculties exhibiting strong class imbalance.
Scalability and faculty-specific biases also present important challenges for future research. As the models are expanded to larger and more diverse datasets, differences in data quality, feature distributions, grading standards, and institutional policies may introduce systematic biases that impact fairness and generalizability. In addition, the present study is based on data from a single institution, which limits the external validity of the reported results despite the heterogeneity observed across the 13 evaluated faculties. Cross-institutional validation will therefore be necessary to assess the robustness and transferability of the proposed representation under different academic structures and educational contexts. Mitigating these issues will require robust normalisation and bias-correction strategies to ensure equitable model performance across academic domains.
Graph-based approaches also introduce higher computational complexity than conventional tabular models due to graph construction, message passing operations, and increased memory requirements during training. This represents an additional trade-off of the proposed representation, where richer structural modelling is obtained at the expense of higher computational cost. Future work should therefore investigate more scalable architectures and optimisation strategies for large-scale educational datasets.
Beyond prediction, the ATG provides a reusable data structure for analysing academic pathways. Because trajectories are explicitly represented, the framework facilitates the study of progression patterns, atypical course sequences, and structural variations in academic pathways, supporting educational analytics, academic advising, and curriculum-level analysis.
Accordingly, course recommendation should be interpreted as a potential downstream application enabled by the predictive framework, and not as a system evaluated within the scope of this study.
Moreover, incorporating dynamic graph structures could enable models to capture the temporal evolution of student interactions and academic behaviours, providing a more realistic representation of the learning environment. Future work will therefore focus not only on improving predictive models but also on leveraging the ATG representation for longitudinal analysis of student progression and curriculum structure. In addition, cross-institutional validation would offer a valuable benchmark for evaluating model transferability and robustness, paving the way for scalable and generalizable student success prediction systems. Finally, future research may explore temporal graph neural networks and explainable AI techniques to improve the interpretability of trajectory-level predictions and better understand the relational factors associated with academic progression [32, 33].