1. Introduction
With the rapid advancement of educational digitalization and online learning platforms, personalized learning has emerged as a critical direction for improving educational quality [1]. Within this context, student performance prediction is regarded as a key component, as it not only enables teachers to adapt instructional strategies with greater precision but also provides students with more effective learning guidance, thereby enhancing overall learning outcomes [2]. However, existing adaptive learning technologies predominantly focus on modeling surface-level behavioral data, which limits their ability to capture the complex structures and latent semantic relations underlying the learning process—particularly the multi-layered interactions among students, knowledge concepts, and learning items. This limitation becomes especially pronounced when dealing with heterogeneous educational data and high-dimensional semantic information, highlighting the urgent need for more refined and intelligent predictive modeling approaches.
Recent research on predicting student performance generally employs either sequential models or graph neural networks (GNNs). Sequential approaches are designed to capture temporal dependencies in learners’ behavioral records, while GNN-based methods focus on modeling the relational structures linking students with educational content, such as learning activities and assessment items [3,4]. To address the challenge of modeling fine-grained student–answer interactions, Ni et al. [5] introduced a signed bipartite graph framework that incorporates positive and negative edges to enhance the density of structural associations.
Despite their strengths in structural modeling, GNN-based methods struggle to effectively extract the deep semantic features embedded within nodes. This limitation has led to performance bottlenecks when handling complex, multimodal educational data. To overcome this barrier, recent studies have begun integrating natural language processing (NLP) techniques into educational contexts, aiming to capture the rich semantic information within both learning materials and student behaviors. Large language models (LLMs), in particular, demonstrate strong capabilities in text understanding and knowledge transfer, providing valuable semantic priors for multiple-choice questions and related educational content. However, LLMs alone cannot directly generate features tailored for graph-structured learning, while GNNs remain constrained in their ability to represent node semantics at appropriate scales and levels of granularity. Even when LLM-derived embeddings are incorporated, the overall predictive improvement remains limited.
These challenges underscore the urgent need for a customized framework that can jointly integrate high-order graph structures with fine-grained semantic representations. In this direction, Wang et al. [6] proposed the LLM-SBCL model, which attempts to combine LLMs with GNNs for joint structural and semantic modeling. Although this approach demonstrates certain performance gains, it still fails to adequately capture higher-order student–item interaction patterns and lacks a multi-scale representation mechanism capable of reflecting both group-level learning preferences and individual-level variations. Addressing these unresolved issues points to the critical next step in advancing student performance prediction models.
To address the limitations of existing student performance prediction models in capturing higher-order interactions and semantic information, we propose EduSheaf, a unified framework that integrates signed graph representations with large language models to enhance both prediction granularity and generalization. In this framework, students and multiple-choice questions are modeled as graph nodes, while responses are encoded as signed edges, with positive and negative signs corresponding to correct and incorrect answers, respectively. This formulation enables the framework to jointly represent structural characteristics and interaction patterns within the learning process.
The sheaf Laplacian operator provides a powerful mathematical tool for graph representation by assigning vector spaces to nodes and edges and defining linear maps between them, thereby embedding local cellular structures into the representation [7,8,9]. EduSheaf leverages this operator for signed graph learning, significantly enhancing its modeling capacity. This allows the network to capture complex node relations and higher-order dependencies while maintaining local consistency. Building on this foundation, we design a sheaflet-based signed graph neural network, which employs multi-resolution signal processing to derive fine-grained node representations. Specifically, the low-pass filters highlight group-level behavioral patterns among students, while the high-pass filters emphasize individualized deviations, enabling multi-scale and fine-grained interaction modeling. Furthermore, the wavelet-based decomposition and reconstruction enhance the compactness and robustness of representations, mitigating the impact of noise.
In this study, we incorporate semantic embeddings derived from large language models (LLMs) into a signed graph setting, thereby enriching the semantic representation of student–content interactions. Comprehensive experiments conducted on diverse educational datasets show that EduSheaf consistently achieves superior performance over existing state-of-the-art methods in terms of both accuracy and robustness. These results highlight the efficacy of its multi-scale representation strategy. More broadly, EduSheaf offers a new perspective for modeling complex learning dynamics and provides a technical foundation for advancing adaptive learning and supporting fine-grained instructional decision-making.
Key Contributions: The main contributions of this work are articulated as follows:
Novel Task Reframing: We reconceptualize student performance prediction as a signed graph learning problem. Correct and incorrect responses are encoded as positive and negative edges, respectively, yielding a representation that integrates structural dependencies with semantic signals in a principled manner.
Sheaflet-based SNN Architecture: We design a signed graph neural network grounded in cellular sheaf theory and equipped with wavelet-like transforms for multi-resolution analysis. This design employs low-pass filters to capture collective learning tendencies and high-pass filters to highlight individual variations, thus enabling the model to disentangle global and local learning patterns.
Integrated Structure–Semantics Framework: We propose EduSheaf as a unified framework that fuses structural modeling with semantic enrichment from LLMs. By aligning graph-based structural information with semantically rich embeddings, EduSheaf provides a more holistic understanding of learner–content interactions. Extensive evaluation on multiple benchmark datasets confirms its predictive advantages and underscores the benefits of structure–semantic co-learning.
The remainder of this paper is organized as follows: Section 2 reviews relevant studies on student performance prediction, signed graph neural networks, and the application of large language models in education. Section 3 introduces the necessary preliminaries to establish the foundation for our framework. Section 4 presents a detailed description of the proposed approach, followed by Section 5, which reports the experimental design and evaluation results. Finally, Section 6 concludes the paper with a summary of the key findings and a discussion of potential directions for future research.
2. Related Work
We investigate four domains: student performance prediction, signed graph neural networks, sheaf theory for graphs, and large language models in education. Prior studies in these areas highlight significant advances, yet limitations remain in handling data sparsity, capturing signed structural dependencies, and leveraging semantic knowledge effectively. Our work builds on these insights by integrating structural and semantic modeling within a unified framework for student performance prediction.
2.1. Student Performance Prediction
The central objective of student performance prediction is to anticipate learners’ outcomes in tasks such as assignments, examinations, or course completion based on their historical learning trajectories [10]. Such predictions allow educators to deliver targeted interventions and personalized support. Early approaches primarily relied on traditional machine learning techniques, including logistic regression, decision trees, and support vector machines. While effective in limited contexts, these models were heavily dependent on manual feature design and failed to capture the intricate dynamics of student learning. With the rise of deep learning, architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) gained prominence for their ability to automatically extract high-dimensional features and model complex behavioral patterns, thereby achieving notable improvements in predictive accuracy [10].
To advance personalized education, subsequent studies have developed two major paradigms: static models and sequential models [11,12]. Static models (e.g., [13,14]) utilize fixed historical data, offering reasonable accuracy but lacking adaptability to the evolving nature of student knowledge. In contrast, sequential approaches such as knowledge tracing and its variants [15,16] dynamically model students’ learning progression, enabling real-time updates of knowledge states and improved predictive precision in interactive learning environments.
Despite these advancements, two critical limitations remain. First, most existing methods struggle to capture the highly nonlinear and interdependent relationships between learners and educational content. Second, the deeper semantic context embedded in instructional materials is often neglected, weakening the interpretability and contextual richness of predictions. To address these gaps, we propose the EduSheaf framework, which employs signed graph structures to represent fine-grained student–question interactions while incorporating semantic embeddings derived from large language models. This integration of structural dependencies and semantic information enables more accurate, context-aware predictions and offers deeper insights into learner behaviors.
2.2. Signed Graph Neural Networks
In modeling student performance, signed bipartite graphs provide a natural framework, where positive edges represent correct answers and negative edges denote incorrect ones [17,18]. Early studies on signed graph learning primarily adopted embedding-based techniques, including SIDE [19] and SGDN [20], as well as signed Laplacian and matrix factorization approaches [18]. These methods attempted to encode signed information into compact representations but often struggled to capture the intricate structural dependencies of signed networks. Later, neural architectures were introduced, such as SGCN [17], which extends GCNs [21] by integrating balance theory, and SiNE [22], which leverages triangular motifs to strengthen balance-aware modeling. While these models improved the understanding of local structures, they still lacked the ability to effectively reflect global consistency and antagonistic relations across the graph.
More recently, contrastive learning has become a powerful paradigm for representation learning on signed graphs. Approaches such as SGCL [23], SBGNN [24], SBCL [6], and MOSGCN [25] apply contrastive objectives to refine signed graph embeddings, demonstrating notable improvements in capturing local interactions. However, these frameworks predominantly focus on pairwise relations, limiting their capacity to uncover higher-order dependencies and collaborative patterns among multiple nodes—factors particularly important in complex educational settings.
In parallel, graph wavelet-based methods [26,27,28] have gained traction for their ability to conduct multi-scale spectral analysis, offering new insights into structural characterization. Yet, their application to signed graphs remains underexplored, especially regarding the complementary roles of low-pass and high-pass components and their integration with advanced algebraic tools such as sheaf theory. Although Chen et al. [29] made an initial step toward combining wavelet analysis with sheaf structures, their focus was restricted to general graphs without addressing the specific challenges posed by signed networks.
These gaps underscore the demand for a more comprehensive framework capable of simultaneously capturing local discriminative cues, maintaining global structural coherence, and accommodating the semantic complexity of signed graphs. To bridge this gap, we propose EduSheaf, a sheaflet-inspired model that unifies cellular sheaves with framelet transforms. By jointly exploiting high-pass details and low-pass trends across multi-frequency domains, EduSheaf enables expressive, robust, and semantically enriched representation learning for signed graphs.
2.3. Sheaf Theory for Graphs
Sheaf theory has recently gained attention as a unifying paradigm for extending graph learning by promoting local consistency while enabling richer global reasoning. In contrast to conventional graph-based approaches that employ the standard Laplacian operator, SheafNN [8] adopts the sheaf Laplacian [9] to construct more expressive diffusion dynamics, thereby broadening the representational capacity of graph neural architectures. Building on this foundation, subsequent research has proposed several improvements: attention mechanisms have been introduced to adaptively capture neighborhood dependencies [30]; positional encodings have been designed to strengthen the structural expressiveness of sheaf models [31]; and domain-specific adaptations, such as personalized federated learning [32] and recommendation tasks, have demonstrated the versatility and transferability of sheaf-based techniques.
Parallel efforts in algebraic topology have also contributed to expanding the methodological scope of sheaf theory. For example, Ayzenberg et al. [33] proposed a poset-oriented algorithm for sheaf cohomology, offering novel ways to incorporate topological structures into machine learning.
Nevertheless, the majority of existing sheaf-based approaches operate within the constraints of single-frequency diffusion or shallow propagation schemes. The absence of systematic exploration into multi-frequency filtering and spectral decomposition limits the ability to integrate sheaf theory with advanced signal processing tools such as wavelets and multi-scale spectral analysis. Addressing this gap is essential for enhancing the expressiveness and interpretability of sheaf-based learning frameworks when applied to complex relational data.
2.4. LLM for Education
The rapid advancement of large language models is profoundly reshaping the landscape of educational research and practice, particularly in the areas of personalized learning, automated assessment, and intelligent educational support [34,35]. Early efforts, such as GPT-3 [36], have already demonstrated remarkable few-shot learning capabilities, showing that these models can interpret students’ learning behaviors and linguistic expressions to generate personalized learning resources and pathways, thereby significantly improving learning outcomes. However, deploying such models in authentic educational settings remains challenging. A primary obstacle lies in the reliance on large-scale annotated datasets during training, while educational data are often scarce, noisy, and imbalanced, which hampers generalization and limits stable deployment.
With the continued scaling and enhancement of model capabilities, the GPT-4 technical report [37] further revealed that LLMs can perform at a level comparable to students in standardized tests across mathematics, physics, and computer science, while effectively handling both multiple-choice and open-ended questions. This expanded capacity has unlocked new opportunities for intelligent educational applications. Recent studies have further validated LLMs’ strengths in supporting reading, writing, and knowledge integration. For instance, Susnjak & McIntosh [38] showed that ChatGPT-4 can generate logically coherent and comprehensive answers to multidisciplinary problems; Malinka et al. [39] reported quantitative evidence that students who refined their answers with LLM assistance achieved significantly higher scores in a computer security course than their peers in the control group. In addition, several survey studies [40] have synthesized the state-of-the-art applications of LLMs in education, highlighting their potential in facilitating teacher–student interaction, optimizing personalized learning strategies, and enhancing the automation of assessment and feedback. Meanwhile, Ravi et al. [41] collaborated with K-12 teachers to co-design LLM-powered tools, focusing on in-situ design processes, teacher needs, risk assessment, and classroom integration strategies, thereby offering critical insights into teacher-centered design and school-based applications.
As powerful semantic modeling tools, LLMs have already shown the ability to extract deep semantic information from multimodal educational data. This capability opens new avenues for tackling complex prediction tasks in educational contexts. Motivated by this, and from the perspective of unifying semantic modeling with structured learning, we propose the EduSheaf framework, which for the first time integrates LLMs with signed graph learning for student performance prediction—addressing a critical research gap in this domain.
3. Notation and Problem Formulation
This section aims to formalize the modeling of interactions between students and multiple-choice questions. The interaction data not only contain students’ response outcomes but also incorporate textual information such as question stems, options, answers, and explanations, which provides opportunities for constructing richer semantic representations. To systematically capture these interactions, we adopt a signed graph modeling approach, where students and questions are represented as distinct types of nodes, and students’ responses to questions are encoded as edges with positive or negative signs.
Signed Graph. We define the graph as $G = (\mathcal{U} \cup \mathcal{V}, \mathcal{E})$, where $\mathcal{U} = \{u_1, \dots, u_m\}$ represents the set of students, and $\mathcal{V} = \{v_1, \dots, v_n\}$ represents the set of questions, with $\mathcal{U} \cap \mathcal{V} = \emptyset$. The edge set $\mathcal{E} \subseteq \mathcal{U} \times \mathcal{V}$ represents pairwise relationships between students and questions, partitioned into positive edges ($\mathcal{E}^{+}$) and negative edges ($\mathcal{E}^{-}$), such that $\mathcal{E} = \mathcal{E}^{+} \cup \mathcal{E}^{-}$, and $\mathcal{E}^{+} \cap \mathcal{E}^{-} = \emptyset$.
Task Definition. Within this signed graph framework, student performance prediction is formalized as a signed edge prediction problem. Specifically, given a signed graph G, the goal is to learn low-dimensional embeddings for students and questions: each student $u_i$ is associated with an embedding $\mathbf{z}_{u_i} \in \mathbb{R}^{d}$, and each question $v_j$ with an embedding $\mathbf{z}_{v_j} \in \mathbb{R}^{d}$, where d denotes the embedding dimension. A mapping function $f: \mathbb{R}^{d} \times \mathbb{R}^{d} \rightarrow \{+1, -1\}$ is then applied to predict the sign of edge $(u_i, v_j)$.
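As an illustration, the signed bipartite formulation above can be held in a minimal data structure (a sketch; the class and method names are ours, not part of EduSheaf):

```python
from dataclasses import dataclass, field

@dataclass
class SignedBipartiteGraph:
    """Students and questions as node sets; responses as signed edges."""
    students: set = field(default_factory=set)
    questions: set = field(default_factory=set)
    pos_edges: set = field(default_factory=set)  # correct answers: (student, question)
    neg_edges: set = field(default_factory=set)  # incorrect answers

    def add_response(self, student, question, correct: bool):
        self.students.add(student)
        self.questions.add(question)
        (self.pos_edges if correct else self.neg_edges).add((student, question))

    def edge_sign(self, student, question):
        if (student, question) in self.pos_edges:
            return +1
        if (student, question) in self.neg_edges:
            return -1
        return 0  # unobserved student-question pair

g = SignedBipartiteGraph()
g.add_response("s1", "q1", True)   # correct answer -> positive edge
g.add_response("s1", "q2", False)  # incorrect answer -> negative edge
```

The mapping function $f$ then only ever scores pairs whose sign is to be predicted; the `edge_sign` convention (+1/−1/0) mirrors the positive/negative/unobserved partition above.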
LLM-based Semantic Extraction. In the EduSheaf framework, we leverage large language models to enrich the semantic representation of multiple-choice questions. Specifically, question stems, options, and explanations are reformulated into structured sentences, from which the LLM extracts key concepts and their associated weights. These are then combined with GloVe embeddings to generate weighted semantic representations. By integrating this semantic information with the structural features of signed graphs, we obtain more comprehensive node embeddings that enhance prediction performance. Compared with traditional approaches, this strategy effectively addresses the underutilization of semantic cues in MCQ representation and introduces a novel perspective for fusing textual and graph-structured data in educational contexts.
To illustrate the semantic extraction process, we provide a representative example. Suppose an MCQ is authored by a first-year law student and includes a stem, several answer options, the correct answer, and an explanation. The objective is to identify key knowledge points that capture the essential legal concepts reflected in the question. To guarantee the structural validity of the LLM-generated outputs, we employ a constrained prompt that explicitly enforces a strict JSON schema, including predefined fields for semantic keywords and their corresponding weights. The model is instructed to return a single, well-formed JSON object without any additional natural language text. At runtime, each response is automatically parsed and validated using a schema-based checker. We verify that (i) all required fields are present, (ii) keyword weights are numerical and bounded in [0, 1], and (iii) the sum of weights is normalized to 1. If any of these conditions fail (e.g., malformed JSON, missing fields, or invalid normalization), the query is automatically re-issued using a fallback prompt with stricter formatting constraints. In the rare case that repeated attempts fail, the corresponding question is assigned a default semantic representation derived from the average embedding statistics of the training corpus, ensuring robustness of the preprocessing pipeline and preventing interruptions to downstream model training. For instance, given a contract law question, the LLM generates a response such as:
{"Keywords": [
    {"keyword": "Anticipatory breach", "percentage": 0.5},
    {"keyword": "Repudiation", "percentage": 0.2},
    {"keyword": "Contract law", "percentage": 0.1},
    {"keyword": "Unequivocal renunciation", "percentage": 0.1},
    {"keyword": "Common law", "percentage": 0.1}
]}
This structured output highlights the primary legal notions embedded in the MCQ while preserving proportional relevance, thereby enabling the integration of semantic knowledge into downstream graph-based representations. A more detailed description of the large language model processing pipeline is elaborated in Appendix A.
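The validation step described above can be sketched as follows (a minimal illustration; the function name, tolerance, and return convention are ours, and the caller is responsible for re-issuing failed queries):

```python
import json
import math

def validate_keywords(raw: str, tol: float = 1e-6):
    """Parse an LLM response and enforce the keyword/weight schema.

    Returns a list of (keyword, weight) pairs on success, or None on any
    failure (malformed JSON, missing fields, out-of-range or unnormalized
    weights), signalling the caller to retry with a stricter prompt.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON
    items = obj.get("Keywords")
    if not isinstance(items, list) or not items:
        return None  # missing or empty required field
    weights = []
    for it in items:
        if "keyword" not in it or "percentage" not in it:
            return None  # missing per-item fields
        w = it["percentage"]
        if not isinstance(w, (int, float)) or not 0.0 <= w <= 1.0:
            return None  # weight must be numeric and bounded in [0, 1]
        weights.append(float(w))
    if not math.isclose(sum(weights), 1.0, abs_tol=tol):
        return None  # weights must sum to 1
    return [(it["keyword"], float(it["percentage"])) for it in items]

ok = validate_keywords('{"Keywords": [{"keyword": "Anticipatory breach", "percentage": 0.5},'
                       '{"keyword": "Repudiation", "percentage": 0.5}]}')
bad = validate_keywords('{"Keywords": [{"keyword": "x", "percentage": 1.5}]}')
```

A `None` result triggers the fallback prompt; repeated failures fall through to the default average-embedding representation, as described above.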
In our implementation, semantic embeddings are generated using the OpenAI API with the model identifier gpt-4-0613 (June 2023 snapshot) as an offline preprocessing step. Each multiple-choice question is processed independently through a fixed, structured prompt, and the extracted keywords and weights are cached and reused throughout all training and inference stages. To ensure reproducibility, we adopt a deterministic decoding strategy by fixing the inference parameters, with the temperature set to 0.0 and nucleus sampling disabled. The maximum output length is capped at 256 tokens, which is sufficient to accommodate the structured JSON-style output containing extracted semantic keywords and their associated weights. Under this configuration, repeated queries with identical prompts yield consistent outputs. The average processing time is approximately 0.8–1.2 s per question. Under the OpenAI pricing scheme at the time of experimentation, the total preprocessing cost for the largest dataset is on the order of tens of USD, which constitutes a minor overhead relative to model training and evaluation.
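The cache-and-reuse scheme can be sketched as follows (the wrapper and stub are ours; the actual API call, with its fixed temperature and token cap, is injected as `call_llm` so the caching logic is testable offline):

```python
import hashlib

def cached_extract(question_text: str, call_llm, cache: dict):
    """Query the LLM once per unique prompt; reuse the cached result thereafter.

    `call_llm` stands in for the deterministic API call (temperature 0.0,
    max 256 output tokens); identical prompts therefore map to one cache entry.
    """
    key = hashlib.sha256(question_text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = call_llm(question_text)
    return cache[key]

calls = []
def fake_llm(prompt):
    """Stub standing in for the real API during offline testing."""
    calls.append(prompt)
    return {"Keywords": [{"keyword": "Contract law", "percentage": 1.0}]}

cache = {}
a = cached_extract("Q1 stem ...", fake_llm, cache)
b = cached_extract("Q1 stem ...", fake_llm, cache)  # served from cache, no second call
```

Because extraction is a one-off preprocessing pass, the cache is what guarantees that training and inference always see identical semantic features for a given question.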
The key notations used in this paper are summarized in Table 1 for the reader’s convenience.
Remark 1. In this work, student performance prediction is formulated under an edge-split transductive signed edge prediction setting. The original dataset consists of labeled student–question interactions, where positive edges correspond to correct responses and negative edges correspond to incorrect responses. The training, validation, and test sets are constructed by randomly partitioning this set of labeled signed edges. Consequently, the test set is composed of held-out observed interactions with known ground-truth labels, rather than artificially generated negatives sampled from unobserved student–question pairs. In this context, we define “unobserved” interactions as student–question pairs for which no response is recorded in the dataset, i.e., the student has not attempted the question and no label is available. Importantly, we do not assume a missing-at-random mechanism and we do not perform additional negative sampling from these unobserved pairs during either training or evaluation. This protocol ensures that the reported performance reflects the model’s ability to generalize to unseen labeled interactions, rather than its sensitivity to a particular negative sampling strategy.
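The edge-split protocol of Remark 1 can be sketched as follows (the function name and split ratios are illustrative; only observed, labeled interactions are partitioned, and no negatives are sampled from unobserved pairs):

```python
import random

def split_signed_edges(pos_edges, neg_edges, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly partition labeled signed edges into train/val/test sets.

    Each element of the returned splits is ((student, question), sign).
    Unobserved student-question pairs never enter any split, matching the
    transductive protocol of Remark 1.
    """
    rng = random.Random(seed)
    labeled = [(e, +1) for e in pos_edges] + [(e, -1) for e in neg_edges]
    rng.shuffle(labeled)
    n = len(labeled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (labeled[:n_train],
            labeled[n_train:n_train + n_val],
            labeled[n_train + n_val:])

train, val, test = split_signed_edges({(f"s{i}", "q1") for i in range(8)},
                                      {(f"s{i}", "q2") for i in range(2)})
```

Held-out test edges thus always carry ground-truth labels, so reported metrics measure generalization to unseen labeled interactions rather than sensitivity to a negative-sampling strategy.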
4. Proposed Method
4.1. Framework Overview
As depicted in Figure 1, the EduSheaf framework is composed of three tightly coupled modules, each addressing a different stage of the modeling process. First, the LLM-based semantic extraction module processes multiple-choice questions by parsing their key elements—such as stems, candidate options, ground-truth answers, and explanatory texts—and encoding them into semantic embeddings enriched with contextual and pedagogical knowledge. This design allows the framework to capture fine-grained semantic variations across questions, which are often overlooked in conventional representations.
Second, the signed graph construction module formalizes the interactions between students and MCQs into a signed bipartite graph structure. In this representation, students and questions are modeled as heterogeneous nodes, while correct and incorrect responses are encoded as positive and negative edges, respectively. Such a formulation enables the graph to preserve both supportive and conflicting learning signals, thereby offering a more expressive characterization of student performance patterns.
Finally, the sheaflet-based signed graph neural network constitutes the central innovation of EduSheaf. By integrating low-pass filters that capture global learning trends with high-pass filters that emphasize local discrepancies, this module facilitates multi-frequency feature learning on signed graphs. The incorporation of sheaflet operators further strengthens the capacity to model complex dependencies, making the framework more robust to sparsity and heterogeneity in educational data. This novel neural architecture will be introduced in greater detail in the following section.
4.2. Sheaflets on Signed Graphs
In this section, we introduce EduSheaflet, a sheaflet-based signed graph neural network framework that unifies cellular sheaf theory with framelet transforms to jointly capture low-pass and high-pass information on signed graphs. To provide a rigorous basis for this design, we begin by formalizing the concept of cellular sheaves defined over signed graphs and outlining the corresponding linear sheaf Laplacian, which serves as the mathematical foundation for subsequent model construction.
Basics of Sheaves on Graphs. A cellular sheaf $\mathcal{F}$ on a signed graph is specified by the following components:
A vector space $\mathcal{F}(v)$ for each node $v \in \mathcal{U} \cup \mathcal{V}$;
A vector space $\mathcal{F}(e)$ for each positive edge $e \in \mathcal{E}^{+}$;
A vector space $\mathcal{F}(e)$ for each negative edge $e \in \mathcal{E}^{-}$;
A linear map $\mathcal{F}_{v \trianglelefteq e}: \mathcal{F}(v) \rightarrow \mathcal{F}(e)$ for each incident node–positive-edge pair $v \trianglelefteq e$, $e \in \mathcal{E}^{+}$;
A linear map $\mathcal{F}_{v \trianglelefteq e}: \mathcal{F}(v) \rightarrow \mathcal{F}(e)$ for each incident node–negative-edge pair $v \trianglelefteq e$, $e \in \mathcal{E}^{-}$.
The vector spaces associated with nodes and edges are referred to as stalks, while the corresponding linear maps are called restriction maps. Grouping these spaces leads to two fundamental cochain spaces: the collection of node stalks forms the space of 0-cochains, and the collection of edge stalks forms the space of 1-cochains.
More generally, a cellular sheaf $\mathcal{F}$ on a signed graph can be described as a triple $\big(\{\mathcal{F}(v)\}_{v}, \{\mathcal{F}(e)\}_{e}, \{\mathcal{F}_{v \trianglelefteq e}\}_{v \trianglelefteq e}\big)$, where (i) $\{\mathcal{F}(v)\}_{v}$ are vertex stalks, i.e., vector spaces attached to each node v; (ii) $\{\mathcal{F}(e)\}_{e}$ are signed edge stalks, i.e., vector spaces attached to each edge e; (iii) $\{\mathcal{F}_{v \trianglelefteq e}\}$ are restriction maps defined for each incident node–edge pair.
On this basis, the linear sheaf Laplacian $\mathcal{L}_{\mathcal{F}}$ is defined blockwise as
$$(\mathcal{L}_{\mathcal{F}})_{vv} = \sum_{v \trianglelefteq e} \mathcal{F}_{v \trianglelefteq e}^{\top} \mathcal{F}_{v \trianglelefteq e}, \qquad (\mathcal{L}_{\mathcal{F}})_{vu} = -\mathcal{F}_{v \trianglelefteq e}^{\top} \mathcal{F}_{u \trianglelefteq e} \quad \text{for } e = (v, u),$$
where each block is of size $d \times d$, d is the dimension of the stalks, and each restriction map $\mathcal{F}_{v \trianglelefteq e}$ governs the information flow from node v to edge e.
The normalized sheaf Laplacian is expressed as
$$\tilde{\Delta}_{\mathcal{F}} = D^{-1/2} \mathcal{L}_{\mathcal{F}} D^{-1/2},$$
where D is the block-diagonal degree matrix of $\mathcal{L}_{\mathcal{F}}$. If the underlying graph has n nodes, the resulting sheaf Laplacian has dimensions $nd \times nd$.
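To make the block structure concrete, the sheaf Laplacian can equivalently be assembled from the coboundary map as $\mathcal{L}_{\mathcal{F}} = \delta^{\top}\delta$. The sketch below (ours, not the paper's implementation; restriction maps are arbitrary $d \times d$ matrices supplied by the caller) verifies the $nd \times nd$ shape and positive-semidefiniteness:

```python
import numpy as np

def sheaf_laplacian(n_nodes, d, edges, restriction):
    """Assemble L_F = delta^T delta for a graph with d-dimensional stalks.

    `edges` is a list of (u, v) node-index pairs; `restriction[(v, e_idx)]`
    is the d x d map F_{v <| e} from the stalk of v into the stalk of edge e.
    """
    m = len(edges)
    delta = np.zeros((m * d, n_nodes * d))  # coboundary: one d-block row per edge
    for e_idx, (u, v) in enumerate(edges):
        # (delta x)_e = F_{u<|e} x_u - F_{v<|e} x_v  (edge-wise disagreement)
        delta[e_idx*d:(e_idx+1)*d, u*d:(u+1)*d] = restriction[(u, e_idx)]
        delta[e_idx*d:(e_idx+1)*d, v*d:(v+1)*d] = -restriction[(v, e_idx)]
    return delta.T @ delta  # (n*d) x (n*d), symmetric positive semidefinite

# Tiny example: a 3-node path with 2-dimensional stalks and identity maps,
# for which L_F reduces to the ordinary graph Laplacian tensored with I_2.
d, edges = 2, [(0, 1), (1, 2)]
maps = {(u, e): np.eye(d) for e, (a, b) in enumerate(edges) for u in (a, b)}
L = sheaf_laplacian(3, d, edges, maps)
```

The diagonal blocks of the product match the sum $\sum_{v \trianglelefteq e} \mathcal{F}_{v \trianglelefteq e}^{\top}\mathcal{F}_{v \trianglelefteq e}$ and the off-diagonal blocks match $-\mathcal{F}_{v \trianglelefteq e}^{\top}\mathcal{F}_{u \trianglelefteq e}$, term by term.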
Construction of Sheaflets on Signed Graphs. Consider a signed graph with N nodes and the associated signed graph Laplacian L. Following the principles of framelet construction on graphs, we extend this methodology to define sheaflets on graphs. Let $\{(\lambda_i, \mathbf{u}_i)\}_{i=1}^{Nd}$ denote the eigenpairs of the linear sheaf graph Laplacian $\mathcal{L}_{\mathcal{F}}$. For $r = 1, \dots, n$ and $j = 1, \dots, J$, the undecimated sheaflets $\varphi_{j,p}$ and $\psi_{j,p}^{r}$ at scale j are defined by
$$\varphi_{j,p}(v) = \sum_{i=1}^{Nd} \hat{\alpha}\!\left(\frac{\lambda_i}{2^{j}}\right) \overline{\mathbf{u}_i(p)}\, \mathbf{u}_i(v), \qquad \psi_{j,p}^{r}(v) = \sum_{i=1}^{Nd} \hat{\beta}^{(r)}\!\left(\frac{\lambda_i}{2^{j}}\right) \overline{\mathbf{u}_i(p)}\, \mathbf{u}_i(v).$$
The scaling functions $\Psi = \{\alpha; \beta^{(1)}, \dots, \beta^{(n)}\}$ are associated with a filter bank $\eta = \{a; b^{(1)}, \dots, b^{(n)}\}$, satisfying for all $\xi \in \mathbb{R}$:
$$\hat{\alpha}(2\xi) = \hat{a}(\xi)\,\hat{\alpha}(\xi), \qquad \hat{\beta}^{(r)}(2\xi) = \hat{b}^{(r)}(\xi)\,\hat{\alpha}(\xi), \quad r = 1, \dots, n,$$
where $\hat{h}$ denotes the Fourier transform of h, defined as
$$\hat{h}(\xi) = \sum_{k \in \mathbb{Z}} h_k\, e^{-2\pi i k \xi}.$$
Here, $\hat{\alpha}$ serves as the low-pass scaling function, while $\hat{\beta}^{(1)}, \dots, \hat{\beta}^{(n)}$ correspond to the high-pass functions, with n denoting the number of high-pass channels. This construction naturally captures both coarse-grained and fine-grained information of graph signals, thereby enabling multi-scale representations that integrate global and local structures.
The framelet coefficients are defined as the inner products between the sheaflet basis and a graph signal $X \in \mathbb{R}^{N \times d}$, where d is the feature dimension. The coefficient matrices share the same dimensions as the node feature matrix X. Let $\mathcal{W} = \{\mathcal{W}_{0,J}\} \cup \{\mathcal{W}_{r,j}\}_{r,j}$ denote the decomposition operators such that $\mathcal{W}_{0,J} X$ collects the low-pass coefficients and $\mathcal{W}_{r,j} X$ the high-pass coefficients at channel r and scale j. From Equation (8), the framelet transform operators can be expressed as
$$\mathcal{W}_{0,J} = U\, \hat{\alpha}\!\left(2^{-J} \Lambda\right) U^{\top}, \qquad \mathcal{W}_{r,j} = U\, \hat{\beta}^{(r)}\!\left(2^{-j} \Lambda\right) U^{\top},$$
where U stacks the eigenvectors of $\mathcal{L}_{\mathcal{F}}$ and $\Lambda$ is the diagonal matrix of its eigenvalues.
Signed Graph Neural Networks with Sheaflets (GNN). Due to the intrinsic properties of sheaflet decomposition and reconstruction, the following constraint holds:
$$\mathcal{W}_{0,J}^{\top} \mathcal{W}_{0,J} + \sum_{r,j} \mathcal{W}_{r,j}^{\top} \mathcal{W}_{r,j} = I,$$
which guarantees perfect reconstruction of the graph signal. Within our framework, we employ Signed Graph Neural Networks with Sheaflets (GNN) to model node feature propagation on signed graphs. The update rule for node representations is defined as:
$$X^{(\ell+1)} = \sigma\!\Big( \mathcal{W}_{0,J}^{\top}\, \mathrm{diag}(\theta_{0})\, \mathcal{W}_{0,J}\, X^{(\ell)} W + \sum_{r,j} \mathcal{W}_{r,j}^{\top}\, \mathrm{diag}(\theta_{r,j})\, \mathcal{W}_{r,j}\, X^{(\ell)} W \Big),$$
where $\theta_{0}$ and $\theta_{r,j}$ are learnable filter coefficients corresponding to low- and high-frequency components, respectively. W denotes a shared trainable weight matrix, and $\sigma$ is a nonlinear activation function such as ReLU.
This formulation enables the GNN to jointly capture multi-scale patterns in signed graphs: the low-frequency component encodes shared global structures, while the high-frequency component highlights individual node variations. By integrating sheaflets into the graph convolution, the model preserves local consistency at the edge level and effectively propagates information across heterogeneous and signed interactions, providing a unified mechanism for learning rich, multi-resolution node representations.
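The perfect-reconstruction constraint can be checked numerically. The sketch below (ours, for illustration) builds one-level transform operators from the Laplacian's eigendecomposition using a simple Haar-type pair with $\hat{\alpha}(\xi)^2 + \hat{\beta}(\xi)^2 = 1$; EduSheaf's actual filter bank may differ:

```python
import numpy as np

def framelet_operators(L):
    """One-level transform operators from a symmetric (sheaf) Laplacian L.

    Uses the pair a(x) = cos(pi*x/2), b(x) = sin(pi*x/2) on the rescaled
    spectrum, so a^2 + b^2 = 1 yields exact reconstruction.
    """
    lam, U = np.linalg.eigh(L)
    xi = lam / max(lam.max(), 1e-12)                     # rescale spectrum into [0, 1]
    W_low = U @ np.diag(np.cos(np.pi * xi / 2)) @ U.T    # low-pass operator
    W_high = U @ np.diag(np.sin(np.pi * xi / 2)) @ U.T   # high-pass operator
    return W_low, W_high

# Path-graph Laplacian as a small stand-in for the sheaf Laplacian
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
W_low, W_high = framelet_operators(L)
X = np.random.default_rng(0).normal(size=(3, 4))
# W_low^T W_low + W_high^T W_high = I, so decomposing and recombining recovers X
X_rec = W_low.T @ (W_low @ X) + W_high.T @ (W_high @ X)
```

The same identity is what licenses the diag(θ) reweighting in the update rule: with all θ fixed to one and W to the identity, the layer reduces to the identity map on X.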
4.3. Training Objective
The EduSheaf framework formulates student performance prediction as a link sign prediction task on a bipartite graph of students and questions. To this end, semantic information is incorporated into question nodes, while latent representations are jointly learned for both students and questions. The prediction objective is to infer the polarity of unobserved interaction edges, thereby capturing not only a student’s hidden proficiency with respect to a specific question but also the underlying structural dependencies between learners and assessment items.
Formally, for each student i and question j, we assign embedding vectors $\mathbf{z}_i \in \mathbb{R}^{d}$ and $\mathbf{z}_j \in \mathbb{R}^{d}$, where d denotes the latent dimension. A classifier $f_{\theta}$ is then employed to predict the sign of the potential edge $(i, j)$. In practice, the two embeddings are concatenated to form a joint representation that preserves individual semantics while encoding their interactions, which is subsequently passed through a multi-layer perceptron (MLP):
$$\hat{y}_{ij} = \mathrm{MLP}\big(\mathbf{z}_i \,\|\, \mathbf{z}_j\big),$$
where the operator “∥” indicates concatenation. This design allows the MLP to model nonlinear dependencies essential for accurate edge polarity prediction.
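A sketch of the classifier on concatenated embeddings (parameter shapes, initialization, and the single hidden layer are illustrative choices of ours):

```python
import numpy as np

def predict_edge_sign(z_student, z_question, params):
    """Score one student-question pair from concatenated embeddings.

    `params` holds (W1, b1, W2, b2) for a one-hidden-layer MLP with ReLU,
    ending in a sigmoid so the output is a probability in (0, 1).
    """
    h = np.concatenate([z_student, z_question])           # joint representation z_i || z_j
    h = np.maximum(params["W1"] @ h + params["b1"], 0.0)  # ReLU hidden layer
    logit = params["W2"] @ h + params["b2"]
    return 1.0 / (1.0 + np.exp(-logit))                   # probability of a positive edge

d, hidden = 4, 8
rng = np.random.default_rng(0)
params = {"W1": rng.normal(size=(hidden, 2 * d)), "b1": np.zeros(hidden),
          "W2": rng.normal(size=hidden), "b2": 0.0}
p = predict_edge_sign(rng.normal(size=d), rng.normal(size=d), params)
```

Concatenation (rather than, say, an inner product) lets the hidden layer learn asymmetric student-question interactions, which is the nonlinearity the text refers to.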
The output $\hat{y}_{ij} \in [0, 1]$ denotes the probability that the edge between student $i$ and question $j$ is positive: larger values correspond to a higher likelihood of a correct response, whereas smaller values indicate a negative outcome. To optimize the prediction task, we employ the binary cross-entropy loss function [24]:
$$\mathcal{L} = -\big[\,y \log \hat{y}_{ij} + (1 - y) \log (1 - \hat{y}_{ij})\,\big],$$
where the ground-truth label $y$ maps a positive edge ($+1$) to $1$ and a negative edge ($-1$) to $0$. Minimizing this loss enables EduSheaf to refine its embeddings in a manner that strengthens predictive accuracy while preserving the nuanced relational structure between students and questions.
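A short worked example of this objective: a confident prediction of 0.9 incurs a small loss when the answer was in fact correct (y = 1) and a large loss when it was incorrect (y = 0), which is exactly the asymmetry that drives the embeddings apart.

```python
import math

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for a single edge; eps clamps for numerical stability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

loss_right = bce(1, 0.9)   # correct answer, confident positive prediction
loss_wrong = bce(0, 0.9)   # incorrect answer, same confident prediction
```

Here `loss_right` is about 0.105 (= −ln 0.9) while `loss_wrong` is roughly 2.30 (= −ln 0.1), so confidently wrong predictions dominate the gradient signal.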
4.4. Algorithm Implementation
To illustrate the workflow of our proposed framework, we present the pseudocode of EduSheaf in Algorithm 1, which integrates sheaf Laplacian computation with framelet-based multi-scale representation learning.
| Algorithm 1 Training Procedure of EduSheaf |

Input: Signed bipartite graph $G$, normalized sheaf Laplacian $\mathcal{L}_{\mathcal{F}}$, feature matrix $\mathbf{X}$, learning rate $\eta$, hyperparameters $(\lambda, \beta, \gamma)$, epochs $T$.
Output: Final node representations $\mathbf{Z}$, predictions $\hat{y}$.
1: Construct $\mathcal{L}_{\mathcal{F}}$ using Equations (1)–(3).
2: Initialize GNN layers, Combine Layer, and classifier.
3: for $t = 1, \dots, T$ do
4:  Forward: propagate $\mathbf{X}$ through the sheaflet GNN layers.
5:  Compute loss $\mathcal{L}$; update model parameters.
6:  Extract multi-scale embeddings $\{\mathbf{H}^{(r)}\}$.
7:  Fuse scales: $\mathbf{Z} = \mathrm{CombineLayer}(\{\mathbf{H}^{(r)}\})$.
8:  Predict signed edges: $\hat{y}_{ij} = \mathrm{MLP}(\mathbf{z}_i \,\|\, \mathbf{z}_j)$.
9:  Evaluate and save the best model if improved.
10: end for
11: return $\mathbf{Z}$, $\hat{y}$
EduSheaf first constructs $\mathcal{L}_{\mathcal{F}}$ from the signed bipartite graph, encoding node–edge dependencies via restriction maps. The GNN then applies hierarchical sheaflet transforms to capture coarse- and fine-grained signals, producing multi-scale embeddings. These are fused by the Combine Layer with parameters $(\lambda, \beta, \gamma)$ to integrate global and local information. Finally, the classifier predicts edge signs $\hat{y}_{ij}$, completing the signed edge classification task for student performance prediction.
Next, we provide a detailed introduction to the Combine Layer.
After obtaining the low-pass and high-pass representations from the GNN, we employ a Combine Layer to integrate multi-scale semantic and structural information.
Let $\mathbf{H}^{(r)}$ denote the representation extracted at scale $r$, and define the aggregated multi-scale embedding as
$$\mathbf{H}_{\mathrm{ms}} = \sum_{r} \mathbf{H}^{(r)}.$$
To balance stochastic spectral perturbations and sheaf-based structural propagation, we compute
$$\tilde{\mathbf{H}} = \lambda\, \mathcal{R}(\mathbf{H}_{\mathrm{ms}}) + (1 - \lambda)\, \mathcal{L}_{\mathcal{F}}\, \mathbf{H}_{\mathrm{ms}},$$
where $\mathcal{R}$ denotes the random filtering operator constructed from the framelet decomposition operators $\{\mathcal{W}_r\}$, and $\mathcal{L}_{\mathcal{F}}$ is the normalized sheaf Laplacian.
We then introduce a residual fusion with the initial node representation $\mathbf{X}^{(0)}$:
$$\mathbf{H}' = \tilde{\mathbf{H}} + \beta\, \mathbf{X}^{(0)}.$$
A depth-adaptive scaling factor $\gamma^{\ell}$ is used to modulate the magnitude of feature updates across layers. The final fused representation is given by
$$\mathbf{Z} = \gamma^{\ell}\, \mathbf{H}',$$
with an optional residual connection
$$\mathbf{Z} \leftarrow \mathbf{Z} + \mathbf{X}^{(0)}.$$
Here, $\lambda$ controls the interaction between spectral and structural pathways, $\beta$ anchors intermediate representations to the original semantic embedding, and $\gamma$ governs the decay rate of layer-wise feature updates, enabling stable multi-scale hierarchical learning.
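The fusion pipeline can be sketched in a few lines of numpy. All symbols and operators below are assumptions for illustration: the random filter and sheaf Laplacian are stand-ins (identity-like placeholders), and `lam`, `beta`, `gamma` play the roles of the spectral/structural balance, the semantic anchor, and the depth-decay factor described in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 4
H_ms = rng.normal(size=(n, d))      # aggregated multi-scale embedding (toy values)
X0 = rng.normal(size=(n, d))        # initial node representation (toy values)
L_sheaf = np.eye(n)                 # stand-in for the normalized sheaf Laplacian

def R(H):
    # Stand-in for the random filtering operator built from framelet decompositions.
    return H + 0.01 * rng.normal(size=H.shape)

lam, beta, gamma, layer = 0.5, 0.3, 0.8, 2   # illustrative hyperparameters

H_tilde = lam * R(H_ms) + (1.0 - lam) * (L_sheaf @ H_ms)  # spectral/structural mix
H_prime = H_tilde + beta * X0                             # anchor to initial features
H_out = (gamma ** layer) * H_prime                        # depth-adaptive scaling
H_out = H_out + X0                                        # optional residual connection
```

Because `gamma < 1`, deeper layers contribute progressively smaller updates, which is one common way to keep stacked multi-scale fusion numerically stable.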
5. Experiments
In this section, we conduct a series of experiments to comprehensively evaluate the effectiveness of the proposed EduSheaf framework. The evaluation is carried out from three complementary perspectives. First, we compare EduSheaf with state-of-the-art signed graph representation learning methods as well as classical graph neural networks to assess its overall performance advantage. Second, we investigate the contributions of different information components, including high-pass features, low-pass features, and semantic embeddings derived from large language models, by performing systematic ablation studies. Finally, we analyze the sensitivity of EduSheaf to key hyperparameters, thereby examining the stability of the model under different configurations.
Following prior work on signed link and student performance prediction, we report Binary-F1 as the primary metric to ensure direct comparability with baseline methods. Binary-F1 provides a balanced measure by simultaneously considering precision and recall across both classes, making it particularly suitable for imbalanced prediction tasks. Higher Binary-F1 values indicate stronger model capability in accurately identifying both positive and negative interactions, thereby offering a reliable and interpretable assessment of predictive performance. The classification threshold is selected by maximizing the F1 score on the validation set and then fixed for evaluation on the test set. To provide a threshold-independent perspective, we additionally report AUC in the hyperparameter sensitivity analysis, which reflects the model’s discriminative ability under varying operating points.
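The threshold-selection protocol above can be sketched as follows. The data is synthetic and the candidate-threshold set (the unique validation scores) is an implementation choice on our part, but the procedure mirrors the text: pick the threshold maximizing binary F1 on validation, then hold it fixed for the test split.

```python
import numpy as np

def f1(y_true, y_pred):
    """Binary F1 from hard 0/1 predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Synthetic validation labels and predicted probabilities.
val_y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
val_p = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.55, 0.8, 0.2])

# Candidate thresholds taken from the observed scores themselves.
thresholds = np.unique(val_p)
best_t = max(thresholds, key=lambda t: f1(val_y, (val_p >= t).astype(int)))

# best_t is then frozen and applied unchanged to the test split.
test_pred = (np.array([0.75, 0.35]) >= best_t).astype(int)
```

Tuning the threshold on validation rather than test is what keeps the reported Binary-F1 an honest estimate; AUC, reported separately, sidesteps the threshold entirely.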
5.1. Datasets
Multiple-choice questions are a core component of assessment in educational settings, particularly on online learning platforms, where they enable standardized, scalable, and automated evaluation of learners’ knowledge. Beyond simple correctness evaluation, the interactions between students and MCQs encode rich information regarding learning behaviors, knowledge acquisition, and error patterns, making them ideal for predictive modeling of student performance.
For our study, we employed five real-world datasets collected from three universities [
6]: the Biology and Law courses at the University of Auckland, the Cardiff20102 course from Cardiff University School of Medicine, and two biochemistry courses (Sydney19351 and Sydney23146) at the University of Sydney. These courses vary in discipline, scale, difficulty, and student composition, which ensures that our experimental evaluation encompasses a broad spectrum of educational contexts. Such diversity provides a robust foundation to assess model adaptability and generalization in heterogeneous learning scenarios.
Signed Graph Construction
To enable graph-based learning, we convert the raw student–question interactions into a signed bipartite graph. Let $u$ represent a student node and $v$ a question node. We construct edges according to the correctness of the student’s response as follows:
If a student answered a question correctly, an edge with a positive sign (“+1”) is established between the two nodes.
If the student’s answer was incorrect, an edge with a negative sign (“−1”) is created between the student node and the question node.
This signed bipartite graph captures dual relationships between students and questions: positive edges encode mastery, while negative edges highlight misconceptions or areas needing improvement. Compared to unsigned graphs, this structure enables the model to differentiate between correct and incorrect learning patterns, facilitating more fine-grained representation learning.
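The construction rule above reduces to a one-line mapping over interaction records. The record format `(student, question, correct?)` is an assumption for illustration; the sign convention (+1 correct, −1 incorrect) follows the text.

```python
# Hypothetical raw interaction log: (student_id, question_id, answered_correctly).
records = [
    ("s1", "q1", True),
    ("s1", "q2", False),
    ("s2", "q1", False),
    ("s2", "q3", True),
]

# Signed edge list: +1 encodes mastery, -1 encodes an error/misconception.
edges = [(s, q, +1 if correct else -1) for s, q, correct in records]

# The two node sets of the bipartite graph.
students = {s for s, _, _ in records}
questions = {q for _, q, _ in records}
```

Keeping students and questions as disjoint node sets is what makes the graph bipartite; the sign lives on the edge, not on either node.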
The total number of edges $|E|$ provides an indication of the interaction density within each dataset, reflecting both the scale of student engagement and the structural complexity of learning behaviors.
Table 2 presents a detailed comparison of these statistics across the five datasets. Notably, the Law course at the University of Auckland exhibits a high overall accuracy (93%), which may indicate higher average mastery or limited discriminative difficulty of the MCQs, whereas the other courses show accuracy rates between 60% and 70%, revealing more realistic learning challenges and error distributions. These variations offer diverse structural patterns for evaluating model performance and robustness.
By constructing the signed bipartite graph in this manner, we create a structured and semantically meaningful representation of the learning environment. Each student–question interaction becomes a node–edge signal that encodes correctness information, which serves as a foundation for downstream signed graph neural network modeling. This structured representation supports multi-level analysis, allowing the model to capture both individual learning deviations and broader group-level patterns, ultimately enhancing predictive accuracy and interpretability (see
Figure 1).
5.2. Baselines
To systematically evaluate the performance of EduSheaf, we select a diverse set of representative baselines. These baselines span from simple methods that ignore graph structure, to general graph neural networks operating in both spectral and spatial domains, and finally to specialized models designed explicitly for signed graphs. This layered comparison establishes a comprehensive evaluation framework.
We first adopt
Random Embedding as a performance lower bound. This method assigns random low-dimensional vectors to student and question nodes, concatenates them, and feeds the result into a logistic regression classifier for edge sign prediction. As it does not incorporate any structural modeling, it provides an objective reference point to measure the real performance gains of more advanced models [
24].
Among general-purpose graph learning methods,
GCN [
21] and
GAT [
42] serve as canonical representatives. GCN, built on the spectral convolution framework, recursively aggregates neighbor features to capture local topological patterns, with its multi-layer stacking enabling deeper representation learning. GAT extends this paradigm by introducing an attention mechanism, which adaptively assigns weights to neighbors, thereby enhancing flexibility and expressiveness, particularly in heterogeneous graph structures.
In contrast,
SGCN [
17],
SBGNN [
24], and
SBCL [
6] are tailored to the unique challenges of signed graphs. SGCN explicitly distinguishes positive and negative edges during convolutional propagation, allowing node embeddings to reflect both supportive and antagonistic relations. SBGNN further leverages a bipartite structure to separately model student and question nodes, aligning more naturally with educational interaction patterns. SBCL adopts a contrastive learning paradigm, constructing positive and negative sample pairs across different views to significantly improve the discriminability and robustness of embeddings. Building on this foundation,
LLM-SBCL [
6] extends SBCL by incorporating semantic embeddings generated by large language models to enrich the semantic representations of question nodes. This design not only preserves the advantages of contrastive learning but also achieves a tighter integration of textual information with graph structure.
Through this progression of baselines—from random embeddings, to general graph models, and finally to specialized signed-graph methods—we are able to validate EduSheaf’s effectiveness from multiple perspectives, demonstrating its clear advantages in student performance prediction tasks.
5.3. Experimental Setup
To guarantee the reliability and validity of our evaluation, we carefully designed dataset partitioning, training procedures, and reporting strategies. We adopt an edge-level random split protocol, where observed student–question interactions are partitioned into 85% training, 5% validation, and 10% testing, while the full set of student and question nodes is retained across all splits. This corresponds to a transductive evaluation setting, in which node representations are learned from partially observed edges without access to validation or test labels. All hyperparameters are selected exclusively based on validation performance. This partitioning design ensures both sufficient learning signals and a rigorous basis for fair performance evaluation. Cold-start scenarios with unseen students or questions are left as an important direction for future work.
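The edge-level split can be expressed compactly. The toy edge list and the fixed seed below are illustrative; the 85/5/10 proportions follow the protocol in the text, and note that only edges are partitioned, so every node remains visible to the model (the transductive setting).

```python
import random

# Toy interaction list: 50 students x 20 questions = 1000 observed edges.
edges = [(s, q) for s in range(50) for q in range(20)]
random.Random(42).shuffle(edges)   # fixed seed for reproducibility (illustrative)

n = len(edges)
n_train, n_val = int(0.85 * n), int(0.05 * n)
train = edges[:n_train]                    # 85% of edges
val = edges[n_train:n_train + n_val]       # 5% of edges
test = edges[n_train + n_val:]             # remaining 10% of edges
```

Because the split is over edges rather than nodes, a student appearing only in the test split cannot occur here; handling genuinely unseen students or questions is the cold-start setting deferred to future work.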
Each model was trained for 300 epochs to achieve convergence across datasets of different scales, while regularization strategies mitigated risks of overfitting. To minimize the randomness introduced by data partitioning and initialization, we repeated every experiment ten times. Performance metrics are reported as mean and standard deviation, allowing us to assess robustness and statistical reliability.
All implementations were carried out in PyTorch 2.1.0 and executed on a single NVIDIA RTX A6000 GPU. For reproducibility and fair benchmarking, the complete hyperparameter configurations used to obtain the reported results are summarized in
Table 3. These settings enable precise replication of our work and establish a consistent foundation for long-term comparison across future studies.
5.4. Results and Discussion
To comprehensively assess the effectiveness of EduSheaf, we performed a series of comparative experiments against a broad spectrum of baseline models on five representative educational datasets. The results, reported in
Table 4, reveal several key findings:
Superior performance of EduSheaf. EduSheaf consistently attains the best binary F1 scores across all datasets, outperforming every baseline. This superiority originates from the ability of the Sheaf Laplacian to incorporate directional constraints and local consistency into the signed graph structure, enabling a more faithful representation of the nuanced student–question interactions. In addition, the integration of framelet-based wavelet convolution facilitates the joint utilization of low-pass and high-pass signals, thereby strengthening the model’s capacity to capture both overarching trends and localized deviations. Together, these mechanisms significantly boost predictive accuracy.
Limitations of traditional GNN models. As evident in
Table 4, methods designed for unsigned graphs such as GCN and GAT perform poorly when applied to signed contexts. Their lack of sensitivity to polarity and failure to model antagonistic relationships prevent them from capturing the complexity of signed educational networks. In contrast, EduSheaf benefits from the geometric principles of sheaf theory and the multi-resolution analysis provided by framelets, which collectively enable simultaneous handling of polarity, structural consistency, and semantic hierarchy within a single framework.
Comparison with advanced signed GNNs. Beyond outperforming unsigned methods, EduSheaf also achieves clear gains over advanced signed graph baselines, including SGCN, SBGNN, SBCL, and their LLM-augmented variants. This advantage arises from the unique synergy between the Sheaf Laplacian and framelet-based convolution in the proposed Sheaflet-SNN, which imposes consistency-aware constraints while extracting multi-scale features from both local and global perspectives. Such a design endows EduSheaf with a richer representational capacity for signed information, explaining its consistent superiority over competing approaches.
Robustness and stability. Another noteworthy observation is EduSheaf’s low variance across different datasets. This indicates not only reliable stability but also strong generalization, suggesting that the model is well suited to diverse educational environments with varying structural and semantic properties.
Addressing the cold-start challenge is a fundamental issue in representation learning, particularly prominent in machine learning scenarios within the e-learning domain. To this end, we have designed dedicated experiments in
Appendix B, and the results demonstrate that the proposed model can effectively tackle such cold-start problems.
5.5. Ablation Study
To further assess the importance of the core components within EduSheaf, we conducted a systematic ablation study. By selectively removing different frequency components and the semantic module, we designed four model variants: the complete model (retaining both low-pass and high-pass components as well as LLM embeddings), a variant without the high-pass component, a variant without the low-pass component, and a variant without the LLM module.
As shown in
Table 5, removing the high-pass component led to a substantial decline in performance across all datasets, confirming the critical role of high-frequency information in capturing local variations and preserving node-level discriminability. In contrast, removing the low-pass component caused a comparatively milder yet still significant performance drop in most datasets, indicating that low-frequency signals are indispensable for graph smoothness and global consistency modeling. Together, these results highlight the complementary nature of low-pass and high-pass signals, demonstrating how the wavelet-based convolution mechanism of the Sheaflet framework effectively bridges local and global scales to enhance the representational capacity of signed graphs.
Moreover, eliminating the LLM module also resulted in a marked reduction in binary F1 scores, underscoring the irreplaceable role of semantic embeddings in strengthening the representation of knowledge points in MCQs and complementing structural features. The deep integration of structural and semantic information enables EduSheaf to exhibit stronger robustness and adaptability when addressing the sparsity and heterogeneity inherent in educational data. We further analyze the impact of the LLM-based semantic module across different dataset characteristics. For datasets with longer question texts and higher domain diversity, such as Law and Cardiff20102, semantic enhancement tends to yield more significant performance improvements, as conceptual-level information provides additional discriminative power beyond structural connectivity. Similarly, in scenarios with sparse interactions, such as the Sydney19351 dataset, neighborhood-based propagation carries limited informative value, and the semantic signals derived from the LLM serve as an effective complementary component, thereby enhancing model robustness. In contrast, for datasets with short, highly standardized question texts and dense interaction patterns, the marginal performance gain from semantic embeddings is more limited, indicating that structural cues already dominate the prediction process in such cases.
In summary, the ablation study demonstrates the complementary value of high-pass and low-pass components in balancing local discriminability and global consistency, while also revealing the significant contribution of LLM-based semantic embeddings to structural learning. The synergy of these three elements collectively underpins the performance advantages and theoretical significance of EduSheaf in student performance prediction tasks.
5.6. Parameter Sensitivity Analysis
To rigorously evaluate the effect of hyperparameter choices on EduSheaf, we performed a comprehensive sensitivity analysis with a focus on the scale level. This parameter specifies the number of hierarchical resolutions employed in the multi-scale spectral decomposition, ranging from coarse resolutions that primarily capture global structural dependencies to fine resolutions that emphasize localized patterns and subtle high-frequency variations. As such, the scale level functions as a critical mechanism for balancing global contextual awareness with localized discriminative capacity in representation learning.
In the experimental setup, the scale level was varied from 1 to 6, and its influence was assessed across three representative datasets—Sydney19351, Sydney23146, and Biology. The results, summarized in
Figure 2, reveal that EduSheaf maintains stable and reliable performance across the full spectrum of scale settings, reflecting its resilience to hyperparameter perturbations. Notably, the model achieves marginally superior predictive accuracy when the scale level is configured at smaller values (e.g., 1 or 2). This observation suggests that a limited number of scales already suffice to encode the essential multi-frequency information, thereby achieving a favorable balance between expressive power and computational efficiency. By contrast, employing larger-scale levels tends to introduce redundant representations, elevate computational burden, and provide only negligible benefits in predictive accuracy.
Figure 3 reports the sensitivity of the proposed framework to the scale level under the AUC metric across three representative datasets, namely Sydney19351, Sydney23146, and Biology. Overall, the AUC values exhibit a stable trend as the scale level increases, indicating that the model’s discriminative capability is not overly sensitive to moderate variations in the multi-scale configuration.
Taken together, these findings underscore two key insights: first, EduSheaf demonstrates robustness and adaptability in the face of hyperparameter variations, and second, compact scale configurations are not only computationally economical but also empirically effective for performance optimization. These results further provide actionable guidance for practitioners, indicating that lower-scale settings constitute a pragmatic choice for real-world deployment where efficiency and scalability are paramount.
6. Conclusions
This paper introduces and systematically validates EduSheaf, a signed graph learning framework that combines sheaf theory, framelet-based multi-scale filtering, and semantic embeddings from LLMs for student performance prediction. Unlike traditional GNNs or purely semantic-enhanced methods, EduSheaf achieves a synergy of structural, spectral, and semantic modeling through three core designs. First, the sheaf Laplacian imposes edge-level local consistency constraints, ensuring that node representations remain coherent while also reflecting cross-local inconsistencies. Second, framelet-based low-pass and high-pass components enable multi-scale signal decomposition, where low-pass filters capture group-level learning patterns, and high-pass filters highlight individual deviations. Finally, fine-grained semantic embeddings derived from LLMs are seamlessly integrated into signed node representations, bridging the gap between structural learning and textual semantics.
Experiments across diverse real-world educational datasets demonstrate that EduSheaf achieves superior performance compared to a range of baselines, including random embeddings, GCN, GAT, as well as signed-graph and contrastive learning methods such as SGCN, SBGNN, SBCL, and LLM-SBCL. Ablation studies further show that both high-pass and low-pass components play critical roles in balancing local discriminability and global consistency, while LLM-derived semantics substantially enhance the discriminative power of questions and knowledge concepts. The synergy between these components forms the primary source of EduSheaf’s advantage. Overall, EduSheaf provides an interpretable, scalable, and robust solution for high-precision student performance prediction in heterogeneous, sparse, and semantically rich educational contexts.
Despite these advances, several challenges remain. The current framework faces computational overhead and real-time deployment issues when scaling to large datasets and incremental online updates. Moreover, its reliance on LLM embedding quality raises concerns about the robustness and fairness of semantic representations. Future directions include (i) extending EduSheaf to multimodal educational data, such as process logs, discussion texts, and learning resources; (ii) exploring the integration of transformer-based and instruction-tuned semantic encoders as alternatives to GloVe, in order to investigate how richer contextual embeddings interact with sheaf-based multi-scale signed graph learning in educational settings; and (iii) developing more advanced signed graph neural networks or extending the framework to richer graph settings that capture higher-order information, ultimately enabling reliable and equitable deployment in real-world educational scenarios.