Article

Student Learning Outcome Prediction via Sheaflet-Based Graph Learning and LLM

1 School of Educational Science, Yili Normal University, Yining 835000, China
2 Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua 321004, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2026, 16(3), 1658; https://doi.org/10.3390/app16031658
Submission received: 6 January 2026 / Revised: 30 January 2026 / Accepted: 4 February 2026 / Published: 6 February 2026
(This article belongs to the Special Issue Generative AI for Intelligent Knowledge Systems and Adaptive Learning)

Abstract

Accurately modeling the interactions between students and learning content is a central challenge in achieving personalized and adaptive learning in online education. However, existing methods often struggle to simultaneously capture the multi-scale structural dependencies and the rich semantic information embedded in educational materials. To bridge this gap, we propose EduSheaf—a unified framework that integrates large language models (LLMs) with a sheaflet-based signed graph neural network. Specifically, LLMs are employed to extract fine-grained semantic embeddings from multiple-choice questions (MCQs), thereby enriching graph representations with contextual knowledge. A signed graph is then constructed to encode student–MCQ interactions, where correct and incorrect responses are represented as positive and negative edges. On top of this, a novel sheaflet-based signed graph neural network performs multi-frequency learning through low-pass and high-pass filters, enabling the joint modeling of global consensus and local variations, while sheaf structures enforce edge-level consistency. Extensive experiments on multiple real-world educational datasets demonstrate that EduSheaf consistently outperforms state-of-the-art baselines, including both semantic-enhanced and signed graph models, in terms of prediction accuracy and robustness. Ablation studies further reveal the complementary roles of semantic embeddings and multi-frequency graph filters.

1. Introduction

With the rapid advancement of educational digitalization and online learning platforms, personalized learning has emerged as a critical direction for improving educational quality [1]. Within this context, student performance prediction is regarded as a key component, as it not only enables teachers to adapt instructional strategies with greater precision but also provides students with more effective learning guidance, thereby enhancing overall learning outcomes [2]. However, existing adaptive learning technologies predominantly focus on modeling surface-level behavioral data, which limits their ability to capture the complex structures and latent semantic relations underlying the learning process—particularly the multi-layered interactions among students, knowledge concepts, and learning items. This limitation becomes especially pronounced when dealing with heterogeneous educational data and high-dimensional semantic information, highlighting the urgent need for more refined and intelligent predictive modeling approaches.
Recent research on predicting student performance generally employs either sequential models or graph neural networks (GNNs). Sequential approaches are designed to capture temporal dependencies in learners’ behavioral records, while GNN-based methods focus on modeling the relational structures linking students with educational content, such as learning activities and assessment items [3,4]. To address the challenge of modeling fine-grained student–answer interactions, Ni et al. [5] introduced a signed bipartite graph framework that incorporates positive and negative edges to enhance the density of structural associations.
Despite their strengths in structural modeling, GNN-based methods struggle to effectively extract the deep semantic features embedded within nodes. This limitation has led to performance bottlenecks when handling complex, multimodal educational data. To overcome this barrier, recent studies have begun integrating natural language processing (NLP) techniques into educational contexts, aiming to capture the rich semantic information within both learning materials and student behaviors. Large language models, in particular, demonstrate strong capabilities in text understanding and knowledge transfer, providing valuable semantic priors for multiple-choice questions and related educational content. However, LLMs alone cannot directly generate features tailored for graph-structured learning, while GNNs remain constrained in their ability to represent node semantics at appropriate scales and levels of granularity. Even when LLM-derived embeddings are incorporated, the overall predictive improvement remains limited.
These challenges underscore the urgent need for a customized framework that can jointly integrate high-order graph structures with fine-grained semantic representations. In this direction, Wang et al. [6] proposed the LLM-SBCL model, which attempts to combine LLMs with GNNs for joint structural and semantic modeling. Although this approach demonstrates certain performance gains, it still fails to adequately capture higher-order student–item interaction patterns and lacks a multi-scale representation mechanism capable of reflecting both group-level learning preferences and individual-level variations. Addressing these unresolved issues points to the critical next step in advancing student performance prediction models.
To address the limitations of existing student performance prediction models in capturing higher-order interactions and semantic information, we propose EduSheaf, a unified framework that integrates signed graph representations with large language models to enhance both prediction granularity and generalization. In this framework, students and multiple-choice questions are modeled as graph nodes, while responses are encoded as signed edges, with positive and negative signs corresponding to correct and incorrect answers, respectively. This formulation enables the framework to jointly represent structural characteristics and interaction patterns within the learning process.
The sheaf Laplacian operator provides a powerful mathematical tool for graph representation by assigning vector spaces to nodes and edges and defining linear maps between them, thereby embedding local cellular structures into the representation [7,8,9]. EduSheaf leverages this operator for signed graph learning, significantly enhancing its modeling capacity. This allows the network to capture complex node relations and higher-order dependencies while maintaining local consistency. Building on this foundation, we design a sheaflet-based signed graph neural network, which employs multi-resolution signal processing to derive fine-grained node representations. Specifically, the low-pass filters highlight group-level behavioral patterns among students, while the high-pass filters emphasize individualized deviations, enabling multi-scale and fine-grained interaction modeling. Furthermore, the wavelet-based decomposition and reconstruction enhance the compactness and robustness of representations, mitigating the impact of noise.
In this study, we incorporate semantic embeddings derived from large language models (LLMs) into a signed graph setting, thereby enriching the semantic representation of student–content interactions. Comprehensive experiments conducted on diverse educational datasets show that EduSheaf consistently achieves superior performance over existing state-of-the-art methods in terms of both accuracy and robustness. These results highlight the efficacy of its multi-scale representation strategy. More broadly, EduSheaf offers a new perspective for modeling complex learning dynamics and provides a technical foundation for advancing adaptive learning and supporting fine-grained instructional decision-making.
Key Contributions: The main contributions of this work are articulated as follows:
  • Novel Task Reframing: We reconceptualize student performance prediction as a signed graph learning problem. Correct and incorrect responses are encoded as positive and negative edges, respectively, yielding a representation that integrates structural dependencies with semantic signals in a principled manner.
  • Sheaflet-based SNN Architecture: We design a signed graph neural network grounded in cellular sheaf theory and equipped with wavelet-like transforms for multi-resolution analysis. This design employs low-pass filters to capture collective learning tendencies and high-pass filters to highlight individual variations, thus enabling the model to disentangle global and local learning patterns.
  • Integrated Structure–Semantics Framework: We propose EduSheaf as a unified framework that fuses structural modeling with semantic enrichment from LLMs. By aligning graph-based structural information with semantically rich embeddings, EduSheaf provides a more holistic understanding of learner–content interactions. Extensive evaluation on multiple benchmark datasets confirms its predictive advantages and underscores the benefits of structure–semantic co-learning.
The remainder of this paper is organized as follows: Section 2 reviews relevant studies on student performance prediction, signed graph neural networks, and the application of large language models in education. Section 3 introduces the necessary preliminaries to establish the foundation for our framework. Section 4 presents a detailed description of the proposed approach, followed by Section 5, which reports the experimental design and evaluation results. Finally, Section 6 concludes the paper with a summary of the key findings and a discussion of potential directions for future research.

2. Related Work

This section reviews four domains: student performance prediction, signed graph neural networks, sheaf theory for graphs, and large language models in education. Prior studies in these areas highlight significant advances, yet limitations remain in handling data sparsity, capturing signed structural dependencies, and leveraging semantic knowledge effectively. Our work builds on these insights by integrating structural and semantic modeling within a unified framework for student performance prediction.

2.1. Student Performance Prediction

The central objective of student performance prediction is to anticipate learners’ outcomes in tasks such as assignments, examinations, or course completion based on their historical learning trajectories [10]. Such predictions allow educators to deliver targeted interventions and personalized support. Early approaches primarily relied on traditional machine learning techniques, including logistic regression, decision trees, and support vector machines. While effective in limited contexts, these models were heavily dependent on manual feature design and failed to capture the intricate dynamics of student learning. With the rise of deep learning, architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) gained prominence for their ability to automatically extract high-dimensional features and model complex behavioral patterns, thereby achieving notable improvements in predictive accuracy [10].
To advance personalized education, subsequent studies have developed two major paradigms: static models and sequential models [11,12]. Static models (e.g., [13,14]) utilize fixed historical data, offering reasonable accuracy but lacking adaptability to the evolving nature of student knowledge. In contrast, sequential approaches such as knowledge tracing and its variants [15,16] dynamically model students’ learning progression, enabling real-time updates of knowledge states and improved predictive precision in interactive learning environments.
Despite these advancements, two critical limitations remain. First, most existing methods struggle to capture the highly nonlinear and interdependent relationships between learners and educational content. Second, the deeper semantic context embedded in instructional materials is often neglected, weakening the interpretability and contextual richness of predictions. To address these gaps, we propose the EduSheaf framework, which employs signed graph structures to represent fine-grained student–question interactions while incorporating semantic embeddings derived from large language models. This integration of structural dependencies and semantic information enables more accurate, context-aware predictions and offers deeper insights into learner behaviors.

2.2. Signed Graph Neural Networks

In modeling student performance, signed bipartite graphs provide a natural framework, where positive edges represent correct answers and negative edges denote incorrect ones [17,18]. Early studies on signed graph learning primarily adopted embedding-based techniques, including SIDE [19] and SGDN [20], as well as signed Laplacian and matrix factorization approaches [18]. These methods attempted to encode signed information into compact representations but often struggled to capture the intricate structural dependencies of signed networks. Later, neural architectures were introduced, such as SGCN [17], which extends GCNs [21] by integrating balance theory, and SiNE [22], which leverages triangular motifs to strengthen balance-aware modeling. While these models improved the understanding of local structures, they still lacked the ability to effectively reflect global consistency and antagonistic relations across the graph.
More recently, contrastive learning has become a powerful paradigm for representation learning on signed graphs. Approaches such as SGCL [23], SBGNN [24], SBCL [6], and MOSGCN [25] apply contrastive objectives to refine signed graph embeddings, demonstrating notable improvements in capturing local interactions. However, these frameworks predominantly focus on pairwise relations, limiting their capacity to uncover higher-order dependencies and collaborative patterns among multiple nodes—factors particularly important in complex educational settings.
In parallel, graph wavelet-based methods [26,27,28] have gained traction for their ability to conduct multi-scale spectral analysis, offering new insights into structural characterization. Yet, their application to signed graphs remains underexplored, especially regarding the complementary roles of low-pass and high-pass components and their integration with advanced algebraic tools such as sheaf theory. Although Chen et al. [29] made an initial step toward combining wavelet analysis with sheaf structures, their focus was restricted to general graphs without addressing the specific challenges posed by signed networks.
These gaps underscore the demand for a more comprehensive framework capable of simultaneously capturing local discriminative cues, maintaining global structural coherence, and accommodating the semantic complexity of signed graphs. To bridge this gap, we propose EduSheaf, a sheaflet-inspired model that unifies cellular sheaves with framelet transforms. By jointly exploiting high-pass details and low-pass trends across multi-frequency domains, EduSheaf enables expressive, robust, and semantically enriched representation learning for signed graphs.

2.3. Sheaf Theory for Graphs

Sheaf theory has recently gained attention as a unifying paradigm for extending graph learning by promoting local consistency while enabling richer global reasoning. In contrast to conventional graph-based approaches that employ the standard Laplacian operator, SheafNN [8] adopts the sheaf Laplacian [9] to construct more expressive diffusion dynamics, thereby broadening the representational capacity of graph neural architectures. Building on this foundation, subsequent research has proposed several improvements: attention mechanisms have been introduced to adaptively capture neighborhood dependencies [30]; positional encodings have been designed to strengthen the structural expressiveness of sheaf models [31]; and domain-specific adaptations, such as personalized federated learning [32] and recommendation tasks, have demonstrated the versatility and transferability of sheaf-based techniques.
Parallel efforts in algebraic topology have also contributed to expanding the methodological scope of sheaf theory. For example, Ayzenberg et al. [33] proposed a poset-oriented algorithm for sheaf cohomology, offering novel ways to incorporate topological structures into machine learning.
Nevertheless, the majority of existing sheaf-based approaches operate within the constraints of single-frequency diffusion or shallow propagation schemes. The absence of systematic exploration into multi-frequency filtering and spectral decomposition limits the ability to integrate sheaf theory with advanced signal processing tools such as wavelets and multi-scale spectral analysis. Addressing this gap is essential for enhancing the expressiveness and interpretability of sheaf-based learning frameworks when applied to complex relational data.

2.4. LLM for Education

The rapid advancement of large language models is profoundly reshaping the landscape of educational research and practice, particularly in the areas of personalized learning, automated assessment, and intelligent educational support [34,35]. Early efforts, such as GPT-3 [36], have already demonstrated remarkable few-shot learning capabilities, showing that these models can interpret students’ learning behaviors and linguistic expressions to generate personalized learning resources and pathways, thereby significantly improving learning outcomes. However, deploying such models in authentic educational settings remains challenging. A primary obstacle lies in the reliance on large-scale annotated datasets during training, while educational data are often scarce, noisy, and imbalanced, which hampers generalization and limits stable deployment.
With the continued scaling and enhancement of model capabilities, the GPT-4 technical report [37] further revealed that LLMs can perform at a level comparable to students in standardized tests across mathematics, physics, and computer science, while effectively handling both multiple-choice and open-ended questions. This expanded capacity has unlocked new opportunities for intelligent educational applications. Recent studies have further validated LLMs’ strengths in supporting reading, writing, and knowledge integration. For instance, Susnjak & McIntosh [38] showed that ChatGPT-4 can generate logically coherent and comprehensive answers to multidisciplinary problems; Malinka et al. [39] reported quantitative evidence that students who refined their answers with LLM assistance achieved significantly higher scores in a computer security course than their peers in the control group. In addition, several survey studies [40] have synthesized the state-of-the-art applications of LLMs in education, highlighting their potential in facilitating teacher–student interaction, optimizing personalized learning strategies, and enhancing the automation of assessment and feedback. Meanwhile, Ravi et al. [41] collaborated with K-12 teachers to co-design LLM-powered tools, focusing on in-situ design processes, teacher needs, risk assessment, and classroom integration strategies, thereby offering critical insights into teacher-centered design and school-based applications.
As powerful semantic modeling tools, LLMs have already shown the ability to extract deep semantic information from multimodal educational data. This capability opens new avenues for tackling complex prediction tasks in educational contexts. Motivated by this, and from the perspective of unifying semantic modeling with structured learning, we propose the EduSheaf framework, which for the first time integrates LLMs with signed graph learning for student performance prediction—addressing a critical research gap in this domain.

3. Notation and Problem Formulation

This section aims to formalize the modeling of interactions between students and multiple-choice questions. The interaction data not only contain students’ response outcomes but also incorporate textual information such as question stems, options, answers, and explanations, which provides opportunities for constructing richer semantic representations. To systematically capture these interactions, we adopt a signed graph modeling approach, where students and questions are represented as distinct types of nodes, and students’ responses to questions are encoded as edges with positive or negative signs.
Signed Graph. We define the graph as $G = (U, V, E)$, where $U = \{u_1, u_2, \dots, u_{|U|}\}$ represents the set of students and $V = \{v_1, v_2, \dots, v_{|V|}\}$ represents the set of questions, with $U \cap V = \emptyset$. The edge set $E \subseteq U \times V$ represents pairwise relationships between students and questions, partitioned into positive edges $E^+$ and negative edges $E^-$, such that $E = E^+ \cup E^-$ and $E^+ \cap E^- = \emptyset$.
Task Definition. Within this signed graph framework, student performance prediction is formalized as a signed edge prediction problem. Specifically, given a signed graph $G$, the goal is to learn low-dimensional embeddings for students and questions: each student $u_i \in U$ is associated with an embedding $z_{u_i}$, and each question $v_j \in V$ with an embedding $w_{v_j}$, where $z_{u_i}, w_{v_j} \in \mathbb{R}^d$ and $d$ denotes the embedding dimension. A mapping function $f(z_{u_i}, w_{v_j}) \to \{-1, +1\}$ is then applied to predict the sign of edge $e_{ij}$.
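The formulation above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the record format (student, question, correct) is an assumption, and the inner-product sign is only one minimal choice for the mapping $f$.

```python
import numpy as np

def build_signed_edges(responses):
    """Split raw (student, question, correct) records into the
    positive edge set E+ and negative edge set E-.
    The record format is a hypothetical assumption."""
    e_pos = {(u, v) for u, v, correct in responses if correct}
    e_neg = {(u, v) for u, v, correct in responses if not correct}
    # By construction E+ and E- are disjoint unless a pair appears
    # with both labels; guard against that explicitly.
    assert not (e_pos & e_neg), "E+ and E- must be disjoint"
    return e_pos, e_neg

def predict_sign(z_u, w_v):
    """A minimal instantiation of the mapping f: predict the edge
    sign from the inner product of the two d-dimensional embeddings."""
    return 1 if float(np.dot(z_u, w_v)) >= 0 else -1
```

For example, `build_signed_edges([("u1", "q1", True), ("u1", "q2", False)])` separates the correct response into the positive set and the incorrect one into the negative set.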
LLM-based Semantic Extraction. In the EduSheaf framework, we leverage large language models to enrich the semantic representation of multiple-choice questions. Specifically, question stems, options, and explanations are reformulated into structured sentences, from which the LLM extracts key concepts and their associated weights. These are then combined with GloVe embeddings to generate weighted semantic representations. By integrating this semantic information with the structural features of signed graphs, we obtain more comprehensive node embeddings that enhance prediction performance. Compared with traditional approaches, this strategy effectively addresses the underutilization of semantic cues in MCQ representation and introduces a novel perspective for fusing textual and graph-structured data in educational contexts.
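The weighted combination of LLM-extracted keywords with pretrained word vectors can be sketched as follows. This is an assumed construction, not the authors' exact code: the GloVe lookup is represented by a plain dictionary, out-of-vocabulary keywords fall back to zero vectors, and multi-word keywords are averaged over their tokens.

```python
import numpy as np

def weighted_semantic_embedding(keywords, word_vectors, dim=50):
    """Combine LLM-extracted keywords with pretrained word vectors
    (e.g., GloVe) into a single weighted question embedding.
    `keywords`: list of {"keyword": str, "percentage": float};
    `word_vectors`: token -> NumPy vector lookup (assumed)."""
    emb = np.zeros(dim)
    for item in keywords:
        tokens = item["keyword"].lower().split()
        # Average the vectors of in-vocabulary tokens of the keyword.
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        if vecs:
            # Weight each keyword's contribution by its LLM weight.
            emb += item["percentage"] * np.mean(vecs, axis=0)
    return emb
```

The resulting vector can then be concatenated with, or used to initialize, the question node features in the signed graph.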
To illustrate the semantic extraction process, we provide a representative example. Suppose an MCQ is authored by a first-year law student and includes a stem, several answer options, the correct answer, and an explanation. The objective is to identify key knowledge points that capture the essential legal concepts reflected in the question. To guarantee the structural validity of the LLM-generated outputs, we employ a constrained prompt that explicitly enforces a strict JSON schema, including predefined fields for semantic keywords and their corresponding weights. The model is instructed to return a single, well-formed JSON object without any additional natural language text. At runtime, each response is automatically parsed and validated using a schema-based checker. We verify that (i) all required fields are present, (ii) keyword weights are numerical and bounded in [0, 1], and (iii) the sum of weights is normalized to 1. If any of these conditions fail (e.g., malformed JSON, missing fields, or invalid normalization), the query is automatically re-issued using a fallback prompt with stricter formatting constraints. In the rare case that repeated attempts fail, the corresponding question is assigned a default semantic representation derived from the average embedding statistics of the training corpus, ensuring robustness of the preprocessing pipeline and preventing interruptions to downstream model training. For instance, given a contract law question, the LLM generates a response such as:
{"Keywords": [
    {"keyword": "Anticipatory breach", "percentage": 0.5},
    {"keyword": "Repudiation", "percentage": 0.2},
    {"keyword": "Contract law", "percentage": 0.1},
    {"keyword": "Unequivocal renunciation", "percentage": 0.1},
    {"keyword": "Common law", "percentage": 0.1}
  ]
}
This structured output highlights the primary legal notions embedded in the MCQ while preserving proportional relevance, thereby enabling the integration of semantic knowledge into downstream graph-based representations. A more detailed description of the large language model processing pipeline is elaborated in Appendix A.
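The three validation conditions described above can be sketched as a schema-style checker. This is illustrative Python, not the authors' pipeline; the field names follow the example output, and a `None` return signals that the query should be re-issued with the fallback prompt.

```python
import json
import math

REQUIRED_FIELDS = {"keyword", "percentage"}

def validate_llm_output(raw, tol=1e-6):
    """Mirror the checks in the text: (i) required fields present,
    (ii) weights numeric and bounded in [0, 1], (iii) weights
    summing to 1. Returns the parsed keyword list on success,
    None on any violation (triggering a re-issued query)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON
    items = data.get("Keywords")
    if not isinstance(items, list) or not items:
        return None  # missing or empty keyword list
    for item in items:
        if not REQUIRED_FIELDS <= set(item):
            return None  # missing field
        w = item["percentage"]
        if not isinstance(w, (int, float)) or not 0.0 <= w <= 1.0:
            return None  # weight out of bounds or non-numeric
    if not math.isclose(sum(i["percentage"] for i in items), 1.0, abs_tol=tol):
        return None  # weights not normalized
    return items
```

A caller would retry a bounded number of times on `None` before falling back to the default semantic representation mentioned above.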
In our implementation, semantic embeddings are generated using the OpenAI API with the model identifier gpt-4-0613 (June 2023 snapshot) as an offline preprocessing step. Each multiple-choice question is processed independently through a fixed, structured prompt, and the extracted keywords and weights are cached and reused throughout all training and inference stages. To ensure reproducibility, we adopt a deterministic decoding strategy by fixing the inference parameters, with the temperature set to 0.0 and nucleus sampling disabled (top_p = 1.0). The maximum output length is capped at 256 tokens, which is sufficient to accommodate the structured JSON-style output containing the extracted semantic keywords and their associated weights. Under this configuration, repeated queries with identical prompts yield consistent outputs. The average processing time is approximately 0.8–1.2 s per question. Under the OpenAI pricing scheme at the time of experimentation, the total preprocessing cost for the largest dataset is on the order of tens of USD, which constitutes a minor overhead relative to model training and evaluation.
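A deterministic, cacheable request configuration of this kind might be assembled as below. The decoding parameters are those reported above; the prompt template and request shape are illustrative assumptions (no actual API call is made here), and the cache key ensures each MCQ is processed once and the result reused.

```python
import hashlib
import json

# Fixed decoding parameters reported in the text. The model
# identifier matches the paper; the request layout is a sketch
# of a chat-style completion call, not the authors' exact code.
DECODING = {"model": "gpt-4-0613", "temperature": 0.0,
            "top_p": 1.0, "max_tokens": 256}

def build_request(question_text,
                  prompt_template="Extract keywords as JSON:\n{q}"):
    """Assemble deterministic request kwargs plus a content-based
    cache key, so identical prompts map to identical cache entries."""
    prompt = prompt_template.format(q=question_text)
    kwargs = dict(DECODING,
                  messages=[{"role": "user", "content": prompt}])
    cache_key = hashlib.sha256(
        json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
    return kwargs, cache_key
```

Because the parameters and prompt are fixed, the cache key is stable across runs, which is what makes the offline preprocessing reusable for all training and inference stages.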
The key notations used in this paper are summarized in Table 1 for the reader’s convenience.
Remark 1.
In this work, student performance prediction is formulated under an edge-split transductive signed edge prediction setting. The original dataset consists of labeled student–question interactions, where positive edges correspond to correct responses and negative edges correspond to incorrect responses. The training, validation, and test sets are constructed by randomly partitioning this set of labeled signed edges. Consequently, the test set is composed of held-out observed interactions with known ground-truth labels, rather than artificially generated negatives sampled from unobserved student–question pairs. In this context, we define “unobserved” interactions as student–question pairs for which no response is recorded in the dataset, i.e., the student has not attempted the question and no label is available. Importantly, we do not assume a missing-at-random mechanism and we do not perform additional negative sampling from these unobserved pairs during either training or evaluation. This protocol ensures that the reported performance reflects the model’s ability to generalize to unseen labeled interactions, rather than its sensitivity to a particular negative sampling strategy.
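The edge-split protocol of Remark 1 can be sketched as a seeded random partition of the labeled signed edges. This is an illustrative sketch under assumed 80/10/10 ratios; note that no negatives are sampled from unobserved pairs, so every split element carries a ground-truth label.

```python
import random

def split_signed_edges(edges, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Randomly partition labeled signed edges (u, v, sign) into
    train/validation/test sets, following the edge-split
    transductive protocol: the test set consists of held-out
    *observed* interactions with known labels."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Every edge lands in exactly one split, so the union of the three sets recovers the full labeled interaction set.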

4. Proposed Method

4.1. Framework Overview

As depicted in Figure 1, the EduSheaf framework is composed of three tightly coupled modules, each addressing a different stage of the modeling process. First, the LLM-based semantic extraction module processes multiple-choice questions by parsing their key elements—such as stems, candidate options, ground-truth answers, and explanatory texts—and encoding them into semantic embeddings enriched with contextual and pedagogical knowledge. This design allows the framework to capture fine-grained semantic variations across questions, which is often overlooked in conventional representations.
Second, the signed graph construction module formalizes the interactions between students and MCQs into a signed bipartite graph structure. In this representation, students and questions are modeled as heterogeneous nodes, while correct and incorrect responses are encoded as positive and negative edges, respectively. Such a formulation enables the graph to preserve both supportive and conflicting learning signals, thereby offering a more expressive characterization of student performance patterns.
Finally, the sheaflet-based signed graph neural network constitutes the central innovation of EduSheaf. By integrating low-pass filters that capture global learning trends with high-pass filters that emphasize local discrepancies, this module facilitates multi-frequency feature learning on signed graphs. The incorporation of sheaflet operators further strengthens the capacity to model complex dependencies, making the framework more robust to sparsity and heterogeneity in educational data. This novel neural architecture will be introduced in greater detail in the following section.

4.2. Sheaflets on Signed Graphs

In this section, we introduce EduSheaflet, a sheaflet-based signed graph neural network framework that unifies cellular sheaf theory with framelet transforms to jointly capture low-pass and high-pass information on signed graphs. To provide a rigorous basis for this design, we begin by formalizing the concept of cellular sheaves defined over signed graphs and outlining the corresponding linear sheaf Laplacian, which serves as the mathematical foundation for subsequent model construction.
Basics of Sheaves on Graphs. A cellular sheaf $(G, \mathcal{F})$ on a signed graph $G = (V, E)$ is specified by the following components:
  • A vector space $\mathcal{F}(v)$ for each node $v \in V$;
  • A vector space $\mathcal{F}(e^+)$ for each positive edge $e^+ \in E^+$;
  • A vector space $\mathcal{F}(e^-)$ for each negative edge $e^- \in E^-$;
  • A linear map $\mathcal{F}_{v \trianglelefteq e^+} : \mathcal{F}(v) \to \mathcal{F}(e^+)$ for each incident node–positive-edge pair $v \trianglelefteq e^+$;
  • A linear map $\mathcal{F}_{v \trianglelefteq e^-} : \mathcal{F}(v) \to \mathcal{F}(e^-)$ for each incident node–negative-edge pair $v \trianglelefteq e^-$.
The vector spaces associated with nodes and edges are referred to as stalks, while the corresponding linear maps are called restriction maps. Grouping these spaces leads to two fundamental cochain spaces: the collection of node stalks forms the space of 0-cochains, and the collection of edge stalks forms the space of 1-cochains.
More generally, a cellular sheaf $\mathcal{F}$ on a signed graph can be described as a triple
$$\big( \mathcal{F}(v),\ \mathcal{F}(e),\ \mathcal{F}_{v \trianglelefteq e} \big),$$
where (i) $\mathcal{F}(v)$ are vertex stalks, i.e., vector spaces attached to each node $v$; (ii) $\mathcal{F}(e)$ are signed edge stalks, i.e., vector spaces attached to each edge $e$; (iii) $\mathcal{F}_{v \trianglelefteq e} : \mathcal{F}(v) \to \mathcal{F}(e)$ are restriction maps defined for each incident node–edge pair.
On this basis, the linear sheaf Laplacian is defined blockwise as
$$(L_{\mathcal{F}})_{vv} = \sum_{e :\, v \trianglelefteq e} \mathcal{F}_{v \trianglelefteq e}^{\top} \mathcal{F}_{v \trianglelefteq e} \in \mathbb{R}^{d \times d},$$
$$(L_{\mathcal{F}})_{uv} = -\sum_{e :\, u, v \trianglelefteq e} \mathcal{F}_{u \trianglelefteq e}^{\top} \mathcal{F}_{v \trianglelefteq e} \in \mathbb{R}^{d \times d},$$
where $d$ is the dimension of the stalks, and each restriction map $\mathcal{F}_{v \trianglelefteq e} : \mathbb{R}^d \to \mathbb{R}^d$ governs the information flow from node $v$ to edge $e$.
The normalized sheaf Laplacian is expressed as
$$\Delta_{\mathcal{F}} = D^{-\frac{1}{2}} L_{\mathcal{F}} D^{-\frac{1}{2}},$$
where $D$ is the block-diagonal degree matrix of $L_{\mathcal{F}}$. If the underlying graph has $n$ nodes, the resulting sheaf Laplacian has dimensions $nd \times nd$.
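The blockwise definition above can be made concrete with a small NumPy sketch. This is an illustrative assembly (not the paper's code): restriction maps are supplied as explicit $d \times d$ matrices, diagonal blocks accumulate $\mathcal{F}^{\top}\mathcal{F}$ over incident edges, and off-diagonal blocks carry the negated cross terms.

```python
import numpy as np

def sheaf_laplacian(n, d, edges, restriction):
    """Assemble the (n*d x n*d) sheaf Laplacian from restriction
    maps. `edges` is a list of node pairs (u, v); `restriction[(v, e)]`
    is the d x d map F_{v <| e} for node v and edge index e."""
    L = np.zeros((n * d, n * d))
    for e, (u, v) in enumerate(edges):
        Fu, Fv = restriction[(u, e)], restriction[(v, e)]
        # Diagonal blocks: sum of F^T F over incident edges.
        L[u*d:(u+1)*d, u*d:(u+1)*d] += Fu.T @ Fu
        L[v*d:(v+1)*d, v*d:(v+1)*d] += Fv.T @ Fv
        # Off-diagonal blocks: negated cross terms.
        L[u*d:(u+1)*d, v*d:(v+1)*d] -= Fu.T @ Fv
        L[v*d:(v+1)*d, u*d:(u+1)*d] -= Fv.T @ Fu
    return L
```

With identity restriction maps, this reduces to the ordinary graph Laplacian tensored with $I_d$, and the result is always symmetric positive semidefinite.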
Construction of Sheaflets on Signed Graphs. Consider a signed graph $G_s = (V, E)$ with $N$ nodes and the associated signed graph Laplacian $L$. Following the principles of framelet construction on graphs, we extend this methodology to define sheaflets on graphs. Let $\{(u_\ell, \lambda_\ell)\}_{\ell=1}^{Nd}$ denote the eigenpairs of the linear sheaf Laplacian $L_{\mathcal{F}}$. For $j \in \mathbb{Z}$ and $p \in V$, the undecimated sheaflets $\varphi_{j,p}(v)$ and $\psi_{j,p}^{(r)}(v)$ at scale $j$ are defined by
$$\varphi_{j,p}(v) = \sum_{\ell=1}^{Nd} \hat{\alpha}\!\left(\frac{\lambda_\ell}{2^{j}}\right) u_\ell(p)\, u_\ell(v),$$
$$\psi_{j,p}^{(r)}(v) = \sum_{\ell=1}^{Nd} \hat{\beta}^{(r)}\!\left(\frac{\lambda_\ell}{2^{j}}\right) u_\ell(p)\, u_\ell(v), \quad r = 1, \dots, n.$$
The scaling functions $\{\alpha, \beta^{(1)}, \ldots, \beta^{(n)}\}$ are associated with a filter bank $\eta = \{a, b^{(1)}, \ldots, b^{(n)}\}$, satisfying for all $\xi \in \mathbb{R}$:
$$\hat{\alpha}(2\xi) = \hat{a}(\xi)\, \hat{\alpha}(\xi),$$
$$\hat{\beta}^{(r)}(2\xi) = \hat{b}^{(r)}(\xi)\, \hat{\alpha}(\xi), \quad r = 1, \ldots, n,$$
where $\hat{h}(\xi)$ denotes the Fourier transform of $h$, defined as
$$\hat{h}(\xi) := \sum_{k \in \mathbb{Z}} h(k)\, e^{-2\pi i k \xi}.$$
Here, α serves as the low-pass scaling function, while { β ( r ) } r = 1 n corresponds to the high-pass functions, with n denoting the number of high-pass channels. This construction naturally captures both coarse-grained and fine-grained information of graph signals, thereby enabling multi-scale representations that integrate global and local structures.
The framelet coefficients $V_0, W_j^r \in \mathbb{R}^{N \times d}$ are defined as the inner products between the sheaflet bases and a graph signal $X \in \mathbb{R}^{N \times d}$, where $d$ is the feature dimension. The coefficient matrices share the same dimensions as the node feature matrix $X$:
$$V_0 = \langle \varphi_{0,\cdot}, X \rangle = U\, \hat{\alpha}\!\left(\frac{\Lambda}{2}\right) U^{\top} X,$$
$$W_j^r = \langle \psi_{j,\cdot}^{(r)}, X \rangle = U\, \hat{\beta}^{(r)}\!\left(\frac{\Lambda}{2^{j+1}}\right) U^{\top} X.$$
Let $\mathcal{W}_{r,j}$ denote the decomposition operators such that $V_0 = \mathcal{W}_{0,J} X$ and $W_j^r = \mathcal{W}_{r,j} X$. From Equation (8), the framelet transform operators can be expressed as
$$\mathcal{W}_{0,J} = U\, \hat{a}(2^{K+J-1}\Lambda) \cdots \hat{a}(2^{K}\Lambda)\, U^{\top} := U \Lambda_{0,J} U^{\top},$$
$$\mathcal{W}_{r,1} = U\, \hat{b}^{(r)}(2^{K}\Lambda)\, U^{\top} := U \Lambda_{r,1} U^{\top},$$
$$\mathcal{W}_{r,j} = U\, \hat{b}^{(r)}(2^{K+j-1}\Lambda)\, \hat{a}(2^{K+j-2}\Lambda) \cdots \hat{a}(2^{K}\Lambda)\, U^{\top} := U \Lambda_{r,j} U^{\top}, \quad j \geq 2.$$
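These transform operators can be sketched numerically. The snippet below is an illustrative construction, not the paper's code: it uses the Haar-type filter pair $\hat{a}(\xi) = \cos(\xi/2)$, $\hat{b}(\xi) = \sin(\xi/2)$ (for which $\hat{a}^2 + \hat{b}^2 = 1$) and a heuristic choice of the base dilation $K$; both are assumptions we introduce for the example.

```python
import numpy as np

def sheaflet_operators(L, J=2, K=None):
    """Build framelet decomposition operators W_{0,J} and W_{1,j} from a
    (sheaf) Laplacian L, using the Haar-type filters
        a_hat(x) = cos(x/2),  b_hat(x) = sin(x/2),
    which satisfy a_hat^2 + b_hat^2 = 1, so the frame is tight."""
    lam, U = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)
    if K is None:  # heuristic: keep all dilated arguments within [0, pi]
        K = int(np.floor(np.log2(np.pi / max(lam.max(), 1e-9)))) - (J - 1)
    a_hat = lambda x: np.cos(x / 2.0)
    b_hat = lambda x: np.sin(x / 2.0)

    spectra = []
    low = np.ones_like(lam)
    for j in range(J):                       # a(2^{K+J-1} L) ... a(2^K L)
        low = low * a_hat(2.0 ** (K + j) * lam)
    spectra.append(low)                      # low-pass spectrum of W_{0,J}
    for j in range(1, J + 1):                # high-pass spectra of W_{1,j}
        g = b_hat(2.0 ** (K + j - 1) * lam)
        for m in range(j - 1):
            g = g * a_hat(2.0 ** (K + m) * lam)
        spectra.append(g)
    return [U @ np.diag(g) @ U.T for g in spectra]
```

Because the filter pair is tight, the operators satisfy the perfect-reconstruction identity $\sum_k \mathcal{W}_k^{\top} \mathcal{W}_k = I$, which is exactly the constraint stated for the sheaflet decomposition below.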
Signed Graph Neural Networks with Sheaflets (S$^2$GNN). Due to the intrinsic properties of sheaflet decomposition and reconstruction, the following constraint holds:
$$\mathcal{W}_{0,J}^{\top} \mathcal{W}_{0,J} F + \sum_{r,j} \mathcal{W}_{r,j}^{\top} \mathcal{W}_{r,j} F = F,$$
which guarantees perfect reconstruction of the graph signal $F$. Within our framework, we employ Signed Graph Neural Networks with Sheaflets (S$^2$GNN) to model node feature propagation on signed graphs. The update rule for node representations is defined as
$$F^{(\ell+1)} = \sigma\!\Big( \mathcal{W}_{0,J}^{\top} \Theta_{0,J} \mathcal{W}_{0,J} F^{(\ell)} W^{(\ell)} + \sum_{r,j} \mathcal{W}_{r,j}^{\top} \Theta_{r,j} \mathcal{W}_{r,j} F^{(\ell)} W^{(\ell)} \Big),$$
where $\Theta_{0,J} = \operatorname{diag}(\theta_{0,J})$ and $\Theta_{r,j} = \operatorname{diag}(\theta_{r,j})$ are learnable filter coefficients corresponding to low- and high-frequency components, respectively, $W^{(\ell)}$ denotes a shared trainable weight matrix, and $\sigma(\cdot)$ is a nonlinear activation function such as ReLU.
This formulation enables the S$^2$GNN to jointly capture multi-scale patterns in signed graphs: the low-frequency component encodes shared global structures, while the high-frequency component highlights individual node variations. By integrating sheaflets into the graph convolution, the model preserves local consistency at the edge level and effectively propagates information across heterogeneous and signed interactions, providing a unified mechanism for learning rich, multi-resolution node representations.
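A single propagation step of this update rule can be sketched as follows. This is a dense NumPy sketch under our own assumptions: the decomposition operators are precomputed matrices, the learnable filters $\theta$ are plain vectors, and the function name is illustrative.

```python
import numpy as np

def s2gnn_layer(F, ops, thetas, W):
    """One S^2GNN propagation step (sketch):
    F_next = ReLU( sum_k  W_k^T diag(theta_k) W_k  F  W ),
    where ops[0] is the low-pass operator and ops[1:] are high-pass ones."""
    out = np.zeros((F.shape[0], W.shape[1]))
    for Wk, theta in zip(ops, thetas):
        coeff = Wk @ F                    # framelet coefficients of the signal
        coeff = theta[:, None] * coeff    # learnable per-frequency filtering
        out += Wk.T @ coeff @ W           # reconstruct and mix feature channels
    return np.maximum(out, 0.0)           # ReLU activation
```

When all filter coefficients are one and $W$ is the identity, the tight-frame property makes the pre-activation output collapse back to the input signal, which is a useful sanity check.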

4.3. Training Objective

The EduSheaf framework formulates student performance prediction as a link sign prediction task on a bipartite graph of students and questions. To this end, semantic information is incorporated into question nodes, while latent representations are jointly learned for both students and questions. The prediction objective is to infer the polarity of unobserved interaction edges, thereby capturing not only a student’s hidden proficiency with respect to a specific question but also the underlying structural dependencies between learners and assessment items.
Formally, for each student $i$ and question $j$, we assign embedding vectors $z_{u_i}, w_{v_j} \in \mathbb{R}^{d}$, where $d$ denotes the latent dimension. A classifier $f(z_{u_i}, w_{v_j}) \to \{-1, +1\}$ is then employed to predict the sign of the potential edge $e_{ij}$. In practice, the two embeddings are concatenated to form a joint representation that preserves individual semantics while encoding their interactions, and the result is passed through a multi-layer perceptron (MLP):
$$P(u, v) = \operatorname{MLP}(z_{u_i} \,\|\, w_{v_j}),$$
where "$\|$" denotes concatenation. This design allows the MLP to model the nonlinear dependencies essential for accurate edge-polarity prediction.
The output $P(u, v)$ denotes the probability that the edge between $u_i \in U$ and $v_j \in V$ is positive: larger values correspond to a higher likelihood of a correct response, whereas smaller values indicate an incorrect one. To optimize the prediction task, we employ the binary cross-entropy loss [24]:
$$\mathcal{L}_{\mathrm{CE}} = -\big[\, y \log P(u, v) + (1 - y) \log\big(1 - P(u, v)\big) \big],$$
where the ground-truth label $y$ is obtained by mapping the edge sign $\{-1, +1\}$ to $\{0, 1\}$. Minimizing this loss enables EduSheaf to refine its embeddings in a manner that strengthens predictive accuracy while preserving the nuanced relational structure between students and questions.
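The classifier and loss described above can be sketched directly. The snippet assumes, for illustration, a single hidden ReLU layer and hand-rolled parameter tuples; the function names are ours, not the paper's implementation.

```python
import numpy as np

def predict_edge_sign_prob(z_u, w_v, params):
    """P(u, v) = sigmoid(MLP(z_u || w_v)): probability that the edge is positive."""
    W1, b1, W2, b2 = params
    h = np.concatenate([z_u, w_v])            # joint student-question vector
    h = np.maximum(W1 @ h + b1, 0.0)          # one hidden ReLU layer
    logit = W2 @ h + b2
    return 1.0 / (1.0 + np.exp(-logit))       # sigmoid -> probability in (0, 1)

def bce_loss(p, sign, eps=1e-12):
    """Binary cross-entropy, with the edge sign {-1, +1} mapped to y in {0, 1}."""
    y = (sign + 1) / 2.0
    p = np.clip(p, eps, 1.0 - eps)            # numerical safety for log()
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

Note that the sign-to-label mapping happens inside the loss, so the graph can keep its native $\{-1, +1\}$ edge signs throughout.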

4.4. Algorithm Implementation

To illustrate the workflow of our proposed framework, we present the pseudocode of EduSheaf in Algorithm 1, which integrates sheaf Laplacian computation with framelet-based multi-scale representation learning.
Algorithm 1 Training Procedure of EduSheaf
Input: Signed graph $G = (U, V, E = E^{+} \cup E^{-})$, normalized sheaf Laplacian $\Delta_F$, feature matrix $X \in \mathbb{R}^{N \times d}$, learning rate $\eta$, hyperparameters $(\alpha, \lambda, \gamma)$, epochs $T$
Output: Final node representation $Z$, prediction $\hat{y}_{ij}$
 1: Construct $\Delta_F = D^{-\frac{1}{2}} L_F D^{-\frac{1}{2}}$ using Equations (1)–(3).
 2: Initialize S$^2$GNN layers, Combine Layer $C$, and classifier.
 3: for $t = 1, \ldots, T$ do
 4:     Forward: $Z \leftarrow f_{\mathrm{S^2GNN}}(X, \Delta_F)$
 5:     Compute loss $\mathcal{L}(Z, y)$; update $\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}$
 6:     Extract multi-scale embeddings $\{Z_0, \ldots, Z_n\}$
 7:     Fuse scales: $Z_f = C(Z_0, \ldots, Z_n; \alpha, \lambda, \gamma)$
 8:     Predict signed edges: $\hat{y}_{ij} = f(z_{u_i}, w_{v_j}) \in \{-1, +1\}$
 9:     Evaluate $F_1(\hat{y}, y)$ and save the best model if improved
10: end for
11: return $\hat{y}$
EduSheaf first constructs $\Delta_F$ from the signed bipartite graph, encoding node–edge dependencies via restriction maps. S$^2$GNN applies hierarchical sheaflet transforms to capture coarse- and fine-grained signals, producing multi-scale embeddings. These are fused by the Combine Layer $C(\cdot)$ with parameters $(\alpha, \lambda, \gamma)$ to integrate global and local information. Finally, $f(\cdot)$ predicts edge signs $\hat{y}_{ij}$, completing the signed-edge classification task for student performance prediction.
We now describe the Combine Layer $C(\cdot)$ in detail. After obtaining the low-pass and high-pass representations from the S$^2$GNN, the Combine Layer integrates multi-scale semantic and structural information.
Let
$$Z_0 = \mathcal{W}_{0,J}^{\top} \Theta_{0,J} \mathcal{W}_{0,J} F^{(\ell)}, \qquad Z_{r,j} = \mathcal{W}_{r,j}^{\top} \Theta_{r,j} \mathcal{W}_{r,j} F^{(\ell)},$$
and define the aggregated multi-scale embedding as
$$Z^{(\ell)} = Z_0 + \sum_{r,j} Z_{r,j}.$$
To balance stochastic spectral perturbations and sheaf-based structural propagation, we compute
$$\tilde{Z}^{(\ell)} = \gamma \cdot \mathcal{X}^{(\ell)} Z^{(\ell)} + (1 - \gamma) \cdot \Delta_F Z^{(\ell)},$$
where $\mathcal{X}^{(\ell)}$ denotes the random filtering operator constructed from the framelet decomposition operators $\{\mathcal{W}_{r,j}\}$, and $\Delta_F$ is the normalized sheaf Laplacian.
We then introduce a residual fusion with the initial node representation $F^{(0)}$:
$$S^{(\ell)} = (1 - \alpha)\, \tilde{Z}^{(\ell)} + \alpha\, F^{(0)}.$$
A depth-adaptive scaling factor
$$\theta^{(\ell)} = \log\!\left(\frac{\lambda}{\ell} + 1\right)$$
is used to modulate the magnitude of feature updates across layers. The final fused representation is given by
$$Z_f^{(\ell+1)} = \theta^{(\ell)}\, S^{(\ell)} W^{(\ell)} + \big(1 - \theta^{(\ell)}\big)\, S^{(\ell)},$$
with an optional residual connection
$$Z_f^{(\ell+1)} \leftarrow Z_f^{(\ell+1)} + Z_f^{(\ell)}.$$
Here, γ controls the interaction between spectral and structural pathways, α anchors intermediate representations to the original semantic embedding, and λ governs the decay rate of layer-wise feature updates, enabling stable multi-scale hierarchical learning.
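The fusion steps of this subsection can be summarized in one function. This is a sketch under our own assumptions: dense operator matrices, a GCNII-style reading of the depth-adaptive factor $\theta^{(\ell)} = \log(\lambda/\ell + 1)$, and an illustrative function name; it is not the released implementation.

```python
import numpy as np

def combine_layer(Z_parts, F0, Delta_F, X_op, W, layer, alpha, gamma, lam):
    """Sketch of the Combine Layer C: aggregate scale components, mix the
    spectral (X_op) and structural (Delta_F) pathways with gamma, anchor to
    the initial representation F0 with alpha, and apply the depth-adaptive
    scaling theta = log(lam / layer + 1)."""
    Z = sum(Z_parts)                                       # Z = Z_0 + sum_{r,j} Z_{r,j}
    Z_tilde = gamma * (X_op @ Z) + (1.0 - gamma) * (Delta_F @ Z)
    S = (1.0 - alpha) * Z_tilde + alpha * F0               # residual fusion with F0
    theta = np.log(lam / layer + 1.0)                      # depth-adaptive factor
    return theta * (S @ W) + (1.0 - theta) * S             # scaled linear update
```

With identity pathways and $\alpha = 0$, the layer passes the aggregated embedding through unchanged, which makes the role of each hyperparameter easy to isolate in experiments.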

5. Experiments

In this section, we conduct a series of experiments to comprehensively evaluate the effectiveness of the proposed EduSheaf framework. The evaluation is carried out from three complementary perspectives. First, we compare EduSheaf with state-of-the-art signed graph representation learning methods as well as classical graph neural networks to assess its overall performance advantage. Second, we investigate the contributions of different information components, including high-pass features, low-pass features, and semantic embeddings derived from large language models, by performing systematic ablation studies. Finally, we analyze the sensitivity of EduSheaf to key hyperparameters, thereby examining the stability of the model under different configurations.
Following prior work on signed link and student performance prediction, we report Binary-F1 as the primary metric to ensure direct comparability with baseline methods. Binary-F1 provides a balanced measure by simultaneously considering precision and recall across both classes, making it particularly suitable for imbalanced prediction tasks. Higher Binary-F1 values indicate stronger model capability in accurately identifying both positive and negative interactions, thereby offering a reliable and interpretable assessment of predictive performance. The classification threshold is selected by maximizing the F1 score on the validation set and then fixed for evaluation on the test set. To provide a threshold-independent perspective, we additionally report AUC in the hyperparameter sensitivity analysis, which reflects the model’s discriminative ability under varying operating points.
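The validation-threshold protocol described above can be made concrete with a small helper. This is our illustrative code, not the authors' evaluation script; it simply scans candidate thresholds on validation predictions.

```python
import numpy as np

def best_f1_threshold(probs, labels):
    """Scan candidate thresholds on the validation set and return the one
    that maximizes binary F1; the chosen threshold is then frozen and
    reused, unchanged, on the test set."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(probs):                 # each distinct score is a candidate
        pred = (probs >= t).astype(int)
        tp = int(np.sum((pred == 1) & (labels == 1)))
        fp = int(np.sum((pred == 1) & (labels == 0)))
        fn = int(np.sum((pred == 0) & (labels == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, float(t)
    return best_t, best_f1
```

Freezing the threshold before touching the test set keeps the reported Binary-F1 free of test-set tuning, which is the point of the protocol.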

5.1. Datasets

Multiple-choice questions are a core component of assessment in educational settings, particularly on online learning platforms, where they enable standardized, scalable, and automated evaluation of learners’ knowledge. Beyond simple correctness evaluation, the interactions between students and MCQs encode rich information regarding learning behaviors, knowledge acquisition, and error patterns, making them ideal for predictive modeling of student performance.
For our study, we employed five real-world datasets collected from three universities [6]: the Biology and Law courses at the University of Auckland, the Cardiff20102 course from Cardiff University School of Medicine, and two biochemistry courses (Sydney19351 and Sydney23146) at the University of Sydney. These courses vary in discipline, scale, difficulty, and student composition, which ensures that our experimental evaluation encompasses a broad spectrum of educational contexts. Such diversity provides a robust foundation to assess model adaptability and generalization in heterogeneous learning scenarios.

Signed Graph Construction

To enable graph-based learning, we convert the raw student–question interactions into a signed bipartite graph. Let u i U represent a student and v j V a question. We construct edges according to the correctness of the student’s response as follows:
  • If a student u i answered a question q j correctly, an edge with a positive sign (“+1”) is established between the two nodes.
  • If the student’s answer was incorrect, an edge with a negative sign (“−1”) is created between the student node and the question node.
This signed bipartite graph captures dual relationships between students and questions: positive edges encode mastery, while negative edges highlight misconceptions or areas needing improvement. Compared to unsigned graphs, this structure enables the model to differentiate between correct and incorrect learning patterns, facilitating more fine-grained representation learning.
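The construction above amounts to a simple mapping from response records to a signed edge list. A minimal sketch (field names are illustrative, not the datasets' actual schema):

```python
def build_signed_bipartite_graph(records):
    """Convert raw (student_id, question_id, is_correct) records into a
    signed bipartite edge list: correct -> +1, incorrect -> -1."""
    students, questions, edges = set(), set(), []
    for student, question, is_correct in records:
        students.add(student)
        questions.add(question)
        edges.append((student, question, 1 if is_correct else -1))
    return sorted(students), sorted(questions), edges
```

The resulting node sets $U$ (students) and $V$ (questions) and the signed edge list feed directly into the graph construction used by the model.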
The total number of edges | E | provides an indication of the interaction density within each dataset, reflecting both the scale of student engagement and the structural complexity of learning behaviors. Table 2 presents a detailed comparison of these statistics across the five datasets. Notably, the Law course at the University of Auckland exhibits a high overall accuracy (93%), which may indicate higher average mastery or limited discriminative difficulty of the MCQs, whereas the other courses show accuracy rates between 60% and 70%, revealing more realistic learning challenges and error distributions. These variations offer diverse structural patterns for evaluating model performance and robustness.
By constructing the signed bipartite graph in this manner, we create a structured and semantically meaningful representation of the learning environment. Each student–question interaction becomes a node–edge signal that encodes correctness information, which serves as a foundation for downstream signed graph neural network modeling. This structured representation supports multi-level analysis, allowing the model to capture both individual learning deviations and broader group-level patterns, ultimately enhancing predictive accuracy and interpretability (see Figure 1).

5.2. Baselines

To systematically evaluate the performance of EduSheaf, we select a diverse set of representative baselines. These baselines span from simple methods that ignore graph structure, to general graph neural networks operating in both spectral and spatial domains, and finally to specialized models designed explicitly for signed graphs. This layered comparison establishes a comprehensive evaluation framework.
We first adopt Random Embedding as a performance lower bound. This method assigns random low-dimensional vectors to student and question nodes, concatenates them, and feeds the result into a logistic regression classifier for edge sign prediction. As it does not incorporate any structural modeling, it provides an objective reference point to measure the real performance gains of more advanced models [24].
Among general-purpose graph learning methods, GCN [21] and GAT [42] serve as canonical representatives. GCN, built on the spectral convolution framework, recursively aggregates neighbor features to capture local topological patterns, with its multi-layer stacking enabling deeper representation learning. GAT extends this paradigm by introducing an attention mechanism, which adaptively assigns weights to neighbors, thereby enhancing flexibility and expressiveness, particularly in heterogeneous graph structures.
In contrast, SGCN [17], SBGNN [24], and SBCL [6] are tailored to the unique challenges of signed graphs. SGCN explicitly distinguishes positive and negative edges during convolutional propagation, allowing node embeddings to reflect both supportive and antagonistic relations. SBGNN further leverages a bipartite structure to separately model student and question nodes, aligning more naturally with educational interaction patterns. SBCL adopts a contrastive learning paradigm, constructing positive and negative sample pairs across different views to significantly improve the discriminability and robustness of embeddings. Building on this foundation, LLM-SBCL [6] extends SBCL by incorporating semantic embeddings generated by large language models to enrich the semantic representations of question nodes. This design not only preserves the advantages of contrastive learning but also achieves a tighter integration of textual information with graph structure.
Through this progression of baselines—from random embeddings, to general graph models, and finally to specialized signed-graph methods—we are able to validate EduSheaf’s effectiveness from multiple perspectives, demonstrating its clear advantages in student performance prediction tasks.

5.3. Experimental Setup

To guarantee the reliability and validity of our evaluation, we carefully designed dataset partitioning, training procedures, and reporting strategies. We adopt an edge-level random split protocol, where observed student–question interactions are partitioned into 85% training, 5% validation, and 10% testing, while the full set of student and question nodes is retained across all splits. This corresponds to a transductive evaluation setting, in which node representations are learned from partially observed edges without access to validation or test labels. All hyperparameters are selected exclusively based on validation performance. This partitioning design ensures both sufficient learning signals and a rigorous basis for fair performance evaluation. Cold-start scenarios with unseen students or questions are left as an important direction for future work.
Each model was trained for 300 epochs to achieve convergence across datasets of different scales, while regularization strategies mitigated risks of overfitting. To minimize the randomness introduced by data partitioning and initialization, we repeated every experiment ten times. Performance metrics are reported as mean and standard deviation, allowing us to assess robustness and statistical reliability.
All implementations were carried out in PyTorch 2.1.0 and executed on a single NVIDIA RTX A6000 GPU. For reproducibility and fair benchmarking, the complete hyperparameter configurations used to obtain the reported results are summarized in Table 3. These settings enable precise replication of our work and establish a consistent foundation for long-term comparison across future studies.

5.4. Results and Discussion

To comprehensively assess the effectiveness of EduSheaf, we performed a series of comparative experiments against a broad spectrum of baseline models on five representative educational datasets. The results, reported in Table 4, reveal several key findings:
Superior performance of EduSheaf. EduSheaf consistently attains the best binary F1 scores across all datasets, outperforming every baseline. This superiority originates from the ability of the Sheaf Laplacian to incorporate directional constraints and local consistency into the signed graph structure, enabling a more faithful representation of the nuanced student–question interactions. In addition, the integration of framelet-based wavelet convolution facilitates the joint utilization of low-pass and high-pass signals, thereby strengthening the model’s capacity to capture both overarching trends and localized deviations. Together, these mechanisms significantly boost predictive accuracy.
Limitations of traditional GNN models. As evident in Table 4, methods designed for unsigned graphs such as GCN and GAT perform poorly when applied to signed contexts. Their lack of sensitivity to polarity and failure to model antagonistic relationships prevent them from capturing the complexity of signed educational networks. In contrast, EduSheaf benefits from the geometric principles of sheaf theory and the multi-resolution analysis provided by framelets, which collectively enable simultaneous handling of polarity, structural consistency, and semantic hierarchy within a single framework.
Comparison with advanced signed GNNs. Beyond outperforming unsigned methods, EduSheaf also achieves clear gains over advanced signed graph baselines, including SGCN, SBGNN, SBCL, and their LLM-augmented variants. This advantage arises from the unique synergy between the Sheaf Laplacian and framelet-based convolution in the proposed Sheaflet-SNN, which imposes consistency-aware constraints while extracting multi-scale features from both local and global perspectives. Such a design endows EduSheaf with a richer representational capacity for signed information, explaining its consistent superiority over competing approaches.
Robustness and stability. Another noteworthy observation is EduSheaf’s low variance across different datasets. This indicates not only reliable stability but also strong generalization, suggesting that the model is well suited to diverse educational environments with varying structural and semantic properties.
The cold-start challenge is a fundamental issue in representation learning, and it is particularly prominent in e-learning applications. To this end, we designed dedicated experiments in Appendix B, whose results demonstrate that the proposed model can effectively tackle such cold-start problems.

5.5. Ablation Study

To further assess the importance of the core components within EduSheaf, we conducted a systematic ablation study. By selectively removing different frequency components and the semantic module, we designed four model variants: the complete model (retaining both low-pass and high-pass components as well as LLM embeddings), a variant without the high-pass component, a variant without the low-pass component, and a variant without the LLM module.
As shown in Table 5, removing the high-pass component led to a substantial decline in performance across all datasets, confirming the critical role of high-frequency information in capturing local variations and preserving node-level discriminability. In contrast, removing the low-pass component caused a comparatively milder yet still significant performance drop in most datasets, indicating that low-frequency signals are indispensable for graph smoothness and global consistency modeling. Together, these results highlight the complementary nature of low-pass and high-pass signals, demonstrating how the wavelet-based convolution mechanism of the Sheaflet framework effectively bridges local and global scales to enhance the representational capacity of signed graphs.
Moreover, eliminating the LLM module also resulted in a marked reduction in binary F1 scores, underscoring the irreplaceable role of semantic embeddings in strengthening the representation of knowledge points in MCQs and complementing structural features. The deep integration of structural and semantic information enables EduSheaf to exhibit stronger robustness and adaptability when addressing the sparsity and heterogeneity inherent in educational data. We further analyze the impact of the LLM-based semantic module across different dataset characteristics. For datasets with longer question texts and higher domain diversity, such as Law and Cardiff20102, semantic enhancement tends to yield more significant performance improvements, as conceptual-level information provides additional discriminative power beyond structural connectivity. Similarly, in scenarios with sparse interactions, such as the Sydney19351 dataset, neighborhood-based propagation carries limited informative value, and the semantic signals derived from the LLM serve as an effective complementary component, thereby enhancing model robustness. In contrast, for datasets with short, highly standardized question texts and dense interaction patterns, the marginal performance gain from semantic embeddings is more limited, indicating that structural cues already dominate the prediction process in such cases.
In summary, the ablation study demonstrates the complementary value of high-pass and low-pass components in balancing local discriminability and global consistency, while also revealing the significant contribution of LLM-based semantic embeddings to structural learning. The synergy of these three elements collectively underpins the performance advantages and theoretical significance of EduSheaf in student performance prediction tasks.

5.6. Parameter Sensitivity Analysis

To rigorously evaluate the effect of hyperparameter choices on EduSheaf, we performed a comprehensive sensitivity analysis with a focus on the scale level. This parameter specifies the number of hierarchical resolutions employed in the multi-scale spectral decomposition, ranging from coarse resolutions that primarily capture global structural dependencies to fine resolutions that emphasize localized patterns and subtle high-frequency variations. As such, the scale level functions as a critical mechanism for balancing global contextual awareness with localized discriminative capacity in representation learning.
In the experimental setup, the scale level was varied from 1 to 6, and its influence was assessed across three representative datasets—Sydney19351, Sydney23146, and Biology. The results, summarized in Figure 2, reveal that EduSheaf maintains stable and reliable performance across the full spectrum of scale settings, reflecting its resilience to hyperparameter perturbations. Notably, the model achieves marginally superior predictive accuracy when the scale level is configured at smaller values (e.g., 1 or 2). This observation suggests that a small number of scales already suffices to encode the essential multi-frequency information, thereby achieving a favorable balance between expressive power and computational efficiency. By contrast, larger scale levels tend to introduce redundant representations, increase the computational burden, and provide only negligible gains in predictive accuracy.
Figure 3 reports the sensitivity of the proposed framework to the scale level under the AUC metric across three representative datasets, namely Sydney19351, Sydney23146, and Biology. Overall, the AUC values exhibit a stable trend as the scale level increases, indicating that the model’s discriminative capability is not overly sensitive to moderate variations in the multi-scale configuration.
Taken together, these findings underscore two key insights: first, EduSheaf demonstrates robustness and adaptability in the face of hyperparameter variations, and second, compact scale configurations are not only computationally economical but also empirically effective for performance optimization. These results further provide actionable guidance for practitioners, indicating that lower-scale settings constitute a pragmatic choice for real-world deployment where efficiency and scalability are paramount.

6. Conclusions

This paper introduces and systematically validates EduSheaf, a signed graph learning framework that combines sheaf theory, framelet-based multi-scale filtering, and semantic embeddings from LLMs for student performance prediction. Unlike traditional GNNs or purely semantic-enhanced methods, EduSheaf achieves a synergy of structural, spectral, and semantic modeling through three core designs. First, the sheaf Laplacian imposes edge-level local consistency constraints, ensuring that node representations remain coherent while also reflecting cross-local inconsistencies. Second, framelet-based low-pass and high-pass components enable multi-scale signal decomposition, where low-pass filters capture group-level learning patterns, and high-pass filters highlight individual deviations. Finally, fine-grained semantic embeddings derived from LLMs are seamlessly integrated into signed node representations, bridging the gap between structural learning and textual semantics.
Experiments across diverse real-world educational datasets demonstrate that EduSheaf achieves superior performance compared to a range of baselines, including random embeddings, GCN, GAT, as well as signed-graph and contrastive learning methods such as SGCN, SBGNN, SBCL, and LLM-SBCL. Ablation studies further show that both high-pass and low-pass components play critical roles in balancing local discriminability and global consistency, while LLM-derived semantics substantially enhance the discriminative power of questions and knowledge concepts. The synergy between these components forms the primary source of EduSheaf’s advantage. Overall, EduSheaf provides an interpretable, scalable, and robust solution for high-precision student performance prediction in heterogeneous, sparse, and semantically rich educational contexts.
Despite these advances, several challenges remain. The current framework faces computational overhead and real-time deployment issues when scaling to large datasets and incremental online updates. Moreover, its reliance on LLM embedding quality raises concerns about the robustness and fairness of semantic representations. Future directions include (i) extending EduSheaf to multimodal educational data, such as process logs, discussion texts, and learning resources; (ii) exploring the integration of transformer-based and instruction-tuned semantic encoders as alternatives to GloVe, in order to further investigate how richer contextual embeddings interact with sheaf-based multi-scale signed graph learning in educational settings; and (iii) developing more advanced signed graph neural networks or extending the framework to richer graph settings that capture higher-order information, ultimately enabling reliable and equitable deployment in real-world educational scenarios.

Author Contributions

Conceptualization, D.Z., Z.Z., Y.C. and Y.G.; methodology, D.Z., Z.Z., Y.C. and Y.G.; validation, D.Z., Z.Z., Y.C., and Y.G.; formal analysis, D.Z., Z.Z., Y.C. and Y.G.; data curation, D.Z., Z.Z., Y.C. and Y.G.; writing—original draft preparation, D.Z., Z.Z., Y.C. and Y.G.; writing—review and editing, D.Z., Z.Z., Y.C. and Y.G.; visualization, D.Z., Z.Z., Y.C. and Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

Special Key Project for Enhancing the Comprehensive Strength of Disciplines at Yili Normal University (22XIKSZ21).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LLMs	Large language models
MCQs	Multiple-choice questions
GNNs	Graph neural networks
NLP	Natural language processing
CNNs	Convolutional neural networks
RNNs	Recurrent neural networks
Binary-F1	Binary-average F1

Appendix A

Appendix A.1. Prompt Template for LLM-Based Semantic Extraction

  • System Prompt: You are an educational domain expert. Your task is to extract the most important conceptual and semantic keywords from a multiple-choice question (MCQ). The output must be strictly formatted and numerically valid.
  • User Prompt Template: Given the following MCQ, extract the most important knowledge concepts reflected in the question.
  • Special Requirements: Please format your response as valid JSON containing the following keys and structure:
    {"Keywords": [{"keyword": "keyword_name", "percentage": 0}]}
    1. Return the result strictly in valid JSON format.
    2. Assign a normalized importance weight to each keyword such that all weights sum to exactly 1.0.
    3. Provide at most five keywords, each containing no more than five words.
    4. Rank keywords in descending order of importance.
  • In-Context Learning: Here’s an example response format:
    {"Keywords": [
      {"keyword": "keyword_1", "percentage": 0.x},
      {"keyword": "keyword_2", "percentage": 0.x}
    ]}
  • Context: Here’s the MCQ:
    Question stem: In the central dogma of molecular biology, which of the following best describes the process of “translation”?
    [A]: Synthesis of a complementary RNA strand using a DNA template.
    [B]: Direct synthesis of a protein using information encoded in a DNA template.
    [C]: Synthesis of a polypeptide chain at the ribosome, guided by the sequence of an mRNA molecule.
    [D]: The replication of DNA to produce two identical DNA molecules.
    Question answer: C.
    Explanation: Option C is correct. Translation is the process in which the sequence of codons in messenger RNA (mRNA) is used by the ribosome to assemble amino acids into a polypeptide chain. Option A describes transcription, not translation. Option B is incorrect because proteins are not synthesized directly from DNA; the information flows from DNA to RNA to protein. Option D describes DNA replication, which is unrelated to translation.
  • LLM Response:
    {
        "Keywords": [
            {"keyword": "central dogma of molecular biology", "percentage": 0.3},
            {"keyword": "translation", "percentage": 0.3},
            {"keyword": "ribosome", "percentage": 0.15},
            {"keyword": "mRNA molecule", "percentage": 0.15},
            {"keyword": "polypeptide chain synthesis", "percentage": 0.1}
        ]
    }

Appendix A.2. Post-Processing and Validation Rules

  • JSON Validation: The LLM output is parsed using a strict JSON parser. Any malformed output triggers an automatic re-query.
  • Completeness Check: The system verifies the presence of all required fields (keyword, percentage). Missing fields result in re-querying.
  • Normalization Check: The sum of all percentage values is validated to equal 1.0 within a tolerance of ±0.01. If violated, a correction prompt is issued.
  • Fallback Strategy: After two unsuccessful correction attempts, a deterministic fallback is applied, where uniform weights are assigned to valid extracted keywords or the top-ranked valid terms are retained.
This pipeline ensures that all semantic features used in downstream graph learning satisfy strict structural and numerical constraints, improving stability and reproducibility.
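As a concrete illustration, the validation rules above can be sketched in Python. The function names (`validate_keywords`, `uniform_fallback`) and the exact control flow are our own illustrative assumptions, not the authors' implementation; the caller is assumed to re-query the LLM whenever `None` is returned and to invoke the fallback after two failed corrections.

```python
import json

REQUIRED_FIELDS = {"keyword", "percentage"}

def validate_keywords(raw: str, tol: float = 0.01):
    """Apply the JSON, completeness, and normalization checks.

    Returns the parsed keyword list on success, or None to signal a re-query.
    (Illustrative sketch only, not the authors' exact implementation.)
    """
    try:
        data = json.loads(raw)                        # JSON validation
    except json.JSONDecodeError:
        return None
    items = data.get("Keywords") if isinstance(data, dict) else None
    if not isinstance(items, list) or not items:
        return None
    for item in items:                                # completeness check
        if not isinstance(item, dict) or not REQUIRED_FIELDS <= set(item):
            return None
    total = sum(float(item["percentage"]) for item in items)
    if abs(total - 1.0) > tol:                        # normalization check (tolerance ±0.01)
        return None
    return items

def uniform_fallback(keywords):
    """Deterministic fallback: assign uniform weights to the valid keywords."""
    w = 1.0 / len(keywords)
    return [{"keyword": k, "percentage": w} for k in keywords]
```

A caller would typically loop: query the LLM, run `validate_keywords`, issue a correction prompt on failure, and fall back to `uniform_fallback` after two unsuccessful attempts.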

Appendix B

Appendix B.1. Cold-Start Evaluation

To assess the deployability of EduSheaf in realistic educational settings, we introduce a cold-start evaluation protocol in which a subset of questions is treated as entirely unseen during training. Specifically, to ensure a fair comparison, we maintained a consistent experimental setup: for each dataset, 10% of the questions were randomly selected to simulate a cold-start scenario. All edges associated with these questions were removed from the training and validation sets, and these questions were only introduced during testing.
Under this setting, the model has no access to any historical student–question interactions for the held-out questions. Semantic representations for unseen questions are generated using the same LLM-based extraction pipeline, while their structural embeddings are initialized from global graph statistics and the learned aggregation parameters of the S²GNN. Predictions are then made solely from semantic features and the global multi-scale structural priors learned from the remaining graph.
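The question-level split described above can be sketched as follows; the function name `cold_start_split` and the representation of interactions as (student, question, sign) triples are our own illustrative assumptions rather than the authors' code.

```python
import random

def cold_start_split(edges, question_ids, holdout_frac=0.10, seed=0):
    """Hold out a fraction of questions to simulate cold start.

    Every edge touching a held-out question is removed from training and
    validation and is used only at test time.
    (Illustrative sketch of the protocol, not the authors' implementation.)
    """
    rng = random.Random(seed)
    n_hold = max(1, int(holdout_frac * len(question_ids)))
    held_out = set(rng.sample(sorted(question_ids), n_hold))
    train_val = [e for e in edges if e[1] not in held_out]   # kept for training/validation
    test = [e for e in edges if e[1] in held_out]            # introduced only at test time
    return train_val, test, held_out
```

Fixing the random seed makes the held-out question set reproducible across the compared methods, which is what keeps the comparison fair.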
Table A1. Results for the cold-start problem (mean ± standard deviation; bold indicates the better result).

| Dataset | Metric | EduSheaf | LLM-SBCL |
|---|---|---|---|
| Biology | Binary-F1 | **0.726 ± 0.034** | 0.629 ± 0.061 |
| Biology | AUC | 0.534 ± 0.007 | **0.539 ± 0.032** |
| Law | Binary-F1 | **0.896 ± 0.039** | 0.894 ± 0.031 |
| Law | AUC | **0.638 ± 0.012** | 0.594 ± 0.017 |
| Cardiff | Binary-F1 | **0.723 ± 0.018** | 0.696 ± 0.045 |
| Cardiff | AUC | **0.581 ± 0.019** | 0.572 ± 0.019 |
| Sydney19 | Binary-F1 | **0.673 ± 0.037** | 0.623 ± 0.055 |
| Sydney19 | AUC | **0.598 ± 0.025** | 0.544 ± 0.026 |
| Sydney23 | Binary-F1 | **0.785 ± 0.039** | 0.750 ± 0.027 |
| Sydney23 | AUC | **0.608 ± 0.017** | 0.599 ± 0.013 |
Table A1 reports the performance of EduSheaf and the LLM-SBCL baseline under the held-out-question cold-start protocol, where all interactions associated with a subset of questions are removed from training and introduced only at test time. Notably, both methods incorporate semantic information, making this comparison particularly stringent: it isolates the contribution of the proposed multi-scale sheaf-based structural modeling beyond semantic enrichment alone.
Across all five datasets, EduSheaf consistently outperforms LLM-SBCL in terms of Binary-F1, with relative improvements ranging from moderate to substantial. This indicates that, even when semantic representations are available for unseen questions, relying solely on semantic-augmented graph learning is insufficient to fully capture the global interaction patterns required for robust generalization under cold-start conditions.
In terms of AUC, EduSheaf demonstrates more stable or higher discriminative performance on four out of five datasets. This suggests that the proposed sheaflet-based multi-scale aggregation improves the ranking quality of predictions and enhances the model’s ability to separate positive and negative interactions for unseen items, beyond what can be achieved through semantic similarity alone.
Overall, these results highlight that the advantage of EduSheaf in cold-start scenarios does not merely stem from the inclusion of semantic features, but rather from the synergistic combination of semantic enrichment and multi-scale structural learning. This combination allows the framework to maintain robust predictive performance even when direct interaction evidence for new questions is entirely absent.

Appendix C

Appendix C.1. Extended Evaluation: Precision–Recall and Calibration Analysis

In addition to Binary-F1 and AUC, we report Precision–Recall AUC (PR-AUC) to account for potential class imbalance in signed student–question interactions, where the prevalence of correct and incorrect responses may differ substantially across datasets. PR-AUC emphasizes performance on the positive class and provides a more informative assessment of model behavior in high-confidence prediction regimes.
To evaluate the reliability of predicted probabilities, we further adopt two calibration-oriented metrics: Expected Calibration Error (ECE) and the Brier score. ECE measures the absolute discrepancy between predicted confidence and empirical accuracy across confidence bins, reflecting how well the model’s probability estimates align with observed outcomes. The Brier score quantifies the mean squared error between predicted probabilities and ground-truth labels, capturing both calibration and sharpness of probabilistic predictions. Lower values indicate better-calibrated and more reliable outputs.
All metrics are computed on the held-out test set, and results are reported as the average over multiple random seeds to ensure robustness. The decision threshold for Binary-F1 is selected by maximizing F1 on the validation set and then fixed for evaluation on the test set. PR-AUC, AUC, ECE, and Brier score are threshold-independent and therefore provide complementary perspectives on ranking quality and probabilistic reliability under varying operating conditions.
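For reference, the two calibration metrics defined above can be computed with plain Python as sketched below. The choice of ten equal-width confidence bins for ECE is our assumption of a common convention, not a detail stated in the paper.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average of |confidence - accuracy| over equal-width confidence bins.

    (Illustrative sketch; the paper does not specify its binning scheme.)
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls into the last bin
        bins[b].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if bucket:
            conf = sum(p for p, _ in bucket) / len(bucket)   # mean confidence in bin
            acc = sum(y for _, y in bucket) / len(bucket)    # empirical accuracy in bin
            ece += (len(bucket) / n) * abs(conf - acc)
    return ece
```

Both functions take the model's predicted positive-class probabilities and the binary ground-truth labels; lower values of either metric indicate better-calibrated predictions.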
Table A2. Supplementary results on robustness and calibration (↓ indicates lower is better).

| Dataset | Metric | EduSheaf | LLM-SBCL |
|---|---|---|---|
| Biology | AUC | 0.534 ± 0.007 | 0.539 ± 0.032 |
| Biology | PR-AUC | 0.47 | 0.41 |
| Biology | ECE ↓ | 0.058 | 0.092 |
| Biology | Brier ↓ | 0.176 | 0.213 |
| Cardiff | AUC | 0.638 ± 0.012 | 0.594 ± 0.017 |
| Cardiff | PR-AUC | 0.52 | 0.45 |
| Cardiff | ECE ↓ | 0.055 | 0.087 |
| Cardiff | Brier ↓ | 0.169 | 0.221 |
As shown in Table A2, EduSheaf consistently demonstrates stronger performance in terms of PR-AUC and calibration metrics across both datasets. On Biology, although the overall ranking performance measured by AUC remains comparable to the semantic baseline, EduSheaf achieves a notable improvement in PR-AUC, indicating more effective identification of correct responses under imbalanced conditions. This suggests that the proposed multi-scale sheaf-based aggregation enhances the model’s ability to focus on high-confidence positive predictions rather than relying solely on semantic similarity.
On the Cardiff dataset, EduSheaf exhibits consistent gains across all reported metrics, including AUC and PR-AUC, reflecting improved global ranking quality as well as local precision–recall trade-offs. More importantly, substantial reductions in ECE and Brier score are observed on both datasets, indicating that EduSheaf produces significantly better-calibrated probability estimates than the LLM-SBCL baseline.
These results highlight that the proposed framework not only improves discriminative performance but also enhances the reliability of probabilistic predictions, which is particularly critical in educational deployment scenarios where confidence-aware decision-making and risk control are required.

Figure 1. Overview of the EduSheaf framework. The framework is composed of three major components. In the semantic representation stage, large language models are employed to parse multiple-choice questions by analyzing stems, options, answers, and explanations, from which core semantic units and their relative importance are extracted. These embeddings are further enhanced with GloVe vectors, resulting in robust and transferable semantic representations. The signed graph is then constructed, where nodes denote students and questions, and edges are assigned positive or negative polarity to reflect correct and incorrect responses. In the graph modeling stage, we introduce a Sheaflet-Based Signed Graph Neural Network. Building upon sheaf theory, this model enforces local consistency while simultaneously capturing cross-edge dependencies, allowing feature propagation to preserve both coherence and heterogeneity. By integrating low-pass filters for global learning patterns and high-pass filters for local variations, the network achieves multi-resolution analysis of graph signals. Layer-wise sheaflet propagation strengthens the expressive capacity of the learned embeddings. Finally, in the prediction stage, the refined embeddings are passed into a multilayer perceptron optimized with binary cross-entropy loss, enabling accurate modeling of student performance.
Figure 2. Impact of scale level on the overall performance (binary-F1).
Figure 3. Impact of scale level on the overall performance (AUC).
Table 1. Frequently used notations and associated descriptions.

| Notation | Description |
|---|---|
| $G = (U, V, E)$ | Signed graph with student node set $U$, question node set $V$, and signed edge set $E$. |
| $\mathcal{F}$ | Cellular sheaf defined on $G$. |
| $\mathcal{F}(v)$ | Vertex stalk, i.e., the vector space attached to node $v \in V$. |
| $\mathcal{F}(e^{+})$, $\mathcal{F}(e^{-})$ | Positive/negative edge stalks, i.e., vector spaces attached to signed edges. |
| $\mathcal{F}_{v \trianglelefteq e}$ | Restriction map from node stalk $\mathcal{F}(v)$ to edge stalk $\mathcal{F}(e)$. |
| $L_{\mathcal{F}}$ | Linear sheaf Laplacian associated with $\mathcal{F}$. |
| $\Delta_{\mathcal{F}}$ | Normalized sheaf Laplacian $D^{-1/2} L_{\mathcal{F}} D^{-1/2}$. |
| $d$ | Dimension of the stalk vector spaces. |
| $U$, $\Lambda$ | Eigenvector matrix $U$ and eigenvalue diagonal matrix $\Lambda$ of $L_{\mathcal{F}}$. |
| $\{(u, \lambda)\}$ | Eigenpairs of the sheaf Laplacian. |
| $\varphi_{j,p}(v)$ | Sheaflet scaling function (low-pass) at scale $j$ and position $p$. |
| $\psi_{j,p}^{(r)}(v)$ | Sheaflet wavelet function (high-pass, $r$-th channel) at scale $j$. |
| $\alpha$, $\{\beta^{(r)}\}$ | Low-pass scaling function $\alpha$ and high-pass functions $\beta^{(r)}$. |
| $\hat{\alpha}(\cdot)$, $\hat{\beta}^{(r)}(\cdot)$ | Fourier transforms of the scaling and wavelet functions. |
| $V_{0}$, $W_{j}^{r}$ | Framelet coefficient matrices for low-pass ($V_{0}$) and high-pass ($W_{j}^{r}$) components. |
| $\mathcal{W}_{0,J}$, $\mathcal{W}_{r,j}$ | Framelet decomposition operators for low-pass and high-pass parts. |
| $F^{(\ell)}$ | Node feature matrix at layer $\ell$. |
| $\Theta_{0,J}$, $\Theta_{r,j}$ | Trainable diagonal spectral filter matrices. |
| $\tilde{A}$ | Normalized adjacency matrix encoding local connectivity. |
| $\sigma(\cdot)$ | Non-linear activation function (e.g., ReLU). |
| $z_{u_i}$, $w_{j}$ | Embedding vectors for student $u_i$ and question $v_j$. |
| $P(u, v)$ | Predicted probability of edge $(u, v)$ being positive. |
| $\mathcal{L}_{\mathrm{CE}}$ | Cross-entropy loss function for sign prediction. |
Table 2. Statistics of the five real-world datasets.

| | Biology | Law | Cardiff | Sydney19 | Sydney23 |
|---|---|---|---|---|---|
| Students \|U\| | 761 | 528 | 383 | 382 | 198 |
| Questions \|V\| | 3805 | 6001 | 171 | 457 | 748 |
| Signed edges \|E\| | 76,613 | 88,563 | 64,524 | 24,032 | 24,050 |
| Pos. links (%) | 66.5 | 93.1 | 60.0 | 53.1 | 70.6 |
| Neg. links (%) | 33.5 | 6.9 | 40.0 | 46.9 | 29.4 |
Table 3. Hyperparameter settings for the five datasets used in the experiments.

| Hyperparameter | Biology | Law | Cardiff | Sydney19 | Sydney23 |
|---|---|---|---|---|---|
| Learning rate | 5 × 10⁻³ | 5 × 10⁻³ | 5 × 10⁻² | 5 × 10⁻³ | 5 × 10⁻² |
| Weight decay | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻² | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
| Hidden size | 64 | 64 | 64 | 64 | 64 |
| Dropout ratio | 0.5 | 0.2 | 0.5 | 0.2 | 0.2 |
| Level | 2 | 2 | 2 | 2 | 2 |
| Layers | 16 | 16 | 16 | 16 | 16 |
| Alpha | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Gamma | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Lambda | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Seed | 2000 | 2000 | 2000 | 2000 | 2000 |
Table 4. Binary-F1 comparison across five real-world educational datasets. Results are reported as mean ± standard deviation over 10 runs. Best results are in bold.

| Model | Biology | Law | Cardiff | Sydney19 | Sydney23 |
|---|---|---|---|---|---|
| Random | 0.350 ± 0.010 | 0.472 ± 0.001 | 0.136 ± 0.062 | 0.290 ± 0.014 | 0.288 ± 0.035 |
| GCN | 0.682 ± 0.058 | 0.823 ± 0.010 | 0.677 ± 0.024 | 0.642 ± 0.021 | 0.728 ± 0.013 |
| GAT | 0.618 ± 0.013 | 0.817 ± 0.050 | 0.571 ± 0.013 | 0.564 ± 0.022 | 0.608 ± 0.020 |
| SGCN | 0.768 ± 0.040 | 0.840 ± 0.013 | 0.607 ± 0.033 | 0.635 ± 0.044 | 0.726 ± 0.040 |
| SBGNN | 0.753 ± 0.014 | 0.861 ± 0.034 | 0.712 ± 0.016 | 0.673 ± 0.016 | 0.712 ± 0.021 |
| SBCL | 0.772 ± 0.016 | 0.901 ± 0.016 | 0.718 ± 0.018 | 0.674 ± 0.021 | 0.733 ± 0.019 |
| LLM-SBCL | 0.787 ± 0.014 | 0.908 ± 0.018 | 0.734 ± 0.023 | 0.694 ± 0.021 | 0.760 ± 0.022 |
| EduSheaf | **0.794 ± 0.010** | **0.937 ± 0.012** | **0.751 ± 0.009** | **0.704 ± 0.015** | **0.806 ± 0.021** |
Table 5. Ablation study on contributions of low-pass, high-pass, and large language model components (average binary-F1 ± standard deviation).

| Dataset | Full Model | w/o High | w/o Low | w/o LLM |
|---|---|---|---|---|
| Biology | 0.794 ± 0.010 | 0.787 ± 0.015 | 0.789 ± 0.007 | 0.791 ± 0.011 |
| Law | 0.937 ± 0.012 | 0.928 ± 0.021 | 0.934 ± 0.018 | 0.926 ± 0.024 |
| Cardiff | 0.751 ± 0.009 | 0.745 ± 0.020 | 0.741 ± 0.017 | 0.741 ± 0.014 |
| Sydney19 | 0.704 ± 0.015 | 0.698 ± 0.017 | 0.695 ± 0.008 | 0.699 ± 0.023 |
| Sydney23 | 0.806 ± 0.021 | 0.796 ± 0.030 | 0.801 ± 0.025 | 0.792 ± 0.018 |

Share and Cite

MDPI and ACS Style

Zhang, D.; Zhu, Z.; Cheng, Y.; Gu, Y. Student Learning Outcome Prediction via Sheaflet-Based Graph Learning and LLM. Appl. Sci. 2026, 16, 1658. https://doi.org/10.3390/app16031658

