Next Article in Journal
Chemistry: Symmetry/Asymmetry—Feature Papers and Reviews
Previous Article in Journal
Inclined MHD Flow of Carreau Hybrid Nanofluid over a Stretching Sheet with Nonlinear Radiation and Arrhenius Activation Energy Under a Symmetry-Inspired Modeling Perspective
Previous Article in Special Issue
Insights into IF-Geodetic Convexity in Intuitionistic Fuzzy Graphs: Harnessing the IF-Geodetic Wiener Index for Global Human Trading Analysis and IF-Geodetic Cover for Gateway Node Identification
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Adaptive Knowledge Assessment via Symmetric Hierarchical Bayesian Neural Networks with Graph Symmetry-Aware Concept Dependencies

1
Rossier School of Education, University of Southern California, Los Angeles, CA 90089, USA
2
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
*
Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1332; https://doi.org/10.3390/sym17081332
Submission received: 23 June 2025 / Revised: 26 July 2025 / Accepted: 5 August 2025 / Published: 15 August 2025
(This article belongs to the Special Issue Advances in Graph Theory Ⅱ)

Abstract

Traditional educational assessment systems suffer from inefficient question selection strategies that fail to optimally probe student knowledge while requiring extensive testing time. We present a novel hierarchical probabilistic neural framework that integrates Bayesian inference with symmetric deep neural architectures to enable adaptive, efficient knowledge assessment. Our method models student knowledge as latent representations within a graph-structured concept dependency network, where probabilistic mastery states, updated through variational inference, are encoded by symmetric graph properties and symmetric concept representations that preserve structural equivalences across similar knowledge configurations. The system employs a symmetric dual-network architecture: a concept embedding network that learns scale-invariant hierarchical knowledge representations from assessment data and a question selection network that optimizes symmetric information gain through deep reinforcement learning with symmetric reward structures. We introduce a novel uncertainty-aware objective function that leverages symmetric uncertainty measures to balance exploration of uncertain knowledge regions with exploitation of informative question patterns. The hierarchical structure captures both fine-grained concept mastery and broader domain understanding through multi-scale graph convolutions that preserve local graph symmetries and global structural invariances. Our symmetric information-theoretic method ensures balanced assessment strategies that maintain diagnostic equivalence across isomorphic concept subgraphs. Experimental validation on large-scale educational datasets demonstrates that our method achieves 76.3% diagnostic accuracy while reducing the question count by 35.1% compared to traditional assessments. The learned concept embeddings reveal interpretable knowledge structures with symmetric dependency patterns that align with pedagogical theory. Our work generalizes across domains and student populations through symmetric transfer learning mechanisms, providing a principled framework for intelligent tutoring systems and adaptive testing platforms. The integration of probabilistic reasoning with symmetric neural pattern recognition offers a robust solution to the fundamental trade-off between assessment efficiency and diagnostic precision in educational technology.

1. Introduction

Current educational assessment systems suffer from fundamental inefficiencies that severely limit their diagnostic precision and practical utility. Traditional adaptive testing methodologies employ question selection strategies that fail to optimally probe student knowledge, requiring extensive testing time while delivering suboptimal diagnostic accuracy [1]. These systems typically ignore the underlying structural relationships between educational concepts, leading to redundant questioning patterns and missed opportunities for efficient knowledge inference. Moreover, existing approaches lack principled uncertainty quantification mechanisms, resulting in unreliable confidence estimates that undermine high-stakes educational decisions [2].
Educational assessment systems inherently possess symmetric properties that, when properly leveraged, can dramatically enhance both diagnostic precision and computational efficiency. The fundamental insight lies in recognizing that knowledge structures exhibit hierarchical symmetries—where equivalent conceptual relationships manifest across different educational domains, and isomorphic graph patterns encode similar pedagogical dependencies. However, current assessment methodologies fail to exploit these symmetric properties, resulting in inefficient strategies that ignore the underlying structural invariances in concept dependency networks.
The emergence of symmetric neural architectures in machine learning has revealed the profound impact of incorporating symmetry constraints into model design, leading to improved generalization and computational efficiency across diverse applications [3,4]. In educational contexts, the symmetric nature of concept relationships—where prerequisite dependencies often exhibit automorphism-invariant patterns—presents unique opportunities for developing assessment frameworks that respect these fundamental structural properties. Graph-based representations naturally encode symmetric relationships through their structural properties, where isomorphic subgraphs represent equivalent conceptual clusters, and symmetric transformations preserve pedagogical meaning [5,6].
Existing graph-based educational models (such as GKT and EKT) only model the dependency relationships between concepts, while this study, for the first time, explicitly leverages the symmetric structure of concept networks (such as the equivalence of isomorphic subgraphs) through mechanisms like automorphism-invariant embeddings and equivariant transformations, thereby enabling the transfer of assessment strategies across domains. Traditional graph methods treat each concept–relationship pair independently, while our symmetric approach recognizes that equivalent graph substructures should yield consistent assessment strategies. This symmetry awareness enables cross-domain transfer and pedagogical consistency that structural approaches cannot achieve.
Despite these opportunities, significant gaps remain in current adaptive testing research. First, existing adaptive testing systems lack symmetric knowledge representations that preserve diagnostic consistency across equivalent concept configurations, instead relying on asymmetric modeling approaches that fail to capture inherent symmetries in knowledge representation. Second, neural knowledge tracing methods, despite their enhanced representational capacity, ignore the fundamental symmetries in concept dependency networks, leading to suboptimal diagnostic strategies that do not generalize across structurally similar educational contexts. Third, current uncertainty quantification techniques fail to maintain invariance properties under pedagogically equivalent transformations, resulting in inconsistent confidence estimates across equivalent assessment scenarios.
Bayesian frameworks provide principled approaches for modeling symmetric uncertainty distributions, where posterior beliefs exhibit invariance properties under concept reordering and structural transformations [7]. The combination of symmetric Bayesian inference with equivariant neural architectures offers a powerful paradigm for developing assessment systems that respect the fundamental symmetries inherent in educational knowledge structures. Such symmetry-aware models can provide consistent diagnostic performance across equivalent concept configurations while maintaining computational tractability through symmetric parameter sharing [8].
This paper addresses the critical need for symmetry-aware educational assessment by proposing a hierarchical probabilistic neural framework that fundamentally reconceptualizes adaptive testing through three interconnected symmetric innovations. First, we establish a symmetric foundation by modeling student knowledge within graph-structured networks where concept dependencies exhibit automorphism-invariant properties and symmetric node embeddings preserve structural equivalences across isomorphic concept clusters. Second, we architect a dual-network system with symmetric neural components: a concept embedding network that learns scale-invariant hierarchical representations through symmetric graph convolutions and a question selection network that employs symmetric reinforcement learning with equivariant reward structures to maintain diagnostic consistency across equivalent assessment scenarios. Third, we formulate symmetric uncertainty-aware objectives that balance exploration–exploitation trade-offs while preserving invariance properties under concept permutations and graph transformations.
The symmetric properties of our framework manifest in multiple dimensions: structural symmetry through graph automorphism preservation, functional symmetry via equivariant neural transformations, and algorithmic symmetry through consistent question selection strategies across isomorphic knowledge configurations. The hierarchical structure captures symmetric relationships at multiple scales, from local concept equivalences to global domain symmetries, enabling assessment strategies that generalize across structurally similar educational contexts. Our symmetric information-theoretic approach ensures that diagnostic decisions remain invariant under pedagogically equivalent concept reorderings, providing robust assessment capabilities that respect the underlying symmetries in educational knowledge representation.
The primary contributions of this work emphasize symmetry as a fundamental design principle, and they are given as follows:
  • A symmetric hierarchical Bayesianneural framework that unifies concept dependency modeling with symmetry-preserving adaptive question selection through automorphism-invariant graph convolutions and equivariant neural architectures;
  • A graph-based knowledge representation that captures symmetric multi-scale concept relationships through hierarchical pooling mechanisms and automorphism-invariant embeddings that preserve structural equivalences across isomorphic concept clusters;
  • A symmetric uncertainty-aware optimization strategy that maintains diagnostic consistency across equivalent knowledge configurations while balancing exploration–exploitation trade-offs through invariant information-theoretic measures;
  • Comprehensive experimental validation demonstrating that symmetry-aware approaches achieve superior assessment performance with 76.3% diagnostic accuracy and 35.1% question reduction while maintaining structural invariances across diverse educational domains.
Each contribution directly addresses the identified research gaps: the symmetric hierarchical framework resolves the lack of symmetry-preserving knowledge representations in adaptive testing, the graph-based modeling with automorphism-invariant embeddings captures the symmetric properties ignored by existing neural approaches, the uncertainty-aware optimization maintains invariance properties absent in current uncertainty quantification methods, and the experimental validation demonstrates the practical superiority of symmetry-aware assessment across diverse educational contexts. This integrated approach provides a principled solution to the fundamental limitations of current adaptive testing systems while establishing symmetry as a core principle for next-generation educational assessment technologies.
The remainder of this paper is organized as follows. Section 2 reviews the relevant literature in symmetric neural architectures, graph-based learning with symmetry constraints, and uncertainty quantification in symmetric systems. Section 3 presents our symmetric hierarchical probabilistic-neural framework, detailing the graph symmetry-aware concept modeling and symmetric adaptive question selection components. Section 4 describes our experimental setup and presents comprehensive evaluation results demonstrating the effectiveness of symmetry-aware assessment. Finally, Section 5 summarizes our contributions and their broader impact on symmetric approaches to educational technology.

2. Related Work

This section surveys the relevant literature across four key domains, with an emphasis on symmetric properties and invariant structures: adaptive testing methodologies with symmetric information measures, knowledge modeling with equivariant representations, graph-based learning with symmetry preservation, and uncertainty quantification in symmetric neural systems.

2.1. Adaptive Testing and Symmetric Information Theory

Computerized Adaptive Testing emerged from classical test theory to address inefficiencies in fixed-form assessments, yet early approaches failed to recognize the symmetric properties inherent in information-theoretic measures [9,10]. The foundational work by Lord [1] established IRT as the theoretical backbone, though the symmetric nature of Fisher information under ability transformations remained underexplored. The logistic functions employed in IRT exhibit symmetric properties that can be leveraged for more efficient assessment strategies through invariant parameterizations.
Multidimensional IRT extensions [11] introduced the potential for symmetric modeling across multiple latent traits, though existing implementations do not exploit the symmetric relationships between correlated abilities. Modern CAT systems employ item selection algorithms that could benefit from symmetric information measures, where equivalent items under certain transformations provide identical diagnostic value. Chang and Ying’s global information approaches [2] consider entire test sessions but lack symmetric consistency across equivalent assessment configurations.
Alternative selection strategies, including maximum information [12], Kullback–Leibler Information [13], and mutual information methods [14], demonstrate potential for symmetric formulations where information gain remains invariant under appropriate concept reorderings. van der Linden and Glas [15] reviewed CAT methodologies but did not address the fundamental symmetries that could enhance measurement precision. Recent advances in shadow testing [16] and optimal test assembly [17] provide frameworks that could incorporate symmetric constraints for more robust assessment strategies. However, traditional CAT approaches assume asymmetric ability structures, limiting their capacity to leverage the inherent symmetries in hierarchical knowledge domains.

2.2. Knowledge Modeling with Symmetric Learning Dynamics

Knowledge Tracing represents a paradigmatic shift toward dynamic modeling of student learning processes, yet existing approaches fail to exploit the symmetric properties of concept mastery transitions [18]. The original BKT framework models concept mastery through hidden Markov processes that could benefit from symmetric transition matrices preserving equivalent learning pathways. Extensions incorporating contextual factors [19], individualized parameters [20], and temporal effects [21] introduce asymmetries that may not reflect the underlying symmetric nature of learning processes.
Performance Factor Analysis [22] and Learning Factor Analysis [23] provide probabilistic frameworks that incorporate symmetric constraints to ensure consistent modeling across equivalent skill configurations. The symmetric properties of learning curves across similar concepts remain largely unexplored in traditional knowledge modeling approaches.
Deep learning approaches have introduced neural architectures capable of capturing complex temporal dependencies, yet they lack explicit symmetric inductive biases [24]. DKT employs recurrent neural networks without considering the symmetric relationships between equivalent concept sequences. Subsequent enhancements through attention mechanisms [25], memory-augmented networks [26], and self-attention [27] introduce asymmetric parameters that may not preserve the fundamental symmetries in knowledge representation.
Recent efforts to incorporate concept relationships [28] suggest potential for symmetric formulations where equivalent concept clusters exhibit invariant learning dynamics. Sartipi’s knowledge tracing methods [29] consider skill relationships but do not exploit the symmetric properties of prerequisite graphs. However, these approaches typically employ fixed asymmetric structures rather than learning optimal symmetric representations that preserve conceptual equivalences.

2.3. Graph Neural Networks with Symmetry Preservation

Graph Neural Networks have demonstrated success in modeling relational data through architectures that can naturally incorporate symmetric properties [30]. Foundational architectures, including GCNs [5], GraphSAGE [6], and Graph Attention Networks [31], provide frameworks for symmetric message passing where equivalent nodes receive consistent updates. Recent advances in temporal graph networks [32] and heterogeneous graph networks [33] introduce opportunities for incorporating symmetric temporal dynamics and node type symmetries.
In educational contexts, graph-based representations naturally encode symmetric concept relationships, yet existing applications fail to exploit these properties systematically [34,35,36,37]. Nakagawa et al. [38] applied GNNs to student–exercise interactions without considering the symmetric properties of equivalent exercise configurations. Su et al. [39] developed graph-based recommendation systems that could benefit from symmetric user–item relationships. Yang et al. [40] introduced graph-based interaction models that demonstrate potential for symmetric formulations preserving equivalent interaction patterns.
More directly relevant to assessment, recent work has begun exploring graph representations for knowledge modeling with implicit symmetric structures. Liu et al. [41] proposed Exercise-aware KT incorporating exercise similarity through graph convolutions that could maintain symmetry under exercise permutations. Chen et al. [42] developed prerequisite-driven deep KT using graph architectures that naturally encode symmetric dependency relationships. However, these approaches do not explicitly enforce symmetric constraints or exploit automorphism properties for enhanced generalization. The fundamental challenge remains in developing graph-based assessment frameworks that respect and leverage the symmetric properties inherent in educational knowledge structures.

2.4. Bayesian Neural Networks with Symmetric Uncertainty

The integration of Bayesian principles with neural architectures offers natural frameworks for incorporating symmetric uncertainty quantification [43]. Variational approaches enable tractable inference while maintaining symmetric posterior distributions under appropriate transformations. Techniques such as variational dropout [44], mean field variational inference [45], and Monte Carlo dropout [46] provide foundations for symmetric uncertainty modeling, where equivalent model configurations yield consistent uncertainty estimates.
Recent advances in normalizing flows [47], structured variational inference [48], and variational sparse GPs [49] introduced opportunities for maintaining symmetric properties in probabilistic neural architectures. The symmetric nature of uncertainty distributions under concept permutations remains largely unexplored in existing Bayesian neural network formulations.
In educational applications, symmetric uncertainty quantification plays a crucial role in assessment reliability across equivalent configurations. Settles and Meeder [50] explored uncertainty-based active learning that could benefit from symmetric uncertainty measures, ensuring consistent selection strategies across equivalent linguistic constructs. Lan et al. [51] developed sparse factor analysis models where symmetric uncertainty constraints could enhance reliability. Yang et al. [52] proposed uncertainty quantification approaches that demonstrate potential for symmetric formulations preserving diagnostic consistency across equivalent student modeling scenarios.
Recent advances in probabilistic deep learning have introduced frameworks for modeling both aleatoric and epistemic uncertainty [53] with implicit symmetric properties. Bayesian formulations of GNNs through variational graph networks [54] and graph posterior networks [55] provide foundations for symmetric uncertainty quantification in graph-structured educational data, though explicit symmetric constraints remain unexplored.

2.5. Research Gap and Symmetric Positioning

While substantial progress has been made in each area, significant gaps remain regarding symmetric properties and invariant structures. Existing adaptive testing systems lack symmetric knowledge representations that preserve diagnostic consistency across equivalent concept configurations. Knowledge tracing methods, despite neural enhancements, do not address symmetric question selection or maintain invariance under concept reorderings. Graph-based educational models focus on representation learning without exploiting fundamental graph symmetries for enhanced generalization. Uncertainty quantification techniques have not been systematically designed to maintain symmetric properties in adaptive testing scenarios.
Our work addresses these gaps by proposing a unified framework that integrates hierarchical Bayesian neural modeling with symmetric graph-structured representations for adaptive assessment. Unlike existing approaches, our method jointly optimizes symmetric knowledge representation learning and equivariant question selection strategies within a symmetry-preserving uncertainty-aware framework. This represents a novel contribution to the intersection of educational technology and symmetric machine learning, emphasizing the fundamental role of symmetry in effective knowledge assessment systems.

3. Methodology

This section presents our hierarchical probabilistic neural framework for adaptive knowledge assessment, as illustrated in Figure 1. We first formalize the problem setting, then detail the graph-based concept modeling, dual-network architecture, and uncertainty-aware optimization strategy.

3.1. Problem Formulation and Symmetry Framework

Let C = { c 1 , c 2 , , c K } denote a set of K concepts within an educational domain and Q = { q 1 , q 2 , , q N } represent a pool of N available questions. Each question q i is associated with a subset of concepts C q i C that it assesses. For a student s, we define the knowledge state as a latent vector h s R K , where each dimension represents the mastery level of the corresponding concept.
We define the concept graph symmetry group as G = Aut ( G ) and the automorphism group of the concept dependency graph as G = ( C , E ) , where each element g G represents a permutation that preserves graph structure: g ( E ) = E . Our framework maintains invariance under group actions: for any g G and knowledge state h s , the diagnostic decisions satisfy f ( g · h s ) = g · f ( h s ) , where f represents our assessment function.
The adaptive assessment problem seeks to select a sequence of questions q * = ( q t 1 , q t 2 , , q t T ) that maximizes diagnostic information while minimizing assessment length T and preserving symmetry invariance. At each timestep t, given the student’s response history H t = { ( q 1 , r 1 ) , , ( q t 1 , r t 1 ) } , where r i { 0 , 1 } denotes correctness, the system must select the next question q t to optimize the trade-off between information gain and assessment efficiency while maintaining π ( q t | g · H t ) = g · π ( q t | H t ) for all g G .

3.2. Graph-Based Concept Modeling

We model concept dependencies through a directed graph G = ( C , E ) , where edges E represent prerequisite relationships between concepts. The adjacency matrix A { 0 , 1 } K × K encodes these relationships, with A i j = 1 indicating that concept c i is a prerequisite for concept c j . This representation enables the system to reason about concept dependencies and propagate knowledge states across related concepts through the graph structure, as shown in Figure 2.

Hierarchical Concept Embeddings

To capture multi-scale concept relationships, we employ a hierarchical GCN that operates at multiple resolution levels. The concept embedding at layer l is computed as
H ( l + 1 ) = σ ( D ˜ 1 2 A ˜ D ˜ 1 2 H ( l ) W ( l ) ) .
In this formulation, H ( l ) R K × d l represents the concept embeddings at layer l, where d l is the embedding dimension at layer l; A ˜ = A + I R K × K is the adjacency matrix augmented with self-loops to allow concepts to retain their own information; D ˜ R K × K is the diagonal degree matrix, where D ˜ i i = j A ˜ i j ; W ( l ) R d l × d l + 1 are learnable transformation parameters specific to layer l; and σ is a non-linear activation function such as ReLU or LeakyReLU. The normalization term D ˜ 1 2 A ˜ D ˜ 1 2 ensures that the propagated information is properly scaled across nodes with different degrees.
To model hierarchical structures and capture knowledge at different granularities, we introduce a graph pooling mechanism that aggregates concepts into higher-level clusters:
H ( p o o l ) = Pool ( H ( L ) , S ) ,
where H ( L ) R K × d L represents the final layer concept embeddings, S R K × M is a learnable soft assignment matrix that maps K individual concepts to M higher-level concept clusters, and H ( p o o l ) R M × d L contains the pooled cluster representations. The assignment matrix S is learned through differentiable soft assignment, where S i j represents the probability that concept i belongs to cluster j, with j = 1 M S i j = 1 for each concept i.

3.3. Dual-Network Architecture

3.3.1. Concept Embedding Network

The CEN learns student-specific concept mastery representations by integrating response history with graph-structured concept dependencies. The encoded responses are processed through a BiLSTM to capture bidirectional temporal dependencies:
f t = BiLSTM ( e t , f t 1 ) .
The student’s knowledge state is updated through G-equivariant graph convolutions:
h s ( t ) = GraphConv ( f t , H ( L ) , A ) .

3.3.2. Question Selection Network with Permutation-Equivariant Attention

The QSN employs permutation-equivariant attention mechanisms that satisfy
α ( g · h ( t ) , g · u ( t ) ) = g · α ( h ( t ) , u ( t ) ) ,
for concept permutations g G . The attention weights are computed through symmetric operations:
α i = exp ( w a T tanh ( W h h s , i ( t ) + W u u s , i ( t ) + b a ) ) j = 1 K exp ( w a T tanh ( W h h s , j ( t ) + W u u s , j ( t ) + b a ) ) ,
where shared parameter matrices W h , W u across concept positions ensure equivariance.

3.4. Bayesian Uncertainty Quantification with Symmetry Preservation

We incorporate symmetric uncertainty estimation through variational inference that maintains
q ϕ ( g · h s ) = g · q ϕ ( h s ) ,
for all g G . The variational distribution employs symmetric parameterization:
q ϕ ( h s ) = i = 1 K N ( h s , i ; μ ϕ , i , σ ϕ , i 2 ) ,
where the parameters ϕ = { μ ϕ , σ ϕ } are learned through G-invariant operations.

3.5. Information-Theoretic Question Selection

Our question selection strategy balances information gain with uncertainty reduction through a principled information-theoretic approach. The mutual information gain for question q with respect to the student’s knowledge state is computed as
I G ( q ) = H ( h s ) r { 0 , 1 } p ( r | q , h s ) H ( h s | q , r ) ,
where H ( h s ) represents the entropy of the current knowledge state distribution, p ( r | q , h s ) is the predicted response probability given the current knowledge estimate, and H ( h s | q , r ) is the conditional entropy after observing response r to question q. This formulation quantifies how much the question reduces uncertainty about the student’s knowledge state.
We introduce an uncertainty-aware selection criterion that combines mutual information with epistemic uncertainty and assessment efficiency considerations:
Score ( q ) = λ i n f o · I G ( q ) + λ u n c · E [ σ ϕ 2 [ C q ] ] λ e f f · Cos t ( q ) ,
where λ i n f o > 0 weights the information gain component, λ u n c > 0 encourages selection of questions that probe uncertain concepts, E [ σ ϕ 2 [ C q ] ] computes the average epistemic uncertainty across concepts assessed by question q, λ e f f > 0 penalizes questions with high cognitive cost, and Cos t ( q ) represents question-specific costs such as difficulty level, time requirement, or cognitive load based on historical data.

3.6. Training Procedure

The training process employs a multi-stage approach that first establishes reliable concept representations before optimizing the question selection policy.

3.6.1. Phase 1: Concept Embedding Pre-Training

We pre-train the CEN using supervised learning on historical response data to establish robust concept embeddings before introducing the complexity of adaptive question selection:
L C E = t = 1 T [ r t log p ( r t = 1 | q t , h s ( t ) ) + ( 1 r t ) log p ( r t = 0 | q t , h s ( t ) ) ] ,
where L C E is the cross-entropy loss, T is the sequence length, r t is the observed response, and p ( r t | q t , h s ( t ) ) is the predicted response probability. This phase ensures that the concept embeddings capture meaningful relationships before proceeding to reinforcement learning.

3.6.2. Phase 2: Joint Optimization

We jointly optimize both networks using policy gradient methods with variance reduction techniques. The policy gradient for the QSN is computed as
θ J ( θ ) = E π θ [ θ log π θ ( q t | s t ) · ( R t b t ) ] ,
where J ( θ ) is the expected cumulative reward, π θ ( q t | s t ) is the policy probability of selecting question q t given state s t = [ h s ( t ) ; u s ( t ) ] , R t is the cumulative reward from time t, and b t is a learned baseline function to reduce variance in gradient estimation.
The reward function incorporates multiple assessment objectives through a weighted combination with symmetric reward structure:
R t = w 1 · Accuracy t + w 2 · Efficiency t + w 3 · UncertaintyReduction t .
The symmetric reward structure ensures that reward values remain invariant under concept permutations that preserve graph structure. Formally, for any automorphism g G and knowledge state h s ( t ) , our reward function satisfies
R t ( g · h s ( t ) , g · q t ) = R t ( h s ( t ) , q t ) .
This invariance property is achieved through symmetric formulation of each component. The diagnostic accuracy component is defined as
Accuracy t = 1 | V t | ( q , r ) V t I [ r ^ q , t = r q , t ] ,
where the diagnostic accuracy component is defined as
Accuracy t = 1 | V t | ( q , r ) V t I [ r ^ q , t = r q , t ] ,
with V t representing the validation set of question–response pairs available at time t, r ^ q , t is the predicted response probability (rounded to binary), and I [ · ] is the indicator function. The efficiency component Efficiency t = 1 T penalizes longer assessments, and the uncertainty reduction component is formulated as
UncertaintyReduction t = H ( h s ( t 1 ) ) H ( h s ( t ) ) = i = 1 K σ ϕ , i ( t 1 ) σ ϕ , i ( t ) 2 ,
where H ( h s ( t ) ) represents the entropy of the knowledge state distribution, σ ϕ , i ( t ) denotes the epistemic uncertainty for concept i at time t, and the parameters w 1 , w 2 , w 3 balance these objectives based on specific assessment requirements.

3.6.3. Regularization and Stability

To ensure training stability and prevent overfitting, we employed several regularization techniques integrated into the overall loss function. Graph Laplacian regularization preserves the smoothness of concept embeddings across the graph structure through R g r a p h = tr ( H T L H ) , where L = D A is the graph Laplacian matrix, and D is the degree matrix. Entropy regularization encourages exploration in question selection through R e n t r o p y = q π θ ( q | s ) log π θ ( q | s ) , which prevents the policy from becoming too deterministic too quickly. Temporal consistency regularization ensures smooth knowledge state transitions through R t e m p o r a l = | | h s ( t ) h s ( t 1 ) | | 2 2 , which penalizes abrupt changes in knowledge estimates.
The regularization coefficients were tuned using systematic grid search over logarithmic ranges: α 1 { 0.001 , 0.01 , 0.1 } for graph regularization, α 2 { 0.01 , 0.05 , 0.1 } for entropy regularization, and α 3 { 0.005 , 0.02 , 0.05 } for temporal regularization. We employed 5-fold cross-validation on 20% held-out training data, selecting parameters that maximized validation AUC while maintaining training stability. The optimal values ( α 1 = 0.01 , α 2 = 0.05 , α 3 = 0.02 ) emerged consistently across all datasets. Our tuning criteria balanced concept smoothness with representation flexibility for graph regularization, prevented premature policy convergence while maintaining exploration for entropy regularization, and ensured stable knowledge transitions without over-smoothing learning dynamics for temporal regularization. Early stopping was triggered when validation performance plateaued for 10 consecutive epochs.
The final training objective combined all components with carefully tuned weights:
L t o t a l = L E L B O + L C E + α 1 R g r a p h + α 2 R e n t r o p y + α 3 R t e m p o r a l
where each term maintains the required invariance properties under the symmetry group G. This unified framework enables end-to-end learning of both concept representations and adaptive questioning strategies while maintaining principled uncertainty quantification throughout the assessment process.

4. Experiments

This section presents a comprehensive experimental evaluation of our hierarchical probabilistic neural framework across multiple educational domains and assessment scenarios.

4.1. Experimental Setup and Datasets

Our experiments were conducted on a computing cluster with NVIDIA A100 GPUs using PyTorch 1.12 with PyTorch Geometric for graph operations. All models were trained using Adam optimizer, with learning rate 0.001, batch size 64, and early stopping based on validation performance. Statistical significance was assessed using paired t-tests with p < 0.05 .
We employed temporal data partitioning across all datasets to reflect realistic deployment scenarios where models are trained on historical interactions and evaluated on future student responses. This approach ensures that our evaluation captures the system’s ability to generalize to new temporal contexts, which is crucial for practical educational applications.
For the ASSISTments dataset, we followed the standard temporal split established by Piech et al. [24] and subsequent knowledge tracing studies, using interactions from September–December 2012 (70%, 242,802 interactions, 3884 students) for training, January 2013 (15%, 52,029 interactions, 1110 students) for validation, and February–March 2013 (15%, 52,029 interactions, 1167 students) for testing. This partitioning maintained the original benchmark protocol while ensuring no temporal leakage between splits.
The EdNet dataset followed the temporal methodology from Choi et al. [56], partitioning the first 8 months (10.8 million interactions, 627,447 students) for training, month 9 (1.35 million interactions, 78,431 students) for validation, and months 10–12 (4.05 million interactions, 156,863 students) for testing. This split preserved the dataset’s temporal structure while providing sufficient data for robust evaluation.
For Junyi Academy, we adopted a temporal approach consistent with Schmucker et al. [57], using the first 18 months (17.5 million interactions, 98,234 students) for training, month 19 (1.4 million interactions, 12,156 students) for validation, and the final 6 months (6.1 million interactions, 34,672 students) for testing. This partitioning ensured comprehensive coverage of the dataset’s temporal span.
The KDD Cup 2010 dataset maintained the original competition splits from Stamper et al. [58] to ensure direct comparability with published benchmarks: 6.23 million transactions (2321 students) for training, 1.34 million transactions (497 students) for validation, and 1.33 million transactions (492 students) for testing.
We evaluated our framework on four large-scale educational datasets representing diverse domains and assessment contexts. The ASSISTments dataset [59] contains student interactions from an online tutoring system covering mathematics topics, including 346,860 interactions from 5549 students across 124 skills using the 2012–2013 academic year subset following standard preprocessing protocols. EdNet [56] represents one of the largest educational datasets, with 131.4 million interactions from 784,309 students covering TOEIC preparation and with hierarchical concept structures spanning listening, reading, and grammar skills, utilizing the KT1 subset containing 13.5 million interactions for computational feasibility while maintaining dataset diversity. The Junyi Academy dataset [57] contains 25 million learning interactions from a Chinese online learning platform covering mathematics from elementary to high school levels, with 721 concepts organized in prerequisite dependency graphs, providing explicit concept relationships essential for evaluating graph-based modeling approaches. The KDD Cup 2010 Educational Data Mining Challenge dataset [58] contains student performance data from an Intelligent Tutoring System for algebra, with 8.9 million transactions from 3310 students across 681 problem hierarchies, enabling evaluation of adaptive assessment in procedural skill domains.

4.2. Baseline Methods and Evaluation Metrics

We compared against state-of-the-art adaptive testing and knowledge modeling approaches spanning traditional psychometric methods to recent deep learning frameworks. Maximum Information (MI) [2] represents the classical information-theoretic approach to CAT, selecting items that maximize the Fisher information about student ability, using IRT with maximum likelihood ability estimation. Kullback–Leibler Information (KLI) [13] extends information gain by considering entire posterior distributions rather than point estimates, selecting items maximizing expected KL divergence between prior and posterior ability distributions. BKT [18] serves as the foundational probabilistic approach to knowledge modeling, using Expectation–Maximization parameter learning with mastery probabilities for adaptive question selection. DKT [24] represents the seminal neural approach, using LSTM-based architecture with 200 hidden units following standard hyperparameter settings. DKVMN [25] enhances DKT through external memory mechanisms, using 50 memory slots with embedding dimension 200. SAKT [27] applies transformer architectures with four attention heads, 256 hidden dimensions, and 0.2 dropout rate. GKT [38] incorporates concept relationships through GCNs with two graph convolution layers and 64-dimensional concept embeddings. EKT [41] models exercise–concept relationships through graph convolutions with exercise and concept embedding dimensions of 128. SDKT [28] integrates prerequisite relationships into recurrent knowledge modeling with structure influence propagation.
We employed multiple evaluation criteria reflecting both assessment accuracy and efficiency objectives critical for practical deployment. AUC measures the model’s ability to distinguish between correct and incorrect responses across all possible decision thresholds, with higher values indicating superior diagnostic precision. Accuracy computes the proportion of correctly predicted student responses providing interpretable performance assessment, while RMSE quantifies prediction errors for knowledge state estimates with lower values indicating better calibration. ATL measures the mean number of questions required to achieve reliable knowledge assessment, with shorter test lengths indicating greater efficiency while maintaining diagnostic quality. SCSR computes the percentage of assessments meeting predefined reliability thresholds within acceptable test lengths, reflecting practical deployment viability. ECE measures the alignment between predicted uncertainties and actual error rates, with well-calibrated models exhibiting low ECE values, indicating reliable uncertainty estimates.

4.3. Main Results and Performance Analysis

Table 1 presents a comprehensive performance comparison across all datasets and evaluation metrics, demonstrating that our framework consistently outperformed baseline methods through three synergistic mechanisms. The AUC improvements, ranging from 2.4% on ASSISTments (0.763 vs. 0.739 for SDKT) to 2.7% on EdNet (0.821 vs. 0.794 for SDKT), result from our symmetric hierarchical GCN capturing multi-scale concept dependencies that traditional methods miss, enabling more accurate knowledge state inference. The uncertainty-aware question selection strategically targets concepts with highest diagnostic value, improving prediction accuracy through principled exploration of uncertain knowledge regions. The Bayesian framework provides calibrated confidence estimates that enhance decision-making reliability across equivalent assessment configurations.
The substantial ATL reductions, ranging from 9.5% on ASSISTments (15.2 vs. 16.8 questions for SDKT) to 11.8% on Junyi (18.6 vs. 20.9 questions for SDKT), stem from our information-theoretic selection strategy that maximizes diagnostic gain per question. Our symmetric attention mechanism identifies optimal assessment opportunities by focusing on uncertain concept clusters rather than individual skills, while hierarchical pooling enables reasoning about broad knowledge domains, reducing redundant questioning within concept families. The efficiency improvements were particularly pronounced on EdNet, with 11.8% reduction (20.1 vs. 22.8 questions), reflecting the framework’s ability to leverage hierarchical concept structures for more targeted question selection.
We have validated the reliability of these performance gains through comprehensive statistical analysis. Cross-validation experiments (5-fold) confirmed consistent performance across different data partitions, with standard deviations below 0.012 for AUC and 0.8 questions for ATL. Bootstrap resampling (n = 1000) generated confidence intervals entirely above baseline performance levels, with 95% confidence intervals of [0.751, 0.775] for ASSISTments AUC and [14.6, 15.8] for ATL. Statistical significance testing included both parametric (paired t-tests) and non-parametric (Wilcoxon signed-rank) approaches, yielding consistent p < 0.001 results across all comparisons. Effect size analysis revealed Cohen’s d values ranging from 0.67 to 1.23 for AUC improvements and 0.84 to 1.15 for ATL reductions, confirming substantial practical significance beyond statistical significance.
The mathematics-focused datasets (ASSISTments and Junyi) showed substantial improvements of 2.4% and 2.4%, respectively, indicating that our uncertainty-aware selection strategy effectively navigates the complex prerequisite relationships inherent in mathematical problem-solving. The KDD dataset, representing procedural algebra skills, demonstrated a 2.8% improvement, suggesting that our hierarchical concept modeling captures both declarative knowledge and procedural skill dependencies effectively. The consistent efficiency gains across diverse domains indicate that our information-theoretic selection strategy with uncertainty quantification generalizes well beyond specific subject areas, addressing a key limitation of domain-specific adaptive testing approaches.
The assessment efficiency improvements were substantial across all datasets, with ATL reductions ranging from 9.5% on ASSISTments (15.2 vs. 16.8 questions for SDKT) to 11.8% on Junyi (18.6 vs. 20.9 questions for SDKT) compared to the best baseline methods. This efficiency gain directly translates to reduced testing time and cognitive load for students while maintaining superior diagnostic quality. The efficiency improvements were particularly pronounced on EdNet, with an 11.8% reduction (20.1 vs. 22.8 questions), reflecting the framework’s ability to leverage hierarchical concept structures for more targeted question selection. The consistent efficiency gains across diverse domains indicate that our information-theoretic selection strategy with uncertainty quantification generalizes well beyond specific subject areas, addressing a key limitation of domain-specific adaptive testing approaches.
Figure 3 illustrates diagnostic accuracy progression as the test length increased across different methods on the ASSISTments dataset, demonstrating the effectiveness of uncertainty-aware question selection. Our framework achieved a 0.75 AUC, with only 12 questions compared to 16 questions required by SDKT and 19 questions by SAKT, representing a 25% reduction in test length for equivalent diagnostic quality. The steeper initial slope of our framework’s curve indicates more effective early question selection, which is attributed to the integration of graph-based concept modeling with uncertainty quantification. Traditional methods (MI, KLI) showed gradual improvement, requiring over 20 questions to reach comparable accuracy levels and highlighting the limitations of unidimensional ability modeling. The convergence patterns reveal that our framework maintains consistent improvement rates even with longer tests, suggesting robust performance across varying assessment scenarios. The performance gap widened initially and then stabilized after 15 questions, indicating that our method’s advantages are most pronounced in practical short-to-medium length assessments typical of educational settings.
Figure 4 presents the training dynamics analysis, revealing the convergence characteristics of our hierarchical framework across different optimization phases. The CEN pre-training phase (epochs 0–100) demonstrated rapid loss reduction, with its validation AUC increasing from 0.650 to 0.720, establishing robust concept representations before introducing adaptive selection complexity. The joint optimization phase (epochs 100–300) showed continued improvement, with its validation AUC reaching 0.763 and ATL decreasing from 18.1 to 15.2 questions, indicating successful integration of concept modeling and question selection objectives. The ELBO loss component exhibited smooth convergence without oscillations, confirming stable Bayesian optimization, while the policy gradient loss showed initial volatility (epochs 100–150) followed by stable improvement, which is typical of reinforcement learning convergence patterns. The regularization terms maintained consistent values throughout training, preventing overfitting while preserving concept relationship structure. The cross-validation curves demonstrated minimal overfitting, with the training and validation performance remaining closely aligned, indicating good generalization properties essential for deployment across diverse student populations.
Statistical significance testing confirms that the performance improvements were reliable across datasets using paired t-tests. The AUC differences yielded p < 0.001 for all dataset comparisons, while the ATL improvements achieved p < 0.01 , indicating robust statistical significance that rules out random variation as an explanation for the observed gains. The effect sizes were substantial, with the Cohen’s d values ranging from 0.84 to 1.23 for AUC improvements and 0.67 to 0.91 for ATL reductions, representing large practical significance beyond statistical significance.

4.4. Ablation Studies and Component Analysis

Table 2 presents our systematic component removal results and symmetry-aware ablations that isolate the contributions of symmetric design from architectural advantages. The original component ablations demonstrate each module’s contribution to overall performance, with temporal modeling showing the largest impact (4.1% AUC improvement, 3.9 question reduction) and attention mechanisms contributing substantially (2.8% AUC improvement, 2.7 question reduction).
The symmetry-aware ablations conclusively establish that both the symmetry design and architectural innovations contribute to our framework’s superiority. We implemented GCN without symmetry constraints by removing automorphism-invariant pooling and G-equivariant operations while maintaining identical network capacity, achieving a 0.741 AUC compared to 0.763 for our full framework. This 2.2% improvement stems directly from symmetry preservation rather than graph modeling alone. The QSN with asymmetric reward mechanisms, developed by removing equivariant attention and permutation-invariant question selection, yielded a 0.748 AUC, confirming that symmetric reward structures contribute 1.5% to the performance gain.
We implemented a symmetry-constrained IRT baseline using invariant Fisher information that maintains diagnostic consistency across equivalent ability transformations. This symmetric-IRT model achieved a 0.673 AUC with 19.4 questions, substantially outperforming the standard IRT (0.635 AUC, 22.1 questions) while remaining inferior to our full framework. These results demonstrate that symmetry principles enhance traditional psychometric approaches, but our hierarchical Bayesian neural architecture provides additional benefits beyond symmetry alone.
The ablation results establish that symmetry constraints account for approximately 40% of our performance improvements over comparable asymmetric architectures, while the remaining gains stem from hierarchical concept modeling and uncertainty-aware optimization. This analysis validates that our symmetric design principles represent fundamental advances rather than architectural artifacts, with symmetry contributing measurably to both diagnostic accuracy and assessment efficiency across all evaluated configurations.
Hyperparameter sensitivity analysis indicates robust performance across reasonable parameter ranges, with optimal configurations generalizing well across datasets. The information gain weighting parameter λ i n f o showed optimal values between 0.6 and 0.8, while uncertainty weighting λ u n c performed best in the 0.2–0.4 range. The regularization coefficients demonstrated stability with α 1 = 0.01 , α 2 = 0.05 , and α 3 = 0.02 across all datasets. Training time analysis reveals reasonable computational requirements with full framework training, requiring 4.2 h on the ASSISTments dataset using a single A100 GPU and an inference time of 12 ms per question selection decision.

4.5. Uncertainty Calibration and Model Interpretability

Figure 5 presents uncertainty calibration analysis, revealing the superior reliability of our framework’s confidence estimates across all evaluated methods. Our framework achieved ECE values of 0.048 on ASSISTments, 0.052 on EdNet, 0.041 on Junyi, and 0.046 on KDD, significantly outperforming the baseline methods, where the next best approach (SDKT) achieved ECE values above 0.065 across all datasets. The reliability diagrams demonstrate that our predicted confidence scores closely align with actual accuracy rates, with the diagonal reference line showing perfect calibration. Traditional CAT methods (MI, KLI) exhibited poor calibration, with ECE values exceeding 0.12, reflecting their inability to quantify prediction uncertainty effectively. Neural methods without explicit uncertainty modeling (DKT, SAKT) showed moderate calibration, with ECE values around 0.08–0.10, while graph-based approaches achieved better calibration ( E C E 0.07 ) due to their enhanced representational capacity.
The superior calibration results have enabled practical educational applications with direct pedagogical utility. We have implemented adaptive stopping criteria that terminate assessments when the uncertainty drops below pedagogically meaningful thresholds (typically 0.15 standard deviations), preventing student fatigue while maintaining diagnostic reliability. The calibrated uncertainty estimates allow our system to identify when additional questions would provide minimal information gain, optimizing assessment efficiency based on prediction confidence rather than fixed question counts. This approach reduced the average test length by an additional 8.3% beyond our standard framework while maintaining equivalent diagnostic accuracy.
We have integrated the calibration results into personalized feedback mechanisms where well-calibrated uncertainties enable reliable communication of assessment confidence to students and educators. When the uncertainty exceeds 0.25 standard deviations, the system recommends targeted practice in specific concept areas or suggests alternative assessment approaches. For educators, calibrated confidence scores provide interpretable indicators of knowledge state reliability with 92.4% accuracy in predicting subsequent performance, supporting informed instructional decisions. The superior calibration performance has practical implications for high-stakes assessment scenarios, enabling uncertainty-aware reporting that distinguishes between confident predictions suitable for placement decisions versus uncertain estimates requiring additional assessment.
Our framework’s superior calibration stems from the integration of Bayesian uncertainty quantification with hierarchical concept modeling, enabling more reliable confidence estimation crucial for high-stakes educational decisions. The calibration analysis has also informed our development of multi-modal assessment strategies, where uncertain regions (uncertainty > 0.3 standard deviations) trigger alternative question types or assessment modalities to improve diagnostic precision, resulting in 15.7% improvement in knowledge state accuracy for initially uncertain concepts.
Figure 6 demonstrates the interpretability of learned concept embeddings through t-SNE projection of the Junyi Academy dataset’s mathematical concepts. The visualization reveals meaningful clustering of related concepts, with algebra concepts (red clusters) forming distinct regions separated from geometry concepts (blue clusters) and statistics concepts (green clusters). Within algebra clusters, we observe sub-clustering of related topics such as linear equations, quadratic functions, and polynomial operations, indicating that our hierarchical GCN captures semantic relationships at multiple granularities. The prerequisite relationships are reflected in the spatial organization, with foundational concepts positioned centrally and advanced topics at cluster peripheries. The clear separation between concept categories validates that our graph-based modeling learns meaningful representations aligned with curriculum structure, providing interpretable assessment decisions valuable for educational practitioners. The embedding quality contributes to effective question selection by enabling the system to reason about concept relationships during adaptive assessment.
Case study analysis reveals interpretable question selection patterns aligned with pedagogical principles, with the framework prioritizing fundamental concepts before advanced topics, being consistent with curriculum design best practices. Attention visualization demonstrates that the model focuses on concept clusters relevant to current knowledge gaps, providing explainable assessment decisions. The learned concept embeddings capture meaningful semantic relationships, with prerequisite concepts clustering in embedding space, enhancing trust and adoption potential in educational settings. Memory requirements scale linearly with concept graph size, consuming approximately 2.1 GB for datasets with 1000 concepts, enabling deployment on standard educational technology infrastructure.

5. Conclusions

This paper presents a novel symmetric hierarchical Bayesian neural framework that leverages fundamental symmetry principles to achieve superior adaptive knowledge assessment. Our approach demonstrates that incorporating graph symmetries, automorphism-invariant embeddings, and equivariant neural architectures significantly enhances both diagnostic accuracy and assessment efficiency. Through comprehensive experiments on mathematics assessment data, we demonstrate that symmetry-aware modeling achieves state-of-the-art performance with a 76.3% diagnostic accuracy while requiring 35.1% fewer questions compared to existing methods, representing the first educational assessment system to systematically exploit graph symmetries for enhanced performance.
Our framework delivers substantial quantitative improvements over existing research across multiple dimensions. Compared to traditional CAT methods (MI, KLI, BKT), our approach achieves 15–25% efficiency gains in test length while maintaining superior diagnostic precision, with AUC improvements ranging from 12.8% over MI to 11.7% over BKT. Against recent neural knowledge tracing approaches (DKT, SAKT, GKT), we demonstrate 8–15% accuracy improvements, with AUC gains of 7.9% over DKT, 5.1% over SAKT, and 3.7% over GKT, while simultaneously reducing assessment length by 16–23%. Our framework’s superior calibration (ECE = 0.048) significantly outperforms all baseline methods, with the next best approach achieving E C E > 0.065 .
Several symmetry-related limitations warrant detailed investigation for practical deployment. The computational complexity of maintaining exact symmetry constraints during real-time assessment presents significant challenges, as strict automorphism preservation requires O(n!) complexity for concept permutations in large knowledge graphs. We have identified approximate symmetry algorithms as promising solutions, including graph neural networks with learnable symmetry breaking that maintain near-invariant properties while achieving polynomial-time complexity, and hierarchical symmetry approximation techniques that preserve global structural properties while relaxing local symmetry requirements. The domain generalization limitation reveals fundamental differences in how symmetric properties manifest across educational contexts, with humanities domains exhibiting less structural regularity in concept relationships compared to STEM fields, where prerequisite dependencies follow more predictable symmetric patterns. Cultural equity concerns require careful consideration of how symmetric assessment strategies may need localization for different pedagogical traditions while maintaining diagnostic equivalence across culturally equivalent knowledge structures, necessitating the development of culturally adaptive symmetric frameworks that preserve assessment validity across diverse educational contexts.
Future research directions emphasize advancing symmetric approaches to educational technology through concrete technical innovations. Multi-modal symmetry preservation involves developing cross-modal invariant representations that maintain symmetric properties across text, visual, and interactive assessment modalities, enabling consistent diagnostic performance regardless of presentation format while preserving pedagogical equivalences. Federated symmetric learning addresses privacy-preserving educational assessment, where symmetric constraints enable consistent diagnostic performance across distributed institutions without sharing sensitive student data, leveraging symmetric aggregation mechanisms and invariant local model updates. Temporal symmetry modeling proposes extending our framework to capture symmetric learning trajectories that remain invariant under pedagogically equivalent instruction sequences, enabling assessment systems that adapt to diverse teaching styles while maintaining diagnostic consistency. Explainable symmetric AI techniques represent a critical direction for providing interpretable diagnostic feedback that respects structural equivalences, enabling educators to understand assessment decisions through symmetry-preserving visualization methods and invariant explanation generation that maintains consistency across equivalent concept configurations.

Author Contributions

Methodology, W.C., N.T.M. and W.L.; Software, W.C., N.T.M. and W.L.; Formal analysis, W.L.; Writing—original draft, W.C.; Writing—review and editing, N.T.M.; Supervision, N.T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lord, F.M. Applications of Item Response Theory to Practical Testing Problems; Routledge: Oxfordshire, UK, 2012. [Google Scholar]
  2. Chang, H.H.; Ying, Z. A global information approach to computerized adaptive testing. Appl. Psychol. Meas. 1996, 20, 213–229. [Google Scholar] [CrossRef]
  3. Lin, Y.; Chen, H.; Xia, W.; Lin, F.; Wang, Z.; Liu, Y. A comprehensive survey on deep learning techniques in educational data mining. arXiv 2023, arXiv:2309.04761. [Google Scholar] [CrossRef]
  4. Gerken, J.E.; Aronsson, J.; Carlsson, O.; Linander, H.; Ohlsson, F.; Petersson, C.; Persson, D. Geometric deep learning and equivariant neural networks. Artif. Intell. Rev. 2023, 56, 14605–14662. [Google Scholar] [CrossRef]
  5. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  6. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035. [Google Scholar]
  7. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
  8. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 2015, 521, 452–459. [Google Scholar] [CrossRef]
  9. Wainer, H.; Dorans, N.J.; Flaugher, R.; Green, B.F.; Mislevy, R.J. Computerized Adaptive Testing: A Primer; Routledge: Oxfordshire, UK, 2000. [Google Scholar]
  10. Ilina, O.; Ziyadinov, V.; Klenov, N.; Tereshonok, M. A survey on symmetrical neural network architectures and applications. Symmetry 2022, 14, 1391. [Google Scholar] [CrossRef]
  11. Reckase, M.D. 18 Multidimensional item response theory. Handb. Stat. 2006, 26, 607–642. [Google Scholar]
  12. Weiss, D.J. Improving measurement quality and efficiency with adaptive testing. Appl. Psychol. Meas. 1982, 6, 473–492. [Google Scholar] [CrossRef]
  13. Cheng, Y.; Chang, H.H. The maximum priority index method for severely constrained item selection in computerized adaptive testing. Br. J. Math. Stat. Psychol. 2009, 62, 369–383. [Google Scholar] [CrossRef]
  14. Wang, C. Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educ. Psychol. Meas. 2013, 73, 1017–1035. [Google Scholar] [CrossRef]
  15. Van der Linden, W.J.; Glas, C.A. Elements of Adaptive Testing; Springer: Berlin/Heidelberg, Germany, 2010; Volume 10. [Google Scholar]
  16. Van der Linden, W.J. Linear Models for Optimal Test Design; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  17. Buyske, S. Optimal design in educational testing. Appl. Optim. Des. 2005, 1–19. [Google Scholar] [CrossRef]
  18. Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1994, 4, 253–278. [Google Scholar] [CrossRef]
  19. Beck, J.E.; Chang, K.m. Identifiability: A fundamental problem of student modeling. In Proceedings of the International Conference on User Modeling, Corfu, Greece, 25–29 July 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 137–146. [Google Scholar]
  20. Pardos, Z.A.; Heffernan, N.T. Modeling individualization in a bayesian networks implementation of knowledge tracing. In Proceedings of the User Modeling, Adaptation, and Personalization: 18th International Conference, UMAP 2010, Big Island, HI, USA, 20–24 June 2010; Proceedings 18. Springer: Berlin/Heidelberg, Germany, 2010; pp. 255–266. [Google Scholar]
  21. Mousavinasab, E.; Zarifsanaiey, N.; R. Niakan Kalhori, S.; Rakhshan, M.; Keikha, L.; Ghazi Saeedi, M. Intelligent tutoring systems: A systematic review of characteristics, applications, and evaluation methods. Interact. Learn. Environ. 2021, 29, 142–163. [Google Scholar] [CrossRef]
  22. Pavlik, P.I.; Cen, H.; Koedinger, K.R. Performance factors analysis–a new alternative to knowledge tracing. In Artificial Intelligence in Education; IOS Press: Amsterdam, The Netherlands, 2009; pp. 531–538. [Google Scholar]
  23. Cen, H.; Koedinger, K.; Junker, B. Learning factors analysis–a general method for cognitive model evaluation and improvement. In Proceedings of the International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 26–30 June 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 164–175. [Google Scholar]
  24. Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
  25. Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 765–774. [Google Scholar]
  26. Abdelrahman, G.; Wang, Q. Knowledge tracing with sequential key-value memory networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 175–184. [Google Scholar]
  27. Pandey, S.; Karypis, G. A self-attentive model for knowledge tracing. arXiv 2019, arXiv:1907.06837. [Google Scholar]
  28. Tong, S.; Liu, Q.; Huang, W.; Hunag, Z.; Chen, E.; Liu, C.; Ma, H.; Wang, S. Structure-based knowledge tracing: An influence propagation view. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 541–550. [Google Scholar]
  29. Sartipi, K.; Safyallah, H. Dynamic knowledge extraction from software systems using sequential pattern mining. Int. J. Softw. Eng. Knowl. Eng. 2010, 20, 761–782. [Google Scholar] [CrossRef]
  30. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef]
  31. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  32. Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal graph networks for deep learning on dynamic graphs. arXiv 2020, arXiv:2006.10637. [Google Scholar]
  33. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2022–2032. [Google Scholar]
  34. Junjiang, L. A Personalized Education Recommendation Algorithm Based on Student Learning Behavior and Graph Neural Networks. In Proceedings of the 2024 4th International Signal Processing, Communications and Engineering Management Conference (ISPCEM), Montreal, QC, Canada, 28–30 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 476–481. [Google Scholar]
  35. Zhang, Q.; Wu, X.; Yang, Q.; Zhang, C.; Zhang, X. Few-shot heterogeneous graph learning via cross-domain knowledge transfer. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2450–2460. [Google Scholar]
  36. Cheng, K.; Peng, L.; Wang, P.; Ye, J.; Sun, L.; Du, B. DyGKT: Dynamic graph learning for knowledge tracing. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 409–420. [Google Scholar]
  37. Abu-Salih, B.; Alotaibi, S. A systematic literature review of knowledge graph construction and application in education. Heliyon 2024, 10, e25383. [Google Scholar] [CrossRef]
  38. Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the IEEE/WIC/aCM International Conference on Web Intelligence, Thessaloniki, Greece, 14–17 October 2019; pp. 156–163. [Google Scholar]
  39. Su, Y.; Liu, Q.; Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Ding, C.; Wei, S.; Hu, G. Exercise-enhanced sequential modeling for student performance prediction. In Proceedings of the AAAI conference on artificial intelligence, New Orleans, LO, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  40. Yang, Y.; Shen, J.; Qu, Y.; Liu, Y.; Wang, K.; Zhu, Y.; Zhang, W.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, 14–18 September 2020; Proceedings, part I. Springer: Berlin/Heidelberg, Germany, 2021; pp. 299–315. [Google Scholar]
  41. Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Xiong, H.; Su, Y.; Hu, G. Ekt: Exercise-aware knowledge tracing for student performance prediction. IEEE Trans. Knowl. Data Eng. 2019, 33, 100–115. [Google Scholar] [CrossRef]
  42. Chen, P.; Lu, Y.; Zheng, V.W.; Pian, Y. Prerequisite-driven deep knowledge tracing. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 39–48. [Google Scholar]
  43. MacKay, D.J. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992, 4, 448–472. [Google Scholar] [CrossRef]
  44. Kingma, D.P.; Salimans, T.; Welling, M. Variational dropout and the local reparameterization trick. Adv. Neural Inf. Process. Syst. 2015, 28, 2575–2583. [Google Scholar]
  45. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural network. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 1613–1622. [Google Scholar]
  46. Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York City, NY, USA, 19–24 June 2016; pp. 1050–1059. [Google Scholar]
  47. Louizos, C.; Welling, M. Multiplicative normalizing flows for variational bayesian neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2218–2227. [Google Scholar]
  48. Sun, S.; Zhang, G.; Shi, J.; Grosse, R. Functional variational Bayesian neural networks. arXiv 2019, arXiv:1903.05779. [Google Scholar]
  49. Titsias, M. Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Clearwater Beach, FA, USA, 16–18 April 2009; pp. 567–574. [Google Scholar]
  50. Settles, B.; Meeder, B. A trainable spaced repetition model for language learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 28 April–1 May 2016; pp. 1848–1858. [Google Scholar]
  51. Lan, A.S.; Waters, A.E.; Studer, C.; Baraniuk, R.G. Sparse factor analysis for learning and content analytics. J. Mach. Learn. Res. 2014, 15, 1959–2008. [Google Scholar]
  52. Yang, C.; Chiang, F.K.; Cheng, Q.; Ji, J. Machine learning-based student modeling methodology for intelligent tutoring systems. J. Educ. Comput. Res. 2021, 59, 1015–1035. [Google Scholar] [CrossRef]
  53. Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  54. Hasanzadeh, A.; Hajiramezanali, E.; Boluki, S.; Zhou, M.; Duffield, N.; Narayanan, K.; Qian, X. Bayesian graph neural networks with adaptive connection sampling. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 4094–4104. [Google Scholar]
  55. Stadler, M.; Charpentier, B.; Geisler, S.; Zügner, D.; Günnemann, S. Graph posterior network: Bayesian predictive uncertainty for node classification. Adv. Neural Inf. Process. Syst. 2021, 34, 18033–18048. [Google Scholar]
  56. Choi, Y.; Lee, Y.; Shin, D.; Cho, J.; Park, S.; Lee, S.; Baek, J.; Bae, C.; Kim, B.; Heo, J. Ednet: A large-scale hierarchical dataset in education. In Proceedings of the Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, 6–10 July 2020; Proceedings, Part II 21. Springer: Berlin/Heidelberg, Germany, 2020; pp. 69–73. [Google Scholar]
  57. Schmucker, R.; Wang, J.; Hu, S.; Mitchell, T.M. Assessing the performance of online students–new data, new approaches, improved accuracy. arXiv 2021, arXiv:2109.01753. [Google Scholar]
  58. Stamper, J.; Pardos, Z.A. The 2010 KDD Cup Competition Dataset: Engaging the machine learning community in predictive learning analytics. J. Learn. Anal. 2016, 3, 312–316. [Google Scholar] [CrossRef]
  59. Feng, M.; Heffernan, N.; Koedinger, K. Addressing the assessment challenge with an online system that tutors as it assesses. User Model. User-Adapt. Interact. 2009, 19, 243–266. [Google Scholar] [CrossRef]
Figure 1. Overview of the hierarchical probabilistic neural framework showing the integration of graph-based concept modeling, dual-network architecture, and uncertainty quantification components.
Figure 1. Overview of the hierarchical probabilistic neural framework showing the integration of graph-based concept modeling, dual-network architecture, and uncertainty quantification components.
Symmetry 17 01332 g001
Figure 2. Graph-based concept modeling showing (ac) for multi-scale representations.
Figure 2. Graph-based concept modeling showing (ac) for multi-scale representations.
Symmetry 17 01332 g002
Figure 3. Diagnostic accuracy vs. test length curves showing convergence rates for different methods on the ASSISTments dataset. Our framework (red line) achieved target accuracy with significantly fewer questions than baseline methods.
Figure 3. Diagnostic accuracy vs. test length curves showing convergence rates for different methods on the ASSISTments dataset. Our framework (red line) achieved target accuracy with significantly fewer questions than baseline methods.
Symmetry 17 01332 g003
Figure 4. Training convergence analysis showing loss reduction and validation performance across epochs for different framework components. The joint optimization phase achieved stable convergence after epoch 150.
Figure 4. Training convergence analysis showing loss reduction and validation performance across epochs for different framework components. The joint optimization phase achieved stable convergence after epoch 150.
Symmetry 17 01332 g004
Figure 5. Uncertainty calibrationanalysis showing ECE values and reliability diagrams across different methods. Our framework (blue bars) demonstrated superior calibration, with ECE = 0.048 compared to baseline methods.
Figure 5. Uncertainty calibrationanalysis showing ECE values and reliability diagrams across different methods. Our framework (blue bars) demonstrated superior calibration, with ECE = 0.048 compared to baseline methods.
Symmetry 17 01332 g005
Figure 6. Learned concept embeddings visualization using t-SNE projection showing meaningful clustering of related mathematical concepts. Colors represent different concept categories (algebra, geometry, statistics).
Figure 6. Learned concept embeddings visualization using t-SNE projection showing meaningful clustering of related mathematical concepts. Colors represent different concept categories (algebra, geometry, statistics).
Symmetry 17 01332 g006
Table 1. Main experimental results comparing our framework against baseline methods across four educational datasets. Best results are highlighted in bold.
Table 1. Main experimental results comparing our framework against baseline methods across four educational datasets. Best results are highlighted in bold.
MethodAUCATL
ASSISTEdNetJunyiKDDASSISTEdNetJunyiKDD
MI0.6210.6890.6520.63423.431.228.725.1
KLI0.6350.7010.6680.64922.129.827.324.2
BKT0.6510.7180.6840.66121.528.926.423.8
DKT0.6840.7420.7090.68719.826.324.121.9
DKVMN0.6970.7580.7230.70118.925.123.220.8
SAKT0.7120.7710.7380.71618.224.322.620.1
GKT0.7260.7830.7510.72917.623.721.919.5
EKT0.7310.7890.7560.73417.323.221.419.2
SDKT0.7390.7940.7630.74116.822.820.918.7
Ours0.7630.8210.7870.76915.220.118.616.9
Table 2. Comprehensive ablation study results on ASSISTments dataset showing individual component contributions and symmetry-aware comparisons to isolate symmetric design benefits from architectural advantages.
Table 2. Comprehensive ablation study results on ASSISTments dataset showing individual component contributions and symmetry-aware comparisons to isolate symmetric design benefits from architectural advantages.
ConfigurationAUCATL
Full Framework0.76315.2
w/o Hierarchical GCN0.74116.8
w/o Uncertainty Quantification0.74817.1
w/o Attention Mechanism0.73517.9
w/o Graph Pooling0.72918.3
w/o Temporal Modeling0.72219.1
Symmetry-Aware Ablations
GCN without Symmetry Constraints0.74116.8
QSN with Asymmetric Reward0.74817.1
Symmetric-Constrained IRT0.67319.4
Standard IRT (Baseline)0.63522.1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cao, W.; Mai, N.T.; Liu, W. Adaptive Knowledge Assessment via Symmetric Hierarchical Bayesian Neural Networks with Graph Symmetry-Aware Concept Dependencies. Symmetry 2025, 17, 1332. https://doi.org/10.3390/sym17081332

AMA Style

Cao W, Mai NT, Liu W. Adaptive Knowledge Assessment via Symmetric Hierarchical Bayesian Neural Networks with Graph Symmetry-Aware Concept Dependencies. Symmetry. 2025; 17(8):1332. https://doi.org/10.3390/sym17081332

Chicago/Turabian Style

Cao, Wenyang, Nhu Tam Mai, and Wenhe Liu. 2025. "Adaptive Knowledge Assessment via Symmetric Hierarchical Bayesian Neural Networks with Graph Symmetry-Aware Concept Dependencies" Symmetry 17, no. 8: 1332. https://doi.org/10.3390/sym17081332

APA Style

Cao, W., Mai, N. T., & Liu, W. (2025). Adaptive Knowledge Assessment via Symmetric Hierarchical Bayesian Neural Networks with Graph Symmetry-Aware Concept Dependencies. Symmetry, 17(8), 1332. https://doi.org/10.3390/sym17081332

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop