Learning Prediction of Multi-Topological GCN Based on Attention Mechanism

Fan, Di; Tan, Yifan; Fan, Leihua; Zhao, Fuyan; Lv, Changzhi

doi:10.3390/electronics15091898

by Di Fan, Yifan Tan, Leihua Fan, Fuyan Zhao and Changzhi Lv *

Shandong University of Science and Technology, Qingdao 266590, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(9), 1898; https://doi.org/10.3390/electronics15091898

Submission received: 4 March 2026 / Revised: 21 April 2026 / Accepted: 27 April 2026 / Published: 30 April 2026

Abstract

The lack of graph information caused by ignoring the association between learners often affects the accuracy of graph-based learning. This paper proposes an approach called attention-based multi-topological graph convolution (A-MTGCN) to address this. It uses a graph neural network to predict academic tasks. The method involves an attention mechanism that assigns weights to different academic characteristics to reflect their effects on prediction. Additionally, the topology between learners is constructed from multiple perspectives to capture potential interactions and collaboration, forming a weighted learner association diagram. This reduces redundancy and information dispersion in the graph, while retaining the correlation features. The approach divides learners into four types. Experiments show the enhanced GCN performs well in learner node classification, with an accuracy of 92.53%, precision of 89.15%, recall of 92.27%, and F1-score of 87.83%. The evolution process of learners’ learning state is reflected by constructing learners’ state transition matrix.

Keywords:

online learning; learning prediction; attention mechanism; GCN

1. Introduction

Predicting learner performance is a long-standing research problem in educational data mining and is currently one of the main application scenarios for interpretability in this field [1]. To develop personalized intervention strategies, schools and educational institutions must gain an in-depth understanding of learners’ academic performance and behavioral characteristics. However, due to differences in individual needs and interests, learners’ engagement styles and learning paths are highly diverse. While these diverse learning behaviors provide a wealth of data for online education, they also present significant challenges in terms of analyzing the learning process and predicting student performance [2,3,4]. Accurate analysis of student performance can help instructors identify those at risk of failing or dropping out, provide timely feedback and intervention, and develop personalized learning paths and resources. It also enables learners to monitor their own progress, adjust their learning strategies, and enhance their motivation and self-regulation skills.

Both domestic and international researchers have conducted extensive and in-depth studies on the predictive analysis of learning outcomes, aiming to uncover underlying patterns in learners’ academic performance through educational data mining techniques. In the field of traditional machine learning, for example, Riestra et al. [5] developed a learning outcome prediction model based on five algorithms: logistic regression, support vector machines, naïve Bayes, decision trees, and multi-layer perceptrons. By analyzing a large volume of LMS log data, they achieved early and accurate predictions of learners’ academic performance. Of the two models designed by Alshabandar et al. [6], Random Forests performed best in terms of regular academic performance, while Gradient Boosting Machines performed best in terms of final academic achievement. Lu et al. [7] proposed a model based on dynamic time warping and developed the Dynamic Time Warping with Black Widow Optimization (DTBW) and the Dynamic Time Warping with Fox Optimization (DTFO) models, which achieved accuracy rates of 93.7% and 93.5%, respectively, on public datasets. The introduction of deep learning technologies has further driven the development of this field. Wang et al. [8] proposed a hybrid model that combines Long Short-Term Memory (LSTM) with DistilBERT, achieving an accuracy rate of 98.7% whilst optimizing computational efficiency. Meanwhile, Junejo et al. [9] designed a deep learning model for predicting academic performance tailored to learning scenarios during the pandemic. This model outperformed existing methods on multiple metrics, demonstrating excellent predictive performance. Li et al. [10] improved the accuracy of personalized predictions by identifying key features using a dual-path attention mechanism.

Although significant advances have been made using the aforementioned methods, there are still several limitations. Firstly, many models fail to leverage the structural information and attribute feature spaces inherent in learner interactions during feature extraction. This means they overlook the temporal dynamics and long-term dependencies of learning behavior [11]. Secondly, existing studies often treat learners as independent entities, failing to explore the complex relationships between them in sufficient depth.

In real-world online learning environments, educational data is predominantly sourced from Learning Management System (LMS) logs. However, as smart education and the ‘Smart City’ concept develop, educational settings are gradually adopting multi-source data collection methods based on the Internet of Things (IoT). For example, smart classroom equipment and learning behavior sensors can be used to track learners’ behavioral patterns and interaction modes more comprehensively, thereby providing richer datasets for modeling learning states. However, introducing multi-source, heterogeneous data increases the complexity of data structures and modeling. Graph neural networks (GNNs) are powerful tools for modeling structured data and provide novel solutions for learning performance prediction. GNNs can fully leverage node features and graph structures by recursively aggregating information from neighboring nodes to perform advanced node classification tasks [12]. This allows them to capture the complex relationships between learners and content. In recent years, the use of GNNs in education has continued to grow. For example, Wang et al. [13] constructed a heterogeneous graph that integrated text content and knowledge point associations. They achieved a more comprehensive problem representation through a heterogeneous graph neural network (HGNN). This demonstrated the effectiveness of multidimensional structural integration for educational data modeling. Fang et al. [14] constructed a multi-level knowledge graph based on multi-source data, such as student academic performance and course selection history. They used GCN to mine latent associations, laying a structural basis for personalized recommendations. Fan et al. [15] employed an enhanced GCN to learn heterogeneous graph embeddings. They combined this with multi-task learning. This improved the accuracy of MOOC recommendations. 
This further validated the value of multi-topological/heterogeneous graph structures in educational scenarios. However, current GNN models exhibit significant limitations. Traditional GCNs rely on a single topological structure, which makes it difficult to capture multidimensional learning interactions comprehensively. Multimodal Graph Attention Network for Recommendation (MGAT) incorporates an attention mechanism but does not fully account for the complementarity of different topological structures. It also overlooks the temporal dynamics of learning behavior [16]. Dual Graph Ensemble Learning Method for Knowledge Tracing (DGEKT), proposed by Cui et al. [17], adopts a dual-graph structure; nonetheless, its core focuses on modeling knowledge associations. This makes it difficult to adapt to the requirements of integrating multi-perspective behavioral relationships in learning progress prediction. Similarly, although heterogeneous GNN models, such as Graph-based Knowledge Tracing for Performance Recommendation (GKTPR, proposed by Zhang et al. [18]), consider multi-entity associations, they focus on the joint task of knowledge tracing and path recommendation. None of these models has been optimized for feature weight allocation and multi-graph fusion in learning progress prediction. Furthermore, the GCN-SynDCL method put forward by Achari et al. [19] enhances the graph structure by synthesizing minority nodes to address the common issue of class imbalance in node classification. Yet, it does not address the core requirement of multi-topology fusion and thus fails to overcome the limitations of single-topology representation. In recent years, scholars have begun to focus on modeling the temporal dimension. Xia et al. [20] proposed a spatiotemporal GNN model that effectively captured the temporal dependencies of learning trajectories on the ASSISTments dataset. 
Previous research has primarily concentrated on modeling static or stage-based features, frequently overlooking the temporal dynamics of learning behavior and its long-term dependencies. Studies indicate that incorporating time-aware mechanisms can improve modeling capabilities for sequential data and more accurately depict the evolution of learning states. Therefore, future research is expected to improve the model’s dynamic predictive capabilities by further integrating temporal information into learner state transition modeling.

In summary, existing methods face three major challenges. First, it is difficult for a single topological structure to comprehensively characterize the multidimensional interactions among learners. Second, the weighting scheme for features and topological information is not sufficiently refined, which leads to redundant information that interferes with prediction results. Third, structural information and dynamic temporal information have not been effectively combined. To address these challenges, this paper proposes a Multi-Topology Graph Convolutional Network based on an attention mechanism (A-MTGCN). Unlike graph attention networks (GATs), which focus on node-level attention within a single topology, and heterogeneous graph neural networks, which rely on predefined relationship types, the A-MTGCN dynamically integrates multidimensional learner relationships from various graph structures through an adaptive attention mechanism. This design allows the model to capture context-dependent interactions more effectively than static, multi-view graph neural network frameworks employing fixed fusion strategies. In this paper, we use attention mechanisms to differentiate feature weights and achieve complementary structural information through multi-topological fusion. This reduces redundancy and enhances generalization capabilities. The core innovations of this method include (1) using an attention mechanism to adaptively assign feature weights and enhance the contribution of key information; (2) constructing a multidimensional topological graph to capture interaction patterns comprehensively; and (3) integrating features from multiple relationship graphs to reduce structural redundancy, preserve multi-level information, and improve prediction accuracy and model generalization.

2. A-MTGCN Model Learning Situation Prediction Framework and Learner Graph Construction

2.1. A-MTGCN Model Framework

The A-MTGCN framework, shown in Figure 1, consists of three parts: learner graph construction, graph convolutional feature fusion, and learning situation prediction. In learner graph construction, an attention mechanism assigns adaptive weights to the attribute features of the online learning data, strengthening the influence of key features on learning situation prediction. Three topological structures are then built from Euclidean distance, cosine similarity, and Manhattan distance, respectively. Each describes the edge connections between learners from a different perspective, and together they capture multi-level association information between learners. In feature fusion, an attention mechanism dynamically adjusts the weight coefficients of the three topological structures to generate a fused global topological graph representation. On this basis, a graph convolutional network propagates information between nodes, iteratively updating each node representation with information from its neighbors to learn more discriminative node embeddings. In prediction, a multi-layer perceptron (MLP) classifier uses the learned node embeddings to predict the future learning situation of learners.

2.2. Learner Graph Constructs

The main reason for creating a learner graph is to visualize and organize the connections among learners, which can then be used in GCNs to predict student performance. Before constructing the graph, the raw learner data is feature-engineered and preprocessed. This paper primarily uses features derived from Learning Management System (LMS) log data, including statistical features such as learning engagement, interactions, and academic outcomes. These features are obtained by aggregating raw behavioral data. Normalization is then applied to eliminate the effects of unit scale, and feature enhancement is performed using an attention mechanism. This forms the final node feature representations for the model. In an online learning dataset, each learner is represented by a node, and the relationships between learners are represented by edges. This creates a graph structure of learner relationships, as shown in Figure 2.

Each node (learner) carries a set of attributes, giving the feature matrix $X \in \mathbb{R}^{N \times M}$, where $N$ is the number of nodes and $M$ is the number of features. The node relationship graph of online learners is defined as $G = (C, E)$, where $C = \{c_{1}, c_{2}, \dots, c_{N}\}$ is the set of nodes for the $N$ learners and $y_{i}$ denotes the label of node $c_{i}$. $E = \{(c_{i}, c_{j}) \mid S_{ij}\}$ is the set of edges between learner nodes, where $E \subseteq C \times C$ and $S_{ij}$ is the edge weight between nodes $c_{i}$ and $c_{j}$, representing their similarity. For the prediction task in this article, the relationship between learners is mutual, so the graph is treated as undirected, that is, $S_{ij} = S_{ji}$. The adjacency matrix describing the relationships between nodes is $A \in \mathbb{R}^{N \times N}$, where element $a_{ij}$ indicates whether there is an edge between nodes $c_{i}$ and $c_{j}$: $a_{ij} = 1$ if there is an edge and $a_{ij} = 0$ otherwise.

$\tilde{A} = A + I_{N}$ is the adjacency matrix after adding self-loops, where $I_{N}$ is the $N \times N$ identity matrix. The degree matrix is $D = \mathrm{diag}(d_{1}, d_{2}, \dots, d_{N}) \in \mathbb{R}^{N \times N}$, where $d_{i} = \sum_{j=1}^{N} a_{ij}$, and the degree matrix after adding self-loops is $\tilde{D} = \mathrm{diag}(d_{1} + 1, d_{2} + 1, \dots, d_{N} + 1) = \mathrm{diag}(\tilde{d}_{1}, \tilde{d}_{2}, \dots, \tilde{d}_{N})$.
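As an illustration of these definitions, the following minimal NumPy sketch builds the adjacency matrix with self-loops and the corresponding degree matrix for a toy graph. The four-node edge pattern is invented for the example, and the symmetrically normalized matrix `A_hat` anticipates the normalization used later in the graph convolution; none of the variable names come from the original implementation.

```python
import numpy as np

# Toy undirected learner graph with N = 4 nodes (edges are illustrative only).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

N = A.shape[0]
A_tilde = A + np.eye(N)        # adjacency matrix after adding self-loops
d = A.sum(axis=1)              # d_i = sum_j a_ij
D = np.diag(d)                 # degree matrix
D_tilde = np.diag(d + 1.0)     # degree matrix after adding self-loops

# Symmetric normalization D_tilde^(-1/2) A_tilde D_tilde^(-1/2).
D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1.0))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
```

Because the toy graph is undirected, both `A_tilde` and `A_hat` remain symmetric.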

2.2.1. Feature Enhancement

In the online learning dataset, each learner node contains multiple attribute features, but these features contribute very differently to learning situation prediction. Features such as learning engagement, learning interaction, and daily performance may have a strong impact on the prediction, while features such as basic information have a relatively weak impact. Conventional graph construction methods often treat all features equally and therefore cannot distinguish their importance. As a result, key factors may not be fully explored and utilized, which limits the predictive performance of the model and its ability to capture complex relationships. To address this issue, A-MTGCN introduces an attention mechanism at the data input stage, which automatically learns the contribution of each feature to learning prediction and assigns higher attention weights to important features. The feature enhancement method is shown in Figure 3.

Let the learner’s original feature set be $X = \{x_{1}, x_{2}, \dots, x_{N}\} \in \mathbb{R}^{N \times M}$. Each node feature vector $x_{i} = \{x_{i1}, x_{i2}, \dots, x_{iM}\}$ is first fed into the attention layer to obtain the attention score $e_{ij}$ corresponding to each feature:

$$e_{ij} = \tanh(W_{t} x_{ij} + b_{t}) \tag{1}$$

where $x_{ij}$ is the value of the $i$-th learner on the $j$-th feature dimension, $W_{t}$ is the attention layer weight parameter, and $b_{t}$ is the bias vector.

The scores $e_{ij}$ are normalized with softmax to obtain the attention weights $\psi_{ij}$, and each feature is then weighted to construct the new weighted feature vector $x_{i}^{'} = \{x_{i1}^{'}, x_{i2}^{'}, \dots, x_{iM}^{'}\}$. The weighting process is as follows:

$$x_{ij}^{'} = \psi_{ij} x_{ij} \tag{2}$$

where $\psi_{ij} = \exp(e_{ij}) / \sum_{k=1}^{M} \exp(e_{ik})$, with $\psi_{ij} \in [0, 1]$, is the attention weight of the $i$-th learner on the $j$-th feature.

The above operation can allocate different attention weights based on the importance of features, thereby enhancing the attention to key features and improving the discriminative ability for the current task.
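The feature enhancement of Equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes one scalar attention weight and one scalar bias per feature (`w_t` and `b_t`, both hypothetical names), and it uses random toy data in place of the real learner features.

```python
import numpy as np

def feature_attention(X, w_t, b_t):
    """Score each feature (Eq. 1), softmax over features, then reweight (Eq. 2)."""
    E = np.tanh(X * w_t + b_t)                       # e_ij, shape (N, M)
    expE = np.exp(E)                                 # tanh bounds e_ij, so exp is safe
    psi = expE / expE.sum(axis=1, keepdims=True)     # psi_ij, each row sums to 1
    return psi * X, psi                              # x'_ij = psi_ij * x_ij

rng = np.random.default_rng(0)
X = rng.random((5, 4))        # 5 learners, 4 normalized features (toy data)
w_t = rng.normal(size=4)      # assumed: one attention weight per feature
b_t = np.zeros(4)             # assumed: one bias per feature
X_prime, psi = feature_attention(X, w_t, b_t)
```

In a trained model `w_t` and `b_t` would be learned jointly with the rest of the network; here they are fixed only to show the data flow.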

2.2.2. Multi-Topology Diagram Structure Design

To address the limitation of a single perspective in capturing multi-level relationships, when calculating the similarity between learner nodes, this paper proposes a learner node relationship establishment method that incorporates multiple similarity measures. Euclidean distance, cosine similarity, and Manhattan distance are comprehensively utilized to measure the degree of similarity between nodes from different perspectives. At the same time, the attention mechanism is introduced to dynamically adjust the weight of each similarity metric, so that the model can automatically optimize the contribution ratio of each metric according to the data characteristics. The method can comprehensively measure the similarity between learner nodes from different perspectives, which significantly improves the accuracy of relationship graph construction.

Each feature describes the performance of learners in a specific dimension, and the feature vectors of different learners can be used for node similarity calculation, thereby constructing a relationship graph between learners. Three similarity measurement methods, namely Euclidean distance, cosine similarity, and Manhattan distance, were used to construct the graph structure. The multi-topological graph is shown in Figure 4.

For learner nodes $c_{i}$ and $c_{j}$, whose corresponding weighted features are $x_{i}^{'} = (x_{i1}^{'}, x_{i2}^{'}, \dots, x_{iM}^{'})^{T}$ and $x_{j}^{'} = (x_{j1}^{'}, x_{j2}^{'}, \dots, x_{jM}^{'})^{T}$, the Euclidean similarity $\nu_{\mathrm{Euc}}$, cosine similarity $\nu_{\mathrm{Cos}}$, and Manhattan similarity $\nu_{\mathrm{Man}}$ are defined as follows:

$$\nu_{\mathrm{Euc}}(c_{i}, c_{j}) = \exp\left(-d_{\mathrm{Euc}}(c_{i}, c_{j}) / \bar{d}_{\mathrm{Euc}}\right) \tag{3}$$

$$\nu_{\mathrm{Cos}}(c_{i}, c_{j}) = d_{\mathrm{Cos}}(c_{i}, c_{j}) \tag{4}$$

$$\nu_{\mathrm{Man}}(c_{i}, c_{j}) = \exp\left(-d_{\mathrm{Man}}(c_{i}, c_{j}) / \bar{d}_{\mathrm{Man}}\right) \tag{5}$$

Here, $d_{\mathrm{Euc}}(c_{i}, c_{j}) = \sqrt{\sum_{k=1}^{M} (x_{ik}^{'} - x_{jk}^{'})^{2}}$ is the Euclidean distance, reflecting overall behavioral differences; $d_{\mathrm{Cos}}(c_{i}, c_{j}) = (x_{i}^{'} \cdot x_{j}^{'}) / (\|x_{i}^{'}\| \, \|x_{j}^{'}\|)$ is the cosine similarity, reflecting consistent behavioral patterns; and $d_{\mathrm{Man}}(c_{i}, c_{j}) = \sum_{k=1}^{M} |x_{ik}^{'} - x_{jk}^{'}|$ is the Manhattan distance, emphasizing differences on a per-dimension basis. $\bar{d}_{\mathrm{Euc}}$ and $\bar{d}_{\mathrm{Man}}$ are the means of all Euclidean and Manhattan distances, respectively.

Node similarity based on Euclidean distance measures the geometric distance between two nodes. It is suitable for measuring how far learners’ overall learning behaviors deviate from one another and is used to analyze global differences between learners. Node similarity based on cosine similarity focuses on the directional similarity between vectors, which makes it suitable for measuring the similarity of learners’ learning styles. Node similarity based on Manhattan distance focuses on itemized differences across feature dimensions and can therefore capture inter-learner feature variations at a finer granularity.
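The three similarity measures of Equations (3) to (5) can be computed for all node pairs at once. The sketch below is one possible NumPy formulation under two stated assumptions: the mean distances are taken over distinct pairs only (the text says only "all distances"), and every feature vector has a non-zero norm so that the cosine similarity is defined. The function name and the toy features are illustrative.

```python
import numpy as np

def similarity_graphs(Xp):
    """Return the Euclidean, cosine and Manhattan similarity matrices (Eqs. 3-5)."""
    diff = Xp[:, None, :] - Xp[None, :, :]        # pairwise differences, shape (N, N, M)
    d_euc = np.sqrt((diff ** 2).sum(axis=2))      # Euclidean distances
    d_man = np.abs(diff).sum(axis=2)              # Manhattan distances
    norms = np.linalg.norm(Xp, axis=1)            # assumed non-zero for every learner
    v_cos = (Xp @ Xp.T) / np.outer(norms, norms)  # cosine similarity
    off_diag = ~np.eye(len(Xp), dtype=bool)       # assumption: average over pairs i != j
    v_euc = np.exp(-d_euc / d_euc[off_diag].mean())
    v_man = np.exp(-d_man / d_man[off_diag].mean())
    return v_euc, v_cos, v_man

# Three toy learners with two weighted features each.
Xp = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v_euc, v_cos, v_man = similarity_graphs(Xp)
```

In this toy case learners 0 and 1 have orthogonal feature vectors, so their cosine similarity is zero even though their Euclidean and Manhattan similarities are not, which is the kind of complementary view the multi-topology design relies on.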

2.2.3. Attention Weighting Based on Multi-Scale Node Similarity

In order to optimize the relationship between node pairs, an attention mechanism is introduced to generate edge weight coefficients between each pair of nodes, reflecting the contribution of the similarity measurement in constructing the topology structure between nodes. The generated dynamic weights will be applied to each similarity measure to obtain a weighted similarity representation, generating an overall topology representation for subsequent information transmission and prediction stages.

In this paper, the feature vectors $x_{i}^{'}$ and $x_{j}^{'}$ of learner nodes $c_{i}$ and $c_{j}$ are concatenated to obtain a new input vector $\xi_{ij} = (x_{i}^{'} \,\|\, x_{j}^{'})$. $\xi_{ij}$ is fed into a multi-layer perceptron (MLP), which learns a nonlinear mapping to the attention coefficients of the corresponding node pair. Figure 5 illustrates the mapping process [21].

This mapping process can effectively capture the deep nonlinear relationships in the similarity features, thus improving the accuracy and expressiveness of the attention weight allocation. The mapping formula is

$$\xi_{ij}^{'} = W_{2}\, \sigma(W_{1} \xi_{ij} + b_{1}) + b_{2} \tag{6}$$

where $\sigma(\cdot)$ is the activation function; $W_{1}$ and $W_{2}$ are the weight matrices; $b_{1}$ and $b_{2}$ are the bias vectors; and $\xi_{ij}^{'} = (o_{ij}^{1}, o_{ij}^{2}, o_{ij}^{3})$ contains the weight coefficients of the three node similarities based on Euclidean distance, cosine similarity, and Manhattan distance. $\xi_{ij}^{'}$ is passed to a softmax layer, which converts it into the dynamic attention weights $\alpha_{ij}^{p}$ of the similarities:

$$\alpha_{ij}^{p} = \frac{\exp(o_{ij}^{p})}{\sum_{k=1}^{3} \exp(o_{ij}^{k})}, \quad p = 1, 2, 3 \tag{7}$$

where $\alpha_{ij}^{p} \in [0, 1]$ and $\sum_{p=1}^{3} \alpha_{ij}^{p} = 1$.

The attention weight $\alpha_{ij}^{p}$ adaptively adjusts the importance of each similarity metric according to the characteristics of the input learner node pair $c_{i}$ and $c_{j}$. Applying it to the three similarity values yields a dynamic edge weight. The edge weight $S_{ij}$ between the node pair $c_{i}$ and $c_{j}$ is

$$S_{ij} = \alpha_{ij}^{1} \cdot \nu_{\mathrm{Euc}}(c_{i}, c_{j}) + \alpha_{ij}^{2} \cdot \nu_{\mathrm{Cos}}(c_{i}, c_{j}) + \alpha_{ij}^{3} \cdot \nu_{\mathrm{Man}}(c_{i}, c_{j}) \tag{8}$$

The similarity threshold for edge construction is set to $\varepsilon$. When $S_{ij} \geq \varepsilon$, there is an edge between the two nodes, and when $S_{ij} < \varepsilon$, there is no edge. In this way, different node pairs learn different weights $\alpha_{ij}^{p}$ based on their own feature differences. This allows the model to adaptively determine the contribution of each similarity metric for different node pairs, thereby constructing a graph that more flexibly integrates information from Euclidean distance, cosine similarity, and Manhattan distance.
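The fusion step of Equations (6) to (8) and the thresholding rule can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the text: ReLU is used as the activation function, the hidden width is 8, the MLP weights are random rather than trained, and the fused weights are symmetrized by averaging because concatenating the features of node i and node j in a fixed order does not by itself guarantee equal weights in both directions. The three input similarity matrices are random symmetric stand-ins.

```python
import numpy as np

def fuse_similarities(Xp, sims, W1, b1, W2, b2, eps):
    """Pairwise MLP (Eq. 6), softmax over 3 metrics (Eq. 7), fused weight (Eq. 8), threshold."""
    N = Xp.shape[0]
    # xi_ij = (x'_i || x'_j) for every ordered pair, shape (N, N, 2M).
    xi = np.concatenate(
        [np.repeat(Xp[:, None, :], N, axis=1), np.repeat(Xp[None, :, :], N, axis=0)],
        axis=2,
    )
    hidden = np.maximum(xi @ W1.T + b1, 0.0)            # assumed activation: ReLU
    o = hidden @ W2.T + b2                              # scores o_ij^p, shape (N, N, 3)
    o = o - o.max(axis=2, keepdims=True)                # numerical stability
    alpha = np.exp(o) / np.exp(o).sum(axis=2, keepdims=True)
    S = (alpha * np.stack(sims, axis=2)).sum(axis=2)    # weighted sum of the 3 similarities
    S = 0.5 * (S + S.T)                                 # assumption: enforce S_ij = S_ji
    A = (S >= eps).astype(float)                        # edge if S_ij >= eps
    np.fill_diagonal(A, 0.0)                            # self-loops are added later as A + I
    return S, A, alpha

rng = np.random.default_rng(0)
N, M, hidden_dim = 4, 3, 8
Xp = rng.random((N, M))
# Toy stand-ins for the Euclidean, cosine and Manhattan similarity matrices.
sims = [0.5 * (r + r.T) for r in rng.random((3, N, N))]
W1, b1 = rng.normal(size=(hidden_dim, 2 * M)), np.zeros(hidden_dim)
W2, b2 = rng.normal(size=(3, hidden_dim)), np.zeros(3)
S, A, alpha = fuse_similarities(Xp, sims, W1, b1, W2, b2, eps=0.5)
```

Because the attention weights for each pair sum to one, every fused weight is a convex combination of the three similarities, so the threshold can be chosen on the same scale as the individual measures.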

3. Graph Convolutional Feature Fusion and Learning Prediction

3.1. Graph Convolutional Feature Fusion

The core goal of graph convolutional feature fusion is to propagate and integrate information over the constructed learner graph $G = (C, E)$ with a GCN. Each learner node incorporates the feature information carried by its neighboring nodes and continuously updates its representation. This gives the node a richer representation in the embedding space, strengthens its awareness of contextual information, and ultimately improves the prediction capability of the model. During information propagation, the learner node $c_{i}$ fuses and updates information based on its current embedding and the embeddings of the neighboring nodes connected to it. As shown in Figure 6, $c_{i}$ aggregates features directly from its first-order neighboring nodes to complete the first layer of information exchange. At the same time, the first-order neighboring nodes themselves obtain features from the second-order neighboring nodes, and these updated features are then passed on to the target node.

Through this layer-by-layer information aggregation, the GCN enables the target node $c_{i}$ to use not only the information of directly connected nodes but also the structural features of more distant neighbors. The feature aggregation process of the target node $c_{i}$ is as follows:

$$h_{i}^{(1)} = \mathrm{ReLU}\left(\sum_{j \in \Phi_{i} \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_{i} \tilde{d}_{j}}}\, \tilde{a}_{ij}\, h_{j}^{(0)} \Theta^{(0)}\right) \tag{9}$$

where $h_{i}^{(1)}$ is the feature representation of the target node $c_{i}$ after one layer of graph convolution; $\Phi_{i}$ is the index set of the neighboring nodes of $c_{i}$; $\tilde{a}_{ij}$ is an element of the adjacency matrix $\tilde{A} = A + I_{N}$ after adding self-loops; $\tilde{d}_{i}$ is the degree of node $c_{i}$ after adding self-loops; $h_{j}^{(0)} = x_{j}^{'}$; and $\Theta^{(0)}$ is the weight matrix in the initial state.

After $l$ layers of propagation, the target node $c_{i}$ is able to integrate neighbor information from farther away, so its representation captures the structural information of the entire graph more comprehensively and eventually reaches a relatively stable state that is fully integrated with the information of the surrounding nodes. The propagation diagram is shown in Figure 7. The embedding representation $h_{i}^{(l)}$ of node $c_{i}$ after $l$ layers of graph convolution is

$$h_{i}^{(l)} = \mathrm{ReLU}\left(\sum_{j \in \Phi_{i} \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_{i} \tilde{d}_{j}}}\, \tilde{a}_{ij} \cdot h_{j}^{(l-1)} \cdot \Theta^{(l-1)}\right), \quad 1 \leq l \leq L_{s} \tag{10}$$

where $L_{s}$ is the maximum number of graph convolutional layers.

Figure 7. l-layer schematic diagram of convolutional information propagation.
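The layer-by-layer propagation of Equations (9) and (10) can be written compactly in matrix form. The following NumPy sketch is illustrative only: it uses a three-node path graph, one-hot input features, and identity weight matrices (all chosen for the example, not taken from the paper) so that the spread of information from second-order neighbors is easy to see.

```python
import numpy as np

def gcn_layer(A, H, Theta):
    """One propagation step of Eq. (10), written for all nodes at once."""
    A_tilde = A + np.eye(A.shape[0])                        # add self-loops
    d_tilde = A_tilde.sum(axis=1)                           # degrees including self-loops
    A_hat = A_tilde / np.sqrt(np.outer(d_tilde, d_tilde))   # a_ij / sqrt(d_i * d_j)
    return np.maximum(A_hat @ H @ Theta, 0.0)               # ReLU

# Path graph 0 - 1 - 2: nodes 0 and 2 are second-order neighbors.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.eye(3)        # one-hot features make the information flow visible
Theta = np.eye(3)     # identity weights, for illustration only

H1 = gcn_layer(A, H0, Theta)   # node 0 sees only itself and node 1
H2 = gcn_layer(A, H1, Theta)   # node 0 now also receives information from node 2
```

After one layer the entry `H1[0, 2]` is zero, while after two layers `H2[0, 2]` becomes positive, which mirrors the description of first-order and second-order aggregation above.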

If the embedding representation of a single node is taken as a row vector of the global node embedding matrix, the global node embedding matrix $H^{(l)}$ can be expressed as

$$H^{(l)} = \left[{h_{1}^{(l)}}^{T}, {h_{2}^{(l)}}^{T}, \dots, {h_{N}^{(l)}}^{T}\right] \tag{11}$$

To avoid gradient vanishing and over-smoothing problems in deep network training, residual connections are introduced in the model, as shown in Figure 8, to enable efficient information transfer between layers and maintain feature discriminability.

The full graph feature fusion expression after adding the residual term is

$$\hat{H}^{(l)} = \mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l-1)} \Theta^{(l-1)}\right) + H^{(l-1)} \Theta_{res}^{(l-1)}, \quad 1 \leq l \leq L_{s} \tag{12}$$

where $H^{(l-1)}$ is the input feature matrix of layer $l$, which also serves as the residual term, and $\Theta_{res}^{(l-1)}$ is the linear mapping matrix of the residual connection in layer $l$. Its gradient update during training is expressed as

$$\Theta_{res}^{(l-1)} \leftarrow \Theta_{res}^{(l-1)} - \Lambda \cdot \frac{\partial L_{\text{A-MTGCN}}}{\partial \Theta_{res}^{(l-1)}} \tag{13}$$

where $\Lambda$ is the learning rate, $L_{\text{A-MTGCN}}$ is the model loss function, and $\partial L_{\text{A-MTGCN}} / \partial \Theta_{res}^{(l-1)}$ is the gradient of the loss function with respect to the weight matrix.
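A residual layer in the sense of Equation (12) and the plain gradient-descent update of Equation (13) can be sketched as below. The two-node graph, the feature values, and the identity weight matrices are toy choices for illustration, and the gradient passed to the update is a placeholder rather than one obtained by backpropagation.

```python
import numpy as np

def residual_gcn_layer(A_hat, H, Theta, Theta_res):
    """Eq. (12): ReLU(A_hat @ H @ Theta) plus the linearly mapped residual H @ Theta_res."""
    return np.maximum(A_hat @ H @ Theta, 0.0) + H @ Theta_res

def sgd_step(param, grad, lr):
    """Eq. (13): move the parameter against its gradient, scaled by the learning rate."""
    return param - lr * grad

# Two connected nodes: with self-loops every degree is 2, so each normalized entry is 1/2.
A_hat = np.full((2, 2), 0.5)
H = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Theta = np.eye(2)
Theta_res = np.eye(2)

H_next = residual_gcn_layer(A_hat, H, Theta, Theta_res)

# Placeholder gradient, used only to show the update rule.
Theta_res_new = sgd_step(Theta_res, grad=np.ones((2, 2)), lr=0.1)
```

Without the residual term both rows of the output would be identical here, because each node averages the same two feature vectors; the added residual keeps the two nodes distinguishable, which is the effect the residual connection is intended to have against over-smoothing.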

3.2. Prediction of Academic Performance

After graph convolutional feature aggregation, we obtain the final full graph embedding representation $\hat{H}^{(L_{s})}$, which incorporates each node’s own features, its interactions, and other latent information. Let row $i$ of this matrix, denoted $\hat{h}_{i}^{(L_{s})}$, be the embedding representation of learner node $c_{i}$. Since the prediction task in this paper is essentially a semi-supervised classification task, the embedding representation is fed into the MLP classifier to obtain

$$\chi_{i} = \mathrm{softmax}\left(\omega_{s} \hat{h}_{i}^{(L_{s})}\right) \tag{14}$$

where $\chi_{i} = \{\chi_{i1}, \chi_{i2}, \chi_{i3}, \chi_{i4}\}$ is the predicted probability of node $c_{i}$ for each of the four categories and $\omega_{s}$ is the trainable parameter matrix. The training process is optimized with the cross-entropy loss. Considering the complexity of the model, an $L_{2}$ regularization term is introduced to prevent overfitting during training, and the final loss function of the model is

$$L_{\text{A-MTGCN}} = -\sum_{i \in D_{train}} \sum_{f=1}^{4} y_{i}^{f} \log(\chi_{if}) + \tau \sum_{l=1}^{L_{s}} \|\Theta^{(l)}\|_{2}^{2} \tag{15}$$

where $D_{train}$ is the set of training node indices; $y_{i}^{f}$ is a binary indicator that equals 1 when node $c_{i}$ belongs to category $f$ and 0 otherwise; $\tau$ is the regularization factor; and $\|\cdot\|_{2}$ is the 2-norm.
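The classification step of Equation (14) and the loss of Equation (15) can be illustrated with the short NumPy sketch below. It treats node embeddings as rows, interprets the squared 2-norm of a weight matrix as the sum of its squared entries, and adds a small constant inside the logarithm for numerical safety; these are assumptions of the sketch, as are the function name and the toy inputs.

```python
import numpy as np

def a_mtgcn_loss(H_final, omega, Y_onehot, train_idx, thetas, tau):
    """Softmax classification (Eq. 14) and regularized cross-entropy loss (Eq. 15)."""
    logits = H_final @ omega                              # shape (N, 4)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    chi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy over labeled training nodes only (semi-supervised setting).
    ce = -np.sum(Y_onehot[train_idx] * np.log(chi[train_idx] + 1e-12))
    # Assumed reading of the squared 2-norm: sum of squared entries of each weight matrix.
    l2 = tau * sum(np.sum(theta ** 2) for theta in thetas)
    return ce + l2, chi

rng = np.random.default_rng(0)
H_final = rng.random((6, 5))                 # 6 learners, 5-dimensional embeddings (toy)
omega = np.zeros((5, 4))                     # zero weights give uniform class probabilities
labels = np.array([0, 1, 2, 3, 0, 1])        # four learner categories
Y_onehot = np.eye(4)[labels]
train_idx = np.array([0, 1, 2])              # only these nodes contribute to the loss
thetas = [np.ones((2, 2))]                   # stand-in for the GCN weight matrices

loss, chi = a_mtgcn_loss(H_final, omega, Y_onehot, train_idx, thetas, tau=0.01)
```

With all-zero classifier weights each class receives probability 1/4, so the cross-entropy part equals the number of training nodes times log 4, which gives a simple sanity check for an implementation.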

4. Experimental Results and Analysis

4.1. Dataset Construction

This study analyzes the learning data of 4400 students from over 300 universities nationwide who participated in massive open online courses (MOOCs) during the 2023–2024 academic year. The data covers courses such as Digital Signal Processing, Digital Image Processing, Circuits, Analog Electronics, Microcontroller Principles, and Interface Technology, and it integrates learning behavior records spanning multiple semesters. A partner university in Shandong Province uniformly collected, anonymized, and standardized all data; the sample is not limited to students from that institution. The data includes five primary categories: learner demographics, learning engagement, learning interactions, daily performance, and academic performance. These categories encompass 14 specific attributes, including gender, student ID, study duration, number of study sessions, number of assignments submitted, and video rumination ratio. See Table 1 for specific details. This dataset originates from a partner university’s teaching platform and is classified as a non-public educational research dataset. Throughout the research process, strict adherence to privacy protection guidelines was maintained, and sensitive information, such as student ID numbers and gender, was anonymized and encrypted. Therefore, the raw data will not be made publicly available at this time.

Basic information comprises two attributes, gender and student ID, which describe the learner. Learning engagement comprises four attributes (learning duration, learning frequency, number of submitted tasks, and video rumination ratio) and measures how engaged a learner is in online learning. Learning interaction comprises three attributes (likes, comments, and replies to comments) and measures a learner’s online interaction. Daily performance comprises four attributes (audio score, chapter test score, academic score, and homework score) and evaluates a learner’s day-to-day learning. The overall grade is the final grade, computed as a weighted combination of the four daily performance scores.

This paper performs preprocessing on the raw data, including cleaning, transformation, and reduction. The main goal is to address issues such as outliers and missing values, making the data more suitable for modeling. This improves the model’s accuracy and reliability. Given the potential presence of noise and redundant information in the raw features during data processing, this paper employs data cleaning and standardization to mitigate the impact of outliers on the model. From the perspective of high-dimensional data modeling, future research could incorporate deep feature learning methods, such as Stacked Autoencoder (SAE), to perform nonlinear dimensionality reduction and representation optimization on learner features. This would further enhance the model’s adaptability to complex, high-dimensional data. After the online learning data is preprocessed, a standardized dataset is obtained for subsequent modeling and prediction. Additionally, to protect learners’ privacy, the data was anonymized. The study strictly adheres to the principle of data minimization by retaining only the features necessary for the learning situation prediction task. Future work could further integrate federated learning with homomorphic encryption mechanisms to enable cross-platform, collaborative data modeling while preventing the leakage of sensitive information. In this study, the learning situation status is the target variable for the classification task. Based on a comprehensive assessment of course grades and learning behaviors, learners are categorized into one of four groups: Excellent, Good, Average, and Passive. These labels are derived from grade ranges and reasonable adjustments using learning behavior features to ensure the validity and discriminative power of the labels.

4.2. Experimental Environment and Evaluation Indicators

The A-MTGCN model established in this paper is implemented in the PyTorch deep learning framework together with PyG (PyTorch Geometric), a library dedicated to graph-structured data that supports efficient and flexible implementation of graph neural network models. The hardware and software configuration used for the experiments in this section is shown in Table 2.

In order to validate the performance of the proposed A-MTGCN model, it is analyzed in this paper in comparison with support vector machine (SVM), Random Forest (RF), MLP, GCN, Relational Graph Convolutional Network (RGCN), and MGAT models, covering two traditional machine learning algorithms and four deep learning algorithms. The parameters of all machine learning baseline models are optimized using cross-validation, while the parameter settings of the deep learning models are kept consistent with the A-MTGCN model. There are several adjustable parameters in the A-MTGCN algorithm: the number of network layers is tuned in the range {1, 2, 3, 4}; the random dropout ratio is set to 0.5; the similarity threshold ε for edge construction is selected in the range {0.5, 0.6, 0.7, 0.8, 0.9}; and the embedding dimension is selected in the range {32, 64, 128, 256}. The Adam optimizer is adopted in the experiments, with a learning rate of 0.001 and an L2-regularization factor of 5 × 10⁻³, and the ReLU function is used as the activation function.
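The settings listed above can be collected into a small configuration sketch. The dictionary keys and the helper `iter_configs` are illustrative names, not taken from the authors’ code, and the text does not state whether the full grid was searched or each parameter was varied separately; the full Cartesian grid shown here is an assumption.

```python
from itertools import product

# Search ranges as reported in the text (key names are hypothetical).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "edge_threshold": [0.5, 0.6, 0.7, 0.8, 0.9],  # similarity threshold epsilon
    "embed_dim": [32, 64, 128, 256],
}

# Settings that the text reports as fixed.
FIXED = {
    "dropout": 0.5,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "weight_decay": 5e-3,  # L2-regularization factor
    "activation": "ReLU",
}


def iter_configs(space, fixed):
    """Yield one complete configuration per point of the Cartesian grid."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}


configs = list(iter_configs(SEARCH_SPACE, FIXED))
print(len(configs), configs[0])
```

Varying one parameter at a time, as a sensitivity analysis typically does, would need only 4 + 5 + 4 runs instead of the full grid.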

Learning situation prediction is defined in this paper as a node classification problem, and model performance is evaluated using four metrics: accuracy, precision, recall, and F1-score. Accuracy is the proportion of correctly classified samples among all samples and measures the overall performance of the model. Precision is the proportion of samples predicted as belonging to a particular category that actually belong to that category. Recall, on the other hand, is the proportion of samples actually belonging to a category that the model correctly identifies. F1-score is the harmonic mean of precision and recall and combines the two. Used together, these metrics measure the classification ability of the algorithm from different dimensions and more accurately reflect its overall performance in practical application scenarios.
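As an illustration of how the four metrics are computed for a four-class task, the following minimal sketch uses macro averaging over classes. The averaging scheme is an assumption, since the text does not state it, and the labels and predictions are made-up toy values rather than data from the study.

```python
def classification_metrics(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,  # mean of per-class F1, not F1 of the means
    }


LABELS = ["Excellent", "Good", "Average", "Passive"]
# Toy labels for eight learners (hypothetical).
y_true = ["Excellent", "Excellent", "Good", "Good",
          "Average", "Average", "Passive", "Passive"]
y_pred = ["Excellent", "Good", "Good", "Good",
          "Average", "Passive", "Passive", "Passive"]

report = classification_metrics(y_true, y_pred, LABELS)
print(report)
```

Under macro averaging, the F1-score is the mean of the per-class F1 values, so it can fall below both the averaged precision and the averaged recall. A harmonic mean of two single numbers, by contrast, always lies between them.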

4.3. Experiments and Analysis of Results

4.3.1. Comparative Experiments

To validate the performance of the A-MTGCN model, we conducted comparative experiments on the in-house dataset described above against the baseline models SVM, RF, MLP, GCN, GAT, RGCN, Multiview, and MGAT. To minimize the impact of random error on the experimental outcomes, we evaluated the models using 5-fold cross-validation in addition to the original five replicates and took the average of these results as the final outcome. To verify the statistical significance of the performance improvement, we analyzed the results with a paired t-test. The results are shown in Table 3, where the best results are in bold.

As shown in Table 3, the A-MTGCN model yielded the best results in all metrics, achieving 92.53%, 89.15%, 92.27%, and 87.83% across the four evaluation metrics, respectively. Compared to the second-best model (MGAT), the A-MTGCN model improved by 1.12%, 0.51%, 1.81%, and 0.41%, respectively. This indicates that the A-MTGCN model has a consistent advantage across multiple dimensions. To further validate the effectiveness of the proposed method, a paired Student’s t-test was conducted based on the results obtained from the 5-fold cross-validation. The statistical analysis shows that A-MTGCN achieves highly significant improvements (p < 0.01) over GAT in both accuracy and F1-score, with t-values of 4.86 and 5.42, respectively. In comparison with MGAT, A-MTGCN also demonstrates statistically significant improvements (p < 0.05), with t-values of 3.12 for accuracy and 2.87 for F1-score. Similarly, when compared with the Multiview model, the proposed method achieves significant improvements, with t-values of 3.68 (accuracy) and 3.21 (F1-score), corresponding to p-values below 0.05. These results confirm that the performance improvements of A-MTGCN are statistically significant and not due to random variations.
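The paired t-test on fold-wise results can be sketched as follows. The fold scores below are hypothetical placeholders, not the study’s per-fold results. The critical values are the standard two-sided Student’s t thresholds for 4 degrees of freedom, which is what five paired folds give.

```python
import math
from statistics import mean, stdev

# Two-sided critical values of Student's t for df = 4 (five paired folds).
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604}


def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (sd(d) / sqrt(n)) for the fold-wise differences d."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def significance_level(t_value, table=T_CRIT_DF4):
    """Return the smallest tabulated alpha that |t| exceeds, or None."""
    for alpha in sorted(table):
        if abs(t_value) > table[alpha]:
            return alpha
    return None


# Hypothetical per-fold accuracies (%) for two models.
model_a = [93.0, 92.0, 92.5, 93.0, 92.0]
model_b = [91.5, 91.0, 91.5, 91.0, 91.0]
t_value = paired_t_statistic(model_a, model_b)
print(round(t_value, 2), significance_level(t_value))

# The t-values reported in the text can be checked against the same table.
for reported in (4.86, 5.42, 3.12, 2.87, 3.68, 3.21):
    print(reported, significance_level(reported))
```

With 4 degrees of freedom, the reported t-values of 4.86 and 5.42 exceed the 0.01 threshold and the remaining four exceed the 0.05 threshold, which matches the significance levels stated above.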

Overall, SVM and RF performed the worst. These two traditional machine learning methods rely solely on manual feature engineering, which makes it difficult to effectively capture the latent relationships among online learners. This limits their ability to represent high-dimensional data, thereby affecting classification performance. Among the remaining deep learning algorithms, MLP outperformed SVM and RF but still lagged behind GCN, RGCN, and MGAT. This is because MLP is a pointwise learning model that cannot leverage adjacency relationships among learners to propagate information. This limits the model’s ability to predict learner performance. GCN and GAT use a neighbor feature aggregation mechanism that enables node embeddings to incorporate information about surrounding learners. This results in significantly higher prediction accuracy than MLP. Additionally, RGCN and MGAT optimize the information propagation mechanism of GCNs further. The RGCN enhances the ability to learn heterogeneous relationships through relationship modeling. The MGAT adaptively adjusts the weights of different neighboring nodes via an attention mechanism to improve classification performance. Nevertheless, these methods primarily aggregate information based on a single or limited topological structure. The Multiview model outperforms the graph convolutional neural network (GCN) by integrating view information under multiple similarity metrics. However, it remains slightly inferior to the multi-graph attention transformer (MGAT), which incorporates an attention mechanism. These results suggest that fusing multi-view information positively impacts node classification tasks, though improvements to the fusion method are still needed. By contrast, A-MTGCN produced the best results across all evaluation metrics. This suggests that integrating multi-topological modeling and attention mechanisms effectively combines structural information from various similarity metrics. 
This significantly enhances the model’s expressive power.

4.3.2. Parameter Sensitivity Experiment

This paper investigates the sensitivity of the A-MTGCN model to its parameters by conducting a systematic series of experiments and analyses. This study focuses on three key parameters: the number of graph convolutional layers, the node embedding dimension, and the similarity threshold in the edge construction process. These parameters are examined for their impact on model performance. Pillai et al.’s research [22] indicates that hyperparameter optimization is a key method for mitigating overfitting in deep learning models and enhancing their generalization ability. This research also provides the core theoretical basis for the parameter sensitivity analysis in this paper.

(1): Number of layers of the graph convolutional network

In the A-MTGCN model, the number of graph convolution layers directly affects the model’s ability to extract information about the graph structure: each additional layer increases the order of the neighborhood from which node information is aggregated. The experimental results for 1 to 4 GCN layers are shown in Figure 9. The number of layers has a significant effect on model performance. The model performs best with 2 layers, which indicates a good balance between the depth of information aggregation and the ability to discriminate features: the model captures the higher-order relationships in the graph structure while avoiding the information loss caused by over-smoothing. As the number of layers increases further, performance decreases significantly on all metrics. This is mainly due to over-smoothing: after too many convolutional layers are stacked, node representations gradually become similar, which weakens the differences between node features and, in turn, the classification ability of the model.
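The over-smoothing effect can be illustrated in isolation with a toy sketch. It applies only the neighborhood-averaging step, with self-loops and mean aggregation and without learned weights or nonlinearities, on a hypothetical four-node path graph. This is a simplification of a GCN layer, not the model used in the paper.

```python
def mean_aggregate(neighbors, h):
    """One propagation step: each node averages itself and its neighbors."""
    return [
        sum(h[j] for j in [i, *neighbors[i]]) / (1 + len(neighbors[i]))
        for i in range(len(h))
    ]


def feature_spread(h):
    """Gap between the largest and smallest node feature."""
    return max(h) - min(h)


# Hypothetical path graph 0-1-2-3 with two clearly separated groups.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = [0.0, 0.0, 1.0, 1.0]

spreads = [feature_spread(h)]
for _ in range(4):  # stack four aggregation steps
    h = mean_aggregate(neighbors, h)
    spreads.append(feature_spread(h))

print([round(s, 3) for s in spreads])
```

Each step replaces a node’s feature with a convex combination of nearby features, so the spread can never grow. As steps are stacked, the two groups drift toward a common value, which is the loss of discriminability described above.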

(2): Node Embedding Dimension

The embedding dimension of graph convolutional neural networks determines the complexity of the mapping relationships they can fit. Lower embedding dimensions may not fully express the feature information of nodes, resulting in limited model performance. However, higher embedding dimensions may introduce redundant information, leading to increased computational costs and overfitting of the model. This article experimented with four embedding dimensions, 32, 64, 128, and 256, and the results are shown in Figure 10. It can be seen that embedding dimensions have a significant impact on model performance. As the dimension increases from 32 to 128, the performance of the model gradually improves. When the dimension is 128, the model reaches its optimal performance, indicating that higher dimensions can enhance feature expression ability and improve classification performance. However, as the dimensionality further increased to 256, various indicators showed a decline due to the introduction of redundant information in high dimensions, which increased the complexity of the model and led to overfitting.

(3): Similarity Threshold

In graph construction, the similarity threshold ε determines which edges connect nodes, and a reasonable value is crucial for the connectivity of the graph and the final performance of the model. Too low a threshold may produce many meaningless edges, which increases computational complexity, while too high a threshold may produce an overly sparse graph, which weakens information propagation. In this paper, experiments are conducted for thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9, and the results are shown in Figure 11. The model performs best when the threshold is 0.6, which indicates that the graph constructed at this setting best retains the effective correlation information between nodes. As the threshold increases beyond this value, accuracy, precision, recall, and F1-score all show a decreasing trend, and the model performs worst at 0.9. At such a high threshold, the graph constructed between learner nodes becomes highly sparse and loses a large amount of edge information. The missing edges limit the depth and breadth of information propagation in the graph convolutional network, so the model cannot learn effective features. In real-world large-scale application scenarios, meta-heuristic optimization methods (such as Moth-Flame Optimization and Walrus Optimization) could be used to automatically tune the similarity threshold ε and other key model parameters, improving the efficiency and flexibility of model deployment. Such an approach could identify well-performing parameter combinations quickly and help stabilize the model’s predictive performance. By optimizing parameter configurations, it could also reduce redundant computational overhead and improve the model’s deployment efficiency, adaptability, and practical value in large-scale scenarios.
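The effect of the threshold on graph density can be sketched with a small example. The feature vectors are hypothetical, cosine similarity stands in for whichever similarity measure defines a given topology, and the rule "keep an edge when similarity ≥ ε" is an assumption about the exact comparison used.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_edges(features, threshold):
    """Return weighted edges (i, j, similarity) whose similarity >= threshold."""
    edges = []
    for i, j in combinations(range(len(features)), 2):
        sim = cosine_similarity(features[i], features[j])
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Hypothetical normalized learner features (e.g. engagement, interaction, score).
learners = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.5, 0.5, 0.5],
]

THRESHOLDS = [0.5, 0.6, 0.7, 0.8, 0.9]
edge_counts = [len(build_edges(learners, eps)) for eps in THRESHOLDS]
for eps, count in zip(THRESHOLDS, edge_counts):
    print(f"epsilon={eps}: {count} edges")
```

Raising ε can only remove edges, so the edge count is non-increasing in the threshold. The sparsification at ε = 0.9 described above is the extreme end of this trend.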

4.3.3. Model Complexity Experiment

This paper also evaluates the computational cost of the models, measuring the complexity of the models in terms of time consumption and memory usage. Figure 12 gives the time and memory consumed for a single round of training on the online learning dataset for the A-MTGCN model in this paper, as well as for the MLP, GCN, RGCN, GAT, MGAT, and Multiview models.

Figure 12 shows that there are significant differences among the models in terms of training time and memory consumption. The MLP has the lowest values for both. This is because the MLP does not involve graph structures and performs only fully connected computations. This results in the shortest training time and lowest memory usage. In contrast, since the GCN requires the aggregation of adjacency information, its computational complexity increases, leading to longer training times and higher memory usage. Building on this foundation, GAT introduces a node-level attention mechanism that adaptively weights the importance of neighboring nodes. This endows the model with stronger expressive power. However, this introduces additional computational overhead, resulting in higher resource consumption than GCN. The Multiview model enhances representational capacity by integrating graph structural information from multiple perspectives, but this significantly increases computational complexity. Its time and memory consumption are notably higher than those of single-topology models. This indicates that, while multi-view modeling can improve performance, it also imposes an additional resource burden. MGAT increases computational overhead while improving the model’s expressive power because it requires dynamic weighting of the importance of a learner’s neighboring nodes. RGCN performed the worst because it adds different relationship types to a graph neural network. It learns independent weights for each relationship’s neighboring nodes and propagates information accordingly. In online learning datasets, interactions between learners may involve multiple relationship types. This requires RGCN to model multiple weight matrices, thereby increasing computational and memory requirements. The A-MTGCN model, which is presented in this paper, fuses cross-topological information through multi-topological structures and attention mechanisms. 
Although this approach requires more computing power than the simple graph structure of GCN, it uses significantly less time and memory than MGAT and RGCN. Furthermore, it outperforms multi-view models, such as Multiview, achieving optimal classification performance while maintaining reasonable memory consumption and training time, thus striking a balance between computational efficiency and performance.

In summary, the A-MTGCN method proposed in this paper has a clear advantage over MGAT and RGCN in computational cost, although its cost is slightly higher than that of traditional methods such as MLP and GCN. The time consumption of A-MTGCN stays within 1 s per training round and its memory usage is below 1 GB, which makes it practical.
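A per-round cost measurement of this kind can be sketched with the Python standard library. The text does not say which tool the authors used. `tracemalloc` only sees Python-level allocations, so GPU memory in a PyTorch run would need a framework counter such as `torch.cuda.max_memory_allocated()` instead. The `dummy_training_round` function is a hypothetical stand-in for one training epoch.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def dummy_training_round(n=100_000):
    """Placeholder workload standing in for a single training round."""
    values = [i * 0.5 for i in range(n)]
    return sum(values)


result, elapsed, peak = profile_call(dummy_training_round)
print(f"time: {elapsed:.4f} s, peak memory: {peak / 1e6:.2f} MB")
```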

4.3.4. Ablation Experiment

To analyze the effectiveness of each component of the A-MTGCN model in depth, we used the control variable method and designed multiple model variants for comparative testing. By removing or replacing one component at a time, we observed the change in model performance and quantified the contribution of each module to predicting learner performance. Here, A-MTGCN denotes the full prediction model proposed in this paper, which includes multi-topology construction, the cross-topology attention fusion mechanism, and the data augmentation module. A-MTGCN_Euc, A-MTGCN_cos, and A-MTGCN_Man adopt a single topology based on Euclidean distance, cosine similarity, and Manhattan distance, respectively, and retain the attention fusion and data augmentation modules. A-MTGCN_Euc+cos, A-MTGCN_cos+Man, and A-MTGCN_Man+Euc adopt a combination of two topologies and likewise retain the attention fusion and data augmentation modules. A-MTGCN_no-Att removes the attention-based (Att) data augmentation component from the learner graph construction phase and fuses the cross-topology information with fixed, equal weights rather than dynamically adjusted ones. A-MTGCN_no-fusion removes the cross-topology information fusion module entirely.

As shown in Table 4, the various constituent modules influence the model’s performance to different degrees. This suggests that each key module of the A-MTGCN model is essential for improving prediction accuracy and model robustness. Under a single-topology condition, the cosine similarity-based model achieved the highest accuracy of 88.69%, indicating that semantic similarity offers a distinct advantage in modeling learning states. Euclidean distance performed relatively stably. However, Manhattan distance exhibited slightly lower performance due to its sensitivity to sparse features. A single topology struggles to capture the complex relationships among learners and only captures information about local similarities. All combinations of two topologies outperform single-topology models. The combination of Euclidean distance and cosine similarity yields the best results with an accuracy rate of 90.57%, demonstrating the strong complementarity between the two. Multi-topology structures capture latent association information from three distinct perspectives: Euclidean distance, cosine similarity, and Manhattan distance. This enables more comprehensive modeling of multi-level relationships among learners, improving the model’s overall performance. Comparing A-MTGCN and A-MTGCN_no-Att reveals that replacing the attention mechanism with equal weights in cross-topology information fusion significantly decreases model performance across all four evaluation metrics. This indicates that the attention mechanism dynamically adjusts the importance of different topologies to achieve superior information fusion. Eliminating the cross-topology information fusion module degraded the model’s performance further, confirming the effectiveness of multi-perspective information fusion. 
The multi-topology design effectively integrates multidimensional behavioral characteristics, such as assignment completion, classroom participation, and mastery of key concepts, thereby enabling more accurate predictions of academic performance. Different topologies capture behavioral features that complement one another in predictive tasks. Cosine similarity excels at capturing semantic consistency in learning behavior, such as consistent assignment submission patterns. Euclidean distance reflects overall behavioral trends, such as a steady learning progress rate. Manhattan distance is sensitive to local variations, such as fluctuations in test scores. Together, these three metrics provide a comprehensive portrayal of the complex relationships among learners.
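The contrast between attention-weighted fusion and the equal-weight variant can be illustrated with a minimal sketch. Several parts are assumptions made for illustration: the conversion of distances to similarities via 1 / (1 + d), the fixed attention scores (which a real model would learn), and the toy feature vectors. None of these are taken from the paper.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_similarity(u, v):
    return 1.0 / (1.0 + math.dist(u, v))  # assumed distance-to-similarity map


def manhattan_similarity(u, v):
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(u, v)))


def similarity_matrix(features, sim):
    n = len(features)
    return [[sim(features[i], features[j]) for j in range(n)] for i in range(n)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(adjacencies, weights):
    """Weighted sum of the per-topology adjacency matrices."""
    n = len(adjacencies[0])
    return [
        [sum(w * adj[i][j] for w, adj in zip(weights, adjacencies)) for j in range(n)]
        for i in range(n)
    ]


features = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.1, 0.2, 0.9]]  # toy learners
topologies = [
    similarity_matrix(features, sim)
    for sim in (euclidean_similarity, cosine_similarity, manhattan_similarity)
]

attention_scores = [1.2, 2.0, 0.5]       # hypothetical; learned in practice
att_weights = softmax(attention_scores)  # attention-based fusion weights
equal_weights = [1 / 3] * 3              # equal-weight ablation variant

fused_att = fuse(topologies, att_weights)
fused_equal = fuse(topologies, equal_weights)
print([round(w, 3) for w in att_weights])
print(round(fused_att[0][1], 3), round(fused_equal[0][1], 3))
```

With equal weights, every topology contributes the same share regardless of how informative it is. The softmax weights let one view, here the cosine topology, dominate the fused graph.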

In summary, the A-MTGCN model effectively captures multi-level correlation information among learners through the synergistic interaction of its multi-topological structure, attention fusion mechanism, and data augmentation module. This enables high-precision score prediction while demonstrating high transparency and practical value in terms of interpretability.

5. Predictive Applications for Online Learners

5.1. Predictions of Process Performance

Based on the previous experimental verification of the model’s effectiveness in static node classification, we apply A-MTGCN again to investigate dynamic learning processes. This allows us to verify its temporal predictive capabilities and practical value over time. Predicting academic performance early on is crucial for boosting learning efficiency and optimizing the allocation of educational resources. If educators can accurately predict learners’ future performance early in a course, they can promptly intervene to provide personalized support and guidance, thus enabling early detection and intervention [23]. This paper applies the proposed A-MTGCN model to the prediction of process-level-learning progress based on dynamically updated online learning data. We provide valuable insights for both instructors and learners by dividing the course into multiple phases and using data from each phase to predict learners’ progress.

In this paper, learners are categorized into four group types: Excellent, Good, Average, and Passive. The learners’ online data from multiple weeks of the 2023–2024 academic year is divided along the time dimension into five stages, Stages 1 to 5. The cumulative learning characteristics up to the current stage are extracted as model inputs in order to predict learners’ future profile categories. The number of learners in each profile group and the corresponding percentage distribution in the initial stage are shown in Table 5.

This paper uses two metrics, accuracy and F1-score, to evaluate the prediction results. The experimental results are shown in Table 6 and Figure 13, with the best results in the table highlighted in bold. As can be seen in Table 6 and Figure 13, the A-MTGCN model’s prediction accuracy and F1-score for the different learner groups rise as the course progresses, with a significant increase from Stage 3 to Stage 5. This indicates that, as the course progresses, the model can use richer learner feature data to improve prediction accuracy. At the same time, accumulating feature data helps build a learner graph that incorporates more relationship information, which further enhances the model’s ability to capture complex interactions between learners and boosts overall prediction performance. The model predicts ‘Excellent’ learners best, with an accuracy of 92.53% and an F1-score of 87.83% at Stage 5. In contrast, prediction is weakest for ‘Passive’ learners (see Table 6 for the per-stage values). Although prediction for this group improves as the number of weeks increases, it still lags behind that of the other groups, indicating that the model is better at identifying highly engaged learners.

5.2. Dynamic Tracking of Individual Learner Profiles

Learners change dynamically during the course, and the model’s stage-by-stage predictions make it possible to track how each learner’s profile evolves. From each of the four profile groups (‘Excellent’, ‘Good’, ‘Average’, and ‘Passive’), one representative learner is randomly selected as an individual case; these learners are denoted S1, S2, S3, and S4. We use the A-MTGCN model to track and record the predicted profile category of the four learners at each of the five stages and then construct the sequence of changes in their profiles to show how their learning status transforms. The results are presented in Scheme 1, where different colors represent different group types.

As seen in Scheme 1, the four learners, who start from different initial types, follow distinct profile evolution paths across the five stages of the course. Learner S1 maintains a high learning state throughout, fluctuating only briefly to ‘Good’ in Stage 3 before returning to ‘Excellent’. This shows stable learning engagement and cognitive control, together with good self-regulation and learning resilience. Learner S2, on the other hand, deteriorates gradually, especially in the latter part of the course, dropping to a lower profile category twice in succession and finally shifting to the ‘Passive’ type. This may be due to factors such as learning fatigue, lack of external motivation, or accumulated learning difficulties, which make learners prone to slipping in the middle and late stages of the course; process monitoring and dynamic intervention should therefore be strengthened. Learner S3’s profile shows a positive growth trend: it is relatively stable in the early stages, rises to ‘Good’ in the later stages, and then remains stable. This indicates that the learner developed stronger learning motivation in the middle and late stages and could develop further into an ‘Excellent’ learner with appropriate pedagogical support and guidance. Learner S4 rises to ‘Average’ twice, in Stages 2 and 4, showing some improvement, but fails to maintain this status and falls back to ‘Passive’. This indicates that the learner may face problems such as learning inertia, limited access to resources, or a lack of timely and effective feedback, and that targeted support and motivation should be provided early in the course.

5.3. Cohort Learner Portrait Leapfrog Analysis

To analyze in depth the dynamic changes in the learning status of learners from different groups, this paper conducted a transition analysis of learner profile categories across stages. A profile transition matrix was constructed to count how each type of learner changes between stages and to reveal group-level evolution trends. The vertical axis represents the learner profile group in the current stage and the horizontal axis represents the learner profile group in the next stage. The value in each cell is the proportion of learners transitioning from one profile type to another between stages. The darker the color, the higher the transition ratio. The results are shown in Figure 14.
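A transition matrix of this kind can be built from the predicted categories at two consecutive stages, as in the following minimal sketch. The learner labels are hypothetical toy data. Row-normalizing the counts, so that each row gives the share of a current group moving to each next-stage group, follows the description above.

```python
from collections import Counter

STATES = ["Excellent", "Good", "Average", "Passive"]


def transition_matrix(current, following, states=STATES):
    """Rows: group in the current stage; columns: group in the next stage.

    Each row is normalized to proportions (all zeros if the group is empty).
    """
    counts = Counter(zip(current, following))
    matrix = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        matrix.append(
            [counts[(src, dst)] / row_total if row_total else 0.0 for dst in states]
        )
    return matrix


# Hypothetical predicted categories for twelve learners at two stages.
stage_3 = ["Excellent"] * 4 + ["Good"] * 4 + ["Average"] * 2 + ["Passive"] * 2
stage_4 = ["Excellent", "Excellent", "Excellent", "Good",
           "Excellent", "Good", "Good", "Average",
           "Good", "Average",
           "Passive", "Passive"]

matrix = transition_matrix(stage_3, stage_4)
for state, row in zip(STATES, matrix):
    print(f"{state:>9}: {[round(p, 2) for p in row]}")
```

The diagonal entries are the retention rates of each group, and the off-diagonal entries in a row split the remaining learners between upward and downward transitions.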

As can be seen in Figure 14, different groups of learners show clear dynamic transitions over the course. Overall, the ‘Excellent’ and ‘Passive’ groups are highly stable, with a retention rate of more than 70% in each of the four time phases. This indicates that ‘Excellent’ learners participate at a consistently high level, while ‘Passive’ learners show strong inertia and remain in a relatively low learning state. In contrast, the ‘Good’ and ‘Average’ groups show more active profile transitions, in both upward and downward directions. Some ‘Good’ and ‘Average’ learners moved to ‘Excellent’ in the middle and late stages. For example, the transition rate from ‘Good’ to ‘Excellent’ between Stage 3 and Stage 4 is 11%, while the total transition rate from ‘Average’ to ‘Good’ or ‘Excellent’ is more than 23%. This indicates that these two groups have significant growth potential with the right teaching support. Conversely, a small proportion of ‘Good’ and ‘Average’ learners deteriorate to ‘Passive’, which may indicate declining motivation or increasing learning pressure towards the end of the course.

In summary, the changes in the learning portraits of different types of learners show significant differences. The leap between individual and group portraits reveals the evolutionary trend of their learning behaviors and cognitive characteristics. This provides a basis for educators to accurately identify learners and implement targeted interventions.

6. Summary

Traditional machine learning methods struggle to capture the deep-seated relationships within online learning data, and this paper addresses this issue. Leveraging the powerful representation learning capabilities of graph convolutional networks (GCNs), we propose A-MTGCN: a multi-topology, GCN-based learning progress prediction model that incorporates an attention mechanism. This method constructs diverse learner relationship topologies from multiple perspectives by integrating similarity metrics based on Euclidean distance, cosine similarity, and Manhattan distance. The model further introduces an attention mechanism that dynamically weights these different topological structures. This achieves efficient cross-graph information fusion and enhances the model’s ability to capture complex learning behavior patterns. This paper performs a systematic experimental evaluation to validate the effectiveness of A-MTGCN on online learning datasets. The evaluation covers comparative experiments based on various baseline models, as well as ablation studies and parameter sensitivity analyses. The experimental results demonstrate that the proposed algorithm exhibits favorable generalization ability and effectiveness. Furthermore, we used the improved model to conduct experiments on process-based prediction of online learning data. We made learning progress predictions and established individual profiles for different types of learner groups across multiple stages. We also constructed a multi-stage learner state transition matrix and revealed the transition patterns of different types of learners during the learning process. Although GCNs based on multi-topology and attention mechanisms demonstrate excellent performance in predicting student academic performance, improvements are still needed in terms of real-time performance and computational cost in large-scale educational settings. 
Future research could combine a lightweight network design with swarm intelligence optimization algorithms, such as the Moth-Flame and Walrus methods, to reduce the computational cost of the model. Time-series-aware frameworks could then be used to iteratively refine the learner state transition logic. Combined with comprehensive IoT data collection and federated, privacy-preserving encryption architectures, these extensions could improve the model’s deployment flexibility, scalability, and data security in complex educational settings.
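
The topology construction and attention-weighted fusion summarized in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors’ implementation: the mapping from distance to similarity (`1 / (1 + d)`), the threshold value, the fixed attention scores, and all function names are hypothetical, and a trained model would learn the attention scores rather than take them as constants.

```python
import math

def euclidean_sim(a, b):
    # Assumption: distance is mapped to a similarity in (0, 1] via 1 / (1 + d).
    return 1.0 / (1.0 + math.dist(a, b))

def manhattan_sim(a, b):
    # Same assumed mapping, applied to the Manhattan (L1) distance.
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def build_topology(features, sim, threshold):
    """Weighted adjacency matrix: keep an edge only if similarity >= threshold."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = sim(features[i], features[j])
            if s >= threshold:
                adj[i][j] = adj[j][i] = s
    return adj

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(topologies, scores):
    """Attention-style fusion: softmax-normalized weights over the topologies."""
    weights = softmax(scores)
    n = len(topologies[0])
    return [[sum(w * t[i][j] for w, t in zip(weights, topologies))
             for j in range(n)] for i in range(n)]

# Hypothetical, already-normalized feature vectors for four learners.
X = [[0.9, 0.8], [0.85, 0.75], [0.2, 0.3], [0.1, 0.9]]
topologies = [build_topology(X, sim, threshold=0.5)
              for sim in (euclidean_sim, cosine_sim, manhattan_sim)]
# Fixed scores stand in for attention logits that a trained model would learn.
fused = fuse(topologies, scores=[0.2, 0.5, 0.1])
```

The fused matrix remains symmetric with a zero diagonal, so it can serve as the weighted learner graph passed to a graph convolution layer.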

Author Contributions

All authors contributed to the study design, data analysis, and manuscript drafting. The study was conceptualized by D.F. The data management and analysis were carried out by Y.T., C.L. and F.Z. The software portion was carried out by L.F. The methodology was co-conducted by Y.T., L.F. and C.L. The first draft of the manuscript was written by D.F., Y.T. and L.F., and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shandong Province Graduate High Quality Education and Teaching Resources Project (number: SDYAL2024037, funder: Shandong Provincial Department of Education), Key Project of the China Association of Higher Education: Research and Application Exploration on the Construction of a Stereoscopic Atlas for Electronic Information Specialty Courses from the Discipline Perspective (number: 25KC0202, funder: China Society of Higher Education), Natural Science Foundation of China-Shandong Joint Fund (number: ZR2022LZH001, funder: Shandong Provincial Department of Science and Technology), and Natural Science Foundation of Shandong Province (number: ZR2023MF004, funder: National Natural Science Foundation of China).

Data Availability Statement

The dataset used in this study is from a third-party platform, and the relevant data are collected by the platform in compliance with the requirements during normal operation. The data and materials supporting this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors express sincere gratitude to the teaching and research team of the relevant courses for their support in the collection and sorting of online learning data. We also appreciate the technical support provided by the laboratory research platform during the model training and experimental verification process of this study. During the preparation of this manuscript, the authors used ChatGPT-5.2 for the purposes of polishing the language expression of the manuscript and optimizing the logical structure of the research content description. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. All authors confirm that, within the past 3 years, there are no financial or non-financial interests directly or indirectly related to this submission, including no funding obtained from profit-related organizations, no relevant financial holdings or remunerations, and no professional or personal relationships that may introduce bias into the research. The funding sponsors had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

A-MTGCN	Attention-Based Multi-Topological Graph Convolution
LMS	Learning Management System
DTBW	Dynamic Time Warping with Black Widow Optimization
DTFO	Dynamic Time Warping with Fox Optimization
LSTM	Long Short-Term Memory
IoT	Internet of Things
GNNs	Graph Neural Networks
HGNN	Heterogeneous Graph Neural Network
GCN	Graph Convolutional Network
MGAT	Multimodal Graph Attention Network for Recommendation
DGEKT	Dual Graph Ensemble Learning Method for Knowledge Tracing
GKTPR	Graph-Based Knowledge Tracing for Performance Recommendation
MLP	Multi-Layer Perceptron
SVM	Support Vector Machine
RF	Random Forest
RGCN	Relational Graph Convolutional Network

References

1. Mirza, N.M.; Ali, A.; Musa, N.S.; Ishak, M.K. Enhancing Task Management in Apache Spark Through Energy-Efficient Data Segregation and Time-Based Scheduling. IEEE Access 2024, 12, 105080–105095.
2. Wang, J.; Wang, P.G. Exploration of Precision Teaching Mode Driven by the Integration of Knowledge Graph and Learning Situation Data. High. Educ. J. 2025, 11, 107–111.
3. Chen, J. Effects of learning analytics-based feedback on students’ self-regulated learning and academic achievement in a blended EFL course. System 2024, 124, 103388.
4. Albreiki, B.; Zaki, N.; Alashwal, H. A Systematic Literature Review of Student’ Performance Prediction Using Machine Learning Techniques. Educ. Sci. 2021, 11, 552.
5. Riestra-González, M.; del Puerto Paule-Ruíz, M.; Ortin, F. Massive LMS log data analysis for the early prediction of course-agnostic student performance. Comput. Educ. 2021, 163, 104108.
6. Alshabandar, R.; Hussain, A.; Keight, R.; Khan, W. Students performance prediction in online courses using machine learning algorithms. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–7.
7. Lu, X. Modern Education: Advanced Prediction Techniques for Student Achievement Data. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1294–1308.
8. Wang, K. Optimized ensemble deep learning for predictive analysis of student achievement. PLoS ONE 2024, 19, e0309141.
9. Junejo, N.U.R.; Huang, Q.; Dong, X.; Wang, C.; Zeb, A.; Humayoo, M.; Zheng, G. SAPPNet: Students’ academic performance prediction during COVID-19 using neural network. Sci. Rep. 2024, 14, 24605.
10. Li, M.Y.; Wang, X.D.; Ruan, S.L.; Zhang, K.; Liu, Q. Student Performance Prediction Model Based on Two-Way Attention Mechanism. J. Comput. Res. Dev. 2020, 57, 1729–1740.
11. Neha, K.; Kumar, R.; Sankat, M. A comprehensive study on student academic performance predictions using graph neural network. In Concepts and Techniques of Graph Neural Networks; IGI Global: Hershey, PA, USA, 2023; pp. 167–185.
12. Tao, Z.; Ouyang, C.; Liu, Y.; Chung, T.; Cao, Y. Multi-head attention graph convolutional network model: End-to-end entity and relation joint extraction based on multi-head attention graph convolutional network. CAAI Trans. Intell. Technol. 2023, 8, 468–477.
13. Wang, S.; Zhang, X.; Zhang, Z.; Zhao, L. A multidimensional question representation method based on heterogeneous graph neural networks. In Eighth International Conference on Artificial Intelligence and Pattern Recognition (AIPR 2025); SPIE: Bellingham, WA, USA, 2025; Volume 13993, pp. 899–908.
14. Fang, X.; Dong, H. Intelligent Recommendation and Optimization System for Student Management Information Using Knowledge Graphs. In Proceedings of the 2025 International Conference on Artificial Intelligence, Virtual Reality and Interaction Design, Dongguan, China, 17–19 October 2025; pp. 424–428.
15. Fan, D.; Qian, Y.; Zhang, Y.; Zhang, T.; Yu, M.; Yu, G. An LLM Augmentation and Multitask Learning Based Recommendation Model for MOOCs. In 2025 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA); IEEE: New York, NY, USA, 2025; pp. 696–704.
16. Du, L.; Liu, H.; Li, W. Pedestrian flow prediction using a spatiotemporal multi-head attention graph convolutional network integrated with knowledge graph. Appl. Intell. 2025, 55, 896.
17. Cui, C.; Yao, Y.; Zhang, C.; Ma, H.; Ma, Y.; Ren, Z.; Zhang, C.; Ko, J. DGEKT: A Dual Graph Ensemble Learning Method for Knowledge Tracing. ACM Trans. Inf. Syst. 2024, 42, 1–24.
18. Zhang, J. Graph-based Knowledge Tracing for Personalized MOOC Path Recommendation. J. Adv. Comput. Syst. 2025, 5, 1–15.
19. Achari, S.; Kumar, S.; Beniwal, R. GCN-SynDCL: Imbalanced Node Classification via GCN and Synthetic Node Generation using Dynamic Curriculum Learning. IEEE Trans. Netw. Sci. Eng. 2026, 13, 6243–6257.
20. Xia, Z.; Dong, N.; Wu, J.; Ma, C. Multivariate Knowledge Tracking Based on Graph Neural Network in ASSISTments. IEEE Trans. Learn. Technol. 2023, 17, 32–43.
21. Airlangga, G. Predicting Student Performance Using Deep Learning Models: A Comparative Study of MLP, CNN, BiLSTM, and LSTM with Attention. MALCOM Indones. J. Mach. Learn. Comput. Sci. 2024, 4, 1561–1567.
22. Pillai, S.E.V.S.; Nadella, G.S.; Meduri, K.; Priyadharsini, N.A.; Bhuvanesh, A.; Kumar, D. A Walrus optimization-enhanced long short-term memory model for credit fraud detection in banking. Int. J. Inf. Technol. 2025, 2025, 1–17.
23. Li, Y.F.; Zhang, H.F.; Liu, S.L.; Tang, B. Research Progress on Educational Data Mining. Comput. Eng. Appl. 2019, 55, 15–23.

Figure 1. General framework of the A-MTGCN algorithm. On the left is the “learner graph construction” section, which demonstrates the process of assigning weights to features based on the attention mechanism. In the middle is the “graph convolutional feature fusion” section, where three topological structures are fused into a global graph representation through the attention mechanism, followed by graph convolution operations. On the right is the “learning situation prediction” section, where an MLP is used to classify and output the learning situation status (Excellent, Good, Average, and Passive) of learners.

Figure 2. Learner node relationship diagram.

Figure 3. Feature enhancement diagram.

Figure 4. Multi-topology construction diagram. The figure illustrates the construction of three different topological subgraphs based on similarity measures, starting from a single feature set. The white circle set in the middle represents the original learner nodes and feature set, which is the common starting point for all topologies. The orange subgraph is a topology constructed based on Euclidean distance; the green subgraph is a topology constructed based on cosine similarity; and the pink subgraph is a topology constructed based on Manhattan distance.

Figure 5. MLP mapping process.

Figure 6. Learner node feature aggregation diagram.

Figure 8. Residual connection schematic diagram.

Figure 9. The influence of the number of GCN layers on the model.

Figure 10. The effect of the number of embedded dimensions on the model.

Figure 11. Similarity threshold effect on the model.

Figure 12. Experimental results of model complexity.

Figure 13. Predicted results of learner group types at different time points.

Scheme 1. Dynamic tracking results of individual learner profiles.

Figure 14. Profile transition maps of different types of learner groups at various time points.

Table 1. Attribute information of the online learning data features.

Feature Category	Attribute	Specific Description
Basic information	Gender	Gender of learners (m/f)
Basic information	Student ID	Unique identifier for learners
Learning engagement	Learning duration	Total learning time (minutes)
Learning engagement	Learning frequency	Number of visits to the course
Learning engagement	Task submission	The total number of completed and submitted tasks
Learning engagement	Video rumination ratio	The proportion of repeated watches for learning videos
Learning interaction	Number of likes	The number of likes in the discussion forum
Learning interaction	Make comments	Number of comments posted in the discussion forum
Learning interaction	Reply to comments	The number of replies to others’ comments
Learning interaction	Audio score	Video viewing score
Daily performance	Chapter test results	The average score of each chapter test
Daily performance	Academic performance	Academic progress and grades
Daily performance	Homework result	Average score of homework
Overall result	Final mark	Three types of daily performance scores are weighted together

Table 2. Experimental environment.

Name	Version
Operating system	Windows 10
Python	3.9.12
PyTorch	1.11.0
CPU	Intel Core i7-12700H
GPU	NVIDIA GeForce RTX 3060

Table 3. Comparison of node classification results of different algorithms on the self-built dataset (mean ± standard deviation).

Model	Accuracy	Precision	Recall	F1-Score
SVM	0.7625 ± 0.0136	0.7513 ± 0.0141	0.7548 ± 0.0148	0.7485 ± 0.0142
RF	0.7856 ± 0.0124	0.7520 ± 0.0127	0.7754 ± 0.0131	0.7552 ± 0.0126
MLP	0.8425 ± 0.0098	0.8284 ± 0.0104	0.8332 ± 0.0109	0.8194 ± 0.0101
GCN	0.8548 ± 0.0089	0.8363 ± 0.0093	0.8493 ± 0.0101	0.8298 ± 0.0090
GAT	0.8612 ± 0.0092	0.8421 ± 0.0096	0.8536 ± 0.0104	0.8365 ± 0.0095
RGCN	0.8851 ± 0.0071	0.8616 ± 0.0080	0.8726 ± 0.0085	0.8312 ± 0.0083
Multiview	0.9026 ± 0.0064	0.8784 ± 0.0069	0.8942 ± 0.0073	0.8617 ± 0.0067
MGAT	0.9141 ± 0.0062	0.8864 ± 0.0065	0.9046 ± 0.0070	0.8742 ± 0.0064
A-MTGCN	0.9253 ± 0.0050	0.8915 ± 0.0055	0.9227 ± 0.0062	0.8783 ± 0.0053
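
In Table 3, the F1-score of several models is lower than both their precision and their recall. One reading that would be consistent with this is macro-averaging, where each metric is computed per class and then averaged. The averaging scheme is an assumption here, because it is not stated alongside the table. The toy example below uses hypothetical labels and function names to show that the macro F1-score can fall below both the macro precision and the macro recall.

```python
def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} computed one-vs-rest."""
    out = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))

# Hypothetical toy labels for three samples (illustrative only).
y_true = ["A", "A", "B"]
y_pred = ["A", "B", "B"]

scores = per_class_prf(y_true, y_pred, labels=["A", "B"])
macro_p, macro_r, macro_f1 = macro_average(scores)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(macro_p, macro_r, round(macro_f1, 4), round(accuracy, 4))
```

Class A has precision 1.0 and recall 0.5, while class B has precision 0.5 and recall 1.0. Both macro precision and macro recall are therefore 0.75, whereas each per-class F1-score, and hence the macro F1-score, is about 0.667.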

Table 4. Ablation experiment results.

Model	Data Enhancement	Cross-Topology Information Fusion	Multi-Topology Structure	Accuracy	Precision	Recall	F1-Score
A-MTGCN_Euc	☑	☐	☐	0.8798 ± 0.0086	0.8612 ± 0.0091	0.8715 ± 0.0098	0.8469 ± 0.0089
A-MTGCN_cos	☑	☐	☐	0.8869 ± 0.0081	0.8743 ± 0.0087	0.8838 ± 0.0093	0.8557 ± 0.0085
A-MTGCN_Man	☑	☐	☐	0.8665 ± 0.0089	0.8363 ± 0.0094	0.8587 ± 0.0101	0.8242 ± 0.0090
A-MTGCN_Euc+cos	☑	☑	☐	0.9057 ± 0.0067	0.8826 ± 0.0072	0.8984 ± 0.0079	0.8687 ± 0.0070
A-MTGCN_cos+Man	☑	☑	☐	0.9012 ± 0.0069	0.8769 ± 0.0074	0.8937 ± 0.0081	0.8654 ± 0.0071
A-MTGCN_Man+Euc	☑	☑	☐	0.8969 ± 0.0070	0.8714 ± 0.0075	0.8892 ± 0.0082	0.8601 ± 0.0072
A-MTGCN_no-Att	☐	☑	☑	0.9074 ± 0.0074	0.8712 ± 0.0080	0.9056 ± 0.0088	0.8620 ± 0.0076
A-MTGCN_no-Fusion	☑	☐	☑	0.8925 ± 0.0077	0.8571 ± 0.0083	0.8893 ± 0.0090	0.8494 ± 0.0079
A-MTGCN	☑	☑	☑	0.9253 ± 0.0050	0.8915 ± 0.0055	0.9227 ± 0.0062	0.8783 ± 0.0053

Table 5. The number of learners in various groups during the early stages of learning.

Portrait Group	Number	Proportion
Excellent	484	11%
Good	1408	32%
Average	1760	40%
Passive	748	17%
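
The proportions in Table 5 follow directly from the group counts. As a quick arithmetic check, the short Python snippet below recomputes them; the variable names are illustrative.

```python
# Group counts as listed in Table 5.
counts = {"Excellent": 484, "Good": 1408, "Average": 1760, "Passive": 748}

total = sum(counts.values())
# Percentage share of each group, rounded to the nearest whole percent.
proportions = {group: round(100 * n / total) for group, n in counts.items()}

print(total)
print(proportions)
```

The four counts add up to 4400 learners, and the rounded shares match the 11%, 32%, 40%, and 17% listed in the table.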

Table 6. Predicted results of learner group types at different time points.

Evaluation	Prediction Type	Stage 1	Stage 2	Stage 3	Stage 4	Stage 5
Accuracy (%)	Excellent	81.22	83.80	86.37	88.15	92.53
Accuracy (%)	Good	79.85	81.41	84.22	86.89	89.16
Accuracy (%)	Average	78.52	80.93	82.96	84.74	87.02
Accuracy (%)	Passive	73.21	74.97	76.60	78.08	83.02
F1-score (%)	Excellent	77.82	78.63	81.02	83.41	87.83
F1-score (%)	Good	76.85	77.41	79.82	82.89	85.16
F1-score (%)	Average	75.52	76.93	78.96	81.74	84.02
F1-score (%)	Passive	71.17	72.30	73.60	76.08	79.12

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
