Federated Graph Representation Learning for Online Student Performance Analysis

Seyghaly, Rasool; Garcia, Jordi; Masip-Bruin, Xavi

doi:10.3390/electronics15071495


by Rasool Seyghaly *, Jordi Garcia and Xavi Masip-Bruin

CRAAX Lab, UPC BarcelonaTECH, 08800 Vilanova i la Geltrú, Spain

* Author to whom correspondence should be addressed.

Electronics 2026, 15(7), 1495; https://doi.org/10.3390/electronics15071495

Submission received: 24 February 2026 / Revised: 23 March 2026 / Accepted: 1 April 2026 / Published: 2 April 2026

(This article belongs to the Special Issue Deep Learning and Data Analytics Applications in Social Networks)


Abstract

The rapid growth of online learning platforms has intensified the need for privacy-aware methods that can analyze learner behavior without centralizing sensitive activity logs. This study presents a Federated Learning-Based Graph Representation Learning (FL-GRL) framework for online student performance analysis in distributed learning environments. Each learner is represented through a local Student Learning Knowledge Graph (SLKG) that captures typed interactions with courses, lessons, webinars, challenges, and forum activities. Graph Neural Networks (GNNs) are used to derive relation-aware embeddings from these local graphs, while federated learning supports collaborative model optimization without sharing raw data. A federated clustering stage is then used to identify soft learner groups with partially overlapping behavioral patterns that may support exploratory personalization and confidence-aware educational follow-up. The current experiments focus on the feasibility of privacy-aware graph-based analysis rather than on a complete supervised prediction benchmark. Results across the evaluated graph-based variants indicate that the proposed framework is operationally viable, preserves relational structure better than flat-feature formulations, and provides an interpretable basis for learner-group discovery in privacy-sensitive online education settings.

Keywords: federated learning; graph representation learning; graph neural networks; student performance analysis; learner profiling; learning management systems; knowledge graphs


1. Introduction

With the rise of the Internet, online learning platforms have gained popularity, making education accessible to millions of learners worldwide. These platforms embrace the concept of open learning, where education is not confined to traditional institutional methods, such as attending physical lectures. Instead, learning takes place online through recorded lectures and digital tasks. This shift has expanded access to education, making it more inclusive regardless of an individual’s background [1]. At the same time, the methodological landscape relevant to this problem spans several mature research areas, including federated learning [2,3], graph neural networks and graph representation learning [4,5,6,7], knowledge graphs [8], and educational data mining/learning analytics [9].

In recent years, there has been substantial growth in the online learning platforms market, with forecasts suggesting this upward trend will persist. A report by Zion Market Research [10] estimates that the global market was valued at around $35.20 billion in 2023 and is expected to reach approximately $130.79 billion by 2032, reflecting a compound annual growth rate (CAGR) of about 15.70% over the forecast period.

Similarly, a report published by The Educator, citing Stocklytics.com data, indicates that the global online education market is expected to grow by 40% over the next four years, reaching an estimated valuation of about $257 billion by 2028. The report also notes that the market’s value has nearly tripled since 2017, rising from $65 billion to more than $185 billion in 2023 [11].

These statistics underscore the robust growth trajectory of online learning platforms, driven by technological advancements, increased internet accessibility, and a growing acceptance of digital education solutions.

Online learning platforms, however, require assessment of student performance. Learning progress must be measured, comprehension must be ensured, and academic standards must be maintained through effective evaluation methods. To accurately gauge student learning, online platforms should use innovative assessment methods, such as AI-driven assessments, peer reviews, and automated quizzes, as an alternative to in-person exams.

An important concern in analyzing the performance data of online learners is the need to preserve their privacy. Learners may hesitate to participate in surveys or may be reluctant to have their learning activities recorded in a centralized dataset. Federated Learning (FL) offers a suitable solution by enabling decentralized model training without requiring direct raw-data sharing [2,3,12]. Additionally, online learning activity data is often multidimensional, relational, and sparse, making Graph Representation Learning (GRL) an effective approach. Educational interactions are also inherently non-linear: learners follow different paths through lessons, revisit concepts, and depend on prerequisite knowledge structures that are not well captured by flat tabular features. Knowledge-graph modeling and graph neural methods are therefore relevant because they preserve structured relations among learners, resources, and activities rather than flattening them into manually aggregated features [4,5,6,8].

In this paper, we present a federated graph representation learning framework for analyzing student performance in Learning Management Systems (LMSs) while preserving data privacy. Our contribution lies in the design of a decentralized architecture that enables institutions to collaboratively analyze relation-rich student activity patterns without exposing sensitive student data. By leveraging graph-based modeling, we capture relational patterns among students, learning materials, and interactions that are often overlooked by traditional flat-feature approaches. Rather than claiming a generic first combination of FL and GRL, we position the contribution of this study more narrowly: a learner-centered Student Learning Knowledge Graph (SLKG) formulation, federated training over local graph representations, and a federated clustering stage intended to support performance analysis and exploratory personalization in distributed LMS environments. The substantive value of this integration is not that FL and graph learning are individually new, but that the proposed pipeline preserves typed educational relations during decentralized training. A simpler alternative based only on aggregate statistical features and FL could still be useful, but it would compress away relation types such as enrollment, attendance, attempts, forum participation, and resource linkage. Our claim is therefore not that graph-based federated models are universally superior, nor that non-graph federated baselines would necessarily fail. Rather, the present evidence supports the feasibility of a structure-preserving alternative for relational and privacy-sensitive educational data. In that sense, the manuscript argues for application-specific usefulness rather than for an experimentally proven universal non-replaceability over simpler feature-engineered FL pipelines.

We introduce the concepts of FL and GRL as foundational elements of our methodology. FL facilitates collaborative model training across distributed nodes, ensuring that raw data never leaves local environments, thus addressing privacy concerns and compliance with data protection regulations. This setting is also compatible with institution-local or edge-capable deployments, where learner interactions are generated close to the data source and where transmitting only model information is operationally preferable to centralizing detailed activity logs. GRL allows for the transformation of graph-structured educational data into low-dimensional embeddings that retain semantic relationships, enabling more effective analysis and pattern discovery. Together, these paradigms form a robust pipeline for privacy-aware, structure-aware learning in LMS environments, where data is both sensitive and inherently relational.

The remainder of this paper is structured as follows. Section 2 provides an overview of related work in educational performance analysis, FL, and GRL. Section 3 outlines the proposed method. Section 4 presents the experimental setup and results. Section 5 discusses the implications and limitations of the findings. Finally, Section 6 concludes the paper with future directions.

2. Related Work

Educational performance analysis sits at the intersection of learning analytics, educational data mining, graph-based representation learning, and privacy-preserving distributed optimization. Recent survey literature highlights the rapid expansion of educational data mining and learning analytics [9], the maturity of graph neural architectures and their variants [7], the broader conceptual foundations of knowledge graphs [8], the emergence of federated learning as a privacy-aware collaborative paradigm [2,3], and the rise of federated graph machine learning as a dedicated subfield [13]. We therefore position our work within this cross-disciplinary context rather than within a single-method literature alone.

In recent years, integrating FL and GRL has emerged as a promising approach to enhance educational outcomes while preserving data privacy. FL addresses privacy concerns in educational environments by allowing cooperative model training across distributed data sources without raw data exchange. GRL, in turn, captures intricate interactions in educational data, enabling tailored learning environments.

FL has seen growing application in education, particularly for performance-analysis and learning-outcome modeling tasks while ensuring data confidentiality. For example, a novel FL framework was introduced to predict student grades while maintaining dataset confidentiality, showcasing the feasibility of FL in sensitive educational contexts [14]. The FecMap model further advanced this concept by incorporating local subspace learning and multi-layer privacy protection to train client-specific classifiers for learning-outcome prediction [15]. Additionally, FL has proven effective in improving dropout prediction across distributed datasets, contributing to more privacy-aware predictive models in online learning environments [16]. Beyond educational applications, foundational FL work and later surveys have also clarified the communication, heterogeneity, and privacy challenges that motivate decentralized deployment assumptions in settings with institution-local or device-local data [2,3].

On the other hand, GRL has demonstrated significant utility in capturing the structural complexity of educational data for personalized learning systems. One approach applied attention-based mechanisms to predict student performance using data from online learning activities, thereby improving inference accuracy [17]. Using text mining, another study utilized hierarchical educational data to create knowledge graph representations for personalized learning object retrieval [18]. The FOKE framework exemplifies a more comprehensive integration, combining knowledge graphs, foundation models, and prompt engineering to deliver explainable, individualized learning solutions [19].

Recent studies have started investigating the combined benefits of FL and GRL for privacy-preserving and personalized educational systems. For example, a structured FL framework was proposed that uses client-wise relationships encoded in a graph to train global and personalized models concurrently [20]. Similarly, GRL has been used in personalized exercise recommendation systems to model learner knowledge structures and recommend exercises aligned with individual learning needs [21]. These innovations illustrate the promise of integrating FL and GRL for effective, privacy-conscious modeling of student performance.

While these efforts mark an important step forward, they often focus on either optimizing federated models or enhancing graph-based personalization in isolation. Few studies have proposed an integrated framework that simultaneously utilizes GRL’s structural modeling and FL’s privacy guarantees for student performance analysis across distributed LMSs, while also using the learned graph space for downstream exploratory clustering of learner behavior.

These limitations highlight a critical research gap: the need for a unified, privacy-preserving framework that captures both the structural richness of educational interactions and the decentralized nature of data across institutions. In summary, the literature motivates a method that is simultaneously student-centered, graph-aware, and privacy-preserving, while remaining applicable to real LMS interaction data. The proposed method should therefore be read as an application-driven integration rather than as a claim that either federated learning or graph learning is new in isolation. Our work addresses this gap by proposing a federated GRL approach tailored for online student performance analysis in LMSs.

3. Proposed Method

In this study, we propose an FL-based GRL framework to enhance student performance analysis while preserving learner privacy. Each learner maintains a local Student Learning Knowledge Graph (SLKG) that captures their interactions with educational resources, including courses, lessons, videos, forum discussions, challenges, and webinars. To extract meaningful representations from these graphs, we apply Graph Neural Networks (GNNs), which generate embeddings that encode student engagement patterns. Instead of sharing raw data, learners participate in FL, where only model updates are transmitted and combined through attention-weighted aggregation to improve the global representation model. Furthermore, we introduce Federated Graph Clustering, which identifies clusters of students who exhibit similar learning behaviors and succeed in advanced courses. These clusters should be interpreted as exploratory learner groupings rather than as perfectly separated educational categories. Accordingly, they are intended to support hypothesis generation and soft personalization signals, such as confidence-aware content recommendations or difficulty adjustments, rather than rigid intervention rules. The overall proposed method is shown in Figure 1.

Figure 1 illustrates the overall architecture of our proposed Federated GRL framework for student performance analysis. The process is composed of several interconnected components designed to work collaboratively across multiple clients (e.g., educational institutions) while preserving data privacy. At a high level, each client constructs a local heterogeneous graph based on its own LMS data, capturing relationships among students, learning activities, and educational resources. These graphs are processed locally using a graph neural network (GNN) to generate node embeddings that reflect performance-related features. Instead of sharing raw data, clients transmit model updates to a central server, where a federated aggregation step combines the parameters to update a global model. The updated model is then redistributed back to the clients for the next training round. This iterative process continues until convergence. The figure encapsulates the data flow, graph modeling, local training, and federated aggregation, highlighting how personalized and global knowledge are jointly optimized in a privacy-preserving manner.
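The round structure described above (local training, upload of model updates, server-side aggregation, redistribution until convergence) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: `local_update` is a hypothetical stand-in for local GNN training (a few gradient steps on a quadratic loss), the server uses a plain uniform average for simplicity, and all names and default values are invented for the example.

```python
def local_update(global_params, local_target, lr=0.5, steps=3):
    """Hypothetical stand-in for local GNN training.

    Runs a few gradient steps on 0.5 * ||theta - local_target||^2, starting
    from the current global parameters. Raw data (here: local_target) never
    leaves the client; only the updated parameters are returned.
    """
    theta = list(global_params)
    for _ in range(steps):
        theta = [t - lr * (t - y) for t, y in zip(theta, local_target)]
    return theta


def federated_training(client_targets, dim, max_rounds=100, tol=1e-9):
    """Iterate: broadcast global model, train locally, aggregate, repeat."""
    global_params = [0.0] * dim
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        # Each client trains locally and uploads only its parameters.
        updates = [local_update(global_params, y) for y in client_targets]
        # Server aggregation (assumption: uniform average over clients).
        new_global = [sum(col) / len(updates) for col in zip(*updates)]
        shift = max(abs(a - b) for a, b in zip(new_global, global_params))
        global_params = new_global
        if shift < tol:  # stop once the global model no longer moves
            break
    return global_params, rounds


params, rounds = federated_training([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dim=2)
print(params, rounds)
```

In this toy setting the global model settles near the mean of the client optima; with real GNN training and heterogeneous clients, convergence behavior would need to be checked empirically.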

3.1. Local Student Learning Knowledge Graph (SLKG)

Each learner maintains a personalized Student Learning Knowledge Graph (SLKG), representing their interactions with educational content. The SLKG consists of:

Nodes: {Learner, Course, Lesson, Video, Forum Topic, Challenge, Webinar}.
Edges: {Watched, Attempted, Posted, Attended, Enrolled, Related}.

Each student’s SLKG evolves dynamically as they interact with different learning resources, ensuring a privacy-preserving, personalized representation of learning activities. In the current implementation, the node and edge schema are defined from LMS event logs and platform metadata. Direct interaction edges such as Watched, Attempted, Posted, Attended, and Enrolled are instantiated from recorded user actions, whereas the Related edge is generated deterministically from platform metadata and course-design associations available in the LMS configuration (for example, when a webinar is explicitly linked to a challenge, lesson, or course activity in the underlying platform catalog). Thus, Related is neither manually annotated per learner nor inferred by a separate relation-learning model in the current paper; it is a schema-level operational link inherited from the platform. This design is practical for deployment, but it may still reflect platform-specific curation choices and therefore introduce schema bias. For example, if a webinar is linked to a challenge in the LMS catalog, the resulting edge represents an operational course-design association rather than a causal claim about learning benefit. We therefore treat the graph schema as a domain-informed representation layer, not as ground-truth pedagogy, and we identify more formal empirical validation of edge semantics as an important direction for future work.
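As a rough illustration of this construction, the sketch below builds one learner's SLKG from event-log rows and catalog links. The container class, the tuple format of the events, the identifier strings, and the choice to add Related edges only between resources the learner has already touched are all assumptions made for the example; the paper does not specify these details.

```python
from dataclasses import dataclass, field

# Node and edge types as listed in the schema above.
NODE_TYPES = {"Learner", "Course", "Lesson", "Video", "Forum Topic", "Challenge", "Webinar"}
INTERACTION_EDGES = {"Watched", "Attempted", "Posted", "Attended", "Enrolled"}


@dataclass
class SLKG:
    """Typed graph kept local to one learner (hypothetical container)."""
    learner_id: str
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((source, relation, target))


def build_slkg(learner_id, events, catalog_links):
    """Build an SLKG from log rows and catalog-level resource links.

    events: iterable of (relation, resource_id, resource_type) tuples (assumed format).
    catalog_links: iterable of (resource_a, resource_b) pairs from platform metadata.
    """
    graph = SLKG(learner_id)
    graph.add_node(learner_id, "Learner")
    for relation, resource_id, resource_type in events:
        if relation not in INTERACTION_EDGES:
            continue  # skip log rows that fall outside the schema
        graph.add_node(resource_id, resource_type)
        graph.add_edge(learner_id, relation, resource_id)
    # Related edges are deterministic: no learning, no per-learner annotation.
    # Assumption: only links between resources already in this graph are added.
    for a, b in catalog_links:
        if a in graph.nodes and b in graph.nodes:
            graph.add_edge(a, "Related", b)
    return graph


events = [
    ("Enrolled", "course:py101", "Course"),
    ("Attended", "webinar:intro", "Webinar"),
    ("Attempted", "challenge:loops", "Challenge"),
    ("Clicked", "banner:promo", "Course"),  # not a schema relation, skipped
]
catalog_links = [("webinar:intro", "challenge:loops"), ("webinar:intro", "lesson:x")]
g = build_slkg("learner:42", events, catalog_links)
print(len(g.nodes), len(g.edges))
```

Keeping relation names as explicit edge labels is what lets a downstream encoder distinguish, for example, Attended from Attempted instead of collapsing both into a single activity count.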

Figure 2 illustrates an example of a Local Student Learning Knowledge Graph (SLKG), where nodes represent entities such as the learner, courses, lessons, videos, forums, challenges, and webinars, while edges represent interactions like watched, attempted, posted, and attended. This graph captures the personalized learning activities of an individual student, enabling privacy-preserving analysis through localized graph representation.

3.2. Graph Representation Learning (GRL)

To extract meaningful features from SLKGs, we apply Graph Neural Networks (GNNs), such as GraphSAGE [22], Graph Attention Networks (GAT) [23], and Graph Autoencoders (GAEs) [24]. These methods generate graph embeddings, preserving the structure and semantic relationships within each SLKG.
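To make the embedding step concrete, the following is a minimal sketch of one GraphSAGE-style layer with a mean aggregator, written in plain Python. It is not the configuration used in the experiments: it ignores edge types for brevity, the weight matrices are fixed rather than trained, and the feature vectors and node names are invented for illustration. Using separate matrices for the node itself and for its neighborhood mean is equivalent to applying one matrix to their concatenation.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer: ReLU(W_self h_v + W_neigh mean(h_u)), L2-normalized.

    features: dict node -> list of floats (input features h_v)
    neighbors: dict node -> list of neighboring nodes
    """
    dim = len(next(iter(features.values())))
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            mean = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighborhood signal
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        z = [max(0.0, x) for x in z]  # ReLU non-linearity
        norm = math.sqrt(sum(x * x for x in z))
        out[v] = [x / norm for x in z] if norm > 0 else z
    return out


# Toy graph: one learner connected to a course and a video (illustrative values).
features = {"learner": [1.0, 0.0], "course": [0.0, 1.0], "video": [1.0, 1.0]}
neighbors = {"learner": ["course", "video"], "course": ["learner"], "video": ["learner"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = sage_mean_layer(features, neighbors, identity, identity)
print(embeddings["learner"])
```

A relation-aware variant would typically keep one neighborhood matrix per edge type, and an attention-based encoder such as GAT would replace the uniform mean with learned neighbor weights.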

3.3. FL-Based Graph Aggregation

To maintain privacy, learners do not share raw SLKG data. Instead, they participate in FL, where only model updates are transmitted. The aggregation follows:

$$\Theta_{g}^{(t+1)} = \sum_{i=1}^{N} \alpha_{i}\, \Theta_{i}^{(t)}, \tag{1}$$

where $\alpha_{i}$ is the attention weight assigned to learner $i$.
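Equation (1) can be sketched as a weighted sum over flattened parameter vectors. The text does not state how the attention weights are computed, so the scoring rule below (a softmax over the negative Euclidean distance between each local model and the current global model) is purely an assumption chosen for illustration, as are the function names and toy values.

```python
import math


def distance_scores(local_params, global_params):
    """Assumed scoring rule: models closer to the global model score higher."""
    return [
        -math.sqrt(sum((t - g) ** 2 for t, g in zip(theta, global_params)))
        for theta in local_params
    ]


def attention_weights(scores):
    """Softmax, so the weights are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(local_params, alphas):
    """Eq. (1): Theta_g^(t+1) = sum_i alpha_i * Theta_i^(t), on flat parameter lists."""
    if len(local_params) != len(alphas):
        raise ValueError("one weight per client is required")
    dim = len(local_params[0])
    return [sum(a * theta[j] for a, theta in zip(alphas, local_params)) for j in range(dim)]


global_params = [0.0, 0.0]
local_params = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]  # third client is an outlier
alphas = attention_weights(distance_scores(local_params, global_params))
new_global = aggregate(local_params, alphas)
print(alphas, new_global)
```

With uniform weights the same `aggregate` function reduces to a plain average of the client models, which makes the effect of the attention weights easy to compare against a simple baseline.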

3.4. Federated Graph Clustering for Performance Analysis

In addition to analyzing performance-related representations, we identify groups of learners with similar learning behaviors by clustering the learned graph embeddings in a federated manner. No raw SLKG data are shared; instead, clients share either (i) learner embeddings or (ii) compact clustering summaries (e.g., centroids and counts). The discovered clusters can support interpretability and personalization, and can also be used as an auxiliary signal for downstream learner profiling or follow-up educational review. Although this sharing protocol is substantially more privacy-preserving than transmitting raw interaction logs, embeddings and summaries are not risk-free in principle: under a strong adversarial model, representation inversion, membership inference, or backtracking attacks may still be considered. In the present paper, the privacy discussion follows a limited threat model in which the coordinating server is assumed to be honest-but-curious (it follows the protocol but may inspect received updates), and clients are assumed to be semi-honest in the sense that they execute the training procedure correctly while still potentially observing shared outputs. Under this assumption, avoiding raw-data transmission reduces exposure substantially, but it does not by itself provide a formal privacy guarantee. We therefore view the current framework as exposure-reducing rather than absolutely leakage-proof, and we explicitly note that differential privacy, secure aggregation, and representation sanitization were not incorporated in the present implementation and remain important extensions.

3.4.1. KMeans Clustering for Local Pattern Discovery

Each client applies KMeans to its local learner embeddings to discover dominant behavior patterns and to initialize cluster prototypes. Given embeddings $\{z_{m}\}_{m=1}^{M}$, KMeans finds $K$ centroids $\{\mu_{k}\}_{k=1}^{K}$ by minimizing:

$$\min_{\{\mu_{k}\}_{k=1}^{K}} \sum_{m=1}^{M} \min_{k \in \{1, \dots, K\}} \lVert z_{m} - \mu_{k} \rVert_{2}^{2}. \tag{2}$$

After convergence, the client transmits either the centroids $\{\mu_{k}\}$ or sufficient statistics (cluster sizes and sums) to the server, reducing communication overhead and limiting privacy exposure.
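The client-side step can be sketched with a small pure-Python version of Lloyd's algorithm for the objective in Eq. (2), followed by the computation of the per-cluster counts and sums that a client might upload. The initialization scheme, iteration cap, and toy embeddings are assumptions for illustration; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed random init
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, centroids already match them
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def sufficient_statistics(points, labels, k):
    """Per-cluster counts and coordinate sums: enough to rebuild every centroid."""
    dim = len(points[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for p, l in zip(points, labels):
        counts[l] += 1
        for d, x in enumerate(p):
            sums[l][d] += x
    return counts, sums


# Toy local embeddings with two well-separated groups (illustrative values).
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels = kmeans(points, k=2)
counts, sums = sufficient_statistics(points, labels, k=2)
print(counts, sums)
```

Sending counts and sums rather than centroids alone lets the server weight each prototype by how many learners it summarizes, at the cost of revealing cluster sizes.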

3.4.2. Agglomerative Hierarchical Clustering for Federated Consensus

To reach a global consensus, the server aggregates client-provided prototypes and performs Agglomerative Hierarchical Clustering [25]. Starting from individual prototypes, the algorithm iteratively merges the two closest clusters according to a linkage criterion; in our experiments we use Ward linkage, which minimizes the increase in within-cluster variance after each merge. The resulting dendrogram enables analysis at multiple granularities; the server selects the final number of clusters $K^{\star}$ using a validation criterion (e.g., Calinski–Harabasz) and broadcasts the resulting prototypes/assignment rules back to the clients. Clients then assign learners to the federated clusters and may use cluster membership (hard or soft) as an auxiliary analytic signal for exploratory personalization or follow-up review.
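A server-side sketch of the Ward merging step is given below. It assumes that each client prototype arrives as a (centroid, count) pair and is treated as `count` points located at the centroid; under that assumption, merging clusters A and B increases the within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared distance between their centroids. The target number of clusters is passed in directly here, whereas the paper selects it with a validation criterion; the prototype values are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_merge(prototypes, target_k):
    """Greedy Ward-linkage merging of weighted prototypes.

    prototypes: list of (centroid, count) pairs uploaded by clients (assumed format).
    Returns the merged (centroid, count) pairs once target_k clusters remain.
    """
    clusters = [(list(c), n) for c, n in prototypes]
    while len(clusters) > target_k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                # Ward cost: increase in within-cluster sum of squares.
                cost = ni * nj / (ni + nj) * sq_dist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the count-weighted mean of the two centroids.
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Prototypes from two hypothetical clients, two local clusters each.
prototypes = [
    ([0.0, 0.0], 10), ([10.0, 10.0], 5),   # client A
    ([1.0, 1.0], 20), ([9.0, 11.0], 15),   # client B
]
global_clusters = ward_merge(prototypes, target_k=2)
print(global_clusters)
```

The pairwise search is quadratic in the number of prototypes per merge, which is acceptable when clients upload only a handful of centroids each; a library routine such as scikit-learn's `AgglomerativeClustering` with `linkage="ward"` would be the usual choice for larger inputs, although it does not take per-prototype counts into account.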

4. Results

To validate the effectiveness of the proposed method, we conducted experiments to evaluate its graph-based analytical behavior, scalability, and privacy-preserving characteristics. To ensure relevance in a real-world educational setting, we used data from Techwich [26], a technology-oriented online learning platform designed to support interactive and personalized education. Techwich hosts 4217 registered learners and offers 31 courses, in addition to webinars, coding challenges, and community forums. This environment enabled the construction of a Student Learning Knowledge Graph (SLKG) for each learner, capturing diverse interaction patterns. These SLKGs were then processed using Graph Neural Networks within an FL setup, allowing us to assess the framework’s practical applicability under operational conditions.

In the present study, we report the platform scale and the graph construction pipeline, but we do not claim a complete statistical characterization of the interaction graph (e.g., average interactions per learner, graph sparsity, or degree distributions). Such descriptors directly influence both GNN expressiveness and FL communication behavior, and they should be included in a broader benchmark-oriented follow-up study. In sparse educational graphs, practical mitigation strategies may include graph regularization, metadata-enriched node features, self-supervised pretraining, or knowledge-graph-guided augmentation; evaluating these options systematically remains part of future work rather than a claim of the present experiments.

Table 1 presents the optimal hyperparameters for the clustering methods (KMeans and Agglomerative) applied to GraphSAGE, GAT, and GAE embeddings, determined through tuning to enhance learner group identification.

The clustering outcomes in Table 2 also reveal areas warranting further scrutiny, as evidenced by the moderate Silhouette Score of 0.5066 and the elevated Davies–Bouldin index of 8.9247. The Silhouette Score, which assesses how closely data points align with their assigned clusters relative to others, suggests a reasonable but not exceptional level of cluster cohesion and separation. This indicates that while many students are appropriately grouped, some may lie near cluster boundaries or exhibit behaviors that blur distinctions between groups. The Davies–Bouldin index, which measures the average similarity between each cluster and its most similar counterpart, reinforces this concern, with its higher value pointing to a degree of overlap among clusters. A plausible interpretation is that the learned embedding space captures broad learner groups well enough to yield strong global separation according to the Calinski–Harabasz criterion, while still containing locally overlapping subregions or clusters with unequal density and size. In practice, this may arise from heterogeneous engagement trajectories, sparsity in some learners’ activity graphs, or sensitivity to the selected linkage and number of clusters. Within the study’s context, these limitations highlight opportunities to refine the methodology through alternative cluster-number selection strategies, density-aware clustering, regularized embeddings, or richer graph features that better separate borderline learner profiles.

Based on the clustering performance metrics presented in Table 2, the Graph Attention Network (GAT) embeddings, when combined with the Agglomerative clustering method, exhibit the most favorable overall trade-off among the tested graph encoders in our experimental setting. The Calinski–Harabasz score of 20,338.42 stands out as particularly high, reflecting a favorable ratio of between-cluster dispersion to within-cluster dispersion. This suggests that the clusters are broadly separated, a property that is consistent with the use of Ward linkage in the Agglomerative method. At the same time, the high Davies–Bouldin value advises caution: the embedding space still appears to contain nearby or partially overlapping groups, so the resulting clusters should be treated as soft, partially overlapping behavioral regions. Accordingly, we interpret the GAT-based result as the strongest among the evaluated graph-based alternatives rather than as evidence of perfectly resolved learner categories or of strong standalone intervention partitions. One likely reason for the relative advantage of GAT is that attention weights can emphasize more informative neighbors and relations in heterogeneous educational graphs, which may be beneficial when some interactions are more predictive of performance than others. The observed overlap may depend not only on the data itself but also on modelling choices such as the selected number of clusters, the embedding dimension, and the sensitivity of Ward linkage to noisy or unevenly distributed prototypes.
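For reference, the three indices discussed above can be written out using their standard textbook definitions. The notation is assumed here rather than taken from the paper: embeddings $z_i$, clusters $C_1, \dots, C_K$ with sizes $n_k$ and centroids $\mu_k$, global centroid $\mu$, and $n = \sum_k n_k$.

```latex
% Silhouette (higher is better, range [-1, 1])
% a(i): mean distance from z_i to the other members of its own cluster
% b(i): smallest mean distance from z_i to the members of any other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\mathrm{Silhouette} = \frac{1}{n} \sum_{i=1}^{n} s(i)

% Davies--Bouldin (lower is better, minimum 0)
\mathrm{DB} = \frac{1}{K} \sum_{k=1}^{K} \max_{j \neq k}
  \frac{\sigma_k + \sigma_j}{\lVert \mu_k - \mu_j \rVert_2},
\qquad
\sigma_k = \frac{1}{n_k} \sum_{i \in C_k} \lVert z_i - \mu_k \rVert_2

% Calinski--Harabasz (higher is better)
\mathrm{CH} =
  \frac{\sum_{k=1}^{K} n_k \lVert \mu_k - \mu \rVert_2^2 \,/\, (K - 1)}
       {\sum_{k=1}^{K} \sum_{i \in C_k} \lVert z_i - \mu_k \rVert_2^2 \,/\, (n - K)}
```

Because the Davies–Bouldin index takes the worst-case ratio for each cluster, a few pairs of diffuse clusters with nearby centroids can raise it considerably even when the global dispersion ratio measured by Calinski–Harabasz is large, which is consistent with the interpretation given above.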

Figure 3 presents a visualization of clusters derived from the best-performing model within a simulated environment using the FL-Based GRL (FL-GRL) framework. This figure specifically illustrates the outcome of applying Agglomerative clustering to Graph Attention Network (GAT) embeddings, identified as the most effective combination among the tested graph encoders based on its overall clustering performance (Table 2). The visualization represents individual learners as points grouped into clusters based on their learning behaviors and performance patterns extracted from Student Learning Knowledge Graphs (SLKGs). The axes correspond to a dimensionality-reduction view of the embedding space and are included to visually inspect relative grouping patterns rather than to convey standalone semantic variables. As with other low-dimensional projections, the apparent separation is method-dependent and should not be interpreted as definitive evidence of strongly isolated clusters on its own. In particular, methods such as t-SNE or UMAP can accentuate local neighborhoods differently, so the operational meaning of the clustering should be judged primarily from the quantitative indices and the downstream use case, not from the visual gaps alone.

These findings illuminate the practical implications and potential of integrating FL with GRL to advance educational outcomes in a privacy-conscious manner. For scope clarity, the present experimental section compares multiple graph encoders under the same federated graph-learning setting; it does not yet include traditional non-graph baselines such as logistic regression, SVM, or MLP, nor a centralized non-federated GNN counterpart. A fair inclusion of those baselines would require aligned feature engineering for tabular activity summaries and a separate centralized evaluation protocol with different privacy assumptions. Qualitatively, simpler FL baselines built on aggregate statistical features would likely capture activity volume, recency, and frequency well, and they may prove competitive for coarse performance-analysis tasks. Their main limitation is representational rather than procedural: they do not explicitly preserve typed relations among learners, resources, and activities unless those relations are manually compressed into engineered features. By contrast, a centralized GNN could offer a useful upper-bound under weaker privacy constraints because it would optimize over pooled relational data directly. We therefore do not interpret the present results as proof that the proposed pipeline dominates those alternatives; rather, we view it as evidence that a relation-preserving federated design is viable and promising. Such comparisons remain important for a fully exhaustive benchmark and are part of our planned follow-up evaluation.

The experimental results support the feasibility of the Federated Graph Representation Learning (FL-GRL) framework for student performance analysis in Learning Management Systems while reducing data exposure. Using real-world data from the Techwich platform, the framework showed the strongest performance among the evaluated graph-based variants when using Graph Attention Network embeddings and Agglomerative clustering. However, given the overlap indicated by the clustering metrics, the resulting learner groups should be used as exploratory signals for personalization rather than as rigid intervention classes. From a practical standpoint, their value lies more in prioritizing support intensity, sequencing follow-up analysis, or triggering human review than in enforcing mutually exclusive pedagogical tracks.

Some learner behaviors overlapped across clusters, indicating potential for refining cluster separation and for using soft cluster assignments or confidence-aware downstream decisions. Overall, FL-GRL offers a scalable, exposure-reducing workflow for educational platforms by keeping student data decentralized during collaborative training, but its current clustering stage should be interpreted cautiously. Future work will aim to improve clustering precision, study dynamic graph neural networks, assess stronger privacy mechanisms such as differential privacy and secure aggregation, evaluate simpler non-graph and centralized baselines, and investigate how educational-psychology-informed features or multimodal signals (e.g., forum text, video engagement, and assessment traces) can enrich learner modeling.

5. Discussion

The proposed FL-GRL pipeline should be interpreted as a feasibility-oriented framework for privacy-aware performance analysis rather than as a fully benchmarked predictive system. In practice, the resulting learner groups can support exploratory personalization, targeted follow-up, and confidence-aware review, but they should not be treated as rigid instructional classes. Because the observed cluster boundaries are not perfectly sharp, any downstream use should remain soft and accompanied by additional signals such as current assessment history, recent activity, or instructor judgment.
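One way to keep downstream use soft, as recommended above, is to convert distances to the federated prototypes into membership probabilities and to flag low-confidence learners for additional review. The paper does not prescribe a specific rule, so the softmax over negative squared distances, the temperature, and the 0.7 threshold below are all illustrative assumptions.

```python
import math


def soft_assign(z, prototypes, temperature=1.0):
    """Membership probabilities from a softmax over negative squared distances."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, p)) for p in prototypes]
    m = min(dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - m) / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def needs_review(memberships, threshold=0.7):
    """Flag learners whose strongest membership falls below the (assumed) threshold."""
    return max(memberships) < threshold


prototypes = [[0.0, 0.0], [4.0, 0.0]]            # hypothetical federated centroids
borderline = soft_assign([2.0, 0.0], prototypes)  # equidistant from both prototypes
clear = soft_assign([0.0, 0.0], prototypes)       # sits on the first prototype
print(borderline, needs_review(borderline))
print(clear, needs_review(clear))
```

Borderline learners then carry a near-uniform membership vector and can be routed to instructor judgment or combined with other signals, rather than being forced into a single group.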

Illustrative application pathways include recommending courses based on successful learner trajectories, adapting challenge difficulty to match engagement levels, and encouraging forum discussions among learners with similar patterns. These uses are presented as plausible analysis-driven intervention pathways rather than as independently validated pedagogical outcomes in the current study. When two clusters overlap substantially, the safer operational choice is to share support strategies across them or use cluster membership only as one signal among several, rather than assign entirely different teaching policies. Such intervention mappings are promising because the graph representation preserves sequential and relational context, but their educational efficacy should be validated in future deployment studies.

From a methodological standpoint, the current results compare graph-based variants within a common federated setting. They do not yet establish superiority over simpler feature-based FL baselines or centralized non-federated GNNs. Accordingly, the present evidence should be read as supporting the operational viability of a relation-preserving decentralized analysis framework, while broader baseline benchmarking, fuller graph statistics, and supervised prediction experiments remain necessary extensions.

6. Conclusions

This paper presents a privacy-aware graph-based framework for student performance analysis in online education platforms. Using Student Learning Knowledge Graphs and Graph Neural Networks, the framework captures student interactions and engagement patterns while federated training avoids raw-data centralization. Federated Graph Clustering further supports adaptive learning strategies by identifying groups of students with similar learning behaviors, although in the present results these groups are better understood as soft exploratory regions than as sharply separated categories. The experimental results indicate that this formulation is feasible and promising for distributed LMS settings, while also making clear that several practical issues remain open, including data sparsity, heterogeneous client behavior, incomplete baseline coverage, limited graph descriptive statistics, privacy assumptions weaker than formal differential privacy, and instability in cluster boundaries for some learners. For that reason, the present manuscript should be read as a feasibility-oriented study with internal comparison among graph-based variants, not as a definitive proof that the proposed approach is superior to all simpler non-graph or centralized alternatives. Although the present validation is based on the Techwich platform, the overall formulation is compatible with other LMS deployments and potentially with K-12 settings, provided that local graph schemas are adapted to the platform-specific learning resources and interaction logs. Future research should therefore combine stronger privacy mechanisms such as differential privacy or secure aggregation with richer dynamic graph models, broader baseline comparisons, cross-platform federated transfer learning, and educational-psychology-informed intervention modeling to evaluate generalizability across educational contexts.

Author Contributions

R.S.: Methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft preparation. J.G.: Conceptualization, supervision, validation, and writing—review and editing. X.M.-B.: Conceptualization, supervision, validation, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the HE MANOLO project, funded by the European Commission under Grant Number 101135782, and, for UPC authors, by the Spanish Ministry of Science and Innovation under Grant PID2024-156150OB-I00 funded by MICIU/AEI/10.13039/501100011033 and, as appropriate, by ERDF/EU.

Data Availability Statement

The data used in this study are not publicly available because they contain platform-specific and learner-related information from the Techwich environment. Data may be available from the corresponding author on reasonable request, subject to privacy, ethical, and institutional restrictions.

Acknowledgments

The authors acknowledge that this work has been partially supported by the HE MANOLO project, funded by the European Commission under Grant Number 101135782, and, for UPC authors, by the Spanish Ministry of Science and Innovation under Grant PID2024-156150OB-I00 funded by MICIU/AEI/10.13039/501100011033 and, as appropriate, by ERDF/EU.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overall proposed FL-GRL framework.

Figure 2. An example of a local Student Learning Knowledge Graph (SLKG).

Figure 3. Visualization of clusters (FL-aggregated Agglomerative on GAT embeddings). The horizontal and vertical axes correspond to coordinates in a two-dimensional projection space used only for visualization; they do not have a direct educational interpretation on their own.

Table 1. Best hyperparameters for clustering methods applied to GraphSAGE, GAT, and GAE embeddings.

Embedding	Method	Number of Clusters	Init	max_iter	n_init	Linkage
GraphSAGE	KMeans	7	k-means++	300	20	–
GraphSAGE	Agglomerative	3	–	–	–	ward
GAT	KMeans	12	k-means++	300	10	–
GAT	Agglomerative	12	–	–	–	ward
GAE	KMeans	3	k-means++	300	10	–
GAE	Agglomerative	3	–	–	–	ward

Table 2. Clustering performance of GraphSAGE, GAT, and GAE embeddings using KMeans and Agglomerative Clustering.

Embedding (Clustering Method)	Silhouette Score	Calinski–Harabasz Score	Davies–Bouldin Index
GraphSAGE (KMeans)	0.3746	2303.03	0.811
GraphSAGE (Aggl.)	0.3703	1849.30	0.9409
GAT (KMeans)	0.5023	24,992.23	9.9899
GAT (Aggl.)	0.5066	20,338.42	8.9247
GAE (KMeans)	0.4068	3498.95	0.8486
GAE (Aggl.)	0.4149	2884.17	0.9154
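For readers less familiar with the first metric in Table 2, the sketch below computes the mean silhouette coefficient directly from its definition. It is a didactic, pure-Python illustration on hypothetical toy points, not the evaluation code used to produce the table. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used. Scores lie in [-1, 1], and higher values indicate tighter, better separated clusters.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: the same four points under a sensible and a poor labeling.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # splits each nearby pair across clusters

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(round(good_score, 3), round(bad_score, 3))
```

On the toy data the sensible labeling scores close to 1 and the poor labeling scores below 0. The values in Table 2 (roughly 0.37 to 0.51) sit between these extremes, which is consistent with the partially overlapping learner groups discussed in the text.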
