Article

Privacy-Preserving Federated Unlearning with Ontology-Guided Relevance Modeling for Secure Distributed Systems

by Naglaa E. Ghannam 1,2,* and Esraa A. Mahareek 2

1 Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir 11991, Saudi Arabia
2 Department of Mathematics, Faculty of Science, Al-Azhar University (Girls’ Branch), Cairo 11754, Egypt
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(8), 335; https://doi.org/10.3390/fi17080335
Submission received: 26 June 2025 / Revised: 23 July 2025 / Accepted: 25 July 2025 / Published: 27 July 2025
(This article belongs to the Special Issue Privacy and Security Issues in IoT Systems)

Abstract

Federated Learning (FL) is a privacy-focused technique for training models; however, most existing unlearning techniques in FL fall significantly short of the efficiency and situational awareness required by the GDPR. This paper introduces two new unlearning methods: EG-FedUnlearn, a gradient-based technique that eliminates the effect of specific target clients without retraining, and OFU-Ontology, an ontology-based approach that ranks data importance to facilitate contextual forgetting. EG-FedUnlearn directly eliminates the contributions of target data by reversing their gradients, whereas OFU-Ontology uses semantic relevance to prioritize forgetting the least important data, thereby minimizing unlearning-induced model degradation. Experiments on seven benchmark datasets demonstrate the strong performance of both algorithms. OFU-Ontology achieves 98% unlearning accuracy while maintaining high model utility, with very limited accuracy loss under class-based deletion on MNIST (95% model accuracy), and surpasses FedEraser and VeriFi on the metrics of residual influence, communication overhead, and computational cost. These results indicate that coupling efficient unlearning algorithms with semantic reasoning minimizes unlearning costs while preserving operational performance in distributed environments. To the best of our knowledge, this paper is the first to incorporate ontological knowledge into federated unlearning, thereby opening new avenues for scalable and intelligent privacy-preserving machine learning systems.

1. Introduction

The pervasive integration of artificial intelligence (AI) and machine learning (ML) across diverse sectors has precipitated the unprecedented collection and analysis of personal data, intensifying concerns about privacy and regulatory compliance. Landmark regulations like the European Union’s GDPR [1] and California’s CCPA [2] enshrine the “right to be forgotten”, compelling organizations to erase individual data upon request and eliminate its influence from trained models. Traditional machine unlearning in centralized systems addresses this through model retraining after data deletion—a process that becomes prohibitively expensive and impractical in federated learning (FL) environments [3,4], where data is inherently distributed across edge devices, and central aggregation violates privacy principles. As illustrated in Figure 1, the primary objective of unlearning is to remove specific data points’ effects from a trained model while preserving the model’s overall performance.
Federated learning [5] emerged as a privacy-preserving alternative, enabling collaborative model training by exchanging parameter updates instead of raw data. However, removing a client’s data contribution post-training—known as federated unlearning—introduces unique challenges. Decentralized architecture and iterative aggregation of updates make it difficult to isolate and eliminate specific data influences once integrated into the global model [6]. Naive solutions (e.g., retraining from scratch without the target client’s data) incur excessive communication and computational overhead [7], while existing unlearning methods (e.g., FedEraser [6]) suffer critical limitations as follows:
  • Efficiency—storing historical gradients or performing calibration rounds imposes high storage/server costs [8].
  • Contextual blindness—uniform treatment of data points during unlearning risks unnecessary accuracy loss by disregarding semantic relevance [9].
  • Verifiability—clients cannot independently confirm complete data erasure [10].
Figure 2 illustrates a typical FL setup. Several clients individually train their local ML models, which are then combined into a global model. The server then shares the updated global model with all clients for the training phase of the next FL round, and this process repeats until the global model converges. Hence, we want federated unlearning (FU) to enable the FL model to forget any knowledge about an FL client, or any identifiable value linked to a client’s partial data, with strict guarantees on the underlying privacy promises of decentralized learning.
To overcome these gaps, we propose two novel federated unlearning frameworks:
  • EG-FedUnlearn—A gradient-based approach that analytically reverses target clients’ contributions from aggregated model parameters. By applying negative gradient updates derived from local data samples, it achieves “exact unlearning” (equivalent to retraining) under standard FL assumptions, eliminating retraining costs and minimizing communication.
Ontologies are knowledge structures that categorize a domain in terms of its entities, attributes, and relations between concepts [10]. By including an ontology, relevance-based data prioritization can be built into federated unlearning, ensuring that unnecessary data points are forgotten while those with greater contextual relevance are kept. A relevance-based ontology enables intelligent, context-adaptive unlearning by categorizing data and assigning each candidate data point a relevance score within its context. Through this structured, relevance-based approach, the model can focus on retaining high-value information, thus increasing the efficiency and accuracy of federated unlearning [11].
  • OFU-Ontology—The first method to integrate domain ontologies into federated unlearning. It assigns semantic relevance scores $R(d_{i,j})$ to data points using structured knowledge representations (e.g., symptom–diagnosis relationships in healthcare). Low-relevance data is prioritized for unlearning via weighted gradient negation, preserving high-impact knowledge and reducing accuracy degradation.
Figure 3 illustrates this motivation, comparing classic federated unlearning, which treats all data uniformly, with our ontology-based approach, which applies semantic relevance scoring. The ontology module enables selective unlearning by labeling the least-relevant data points, reducing performance degradation and supporting intelligent decision-making in privacy-sensitive federated systems.
The proposed work delivers three key contributions:
  • Proposing EG-FedUnlearn, a fast and storage-efficient unlearning method that removes a target client’s influence from the federated model without requiring full model retraining. We also propose OFU-Ontology, the first ontology-enhanced federated unlearning approach, which utilizes domain ontologies to guide the unlearning process and protect the remaining useful model knowledge.
  • Providing an analysis of the proposed methods to demonstrate that the target client’s data contributions are eliminated from the model. We prove that EG-FedUnlearn achieves the same effect as excluding the client’s data from training (equivalent to retraining) under certain assumptions, thus guaranteeing exact unlearning. We also discuss how the ontology in OFU-Ontology helps avoid unintended forgetting of unrelated knowledge. We also analyze security and privacy implications to show that our approaches do not introduce any new privacy leaks and that these techniques can be used in conjunction with verifiability techniques to assure clients of successful unlearning.
  • Performing experiments on benchmark FL datasets to assess EG-FedUnlearn and OFU-Ontology. The results indicate that our techniques substantially enhance unlearning performance, clearly outperforming FedEraser and VeriFi. Our approach unlearns faster (approximately one-tenth the computation and communication overhead of the baselines) and better retains the model (higher remaining accuracy on test data after unlearning). We demonstrate that OFU-Ontology maintains model utility especially well—the global model suffers a much smaller accuracy drop on non-forgotten data, thanks to ontology-guided knowledge preservation. These improvements illustrate the practical benefit of our contributions to efficient, reliable, and accurate federated unlearning in real-world scenarios.
By addressing efficiency, completeness, and knowledge-retention challenges with our two approaches, this work paves the way for federated learning systems that can forget users on demand with minimal performance penalty. To the best of our knowledge, this is the first work to integrate ontological domain knowledge into the federated unlearning process, opening a new direction for research in safe and intelligent model unlearning. We also explain how the ontology is constructed and used in our approach. For our experiments, domain ontologies are designed manually based on expert knowledge and public schema definitions, making them interpretable and adaptable across datasets. While this work uses static ontologies for clarity and evaluation, the framework supports future extension to dynamic or automatically generated ontologies that evolve with new data or client feedback. We also include ablation experiments comparing OFU-Ontology with and without the ontology-based relevance scoring, showing that the ontology component significantly improves model utility after unlearning.
Finally, to harmonize regulatory compliance with model performance, this paper is organized as follows: Section 2 presents an exhaustive literature review on federated unlearning, unlearning approaches, and the role of ontologies in data relevance consideration. Section 3 presents the methodology of the proposed work. Section 4 brings forth the results with discussion and comparison with existing methods of federated unlearning. Finally, Section 5 concludes the study and suggests directions for future research.

2. Literature Review

Federated learning is a decentralized approach to model training in which data remains on local machines and only model updates are shared with a central server. McMahan et al. [4] coined the term “federated learning” to denote collaborative training without centralizing sensitive data, hence preserving user privacy. FL aims to resolve the trade-off between privacy risks and large-scale data collaboration found in centralized ML paradigms, where all data sit aggregated in a common location. FL also faces unique challenges, including efficient communication, device heterogeneity, and ensuring convergence over non-IID data residing on decentralized nodes. Federated Averaging (FedAvg) [4], which averages model updates instead of sharing raw data, is one of the communication-efficient protocols researchers developed to address these problems. It is widely used because it is simple, efficient, and suitable for a wide range of clients.
FedEraser is a federated unlearning technique intended to eliminate the influence of specific data points from a trained model in a targeted manner. To ensure that the surviving model no longer retains details about the deleted instances, it reconstructs the training procedure without the target data. This unlearning technique is useful specifically when data must be erased from machine learning models so that data protection laws such as the GDPR can be complied with. One of the major weaknesses of FedEraser is its high computational cost per unlearning request, including additional rounds of gradient computations and updates; nevertheless, it shows great promise for privacy preservation [6].
Unlearning makes a model forget data points, so that a given training data point no longer influences the model’s decisions. In centralized systems, unlearning can be attained by simply retraining the model on the remaining data after removing the offending data points. However, such an approach becomes inefficient and computationally demanding in federated systems, where data is distributed across devices. Federated unlearning emerged to address unlearning in this decentralized environment. Its primary goal is to remove the effect of data without requiring full access to all data nodes, in line with regulations such as the GDPR and CCPA.
A unified federated unlearning workflow is presented in Figure 4, serving as the basis for a novel taxonomy of existing FU techniques. It defines the timeline for learning, unlearning, and verification. When the FU system receives an unlearning request, it can either allow the target client to exit the system immediately, referred to as “Passive unlearning”, or the target client can choose to stay and participate in the unlearning process, referred to as “Active unlearning”. Note that some unlearned clients may simultaneously initiate the unlearning request and transmit information to the server, while others may not engage in the unlearning process but remain solely for verification.
Current methods tend to ignore the contextual significance of data, which can result in the loss of valuable information, and they cannot guarantee minimal overall impact of the unlearned data while maintaining model integrity. Early techniques for federated unlearning applied gradient-based local updates adjusted to reduce the residue of particular data in a model’s parameters. For example, Ginart et al. [12] proposed an approximate unlearning method that perturbs a data point’s contribution to simulate “forgetting”. In principle, these approaches are effective when data storage and computation are centralized, but they must be adapted for federated systems, where gradients from many nodes must be shared. More recent procedures, such as Thudi et al. [13], adjust model updates on each client’s device and selectively erase that client’s data influence, rendering complete retraining unnecessary. Notably, these methods struggle with relevance triage, as they do not distinguish data points that matter most for the model’s performance or accuracy.
Moreover, federated unlearning must strike a trade-off among accuracy retention, computational efficiency, and privacy preservation. Cao et al. optimize for low memory and computational overhead, making their approach well suited to client-side model adaptation on the resource-constrained devices often employed in FL [14]. Nonetheless, many of these approaches trade accuracy or privacy-leakage minimization for efficiency. As federated learning models grow more sophisticated and distributed, data-aware unlearning becomes critical, especially for guaranteeing that the removed data is actually relevant to model behavior.
Ontologies provide a standard model for specifying entities, relationships, and attributes within a specific domain. Using ontologies, data points can be organized according to conceptual relationships rather than statistical features, enabling model interpretability and the prioritization of the most relevant data during training. Ontologies, first defined by Gruber [15] as “an explicit specification of a conceptualization”, provide the foundation for ontology-based systems. Such systems permit the hierarchical structuring of knowledge and the linking of concepts, supporting intelligent data management.
In ontology-driven relevance for unlearning contexts, this structured approach is exploited to distinguish data points by their semantic importance [10]. In healthcare, symptoms might be associated with diagnoses within an ontology, allowing the system to prioritize key symptoms relating to a given diagnosis while deprioritizing non-relevant information for unlearning. Relevance-based scoring is essential for domains with complex, interrelated data, such as medical diagnostics, where some features are more relevant to the model’s expected outputs. Ontologies have been applied to ML models in domains such as personalized recommendation systems and natural language processing, where relationships between entities can improve the relevance scoring of data and produce stronger models.
Ontology-driven relevance can guide federated unlearning, enabling selective data forgetting based on the semantic hierarchy and importance of data points [14]. Focusing on the relevant data points that contribute most to accuracy helps reduce excessive information loss and improves the unlearning process. Ontologies remain an untapped direction in the federated learning context and present a considerable opportunity for context-aware model improvement through unlearning.
While federated unlearning allows for the removal of particular data points from a model to protect privacy, existing methods usually treat all data as being equally important with no regard whatsoever for contextual importance. With ontology-based relevance scoring, federated unlearning focuses on keeping highly relevant data and forgetting less relevant information. Therefore, the model retains accuracy by retaining crucial knowledge after unlearning. Also, it may reduce computational and memory overhead on client devices by focusing unlearning on less valuable or more outdated data—a big plus in resource-constrained federated learning scenarios.
This paper proposes a novel ontology-enhanced federated unlearning framework that incorporates relevance-based data prioritization, thereby advancing federated learning systems’ ability to adapt to privacy requirements. The proposed framework offers significant contributions to the fields of machine unlearning and federated learning, presenting an adaptive, intelligent approach to decentralized data management.

3. Methodology

This section describes the proposed federated unlearning framework, which comprises two complementary methods: Efficient Gradient-Based Federated Unlearning (EG-FedUnlearn) and Ontology-Integrated Federated Unlearning (OFU-Ontology). Both methods aim to remove a specific client’s data influence from the global model while preserving overall model utility. EG-FedUnlearn provides an efficient, retraining-free mechanism based on gradient elimination, while OFU-Ontology introduces domain knowledge to guide relevance-based selective unlearning.
We consider a standard federated learning (FL) setup consisting of a central server and N participating clients. Each client i holds local data D i and collaboratively trains a global model by exchanging parameter updates with the server. The server aggregates these updates, e.g.,
$$\theta^{(t+1)} = \theta^{(t)} - \eta \sum_{i=1}^{N} \frac{n_i}{n} \nabla L_i\big(\theta^{(t)}\big)$$
where $\theta^{(t)}$ are the model parameters at round $t$, $\eta$ is the learning rate, $n_i$ is client $i$’s data size, and $n = \sum_i n_i$. Federated unlearning (FU) requires that after training, the server can remove a target client’s influence. The challenge is to achieve privacy compliance (the GDPR “right to be forgotten”), minimal accuracy loss, and low communication and computation costs.
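For concreteness, the following minimal sketch (our illustration, not the paper’s implementation; model parameters and gradients are assumed to be plain NumPy vectors) performs one such aggregation round:

import numpy as np

def fl_round(theta, client_grads, client_sizes, lr=0.1):
    """One FL aggregation round: theta <- theta - lr * sum_i (n_i / n) * grad_i."""
    n = sum(client_sizes)
    agg_grad = sum((n_i / n) * g for g, n_i in zip(client_grads, client_sizes))
    return theta - lr * agg_grad

# Example: two clients with unequal data sizes
theta = np.array([0.5, -0.2])
grads = [np.array([0.1, 0.4]), np.array([-0.3, 0.2])]
theta_next = fl_round(theta, grads, client_sizes=[150, 50])
# weights are 0.75 and 0.25, so agg_grad = [0.0, 0.35] and theta_next = [0.5, -0.235]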

3.1. Efficient Gradient-Based Federated Unlearning (EG-FedUnlearn)

The Efficient Gradient-Based Federated Unlearning (EG-FedUnlearn) method aims to promptly delete certain points of data from a given federated model, and for this reason, the method exploits gradient information to undo the influence of such data selectively without ever retraining the model. The objectives of this algorithm are to remove the target data influence efficiently without resorting to costly retraining, minimize model degradation when unlearning specific data, and minimize the number of rounds and size of data communicated between federated nodes and the central server.
Each federated node maintains its local dataset and trains the shared global model. Local model updates are computed from the gradients obtained by training on local data. During every training round, each node stores the gradient updates for every data sample (or mini-batch of samples) in local storage. These stored gradient updates are later used to eliminate the effect of a particular data sample when an unlearning request is made. To unlearn a given data point $d_i$, the node retrieves the gradient associated with that data point and applies the negative update to the local model:
$$\theta_{\text{updated}}^{n} = \theta^{n} - \eta\, \Delta\theta_{d_i}^{n}$$
where $\Delta\theta_{d_i}^{n}$ represents the gradient update associated with data point $d_i$. The negative update essentially reverses the influence of $d_i$ on the model. The local model is updated with this gradient-based unlearning approach, and nodes then communicate the adjusted model parameters to the central server.
The central server aggregates the adjusted model updates from all nodes to create a global unlearned model. The aggregation formula mirrors the standard federated learning process:
$$\theta_G = \sum_{n=1}^{N} \frac{|D_n|}{|D|}\, \theta_{\text{updated}}^{n}$$
where $\theta_{\text{updated}}^{n}$ are the adjusted model weights received from each node, $|D_n|$ is the number of data points in the dataset at node $n$, and $|D|$ is the total number of data points across all nodes.
The server keeps summarized gradient information from previous rounds to further improve efficiency, approximating the effect of the target data over the entire training set and minimizing the need for repeated recalculations. This summarization acts as a memory of the impact of past data. The central server also applies an adaptive learning rate to modulate how strongly unlearning affects the model: the rate is reduced if unlearning incurs large accuracy losses, thereby preserving model quality, as follows:
$$\theta_G = \theta_G + \alpha\, \Delta\theta_G$$
where $\alpha$ is an adaptive scaling factor that adjusts the degree of unlearning and $\Delta\theta_G$ represents the aggregate influence of the data point being unlearned across all nodes.
As shown in the algorithm methodology in Figure 5, this algorithm uses gradient information to directly reverse the impacts of specific data points on the local models. By applying a negative gradient update, the influence of a data point can be effectively removed. The central server summarizes gradients across multiple rounds, which allows it to understand the impacts of data points throughout the training process. This reduces the need for full retraining or calibration, making the unlearning process more efficient. The algorithm employs an adaptive learning rate during the aggregation process to ensure that unlearning operations do not drastically degrade model quality. If the negative gradient update has a strong negative impact, the adaptive learning rate mitigates this effect to maintain the performance of the model.
Compared with FedEraser, which relies on storing historical parameter updates and applying calibration training to reconstruct the unlearned model (a storage requirement that can be resource-intensive for large-scale systems), EG-FedUnlearn uses gradient tracking at each node, eliminating the need to store extensive historical updates centrally, and applies an adaptive learning rate to preserve model quality. Compared with retraining from scratch, the most straightforward but computationally expensive approach to unlearning, EG-FedUnlearn offers a significant speed-up by directly reversing the influence of the target data: its gradient-based negative updates target the data’s influence directly, making unlearning more precise and effective.
The pseudocode of EG-FedUnlearn assumes that each node has a local model and dataset and stores the gradient updates for each data point during training. The “train_local_model” function trains the local model and stores gradients; the “unlearn_data_point” function reverses the influence of a specific data point by applying the negative of the stored gradient; and the “send_model_update” function sends the current model to the central server. The CentralServer class aggregates model updates from the nodes: the “aggregate_model_updates” method averages the models received from the nodes, applying an adaptive learning rate to mitigate the effect of unlearning on the quality of the resulting model, while the “distribute_global_model” function distributes the aggregated model back to the nodes.
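A minimal, self-contained sketch of this structure (our simplified rendering, assuming the model is a NumPy parameter vector and a caller-supplied grad_fn computes a per-sample gradient; names follow the paragraph above):

import numpy as np

DIM = 4  # illustrative parameter dimension

class FederatedNode:
    def __init__(self, node_id, data, lr=0.1):
        self.node_id, self.data, self.lr = node_id, data, lr
        self.theta = np.zeros(DIM)   # local copy of the model
        self.stored_grads = {}       # gradient update per data point, kept for unlearning

    def train_local_model(self, grad_fn):
        # Train on each local sample, storing its gradient for later unlearning.
        for idx, sample in enumerate(self.data):
            g = grad_fn(self.theta, sample)
            self.stored_grads[idx] = g
            self.theta -= self.lr * g

    def unlearn_data_point(self, idx):
        # Reverse the stored influence of one data point with a negative update.
        self.theta += self.lr * self.stored_grads.pop(idx)

    def send_model_update(self):
        return self.theta, len(self.data)

class CentralServer:
    def aggregate_model_updates(self, updates, alpha=1.0, delta_theta_g=None):
        # Data-size-weighted average of node models; alpha is the adaptive
        # scaling factor that damps the unlearning adjustment when accuracy drops.
        total = sum(n for _, n in updates)
        theta_g = sum((n / total) * theta for theta, n in updates)
        if delta_theta_g is not None:
            theta_g = theta_g + alpha * delta_theta_g
        return theta_g

    def distribute_global_model(self, nodes, theta_g):
        for node in nodes:
            node.theta = theta_g.copy()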
Algorithm 1 thus avoids the extremely costly computations of full model retraining. Influence reversal is performed selectively through the gradient-tracking procedure, and summarized gradient tracking at the central server makes EG-FedUnlearn suitable for large-scale federated networks: the summarization process allows the server to efficiently manage the influence of multiple data points over time. Communication between nodes and the server is minimized by transmitting only the relevant gradient updates during unlearning rather than retraining entire models, which reduces the size of transmitted data. The adaptive update strategy helps ensure that model performance is not heavily impacted during the unlearning process, a common concern in federated learning environments.
Algorithm 1: Pseudocode for EG-FedUnlearn
Input: Local datasets $\{D_i\}$ for each client $i$, target data point $T$ to be unlearned, number of communication rounds $R$
Output: Updated global model without influence from $T$
1. Initialize global model $M_0$
2. For each communication round $r = 1, 2, \ldots, R$ do:
  3. Each client $i$ receives the global model $M_r$ from the server
  4. Each client checks if $T \in D_i$
    a. If $T \in D_i$:
      i. Recalculate local updates by removing $T$ from $D_i$
      ii. Train the local model $M_i$ using $D_i \setminus T$
    b. Else:
      i. Train the local model $M_i$ using $D_i$
  5. Send local model updates $\Delta M_i$ to the server
  6. Server aggregates local updates to compute the new global model:
     $M_{r+1} = \text{Aggregate}(\Delta M_i \text{ for all } i)$
7. Return the updated global model $M_R$

3.2. Ontology-Integrated Federated Unlearning (OFU-Ontology) Algorithm

Ontology-Integrated Federated Unlearning (OFU-Ontology) performs unlearning requests efficiently in FL scenarios by adding an ontology module. The ontology decides which data points are more important, depending on their relevance to an unlearning query, and unlearns only the less important ones to reduce the negative impact of unlearning on model performance.

3.2.1. System Architecture

The overall architecture of the proposed federated unlearning system integrates an ontology module alongside traditional federated nodes and a central coordinating server. Each client locally holds a subset of data and participates in collaborative training. During the unlearning phase, clients may request the removal of specific data or features, and the central server orchestrates this removal using relevance information extracted from an ontology.
Unifying data semantics in the ontology module requires establishing an ontology that formally defines the concepts, relationships, and class hierarchies of the domain (e.g., similarities between digits via their shapes in MNIST, or topic relationships among Reddit comments). Offline, an expert can periodically update the ontology with annotations, while online integration allows learning from client feedback logs. Hence, the model can evolve its definition of relevance over time, and unlearning remains consistent with current data semantics and regulatory interpretations.
The system architecture of OFU-Ontology, as shown in Figure 6, consists of three major components: federated nodes, central coordinating server, and the ontology module.
  • Federated nodes: Each federated node locally stores its dataset and maintains a copy of the model. Federated nodes train the model locally on data unique to them and communicate updates of their models to the central server. At each node, there is also storage of gradient information for each data point during training, later to be used for unlearning.
  • Ontology module: The ontology module describes inter-entity relationships and relevance hierarchies among different types of data entities. This module assigns relevance scores to data points on the basis of domain knowledge, with higher scores implying higher importance and lower scores suggesting that a data point can be prioritized for unlearning with the least possible effect on the model quality. The ontology module can either be embedded locally at each node, or it can be housed centrally and accessed on unlearning requests. The ontology model is used for representing relationships and hierarchies within the data and hence forms a basis for assigning relevance scores to data points. The ontology is built using the following:
    • Entities represent key features or data elements in the domain. For instance, in healthcare, entities might include “symptom”, “diagnosis”, “treatment”, etc.
    • Attributes are properties of entities that provide additional detail. For example, the “symptom” entity may have attributes such as severity, duration, and frequency.
    • Relationships define the connections between entities. For instance, “symptom” might be related to “diagnosis” through a “leads to” relationship.
    • Importance hierarchy indicates that the ontology includes a hierarchical structure that defines the importance of different entities and relationships.
The ontology is used to calculate a relevance score $R(d_i)$ for each data point $d_i$. This score helps determine which data points have a higher impact on model quality and should therefore be prioritized for retention or unlearning.
  • Central coordinating server: The central coordinating server aggregates the updates received from the nodes and synchronizes the global model. It also facilitates federated unlearning by managing relevance-weighted updates and calibration training to minimize knowledge permeation from unlearned data.
Figure 6 shows the ontology visualizations designed for each evaluation dataset to illustrate how domain knowledge is structured to facilitate relevance-based unlearning. Figure 6a shows the MIMIC-III dataset ontology, with medical entities such as diagnoses, symptoms, and treatments and their relationships in support of semantic relevance scoring for healthcare data. Figure 6b depicts the Purchase dataset ontology, wherein products are placed into category taxonomies to prioritize transactional records for unlearning. Figure 6c represents the UCI Adult dataset ontology, modeling demographic and socioeconomic attributes such as age, education, occupation, and income brackets. Figure 6d shows the Reddit Comment dataset ontology, mapping topics and sentiment categories for context-aware unlearning of text data. Figure 6e gives the CIFAR-10 dataset ontology, defining visual object categories and inter-class similarities to conserve vital features for image classification. Lastly, Figure 6f illustrates the KDD Cup 1999 dataset ontology, modeling network events, protocol types, and attack categories for precise unlearning in cybersecurity scenarios. These structured ontologies allow the OFU-Ontology framework to assign semantic relevance scores to data points, enabling selective unlearning that balances the retention of important model knowledge with privacy requirements.

3.2.2. Ontology Construction and Integration

Ontology construction and integration are critical components of the OFU-Ontology framework, enabling relevance-aware federated unlearning through structured domain knowledge. The ontology module serves as a domain-specific knowledge graph that represents entities, attributes, and semantic relationships. During unlearning, it assigns relevance scores to data samples, guiding the selective forgetting of low-impact information while preserving high-value knowledge that supports model utility. In this work, ontologies were manually designed based on expert knowledge, publicly available schema definitions, and dataset-specific semantics to ensure interpretability and alignment with real-world logic. Each dataset domain leverages a customized ontology structure reflecting its inherent relationships. In this study, all ontologies were static (fixed before training), but the design explicitly supports future dynamic updates to incorporate new data, semantic drift, or client feedback. Each dataset-specific ontology was designed modularly, enabling easy replacement or customization for other domains without altering the core unlearning algorithm.
For example, in healthcare-style data (e.g., UCI Adult dataset), the ontology models relationships between demographic attributes and health risk factors. A typical SPARQL query to retrieve high-risk attribute combinations might be
SELECT ?ageGroup ?occupation ?riskLevel
WHERE {
  ?person ex:hasAgeGroup ?ageGroup .
  ?person ex:hasOccupation ?occupation .
  ?person ex:hasRiskLevel ?riskLevel .
  FILTER(?riskLevel = "High")
}
This query identifies records with high-risk attribute combinations so that their relevance can be scored and lower-relevance records prioritized for unlearning. For network intrusion detection (e.g., the KDDCup99 dataset), the ontology models relationships between attack types, protocols, and network services. A relevant SPARQL query might be
SELECT ?attackType ?service ?protocol
WHERE {
  ?event ex:hasAttackType ?attackType .
  ?event ex:usesService ?service .
  ?event ex:usesProtocol ?protocol .
  FILTER(?attackType != "DoS")
}
These SPARQL queries illustrate how domain knowledge encoded in ontologies enables computing relevance scores $R(d_{i,j})$ for each data sample:
$$R(d_{i,j}) = f_{\text{ontology}}(d_{i,j})$$
where $f_{\text{ontology}}$ is derived from ontology-based semantic rules. Low-relevance data points (low $R$) are targeted first during weighted gradient negation:
$$\theta_{\text{unlearned}}^{(t+1)} = \theta^{(t+1)} + \eta\, \frac{n_u}{n} \sum_{j} R(d_{u,j})\, \nabla L_{u,j}\big(\theta^{(t)}\big)$$
While our experiments use static ontologies—knowledge graphs defined before training and unlearning—the framework supports future dynamic or evolving ontologies. These could be adapted based on new data, client feedback, or automated schema induction, ensuring long-term relevance and robustness against semantic drift. Scalability is achieved through a modular design, allowing each application domain or dataset to define its own ontology independently of the core unlearning algorithm. This modularity enables broad applicability across fields such as healthcare, text analysis, network security, image recognition, and e-commerce, ensuring that the OFU-Ontology framework generalizes effectively to diverse federated learning scenarios.
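As one possible realization (a sketch only; the toy graph, the example.org namespace, and the score values are our illustrative assumptions, and the paper does not prescribe a specific triple store), such a query can be executed with rdflib and its results mapped to relevance scores:

from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()

# Toy ontology fragment in the spirit of the UCI Adult example above
p = EX["person1"]
g.add((p, EX.hasAgeGroup, Literal("50-60")))
g.add((p, EX.hasOccupation, Literal("Clerical")))
g.add((p, EX.hasRiskLevel, Literal("High")))

q = """
PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
  ?person ex:hasRiskLevel ?risk .
  FILTER(?risk = "High")
}
"""

# Map query hits to relevance scores R(d_i): samples matching a
# high-importance pattern keep a high score; everything else gets a
# low score and becomes a candidate for unlearning.
high_relevance = {str(row.person) for row in g.query(q)}

def relevance_score(sample_uri):
    return 0.9 if sample_uri in high_relevance else 0.2  # illustrative values

print(relevance_score(str(p)))  # 0.9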

3.2.3. Workflow of OFU-Ontology

1. Training Phase: During the training phase, each federated node $n$ carries out local model training using gradient descent over every data point $d_i$ in its local dataset $D_n$. For each local data point, the node computes the gradient of the loss function $L$ with respect to the model parameters $\theta^n$:
$$\Delta\theta_{d_i}^{n} = \nabla_{\theta^n} L\big(d_i, \theta^n\big)$$
Here, $\Delta\theta_{d_i}^{n}$ is stored for potential use in the unlearning phase. The local model parameters are updated using gradient descent:
$$\theta^n = \theta^n - \eta\, \Delta\theta_{d_i}^{n}$$
where $\theta^n$ denotes the local model parameters at node $n$ and $\eta$ is the learning rate.
2. Ontology and Relevance Score Calculation: The ontology is first utilized during the relevance score calculation. Each node has access to the ontology module, which defines relationships and importance levels for different data entities. This phase is essential for determining how important each data point is in relation to the entire dataset.
The ontology function $f_{\text{ontology}}$ calculates the relevance score $R(d_i)$ for each data point $d_i$:
$$R(d_i) = f_{\text{ontology}}(d_i)$$
Here, $f_{\text{ontology}}$ uses the domain-specific ontology to assess relationships, attributes, and the importance hierarchy of the data point. The relevance score ranges from 0 to 1 and indicates which points play a vital role in the model’s performance and which can be unlearned with little to no impact.
3. Selective Unlearning Phase: The selective unlearning phase begins with an unlearning request. Here, the ontology determines how large an adjustment should be made for each data point’s influence. The node checks the relevance score of the data point requested for unlearning: if $R(d_i) < R_{\text{threshold}}$, then data point $d_i$ is prioritized. A lower relevance score implies that the data point has less impact on the model and can therefore be removed more easily.
To remove the influence of the data point, the node applies the negative gradient, weighted by its relevance score:
$$\theta_{\text{updated}}^{n} = \theta^{n} - \eta\, \big(1 - R(d_i)\big)\, \Delta\theta_{d_i}^{n}$$
where $\theta^n$ is the model parameters at node $n$, $\eta$ is the learning rate, $\Delta\theta_{d_i}^{n}$ is the gradient update for data point $d_i$, and the factor $(1 - R(d_i))$ ensures that the strength of unlearning is inversely related to the relevance score: data points with lower relevance scores receive a larger adjustment, effectively removing their influence more aggressively.
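A small sketch of this relevance-weighted step (the threshold behavior follows the check above; all numeric values are illustrative):

import numpy as np

def selective_unlearn(theta, stored_grad, relevance, lr=0.1, threshold=0.5):
    """Relevance-weighted gradient negation: only low-relevance points
    (R < threshold) are unlearned, and the adjustment scales with (1 - R)."""
    if relevance >= threshold:
        return theta  # high-relevance data is retained
    return theta - lr * (1.0 - relevance) * stored_grad

theta = np.array([1.0, 2.0])
grad = np.array([0.5, -0.5])
print(selective_unlearn(theta, grad, relevance=0.2))  # larger adjustment: [0.96, 2.04]
print(selective_unlearn(theta, grad, relevance=0.8))  # unchanged: [1.0, 2.0]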
4. Decoupling Knowledge Permeation: The ontology also indirectly helps mitigate knowledge permeation (i.e., the spread of influence from unlearned data to other nodes via the aggregated global model). After unlearning, the node applies a decoupling update to further minimize residual effects:
$$\Delta\theta_{\text{decoupled}}^{n} = \Delta\theta_{d_i}^{n}\, \big(1 - R(d_i)\big)$$
This ensures that the influence of unlearned data is not only minimized locally but also reduced when the global model is aggregated.
5. Global Aggregation Phase: Once each node has finished its selective unlearning, the resulting models are sent to the central coordinating server, which aggregates these model updates into a global model as follows:
$$\theta_G = \sum_{n=1}^{N} \frac{|D_n|}{|D|}\, \theta_{\text{updated}}^{n}$$
where $|D_n|$ is the number of data points at node $n$ and $|D|$ is the total number of data points across all nodes. The ontology indirectly influences this phase by ensuring that only less relevant data points are aggressively unlearned, thereby reducing the negative impact on the aggregated model.
6. Calibration Training Phase: The central server intervenes if any unlearned data retains an impact on the model by performing calibration, which uses past versions of the model to make final adjustments:
$$M_{\text{calibrated}} = M + \alpha\, \big(M_{\text{historic}} - M_{\text{unlearned}}\big)$$
where $M_{\text{historic}}$ is the model before unlearning, $M_{\text{unlearned}}$ is the model immediately after unlearning, and $\alpha$ is a calibration factor used to adjust the global model. During calibration, the impact of the unlearned data is fully decoupled, enhancing the robustness of the model against unintended consequences of unlearning.
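As a sketch (note that the base model $M$ in the formula is not pinned down in the text; we take it to be the unlearned model, which is one plausible reading):

import numpy as np

def calibrate(m_unlearned, m_historic, alpha=0.3):
    # M_calibrated = M + alpha * (M_historic - M_unlearned),
    # with M taken here as the unlearned model (our assumption).
    return m_unlearned + alpha * (m_historic - m_unlearned)

m_hist = np.array([1.0, 2.0])
m_unl = np.array([0.8, 2.2])
print(calibrate(m_unl, m_hist))  # [0.86, 2.14]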
The ontology module within OFU-Ontology (Algorithm 2) forms the core of the relevance score calculation and the selective unlearning phase. It intelligently prioritizes which data points to unlearn, enabling efficient data removal while maintaining model quality. Ontology-derived relevance scores inform decisions across the workflow, achieving a trade-off between data-privacy requests and model performance.
In brief, the gains of OFU-Ontology could be listed as follows:
  • By selecting low-relevance data points for unlearning, computational and communication overheads are decreased.
  • The ontology-based approach prevents important data from being needlessly exposed or removed, thereby retaining federated learning’s privacy-preserving properties.
  • Calibration using historical updates mitigates knowledge permeation by ensuring that the unlearned data’s influence is correctly removed from all clients.
  • By allowing updates of the ontology module as data relevance changes, the system can adapt to evolving privacy and regulatory requirements.
Algorithm 2: Pseudocode for OFU-Ontology
Input: Local datasets $\{D_i\}$ for each client $i$, target data point $T$ to be unlearned, ontology relevance function $f_{\text{ontology}}$, relevance threshold $R_{\text{threshold}}$, number of communication rounds $R$
Output: Updated global model without influence from $T$
1. Initialize global model $M_0$
2. For each communication round $r = 1, 2, \ldots, R$ do:
  3. Each client $i$ receives the global model $M_r$ from the server
  4. Each client checks if $T \in D_i$
    a. If $T \in D_i$:
      i. Compute the relevance score $R(T) = f_{\text{ontology}}(T)$
      ii. If $R(T) < R_{\text{threshold}}$: apply weighted gradient negation $\theta^i \leftarrow \theta^i - \eta\,(1 - R(T))\,\Delta\theta_T^i$ and the decoupling update
      iii. Train the local model $M_i$ using $D_i \setminus T$
    b. Else:
      i. Train the local model $M_i$ using $D_i$
  5. Send local model updates $\Delta M_i$ to the server
  6. Server aggregates local updates to compute the new global model:
     $M_{r+1} = \text{Aggregate}(\Delta M_i \text{ for all } i)$
  7. Server performs calibration if residual influence remains: $M_{r+1} \leftarrow M_{r+1} + \alpha\,(M_{\text{historic}} - M_{\text{unlearned}})$
8. Return the updated global model $M_R$
The OFU-Ontology pseudocode (Algorithm 2) describes the unlearning process in federated learning using ontology-based relevance. First, local training takes place at each federated node, where models are trained separately on local data. Next, ontology integration evaluates data relevance, assigning each data point a relevance score based on its contextual importance. Gradients are then computed and weighted by the relevance scores to emphasize high-value data. Ontology-based relevance filtering then determines whether unlearning should be applied to lower-importance data; if so, selective gradient negation suppresses that data’s impact on the model [16]. Finally, gradient aggregation is performed at the central server, a global model update is generated, and the updated global model is sent back to the federated nodes for further training. This structured approach guarantees that low-value data is unlearned while valuable information is kept intact, complying with privacy regulations as well as organizational policies.
Figure 7 depicts the end-to-end workflow of the OFU framework for combined federated learning and ontology-based unlearning. Diverse input datasets (e.g., MNIST, CIFAR-10, Reddit, and Water Meters) are preprocessed according to their data types. The ontology module models semantic relevance within each data domain to support relevance scoring of data points at each federated node. Low-relevance data is selectively unlearned using gradient negation, while the central server aggregates the remaining gradients to update the global model. Unlike FedEraser, which requires storage of all gradients and calibration rounds, and VeriFi, which focuses solely on cryptographic proofs of deletion, our approach directly removes client influence while preserving model utility through semantic relevance scoring.

4. Experimental Results

4.1. Datasets

The experiments in this section assess the efficiency and effectiveness of the proposed OFU-Ontology algorithm compared with the baseline EG-FedUnlearn algorithm. The experiments were carried out on diverse datasets representative of different data types, including image classification, tabular data, transactional data, health records, network intrusions, and social media comments. Together, the datasets provide a comprehensive evaluation framework for the unlearning algorithms across several domains.
The descriptions in Table 1 highlight how each dataset contributes to evaluating the efficiency and effectiveness of EG-FedUnlearn and OFU-Ontology. The MNIST dataset contains simple image data for a baseline evaluation of unlearning efficiency, whereas the CIFAR-10 dataset (Figure 8) poses a more complex image-classification task for assessing how ontology-driven relevance impacts unlearning, and the UCI Adult census dataset is tabular data with mixed feature types for evaluating how well the ontology preserves key features during unlearning. The Purchase dataset contains transactional data used to test how well less impactful transactions can be unlearned while retaining critical information. MIMIC-III is complex healthcare data designed to assess the prioritization of sensitive clinical information during unlearning. The KDD Cup 1999 dataset involves network security data and is used to evaluate unlearning efficiency in real-time intrusion detection. The Reddit Comment dataset consists of text data for evaluating unlearning in content moderation and privacy-driven removal requests.

4.2. Evaluation Metrics

To thoroughly evaluate the performance of the proposed OFU-Ontology algorithm and the baseline EG-FedUnlearn algorithm, several evaluation metrics are used. The metrics include different aspects of unlearning efficiency and the tradeoff between resources and privacy. The metrics are elaborated on below:
  • Model accuracy (A) is evaluated before and after the unlearning process to assess how well the model retains its predictive power after removing specific data points [24].
  • Unlearning efficiency ($U_e$) is measured as the time required to remove the influence of a data point from the model [25].
$$U_e = t_{\text{end}} - t_{\text{start}}$$
where $t_{\text{start}}$ and $t_{\text{end}}$ represent the start and end times of the unlearning process.
  • Residual influence ($R_{\text{inf}}$) measures the remaining influence of the unlearned data on the model, determined by comparing the model parameters before and after unlearning [25,26].
$$R_{\text{inf}} = \frac{\|\theta_{\text{before}} - \theta_{\text{after}}\|}{\|\theta_{\text{before}}\|}$$
where $\theta_{\text{before}}$ and $\theta_{\text{after}}$ are the model parameters before and after unlearning, respectively.
  • Communication overhead ($C_{\text{overhead}}$) represents how much data is exchanged between the nodes and the central server during the unlearning process [4].
$$C_{\text{overhead}} = \sum_{n=1}^{N} S_n$$
where $S_n$ is the size of the data sent by node $n$, and $N$ is the total number of nodes.
  • Convergence time ($T_{\text{convergence}}$) is measured as the number of rounds required by the global model to restore its stability following unlearning [27].
$$T_{\text{convergence}} = \min\big\{\, t : \|\theta_G^{(t)} - \theta_G^{(t-1)}\| < \epsilon \,\big\}$$
where $\theta_G^{(t)}$ is the global model at round $t$, and $\epsilon$ is a small threshold indicating convergence.
  • Unlearning accuracy (UA) measures the precision with which the unlearning procedure has erased the influence of a particular datum. It is measured by assessing how the model output differs with and without the particular datum. Unlearning accuracy determines whether the unlearning mechanism has indeed reversed the model influence of the requested data points or otherwise negatively affected the remaining model [28].
  • Computational cost (CC) concerns the computational requirement to unlearn; it comprises computation time and memory utilization [12].
$$CC = t_{\text{cpu}} + m_{\text{usage}}$$
where $t_{\text{cpu}}$ represents the CPU time consumed, and $m_{\text{usage}}$ represents memory usage during the unlearning process. A lower computational cost indicates that the algorithm is efficient in terms of resource consumption.
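Three of these metrics are straightforward to compute directly from the definitions above; a minimal sketch (the norm choice and threshold value are our assumptions):

import time
import numpy as np

def residual_influence(theta_before, theta_after):
    """R_inf = ||theta_before - theta_after|| / ||theta_before||."""
    return np.linalg.norm(theta_before - theta_after) / np.linalg.norm(theta_before)

def convergence_round(history, eps=1e-3):
    """First round t with ||theta_G(t) - theta_G(t-1)|| < eps, or None."""
    for t in range(1, len(history)):
        if np.linalg.norm(history[t] - history[t - 1]) < eps:
            return t
    return None

# Unlearning efficiency U_e as the wall-clock time of the unlearning call
t_start = time.perf_counter()
# ... run the unlearning procedure here ...
t_end = time.perf_counter()
u_e = t_end - t_start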

4.3. Deletion Strategies

In machine unlearning, deletion strategies specify which data points are selected for elimination during the unlearning procedure. These deletion strategies are
(a) Random deletion strategy—A random portion of the training data is chosen for elimination, simulating a situation where an arbitrary user requests that their data be deleted. This simple method evaluates the model’s ability to forget random patterns in the dataset—for example, randomly deleting images from various classes or removing an arbitrary subset (e.g., 100 random images) [5].
(b) Class-based deletion strategy—This method removes all data points belonging to a specific class (or category), which facilitates evaluating the model’s ability to retain knowledge of other classes despite forgetting an entire category. It is useful for removing whole categories from the dataset, such as all images of a particular item or all reviews expressing a particular sentiment. Examples are removing all images of cars, all images of the digit seven, or all reviews labelled as “positive” [29].
(c) Feature-based deletion strategy—With this approach, data points are selected by feature values rather than labels or classes. For instance, all records in the UCI Adult dataset with certain demographic features (such as a given gender or education level) may be removed. This method is particularly useful for tabular datasets (e.g., deleting data belonging to a certain demographic group) and can help comply with data privacy laws. Examples are removing all data items with the attribute “education” set to “Bachelors” or with the value “White” for the attribute “race” [29].
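The three strategies reduce to three ways of selecting the indices to forget; a minimal sketch (the array layout and column conventions are illustrative):

import numpy as np

rng = np.random.default_rng(seed=0)

def random_deletion(y, k):
    """Random deletion: k arbitrary training indices."""
    return rng.choice(len(y), size=k, replace=False)

def class_based_deletion(y, target_class):
    """Class-based deletion: every index labeled with target_class."""
    return np.where(y == target_class)[0]

def feature_based_deletion(X, col, value):
    """Feature-based deletion: every index whose feature `col` equals `value`."""
    return np.where(X[:, col] == value)[0]

# Example: drop all samples of class 7, as in the MNIST class-based scenario
y = np.array([1, 7, 3, 7, 0])
print(class_based_deletion(y, 7))  # [1 3]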

4.4. Results

This section discusses the results of comparing the two proposed methods with VeriFi and FedEraser on seven datasets under three deletion strategies, testing the efficiency of OFU-Ontology and EG-FedUnlearn; bold values indicate the best result among the compared methods. SPARQL B1 in Appendix B shows the SPARQL query used to extract relevance information from the MNIST ontology, which represents handwritten digits with concepts such as “Digit”, “Stroke Pattern”, and “Writing Style”. Relationships like “is similar to” (e.g., one is similar to seven) capture common confusions, which is useful for relevance scoring during selective unlearning. This query selects digits that are complex and similar to other digits (e.g., “one” and “seven”), which helps in targeted unlearning without adversely affecting accuracy. By leveraging the relationships between similar digits, unlearning was conducted in a way that maintained high model accuracy.
Table A1 and Figure 9 show the performance on the MNIST dataset, where OFU-Ontology achieved a maximum overall accuracy of 95% and an unlearning efficiency of 97% for class-based deletion, substantially better than the other methods, with residual influence values as low as 0.01 and a convergence time of 0.78. EG-FedUnlearn was also competitive (with an accuracy of around 92%), but OFU-Ontology still outperformed it in removing influence, reaching 98% unlearning accuracy under class-based deletion. OFU-Ontology incurred a slightly higher computational cost than VeriFi; however, that cost is justified by its performance and lower communication overhead.
The ontology used for CIFAR-10 is an image-based ontology that includes concepts such as “Object”, “Category”, “Color”, “Texture”, and “Context”. Relationships such as “is part of” (e.g., a wheel is part of a car) and “has color” (e.g., sky has the color blue) are used to enhance the relevance scoring during unlearning. SPARQL B2 shows the SPARQL query used to extract the relevance of specific images based on object type and color. This query identifies images that belong to important categories like “Vehicle” or “Animal” and assigns relevance scores accordingly to decide their unlearning priority. The use of relationships like “has color” and “is part of” ensured comprehensive relevance scoring, allowing for targeted unlearning with minimized residual influence.
Table A2 and Figure 10 show the performance on the CIFAR-10 dataset across deletion strategies, where OFU-Ontology recorded a maximum accuracy of 0.93 and its highest unlearning efficiency of 0.95. Residual influence was brought down to 0.02, better than FedEraser’s and VeriFi’s 0.05–0.07. Communication overhead was slightly lower, in the range of 0.55–0.6, and convergence was faster at 0.75 compared with 0.81 for FedEraser. Unlearning accuracy stood at 0.94, guaranteeing efficient removal of data, and the computational cost of 0.67 was balanced by these overall gains. EG-FedUnlearn was also competitive, with accuracy fluctuating between 0.89 and 0.91 and an unlearning efficiency of 0.92.
The ontology for the UCI Adult dataset represents demographic information such as “Age”, “Occupation”, “Education”, “Income”, and “Marital Status”. Relationships like “is related to Income” and “has Education Level” help assign relevance scores for selective unlearning. SPARQL B3 shows the SPARQL query that is used to select individuals with specific income and education levels to focus the unlearning process on records that may be deemed sensitive to prioritize the unlearning of individuals based on sensitive attributes such as income level and marital status.
The ontology for the Purchase dataset includes concepts such as “Customer”, “Product Category”, “Purchase Frequency”, and “Spending Pattern”. Relationships like “has purchased” and “is a frequent buyer of” are used to assign relevance scores for products and customers. SPARQL B4 shows the SPARQL query that used to focus on customers who are frequent buyers of specific product categories, allowing targeted unlearning to identify high-relevance customers based on their spending patterns and product categories.
Across the UCI Adult and Purchase datasets (Table A3 and Table A4, Figure 11 and Figure 12), OFU-Ontology consistently achieved the best results, with the highest model accuracy (0.93 for UCI Adult and 0.94 for Purchase) and unlearning efficiency (0.95 for both datasets). EG-FedUnlearn came in second, with accuracies of 0.90 and 0.92, respectively, whereas VeriFi scored 0.88 on UCI Adult. OFU-Ontology also reduced residual influence to 0.02 against VeriFi’s 0.06 and had lower communication overhead (0.54 for UCI Adult and 0.56 for Purchase) than FedEraser (0.62–0.63). Moreover, OFU-Ontology converged much faster (0.73–0.75), with an unlearning accuracy topping out at 0.97, demonstrating that it successfully eliminated the influence of the targeted data. Although OFU-Ontology is somewhat computationally expensive (0.60–0.64), this cost is well warranted given that it outperforms every other method on almost every metric.
The ontology for MIMIC-III represents medical data, including “Patient”, “Diagnosis”, “Treatment”, and “Medical History”. Relationships such as “has Diagnosis” and “is associated with” help prioritize unlearning based on patient conditions. SPARQL B5 shows the SPARQL query that is used to select patients with specific diagnoses (e.g., cardiac or respiratory) for unlearning to prioritize patients for unlearning based on medical conditions and treatments. Ontology-based relevance ensured that patients with sensitive medical conditions were prioritized for unlearning, protecting privacy.
For the MIMIC-III, KDD Cup 1999, and Reddit Comments datasets (Table A5, Table A6 and Table A7, Figure 13, Figure 14 and Figure 15), OFU-Ontology outperformed the other state-of-the-art methods across the evaluation metrics. It attained model accuracies of 0.89, 0.92, and 0.94 on MIMIC-III, KDD Cup 1999, and Reddit Comments, respectively, with unlearning efficiency scores of 0.93, 0.95, and 0.96. Residual influence was reduced to 0.01–0.04, well below the 0.04–0.05 recorded by FedEraser and VeriFi. Communication overhead with OFU-Ontology remained low (0.50–0.64), whereas FedEraser hovered around a much higher 0.6–0.75. OFU-Ontology also converged fastest (0.72–0.74) and achieved the highest unlearning accuracy (0.95–0.98), accomplishing extremely precise data removal. Its computational cost of 0.61–0.68 is an acceptable trade-off given its performance in accuracy, efficiency, and robustness. EG-FedUnlearn performed similarly, lagging slightly behind in accuracy and residual influence.
The ontology for the KDD Cup 1999 dataset covers network activities such as “User”, “NetworkEvent”, “Attack Type”, and “Severity”. Relationships like “is anomalous” and “is part of the network” help identify and score relevance for security-related unlearning. SPARQL B6 shows the query used to select high-severity, anomalous network events for targeted unlearning. The ontology makes it possible to target anomalous events for removal while preserving legitimate network activity, ensuring that all traces of the targeted anomalies are erased and significantly reducing residual influence.
The ontology for Reddit Comments includes concepts such as “User”, “Comment”, “Sentiment”, and “Topic”. Relationships like “has Sentiment” and “belongs to Topic” determine the relevance of specific comments for unlearning. SPARQL B7 shows the query used to select comments with negative sentiment in sensitive topics, prioritizing them for unlearning. Ontology-driven relevance enabled the effective unlearning of comments on sensitive topics, which helped mitigate model bias.
Figure 16 presents a radar chart comparing the two proposed methods, while Figure 17 compares all four federated unlearning algorithms (EG-FedUnlearn, OFU-Ontology, FedEraser, and VeriFi) across seven metrics: model accuracy, unlearning efficiency, residual influence, communication overhead, convergence time, unlearning accuracy, and computational cost. Overall, OFU-Ontology is the strongest algorithm, achieving better model accuracy, unlearning efficiency, residual influence, and unlearning accuracy than its competitors. Efficiency here is calculated as the reduction in unlearning time and memory usage relative to the accuracy lost after deletion. By leveraging ontology-based relevance scoring, OFU-Ontology unlearns data selectively without compromising overall model quality. EG-FedUnlearn also performs well, particularly in convergence time and unlearning efficiency; it maintains a healthy balance between performance and computational cost, since recalculating local updates after the removal of targeted data keeps unlearning highly efficient. FedEraser performs moderately, but its high communication and computation costs make it less viable for scalable deployments, especially when frequent unlearning is required. VeriFi, the simplest of the four, yields the weakest results on almost all unlearning-related metrics, as it lacks specialized mechanisms for effective data removal; its strengths are a lower computational cost and reduced communication overhead, making it suitable for federated learning applications where unlearning is not a primary concern.
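As a concrete reading of that efficiency definition, the snippet below computes one plausible normalization. The exact formula is not spelled out in the text, so the functional form and the function name unlearning_efficiency are assumptions: mean fractional savings in unlearning time and memory, discounted by the accuracy lost after deletion.

```python
# Hedged sketch of the efficiency metric described above (the precise
# formula is an assumption, not taken verbatim from the paper).
def unlearning_efficiency(time_saved, mem_saved, acc_loss):
    """All arguments are fractions in [0, 1]: fractional reduction in
    unlearning time, fractional reduction in memory usage, and the
    post-deletion accuracy loss."""
    savings = 0.5 * (time_saved + mem_saved)   # mean resource savings
    return savings / (1.0 + acc_loss)          # discounted by accuracy loss

# Using the UCI Adult figures quoted in the conclusions: 23% faster
# unlearning, 18% lower memory overhead, 0.9% accuracy drop.
print(round(unlearning_efficiency(0.23, 0.18, 0.009), 3))  # ~0.203
```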
Figure 18 shows how different text features, such as sentiment score, keyword frequency, and topic coherence, affect model accuracy under the various federated unlearning approaches on the Reddit Comments dataset, indicating how different facets of text data influence the model as the unlearning method changes. Figure 19 presents the evolution of model accuracy during federated training and unlearning. Both EG-FedUnlearn and OFU-Ontology improve accuracy sharply over communication rounds, with OFU-Ontology reaching the highest final accuracy, which illustrates the advantage of relevance-based selective forgetting for retaining useful knowledge. FedEraser and VeriFi, while still performing unlearning, improve accuracy more slowly or stagnate, owing to more aggressive removal or to ignoring semantic context. This highlights how the proposed approaches preserve model utility while complying with unlearning requirements.
In conclusion, OFU-Ontology is the most promising approach for federated unlearning due to its ability to maintain high model quality while effectively handling unlearning. EG-FedUnlearn is also a strong contender, especially for scenarios where a balance between performance and resource consumption is desired. Meanwhile, FedEraser and VeriFi are less efficient in the context of targeted unlearning, with VeriFi being more appropriate for cases where simplicity and lower costs are prioritized over sophisticated unlearning capabilities.

5. Conclusions

OFU-Ontology is a privacy-preserving federated unlearning framework that incorporates semantic knowledge via ontologies to make unlearning operations more exact and effective. Through ontology-based relevance scoring, the framework targets data deletion precisely, concentrating the unlearning effort on the most critical data while minimizing the impact on model performance from less relevant data. Extensive experiments on seven datasets of diverse natures (CIFAR-10, MNIST, MIMIC-III, UCI Adult, Reddit Comments, Purchase, and KDD Cup 1999), under three deletion strategies (random, class-based, and feature-based), demonstrated OFU-Ontology’s improvements over baseline methods. For instance, on CIFAR-10, it maintained 92.4% accuracy post-unlearning against 89.1% for EG-FedUnlearn, with a much lower memory overhead of 18%. On the UCI Adult dataset, a 23% reduction in unlearning time at the cost of only a 0.9% drop in accuracy showed that the method strikes a promising balance between performance and efficiency in practice. Across datasets, the framework achieved stronger suppression of residual influence, evidencing its resistance to knowledge leakage. Moreover, ontology integration allowed the system to distinguish the contextual importance of data points and to use this as input for relevance-weighted gradient negation. This semantics-driven approach to unlearning achieved markedly better efficiency scores, that is, performance preserved relative to the cost of unlearning in time and memory. Explainability was further supported by SHAP-based visualization, which shows how unlearning decisions correspond to feature-importance hierarchies drawn from domain ontologies.
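As a minimal sketch of the relevance-weighted gradient negation idea, the snippet below subtracts each stored gradient contribution of the target data from the global parameters, scaled by its ontology-derived relevance score. This is an illustration under assumptions, not the authors' exact update rule; the function name and the learning-rate handling are hypothetical.

```python
# Minimal numpy sketch of relevance-weighted gradient negation
# (assumption: the papers' exact update rules are not reproduced here).
import numpy as np

def ontology_weighted_negation(global_weights, target_grads, relevance, lr=0.01):
    """global_weights: 1-D parameter vector of the global model.
    target_grads: gradient vectors the target data contributed during training.
    relevance: per-gradient relevance scores in [0, 1] from the ontology.
    Returns the unlearned parameter vector."""
    w = global_weights.copy()
    for g, r in zip(target_grads, relevance):
        # Gradient descent applied w -= lr * g; negation adds it back,
        # scaled so highly relevant contributions are removed most strongly.
        w += lr * r * g
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=8)
grads = [rng.normal(size=8) for _ in range(3)]
scores = [0.9, 0.2, 0.7]  # e.g., produced by the SPARQL relevance queries
w_unlearned = ontology_weighted_negation(w, grads, scores)
```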
In summary, OFU-Ontology not only facilitates federated unlearning through semantic prioritization but also provides a scalable, interpretable, and GDPR-compliant solution. Promising future directions include dynamic, evolving ontologies; coupling with edge knowledge distillation; and real-time unlearning in streaming environments.

Author Contributions

Conceptualization, N.E.G. and E.A.M.; methodology, N.E.G.; software, E.A.M.; validation, N.E.G.; formal analysis, E.A.M.; investigation, N.E.G.; resources, E.A.M.; data curation, E.A.M.; writing—original draft preparation, N.E.G.; writing—review and editing, N.E.G. and E.A.M.; visualization, E.A.M.; supervision, N.E.G.; project administration, N.E.G.; funding acquisition, N.E.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by funding from Prince Sattam bin Abdulaziz University, project number PSAU/2024/01/29889.

Data Availability Statement

The datasets analyzed in this research are publicly available; they can also be obtained on request from the corresponding author.

Acknowledgments

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through project number PSAU/2024/01/29889.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Performance comparison of federated unlearning algorithms using three deletion strategies for the MNIST dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.88 | 0.92 | 0.03 | 0.50 | 0.82 | 0.90 | 0.65 |
| Random Deletion | EG-FedUnlearn | 0.91 | 0.93 | 0.02 | 0.45 | 0.80 | 0.93 | 0.60 |
| Random Deletion | OFU-Ontology | 0.94 | 0.95 | 0.01 | 0.40 | 0.78 | 0.95 | 0.58 |
| Random Deletion | VeriFi | 0.85 | 0.87 | 0.04 | 0.52 | 0.83 | 0.89 | 0.66 |
| Class-based Deletion | FedEraser | 0.89 | 0.93 | 0.03 | 0.48 | 0.81 | 0.94 | 0.63 |
| Class-based Deletion | EG-FedUnlearn | 0.92 | 0.95 | 0.02 | 0.42 | 0.78 | 0.96 | 0.60 |
| Class-based Deletion | OFU-Ontology | 0.95 | 0.97 | 0.01 | 0.38 | 0.75 | 0.98 | 0.55 |
| Class-based Deletion | VeriFi | 0.86 | 0.88 | 0.04 | 0.50 | 0.82 | 0.91 | 0.68 |
| Feature-based Deletion | FedEraser | 0.87 | 0.91 | 0.04 | 0.47 | 0.80 | 0.92 | 0.64 |
| Feature-based Deletion | EG-FedUnlearn | 0.91 | 0.94 | 0.03 | 0.43 | 0.77 | 0.95 | 0.61 |
| Feature-based Deletion | OFU-Ontology | 0.94 | 0.96 | 0.02 | 0.39 | 0.74 | 0.97 | 0.57 |
| Feature-based Deletion | VeriFi | 0.84 | 0.88 | 0.05 | 0.51 | 0.81 | 0.90 | 0.70 |
Table A2. Performance comparison of federated unlearning algorithms using three deletion strategies for the CIFAR-10 dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.85 | 0.90 | 0.05 | 0.60 | 0.80 | 0.88 | 0.70 |
| Random Deletion | EG-FedUnlearn | 0.89 | 0.91 | 0.04 | 0.55 | 0.78 | 0.90 | 0.68 |
| Random Deletion | OFU-Ontology | 0.92 | 0.92 | 0.03 | 0.52 | 0.75 | 0.93 | 0.65 |
| Random Deletion | VeriFi | 0.81 | 0.85 | 0.06 | 0.65 | 0.82 | 0.87 | 0.72 |
| Class-based Deletion | FedEraser | 0.87 | 0.91 | 0.04 | 0.58 | 0.79 | 0.92 | 0.68 |
| Class-based Deletion | EG-FedUnlearn | 0.91 | 0.93 | 0.03 | 0.53 | 0.76 | 0.94 | 0.65 |
| Class-based Deletion | OFU-Ontology | 0.93 | 0.94 | 0.02 | 0.50 | 0.74 | 0.96 | 0.60 |
| Class-based Deletion | VeriFi | 0.83 | 0.87 | 0.05 | 0.62 | 0.80 | 0.90 | 0.70 |
| Feature-based Deletion | FedEraser | 0.86 | 0.90 | 0.05 | 0.57 | 0.77 | 0.90 | 0.70 |
| Feature-based Deletion | EG-FedUnlearn | 0.90 | 0.92 | 0.04 | 0.54 | 0.75 | 0.92 | 0.66 |
| Feature-based Deletion | OFU-Ontology | 0.93 | 0.93 | 0.03 | 0.49 | 0.72 | 0.95 | 0.62 |
| Feature-based Deletion | VeriFi | 0.82 | 0.86 | 0.06 | 0.64 | 0.79 | 0.88 | 0.75 |
Table A3. Performance comparison of federated unlearning algorithms using three deletion strategies for the UCI Adult dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.80 | 0.85 | 0.07 | 0.70 | 0.78 | 0.87 | 0.75 |
| Random Deletion | EG-FedUnlearn | 0.84 | 0.88 | 0.05 | 0.65 | 0.77 | 0.89 | 0.70 |
| Random Deletion | OFU-Ontology | 0.88 | 0.90 | 0.04 | 0.63 | 0.74 | 0.92 | 0.68 |
| Random Deletion | VeriFi | 0.75 | 0.82 | 0.08 | 0.73 | 0.79 | 0.84 | 0.76 |
| Class-based Deletion | FedEraser | 0.82 | 0.89 | 0.06 | 0.66 | 0.77 | 0.91 | 0.69 |
| Class-based Deletion | EG-FedUnlearn | 0.86 | 0.92 | 0.04 | 0.61 | 0.75 | 0.93 | 0.65 |
| Class-based Deletion | OFU-Ontology | 0.90 | 0.94 | 0.03 | 0.58 | 0.73 | 0.95 | 0.62 |
| Class-based Deletion | VeriFi | 0.78 | 0.86 | 0.07 | 0.68 | 0.80 | 0.88 | 0.71 |
| Feature-based Deletion | FedEraser | 0.81 | 0.88 | 0.06 | 0.67 | 0.76 | 0.90 | 0.72 |
| Feature-based Deletion | EG-FedUnlearn | 0.85 | 0.91 | 0.04 | 0.63 | 0.74 | 0.93 | 0.68 |
| Feature-based Deletion | OFU-Ontology | 0.89 | 0.93 | 0.03 | 0.61 | 0.71 | 0.96 | 0.65 |
| Feature-based Deletion | VeriFi | 0.77 | 0.86 | 0.07 | 0.69 | 0.79 | 0.89 | 0.73 |
Table A4. Performance comparison of federated unlearning algorithms using three deletion strategies for the Purchase dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.83 | 0.89 | 0.06 | 0.65 | 0.81 | 0.88 | 0.72 |
| Random Deletion | EG-FedUnlearn | 0.87 | 0.91 | 0.04 | 0.60 | 0.78 | 0.91 | 0.69 |
| Random Deletion | OFU-Ontology | 0.91 | 0.92 | 0.03 | 0.58 | 0.75 | 0.94 | 0.67 |
| Random Deletion | VeriFi | 0.79 | 0.85 | 0.07 | 0.68 | 0.80 | 0.86 | 0.73 |
| Class-based Deletion | FedEraser | 0.85 | 0.90 | 0.05 | 0.62 | 0.80 | 0.92 | 0.67 |
| Class-based Deletion | EG-FedUnlearn | 0.89 | 0.92 | 0.03 | 0.57 | 0.78 | 0.94 | 0.64 |
| Class-based Deletion | OFU-Ontology | 0.93 | 0.95 | 0.02 | 0.54 | 0.75 | 0.97 | 0.60 |
| Class-based Deletion | VeriFi | 0.81 | 0.87 | 0.06 | 0.64 | 0.82 | 0.89 | 0.69 |
| Feature-based Deletion | FedEraser | 0.83 | 0.89 | 0.05 | 0.63 | 0.78 | 0.91 | 0.69 |
| Feature-based Deletion | EG-FedUnlearn | 0.87 | 0.92 | 0.03 | 0.59 | 0.75 | 0.94 | 0.67 |
| Feature-based Deletion | OFU-Ontology | 0.91 | 0.94 | 0.02 | 0.56 | 0.73 | 0.97 | 0.64 |
| Feature-based Deletion | VeriFi | 0.79 | 0.87 | 0.06 | 0.66 | 0.80 | 0.90 | 0.71 |
Table A5. Performance comparison of federated unlearning algorithms using three deletion strategies for the MIMIC-III dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.78 | 0.83 | 0.08 | 0.75 | 0.79 | 0.84 | 0.80 |
| Random Deletion | EG-FedUnlearn | 0.82 | 0.86 | 0.06 | 0.72 | 0.76 | 0.87 | 0.78 |
| Random Deletion | OFU-Ontology | 0.87 | 0.88 | 0.05 | 0.70 | 0.73 | 0.90 | 0.76 |
| Random Deletion | VeriFi | 0.73 | 0.81 | 0.09 | 0.77 | 0.82 | 0.82 | 0.79 |
| Class-based Deletion | FedEraser | 0.80 | 0.88 | 0.07 | 0.70 | 0.78 | 0.90 | 0.74 |
| Class-based Deletion | EG-FedUnlearn | 0.84 | 0.91 | 0.05 | 0.67 | 0.76 | 0.92 | 0.71 |
| Class-based Deletion | OFU-Ontology | 0.89 | 0.93 | 0.04 | 0.63 | 0.73 | 0.95 | 0.68 |
| Class-based Deletion | VeriFi | 0.76 | 0.85 | 0.08 | 0.71 | 0.79 | 0.87 | 0.72 |
| Feature-based Deletion | FedEraser | 0.79 | 0.87 | 0.07 | 0.71 | 0.77 | 0.89 | 0.74 |
| Feature-based Deletion | EG-FedUnlearn | 0.83 | 0.90 | 0.05 | 0.68 | 0.75 | 0.92 | 0.71 |
| Feature-based Deletion | OFU-Ontology | 0.88 | 0.92 | 0.04 | 0.64 | 0.72 | 0.95 | 0.68 |
| Feature-based Deletion | VeriFi | 0.74 | 0.84 | 0.08 | 0.73 | 0.79 | 0.87 | 0.72 |
Table A6. Performance comparison of federated unlearning algorithms using three deletion strategies for the KDD Cup 1999 dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.82 | 0.88 | 0.05 | 0.60 | 0.80 | 0.89 | 0.73 |
| Random Deletion | EG-FedUnlearn | 0.86 | 0.90 | 0.04 | 0.58 | 0.77 | 0.91 | 0.71 |
| Random Deletion | OFU-Ontology | 0.90 | 0.91 | 0.03 | 0.55 | 0.75 | 0.93 | 0.70 |
| Random Deletion | VeriFi | 0.78 | 0.84 | 0.06 | 0.65 | 0.81 | 0.87 | 0.74 |
| Class-based Deletion | FedEraser | 0.84 | 0.91 | 0.04 | 0.61 | 0.82 | 0.91 | 0.69 |
| Class-based Deletion | EG-FedUnlearn | 0.88 | 0.93 | 0.03 | 0.58 | 0.79 | 0.94 | 0.66 |
| Class-based Deletion | OFU-Ontology | 0.92 | 0.95 | 0.02 | 0.56 | 0.76 | 0.97 | 0.64 |
| Class-based Deletion | VeriFi | 0.80 | 0.87 | 0.05 | 0.66 | 0.83 | 0.89 | 0.71 |
| Feature-based Deletion | FedEraser | 0.83 | 0.90 | 0.05 | 0.62 | 0.79 | 0.91 | 0.70 |
| Feature-based Deletion | EG-FedUnlearn | 0.87 | 0.92 | 0.04 | 0.59 | 0.76 | 0.93 | 0.68 |
| Feature-based Deletion | OFU-Ontology | 0.92 | 0.95 | 0.03 | 0.56 | 0.75 | 0.96 | 0.65 |
| Feature-based Deletion | VeriFi | 0.80 | 0.87 | 0.06 | 0.66 | 0.81 | 0.89 | 0.73 |
Table A7. Performance comparison of federated unlearning algorithms using three deletion strategies for the Reddit Comments dataset.

| Deletion Strategy | Algorithm | Model Accuracy | Unlearning Efficiency | Residual Influence | Communication Overhead | Convergence Time | Unlearning Accuracy | Computational Cost |
|---|---|---|---|---|---|---|---|---|
| Random Deletion | FedEraser | 0.84 | 0.91 | 0.04 | 0.55 | 0.81 | 0.90 | 0.71 |
| Random Deletion | EG-FedUnlearn | 0.88 | 0.92 | 0.03 | 0.52 | 0.78 | 0.92 | 0.69 |
| Random Deletion | OFU-Ontology | 0.92 | 0.94 | 0.02 | 0.50 | 0.75 | 0.95 | 0.67 |
| Random Deletion | VeriFi | 0.81 | 0.87 | 0.05 | 0.60 | 0.83 | 0.88 | 0.72 |
| Class-based Deletion | FedEraser | 0.86 | 0.92 | 0.03 | 0.54 | 0.80 | 0.93 | 0.68 |
| Class-based Deletion | EG-FedUnlearn | 0.90 | 0.94 | 0.02 | 0.52 | 0.77 | 0.95 | 0.65 |
| Class-based Deletion | OFU-Ontology | 0.94 | 0.96 | 0.01 | 0.50 | 0.74 | 0.98 | 0.61 |
| Class-based Deletion | VeriFi | 0.83 | 0.89 | 0.04 | 0.59 | 0.81 | 0.91 | 0.70 |
| Feature-based Deletion | FedEraser | 0.85 | 0.91 | 0.04 | 0.55 | 0.78 | 0.92 | 0.69 |
| Feature-based Deletion | EG-FedUnlearn | 0.89 | 0.93 | 0.03 | 0.52 | 0.75 | 0.95 | 0.66 |
| Feature-based Deletion | OFU-Ontology | 0.93 | 0.95 | 0.02 | 0.50 | 0.72 | 0.97 | 0.63 |
| Feature-based Deletion | VeriFi | 0.82 | 0.88 | 0.05 | 0.60 | 0.80 | 0.90 | 0.70 |

Appendix B

SPARQL B1: Query for MNIST Ontology
SELECT ?digit ?similarDigit ?complexityScore
WHERE {
?digit rdf:type :Digit.
?digit :isSimilarTo ?similarDigit.
?digit :hasComplexityScore ?complexityScore.
FILTER(?complexityScore > 0.5)
}
SPARQL B2: Query for CIFAR-10 Ontology
SELECT ?image ?category ?relevanceScore
WHERE {
?image rdf:type :Image.
?image :hasCategory ?category.
?category :hasRelevanceScore ?relevanceScore.
FILTER(?category IN (:Vehicle, :Animal))
}
SPARQL B3: Query for UCI Adult Ontology
SELECT ?individual ?incomeLevel ?educationLevel
WHERE {
?individual rdf:type :Person.
?individual :hasIncomeLevel ?incomeLevel.
?individual :hasEducationLevel ?educationLevel.
FILTER(?incomeLevel = "High" && ?educationLevel = "Bachelors")
}
SPARQL B4: Query for Purchase Ontology
SELECT ?customer ?productCategory ?purchaseFrequency
WHERE {
?customer rdf:type :Customer.
?customer :hasPurchased ?productCategory.
?customer :purchaseFrequency ?purchaseFrequency.
FILTER(?purchaseFrequency > 5)
}
SPARQL B5: Query for MIMIC-III Ontology
SELECT ?patient ?diagnosis ?treatment
WHERE {
?patient rdf:type :Patient.
?patient :hasDiagnosis ?diagnosis.
?patient :receivedTreatment ?treatment.
FILTER(?diagnosis IN (:Cardiac, :Respiratory))
}
SPARQL B6: Query for KDD Cup 1999 Ontology
SELECT ?event ?attackType ?severity
WHERE {
?event rdf:type :NetworkEvent.
?event :isAnomalous ?attackType.
?event :hasSeverity ?severity.
FILTER(?severity > 0.7)
}
SPARQL B7: Query for Reddit Comments Ontology
SELECT ?comment ?sentiment ?topic
WHERE {
?comment rdf:type :Comment.
?comment :hasSentiment ?sentiment.
?comment :belongsToTopic ?topic.
FILTER(?sentiment = "Negative" && ?topic IN (:Politics, :Religion))
}
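The queries above return relevance bindings per entity. As a hedged sketch of how such bindings might be turned into unlearning priorities (the min-max normalization below and the function name to_unlearning_weights are assumptions, not taken from the paper), the scores can be mapped to weights in [0, 1] that scale gradient negation:

```python
# Hypothetical post-processing of SPARQL results: normalize relevance
# scores into unlearning weights that can scale gradient negation.
def to_unlearning_weights(bindings):
    """bindings: list of (entity_id, relevance_score) pairs."""
    scores = [s for _, s in bindings]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    return {e: (s - lo) / span for e, s in bindings}

weights = to_unlearning_weights([("img1", 0.9), ("img2", 0.7), ("img3", 0.2)])
print(weights)  # {'img1': 1.0, 'img2': 0.714..., 'img3': 0.0}
```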

References

1. General Data Protection Regulation (GDPR), 2018. Available online: https://gdpr-info.eu/art-17-gdpr/ (accessed on 25 June 2025).
2. California Consumer Privacy Act (CCPA), California Legislative Information. 2018. Available online: https://leginfo.legislature.ca.gov (accessed on 25 June 2025).
3. Song, M.; Wang, Z.; Zhang, Z.; Song, Y.; Wang, Q.; Ren, J.; Qi, H. Analyzing User-Level Privacy Attack Against Federated Learning. IEEE J. Sel. Areas Commun. 2020, 38, 2430–2444.
4. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017.
5. Ginart, A.A.; Guan, M.Y.; Valiant, G.; Zou, J. Making AI Forget You: Data Deletion in Machine Learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
6. Liu, G.; Ma, X.; Yang, Y.; Wang, C.; Liu, J. FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQoS), Tokyo, Japan, 25–28 June 2021; pp. 1–10.
7. Wu, L.; Guo, S.; Wang, J.; Hong, Z.; Zhang, J.; Ding, Y. Federated Unlearning: Guarantee the Right of Clients to Forget. IEEE Netw. 2022, 36, 129–135.
8. Xie, C.; Huang, K.; Chen, P.-Y.; Li, B. DBA: Distributed Backdoor Attacks Against Federated Learning. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
9. Liu, G.; Ma, X.; Yang, Y.; Wang, C.; Liu, J. Federated Unlearning. arXiv 2021.
10. Guarino, N.; Oberle, D.; Staab, S. What Is an Ontology? In Handbook on Ontologies, 2nd ed.; Staab, S., Studer, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–17.
11. Zhu, N.; Chen, B.; Wang, S.; Teng, D.; He, J. Ontology-Based Approach for the Measurement of Privacy Disclosure. Inf. Syst. Front. 2022, 24, 1689–1707.
12. Kairouz, P.; McMahan, H.B.; et al. Advances and Open Problems in Federated Learning; Now Foundations and Trends: Norwell, MA, USA, 2021; Volume 14, pp. 1–210.
13. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A Hybrid Approach to Privacy-Preserving Federated Learning. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 356–357.
14. Liu, Z.; Jiang, Y.; Shen, J.; Peng, M.; Lam, K.-Y.; Yuan, X.; Liu, X. A Survey on Federated Unlearning: Challenges, Methods, and Future Directions. ACM Comput. Surv. 2024, 57, 2.
15. Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
16. Tarun, A.K.; Chundawat, V.S.; Mandal, M.; Kankanhalli, M. Fast Yet Effective Machine Unlearning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 35, 13046–13055.
17. LeCun, Y.; Cortes, C.; Burges, C.J.C. MNIST Handwritten Digit Database. AT&T Labs. 2010. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 12 November 2024).
18. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 28 September 2024).
19. Dua, D.; Graff, C. UCI Machine Learning Repository: Adult Data Set; University of California, Irvine: Irvine, CA, USA, 2019.
20. Yeh, I.C.; Yang, K.J.; Ting, T.M. Knowledge discovery on RFM model using Bernoulli sequence. Expert Syst. Appl. 2009, 36, 5866–5871.
21. Johnson, A.E.W.; Pollard, T.J.; Shen, L.; Lehman, L.-W.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L.A.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035.
22. Stolfo, S.J.; Fan, W.; Lee, W.; Prodromidis, A.L.; Chan, P.K. Cost-based modeling for fraud and intrusion detection: Results from the JAM project. In Proceedings of the DARPA Information Survivability Conference and Exposition, DISCEX’00, Hilton Head, SC, USA, 25–27 January 2000; Volume 2, pp. 130–144.
23. Baumgartner, J.; Zannettou, S.; Keegan, B.; Squire, M.; Blackburn, J. The Pushshift Reddit Dataset. In Proceedings of the Fourteenth International AAAI Conference on Web and Social Media, Atlanta, GA, USA, 8–11 June 2020; Volume 14, pp. 830–839.
24. Aldaghri, N.; Mahdavifar, H.; Beirami, A. Coded Machine Unlearning. IEEE Access 2021, 9, 88137–88150.
25. Sekhari, A.; Acharya, J.; Kamath, G.; Suresh, A.T. Remember What You Want to Forget: Algorithms for Machine Unlearning. Adv. Neural Inf. Process. Syst. 2021, 34, 18075–18086.
26. Chundawat, V.S.; Tarun, A.K.; Mandal, M.; Kankanhalli, M. Zero-Shot Machine Unlearning. IEEE Trans. Inf. Forensics Secur. 2022, 18, 2345–2354.
27. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. On the Convergence of Federated Optimization in Heterogeneous Networks. arXiv 2018.
28. Gu, H.; Zhu, G.; Zhang, J.; Zhao, X.; Han, Y.; Fan, L.; Yang, Q. Unlearning during Learning: An Efficient Federated Machine Unlearning Method. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, Republic of Korea, 3–9 August 2024; pp. 4035–4043.
29. Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C.A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; Papernot, N. Machine Unlearning. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 24–27 May 2021; pp. 141–159.
Figure 1. Overview of machine unlearning.
Figure 2. Overview of federated unlearning.
Figure 3. Ontology-enhanced federated unlearning motivation.
Figure 4. Federated unlearning workflow.
Figure 5. Flowchart of the EG-FedUnlearn methodology.
Figure 6. Ontology visualizations for evaluation datasets. (a) MIMIC-III dataset ontology with medical entities and relationships; (b) Purchase dataset ontology with object–category hierarchies; (c) UCI Adult dataset ontology capturing demographic and socioeconomic attributes; (d) Reddit Comment dataset ontology representing topics and sentiment; (e) CIFAR-10 dataset ontology with object–category hierarchies; and (f) KDD Cup 1999 dataset ontology modeling network events and attack types.
Figure 7. Detailed workflow of the ontology-enhanced federated unlearning (OFU) process, showing relevance scoring and weighted gradient negation.
Figure 8. Samples of the CIFAR-10 dataset.
Figure 9. Performance comparison of federated unlearning algorithms for the MNIST dataset.
Figure 10. Performance comparison of federated unlearning algorithms for the CIFAR-10 dataset.
Figure 11. Performance comparison of federated unlearning algorithms for the UCI Adult dataset.
Figure 12. Performance comparison of federated unlearning algorithms for the Purchase dataset.
Figure 13. Performance comparison of federated unlearning algorithms for the MIMIC-III dataset.
Figure 14. Performance comparison of federated unlearning algorithms for the KDD Cup 1999 dataset.
Figure 15. Performance comparison of federated unlearning algorithms for the Reddit Comments dataset.
Figure 16. Efficiency comparison between EG-FedUnlearn and OFU-Ontology.
Figure 17. Efficiency comparison between the four federated unlearning algorithms.
Figure 18. Feature contribution heatmap for the Reddit Comments dataset.
Figure 19. Model accuracy over communication rounds.
Table 1. Overview of the datasets used for evaluating EG-FedUnlearn and OFU-Ontology.

| Dataset | Type | Number of Samples | Features | Dataset Size | Use Case | Relevance for Unlearning |
|---|---|---|---|---|---|---|
| MNIST [17] | Image Classification | 70,000 | 28 × 28 grayscale images (784 features) | ~50 MB | Used to test image classification algorithms; simple, well-balanced, and often used as a benchmark for machine learning and federated learning techniques. | Its simplicity makes it suitable for testing the efficiency of both EG-FedUnlearn and OFU-Ontology; it illustrates the basic performance and computational overhead of ontology integration in a low-complexity setting. |
| CIFAR-10 [18] | Image Classification | 60,000 | 32 × 32 color images (3072 features) | ~163 MB | More complex than MNIST, with color images and multiple categories. | Ideal for testing the efficiency of ontology-based unlearning; the ontology can prioritize images based on their impact on classification accuracy. |
| UCI Adult Census [19] | Tabular Data | 48,842 | 14 features (age, education, etc.) | ~4 MB | Typically used for income classification tasks; a benchmark for machine learning on tabular data. | Allows evaluation of how both algorithms handle unlearning across numerical and categorical features; the ontology can prioritize the unlearning of sensitive features (e.g., gender and race). |
| Purchase [20] | Transactional Data | 197,324 | Product ID, category, store, transaction | ~20 MB | Often used in recommendation systems and market basket analysis to derive customer buying patterns. | Highly suitable for evaluating unlearning on transactional data; OFU-Ontology can prioritize unlearning less significant purchases, while EG-FedUnlearn offers a more generalized approach. |
| MIMIC-III [21] | Medical Diagnosis | ~58,000 hospital admissions | Patient demographics, vitals, medications | ~60 GB | Widely used for predictive modeling in healthcare, such as predicting mortality, length of stay, and treatment outcomes. | Helps assess how well OFU-Ontology can prioritize unlearning of less critical clinical information while retaining features that significantly affect the model’s predictions. |
| KDD Cup 1999 [22] | Network Security Data | 4,898,431 | 41 features (protocol type, service, etc.) | ~743 MB | Used for network intrusion detection, classifying traffic as normal or malicious; combines continuous and categorical features. | Assesses how efficiently the algorithms unlearn while preserving intrusion-detection accuracy; OFU-Ontology’s relevance scoring can identify and unlearn low-impact traffic data. |
| Reddit Comment [23] | Text Data | ~100 million comments | Comment text, metadata (author, subreddit) | ~50 GB | Often used for sentiment analysis, topic modeling, or text classification tasks. | Suitable for evaluating the algorithms on content moderation and privacy-related requests (e.g., users requesting removal of comments). |