A Unified Knowledge Management Framework for Continual Learning and Machine Unlearning in Large Language Models
Abstract
1. Introduction
- We develop a parameter-efficient CLU method that combines Low-Rank Adaptation (LoRA) [9] freezing, magnitude-based sparse masking, and orthogonal gradient projection into a unified structural constraint system, achieving state-of-the-art stability–plasticity balance across interleaved learning-unlearning sequences on 4B- and 8B-scale LLMs.
- We ground these structural choices in a drift-aware design principle based on KL divergence, establishing a formal upper bound (Theorem A1) that decomposes distributional drift into update magnitude and direction terms. This provides a principled explanation for why magnitude-controlling constraints (freezing, sparsity) yield the largest individual gains, while direction control (orthogonal projection) provides crucial cumulative-drift mitigation in longer sequences.
- We provide systematic experimental evidence including behavioral metrics, token-level distributional drift analysis, and ablation studies that jointly validate the method’s effectiveness and the design principle’s explanatory power on controlled interleaved CLU benchmarks.
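The magnitude/direction decomposition invoked in these contributions can be illustrated with the standard second-order KL expansion, where F is the Fisher information matrix with eigenpairs (λ_i, v_i); the exact constants of Theorem A1 may differ, so this is a sketch of the principle rather than the theorem itself:

```latex
% Second-order expansion of distributional drift under a parameter update \Delta\theta
\mathrm{KL}\bigl(p_\theta \,\|\, p_{\theta+\Delta\theta}\bigr)
  \;\approx\; \tfrac{1}{2}\,\Delta\theta^{\top} F\, \Delta\theta
  \;=\; \tfrac{1}{2}\sum_i \lambda_i \bigl\langle \Delta\theta, v_i \bigr\rangle^{2}
  \;\le\; \tfrac{1}{2}\,\lambda_{\max}\,\lVert \Delta\theta \rVert^{2}
```

Freezing and sparsity shrink the magnitude factor ‖Δθ‖², while orthogonal projection suppresses the alignment terms ⟨Δθ, v_i⟩ along directions important to retained tasks.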
2. Related Work
2.1. Continual Learning
2.1.1. Regularization-Based Methods
2.1.2. Replay-Based Methods
2.1.3. Structure-Based Methods
2.2. Machine Unlearning
2.2.1. Removal-Intended Methods
2.2.2. Suppression-Intended Methods
2.3. Continual Learning and Machine Unlearning
3. Materials and Methods
3.1. Problem Definition
3.2. A Drift-Aware Framework for Retention-Controlled CLU
3.3. Method
3.3.1. Freezing the LoRA Matrix A
3.3.2. Sparse Masking for the Weight Matrix B
3.3.3. Orthogonal Gradient Projection
3.3.4. Overall Algorithm
Algorithm 1 Unified CLU Framework with Parameter-Efficient Adaptation

Require: base model W0; task sequence {T_1, …, T_K}, where each T_k is a learning or an unlearning task
Require: hyperparameters: LoRA rank r, sparsity ratio s, learning rate η
Ensure: updated model W0 + BA with LoRA adapters (A frozen, B trainable)

1: initialize LoRA: freeze A after initialization; B ← 0
2: for k = 1, …, K do
3:   if k = 1 then
4:     M ← all-ones matrix                                  ▹ No masking (all-ones matrix) for first task
5:   else
6:     τ ← global magnitude threshold of |B| at sparsity s  ▹ Compute global threshold
7:     M_ij ← 1 if |B_ij| < τ, else 0                       ▹ Mask: 1 for small params, 0 for large
8:   end if
9:   for each training step on T_k do
10:    g ← ∇_B L_k(W0 + BA)
11:    g ← g − Σ_{j<k} ⟨g, u_j⟩ u_j                         ▹ Orthogonal projection
12:    B ← B − η (M ⊙ g)                                    ▹ Masked parameter update (⊙: element-wise product)
13:  end for
14: end for
15: return W0 with adapters (A, B)
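The masked, projected update at the core of Algorithm 1 can be sketched in NumPy. Function names, shapes, and the reading of the sparsity ratio (a (1 − s) fraction of B's entries stays trainable) are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def magnitude_mask(B, s):
    """Mask: 1 for small-magnitude params (trainable), 0 for large (protected).

    Assumed reading of Algorithm 1: the global threshold leaves roughly a
    (1 - s) fraction of B's entries free to move.
    """
    tau = np.quantile(np.abs(B), 1.0 - s)   # compute global threshold
    return (np.abs(B) < tau).astype(B.dtype)

def lora_step(B, grad, mask, prev_dirs, lr=1e-2):
    """One masked, orthogonally projected update of the trainable LoRA matrix B."""
    g = grad.ravel()
    for u in prev_dirs:              # unit vectors for previous task directions
        g = g - np.dot(g, u) * u     # orthogonal projection
    g = g.reshape(B.shape)
    return B - lr * (mask * g)       # masked update (element-wise product)

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 2))
mask = magnitude_mask(B, s=0.9)      # only the smallest ~10% of |B| entries move
u = rng.normal(size=B.size); u /= np.linalg.norm(u)
B_new = lora_step(B, rng.normal(size=B.shape), mask, prev_dirs=[u])
assert np.allclose(B_new[mask == 0], B[mask == 0])  # protected entries unchanged
```

Because the mask zeroes the update on large-magnitude entries, the protected coordinates of B are bit-identical before and after the step.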
4. Experiments
4.1. Dataset and Experimental Setup
4.2. Baselines
- Gradient Ascent (GA) [20]. This method performs unlearning by maximizing the negative log-likelihood on the forget set, thereby progressively reducing the model’s confidence in generating answers related to the data that should be forgotten.
- Gradient Ascent + Gradient Descent (GA + GD) [34]. This approach combines gradient ascent on the forget set with gradient descent on the retain set. It enables the model to erase undesired knowledge while simultaneously maintaining performance on data that should be retained.
- KL-Regularized Gradient Ascent (GA + KL) [34]. This method applies gradient-ascent unlearning on the forget set while constraining the model’s distributional drift via a KL divergence regularizer with respect to a reference model. This prevents excessive deviation from the original model behavior during the unlearning process.
- Negative Preference Optimization (NPO) [22]. This technique explicitly downweights forget-set targets by penalizing the likelihood ratio under a negative-preference objective, thereby directly diminishing the model’s confidence in producing answers from the forget set.
- Direct Preference Optimization (DPO) [20]. We adapt DPO to the unlearning scenario by constructing preference pairs where a neutral or alternative response is preferred over the forget-set target. The pairwise objective increases the probability of neutral responses while decreasing the probability of undesired responses relative to a reference policy.
- Low-Rank Adaptation (LoRA) [9]. Instead of updating all model parameters, LoRA injects trainable low-rank decomposition matrices into the model’s attention layers. A single shared LoRA adapter is trained across all sequential tasks (both unlearning and continual learning), modifying the model’s behavior through parameter-efficient updates. This enables efficient adaptation across the entire task sequence while keeping the base model’s weights frozen.
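The gradient-based baseline objectives above can be written compactly on toy per-token log-probabilities. This is a hedged sketch of the loss arithmetic only (no model or optimizer); the function names and toy values are illustrative.

```python
import numpy as np

def nll(logps):
    """Mean negative log-likelihood over target tokens."""
    return -np.mean(logps)

def ga_loss(forget_logps):
    # Gradient Ascent: descending the negated NLL on the forget set.
    return -nll(forget_logps)

def ga_gd_loss(forget_logps, retain_logps):
    # GA + GD: ascend on forget data, descend on retain data.
    return -nll(forget_logps) + nll(retain_logps)

def ga_kl_loss(forget_logps, probs, ref_probs, lam=1.0):
    # GA + KL: a KL(model || reference) regularizer limits distributional drift.
    kl = np.sum(probs * (np.log(probs) - np.log(ref_probs)))
    return -nll(forget_logps) + lam * kl

forget = np.log(np.array([0.7, 0.6, 0.8]))  # toy forget-set token probabilities
retain = np.log(np.array([0.5, 0.4]))       # toy retain-set token probabilities
p = np.array([0.6, 0.3, 0.1])               # toy model distribution
q = np.array([0.5, 0.4, 0.1])               # toy reference distribution
print(round(ga_loss(forget), 4), round(ga_kl_loss(forget, p, q, lam=0.5), 4))
```

Note that the KL term is always non-negative, so GA + KL can only raise the loss relative to plain GA, pulling the model back toward the reference.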
4.3. Evaluation Metrics
- Model Utility (MU) serves as a task-level response-quality proxy, quantifying the model’s ability to retain useful knowledge on the retain set and newly learned tasks. It is computed as the arithmetic mean of ROUGE-L, CS, and ES: MU = (ROUGE-L + CS + ES) / 3. A higher MU indicates that the model maintains strong performance on data that should be preserved.
- Forgetting Proxy (FP) measures the degree to which model outputs deviate from the original responses on the forget set under specified prompt templates. Rather than certifying irrecoverability in a privacy sense, FP quantifies behavioral redirection: the extent to which outputs shift away from the target responses. A higher FP indicates that the model produces outputs that diverge substantially from the original responses on forget-set samples, reflecting observable behavioral change rather than provable knowledge elimination.
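Since MU is defined as a plain arithmetic mean, its computation is a one-liner; the per-sample ROUGE-L, CS, and ES values are assumed to come from the scorers cited in the text, and the toy numbers below are illustrative.

```python
import numpy as np

def model_utility(rouge_l, cs, es):
    """MU: arithmetic mean of ROUGE-L, CS, and ES for one sample."""
    return float(np.mean([rouge_l, cs, es]))

# Toy per-sample scores standing in for real scorer outputs.
print(round(model_utility(0.42, 0.66, 0.51), 2))  # 0.53
```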
5. Results
5.1. Main Results
5.2. Sensitivity Analysis
5.3. Distribution Drift Analysis
5.4. Ablation Study
5.5. Model Size and Computational Efficiency
5.6. More Results on Real-World Datasets
6. Discussion
6.1. Interpretation of Key Findings
6.2. Limitations and Future Directions
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Parameter-Space Drift Control as a KL Approximation
- 1. Localization (Freezing A): Restricting updates to ΔW = ΔB·A with A frozen reduces the effective update space from the full weight matrix to the parameters of B alone, yielding ‖ΔW‖_F ≤ ‖ΔB‖_F ‖A‖_2 and thereby bounding update magnitude via the fixed subspace defined by A.
- 2. Selective Protection (Sparse Masking on B): Applying an element-wise mask M ∈ {0, 1} with sparsity s enforces ‖M ⊙ ΔB‖_0 ≤ (1 − s)·|B|, where ‖·‖_0 denotes the number of non-zero elements. By protecting the top-s percentile parameters (largest-magnitude entries critical to retained capabilities), we further constrain ‖ΔW‖_F, reducing perturbation magnitude.
- 3. Direction Control (Orthogonal Projection): Projecting gradients g to be orthogonal to previous task directions u_j ensures ⟨g, u_j⟩ = 0 for all j < k, minimizing alignment with critical directions for retained knowledge and thereby reducing the effective impact on the KL drift in directions where the local curvature is large.
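The direction-control condition ⟨g, u_j⟩ = 0 can be verified numerically. The orthonormal matrix U below is an illustrative stand-in for stored previous-task directions, not the paper's construction of them.

```python
import numpy as np

rng = np.random.default_rng(1)
# Orthonormal stand-ins for the previous-task directions u_j (illustrative).
U, _ = np.linalg.qr(rng.normal(size=(16, 3)))
g = rng.normal(size=16)  # raw gradient in flattened parameter space

# Remove every component of g that lies along a protected direction.
g_perp = g - U @ (U.T @ g)

# The projected gradient has zero alignment with each protected direction,
# and projection can only shrink (never grow) the gradient norm.
assert np.allclose(U.T @ g_perp, 0.0)
assert np.linalg.norm(g_perp) <= np.linalg.norm(g)
```

The norm inequality is the geometric reason projection cannot increase update magnitude: it only discards components, consistent with the drift bound's direction term.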
Appendix B. Baseline Method Details
- Gradient Ascent (GA).
- GA + GD (Gradient Difference).
- GA + KL (KL-regularized GA).
- Negative Preference Optimization (NPO).
- Direct Preference Optimization (DPO).
References
- Shi, H.; Xu, Z.; Wang, H.; Qin, W.; Wang, W.; Wang, Y.; Wang, Z.; Ebrahimi, S.; Wang, H. Continual learning of large language models: A comprehensive survey. ACM Comput. Surv. 2025, 58, 1–42. [Google Scholar] [CrossRef]
- Liu, S.; Yao, Y.; Jia, J.; Casper, S.; Baracaldo, N.; Hase, P.; Yao, Y.; Liu, C.Y.; Xu, X.; Li, H.; et al. Rethinking machine unlearning for large language models. Nat. Mach. Intell. 2025, 7, 181–194. [Google Scholar] [CrossRef]
- Wang, X.; Chen, T.; Ge, Q.; Xia, H.; Bao, R.; Zheng, R.; Zhang, Q.; Gui, T.; Huang, X.J. Orthogonal subspace learning for language model continual learning. In Findings of the Association for Computational Linguistics: EMNLP 2023; Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 10658–10671. [Google Scholar]
- He, J.; Guo, H.; Zhu, K.; Zhao, Z.; Tang, M.; Wang, J. Seekr: Selective attention-guided knowledge retention for continual learning of large language models. arXiv 2024, arXiv:2411.06171. [Google Scholar] [CrossRef]
- Gao, C.; Wang, L.; Ding, K.; Weng, C.; Wang, X.; Zhu, Q. On large language model continual unlearning. arXiv 2024, arXiv:2407.10223. [Google Scholar]
- Liu, B.; Liu, Q.; Stone, P. Continual learning and private unlearning. In Proceedings of the Conference on Lifelong Learning Agents; PMLR: Cambridge, MA, USA, 2022; pp. 243–254. [Google Scholar]
- Chatterjee, R.; Chundawat, V.; Tarun, A.; Mali, A.; Mandal, M. A unified framework for continual learning and unlearning. arXiv 2024, arXiv:2408.11374. [Google Scholar] [CrossRef]
- Huang, Z.; Cheng, X.; Zhang, J.; Zheng, J.; Wang, H.; He, Z.; Li, T.; Huang, X. A unified gradient-based framework for task-agnostic continual learning-unlearning. arXiv 2025, arXiv:2505.15178. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. Int. Conf. Learn. Represent. 2022, 1, 3. [Google Scholar]
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526. [Google Scholar] [CrossRef] [PubMed]
- Aljundi, R.; Babiloni, F.; Elhoseiny, M.; Rohrbach, M.; Tuytelaars, T. Memory aware synapses: Learning what (not) to forget. In European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; pp. 139–154. [Google Scholar]
- Zenke, F.; Poole, B.; Ganguli, S. Continual learning through synaptic intelligence. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 3987–3995. [Google Scholar]
- Chaudhry, A.; Dokania, P.K.; Ajanthan, T.; Torr, P.H. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; pp. 532–547. [Google Scholar]
- Guo, C.; Zhao, B.; Bai, Y. Deepcore: A comprehensive library for coreset selection in deep learning. In International Conference on Database and Expert Systems Applications; Springer: Cham, Switzerland, 2022; pp. 181–195. [Google Scholar]
- Feldman, D. Core-sets: Updated survey. In Sampling Techniques for Supervised or Unsupervised Tasks; Springer: Cham, Switzerland, 2019; pp. 23–44. [Google Scholar]
- Wang, T.; Zhu, J.Y.; Torralba, A.; Efros, A.A. Dataset distillation. arXiv 2018, arXiv:1811.10959. [Google Scholar]
- Yu, R.; Liu, S.; Wang, X. Dataset distillation: A comprehensive review. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 150–170. [Google Scholar] [CrossRef]
- Ahn, H.; Cha, S.; Lee, D.; Moon, T. Uncertainty-based continual learning with adaptive regularization. arXiv 2019, arXiv:1905.11614. [Google Scholar] [CrossRef]
- Jin, H.; Kim, E. Helpful or harmful: Inter-task association in continual learning. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 519–535. [Google Scholar]
- Maini, P.; Feng, Z.; Schwarzschild, A.; Lipton, Z.C.; Kolter, J.Z. Tofu: A task of fictitious unlearning for LLMs. arXiv 2024, arXiv:2401.06121. [Google Scholar] [CrossRef]
- Jang, J.; Yoon, D.; Yang, S.; Cha, S.; Lee, M.; Logeswaran, L.; Seo, M. Knowledge unlearning for mitigating privacy risks in language models. In 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 14389–14408. [Google Scholar]
- Zhang, R.; Lin, L.; Bai, Y.; Mei, S. Negative preference optimization: From catastrophic collapse to effective unlearning. arXiv 2024, arXiv:2404.05868. [Google Scholar] [CrossRef]
- Fan, C.; Liu, J.; Lin, L.; Jia, J.; Zhang, R.; Mei, S.; Liu, S. Simplicity prevails: Rethinking negative preference optimization for LLM unlearning. arXiv 2024, arXiv:2410.07163. [Google Scholar] [CrossRef]
- Cha, S.; Cho, S.; Hwang, D.; Lee, M. Towards robust and parameter-efficient knowledge unlearning for LLMs. arXiv 2024, arXiv:2408.06621. [Google Scholar]
- Russinovich, M.; Salem, A. Obliviate: Efficient unmemorization for protecting intellectual property in large language models. arXiv 2025, arXiv:2502.15010. [Google Scholar] [CrossRef]
- Liu, Z.; Dou, G.; Tan, Z.; Tian, Y.; Jiang, M. Towards safer large language models through machine unlearning. arXiv 2024, arXiv:2402.10058. [Google Scholar] [CrossRef]
- Ishibashi, Y.; Shimodaira, H. Knowledge sanitization of large language models. arXiv 2023, arXiv:2309.11852. [Google Scholar]
- Liu, Y.; Zhang, Y.; Jaakkola, T.; Chang, S. Revisiting Who’s Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective. arXiv 2024, arXiv:2407.16997. [Google Scholar]
- Xu, H.; Zhao, N.; Yang, L.; Zhao, S.; Deng, S.; Wang, M.; Hooi, B.; Oo, N.; Chen, H.; Zhang, N. Relearn: Unlearning via learning for large language models. arXiv 2025, arXiv:2502.11190. [Google Scholar] [CrossRef]
- Shibata, T.; Irie, G.; Ikami, D.; Mitsuzumi, Y. Learning with Selective Forgetting. Int. Jt. Conf. Artif. Intell. 2021, 3, 4. [Google Scholar]
- Wang, Z.; Bi, B.; Pentyala, S.K.; Ramnath, K.; Chaudhuri, S.; Mehrotra, S.; Mao, X.B.; Asur, S.; Cheng, N. A comprehensive survey of LLM alignment techniques: RLHF, RLAIF, PPO, DPO and more. arXiv 2024, arXiv:2407.16216. [Google Scholar] [CrossRef]
- Izzo, Z.; Smart, M.A.; Chaudhuri, K.; Zou, J. Approximate data deletion from machine learning models. In International Conference on Artificial Intelligence and Statistics; PMLR: Cambridge, MA, USA, 2021; pp. 2008–2016. [Google Scholar]
- Qiao, J.; Zhang, Z.; Tan, X.; Qu, Y.; Zhang, W.; Han, Z.; Xie, Y. Gradient projection for continual parameter-efficient tuning. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9316–9329. [Google Scholar] [CrossRef]
- Yao, J.; Chien, E.; Du, M.; Niu, X.; Wang, T.; Cheng, Z.; Yue, X. Machine unlearning of pre-trained large language models. arXiv 2024, arXiv:2402.15159. [Google Scholar] [CrossRef]
- Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Kerrville, TX, USA, 2004; pp. 74–81. [Google Scholar]
- Yuan, X.; Pang, T.; Du, C.; Chen, K.; Zhang, W.; Lin, M. A closer look at machine unlearning for large language models. arXiv 2024, arXiv:2410.08109. [Google Scholar] [CrossRef]
- Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv 2019, arXiv:1908.10084. [Google Scholar] [CrossRef]
- Sileo, D. tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation. arXiv 2023, arXiv:2301.05948. [Google Scholar] [CrossRef]
- Liu, Z.; Zhu, T.; Tan, C.; Chen, W. Learning to refuse: Towards mitigating privacy risks in LLMs. In 31st International Conference on Computational Linguistics; Association for Computational Linguistics: Kerrville, TX, USA, 2025; pp. 1683–1698. [Google Scholar]
- Pinsker, M.S. Some mathematical questions of theory of information transmission. Probl. Inf. Transm. 2007, 43, 380–392. [Google Scholar] [CrossRef]





| Design Principle (Conceptual) | Implementation Component (Algorithmic) | Role/Intuition |
|---|---|---|
| Retention control on the output distribution (drift-aware stability) | Frozen LoRA projection matrix A (Section 3.3.1) | Constrains updates to a shared low-dimensional subspace, promoting stable behavior on retained knowledge. |
| Localization (reduce interference) | Sparse masking on B (Section 3.3.2) | Restricts parameter changes to a small subset, limiting collateral forgetting and isolating task-specific edits. |
| Direction control (protect past directions) | Orthogonal gradient projection (Section 3.3.3) | Removes update components aligned with previously learned directions, reducing destructive interference across tasks. |
| Model | Method | UL1-MU | UL1-FP | CL1-MU | CL1-FP | UL2-MU | UL2-FP | CL2-MU | CL2-FP | UL3-MU | UL3-FP | CL3-MU | CL3-FP | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Instruct | GA | 0.39 | 0.60 | 0.50 | 0.59 | 0.31 | 0.76 | 0.43 | 0.71 | 0.27 | 0.80 | 0.37 | 0.70 | 0.536 |
| | GA + GD | 0.47 | 0.55 | 0.52 | 0.56 | 0.43 | 0.72 | 0.49 | 0.63 | 0.41 | 0.71 | 0.47 | 0.62 | 0.548 |
| | GA + KL | 0.57 | 0.48 | 0.52 | 0.47 | 0.43 | 0.67 | 0.48 | 0.63 | 0.46 | 0.66 | 0.47 | 0.61 | 0.538 |
| | NPO | 0.40 | 0.50 | 0.42 | 0.58 | 0.44 | 0.66 | 0.44 | 0.70 | 0.29 | 0.79 | 0.43 | 0.68 | 0.528 |
| | DPO | 0.54 | 0.49 | 0.51 | 0.51 | 0.45 | 0.64 | 0.49 | 0.63 | 0.47 | 0.67 | 0.48 | 0.71 | 0.549 |
| | LoRA | 0.52 | 0.54 | 0.54 | 0.49 | 0.03 | 0.96 | 0.28 | 0.82 | 0.03 | 0.99 | 0.23 | 0.86 | 0.524 |
| | Our Method | 0.59 | 0.50 | 0.61 | 0.52 | 0.58 | 0.51 | 0.58 | 0.52 | 0.55 | 0.54 | 0.56 | 0.66 | 0.560 |
| Llama3-8B-Instruct | GA | 0.52 | 0.52 | 0.59 | 0.51 | 0.41 | 0.77 | 0.44 | 0.64 | 0.38 | 0.69 | 0.38 | 0.70 | 0.546 |
| | GA + GD | 0.67 | 0.41 | 0.68 | 0.41 | 0.41 | 0.71 | 0.51 | 0.70 | 0.38 | 0.69 | 0.43 | 0.67 | 0.556 |
| | GA + KL | 0.62 | 0.50 | 0.68 | 0.49 | 0.41 | 0.65 | 0.47 | 0.61 | 0.36 | 0.72 | 0.39 | 0.66 | 0.547 |
| | NPO | 0.59 | 0.54 | 0.67 | 0.52 | 0.41 | 0.69 | 0.47 | 0.62 | 0.33 | 0.71 | 0.38 | 0.68 | 0.551 |
| | DPO | 0.59 | 0.40 | 0.59 | 0.36 | 0.52 | 0.61 | 0.52 | 0.57 | 0.48 | 0.63 | 0.53 | 0.59 | 0.533 |
| | LoRA | 0.59 | 0.50 | 0.69 | 0.37 | 0.03 | 0.97 | 0.29 | 0.84 | 0.03 | 0.99 | 0.25 | 0.89 | 0.537 |
| | Our Method | 0.81 | 0.39 | 0.80 | 0.38 | 0.74 | 0.38 | 0.73 | 0.38 | 0.68 | 0.45 | 0.67 | 0.46 | 0.573 |
| Sparsity | UL1-MU | UL1-FP | CL1-MU | CL1-FP | UL2-MU | UL2-FP | CL2-MU | CL2-FP | UL3-MU | UL3-FP | CL3-MU | CL3-FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.52 | 0.54 | 0.54 | 0.49 | 0.03 | 0.96 | 0.28 | 0.82 | 0.03 | 0.99 | 0.23 | 0.86 |
| 0.3 | 0.52 | 0.54 | 0.55 | 0.49 | 0.10 | 0.94 | 0.24 | 0.84 | 0.09 | 0.94 | 0.17 | 0.83 |
| 0.5 | 0.52 | 0.54 | 0.55 | 0.49 | 0.20 | 0.79 | 0.27 | 0.81 | 0.11 | 0.90 | 0.21 | 0.80 |
| 0.7 | 0.52 | 0.54 | 0.58 | 0.52 | 0.30 | 0.73 | 0.33 | 0.72 | 0.19 | 0.82 | 0.25 | 0.77 |
| 0.9 | 0.59 | 0.50 | 0.61 | 0.52 | 0.58 | 0.51 | 0.58 | 0.52 | 0.55 | 0.54 | 0.56 | 0.56 |
| Freeze A | Sparse Mask | Orth. Proj. | UL1-MU | UL1-FP | CL1-MU | CL1-FP | UL2-MU | UL2-FP | CL2-MU | CL2-FP | UL3-MU | UL3-FP | CL3-MU | CL3-FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| × | × | × | 0.52 | 0.54 | 0.54 | 0.49 | 0.03 | 0.96 | 0.28 | 0.82 | 0.03 | 0.99 | 0.23 | 0.86 |
| ✓ | × | × | 0.59 | 0.50 | 0.60 | 0.50 | 0.38 | 0.76 | 0.47 | 0.61 | 0.31 | 0.72 | 0.40 | 0.62 |
| × | ✓ | × | 0.52 | 0.54 | 0.63 | 0.47 | 0.54 | 0.57 | 0.54 | 0.57 | 0.34 | 0.71 | 0.46 | 0.60 |
| × | × | ✓ | 0.52 | 0.54 | 0.54 | 0.49 | 0.03 | 0.97 | 0.18 | 0.94 | 0.01 | 0.98 | 0.15 | 0.97 |
| × | ✓ | ✓ | 0.52 | 0.54 | 0.63 | 0.46 | 0.56 | 0.53 | 0.53 | 0.55 | 0.35 | 0.72 | 0.44 | 0.64 |
| ✓ | × | ✓ | 0.59 | 0.50 | 0.59 | 0.48 | 0.38 | 0.68 | 0.45 | 0.64 | 0.31 | 0.71 | 0.36 | 0.66 |
| ✓ | ✓ | × | 0.59 | 0.50 | 0.61 | 0.51 | 0.58 | 0.50 | 0.59 | 0.52 | 0.44 | 0.52 | 0.44 | 0.64 |
| ✓ | ✓ | ✓ | 0.59 | 0.50 | 0.61 | 0.52 | 0.58 | 0.51 | 0.59 | 0.52 | 0.55 | 0.54 | 0.56 | 0.56 |
| Method | Trainable Params | Ratio | FLOPs/Step |
|---|---|---|---|
| Base Model | 3.74B | - | - |
| Full Fine-tuning | 3.74B | 100.0% | 183.8 TFLOPs (100.0%) |
| LoRA | 15.63M | 0.42% | 62.3 TFLOPs (33.9%) |
| Ours | 8.40M | 0.22% | 62.0 TFLOPs (33.7%) |
| Method | UL1-MU | UL1-FP | CL1-MU | CL1-FP | UL2-MU | UL2-FP | CL2-MU | CL2-FP | UL3-MU | UL3-FP | CL3-MU | CL3-FP | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GA | 0.69 | 0.57 | 0.75 | 0.45 | 0.51 | 0.60 | 0.69 | 0.56 | 0.56 | 0.66 | 0.64 | 0.59 | 0.606 |
| GD | 0.69 | 0.47 | 0.74 | 0.48 | 0.73 | 0.51 | 0.71 | 0.56 | 0.63 | 0.59 | 0.61 | 0.62 | 0.612 |
| GA + KL | 0.76 | 0.40 | 0.78 | 0.55 | 0.68 | 0.54 | 0.69 | 0.54 | 0.57 | 0.65 | 0.59 | 0.62 | 0.614 |
| NPO | 0.69 | 0.52 | 0.77 | 0.51 | 0.63 | 0.54 | 0.70 | 0.53 | 0.59 | 0.60 | 0.62 | 0.59 | 0.601 |
| DPO | 0.69 | 0.51 | 0.77 | 0.51 | 0.72 | 0.54 | 0.71 | 0.51 | 0.60 | 0.60 | 0.60 | 0.58 | 0.617 |
| LoRA | 0.80 | 0.39 | 0.71 | 0.42 | 0.17 | 0.87 | 0.29 | 0.83 | 0.07 | 0.95 | 0.33 | 0.72 | 0.546 |
| Ours | 0.81 | 0.42 | 0.79 | 0.45 | 0.76 | 0.48 | 0.74 | 0.51 | 0.71 | 0.54 | 0.68 | 0.55 | 0.620 |
Lang, J.; Li, L.; Zeng, D. A Unified Knowledge Management Framework for Continual Learning and Machine Unlearning in Large Language Models. Information 2026, 17, 238. https://doi.org/10.3390/info17030238