An Empirical Study on Enhancing Large Language Models for Long-Term Conversations in Korean
Abstract
1. Introduction
- Session Summarization: To generate appropriate responses, models must be able to recall information from previous sessions or user-provided inputs. While storing entire conversations is possible, summarization has been shown to be a more effective and efficient approach [2].
- Memory Update: Models must be capable of refining and updating user-related information, such as emotional state, location, or recent events, which can change dynamically over time in multi-session conversation (MSC) scenarios [3].
- Response Generation: When responding to user utterances, models should reason about the relevance of stored memory in relation to the current input and determine whether past information should be utilized for response generation [4].

1. To address the capability gap between English and Korean, we conduct comprehensive experiments to enhance the long-term conversational abilities of LLMs in Korean.
2. We construct Korean session summarization and memory update datasets. Unlike existing MSC datasets, our dataset differentiates between persona and episode memory to reflect long-term versus short-term information.
3. Experimental results demonstrate the effectiveness of our methods in long-term conversation settings. Additionally, our MSC dataset, annotated with distinct memory types, supports the generation of more engaging and contextually appropriate responses.
4. We demonstrate that selectively tuning Korean-specific neurons outperforms existing fine-tuning approaches and exhibits robust performance in continual learning settings where other methods suffer from catastrophic forgetting.
2. Related Work
2.1. MSC Dataset
2.2. Memory-Augmented Dialogue Systems
2.3. Language-Specific Neuron
3. KEEM Dataset
Data Construction
4. Method
4.1. LoRA
4.2. DPO
4.3. MoE
- $\mathcal{T}_k(h)$ denotes the set of top-k expert indices selected by the gating network based on the input $h$,
- $E_i(h)$ is the output of the $i$-th expert network,
- $\tilde{g}_i(h)$ is the normalized gating weight for expert $i$,
- $g_i(h)$ is the gating score for expert $i$, computed by the gating network.
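
The quantities listed above (the top-k expert set, the expert outputs, the gating scores, and the normalized gating weights) can be combined as in the following minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `top_k_moe`, the toy experts, and the choice to normalize with a softmax over the selected scores only are all hypothetical, and the gating scores are passed in precomputed instead of being produced by a learned gating network.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def top_k_moe(
    h: Sequence[float],
    experts: Sequence[Expert],
    gate_scores: Sequence[float],
    k: int,
) -> Tuple[Vector, Dict[int, float]]:
    """Weighted sum of the k experts with the highest gating scores."""
    if len(experts) != len(gate_scores):
        raise ValueError("need exactly one gating score per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")

    # Set of top-k expert indices, ranked by gating score.
    top_k = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]

    # Assumption: normalize with a softmax over the selected scores only.
    peak = max(gate_scores[i] for i in top_k)
    unnormalized = {i: math.exp(gate_scores[i] - peak) for i in top_k}
    total = sum(unnormalized.values())
    weights = {i: value / total for i, value in unnormalized.items()}

    # Only the selected experts are evaluated; the rest are skipped.
    output: Vector = []
    for i in top_k:
        expert_out = experts[i](h)
        if not output:
            output = [0.0] * len(expert_out)
        for d, value in enumerate(expert_out):
            output[d] += weights[i] * value
    return output, weights


# Toy usage: three hypothetical "experts" that just scale the input.
experts = [lambda h, s=s: [s * x for x in h] for s in (1.0, 2.0, 3.0)]
output, weights = top_k_moe([1.0, 2.0], experts, gate_scores=[0.0, 1.0, 1.0], k=2)
```

Because only the selected experts are called, the cost per input grows with k and not with the total number of experts, which is the usual motivation for sparse gating.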
4.4. Neuron Tuning
Activation Scores of Attention Module
Activation Scores of the FFN
Identification of Language-Specific Neurons
4.5. Continual Pre-Training
4.6. Specific Layer Tuning
4.7. Task
4.7.1. Session Summarization
4.7.2. Memory Update
4.7.3. Response Generation
5. Experiments
5.1. Experimental Setup
5.2. Implementation Details
5.3. Human Evaluator Details
5.4. Evaluation of Difficulty in Processing MSC
5.5. Results of Session Summarization
5.6. Results of Memory Update
5.7. Results of Response Generation
5.8. Effectiveness of Dividing Session Summarization Type
6. Analysis
6.1. Effectiveness of Multi-Task Learning
6.2. Comparative Evaluation of Neuron Identifying Methodologies
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LLM | Large Language Model |
| MSC | Multi-Session Conversation |
| PLM | Pre-trained Language Model |
| LoRA | Low-Rank Adaptation |
| DPO | Direct Preference Optimization |
| MoE | Mixture-of-Experts |
| RAG | Retrieval-Augmented Generation |
| KEEM | Keep Emotion and Essential Memory |
| LAPE | Language Activation Probability Entropy |
| PLND | Parallel Language-specific Neuron Detection |
| CPT | Continual Pre-Training |
| RL | Reinforcement Learning |
| RLHF | Reinforcement Learning from Human Feedback |
| AMU | Appropriateness of Memory Usage |
Appendix A
Details of Instructions
| Instruction (Translated into English) |
|---|
| Given a dialogue between a user and a chatbot, along with its summarized memory, separate the memory into persona memory and episode memory. |
| Persona memory should include long-term information about the user—details that are unlikely to change frequently. Examples include the user’s name, place of residence, occupation, preferences, personality traits, and other stable personal attributes. |
| Episode memory should include short-term information that often changes based on the user’s recent experiences or events. Examples include recent important events in the user’s life, current emotions and their causes, and other context-specific or time-sensitive details. |
| In cases where the information in [speaker’s summary] requires modifying or deleting a sentence in [Persona Memory] or [Episode Memory], you must update the corresponding sentence to reflect the most recent information while maintaining overall consistency. |
| Persona Memory contains long-term user information (e.g., name, residence, occupation, stable preferences) accumulated across past conversations. Episode Memory contains short-term or changeable information (e.g., recent events, current emotions and their causes). |
| Because [speaker’s summary] reflects the current conversation, it always contains the most recent information. Therefore, when discrepancies arise, you must update [Persona Memory] or [Episode Memory] using the content of [speaker’s summary]. |
| You must not merge, modify, or delete sentences in Persona Memory or Episode Memory solely because they address the same topic. You should update a sentence only when it must be changed based on the new information in [speaker’s summary], or when the information naturally continues. |
| When updating, ensure that no new content is invented and that no existing content is lost. |
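
The memory-update instruction above refers to three inputs: [Persona Memory], [Episode Memory], and [speaker's summary]. The sketch below shows one way those pieces could be assembled into a single prompt. The class `UserMemory`, the function `build_memory_update_prompt`, the bullet layout, and the example sentence about Paris are all hypothetical; the paper's actual prompt template is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class UserMemory:
    """Two memory types: long-term persona and short-term episode."""
    persona: List[str] = field(default_factory=list)
    episode: List[str] = field(default_factory=list)


def _block(tag: str, sentences: Sequence[str]) -> str:
    # Render one tagged section, one memory sentence per bullet.
    body = "\n".join(f"- {s}" for s in sentences) if sentences else "- (empty)"
    return f"[{tag}]\n{body}"


def build_memory_update_prompt(
    instruction: str,
    memory: UserMemory,
    speaker_summary: Sequence[str],
) -> str:
    """Concatenate the instruction with the three tagged inputs."""
    return "\n\n".join(
        [
            instruction.strip(),
            _block("Persona Memory", memory.persona),
            _block("Episode Memory", memory.episode),
            _block("speaker's summary", speaker_summary),
        ]
    )


# Illustrative data only; the summary sentence is invented for this example.
memory = UserMemory(
    persona=["My name is Andrew", "I'm a university student"],
    episode=["I'm currently in Barcelona"],
)
prompt = build_memory_update_prompt(
    "Update the memory using the most recent information.",
    memory,
    ["I moved on to Paris"],
)
```

Keeping persona and episode sentences in separate lists makes it straightforward to apply the rule that the summary overrides stale sentences without merging unrelated ones.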
References
1. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880.
2. Xu, J.; Szlam, A.; Weston, J. Beyond Goldfish Memory: Long-Term Open-Domain Conversation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 5180–5197.
3. Kang, J.; Kim, H.; Kim, H. Generation-Based and Emotion-Reflected Memory Update: Creating the KEEM Dataset for Better Long-Term Conversation. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 9260–9277.
4. Kim, H.; Keum, B.; Huang, J.; Kwon, O.; Kim, H. Multi-Level Attention-Based Generation Model for Long-Term Conversation. J. KIISE 2025, 52, 117–124.
5. Ko, H.; Son, G.; Choi, D. Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap. arXiv 2025, arXiv:2501.02448.
6. Han, S.; Suk, J.; An, S.; Kim, H.; Kim, K.; Yang, W.; Choi, S.; Shin, J. Trillion 7B Technical Report. arXiv 2025, arXiv:2504.15431.
7. Yoo, H.; Park, C.; Yun, S.; Oh, A.; Lee, H. Code-Switching Curriculum Learning for Multilingual Transfer in LLMs. arXiv 2024, arXiv:2411.02460.
8. Laban, P.; Hayashi, H.; Zhou, Y.; Neville, J. LLMs Get Lost in Multi-Turn Conversation. arXiv 2025, arXiv:2505.06120.
9. Zhao, Y.; Zhang, W.; Chen, G.; Kawaguchi, K.; Bing, L. How Do Large Language Models Handle Multilingualism? In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 9–15 December 2024.
10. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2022, arXiv:2106.09685.
11. Rafailov, R.; Sharma, A.; Mitchell, E.; Manning, C.D.; Ermon, S.; Finn, C. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. Adv. Neural Inf. Process. Syst. 2023, 36, 53728–53741.
12. Ma, J.; Zhao, Z.; Yi, X.; Chen, J.; Hong, L.; Chi, E.H. Modeling Task Relationships in Multi-Task Learning with Multi-Gate Mixture-of-Experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1930–1939.
13. Kim, H.; Lee, J.; Lee, K.; Shin, J.; Lim, S.; Kwon, O. Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Mumbai, India, 20–24 December 2025; pp. 527–542.
14. Wang, M.; Zhang, N.; Xu, Z.; Xi, Z.; Deng, S.; Yao, Y.; Zhang, Q.; Yang, L.; Wang, J.; Chen, H. Detoxifying large language models via knowledge editing. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 3093–3118.
15. Tang, T.; Luo, W.; Huang, H.; Zhang, D.; Wang, X.; Zhao, W.X.; Wei, F.; Wen, J.-R. Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 5701–5715.
16. Xu, H.; Zhan, R.; Ma, Y.; Wong, D.F.; Chao, L.S. Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 9393–9406.
17. Bae, S.; Kwak, D.; Kang, S.; Lee, M.Y.; Kim, S.; Jeong, Y.; Kim, H.; Kim, H.; Lee, S.-W.; Park, W.; et al. Keep Me Updated! Memory Management in Long-term Conversations. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 3769–3787.
18. Jia, Z.; Liu, Q.; Li, H.; Chen, Y.; Liu, J. Evaluating the long-term memory of large language models. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 27 July–1 August 2025; pp. 19759–19777.
19. Tan, Z.; Yan, J.; Hsu, I.H.; Han, R.; Wang, Z.; Le, L.; Song, Y.; Chen, Y.; Palangi, H.; Lee, G.; et al. In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; pp. 8416–8439.
20. Kim, E.; Park, C.; Chang, B. SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; pp. 14474–14498.
21. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
22. Wang, Y.; Kordi, Y.; Mishra, S.; Liu, A.; Smith, N.A.; Khashabi, D.; Hajishirzi, H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 13484–13508.
23. Cegin, J.; Simko, J.; Brusilovsky, P. ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 1889–1905.
24. Chen, Z.; Shen, Y.; Ding, M.; Chen, Z.; Zhao, H.; Learned-Miller, E.G.; Gan, C. Mod-Squad: Designing Mixtures of Experts as Modular Multi-Task Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 11828–11837.
25. Dai, D.; Deng, C.; Zhao, C.; Xu, R.; Gao, H.; Chen, D.; Li, J.; Zeng, W.; Yu, X.; Wu, Y.; et al. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024.
26. Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C.; et al. DeepSeek-V3 Technical Report. arXiv 2024, arXiv:2412.19437.
27. Liu, W.; Xu, Y.; Xu, H.; Chen, J.; Hu, X.; Wu, J. Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 11855–11881.
28. LG AI Research; Bae, K.; Choi, E.; Choi, K.; Choi, S.J.; Choi, Y.; Han, K.; Hong, S.; Hwang, J.; Hwang, T.; et al. EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes. arXiv 2025, arXiv:2507.11407.
29. Gemma Team; Kamath, A.; Ferret, J.; Pathak, S.; Vieillard, N.; Merhej, R.; Perrin, S.; Matejovicova, T.; Ramé, A.; Rivière, M.; et al. Gemma 3 Technical Report. arXiv 2025, arXiv:2503.19786.
30. AI@Meta. Llama 3 Model Card. 2024. Available online: https://github.com/meta-llama/llama3 (accessed on 1 January 2025).
31. Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609.
32. Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C.; et al. Qwen3 Technical Report. arXiv 2025, arXiv:2505.09388.




| Topic | Count | Ratio |
|---|---|---|
| Individuals & Relationships | 12,783 | 15.98% |
| Entertainment | 11,063 | 13.83% |
| Beauty & Health | 9038 | 11.30% |
| Society | 7328 | 9.17% |
| Work & Job | 6997 | 8.75% |
| Arts & Culture | 6508 | 8.13% |
| Education | 6288 | 7.86% |
| Food | 4130 | 5.15% |
| Climate | 3975 | 4.97% |
| Traffic | 3795 | 4.75% |
| House | 3489 | 4.37% |
| Fashion | 235 | 0.29% |

| Attribute | KMSC Session 1–2 | KMSC Session 1–3 | KMSC Session 1–4 | KEEM Session 1–2 | KEEM Session 1–3 | KEEM Session 1–4 |
|---|---|---|---|---|---|---|
| Total Episodes | 40,000 | 20,000 | 20,000 | 2006 | 1560 | 1005 |
| Total Utterances | 980,919 | 731,705 | 953,405 | 61,354 | 70,974 | 59,847 |
| Total Memory sentences | 567,176 | 395,808 | 498,497 | 33,972 | 31,846 | 23,148 |
| Avg. Length of Utterances | 16.98 | 17.13 | 17.22 | 17.02 | 17.26 | 17.32 |
| Avg. Number of Utterances | 30.65 | 45.73 | 59.59 | 30.58 | 45.49 | 59.54 |
| Avg. Number of Memory Sentences | 17.72 | 24.73 | 31.15 | 16.93 | 20.41 | 23.03 |

| Persona Memory | Episode Memory |
|---|---|
| My name is Andrew. I’m a university student. I have a sister. I’m a dog person. … | I’m traveling in Europe. I’ve been to Porto and Lisbon. I’m currently in Barcelona. I feel calm and peaceful these days and enjoy my vacation. … |

| Dataset | Model | Perplexity ↓, Session 1 | Perplexity ↓, Session 1–2 | Perplexity ↓, Session 1–3 | Perplexity ↓, Session 1–4 |
|---|---|---|---|---|---|
| KEEM (KOR MSC) | EXAONE 4 32B | 4.1 | 4.3 | 5.9 | 8.1 |
| KEEM (KOR MSC) | Gemma 3 12B | 3.9 | 4.2 | 5.4 | 7.5 |
| KEEM (KOR MSC) | Llama 3.1 8B | 3.5 | 4.2 | 5.3 | 7.8 |
| KEEM (KOR MSC) | Qwen 2.5 7B | 3.2 | 3.7 | 5.1 | 7.3 |
| KEEM (KOR MSC) | Qwen 2.5 32B | 2.8 | 3.1 | 5.0 | 7.1 |
| KEEM (KOR MSC) | Qwen 3 8B | 2.9 | 3.0 | 4.6 | 7.1 |
| KEEM (KOR MSC) | Qwen 3 32B | 2.4 | 2.4 | 4.2 | 6.7 |
| MSC (ENG) [2] | EXAONE 4 32B | 2.8 | 3.3 | 5.0 | 6.7 |
| MSC (ENG) [2] | Gemma 3 12B | 3.1 | 3.1 | 3.8 | 5.4 |
| MSC (ENG) [2] | Llama 3.1 8B | 2.6 | 2.7 | 3.9 | 5.6 |
| MSC (ENG) [2] | Qwen 2.5 7B | 2.4 | 2.5 | 3.7 | 6.1 |
| MSC (ENG) [2] | Qwen 2.5 32B | 2.1 | 1.8 | 3.2 | 5.3 |
| MSC (ENG) [2] | Qwen 3 8B | 2.3 | 2.1 | 3.7 | 5.4 |
| MSC (ENG) [2] | Qwen 3 32B | 1.9 | 1.7 | 3.1 | 4.9 |
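
For readers unfamiliar with the metric in the table above, perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below applies that standard definition to a list of per-token natural-log probabilities. It is not the authors' evaluation script: the function name `perplexity` and its input format are assumptions, and which tokens the paper scores (for example, responses only or the whole dialogue) is not stated here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(-mean(log p)) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform_four = perplexity([math.log(0.25)] * 8)
```

Per-token perplexity depends on the tokenizer, so values from models with different vocabularies are best read as trends within each model across sessions instead of as strict cross-model rankings.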
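
| Task | Model | Perplexity |
|---|---|---|
| Summarization (KOR) | EXAONE 4 32B | 5.3 |
| Summarization (KOR) | Gemma 3 12B | 4.5 |
| Summarization (KOR) | Llama 3.1 8B | 4.6 |
| Summarization (KOR) | Qwen 2.5 7B | 4.4 |
| Summarization (KOR) | Qwen 2.5 32B | 4.0 |
| Summarization (KOR) | Qwen 3 8B | 4.4 |
| Summarization (KOR) | Qwen 3 32B | 4.1 |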

| Model | Method | Semantic Similarity | Informativeness |
|---|---|---|---|
| EXAONE 4 32B | LoRA | 0.87 | 4.48 |
| EXAONE 4 32B | DPO | 0.92 | 4.77 |
| EXAONE 4 32B | MoE | 0.86 | 4.59 |
| EXAONE 4 32B | Layer Tuning | 0.88 | 4.75 |
| EXAONE 4 32B | Neuron Tuning | 0.93 | 4.80 |
| Gemma 3 12B | LoRA | 0.88 | 4.51 |
| Gemma 3 12B | DPO | 0.92 | 4.67 |
| Gemma 3 12B | MoE | 0.88 | 4.60 |
| Gemma 3 12B | CPT | 0.89 | 4.69 |
| Gemma 3 12B | Layer Tuning | 0.89 | 4.73 |
| Gemma 3 12B | Neuron Tuning | 0.94 | 4.82 |
| Llama 3.1 8B | LoRA | 0.85 | 4.35 |
| Llama 3.1 8B | DPO | 0.88 | 4.42 |
| Llama 3.1 8B | MoE | 0.85 | 4.39 |
| Llama 3.1 8B | CPT | 0.87 | 4.42 |
| Llama 3.1 8B | Layer Tuning | 0.86 | 4.50 |
| Llama 3.1 8B | Neuron Tuning | 0.90 | 4.64 |
| Qwen 2.5 7B | LoRA | 0.83 | 4.30 |
| Qwen 2.5 7B | DPO | 0.89 | 4.44 |
| Qwen 2.5 7B | MoE | 0.81 | 4.33 |
| Qwen 2.5 7B | CPT | 0.90 | 4.49 |
| Qwen 2.5 7B | Layer Tuning | 0.89 | 4.51 |
| Qwen 2.5 7B | Neuron Tuning | 0.90 | 4.61 |
| Qwen 3 8B | LoRA | 0.89 | 4.72 |
| Qwen 3 8B | DPO | 0.94 | 4.85 |
| Qwen 3 8B | MoE | 0.91 | 4.74 |
| Qwen 3 8B | CPT | 0.88 | 4.69 |
| Qwen 3 8B | Layer Tuning | 0.93 | 4.81 |
| Qwen 3 8B | Neuron Tuning | 0.95 | 4.92 |

| Model | Method | Informativeness | Conflict ↓ |
|---|---|---|---|
| EXAONE 4 32B | LoRA | 4.22 | 15.9% |
| EXAONE 4 32B | MoE | 4.28 | 14.6% |
| EXAONE 4 32B | Layer Tuning | 4.49 | 13.1% |
| EXAONE 4 32B | Neuron Tuning | 4.67 | 9.8% |
| Gemma 3 12B | LoRA | 4.27 | 14.5% |
| Gemma 3 12B | MoE | 4.31 | 14.4% |
| Gemma 3 12B | CPT | 4.40 | 13.3% |
| Gemma 3 12B | Layer Tuning | 4.32 | 14.6% |
| Gemma 3 12B | Neuron Tuning | 4.42 | 12.7% |
| Llama 3.1 8B | LoRA | 4.14 | 17.3% |
| Llama 3.1 8B | MoE | 4.12 | 17.0% |
| Llama 3.1 8B | CPT | 4.25 | 14.9% |
| Llama 3.1 8B | Layer Tuning | 4.25 | 14.5% |
| Llama 3.1 8B | Neuron Tuning | 4.39 | 14.2% |
| Qwen 2.5 7B | LoRA | 4.11 | 16.8% |
| Qwen 2.5 7B | MoE | 4.14 | 16.4% |
| Qwen 2.5 7B | CPT | 4.23 | 14.4% |
| Qwen 2.5 7B | Layer Tuning | 4.27 | 14.4% |
| Qwen 2.5 7B | Neuron Tuning | 4.40 | 14.0% |
| Qwen 3 8B | LoRA | 4.54 | 13.6% |
| Qwen 3 8B | MoE | 4.52 | 13.7% |
| Qwen 3 8B | CPT | 4.61 | 11.5% |
| Qwen 3 8B | Layer Tuning | 4.60 | 11.1% |
| Qwen 3 8B | Neuron Tuning | 4.65 | 10.4% |

| Model | Method | Informativeness | Conflict ↓ |
|---|---|---|---|
| EXAONE 4 32B | LoRA | 3.67 | 17.5% |
| EXAONE 4 32B | MoE | 3.83 | 16.4% |
| EXAONE 4 32B | Layer Tuning | 3.88 | 14.7% |
| EXAONE 4 32B | Neuron Tuning | 4.06 | 11.9% |
| Gemma 3 12B | LoRA | 3.42 | 15.4% |
| Gemma 3 12B | MoE | 3.79 | 14.9% |
| Gemma 3 12B | CPT | 3.84 | 13.3% |
| Gemma 3 12B | Layer Tuning | 3.83 | 14.2% |
| Gemma 3 12B | Neuron Tuning | 3.94 | 12.1% |
| Qwen 3 8B | LoRA | 3.35 | 15.6% |
| Qwen 3 8B | MoE | 3.65 | 15.7% |
| Qwen 3 8B | CPT | 3.86 | 13.5% |
| Qwen 3 8B | Layer Tuning | 3.82 | 14.3% |
| Qwen 3 8B | Neuron Tuning | 3.89 | 12.5% |

| Model | Method | Engagement | AMU ↑ |
|---|---|---|---|
| EXAONE 4 32B | LoRA | 3.78 | 85.9% |
| EXAONE 4 32B | MoE | 3.73 | 84.7% |
| EXAONE 4 32B | Layer Tuning | 3.99 | 84.6% |
| EXAONE 4 32B | Neuron Tuning | 4.06 | 86.9% |
| Gemma 3 12B | LoRA | 3.64 | 82.3% |
| Gemma 3 12B | MoE | 3.63 | 82.2% |
| Gemma 3 12B | CPT | 3.90 | 83.5% |
| Gemma 3 12B | Layer Tuning | 3.91 | 83.4% |
| Gemma 3 12B | Neuron Tuning | 3.98 | 83.9% |
| Llama 3.1 8B | LoRA | 3.62 | 78.9% |
| Llama 3.1 8B | MoE | 3.66 | 78.4% |
| Llama 3.1 8B | CPT | 3.84 | 81.9% |
| Llama 3.1 8B | Layer Tuning | 3.82 | 81.6% |
| Llama 3.1 8B | Neuron Tuning | 3.89 | 82.7% |
| Qwen 2.5 7B | LoRA | 3.59 | 77.4% |
| Qwen 2.5 7B | MoE | 3.61 | 77.3% |
| Qwen 2.5 7B | CPT | 3.79 | 81.6% |
| Qwen 2.5 7B | Layer Tuning | 3.72 | 80.4% |
| Qwen 2.5 7B | Neuron Tuning | 3.87 | 81.7% |
| Qwen 3 8B | LoRA | 3.73 | 85.0% |
| Qwen 3 8B | MoE | 3.78 | 85.0% |
| Qwen 3 8B | CPT | 3.88 | 85.8% |
| Qwen 3 8B | Layer Tuning | 3.82 | 85.4% |
| Qwen 3 8B | Neuron Tuning | 3.97 | 87.4% |

| Model | Method | Engagement | Naturalness | AMU ↑ |
|---|---|---|---|---|
| EXAONE 4 32B | LoRA | 3.57 | 4.48 | 83.1% |
| EXAONE 4 32B | MoE | 3.57 | 4.40 | 82.4% |
| EXAONE 4 32B | Layer Tuning | 3.69 | 4.51 | 81.3% |
| EXAONE 4 32B | Neuron Tuning | 3.90 | 4.54 | 83.5% |
| Gemma 3 12B | LoRA | 3.44 | 4.32 | 80.6% |
| Gemma 3 12B | MoE | 3.41 | 4.33 | 80.2% |
| Gemma 3 12B | CPT | 3.77 | 4.46 | 81.8% |
| Gemma 3 12B | Layer Tuning | 3.72 | 4.45 | 81.6% |
| Gemma 3 12B | Neuron Tuning | 3.85 | 4.53 | 83.0% |
| Qwen 3 8B | LoRA | 3.37 | 4.10 | 79.2% |
| Qwen 3 8B | MoE | 3.40 | 4.12 | 80.0% |
| Qwen 3 8B | CPT | 3.69 | 4.18 | 80.4% |
| Qwen 3 8B | Layer Tuning | 3.65 | 4.19 | 80.5% |
| Qwen 3 8B | Neuron Tuning | 3.78 | 4.24 | 81.6% |

| Method | Session 2 PPL ↓ | Session 2 Eng. ↑ | Session 3 PPL ↓ | Session 3 Eng. ↑ | Session 4 PPL ↓ | Session 4 Eng. ↑ |
|---|---|---|---|---|---|---|
| w/o divide | 3.9 | 3.94 | 5.4 | 3.69 | 6.7 | 3.53 |
| w/ divide | 3.1 | 4.32 | 4.0 | 4.11 | 5.1 | 3.92 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kim, H.; Kang, J.; Jang, Y.; Sim, Y.; Kim, H. An Empirical Study on Enhancing Large Language Models for Long-Term Conversations in Korean. Appl. Sci. 2026, 16, 3175. https://doi.org/10.3390/app16073175

