Search Results (1)

Search Parameters:
Keywords = CTQRS-guided reinforcement learning

26 pages, 702 KB  
Article
CTQRS-Based Reinforcement Learning Framework for Reliable Bug Report Generation Using Open-Source Large Language Models
by Geunseok Yang
Appl. Sci. 2025, 15(23), 12545; https://doi.org/10.3390/app152312545 - 26 Nov 2025
Viewed by 961
Abstract
The advancement of Large Language Models (LLMs) has opened new possibilities for automating bug report generation in software engineering. However, a fundamental limitation remains: the generated reports often fail to maintain both consistent structure and reliable semantic quality. To address this issue, this study proposes a Reinforcement Learning (RL) framework that integrates the CTQRS (Completeness, Traceability, Quality, Reproducibility, Specificity) metric as a reward signal. The proposed method aims to enhance both the structural completeness and semantic coherence of generated reports, enabling the automatic creation of reliable bug reports based on open-source LLMs. The training process consists of three stages: Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Refinement. In the SFT stage, the model learns the formal structure of bug reports, reducing the loss from 1.9 to 1.3 and achieving initial CTQRS and SBERT scores of 0.46 and 0.68, respectively. In the RL stage, a multi-reward function centered on CTQRS is combined with the Proximal Policy Optimization (PPO) algorithm, increasing the reward value from 0.42 to 0.63 with stable convergence confirmed through the Exponential Moving Average (EMA). During this process, the CTQRS and SBERT scores improved to 0.72 and 0.84, demonstrating that the model simultaneously enhanced structural completeness and semantic consistency. In the final Refinement stage, the outcomes of SFT and RL are integrated, and a critic-based fine-grained feedback adjustment strategy is applied to stabilize the final outputs. The refined reports maintained a reward value of approximately 0.65, achieving peak CTQRS and SBERT scores of 0.76 and 0.85, respectively. Throughout the entire training process, the stability of the reward gradients was preserved, and the adjustments to length rewards and repetition penalties effectively prevented excessive verbosity. 
Experimental results show that the proposed CTQRS-based reinforcement learning framework improves the structural completeness, contextual accuracy, and evaluation stability of bug reports, thereby quantitatively enhancing the reliability of software quality assurance documentation generated by an open-source LLM (Qwen2.5-7B-Instruct). Future work will focus on further improving formal precision and evaluation consistency by tuning the number of critic iterations (critic_iters) and adjusting the detailed reward weights.
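
The abstract mentions a multi-reward function centered on CTQRS, length rewards, repetition penalties, and EMA-based monitoring of the reward, but does not give their formulas. The following is a minimal sketch of how such pieces could fit together, not the paper's implementation. The function names, the weights, the n-gram repetition measure, the target length, and the EMA smoothing factor are all assumptions made for illustration; the CTQRS and SBERT scores are taken as precomputed inputs in [0, 1].

```python
from collections import Counter


def repetition_penalty(text, n=3):
    """Fraction of word n-grams that repeat an earlier n-gram (hypothetical measure)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def length_reward(text, target=150, tolerance=100):
    """1.0 at the assumed target word count, decaying linearly to 0.0."""
    n_words = len(text.split())
    return max(0.0, 1.0 - abs(n_words - target) / tolerance)


def combined_reward(ctqrs, sbert, text, weights=(0.6, 0.2, 0.1, 0.1)):
    """Weighted sum with CTQRS dominant; the weights are illustrative only."""
    w_ctqrs, w_sbert, w_len, w_rep = weights
    return (
        w_ctqrs * ctqrs
        + w_sbert * sbert
        + w_len * length_reward(text)
        - w_rep * repetition_penalty(text)
    )


def ema(values, alpha=0.1):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    current = None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed
```

In a PPO loop, `combined_reward` would be evaluated once per generated report, and `ema` applied to the resulting per-step rewards to check whether the trend is converging rather than oscillating.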