Open Access Article
Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification
by Jinfeng Ji 1 and Geunseok Yang 2,*
1 Department of Computer Applied Mathematics, Hankyong National University, Anseong-si 17579, Republic of Korea
2 Department of Computer Applied Mathematics, Computer System Institute, Hankyong National University, Anseong-si 17579, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 9178; https://doi.org/10.3390/app15169178
Submission received: 29 July 2025 / Revised: 18 August 2025 / Accepted: 19 August 2025 / Published: 20 August 2025
Abstract
As software systems grow in complexity and scale, the number of reported bugs grows with them. Bug reports are essential artifacts in software maintenance, supporting critical tasks such as detecting duplicate reports, predicting bug severity, and assigning priority levels. Although stop word removal is a common text preprocessing step in natural language processing, its effectiveness in deep learning-based bug report analysis has not been thoroughly evaluated. This study investigates the impact of stop word removal on three core bug report classification tasks. The analysis uses a dataset of over 1.9 million bug reports from eight large-scale open-source projects: Eclipse, FreeBSD, GCC, Gentoo, Kernel, RedHat, Sourceware, and WebKit. Five deep learning models are applied: convolutional neural networks, long short-term memory networks, gated recurrent units, Transformers, and BERT. Each model is evaluated with and without stop word removal during preprocessing. The results show that the F1-score difference was less than 0.01 in over 85% of comparisons, indicating that stop word removal has little to no effect on predictive performance across the eight open-source projects. Average F1-scores remain consistent across all tasks and models: 0.36 for duplicate detection, 0.33 for severity prediction, and 0.33 for priority prediction. Statistical significance tests confirm that the observed differences are not meaningful across datasets or model types. The findings suggest that stop word removal is unnecessary in deep learning-based bug report analysis; omitting this step can simplify preprocessing pipelines without reducing accuracy, particularly in large-scale, real-world software engineering applications.
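The experimental contrast described in the abstract is easy to picture in code. Below is a minimal sketch, not the authors' published pipeline: a toggleable stop word removal step applied to a bug report, followed by a paired Wilcoxon signed-rank test over per-project F1-scores for the two preprocessing variants. The stop word list, tokenizer, and all F1 values are illustrative assumptions (the abstract does not name the specific significance test or publish its preprocessing code).

```python
# Minimal sketch (not the authors' released code) of the comparison
# described in the abstract: the same bug report text is preprocessed
# twice, once with and once without stop word removal, and per-project
# F1-scores for the two variants are compared with a paired test.
import re
from scipy.stats import wilcoxon

# Illustrative stop word list; the paper's exact list is not given here.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "when", "and", "of", "to", "in"}

def preprocess(text: str, remove_stop_words: bool) -> list[str]:
    """Lowercase, tokenize on alphanumerics, optionally drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

report = "The application crashes when the user clicks on the Save button"
print(preprocess(report, remove_stop_words=False))
# ['the', 'application', 'crashes', 'when', 'the', 'user', ...]
print(preprocess(report, remove_stop_words=True))
# ['application', 'crashes', 'user', 'clicks', 'save', 'button']

# Hypothetical per-project F1-scores for one task/model pair across the
# eight projects; values are invented, but mirror the paper's finding
# that most differences fall below 0.01.
f1_with_removal    = [0.361, 0.328, 0.343, 0.306, 0.355, 0.314, 0.307, 0.362]
f1_without_removal = [0.360, 0.330, 0.340, 0.310, 0.350, 0.320, 0.300, 0.370]

stat, p = wilcoxon(f1_with_removal, f1_without_removal)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p:.3f}")
# A large p-value is consistent with the paper's conclusion that the
# with/without differences are not statistically meaningful.
```

A paired test is the natural fit here because the two F1-scores in each pair come from the same project, task, and model, differing only in the preprocessing toggle.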