This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification

1 Department of Computer Applied Mathematics, Hankyong National University, Anseong-si 17579, Republic of Korea
2 Department of Computer Applied Mathematics, Computer System Institute, Hankyong National University, Anseong-si 17579, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 9178; https://doi.org/10.3390/app15169178 (registering DOI)
Submission received: 29 July 2025 / Revised: 18 August 2025 / Accepted: 19 August 2025 / Published: 20 August 2025

Abstract

As software systems grow in complexity and scale, the number of reported bugs also increases. Bug reports are essential artifacts in software maintenance, supporting critical tasks such as detecting duplicate reports, predicting bug severity, and assigning priority levels. Although stop word removal is a common text preprocessing step in natural language processing, its effectiveness in deep learning-based bug report analysis has not been thoroughly evaluated. This study investigates the impact of stop word removal on three core bug report classification tasks: duplicate detection, severity prediction, and priority prediction. The analysis uses a dataset of over 1.9 million bug reports from eight large-scale open-source projects: Eclipse, FreeBSD, GCC, Gentoo, Kernel, RedHat, Sourceware, and WebKit. Five deep learning models are applied: convolutional neural networks (CNNs), long short-term memory (LSTM) networks, gated recurrent units (GRUs), Transformers, and BERT. Each model is evaluated with and without stop word removal during preprocessing. The results show that the F1 score difference between the two conditions is less than 0.01 in over 85% of comparisons, indicating that stop word removal has little to no effect on predictive performance across the eight projects. Average F1 scores remain consistent across all tasks and models, at 0.36 for duplicate detection, 0.33 for severity prediction, and 0.33 for priority prediction. Statistical significance tests confirm that the observed differences are not statistically significant across datasets or model types. The findings suggest that stop word removal is unnecessary in deep learning-based bug report analysis. Omitting this step may simplify preprocessing pipelines without reducing accuracy, particularly in large-scale, real-world software engineering applications.
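To make the two compared preprocessing conditions and the "difference below 0.01" criterion concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not state which tokenizer or stop word list was used, so the small `STOP_WORDS` set, the regex tokenizer, the function names `preprocess` and `share_within_threshold`, and the sample F1 values are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately small English stop word list for illustration only;
# the study's actual list (e.g., from NLTK or spaCy) is not stated in the abstract.
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "were", "be", "to", "of",
    "in", "on", "at", "and", "or", "when", "it", "this", "that", "with",
})


def preprocess(text: str, remove_stop_words: bool) -> list[str]:
    """Lowercase and tokenize a bug report; optionally drop stop words."""
    # Simple assumed tokenizer: runs of lowercase letters, digits, or underscores.
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def share_within_threshold(f1_with: list[float], f1_without: list[float],
                           threshold: float = 0.01) -> float:
    """Fraction of paired comparisons whose absolute F1 difference is below `threshold`."""
    if len(f1_with) != len(f1_without) or not f1_with:
        raise ValueError("expected two non-empty lists of equal length")
    close = sum(abs(a - b) < threshold for a, b in zip(f1_with, f1_without))
    return close / len(f1_with)


if __name__ == "__main__":
    report = "The editor crashes when the file is saved to a network drive"
    print(preprocess(report, remove_stop_words=False))
    print(preprocess(report, remove_stop_words=True))
    # Hypothetical F1 scores for four (project, model) pairs, not the paper's data.
    print(share_within_threshold([0.36, 0.33, 0.34, 0.30],
                                 [0.365, 0.331, 0.36, 0.302]))
```

With these made-up scores, three of the four pairs differ by less than 0.01, so the share is 0.75; the study reports the analogous share as over 85% across its real comparisons.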
Keywords: bug duplicate detection; bug severity prediction; bug priority prediction; stop words; software bug report analysis

Share and Cite

MDPI and ACS Style

Ji, J.; Yang, G. Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification. Appl. Sci. 2025, 15, 9178. https://doi.org/10.3390/app15169178

AMA Style

Ji J, Yang G. Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification. Applied Sciences. 2025; 15(16):9178. https://doi.org/10.3390/app15169178

Chicago/Turabian Style

Ji, Jinfeng, and Geunseok Yang. 2025. "Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification." Applied Sciences 15, no. 16: 9178. https://doi.org/10.3390/app15169178

APA Style

Ji, J., & Yang, G. (2025). Do Stop Words Matter in Bug Report Analysis? Empirical Findings Using Deep Learning Models Across Duplicate, Severity, and Priority Classification. Applied Sciences, 15(16), 9178. https://doi.org/10.3390/app15169178

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
