Integrating Lightweight Transformers for Cross-Project Bug Severity Classification: An Applied AI Approach in Software Engineering
Featured Application
Abstract
1. Introduction
2. Dataset Description
3. Cross-Project Experimental Design
4. Research Methodology
4.1. Text Pre-Processing
4.2. Transformer-Based Representation Learning
4.3. Severity Classification Model
4.4. Representation-Level Domain Adaptation
4.5. Joint Optimization Objective
4.6. Cross-Project Training Procedure
4.7. Inference on the Target Project
5. Experimental Setup and Evaluation
5.1. Experimental Setup
- (1) single-source training without domain adaptation,
- (2) multi-source training without domain adaptation, and
- (3) multi-source training with representation-level domain adaptation using Maximum Mean Discrepancy (MMD).
5.2. Evaluation Metrics
5.3. Experimental Results and Discussion
5.3.1. Baseline Performance Under Single-Source Transfer
5.3.2. Baseline Performance Under Multi-Source Transfer
5.3.3. Analysis of Single-Source Transfer Performance
5.3.4. Analysis of Multi-Source Transfer Performance
5.3.5. Ablation Study: Impact of Domain Alignment and Multi-Source Training
- (1) single-source transformer models,
- (2) multi-source transformers without domain alignment, and
- (3) multi-source transformers with MMD-based domain adaptation.
5.3.6. Deployment Efficiency Analysis
5.3.7. Limitations of the Study
5.4. Discussion
5.5. Threats to Validity
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Severity | Firefox | Core | Thunderbird | Bugzilla |
|---|---|---|---|---|
| Trivial | 1111 | 864 | 740 | 788 |
| Minor | 3332 | 2591 | 1555 | 920 |
| Major | 7550 | 6046 | 2813 | 808 |
| Critical | 9772 | 7255 | 2073 | 112 |
| Blocker | 444 | 518 | 222 | 40 |
| Total | 22,209 | 17,274 | 7403 | 2468 |
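
The class counts above are strongly imbalanced (Blocker accounts for roughly 2% of the Firefox reports), and the training configuration lists a weighted cross-entropy loss. The weighting scheme itself is not given here, so the sketch below assumes the common inverse-frequency form w_c = N / (K · n_c); the function name `inverse_frequency_weights` is hypothetical and the scheme should be read as an assumption rather than the authors' exact choice.

```python
# Severity counts for the Firefox project, copied from the dataset table.
FIREFOX_COUNTS = {
    "Trivial": 1111,
    "Minor": 3332,
    "Major": 7550,
    "Critical": 9772,
    "Blocker": 444,
}


def inverse_frequency_weights(counts):
    """Return w_c = N / (K * n_c).

    This is an assumed weighting scheme; the paper does not confirm it.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = inverse_frequency_weights(FIREFOX_COUNTS)
for label, weight in weights.items():
    print(f"{label:<8} {weight:.3f}")
```

Under this assumption the rare Blocker class receives a weight of about 10, while the dominant Critical class is weighted below 0.5, so misclassified minority reports contribute more to the loss.
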

| Cross-Project Experimental Setting | Source Project (Train) | Target Project (Test) |
|---|---|---|
| Single-source | Core | Firefox |
| Single-source | Core | Thunderbird |
| Single-source | Core | Bugzilla |
| Single-source | Firefox | Core |
| Single-source | Firefox | Thunderbird |
| Single-source | Firefox | Bugzilla |
| Single-source | Thunderbird | Core |
| Single-source | Thunderbird | Firefox |
| Single-source | Thunderbird | Bugzilla |
| Single-source | Bugzilla | Core |
| Single-source | Bugzilla | Firefox |
| Multi-source | Core + Thunderbird + Bugzilla | Firefox |
| Multi-source | Firefox + Thunderbird + Bugzilla | Core |
| Multi-source | Core + Firefox + Bugzilla | Thunderbird |
| Multi-source | Core + Firefox + Thunderbird | Bugzilla |
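
The transfer settings can be generated programmatically rather than listed by hand. The following is a minimal sketch, assuming single-source settings are all ordered pairs of distinct projects and multi-source settings follow a leave-one-project-out scheme; the variable names are illustrative.

```python
from itertools import permutations

PROJECTS = ["Core", "Firefox", "Thunderbird", "Bugzilla"]

# Single-source: every ordered (source, target) pair of distinct projects.
single_source = list(permutations(PROJECTS, 2))

# Multi-source: hold one project out as the target and train on the rest.
multi_source = [
    ([p for p in PROJECTS if p != target], target) for target in PROJECTS
]

for source, target in single_source:
    print(source, "->", target)

for sources, target in multi_source:
    print(" + ".join(sources), "->", target)
```

This enumeration produces 12 single-source pairs, the number of rows reported in the single-source result tables, and 4 multi-source settings in which the target project never appears among the training projects.
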

| Parameter | Value |
|---|---|
| Backbone models | DistilBERT-base-uncased, TinyBERT-4L-312D |
| Optimizer | AdamW |
| Learning rate | 2 × 10⁻⁵ |
| Batch size | 32 |
| Epochs | 10 |
| Max sequence length | 128 |
| Loss function | Weighted Cross-Entropy |
| Domain alignment | MMD (Gaussian kernel) |
| Alignment coefficient (λ) | 0.5 |
| Random seeds | 42, 52, 62, 72, 82 |
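
To make the alignment term in this configuration concrete, the sketch below computes a squared MMD with a Gaussian kernel between two batches of embeddings and combines it with a classification loss using λ = 0.5. Several details are assumptions made for illustration: the biased (V-statistic) estimator, a single fixed bandwidth `sigma`, and the additive form L = L_cls + λ · MMD². It uses plain Python lists so that it runs without a deep learning framework.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two embedding batches."""
    return (
        mean_kernel(source, source, sigma)
        + mean_kernel(target, target, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
    )


def joint_loss(classification_loss, source, target, lam=0.5, sigma=1.0):
    """Assumed joint objective: L = L_cls + lambda * MMD^2."""
    return classification_loss + lam * mmd_squared(source, target, sigma)


# Toy 2-D "embeddings"; real encoder outputs would be 768-D for
# DistilBERT-base or 312-D for TinyBERT-4L-312D.
src = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
tgt_near = [[0.1, 0.1], [0.2, 0.2], [0.0, 0.2]]
tgt_far = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]

print(mmd_squared(src, tgt_near))     # small: the two batches overlap
print(mmd_squared(src, tgt_far))      # larger: the batches are far apart
print(joint_loss(1.0, src, tgt_far))  # classification loss plus penalty
```

In an actual training loop the same quantity would be computed on tensors so that gradients flow back into the encoder; the penalty shrinks as source and target embeddings become harder to tell apart.
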

| Source → Target | LR Macro-F1 | LR Weighted-F1 | SVM Macro-F1 | SVM Weighted-F1 | CNN Macro-F1 | CNN Weighted-F1 |
|---|---|---|---|---|---|---|
| Core → Firefox | 0.34 | 0.58 | 0.36 | 0.60 | 0.39 | 0.62 |
| Core → Thunderbird | 0.32 | 0.56 | 0.34 | 0.58 | 0.37 | 0.60 |
| Core → Bugzilla | 0.28 | 0.52 | 0.30 | 0.54 | 0.33 | 0.56 |
| Firefox → Core | 0.36 | 0.60 | 0.38 | 0.62 | 0.41 | 0.64 |
| Firefox → Thunderbird | 0.33 | 0.57 | 0.35 | 0.59 | 0.38 | 0.61 |
| Firefox → Bugzilla | 0.29 | 0.53 | 0.31 | 0.55 | 0.34 | 0.57 |
| Thunderbird → Core | 0.31 | 0.55 | 0.33 | 0.57 | 0.36 | 0.59 |
| Thunderbird → Firefox | 0.33 | 0.56 | 0.35 | 0.58 | 0.38 | 0.60 |
| Thunderbird → Bugzilla | 0.27 | 0.51 | 0.29 | 0.53 | 0.32 | 0.55 |
| Bugzilla → Core | 0.26 | 0.50 | 0.28 | 0.52 | 0.31 | 0.54 |
| Bugzilla → Firefox | 0.27 | 0.51 | 0.29 | 0.53 | 0.32 | 0.55 |
| Bugzilla → Thunderbird | 0.25 | 0.49 | 0.27 | 0.51 | 0.30 | 0.53 |
| Average | 0.30 | 0.54 | 0.32 | 0.56 | 0.35 | 0.57 |

| Source → Target | LR Macro-F1 | LR Weighted-F1 | SVM Macro-F1 | SVM Weighted-F1 | CNN Macro-F1 | CNN Weighted-F1 |
|---|---|---|---|---|---|---|
| {Firefox, Thunderbird, Bugzilla} → Core | 0.34 | 0.57 | 0.36 | 0.59 | 0.39 | 0.62 |
| {Core, Thunderbird, Bugzilla} → Firefox | 0.35 | 0.58 | 0.37 | 0.60 | 0.40 | 0.63 |
| {Core, Firefox, Bugzilla} → Thunderbird | 0.33 | 0.56 | 0.35 | 0.58 | 0.38 | 0.61 |
| {Core, Firefox, Thunderbird} → Bugzilla | 0.31 | 0.54 | 0.33 | 0.56 | 0.36 | 0.59 |
| Average | 0.33 | 0.56 | 0.35 | 0.58 | 0.38 | 0.61 |
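
The gap between the Macro-F1 and Weighted-F1 columns in the baseline tables follows from how the two averages treat minority classes. The sketch below implements both from their standard definitions; the toy labels are hypothetical and are not drawn from the datasets.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """F1 for each label treated as the positive class (0.0 if undefined)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by the number of true instances."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in labels) / len(y_true)


# Hypothetical toy example: the single Blocker report is missed entirely.
LABELS = ["Critical", "Major", "Blocker"]
y_true = ["Critical"] * 6 + ["Major"] * 3 + ["Blocker"]
y_pred = ["Critical"] * 6 + ["Major"] * 2 + ["Critical"] + ["Critical"]

print(f"{macro_f1(y_true, y_pred, LABELS):.3f}")     # 0.552
print(f"{weighted_f1(y_true, y_pred, LABELS):.3f}")  # 0.754
```

Missing the one Blocker report costs a full third of the Macro-F1 but only a tenth of the Weighted-F1, which is why Macro-F1 is the stricter indicator on imbalanced severity data.
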

| Model | Source (Train) | Target (Test) | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| DistilBERT | Core | Firefox | 0.48 ± 0.02 * | 0.66 ± 0.01 * |
| DistilBERT | Core | Thunderbird | 0.45 ± 0.02 * | 0.63 ± 0.01 * |
| DistilBERT | Core | Bugzilla | 0.39 ± 0.03 * | 0.58 ± 0.02 * |
| DistilBERT | Firefox | Core | 0.50 ± 0.02 * | 0.68 ± 0.01 * |
| DistilBERT | Firefox | Thunderbird | 0.47 ± 0.02 * | 0.65 ± 0.01 * |
| DistilBERT | Firefox | Bugzilla | 0.41 ± 0.03 * | 0.60 ± 0.02 * |
| DistilBERT | Thunderbird | Core | 0.44 ± 0.02 * | 0.62 ± 0.01 * |
| DistilBERT | Thunderbird | Firefox | 0.46 ± 0.02 * | 0.64 ± 0.01 * |
| DistilBERT | Thunderbird | Bugzilla | 0.38 ± 0.03 * | 0.57 ± 0.02 * |
| DistilBERT | Bugzilla | Core | 0.36 ± 0.03 * | 0.55 ± 0.02 * |
| DistilBERT | Bugzilla | Firefox | 0.37 ± 0.03 * | 0.56 ± 0.02 * |
| DistilBERT | Bugzilla | Thunderbird | 0.35 ± 0.03 * | 0.54 ± 0.02 * |
| Average (DistilBERT) | | | 0.42 ± 0.03 * | 0.61 ± 0.02 * |
| TinyBERT | Core | Firefox | 0.45 ± 0.03 | 0.63 ± 0.02 |
| TinyBERT | Core | Thunderbird | 0.42 ± 0.03 | 0.60 ± 0.02 |
| TinyBERT | Core | Bugzilla | 0.36 ± 0.03 | 0.55 ± 0.02 |
| TinyBERT | Firefox | Core | 0.47 ± 0.02 | 0.65 ± 0.01 |
| TinyBERT | Firefox | Thunderbird | 0.44 ± 0.03 | 0.62 ± 0.02 |
| TinyBERT | Firefox | Bugzilla | 0.38 ± 0.03 | 0.57 ± 0.02 |
| TinyBERT | Thunderbird | Core | 0.41 ± 0.02 | 0.59 ± 0.01 |
| TinyBERT | Thunderbird | Firefox | 0.43 ± 0.02 | 0.61 ± 0.01 |
| TinyBERT | Thunderbird | Bugzilla | 0.35 ± 0.03 | 0.54 ± 0.02 |
| TinyBERT | Bugzilla | Core | 0.33 ± 0.03 | 0.52 ± 0.02 |
| TinyBERT | Bugzilla | Firefox | 0.34 ± 0.03 | 0.53 ± 0.02 |
| TinyBERT | Bugzilla | Thunderbird | 0.32 ± 0.03 | 0.51 ± 0.02 |
| Average (TinyBERT) | | | 0.39 ± 0.03 | 0.58 ± 0.02 |
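
Each cell in the table above has the form mean ± spread. Assuming the spread is the sample standard deviation over the five random seeds listed in the training configuration (an assumption; the aggregation is not described in this section), a cell could be produced as follows. The per-seed scores are hypothetical and serve only to illustrate the format.

```python
from statistics import mean, stdev

SEEDS = [42, 52, 62, 72, 82]  # from the hyperparameter table

# Hypothetical Macro-F1 scores, one per seed, for one transfer setting.
# These numbers are illustrative only and are not taken from the paper.
macro_f1_by_seed = {42: 0.47, 52: 0.50, 62: 0.46, 72: 0.49, 82: 0.48}

scores = [macro_f1_by_seed[seed] for seed in SEEDS]

# stdev() is the sample standard deviation (n - 1 in the denominator).
summary = f"{mean(scores):.2f} ± {stdev(scores):.2f}"
print(summary)  # 0.48 ± 0.02
```
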

| Bug Summary | True Severity | Predicted Severity | Observation |
|---|---|---|---|
| “Browser crashes when opening encrypted PDF” | Critical | Critical | Explicit failure cue |
| “UI alignment issue after plugin update” | Minor | Major | Ambiguous wording |

| Model | Training Projects | Target (Test) | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| DistilBERT | Core + Thunderbird + Bugzilla | Bugzilla | 0.56 ± 0.02 * | 0.72 ± 0.01 * |
| DistilBERT | Firefox + Thunderbird + Bugzilla | Firefox | 0.58 ± 0.02 * | 0.74 ± 0.01 * |
| DistilBERT | Core + Firefox + Bugzilla | Core | 0.54 ± 0.02 * | 0.70 ± 0.01 * |
| DistilBERT | Core + Firefox + Thunderbird | Thunderbird | 0.49 ± 0.03 * | 0.66 ± 0.02 * |
| Average (DistilBERT) | | | 0.54 ± 0.02 * | 0.71 ± 0.01 * |
| TinyBERT | Core + Thunderbird + Bugzilla | Bugzilla | 0.53 ± 0.02 | 0.69 ± 0.01 |
| TinyBERT | Firefox + Thunderbird + Bugzilla | Firefox | 0.55 ± 0.02 | 0.71 ± 0.01 |
| TinyBERT | Core + Firefox + Bugzilla | Core | 0.51 ± 0.02 | 0.67 ± 0.01 |
| TinyBERT | Core + Firefox + Thunderbird | Thunderbird | 0.46 ± 0.03 | 0.63 ± 0.02 |
| Average (TinyBERT) | | | 0.51 ± 0.02 | 0.68 ± 0.01 |

| Model | Training Setting | Domain Adaptation | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| DistilBERT | Single-source | No | 0.45 ± 0.02 | 0.63 ± 0.02 |
| DistilBERT | Multi-source | No | 0.52 ± 0.02 | 0.69 ± 0.01 |
| DistilBERT | Multi-source | MMD | 0.56 ± 0.02 * | 0.72 ± 0.01 * |
| Average (DistilBERT) | | | 0.51 ± 0.02 | 0.68 ± 0.01 |
| TinyBERT | Single-source | No | 0.42 ± 0.02 | 0.60 ± 0.02 |
| TinyBERT | Multi-source | No | 0.49 ± 0.02 | 0.66 ± 0.01 |
| TinyBERT | Multi-source | MMD | 0.53 ± 0.02 * | 0.69 ± 0.01 * |
| Average (TinyBERT) | | | 0.48 ± 0.02 | 0.65 ± 0.01 |

| Model | Parameters | Model Size | Avg Inference Time (ms/Sample) | Memory Usage |
|---|---|---|---|---|
| BERT-base | 110 M | ~420 MB | 22 ms | High |
| DistilBERT | 66 M | ~255 MB | 12 ms | Medium |
| TinyBERT | 14 M | ~55 MB | 6 ms | Low |
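
The model sizes in the deployment table are close to what the parameter counts imply if every parameter is stored as a 32-bit float. The sketch below reproduces that arithmetic and derives latency ratios relative to BERT-base; the fp32 storage format is an assumption, and the parameter counts and latencies are copied from the table.

```python
BYTES_PER_FP32 = 4
MIB = 1024 ** 2

# Parameter counts and latencies copied from the deployment table.
MODELS = {
    "BERT-base": {"params": 110e6, "latency_ms": 22},
    "DistilBERT": {"params": 66e6, "latency_ms": 12},
    "TinyBERT": {"params": 14e6, "latency_ms": 6},
}


def fp32_size_mib(num_params):
    """Approximate checkpoint size if all parameters are 32-bit floats."""
    return num_params * BYTES_PER_FP32 / MIB


baseline = MODELS["BERT-base"]
for name, spec in MODELS.items():
    size = fp32_size_mib(spec["params"])
    speedup = baseline["latency_ms"] / spec["latency_ms"]
    print(f"{name:<11} ~{size:.0f} MiB, {speedup:.1f}x vs. BERT-base latency")
```

Under this assumption the estimates fall within a few MiB of the reported sizes, and TinyBERT is roughly 3.7 times faster per sample than BERT-base at about one eighth of the parameters.
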

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhu, L.; Wiangsamut, S.; Polpinij, J. Integrating Lightweight Transformers for Cross-Project Bug Severity Classification: An Applied AI Approach in Software Engineering. Appl. Sci. 2026, 16, 6026. https://doi.org/10.3390/app16126026

