Journal Description
Software is an international, peer-reviewed, open access journal on all aspects of software engineering, published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 28.9 days after submission; accepted papers are published 4.2 days after acceptance (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: APC discount vouchers, optional signed peer review, and reviewer names published annually in the journal.
- Software is a companion journal of Electronics.
Latest Articles
Enabling Progressive Server-Side Rendering for Traditional Web Template Engines with Java Virtual Threads
Software 2025, 4(3), 20; https://doi.org/10.3390/software4030020 - 13 Aug 2025
Abstract
Modern web applications increasingly demand rendering techniques that optimize performance, responsiveness, and scalability. Progressive Server-Side Rendering (PSSR) bridges the gap between Server-Side Rendering and Client-Side Rendering by progressively streaming HTML content, improving perceived load times. Still, traditional HTML template engines often rely on blocking interfaces that hinder their use in asynchronous, non-blocking contexts required for PSSR. This paper analyzes how Java virtual threads, introduced in Java 21, enable non-blocking execution of blocking I/O operations, allowing the reuse of traditional template engines for PSSR without complex asynchronous programming models. We benchmark multiple engines across Spring WebFlux, Spring MVC, and Quarkus using reactive, suspendable, and virtual thread-based approaches. Results show that virtual threads allow blocking engines to scale comparably to those designed for non-blocking I/O, achieving high throughput and responsiveness under load. This demonstrates that virtual threads provide a compelling path to simplify the implementation of PSSR with familiar HTML templates, significantly lowering the barrier to entry while maintaining performance.
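The paper's experiments use Java virtual threads, but the core PSSR idea is language-agnostic: stream HTML fragments as each piece of (possibly blocking) I/O completes, instead of buffering the whole page. A minimal sketch in Python, with hypothetical names:

```python
import time
from typing import Iterable, Iterator

def render_progressively(items: Iterable[str]) -> Iterator[str]:
    """Yield HTML fragments as soon as each data item is available,
    instead of buffering the whole page (the essence of PSSR)."""
    yield "<html><body><ul>"
    for item in items:          # each item may come from blocking I/O
        yield f"<li>{item}</li>"
    yield "</ul></body></html>"

def slow_source() -> Iterator[str]:
    """Stand-in for a template engine's blocking data source."""
    for name in ["alpha", "beta", "gamma"]:
        time.sleep(0.01)        # simulates a blocking database call
        yield name

chunks = list(render_progressively(slow_source()))
page = "".join(chunks)
```

In the paper's setting, each such blocking step runs on a cheap virtual thread, so the server can hold many in-flight renders without an asynchronous programming model.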
Full article
(This article belongs to the Topic Software Engineering and Applications)
Open Access Article
Research and Development of Test Automation Maturity Model Building and Assessment Methods for E2E Testing
by
Daiju Kato, Ayane Mogi, Hiroshi Ishikawa and Yasufumi Takama
Software 2025, 4(3), 19; https://doi.org/10.3390/software4030019 - 5 Aug 2025
Abstract
Background: While several test-automation maturity models (e.g., CMMI, TMMi, TAIM) exist, none explicitly integrate ISO 9001-based quality management systems (QMS), leaving a gap for organizations that must align E2E test automation with formal quality assurance. Objective: This study proposes a test-automation maturity model (TAMM) that bridges E2E automation capability with ISO 9001/ISO 9004 self-assessment principles, and evaluates its reliability and practical impact in industry. Methods: TAMM comprises eight maturity dimensions, 39 requirements, and 429 checklist items. Three independent assessors applied the checklist to three software teams; inter-rater reliability was ensured via consensus review (Cohen’s κ = 0.75). Short-term remediation actions based on the checklist were implemented over six months and re-assessed. Synergy with the organization’s ISO 9001 QMS was analyzed using ISO 9004 self-check scores. Results: Within six months of remediation, the mean TAMM score rose from 2.75 to 2.85. Conclusions: The proposed TAMM delivers measurable, short-term maturity gains and complements ISO 9001-based QMS without introducing conflicting processes. Practitioners can use the checklist to identify actionable gaps, prioritize remediation, and quantify progress, while researchers may extend TAMM to other domains or automate scoring via repository mining.
Full article
(This article belongs to the Special Issue Software Reliability, Security and Quality Assurance)
Open Access Article
Intersectional Software Engineering as a Field
by
Alicia Julia Wilson Takaoka, Claudia Maria Cutrupi and Letizia Jaccheri
Software 2025, 4(3), 18; https://doi.org/10.3390/software4030018 - 30 Jul 2025
Abstract
Intersectionality is a concept used to explain the power dynamics and inequalities that some groups experience owing to the interconnection of social differences such as gender, sexual identity, poverty status, race, geographic location, disability, and education. The relation between software engineering, feminism, and intersectionality has been addressed by some studies thus far, but it has never been codified before. In this paper, we employ the commonly used ABC Framework for empirical software engineering to show the contributions of intersectional software engineering (ISE) as a field of software engineering. In addition, we highlight the power dynamics unique to ISE studies and define gender-forward intersectionality as a way to use gender as a starting point to identify and examine inequalities and discrimination. We show that ISE is a field of study in software engineering that uses gender-forward intersectionality to produce knowledge about power dynamics in software engineering in its specific domains and environments. Employing empirical software engineering research strategies, we explain the importance of recognizing and evaluating ISE through four dimensions of dynamics: people, processes, products, and policies. Beginning with a set of 10 seminal papers that enable us to define the initial concepts and the query, we conduct a systematic mapping study that yields a dataset of 140 primary papers, of which 15 are chosen as example papers. We apply the principles of ISE to these example papers to show how the field functions. Finally, we conclude the paper by advocating the recognition of ISE as a specialized field of study in software engineering.
Full article
(This article belongs to the Special Issue Women’s Special Issue Series: Software)
Open Access Article
Investigating Reproducibility Challenges in LLM Bugfixing on the HumanEvalFix Benchmark
by
Balázs Szalontai, Balázs Márton, Balázs Pintér and Tibor Gregorics
Software 2025, 4(3), 17; https://doi.org/10.3390/software4030017 - 14 Jul 2025
Abstract
Benchmark results for large language models often show inconsistencies across different studies. This paper investigates the challenges of reproducing these results in automatic bugfixing using LLMs, on the HumanEvalFix benchmark. To determine the cause of the differing results in the literature, we attempted to reproduce a subset of them by evaluating 12 models in the DeepSeekCoder, CodeGemma, CodeLlama, and WizardCoder model families, in different sizes and tunings. A total of 35 unique results were reported for these models across studies, of which we successfully reproduced 12. We identified several relevant factors that influenced the results. The base models can be confused with their instruction-tuned variants, making their results better than expected. Incorrect prompt templates or generation length can decrease benchmark performance, as well as using 4-bit quantization. Using sampling instead of greedy decoding can increase the variance, especially with higher temperature values. We found that precision and 8-bit quantization have less influence on benchmark results.
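One of the reproducibility factors named above, sampling temperature, can be illustrated directly: raising the temperature flattens the next-token distribution, so sampled outputs vary more between runs, while greedy decoding stays deterministic. A minimal sketch (hypothetical logits, not from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # hypothetical next-token scores
greedy = logits.index(max(logits))       # greedy decoding: always token 0
p_low  = softmax(logits, temperature=0.2)   # sharp: behaves almost greedily
p_high = softmax(logits, temperature=1.5)   # flat: sampling varies more
```

Because `p_high` spreads probability mass more evenly than `p_low`, repeated sampling at high temperature produces more run-to-run variance on a benchmark, which is one source of the differing published numbers.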
Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
Open Access Editorial
New Editor-in-Chief of Software
by
Mirko Viroli
Software 2025, 4(3), 16; https://doi.org/10.3390/software4030016 - 10 Jul 2025
Abstract
I would like to introduce myself as the new Editor-in-Chief of Software [...]
Full article
Open Access Article
Analysing Concurrent Queues Using CSP: Examining Java’s ConcurrentLinkedQueue
by
Kevin Chalmers and Jan Bækgaard Pedersen
Software 2025, 4(3), 15; https://doi.org/10.3390/software4030015 - 7 Jul 2025
Abstract
In this paper we examine the OpenJDK library implementation of the ConcurrentLinkedQueue. We use model checking to verify that it behaves according to the algorithm it is based on: Michael and Scott’s fast and practical non-blocking concurrent queue algorithm. In addition, we develop a simple concurrent queue specification in CSP and verify that Michael and Scott’s algorithm satisfies it. We conclude that both the algorithm and the implementation are correct and both conform to our simpler concurrent queue specification, which we can use in place of either implementation in future verification tasks. The complete code is available on GitHub.
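The paper verifies the implementation against a simple sequential queue specification in CSP. A rough analogue of that idea, not the CSP model itself, is a trace checker that replays a history of operations against an idealized FIFO queue:

```python
from collections import deque

def conforms_to_fifo(trace):
    """Check a sequence of ('enq', v) / ('deq', v) events against the
    sequential FIFO queue specification: every dequeue must return the
    oldest element enqueued so far."""
    model = deque()
    for op, value in trace:
        if op == "enq":
            model.append(value)
        elif op == "deq":
            if not model or model.popleft() != value:
                return False            # observed behavior violates FIFO
    return True

ok  = conforms_to_fifo([("enq", 1), ("enq", 2), ("deq", 1), ("deq", 2)])
bad = conforms_to_fifo([("enq", 1), ("enq", 2), ("deq", 2)])  # out of order
```

Model checking a concurrent queue, as the paper does with CSP, amounts to showing every interleaving the implementation permits produces only traces this simple specification accepts.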
Full article
Open Access Review
Machine Learning Techniques for Requirements Engineering: A Comprehensive Literature Review
by
António Miguel Rosado da Cruz and Estrela Ferreira Cruz
Software 2025, 4(3), 14; https://doi.org/10.3390/software4030014 - 28 Jun 2025
Abstract
Software requirements engineering is one of the most critical and time-consuming phases of the software-development process. The lack of communication with stakeholders and the use of natural language for communicating lead to misunderstood, misidentified, or ambiguous requirements, which can jeopardize all subsequent steps in the software-development process and compromise the quality of the final software product. Natural Language Processing (NLP) is a long-established area of research that is currently being strongly and positively reshaped by recent advances in Machine Learning (ML), namely the emergence of Deep Learning and, more recently, the so-called transformer models such as BERT and GPT. Software requirements engineering is likewise strongly affected by this evolution of ML and other areas of Artificial Intelligence (AI). In this article we conduct a systematic review of how AI, ML and NLP are being used in the various stages of requirements engineering, including requirements elicitation, specification, classification, prioritization, management, and traceability. Furthermore, we identify which algorithms are most used in each of these stages, uncover challenges and open problems, and suggest future research directions.
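As a toy illustration of one task the review covers, requirements classification, here is a deliberately naive keyword-based classifier. The lexicon and rule are invented for illustration; the surveyed approaches use trained ML/NLP models such as BERT:

```python
# Hypothetical lexicon of non-functional-requirement cue words.
NFR_KEYWORDS = {"performance", "secure", "security", "available",
                "usability", "scalable", "response"}

def classify_requirement(text: str) -> str:
    """Toy functional vs. non-functional requirement classifier:
    flag the requirement as non-functional if any cue word appears."""
    words = {w.strip(".,").lower() for w in text.split()}
    return "non-functional" if words & NFR_KEYWORDS else "functional"

r1 = classify_requirement(
    "The system shall respond within 2 seconds to meet performance goals.")
r2 = classify_requirement(
    "The user shall be able to export reports as PDF.")
```

The gap between this brittle heuristic and the ambiguity of real stakeholder language is precisely why the field has moved to the deep learning and transformer approaches the review surveys.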
Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
Open Access Article
Characterizing Agile Software Development: Insights from a Data-Driven Approach Using Large-Scale Public Repositories
by
Carlos Moreno Martínez, Jesús Gallego Carracedo and Jaime Sánchez Gallego
Software 2025, 4(2), 13; https://doi.org/10.3390/software4020013 - 24 May 2025
Abstract
This study investigates the prevalence and impact of Agile practices by leveraging metadata from thousands of public GitHub repositories through a novel data-driven methodology. To facilitate this analysis, we developed the AgileScore index, a metric designed to identify and evaluate patterns, characteristics, performance and community engagement in Agile-oriented projects. This approach enables comprehensive, large-scale comparisons between Agile methodologies and traditional development practices within digital environments. Our findings reveal a significant annual growth of 16% in the adoption of Agile practices and validate the AgileScore index as a systematic tool for assessing Agile methodologies across diverse development contexts. Furthermore, this study introduces innovative analytical tools for researchers in software project management, software engineering and related fields, providing a foundation for future work in areas such as cost estimation and hybrid project management. These insights contribute to a deeper understanding of Agile’s role in fostering collaboration and adaptability in dynamic digital ecosystems.
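The abstract does not give the AgileScore formula; purely to illustrate the shape of such an index, here is a hypothetical weighted sum over normalized repository signals (the feature names and weights are invented, not the paper's):

```python
# Hypothetical feature weights -- NOT the paper's actual AgileScore
# definition, only an illustration of a weighted repository-metadata index.
WEIGHTS = {
    "has_sprint_milestones": 0.30,
    "issue_label_coverage":  0.25,
    "ci_configured":         0.20,
    "release_cadence":       0.25,
}

def agile_score(repo_metadata: dict) -> float:
    """Weighted sum of normalized Agile signals; each input in [0, 1],
    so the score is also in [0, 1]."""
    return sum(WEIGHTS[k] * float(repo_metadata.get(k, 0.0)) for k in WEIGHTS)

score = agile_score({"has_sprint_milestones": 1, "issue_label_coverage": 0.8,
                     "ci_configured": 1, "release_cadence": 0.5})
```

Any real index of this kind also needs the weights justified and validated against labeled projects, which is the methodological contribution the study claims.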
Full article
Open Access Article
AI Testing for Intelligent Chatbots—A Case Study
by
Jerry Gao, Radhika Agarwal and Prerna Garsole
Software 2025, 4(2), 12; https://doi.org/10.3390/software4020012 - 15 May 2025
Cited by 1
Abstract
The decision tree test method works as a flowchart structure for conversational flow, with predetermined questions and answers that guide the user through specific tasks. Inspired by principles of the decision tree test method in software engineering, this paper discusses intelligent AI test modeling chat systems, including basic concepts, quality validation, test generation and augmentation, testing scopes, approaches, and needs. The paper’s novelty lies in an intelligent AI test modeling chatbot system built and implemented based on an innovative 3-dimensional AI test model for AI-powered functions in intelligent mobile apps, supporting model-based AI function testing, test data generation, and adequate test coverage result analysis. A case study is provided using Wysa, a mental health and emotional intelligence chatbot system that helps track and analyze mood and supports sentiment analysis.
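In decision-tree-based chatbot testing, each root-to-leaf path through the scripted flow is one test case, so full path coverage means exercising every leaf. A minimal sketch with a hypothetical miniature flow (not Wysa's actual script):

```python
def leaf_paths(tree, path=()):
    """Enumerate every root-to-leaf conversation path; each path is one
    test case for the chatbot's scripted decision-tree flow."""
    if isinstance(tree, str):              # leaf: a final bot response
        return [path + (tree,)]
    paths = []
    for label, subtree in tree.items():    # label: question or user answer
        paths.extend(leaf_paths(subtree, path + (label,)))
    return paths

# Hypothetical miniature mood-tracking flow for illustration only
flow = {
    "How do you feel?": {
        "good": "Great! Want to log this mood?",
        "bad": {
            "Want a breathing exercise?": {
                "yes": "Starting exercise.",
                "no": "Okay, I'm here if you need me.",
            }
        },
    }
}
cases = leaf_paths(flow)                   # three paths -> three test cases
```

A test suite built this way makes coverage measurable: the number of exercised paths over the number returned by `leaf_paths`.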
Full article
Open Access Article
Improving the Fast Fourier Transform for Space and Edge Computing Applications with an Efficient In-Place Method
by
Christoforos Vasilakis, Alexandros Tsagkaropoulos, Ioannis Koutoulas and Dionysios Reisis
Software 2025, 4(2), 11; https://doi.org/10.3390/software4020011 - 12 May 2025
Abstract
Satellite and edge computing designers develop algorithms that restrict resource utilization and execution time. Among these design efforts, optimizing the Fast Fourier Transform (FFT), key to many tasks, has led mainly to in-place FFT-specific hardware accelerators. Aiming at improving FFT performance on processors and computing devices with limited resources, the current paper enhances the efficiency of the radix-2 FFT by exploring the benefits of an in-place technique. First, we present the advantages of organizing the single memory bank of processors to store two (2) FFT elements in each memory address, providing parallel load and store of each FFT data pair. Second, we optimize the floating point (FP) and block floating point (BFP) configurations to improve the FFT Signal-to-Noise Ratio (SNR) performance and the resource utilization. The resulting techniques halve the memory requirements and significantly improve the execution time for the prevailing BFP representation. Executing inputs ranging from 1K to 16K FFT points, using 8-bit or 16-bit FP or BFP numbers, on the space-proven Atmel AVR32, the Vision Processing Unit (VPU) Intel Movidius Myriad 2, the edge device Raspberry Pi Zero 2W, and a low-cost accelerator on a Xilinx Zynq 7000 Field Programmable Gate Array (FPGA) validates the method’s performance improvement.
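For reference, the algorithm being optimized is the classic iterative in-place radix-2 FFT, which overwrites its input array and allocates no output buffer; the paper's memory-packing and BFP refinements build on this structure and are not reproduced here. A textbook sketch:

```python
import cmath

def fft_inplace(a):
    """Iterative in-place radix-2 decimation-in-time FFT.
    Length must be a power of two; the input list is overwritten."""
    n = len(a)
    # Bit-reversal permutation: reorder input for in-place butterflies
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages: combine pairs, then quads, up to the full length
    length = 2
    while length <= n:
        w_len = cmath.exp(-2j * cmath.pi / length)   # twiddle factor step
        for start in range(0, n, length):
            w = 1 + 0j
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w
                a[k], a[k + length // 2] = u + v, u - v
                w *= w_len
        length <<= 1
    return a

x = [1, 0, 0, 0]     # unit impulse: its FFT is all ones
fft_inplace(x)
```

Because every butterfly reads and writes the same two array slots, the working set is exactly the input array, which is what makes the paper's two-elements-per-address packing pay off.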
Full article
Open Access Article
Enhancing DevOps Practices in the IoT–Edge–Cloud Continuum: Architecture, Integration, and Software Orchestration Demonstrated in the COGNIFOG Framework
by
Kostas Petrakis, Evangelos Agorogiannis, Grigorios Antonopoulos, Themistoklis Anagnostopoulos, Nasos Grigoropoulos, Eleni Veroni, Alexandre Berne, Selma Azaiez, Zakaria Benomar, Harry Kakoulidis, Marios Prasinos, Philippos Sotiriades, Panagiotis Mavrothalassitis and Kosmas Alexopoulos
Software 2025, 4(2), 10; https://doi.org/10.3390/software4020010 - 15 Apr 2025
Cited by 1
Abstract
This paper presents COGNIFOG, an innovative framework under development that is designed to leverage decentralized decision-making, machine learning, and distributed computing to enable autonomous operation, adaptability, and scalability across the IoT–edge–cloud continuum. The work emphasizes Continuous Integration/Continuous Deployment (CI/CD) practices, development, and versatile integration infrastructures. The described methodology ensures efficient, reliable, and seamless integration of the framework, offering valuable insights into integration design, data flow, and the incorporation of cutting-edge technologies. Through three real-world trials in smart cities, e-health, and smart manufacturing and the development of a comprehensive QuickStart Guide for deployment, this work highlights the efficiency and adaptability of the COGNIFOG platform, presenting a robust solution for addressing the complexities of next-generation computing environments.
Full article
Open Access Article
Regression Testing in Agile—A Systematic Mapping Study
by
Suddhasvatta Das and Kevin Gary
Software 2025, 4(2), 9; https://doi.org/10.3390/software4020009 - 14 Apr 2025
Abstract
Background: Regression testing is critical in agile software development, as it ensures that frequent changes do not introduce defects into previously working functionalities. While agile methodologies emphasize rapid iterations and value delivery, regression testing research has predominantly focused on optimizing technical efficiency rather than aligning with agile principles. Aim: This study aims to systematically map research trends and gaps in regression testing within agile environments, identifying areas that require further exploration to enhance alignment with agile practices and value-driven outcomes. Method: A systematic mapping study analyzed 35 primary studies. The research categorized studies based on their focus areas, evaluation metrics, agile frameworks, and methodologies, providing a comprehensive overview of the field. Results: The findings strongly emphasize test prioritization and selection, reflecting the need for optimized fault detection and execution efficiency in agile workflows. However, areas such as test generation, test minimization, and cost analysis are under-explored. Current evaluation metrics primarily address technical outcomes, neglecting agile-specific aspects like defect severity’s business impact and iterative workflows. Additionally, the research highlights the dominance of continuous integration frameworks, with limited attention to other agile practices like Scrum and a lack of datasets capturing agile-specific attributes such as testing costs and user story importance. Conclusions: This study underscores the need for research to expand beyond existing focus areas, exploring diverse testing techniques and developing agile-centric metrics and datasets. By addressing these gaps, future work can enhance the applicability of regression testing strategies and align them more closely with agile development principles.
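Test prioritization, the dominant focus the mapping study identifies, can be sketched with a common baseline heuristic (not taken from any specific primary study): order tests by how many historical faults each has detected, so likely fault-revealing tests run first in a short agile iteration:

```python
def prioritize(tests, fault_history):
    """Greedy regression-test prioritization: order tests by the number
    of historical faults each one detected (a common baseline heuristic)."""
    return sorted(tests,
                  key=lambda t: len(fault_history.get(t, ())),
                  reverse=True)

# Hypothetical fault-detection history per test
history = {"t_login": {"F1", "F3"}, "t_search": {"F2"}, "t_export": set()}
order = prioritize(["t_export", "t_search", "t_login"], history)
```

The study's point is that metrics for judging such orderings (e.g., fault-detection rate) are technical; agile-centric variants would also weight defect severity's business impact and user-story importance.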
Full article
Open Access Article
Uplifting Moods: Augmented Reality-Based Gamified Mood Intervention App with Attention Bias Modification
by
Yun Jung Yeh, Sarah S. Jo and Youngjun Cho
Software 2025, 4(2), 8; https://doi.org/10.3390/software4020008 - 1 Apr 2025
Cited by 1
Abstract
Attention Bias Modification (ABM) is a cost-effective mood intervention that has the potential to be used in daily settings beyond clinical environments. However, its interactivity and user engagement are known to be limited and underexplored. Here, we propose Uplifting Moods, a novel mood intervention app that combines gamified ABM and augmented reality (AR) to address the limitation associated with the repetitive nature of ABM. By harnessing mobile AR’s low-cost, portable, and accessible characteristics, this approach helps users easily take part in ABM, positively shifting their emotions. We conducted a mixed methods study with 24 participants, involving a controlled experiment with the Self-Assessment Manikin as its primary measure and a semi-structured interview. Our analysis reports that the approach uniquely adds fun, exploration, and challenge, helping participants feel more engaged, more cheerful, and less controlled. It also highlights the importance of personalization and of considering gaming style, music preference, and socialization in designing a daily AR ABM game as an effective mental wellbeing intervention.
Full article
Open Access Article
Empirical Analysis of Data Sampling-Based Decision Forest Classifiers for Software Defect Prediction
by
Fatima Enehezei Usman-Hamza, Abdullateef Oluwagbemiga Balogun, Hussaini Mamman, Luiz Fernando Capretz, Shuib Basri, Rafiat Ajibade Oyekunle, Hammed Adeleye Mojeed and Abimbola Ganiyat Akintola
Software 2025, 4(2), 7; https://doi.org/10.3390/software4020007 - 21 Mar 2025
Abstract
The strategic significance of software testing in ensuring the success of software development projects is paramount. Comprehensive testing, conducted early and consistently across the development lifecycle, is vital for mitigating defects, especially given the constraints on time, budget, and other resources often faced by development teams. Software defect prediction (SDP) serves as a proactive approach to identifying software components that are most likely to be defective. By predicting these high-risk modules, teams can prioritize thorough testing and inspection, thereby preventing defects from escalating to later stages where resolution becomes more resource intensive. SDP models must be continuously refined to improve predictive accuracy and performance. This involves integrating clean and preprocessed datasets, leveraging advanced machine learning (ML) methods, and optimizing key metrics. Statistical-based and traditional ML approaches have been widely explored for SDP. However, statistical-based models often struggle with scalability and robustness, while conventional ML models face challenges with imbalanced datasets, limiting their prediction efficacy. In this study, innovative decision forest (DF) models were developed to address these limitations. Specifically, this study evaluates the cost-sensitive forest (CS-Forest), forest penalizing attributes (FPA), and functional trees (FT) as DF models. These models were further enhanced using homogeneous ensemble techniques, such as bagging and boosting. The experimental analysis on benchmark SDP datasets demonstrates that the proposed DF models effectively handle class imbalance, accurately distinguishing between defective and non-defective modules. Compared to baseline and state-of-the-art ML and deep learning (DL) methods, the suggested DF models exhibit superior prediction performance and offer scalable solutions for SDP. Consequently, the application of DF-based models is recommended for advancing defect prediction in software engineering and similar ML domains.
Full article
Open Access Review
Designing Microservices Using AI: A Systematic Literature Review
by
Daniel Narváez, Nicolas Battaglia, Alejandro Fernández and Gustavo Rossi
Software 2025, 4(1), 6; https://doi.org/10.3390/software4010006 - 19 Mar 2025
Cited by 2
Abstract
Microservices architecture has emerged as a dominant approach for developing scalable and modular software systems, driven by the need for agility and independent deployability. However, designing these architectures poses significant challenges, particularly in service decomposition, inter-service communication, and maintaining data consistency. To address these issues, artificial intelligence (AI) techniques, such as machine learning (ML) and natural language processing (NLP), have been applied with increasing frequency to automate and enhance the design process. This systematic literature review examines the application of AI in microservices design, focusing on AI-driven tools and methods for improving service decomposition, decision-making, and architectural validation. This review analyzes research studies published between 2018 and 2024 that specifically focus on the application of AI techniques in microservices design, identifying key AI methods used, challenges encountered in integrating AI into microservices, and the emerging trends in this research area. The findings reveal that AI has effectively been used to optimize performance, automate design tasks, and mitigate some of the complexities inherent in microservices architectures. However, gaps remain in areas such as distributed transactions and security. The study concludes that while AI offers promising solutions, further empirical research is needed to refine AI’s role in microservices design and address the remaining challenges.
Full article
Open Access Article
A Systematic Approach for Assessing Large Language Models’ Test Case Generation Capability
by
Hung-Fu Chang and Mohammad Shokrolah Shirazi
Software 2025, 4(1), 5; https://doi.org/10.3390/software4010005 - 10 Mar 2025
Abstract
Software testing ensures the quality and reliability of software products, but manual test case creation is labor-intensive. With the rise of Large Language Models (LLMs), there is growing interest in unit test creation with LLMs. However, effective assessment of LLM-generated test cases is limited by the lack of standardized benchmarks that comprehensively cover diverse programming scenarios. To assess LLMs’ test case generation ability in the absence of a suitable evaluation dataset, we propose the Generated Benchmark from Control-Flow Structure and Variable Usage Composition (GBCV) approach, which systematically generates programs for evaluating LLMs’ test generation capabilities. By leveraging basic control-flow structures and variable usage, GBCV provides a flexible framework to create a spectrum of programs ranging from simple to complex. Because GPT-4o and GPT-3.5-Turbo are publicly accessible and representative of typical real-world usage, we use GBCV to assess their performance. Our findings indicate that GPT-4o performs better on composite program structures, while all models effectively detect boundary values in simple conditions but face challenges with arithmetic computations. This study highlights the strengths and limitations of LLMs in test generation, provides a benchmark framework, and suggests directions for future improvement.
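The flavor of GBCV-style generation can be sketched as composing a control-flow template into a program under test, together with the boundary-value inputs a good generated test suite should cover. The template and helper names here are invented for illustration; the paper's actual generator is far richer:

```python
# Simplified, hypothetical sketch of template-based program generation;
# GBCV composes many control-flow structures and variable usages.
IF_TEMPLATE = (
    "def f(x):\n"
    "    if x {op} {bound}:\n"
    "        return 'high'\n"
    "    return 'low'\n"
)

def generate_program(op: str, bound: int) -> str:
    """Instantiate a single-branch program from the template."""
    return IF_TEMPLATE.format(op=op, bound=bound)

def boundary_inputs(bound: int):
    """Boundary-value inputs a generated test suite should exercise."""
    return [bound - 1, bound, bound + 1]

source = generate_program(">=", 10)
namespace = {}
exec(source, namespace)            # compile the generated program
f = namespace["f"]
results = [f(x) for x in boundary_inputs(10)]   # behavior at the boundary
```

Because the generator knows the ground-truth structure, it can check whether an LLM's generated tests actually hit the boundary cases, which is the evaluation lever the paper exploits.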
Full article
Open Access Article
On the Execution and Runtime Verification of UML Activity Diagrams
by
François Siewe and Guy Merlin Ngounou
Software 2025, 4(1), 4; https://doi.org/10.3390/software4010004 - 27 Feb 2025
Abstract
The unified modelling language (UML) is an industrial de facto standard for system modelling. It consists of a set of graphical notations (also known as diagrams) and has been used widely in many industrial applications. Although the graphical nature of UML is appealing to system developers, the official documentation of UML does not provide formal semantics for UML diagrams. This makes UML unsuitable for formal verification and, therefore, limited when it comes to the development of safety/security-critical systems where faults can cause damage to people, properties, or the environment. The UML activity diagram is an important UML graphical notation, which is effective in modelling the dynamic aspects of a system. This paper proposes a formal semantics for UML activity diagrams based on the calculus of context-aware ambients (CCA). An algorithm (semantic function) is proposed that maps any activity diagram onto a process in CCA, which describes the behaviours of the UML activity diagram. This process can then be executed and formally verified using the CCA simulation tool ccaPL and the CCA runtime verification tool ccaRV. Hence, design flaws can be detected and fixed early during the system development lifecycle. The pragmatics of the proposed approach are demonstrated using a case study in e-commerce.
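The paper maps activity diagrams onto CCA processes; as a much simpler analogue of "executing" a diagram, one can walk a graph of activity nodes from the initial to the final node and record the trace of actions. This sketch handles only single-successor nodes and is not the paper's CCA semantics:

```python
def execute_activity(diagram, initial="start", final="end"):
    """Walk a simplified activity diagram (one outgoing edge per node)
    from the initial node to the final node, recording the action trace."""
    trace, node = [], initial
    while node != final:
        trace.append(node)
        node = diagram[node]           # follow the single outgoing edge
    trace.append(final)
    return trace

# Hypothetical e-commerce checkout flow: each node maps to its successor
diagram = {"start": "add_to_cart", "add_to_cart": "pay", "pay": "end"}
trace = execute_activity(diagram)
```

A full semantics must also handle decisions, forks/joins, and concurrency, which is exactly why the paper targets a process calculus with tool support (ccaPL, ccaRV) rather than an ad hoc interpreter.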
Full article
(This article belongs to the Topic Software Engineering and Applications)
Open Access Article
The Scalable Detection and Resolution of Data Clumps Using a Modular Pipeline with ChatGPT
by
Nils Baumgartner, Padma Iyenghar, Timo Schoemaker and Elke Pulvermüller
Software 2025, 4(1), 3; https://doi.org/10.3390/software4010003 - 2 Feb 2025
Abstract
This paper explores a modular pipeline architecture that integrates ChatGPT, a Large Language Model (LLM), to automate the detection and refactoring of data clumps, a prevalent type of code smell that complicates software maintainability. Data clumps are groups of variables that repeatedly appear together and should ideally be refactored into a single abstraction to improve code quality. The pipeline leverages ChatGPT's ability to understand context and generate structured outputs, making it suitable for complex refactoring tasks. Through systematic experimentation, our study addresses the research questions outlined and demonstrates that the pipeline can accurately identify data clumps, excelling particularly in cases that require semantic understanding, where localized clumps are embedded within larger codebases. While the solution significantly enhances the refactoring workflow and facilitates the management of clumps distributed across multiple files, it also presents challenges such as occasional compiler errors and high computational costs. Feedback from developers underscores the usefulness of LLMs in software development but also highlights the essential role of human oversight in correcting inaccuracies. These findings demonstrate the pipeline's potential as a scalable and efficient solution for addressing code smells, contributing to the broader goal of improving software maintainability in large-scale projects.
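The data-clump smell and its standard refactoring can be sketched briefly. All names, the postcode rule, and the extracted class below are invented for illustration; they are not examples from the paper, which targets the same kind of transformation via its LLM pipeline.

```python
from dataclasses import dataclass

# Before: a data clump -- the same three parameters travel together
# through otherwise unrelated functions (names invented for illustration).
def format_address(street, city, postcode):
    return f"{street}, {postcode} {city}"

def shipping_zone(street, city, postcode):
    # Hypothetical rule: postcodes starting with "28" are domestic.
    return "domestic" if postcode.startswith("28") else "international"

# After: the clump is extracted into a single type, the kind of
# refactoring an LLM-based pipeline would be asked to apply.
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postcode: str

def format_address_refactored(addr: Address) -> str:
    return f"{addr.street}, {addr.postcode} {addr.city}"
```

When such a clump is spread across many files, every call site must change consistently, which is why the abstract stresses both the value of automation and the need for human oversight.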
Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
Open Access Article
German Translation and Psychometric Analysis of the SOLID-SD: A German Inventory for Assessing Security Culture in Software Companies
by
Christina Glasauer, Hollie N. Pearl and Rainer W. Alexandrowicz
Software 2025, 4(1), 2; https://doi.org/10.3390/software4010002 - 24 Jan 2025
Abstract
The SOLID-S is an inventory assessing six dimensions of organizational (software) security culture, currently available in English. Here, we present the German version, SOLID-SD, along with its translation process and psychometric analysis. Using a partial credit model on a sample of N = 280 persons, we found overall highly satisfactory measurement properties for the instrument: no threshold permutations, no serious differential item functioning, and good item fits. The subscales' internal consistencies and the inter-scale correlations show very high similarity between the SOLID-SD and the original English version, indicating a successful translation of the instrument.
Full article
(This article belongs to the Special Issue Software Reliability, Security and Quality Assurance)
Open Access Article
A Common Language of Software Evolution in Repositories (CLOSER)
by
Jordan Garrity and David Cutting
Software 2025, 4(1), 1; https://doi.org/10.3390/software4010001 - 6 Jan 2025
Abstract
Version Control Systems (VCSs) are used by development teams to manage the collaborative evolution of source code, and several industry-standard VCSs are in wide use. In addition to the code files themselves, a VCS records metadata about each change, and these data are often fed to analytical tools to provide insight into software development, a process known as Mining Software Repositories (MSR). MSR tools are numerous but usually limited to one VCS format, which restricts their scope of application and adds the initial effort of implementing parsers for verbose textual VCS output. To address this limitation, a domain-specific language (DSL), the Common Language of Software Evolution in Repositories (CLOSER), was defined that abstracts away from specific implementations while mapping isomorphically to the data models of all major VCS formats. Used directly as a data model, or as an intermediate stage in a conversion pipeline, CLOSER makes all major repository formats available for analysis rather than a single one. The barrier to adopting MSR approaches is also lowered because CLOSER output is a concise, easily machine-readable format. CLOSER was implemented in tooling and tested against a number of common expected use cases, including direct use in MSR analysis, demonstrating the fidelity of the model and its implementation. CLOSER was also used successfully to convert raw output logs from one VCS format to another, suggesting that legacy analysis tools could be applied to other technologies without modification. In addition to opening all major VCS formats to analysis through a generic model, the CLOSER format was found to require less parsing code and to parse faster than traditional VCS logging output.
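The core idea of mapping a VCS-specific log into a VCS-agnostic record can be sketched as follows. The record fields, the regular expression, and the input format are invented for illustration; CLOSER's actual grammar and data model are defined in the paper.

```python
import re

# Hypothetical sketch of the CLOSER idea: parse one VCS-specific log line
# (here, a git one-line log entry) into a generic, VCS-agnostic record.
# The field names "revision" and "summary" are invented for illustration.
GIT_ONELINE = re.compile(r"^(?P<commit>[0-9a-f]{7,40}) (?P<message>.*)$")

def to_generic_record(line: str) -> dict:
    m = GIT_ONELINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return {"revision": m.group("commit"), "summary": m.group("message")}
```

An analysis tool written against the generic record never sees the git-specific format, so a parser for another VCS can feed the same tool, which is the single-format limitation the abstract addresses.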
Full article

Highly Accessed Articles
Latest Books
E-Mail Alert
News
Topics
Topic in
Algorithms, Applied Sciences, Electronics, MAKE, AI, Software
Applications of NLP, AI, and ML in Software Engineering
Topic Editors: Affan Yasin, Javed Ali Khan, Lijie Wen
Deadline: 31 August 2025
Topic in
Applied Sciences, Electronics, Informatics, Information, Software
Software Engineering and Applications
Topic Editors: Sanjay Misra, Robertas Damaševičius, Bharti Suri
Deadline: 31 October 2025
Topic in
Applied Sciences, ASI, Blockchains, Computers, MAKE, Software
Recent Advances in AI-Enhanced Software Engineering and Web Services
Topic Editors: Hai Wang, Zhe Hou
Deadline: 31 May 2026

Conferences
Special Issues
Special Issue in
Software
Software Reliability, Security and Quality Assurance
Guest Editors: Tadashi Dohi, Junjun Zheng, Xiao-Yi Zhang
Deadline: 25 December 2025
Special Issue in
Software
Women’s Special Issue Series: Software
Guest Editors: Tingting Bi, Xing Hu, Letizia Jaccheri
Deadline: 31 December 2025