AI-Augmented Software Engineering

A special issue of Mathematics (ISSN 2227-7390).

Deadline for manuscript submissions: 30 August 2024 | Viewed by 2282

Special Issue Editor


Prof. Dr. Hongji Yang
Guest Editor
School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
Interests: artificial creativity; creative computing

Special Issue Information

Dear Colleagues,

Advances in machine learning (ML) research for software engineering (SE) have brought significant improvements across many areas of the software development lifecycle. Automated code generation is a notable example: ML models trained on sizable code repositories generate code snippets that help developers produce software more quickly. ML techniques for bug detection and prediction enable the early identification of potential software flaws. ML-powered recommendation systems have also gained popularity for helping developers choose the most suitable libraries, frameworks, and design patterns for a project.
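To make the bug detection and prediction example above concrete, the following is a minimal sketch of a defect-prediction classifier trained on per-module code metrics; the metric names, data, and labels are hypothetical, and the snippet is illustrative only, not tied to any paper in this issue.

```python
# Minimal sketch of ML-based defect prediction (illustrative only);
# the per-module metrics and labels below are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical per-module metrics: [lines of code, cyclomatic complexity, recent churn]
X = np.array([
    [120, 8, 3], [450, 25, 17], [60, 3, 1], [900, 40, 30],
    [200, 12, 5], [75, 4, 2], [640, 33, 21], [310, 18, 9],
])
y = np.array([0, 1, 0, 1, 0, 0, 1, 1])  # 1 = a defect was later reported in the module

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("F1 on held-out modules:", f1_score(y_test, clf.predict(X_test)))
```

In practice, such a model would be trained on metrics mined from version control history and issue trackers at a much larger scale.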

As the symbiotic relationship between machine learning and software engineering continues to mature, these developments promise to accelerate software development, enhance code quality, and encourage innovation in SE.

Prof. Dr. Hongji Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (3 papers)


Research

37 pages, 6035 KiB  
Article
Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning
by Hui Guan, Hang Xu and Lie Cai
Mathematics 2024, 12(9), 1272; https://doi.org/10.3390/math12091272 - 23 Apr 2024
Viewed by 346
Abstract
To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the feature extraction process, this paper integrates part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. The particle swarm optimization algorithm is used to allocate weights to part-of-speech tags, which enhances the significance of crucial information in requirement texts. Secondly, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. The Low Correlation Algorithm and the Grid Search Algorithm are utilized in P-Stacking to automatically select the optimal combination of base models, which reduces manual intervention and improves prediction performance. The experimental results show that, compared with the method based on TF-IDF features, the highest F1 scores of a standalone machine learning model on the three datasets improved by 3.89%, 10.68%, and 21.4%, respectively, after integrating part-of-speech features and Word2Vec features. Compared with the method based on a standalone machine learning model, the improved stacking ensemble machine learning model improved F1 scores by 2.29%, 5.18%, and 7.47% in the testing and evaluation of the three datasets, respectively.
(This article belongs to the Special Issue AI-Augmented Software Engineering)
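For readers unfamiliar with stacking, the sketch below shows a generic stacked classifier over TF-IDF features of requirement-pair text. It is a minimal baseline under invented requirement pairs and labels, not the authors' P-Stacking model, which additionally fuses part-of-speech and Word2Vec features with particle-swarm-optimized weights and selects its base models automatically.

```python
# Minimal sketch: stacking ensemble for requirement dependency classification.
# Generic TF-IDF baseline; not the paper's P-Stacking model.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical requirement pairs (concatenated) and dependency labels (1 = dependent).
pairs = [
    "The system shall encrypt stored data. [SEP] The system shall manage encryption keys.",
    "The UI shall support dark mode. [SEP] The system shall log failed login attempts.",
    "Users shall reset passwords via email. [SEP] The system shall send notification emails.",
    "Reports shall be exportable as PDF. [SEP] The UI shall support dark mode.",
]
labels = [1, 0, 1, 0]

base_learners = [
    ("svm", LinearSVC()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]
model = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression(),
                       cv=2),
)
model.fit(pairs, labels)
print(model.predict(["The system shall archive logs. [SEP] The system shall rotate log files."]))
```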

19 pages, 4123 KiB  
Article
Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models
by Mohammad D. Alahmadi and Moayad Alshangiti
Mathematics 2024, 12(7), 1036; https://doi.org/10.3390/math12071036 - 30 Mar 2024
Viewed by 632
Abstract
The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources.
(This article belongs to the Special Issue AI-Augmented Software Engineering)
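As a rough illustration of the frame-level OCR step described in the abstract, the sketch below upscales a low-resolution frame with plain bicubic interpolation, used here as a simple stand-in for the EDSR/MDSR super-resolution models evaluated in the paper, and then runs Tesseract via pytesseract; the frame path is a placeholder.

```python
# Minimal sketch: OCR on a low-resolution programming-video frame.
# Bicubic upscaling is a simple stand-in for the paper's EDSR/MDSR super-resolution.
import cv2
import pytesseract

frame = cv2.imread("frame_360p.png")  # placeholder path to a 360p video frame
if frame is None:
    raise FileNotFoundError("frame_360p.png not found")

# Upscale 3x and binarize to make the on-screen code easier for the OCR engine.
upscaled = cv2.resize(frame, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gray = cv2.cvtColor(upscaled, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# --psm 6 treats the frame as a single uniform block of text (a code listing).
extracted_code = pytesseract.image_to_string(binary, config="--psm 6")
print(extracted_code)
```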

38 pages, 1680 KiB  
Article
Commit-Level Software Change Intent Classification Using a Pre-Trained Transformer-Based Code Model
by Tjaša Heričko, Boštjan Šumak and Sašo Karakatič
Mathematics 2024, 12(7), 1012; https://doi.org/10.3390/math12071012 - 28 Mar 2024
Viewed by 810
Abstract
Software evolution is driven by changes made during software development and maintenance. While source control systems effectively manage these changes at the commit level, the intent behind them is often inadequately documented, making it challenging to understand their rationale. Existing commit intent classification approaches, largely reliant on commit messages, only partially capture the underlying intent, predominantly due to the messages’ inadequate content and neglect of the semantic nuances in code changes. This paper presents a novel method for extracting semantic features from commits based on modifications in the source code, where each commit is represented by one or more fine-grained conjoint code changes, e.g., file-level or hunk-level changes. To address the unstructured nature of code, the method leverages a pre-trained transformer-based code model, further trained through task-adaptive pre-training and fine-tuning on the downstream task of intent classification. This fine-tuned task-adapted pre-trained code model is then utilized to embed fine-grained conjoint changes in a commit, which are aggregated into a unified commit-level vector representation. The proposed method was evaluated using two BERT-based code models, i.e., CodeBERT and GraphCodeBERT, and various aggregation techniques on data from open-source Java software projects. The results show that the proposed method can be used to effectively extract commit embeddings as features for commit intent classification and outperform current state-of-the-art methods of code commit representation for intent categorization in terms of software maintenance activities undertaken by commits.
(This article belongs to the Special Issue AI-Augmented Software Engineering)
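To make the feature-extraction idea concrete, the sketch below embeds a single hunk-level change with the off-the-shelf microsoft/codebert-base model and mean-pools the token vectors into one change representation; it omits the paper's task-adaptive pre-training, fine-tuning, and commit-level aggregation, and the diff text is invented.

```python
# Minimal sketch: embedding a code change with a pre-trained code model.
# Uses off-the-shelf CodeBERT; the paper's task-adaptive pre-training and
# fine-tuning steps are omitted here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

# Hypothetical hunk-level change: a removed line followed by its replacement.
hunk = "- int timeout = 30;\n+ int timeout = Config.getTimeout();"

inputs = tokenizer(hunk, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into one vector for this change; a commit-level
# representation can then aggregate the vectors of all changes in the commit.
hunk_embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
print(hunk_embedding.shape)  # torch.Size([768])
```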
