Search Results (334)

Search Parameters:
Keywords = automated code generation

29 pages, 8095 KB  
Article
Analysis of Security Vulnerabilities in S-100-Based Maritime Navigation Software
by Hoyeon Cho, Changui Lee and Seojeong Lee
Sensors 2026, 26(4), 1246; https://doi.org/10.3390/s26041246 - 14 Feb 2026
Abstract
The S-100 standard for Electronic Chart Display and Information Systems (ECDIS) uses Lua scripts to render electronic charts, yet lacks security specifications for script execution. This paper evaluates automated Static Application Security Testing (SAST) tools versus expert manual review for S-100-compliant software. Four SAST tools were applied alongside an expert review of OpenS100, a reference implementation for next-generation ECDIS. While automated tools identified numerous defects, they failed to detect 83% (19/23) of expert-identified vulnerabilities, including an unrestricted Lua interpreter flaw with a Common Vulnerability Scoring System (CVSS) score of 9.3. This vulnerability enables Remote Code Execution (RCE) via malicious portrayal catalogues, verified through Proof of Concept (PoC) development. The analysis demonstrates that SAST tools are constrained by limited maritime domain knowledge and challenges in analyzing cross-language semantic risks at the C++–Lua interface. The findings establish that the identified vulnerabilities stem from specification gaps in the S-100 standard rather than isolated coding errors. These results indicate that functional safety certifications require supplementation to address design-level security risks. The evidence supports a recommendation that the International Hydrographic Organization (IHO) incorporate security controls, such as script sandboxing and library restrictions, into the S-100 framework before the 2029 mandatory adoption deadline.
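The library restrictions the authors recommend lend themselves to a simple illustration: a naive static check that flags Lua portrayal scripts invoking facilities an unrestricted interpreter would expose. This is a minimal Python sketch with a hypothetical deny-list, not the paper's SAST tooling or the IHO's eventual controls:

```python
import re

# Hypothetical deny-list of Lua standard-library facilities that an
# unrestricted interpreter would expose to a malicious portrayal catalogue.
DENIED = re.compile(r"\b(os\.\w+|io\.\w+|loadstring|dofile|require)\b")

def scan_lua_script(source: str) -> list[str]:
    """Return the sorted set of dangerous calls found in a Lua script."""
    return sorted(set(DENIED.findall(source)))

script = 'local f = io.open("/etc/passwd"); os.execute("rm -rf /")'
print(scan_lua_script(script))  # ['io.open', 'os.execute']
```

A real sandbox would remove these bindings from the embedded interpreter itself rather than pattern-match source text, which is precisely the design-level control the abstract argues belongs in the S-100 specification.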
24 pages, 10369 KB  
Article
AI-Driven Methods in Façade Design
by Sanghyun Son and Hyoensu Kim
Buildings 2026, 16(4), 782; https://doi.org/10.3390/buildings16040782 - 13 Feb 2026
Abstract
This study proposes an integrated façade design framework that harmonizes the creative divergence of Generative AI with the economic efficiency of Design for Manufacturing and Assembly (DfMA). To address low productivity in the construction industry, a stepwise pipeline is developed, synthesizing image generation via Midjourney, automated coding using ChatGPT, and quantitative optimization. Central to this process is the Hamming Distance algorithm, which evaluates image similarity to implement core DfMA principles: standardization and simplification. The study introduces a multidimensional decision-making model utilizing Grid Size (GS), Replacement Rate (RR), and Hamming Threshold (HT) indices to visualize the trade-off between component minimization and design fidelity. This process transforms abstract 2D patterns into manufacturable geometric panels, bridging the gap between conceptual design and constructability. The results demonstrate that algorithmic optimization significantly reduces component count, contributing to potential cost savings and schedule reduction. Ultimately, this research establishes a collaborative model where architects’ qualitative insights complement AI’s quantitative analysis, enabling designers to regain agency over digital tools and realize creative visions within technical constraints.
(This article belongs to the Section Building Structures)
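The Hamming-distance test at the core of the DfMA optimization is straightforward to sketch: two panels whose binarized grid encodings differ in at most HT cells are treated as the same component. A minimal Python illustration with invented bit strings and threshold (the paper's GS/RR/HT trade-off model is not reproduced here):

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching cells between two equal-length grid encodings."""
    return sum(x != y for x, y in zip(a, b, strict=True))

# Two 4x4 panel patterns flattened to bit strings (1 = solid, 0 = void);
# the patterns and threshold are hypothetical.
panel_a = "1100110000110011"
panel_b = "1100110000110111"

HT = 2  # Hamming Threshold: maximum tolerated cell mismatches
if hamming(panel_a, panel_b) <= HT:
    print("panels are similar enough to standardize into one component")
```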
18 pages, 2094 KB  
Article
Reliability of LLM Inference Engines from a Static Perspective: Root Cause Analysis and Repair Suggestion via Natural Language Reports
by Hongwei Li and Yongjun Wang
Big Data Cogn. Comput. 2026, 10(2), 60; https://doi.org/10.3390/bdcc10020060 - 13 Feb 2026
Abstract
Large Language Model (LLM) inference engines are becoming critical system infrastructure, yet their increasing architectural complexity makes defects difficult to diagnose and repair. Existing reliability studies predominantly focus on model behavior or training frameworks, leaving inference engine bugs underexplored, especially in settings where execution-based debugging is impractical. We present a static, issue-centric approach for automated root cause analysis and repair suggestion generation for LLM inference engines. Based solely on issue reports and developer discussions, we construct a real-world defect dataset and annotate each issue with a semantic root cause category and affected system module. Leveraging text-based representations, our framework performs root cause classification and coarse-grained module localization without requiring code execution or specialized runtime environments. We further integrate structured repair patterns with a large language model to generate interpretable and actionable repair suggestions. Experiments on real-world vLLM issues demonstrate that our approach achieves effective root cause identification and module localization under limited and imbalanced data. A cross-engine evaluation further shows promising generalization to TensorRT-LLM. Human evaluation confirms that the generated repair suggestions are correct, useful, and clearly expressed. Our results indicate that static, issue-level analysis is a viable foundation for scalable debugging assistance in LLM inference engines. The dataset and implementation will be publicly released to facilitate future research.
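The text-only classification step can be illustrated with a toy pipeline that vectorizes issue text and fits a linear classifier to root-cause categories. The reports and labels below are invented, and the paper's actual models and category taxonomy may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented issue reports with hypothetical root-cause labels.
issues = [
    "CUDA out of memory when batch size exceeds KV cache capacity",
    "scheduler deadlocks when preemption races with block allocation",
    "tokenizer output differs from the reference implementation",
]
labels = ["memory-management", "concurrency", "api-mismatch"]

# Static, execution-free classification from report text alone.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(issues, labels)
print(clf.predict(["out of memory during long-context decode"]))
```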
21 pages, 551 KB  
Article
Agentic RAG for Maritime AIoT: Natural Language Access to Structured Data
by Oxana Sachenkova, Melker Andreasson, Dongzhu Tan and Alisa Lincke
Sensors 2026, 26(4), 1227; https://doi.org/10.3390/s26041227 - 13 Feb 2026
Abstract
Maritime operations are increasingly reliant on sensor data to drive efficiency and enhance decision-making. However, despite rapid advances in large language models, including expanded context windows and stronger generative capabilities, critical industrial settings still require secure, role-constrained access to enterprise data and explicit limitation of model context. Retrieval-Augmented Generation (RAG) remains essential to enforce data minimization, preserve privacy, support verifiability, and meet regulatory obligations by retrieving only permissioned, provenance-tracked slices of information at query time. However, current RAG solutions lack robust validation protocols for numerical accuracy for high-stakes industrial applications. This paper introduces Lighthouse Bot, a novel Agentic RAG system specifically designed to provide natural-language access to complex maritime sensor data, including time-series and relational sensor data. The system addresses a critical need for verifiable autonomous data analysis within the Artificial Intelligence of Things (AIoT) domain, which we explore through a case study on optimizing ferry operations. We present a detailed architecture that integrates a Large Language Model with a specialized database and coding agents to transform natural language into executable tasks, enabling core AIoT capabilities such as generating Python code for time-series analysis, executing complex SQL queries on relational sensor databases, and automating workflows, while keeping sensitive data outside the prompt and ensuring auditable, policy-aligned tool use. To evaluate performance, we designed a test suite of 24 questions with ground-truth answers, categorized by query complexity (simple, moderate, complex) and data interaction type (retrieval, aggregation, analysis). Our results show robust, controlled data access with high factual fidelity: the proprietary Claude 3.7 achieved close to 90% overall factual correctness, while the open-source Qwen 72B achieved 66% overall and 99% on simple retrieval and aggregation queries. These findings underscore the need for secure, limited-context RAG in maritime AIoT and the potential for cost-effective automation of routine exploratory analyses.
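The database-agent pattern, in which the model emits a query and only the permissioned result re-enters the context, can be sketched in a few lines. The schema, data, and read-only guard below are assumptions for illustration, not Lighthouse Bot's implementation:

```python
import sqlite3

# Toy stand-in for the relational sensor database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engine_rpm (ts TEXT, rpm REAL)")
conn.executemany("INSERT INTO engine_rpm VALUES (?, ?)",
                 [("2026-02-13T08:00", 812.0), ("2026-02-13T08:05", 845.5)])

def run_permissioned_sql(sql: str) -> list[tuple]:
    """Execute only read-only queries, mimicking role-constrained access;
    raw tables never enter the LLM prompt, only this result does."""
    if not sql.lstrip().lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()

# Hypothetical SQL an LLM might emit for "average engine RPM this morning".
print(run_permissioned_sql("SELECT AVG(rpm) FROM engine_rpm"))
```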
28 pages, 1177 KB  
Article
Context-Aware Code Review Automation: A Retrieval-Augmented Approach
by Büşra İçöz and Göksel Biricik
Appl. Sci. 2026, 16(4), 1875; https://doi.org/10.3390/app16041875 - 13 Feb 2026
Abstract
Manual code review is essential for software quality, but often slows down development cycles due to the high time demands on developers. In this study, we propose an automated solution for Python (version 3.13) projects that generates code review comments by combining Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG). To achieve this, we first curated a dataset from GitHub pull requests (PRs) using the GitHub REST Application Programming Interface (API) (version 2022-11-28) and classified comments into semantic categories using a semi-supervised Support Vector Machine (SVM) model. During the review process, our system uses a vector database to retrieve the top-k most relevant historical comments, providing context for a diverse spectrum of open-weight LLMs, including DeepSeek-Coder-33B, Qwen2.5-Coder-32B, Codestral-22B, CodeLlama-13B, Mistral-Instruct-7B, and Phi-3-Mini. We evaluated the system using a multi-step validation that combined standard metrics (BLEU-4, ROUGE-L, cosine similarity) with an LLM-as-a-Judge approach, and verified the results through targeted human review to ensure consistency with expert standards. The findings show that retrieval augmentation improves feedback relevance for larger models, with DeepSeek-Coder’s alignment score increasing by 17.9% at a retrieval depth of k = 3. In contrast, smaller models such as Phi-3-Mini suffered from context collapse, where too much context reduced accuracy. To manage this trade-off, we built a hybrid expert system that routes each task to the most suitable model. Our results indicate that the proposed approach improved performance by 13.2% compared to the zero-shot baseline (k = 0). In addition, our proposed system reduces hallucinations and generates comments that closely align with the standards expected by experts.
(This article belongs to the Special Issue Artificial Intelligence in Software Engineering)
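The retrieval step is the easiest part to sketch: index historical review comments, then pull the top-k most similar ones as context for the LLM. The TF-IDF index below stands in for the paper's vector database, and the comments are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented store of historical review comments.
history = [
    "Avoid a bare except; catch the specific exception type.",
    "This loop can be replaced with a dict comprehension.",
    "Missing None check before attribute access.",
]
vec = TfidfVectorizer().fit(history)

def top_k(diff_text: str, k: int = 3) -> list[str]:
    """Return the k most similar past comments to use as LLM context."""
    sims = cosine_similarity(vec.transform([diff_text]),
                             vec.transform(history))[0]
    return [history[i] for i in sims.argsort()[::-1][:k]]

print(top_k("try: parse(x)\nexcept: pass", k=2))
```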
25 pages, 2777 KB  
Article
An IFC-Based Framework for Automated Integration of Structural Analysis Results to Support BIM-Based Code Compliance
by Wonbok Lee, Yurim Jeong, Woosung Jeong, Youngsu Yu, Sang I. Park and Bonsang Koo
Buildings 2026, 16(4), 746; https://doi.org/10.3390/buildings16040746 - 12 Feb 2026
Abstract
As the digitalization of construction standards accelerates, the integration of structural analysis results into Building Information Modeling (BIM) environments has become a critical prerequisite for effective BIM-based Automated Code Checking (ACC), particularly for structural code compliance. In current practice, structural analysis results generated by Computer-Aided Engineering (CAE) tools are often manually transferred into IFC-based BIM models, leading to inefficiencies and increased risk of human error. To address this limitation, this study proposes an extended IFC-based representation, termed IFC-KR-Structure, designed to systematically organize and manage section-wise and load combination-dependent structural analysis results required for code compliance within the IFC environment. Based on the proposed schema, an automated CAE-to-BIM integration module was implemented within the IFC-KR Toolkit to enable direct integration of analysis results generated by a commercial CAE tool (midas Civil NX) into IFC models. The approach establishes consistent element correspondence between structural and BIM models through coordinate alignment and spatial mapping procedures and represents multidimensional analysis results using a schema-compliant, tabular data structure embedded within IFC models. The applicability of the proposed framework was validated using a prestressed concrete girder bridge case, confirming that structural analysis results were accurately mapped, stored, visualized, and subsequently utilized within a BIM-based ACC workflow. The results demonstrate that the proposed approach enables systematic reintegration of CAE-generated analysis results into BIM models and significantly improves the efficiency, consistency, and reliability of BIM-based code compliance processes.
(This article belongs to the Section Construction Management, and Computers & Digitization)
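The element-correspondence step can be illustrated as a nearest-centroid match performed after coordinate alignment. The identifiers and coordinates below are hypothetical, and the IFC-KR Toolkit's actual spatial mapping is richer:

```python
from math import dist

# Hypothetical element centroids after aligning both coordinate systems.
cae_elements = {"girder_G1": (0.0, 0.0, 5.2), "girder_G2": (30.0, 0.0, 5.2)}
ifc_elements = {"IfcBeam#101": (0.1, 0.0, 5.2), "IfcBeam#102": (29.9, 0.1, 5.2)}

def map_elements(cae: dict, ifc: dict) -> dict[str, str]:
    """Pair each analysis element with its spatially closest BIM element."""
    return {c: min(ifc, key=lambda i: dist(p, ifc[i])) for c, p in cae.items()}

print(map_elements(cae_elements, ifc_elements))
# {'girder_G1': 'IfcBeam#101', 'girder_G2': 'IfcBeam#102'}
```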
29 pages, 2919 KB  
Article
A Model-Driven Engineering Approach to AI-Powered Healthcare Platforms
by Mira Raheem, Neamat Eltazi, Michael Papazoglou, Bernd Krämer and Amal Elgammal
Informatics 2026, 13(2), 32; https://doi.org/10.3390/informatics13020032 - 11 Feb 2026
Abstract
Artificial intelligence (AI) has the potential to transform healthcare by supporting more accurate diagnoses and personalized treatments. However, its adoption in practice remains constrained by fragmented data sources, strict privacy rules, and the technical complexity of building reliable clinical systems. To address these challenges, we introduce a model-driven engineering (MDE) framework designed specifically for healthcare AI. The framework relies on formal metamodels, domain-specific languages (DSLs), and automated transformations to move from high-level specifications to running software. At its core is the Medical Interoperability Language (MILA), a graphical DSL that enables clinicians and data scientists to define queries and machine learning pipelines using shared ontologies. When combined with a federated learning architecture, MILA allows institutions to collaborate without exchanging raw patient data, ensuring semantic consistency across sites while preserving privacy. We evaluate this approach in a multi-center cancer immunotherapy study. The generated pipelines delivered strong predictive performance, with best-performing models achieving up to 98.5% accuracy on selected prediction tasks, while substantially reducing manual coding effort. These findings suggest that MDE principles—metamodeling, semantic integration, and automated code generation—can provide a practical path toward interoperable, reproducible, and reliable digital health platforms.
(This article belongs to the Section Health Informatics)
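The metamodel-to-code transformation can be sketched as a template instantiated from a declarative specification. The toy spec and template below are assumptions in the spirit of MILA, not its actual syntax, ontologies, or federated setup:

```python
# Toy declarative specification; a real MILA model would also carry
# ontology bindings and federation metadata.
spec = {
    "task": "classification",
    "features": ["age", "tumor_stage", "pd_l1_expression"],
    "model": "logistic_regression",
}

TEMPLATE = """from sklearn.linear_model import LogisticRegression
X = data[{features!r}]
y = data["response"]
model = LogisticRegression().fit(X, y)"""

def generate_pipeline(spec: dict) -> str:
    """Transform the high-level spec into runnable training code."""
    assert spec["model"] == "logistic_regression", "only one target supported"
    return TEMPLATE.format(features=spec["features"])

print(generate_pipeline(spec))
```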
20 pages, 2816 KB  
Article
Benchmarking Large Language Models for Embedded Systems Programming in Microcontroller-Driven IoT Applications
by Marek Babiuch and Pavel Smutný
Future Internet 2026, 18(2), 94; https://doi.org/10.3390/fi18020094 - 11 Feb 2026
Abstract
Large language models (LLMs) have shown strong potential for automated code generation in software development, yet their effectiveness in embedded systems programming—requiring understanding of software logic and hardware constraints—has not been well studied. Existing evaluation frameworks do not comprehensively cover practical microcontroller development scenarios in real-world Internet of Things (IoT) projects. This study systematically evaluates 27 state-of-the-art LLMs across eight embedded systems scenarios of increasing complexity, from basic sensor reading to complete cloud database integration with visualization dashboards. Using ESP32 microcontrollers with environmental and motion sensors, we employed the Analytic Hierarchy Process with four weighted criteria: functional, instructions, output, and creativity, evaluated independently by two expert reviewers. Top-performing models were Claude Sonnet 4.5, Claude Opus 4.1, and Gemini 2.5 Pro, with scores from 0.984 to 0.910. Performance degraded with complexity: 19–23 models generated compilable code for simple applications, but only 3–5 produced functional solutions for complex scenarios involving Grafana and cloud databases. The most frequent failure was hallucination of non-existent libraries or incorrect API usage, with functional capability as the primary barrier and instruction-following quality as the key differentiator among competent models. These findings provide empirical guidance for embedded developers on LLM selection and identify limitations of zero-shot prompting for hardware-dependent IoT development.
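Once AHP pairwise comparisons have produced criterion weights, the per-model score reduces to a weighted sum. The weights and scores below are invented for illustration; the paper derives its own from expert judgments:

```python
# Hypothetical AHP-derived weights for the four criteria (sum to 1).
weights = {"functional": 0.45, "instructions": 0.25,
           "output": 0.20, "creativity": 0.10}

# Per-criterion scores in [0, 1], averaged over the two expert reviewers.
scores = {"functional": 0.9, "instructions": 1.0,
          "output": 0.8, "creativity": 0.7}

overall = sum(weights[c] * scores[c] for c in weights)
print(round(overall, 3))  # 0.885
```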
26 pages, 2172 KB  
Systematic Review
Artificial Intelligence for Infrastructure-as-Code—A Systematic Literature Review
by Claus Pahl, Övgüm Can Sezen and Florian Hofer
Electronics 2026, 15(4), 755; https://doi.org/10.3390/electronics15040755 - 10 Feb 2026
Abstract
Infrastructure-as-Code (IaC) is a systems management practice that involves managing and provisioning computing infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. IaC is an essential contribution to the complete automation of the entire software lifecycle in a Development and Operations (DevOps) context: the deployment and management of software through coded configuration, monitoring, and analysis. In recent times, artificial intelligence (AI)—including generative AI, machine learning, and related techniques—offers opportunities to improve techniques across the IaC life cycle, from IaC code generation to its deployment and runtime analysis. We conducted a comprehensive and systematic literature review covering all IaC code development and operations phases, considering IaC as a specific software type that we map to the DevOps model. We present the bibliographic review results and investigate in which phases, and how, AI can enhance IaC techniques, extracting a framework of phase-specific AI contributions and research challenges and contrasting, in particular, generative AI and machine-learning applications across the phases. Key findings include that Large Language Models (LLMs) dominate generation activities while Machine Learning (ML) dominates analysis, and that operations phases are less studied than IaC development. This review extends previous literature reviews by covering the full DevOps lifecycle, developing a phase-specific taxonomy of AI techniques for IaC, and providing a comprehensive analysis of research challenges and directions that benefits developers, by highlighting current innovations, and researchers, by pointing to future directions.
24 pages, 2506 KB  
Article
CEVD: Cluster-Based Ensemble Learning for Cross-Project Vulnerability Detection
by Yang Cao, Yunwei Dong and Jie Liu
Future Internet 2026, 18(2), 85; https://doi.org/10.3390/fi18020085 - 5 Feb 2026
Abstract
Deep learning has become an important approach for automated software vulnerability detection. However, due to domain shift, existing models often suffer from significant performance degradation when applied to unseen projects. To address this issue, prior studies have widely adopted Domain Adaptation (DA) techniques to improve cross-project generalization. Nevertheless, these methods typically rely on the implicit “project-as-domain” assumption and require access to target project data during training, which limits their applicability in practice. To overcome these limitations, this paper proposes a vulnerability detection framework that combines semantic clustering with ensemble-based Domain Generalization (DG), termed Cluster-based Ensemble Learning for Vulnerability Detection (CEVD). CEVD first performs unsupervised clustering on code semantic embeddings to automatically identify latent semantic structures that transcend project boundaries, constructing pseudo-domains with intra-domain homogeneity. A soft domain labeling strategy is further introduced to model the membership of samples in multiple pseudo-domains, preserving semantic overlap across boundaries. Building upon this, CEVD employs an ensemble learning framework that jointly trains multiple expert models and a domain classifier. The predictions of these experts are dynamically fused based on the weights generated by the domain classifier, enabling effective vulnerability detection on unseen projects without requiring access to target data. Extensive experiments on real-world datasets demonstrate that CEVD consistently outperforms state-of-the-art baselines across different pre-trained backbone models. This work demonstrates the effectiveness of semantic structure mining in capturing latent domains and offers a practical solution for improving generalization in cross-project vulnerability detection.
(This article belongs to the Special Issue Security of Computer System and Network)
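The pseudo-domain construction and weighted fusion can be sketched end to end: cluster code embeddings, derive soft memberships from cluster distances, and fuse per-expert predictions with those weights. Everything below (random embeddings, expert outputs, the exponential weighting) is a stand-in, not CEVD's exact formulation:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
emb = rng.normal(size=(60, 8))        # stand-in code semantic embeddings

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
d = km.transform(emb)                 # distance to each pseudo-domain center
soft = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # soft domain labels

# Hypothetical expert outputs: one vulnerability probability per expert.
expert_probs = rng.uniform(size=(60, 3))
fused = (soft * expert_probs).sum(axis=1)  # domain-weighted ensemble fusion
print(fused[:3])
```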
15 pages, 1101 KB  
Article
Assessing Welfare in Ex Situ Lowland Tapirs Through Activity Patterns and Machine Learning
by Paw O. F. Christensen, Mads H. Clausen, Thea L. Faddersbøll, Frej Gammelgård, Silje M. Lund, Alexander P. M. Nielsen, Jonas Nielsen, Nynne H. Olsen, Tobias K. Olsen, Sussie Pagh and Cino Pertoldi
J. Zool. Bot. Gard. 2026, 7(1), 11; https://doi.org/10.3390/jzbg7010011 - 3 Feb 2026
Abstract
This study evaluates activity patterns and determines optimal observation periods for assessing the welfare of lowland tapirs (Tapirus terrestris L.) housed in two Danish zoological institutions: Aalborg Zoo and Randers Regnskov. The objectives were to identify the most efficient time window for welfare assessments, determine whether machine learning (ML) could support behavioral evaluations by providing automated estimates of activity, and examine whether automated pose-based tracking could serve as a proxy for manual ethogram observations. Behavioral data were collected using standardized ethograms from wildlife camera footage recorded over 72 h. Lowland tapirs were generally more active during daytime, with individuals at Aalborg Zoo showing peak activity between 07:00 and 14:00, while those at Randers Regnskov were most active between 12:00 and 18:00. Activity patterns differed between institutions, with Aalborg individuals displaying concentrated activity peaks and Randers individuals showing more evenly distributed activity. A preliminary ML analysis using the pose-estimation tool SLEAP demonstrated that movement-based activity estimates closely matched manually coded data, suggesting that automated tracking may offer an efficient and non-invasive tool for welfare monitoring. The findings highlight the potential for integrating automated analysis into routine welfare assessments of zoo-housed animals.
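The pose-based activity proxy amounts to measuring frame-to-frame keypoint displacement. A minimal sketch with synthetic keypoints standing in for SLEAP output; the median threshold is an assumption, not the study's criterion:

```python
import numpy as np

# Synthetic SLEAP-style keypoints: (frames, body_parts, xy coordinates).
rng = np.random.default_rng(1)
keypoints = rng.normal(size=(100, 5, 2)).cumsum(axis=0)

# Mean per-frame displacement across body parts gives a movement signal
# comparable against manually coded ethogram activity.
step = np.linalg.norm(np.diff(keypoints, axis=0), axis=2).mean(axis=1)
active = step > np.median(step)  # crude active/inactive threshold
print(f"active in {active.mean():.0%} of frame transitions")
```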
16 pages, 1397 KB  
Article
ODEL: An Experience-Augmented Self-Evolving Framework for Efficient Python-to-C++ Code Translation
by Kaiyuan Feng, Furong Peng and Jiayue Wu
Appl. Sci. 2026, 16(3), 1506; https://doi.org/10.3390/app16031506 - 2 Feb 2026
Abstract
Automated code translation plays an important role in improving software reusability and supporting system migration, particularly in scenarios where Python implementations need to be converted into efficient C++ programs. However, existing approaches often rely heavily on large external models or static inference pipelines, which limits their ability to improve translation quality over time. To address these challenges, this paper proposes ODEL, an On-Demand Experience-enhanced Learning framework for Python-to-C++ code translation. ODEL adopts a hybrid inference architecture in which a lightweight internal model performs routine translation, while a more capable external model is selectively invoked upon verification failure to conduct error analysis and generate structured experience records. These experience records are accumulated and reused across subsequent translation phases, enabling progressive improvement through a closed-loop workflow that integrates generation, verification, consideration, and experience refinement. Experiments on the HumanEval-X benchmark demonstrate that ODEL significantly improves translation accuracy compared with competitive baselines. Specifically, the framework increases Pass@1 from 71.82% to 81.10% and Pass@10 from 74.30% to 89.02%, and exhibits a consistent performance improvement across multiple translation phases. These results indicate that experience reuse within a continuous task stream can effectively enhance automated code translation without modifying model parameters.
(This article belongs to the Special Issue AI-Enabled Next-Generation Computing and Its Applications)
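The closed loop is worth sketching: translate with the lightweight model, verify, and only on failure consult the stronger model, banking its analysis as a reusable experience record. All three components below are stubs; ODEL's actual models, verifier, and record schema are not reproduced:

```python
# Stubs standing in for ODEL's components.
def internal_translate(py_src, hints):   # hypothetical lightweight model
    return f"// C++ translation informed by {len(hints)} hint(s)"

def run_tests(cpp_src, tests):           # hypothetical verification harness
    return False                         # force one repair round in this demo

def external_analyze(py_src, cpp_src):   # hypothetical external-model analysis
    return "Python ints are arbitrary precision; choose C++ widths carefully."

experience_store: list[dict] = []

def translate_with_experience(py_src: str, tests=None) -> str:
    """One generation-verification-repair cycle with experience reuse."""
    hints = [e["lesson"] for e in experience_store]
    cpp = internal_translate(py_src, hints)
    if run_tests(cpp, tests):
        return cpp
    lesson = external_analyze(py_src, cpp)       # invoked only on failure
    experience_store.append({"lesson": lesson})  # reused in later phases
    return internal_translate(py_src, hints + [lesson])

print(translate_with_experience("def add(a, b): return a + b"))
```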
31 pages, 1573 KB  
Article
Generalised Cross-Dialectal Arabic Question Answering Through Adaptive Code-Mixed Data Augmentation
by Maha Jarallah Althobaiti
Information 2026, 17(2), 139; https://doi.org/10.3390/info17020139 - 1 Feb 2026
Abstract
Modern Standard Arabic (MSA) and the many regional dialects differ substantially in vocabulary, morphology, and pragmatic usage. Most available annotated resources are in MSA, and zero-shot transfer from MSA to dialectal tasks suffers a large performance drop. This paper addresses generalised cross-dialectal Arabic question answering (QA), where the context and the question are written in different Arabic varieties. We propose a training-free augmentation framework that generates code-mixed questions to bridge lexical gaps across Arabic varieties. The method produces semantically faithful, balanced code-mixed questions through a two-stage procedure: lexicon-based partial substitution with semantic similarity and substitution-rate constraints, followed by fallback neural machine translation with word-level alignment when needed. We also introduce automated multidialectal lexicon construction using machine translation, embedding-based alignment, and semantic checks. We carry out our evaluation in a zero-shot setting, where the model is fine-tuned only on MSA and then tested on dialectal inputs using ArDQA, covering five Arabic varieties and three domains (SQuAD, Vlogs, and Narratives). Experiments show consistent improvements under context-question dialect mismatch: +1.09 F1/+0.87 EM on SQuAD, +1.54/+1.25 on Vlogs, and +2.75/+2.27 on Narratives, with the largest gains for Maghrebi questions in Narratives (+12.13 F1/+8.45 EM). These results show that our method improves zero-shot cross-dialectal transfer without fine-tuning or retraining.
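The first stage, lexicon-based partial substitution under a rate constraint, can be sketched directly. The tiny MSA-to-dialect lexicon and the 0.4 cap below are invented for illustration; the paper builds its lexicons automatically and adds semantic-similarity checks:

```python
import random

# Invented MSA -> dialect lexicon fragment; real lexicons are built
# automatically via translation, alignment, and semantic checks.
lexicon = {"كيف": "شلون", "ماذا": "وش", "الآن": "الحين"}

def code_mix(question: str, max_rate: float = 0.4, seed: int = 0) -> str:
    """Swap at most max_rate of the words using the dialect lexicon."""
    rng = random.Random(seed)
    words = question.split()
    budget = int(len(words) * max_rate)
    candidates = [i for i, w in enumerate(words) if w in lexicon]
    for i in rng.sample(candidates, min(budget, len(candidates))):
        words[i] = lexicon[words[i]]
    return " ".join(words)

print(code_mix("كيف يعمل النظام الآن"))
```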
24 pages, 702 KB  
Article
AI-Driven Code Documentation: Comparative Evaluation of LLMs for Commit Message Generation
by Mohamed Mehdi Trigui, Wasfi G. Al-Khatib, Mohammad Amro and Fatma Mallouli
Computers 2026, 15(2), 87; https://doi.org/10.3390/computers15020087 - 1 Feb 2026
Abstract
Commit messages are essential for understanding software evolution and maintaining traceability of projects; however, their quality varies across repositories. Recent Large Language Models provide a promising path to automate this task by generating concise, context-sensitive commit messages directly from code diffs. This paper provides a comparative study of three paradigms for applying large language models: zero-shot prompting, retrieval-augmented generation, and fine-tuning, using the large-scale CommitBench dataset that spans six programming languages. We assess the performance of the models with automatic metrics, namely BLEU, ROUGE-L, METEOR, and Adequacy, and a human assessment of 100 commits. In the latter, experienced developers rated each generated commit message for Adequacy and Fluency on a five-point Likert scale. The results show that fine-tuning and domain adaptation yield models that perform consistently better than general-purpose baselines across all evaluation metrics, thus generating commit messages with higher semantic adequacy and clearer phrasing than zero-shot approaches. The correlation analysis suggests that the Adequacy and BLEU scores are closer to human judgment, while ROUGE-L and METEOR tend to underestimate the quality in cases where the models generate stylistically diverse or paraphrased outputs. Finally, the study outlines a conceptual integration pathway for incorporating such models into software development workflows, emphasizing a human-in-the-loop approach for quality assurance.
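The zero-shot setting can be sketched as prompt construction from a code diff, with a crude unigram-overlap score standing in for BLEU-style evaluation. The template, diff, and scoring proxy below are assumptions, not the paper's exact setup:

```python
DIFF = """\
--- a/utils.py
+++ b/utils.py
@@ def parse(ts):
-    return datetime.strptime(ts, "%Y-%m-%d")
+    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
"""

# Hypothetical zero-shot prompt sent to the model.
prompt = ("Write a concise, imperative commit message (<= 72 chars) "
          "for the following diff:\n\n" + DIFF)

def unigram_precision(candidate: str, reference: str) -> float:
    """Crude stand-in for BLEU-style overlap used in automatic evaluation."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return sum(w in ref for w in cand) / len(cand)

generated = "Fix timestamp parsing to accept ISO 8601 datetimes"
reference = "parse ISO 8601 timestamps with time component"
print(round(unigram_precision(generated, reference), 2))  # 0.25
```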
21 pages, 3332 KB  
Article
MPC-Coder: A Dual-Knowledge Enhanced Multi-Agent System with Closed-Loop Verification for PLC Code Generation
by Yinggang Zhang, Weiyi Xia, Ben Zhao, Tongwen Yuan and Xianchuan Yu
Symmetry 2026, 18(2), 248; https://doi.org/10.3390/sym18020248 - 30 Jan 2026
Abstract
Industrial PLC programming faces persistent difficulties: lengthy development cycles, low fault tolerance, and cross-platform incompatibility among vendors. While LLMs show promise for automated code generation, their direct application is hindered by the gap between ambiguous natural language and the strict determinism required by control logic. This paper proposes MPC-Coder, a dual-knowledge enhanced multi-agent system that addresses this gap. The system combines a structured knowledge graph that imposes hard constraints on process parameters and equipment specifications with a vector database that offers implementation references such as code templates and function blocks. These two knowledge sources form a symmetric complementary architecture. A closed-loop “generation–verification–repair” mechanism leverages formal verification tools to iteratively refine the generated code. Experiments demonstrate that MPC-Coder achieves 100% syntactic correctness and 78% functional consistency, significantly outperforming general-purpose LLMs. The results indicate that the complementary fusion of domain knowledge and closed-loop verification effectively enhances the reliability of code generation, offering a viable technical pathway for the reliable application of LLMs in industrial control systems.
(This article belongs to the Section Computer)
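The knowledge graph's hard constraints can be illustrated as a verifier that rejects generated setpoints outside equipment limits, the kind of failure that triggers a repair round in the closed loop. The limits table below is a toy stand-in for the paper's knowledge graph:

```python
# Toy constraint table; the paper's knowledge graph covers process
# parameters and equipment specifications in far more depth.
KG_LIMITS = {"reactor_temp_C": (20.0, 85.0), "pump_speed_rpm": (0.0, 1450.0)}

def verify_setpoints(setpoints: dict) -> list[str]:
    """Hard-constraint check applied before generated PLC code is accepted."""
    violations = []
    for name, value in setpoints.items():
        lo, hi = KG_LIMITS[name]
        if not lo <= value <= hi:
            violations.append(f"{name}={value} outside [{lo}, {hi}]")
    return violations

# A nonempty result would send the code back for a repair iteration.
print(verify_setpoints({"reactor_temp_C": 92.0, "pump_speed_rpm": 900.0}))
```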