Search Results (9)

Search Parameters:
Keywords = auto-regressive LLM

26 pages, 613 KB  
Article
AutoQALLMs: Automating Web Application Testing Using Large Language Models (LLMs) and Selenium
by Sindhupriya Mallipeddi, Muhammad Yaqoob, Javed Ali Khan, Tahir Mehmood, Alexios Mylonas and Nikolaos Pitropakis
Computers 2025, 14(11), 501; https://doi.org/10.3390/computers14110501 - 18 Nov 2025
Viewed by 1375
Abstract
Modern web applications change frequently in response to user and market needs, making their testing challenging. Manual testing and automation methods often struggle to keep up with these changes. We propose an automated testing framework, AutoQALLMs, that utilises various LLMs (Large Language Models), including GPT-4, Claude, and Grok, alongside Selenium WebDriver, BeautifulSoup, and regular expressions. This framework enables one-click testing, where users provide a URL as input and receive test results as output, thus eliminating the need for human intervention. It extracts HTML (Hypertext Markup Language) elements from the webpage and utilises the LLMs' APIs to generate Selenium-based test scripts. Regular expressions enhance the clarity and maintainability of these scripts. The scripts are executed automatically, and the results, such as pass/fail status and error details, are displayed to the tester. This streamlined input–output process forms the core of the AutoQALLMs framework. We evaluated the framework on 30 websites. The results show that the system drastically reduces the time needed to create test cases, achieves broad test coverage (96%) with the Claude 4.5 LLM, which is competitive with manual scripts (98%), and allows for rapid regeneration of tests in response to changes in webpage structure. Software testing expert feedback confirmed that the proposed AutoQALLMs method for automated web application testing enables faster regression testing, reduces manual effort, and maintains reliable test execution. However, some limitations remain in handling complex page changes and validation. Although Claude 4.5 achieved slightly higher test coverage in the comparative evaluation, GPT-4 was selected as the default model for AutoQALLMs due to its cost-efficiency, reproducibility, and stable script generation across diverse websites. Future improvements may focus on increasing accuracy, adding self-healing techniques, and expanding to more complex testing scenarios. Full article
(This article belongs to the Special Issue Best Practices, Challenges and Opportunities in Software Engineering)
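The pipeline described in the abstract (fetch a page, extract its HTML elements, have an LLM write Selenium tests, then run them) can be pictured with a short Python sketch. The prompt wording and the call_llm wrapper below are illustrative assumptions, not the authors' implementation; any chat-completion API for GPT-4, Claude, or Grok could be wired in.

```python
# Minimal sketch of the extract-then-generate flow, assuming a chat-completion
# style LLM API behind call_llm(); the element filter and prompt text are
# illustrative, not the AutoQALLMs implementation.
from bs4 import BeautifulSoup
from selenium import webdriver


def extract_elements(url: str) -> str:
    """Load the page with Selenium and return a compact listing of testable elements."""
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        elements = soup.find_all(["form", "input", "button", "a", "select"])
        return "\n".join(str(el)[:200] for el in elements)
    finally:
        driver.quit()


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API (GPT-4, Claude, Grok, ...)."""
    raise NotImplementedError("wire this to your preferred chat-completion API")


def generate_test_script(url: str) -> str:
    """One-click flow: URL in, generated Selenium test script out."""
    elements = extract_elements(url)
    prompt = (
        "Write a Python Selenium test script that exercises the following "
        f"elements of {url} and reports a pass/fail status per step:\n{elements}"
    )
    return call_llm(prompt)
```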

22 pages, 1027 KB  
Article
Probing the Topology of the Space of Tokens with Structured Prompts
by Michael Robinson, Sourya Dey and Taisa Kushner
Mathematics 2025, 13(20), 3320; https://doi.org/10.3390/math13203320 - 17 Oct 2025
Viewed by 538
Abstract
Some large language models (LLMs) are open source and are therefore fully open for scientific study. However, many LLMs are proprietary, and their internals are hidden, which hinders the ability of the research community to study their behavior under controlled conditions. For instance, the token input embedding specifies an internal vector representation of each token used by the model. If the token input embedding is hidden, latent semantic information about the set of tokens is unavailable to researchers. This article presents a general and flexible method for prompting an LLM to reveal its token input embedding, even if this information is not published with the model. Moreover, this article provides strong theoretical justification—a mathematical proof for generic LLMs—for why this method should be expected to work. If the LLM can be prompted systematically and certain benign conditions about the quantity of data collected from the responses are met, the topology of the token embedding is recovered. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of the Llemma-7B LLM. We demonstrate the flexibility of this method by performing the recovery at three different times, each using the same algorithm applied to different information collected from the responses. While the prompting can be a performance bottleneck depending on the size and complexity of the LLM, the recovery runs within a few hours on a typical workstation. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes. Full article
(This article belongs to the Special Issue New Perspectives in Harmonic Analysis)
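The abstract does not spell out the prompting protocol, so only a generic final step is sketched here: given one feature vector per token, derived in some way from the model's responses, the local structure of the token subspace can be approximated with a k-nearest-neighbour graph. Everything below is a hedged placeholder for that step, not the paper's algorithm.

```python
# Sketch of a generic recovery step: build a k-nearest-neighbour graph over
# per-token feature vectors. How those vectors are obtained from LLM responses
# is the paper's contribution and is not reproduced here.
import numpy as np


def knn_graph(features: np.ndarray, k: int = 10) -> list[list[int]]:
    """Indices of the k nearest neighbours of each token's feature vector."""
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T  # squared distances
    np.fill_diagonal(dists, np.inf)            # a token is not its own neighbour
    return [list(np.argsort(row)[:k]) for row in dists]


if __name__ == "__main__":
    toy = np.random.default_rng(0).normal(size=(500, 32))  # 500 "tokens", 32-d features
    print(knn_graph(toy, k=5)[0])
```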

27 pages, 490 KB  
Article
Dynamic Asymmetric Attention for Enhanced Reasoning and Interpretability in LLMs
by Feng Wen, Xiaoming Lu, Haikun Yu, Chunyang Lu, Huijie Li and Xiayang Shi
Symmetry 2025, 17(8), 1303; https://doi.org/10.3390/sym17081303 - 12 Aug 2025
Viewed by 1312
Abstract
The remarkable success of autoregressive Large Language Models (LLMs) is predicated on the causal attention mechanism, which enforces a static and rigid form of informational asymmetry by permitting each token to attend only to its predecessors. While effective for sequential generation, this hard-coded unidirectional constraint fails to capture the more complex, dynamic, and nonlinear dependencies inherent in sophisticated reasoning, logical inference, and discourse. In this paper, we challenge this paradigm by introducing Dynamic Asymmetric Attention (DAA), a novel mechanism that replaces the static causal mask with a learnable context-aware guidance module. DAA dynamically generates a continuous-valued attention bias for each query–key pair, effectively learning a “soft” information flow policy that guides rather than merely restricts the model’s focus. Trained end-to-end, our DAA-augmented models demonstrate significant performance gains on a suite of benchmarks, including improvements in perplexity on language modeling and notable accuracy boosts on complex reasoning tasks such as code generation (HumanEval) and mathematical problem-solving (GSM8k). Crucially, DAA provides a new lens for model interpretability. By visualizing the learned asymmetric attention patterns, it is possible to uncover the implicit information flow graphs that the model constructs during inference. These visualizations reveal how the model dynamically prioritizes evidence and forges directed logical links in chain-of-thought reasoning, making its decision-making process more transparent. Our work demonstrates that transitioning from a static hard-wired asymmetry to a learned and dynamic one not only enhances model performance but also paves the way for a new class of more capable and profoundly more explainable LLMs. Full article
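A minimal PyTorch-style sketch of the core idea (a learned, context-dependent additive bias on every query–key score) may make the mechanism concrete. The bias network's shape, and the fact that the hard causal mask is retained alongside the learned bias, are assumptions; the paper's exact architecture is not given in the abstract.

```python
# Sketch of self-attention with a learned per-pair additive bias ("soft"
# information-flow policy). Layer sizes and the pairwise bias network are
# illustrative assumptions, not the DAA architecture itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicAsymmetricAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Small network mapping each (query, key) pair to a scalar bias.
        self.bias_net = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh(),
                                      nn.Linear(d_model, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:        # x: (batch, seq, d)
        b, s, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / d ** 0.5             # (batch, seq, seq)
        # Learned continuous bias for every query-key pair.
        pairs = torch.cat([q.unsqueeze(2).expand(b, s, s, d),
                           k.unsqueeze(1).expand(b, s, s, d)], dim=-1)
        bias = self.bias_net(pairs).squeeze(-1)                 # (batch, seq, seq)
        # The hard causal mask is kept so the sketch stays autoregressive; the
        # learned bias then guides, rather than merely restricts, attention.
        causal = torch.triu(torch.full((s, s), float("-inf"), device=x.device), 1)
        return F.softmax(scores + bias + causal, dim=-1) @ v
```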

14 pages, 5649 KB  
Article
One-Shot Autoregressive Generation of Combinatorial Optimization Solutions Based on the Large Language Model Architecture and Learning Algorithms
by Bishad Ghimire, Ausif Mahmood and Khaled Elleithy
AI 2025, 6(4), 66; https://doi.org/10.3390/ai6040066 - 26 Mar 2025
Viewed by 3196
Abstract
Large Language Models (LLMs) have immensely advanced the field of Artificial Intelligence (AI), with recent models being able to perform chain-of-thought reasoning and solve complex mathematical problems, ranging from theorem proving to ones involving advanced calculus. The success of LLMs derives from a combination of the Transformer architecture with its attention mechanism, the autoregressive training methodology with masked attention, and the alignment fine-tuning via reinforcement learning algorithms. In this research, we attempt to explore a possible solution to the fundamental NP-hard problem of combinatorial optimization, in particular, the Traveling Salesman Problem (TSP), by following the LLM approach in terms of the architecture and training algorithms. Similar to the LLM design, which is trained in an autoregressive manner to predict the next token, our model is trained to predict the next node in a TSP graph. After the model is trained on random TSP graphs with known near-optimal solutions, we fine-tune the model using Direct Preference Optimization (DPO). The tour generation in a trained model is autoregressive one-step generation with no need for iterative refinement. Our results are very promising and indicate that, for TSP graphs up to 100 nodes, a relatively small amount of training data yields solutions within a few percent of the optimal. This optimization improves if more data are used to train the model. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
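Decoding a tour one node at a time, in the same way an LLM predicts the next token and with no iterative refinement, can be illustrated with a short sketch. The score_next_nodes policy below is a placeholder for the trained Transformer (which the authors additionally fine-tune with DPO), not their model.

```python
# Sketch of one-shot autoregressive tour generation: greedily pick the highest-
# scoring unvisited node at each step. score_next_nodes() is a nearest-neighbour
# placeholder standing in for the trained, DPO-fine-tuned model.
import numpy as np


def score_next_nodes(coords: np.ndarray, partial_tour: list[int]) -> np.ndarray:
    """Placeholder policy: prefer nodes close to the last visited node."""
    last = coords[partial_tour[-1]]
    return -np.linalg.norm(coords - last, axis=1)    # higher score = more preferred


def generate_tour(coords: np.ndarray, start: int = 0) -> list[int]:
    n, tour, visited = len(coords), [start], {start}
    for _ in range(n - 1):
        scores = score_next_nodes(coords, tour)
        scores[list(visited)] = -np.inf              # mask already-visited nodes
        nxt = int(np.argmax(scores))                 # greedy "next token"
        tour.append(nxt)
        visited.add(nxt)
    return tour


if __name__ == "__main__":
    pts = np.random.default_rng(1).random((20, 2))   # random 20-node instance
    print(generate_tour(pts))
```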

29 pages, 549 KB  
Review
Generative Models in Medical Visual Question Answering: A Survey
by Wenjie Dong, Shuhao Shen, Yuqiang Han, Tao Tan, Jian Wu and Hongxia Xu
Appl. Sci. 2025, 15(6), 2983; https://doi.org/10.3390/app15062983 - 10 Mar 2025
Cited by 10 | Viewed by 9963
Abstract
Medical Visual Question Answering (MedVQA) is a crucial intersection of artificial intelligence and healthcare. It enables systems to interpret medical images—such as X-rays, MRIs, and pathology slides—and respond to clinical queries. Early approaches primarily relied on discriminative models, which select answers from predefined candidates. However, these methods struggle to effectively address open-ended, domain-specific, or complex queries. Recent advancements have shifted the focus toward generative models, leveraging autoregressive decoders, large language models (LLMs), and multimodal large language models (MLLMs) to generate more nuanced and free-form answers. This review comprehensively examines the paradigm shift from discriminative to generative systems, surveying generative MedVQA works in terms of their model architectures and training processes, summarizing evaluation benchmarks and metrics, and highlighting key advances and techniques that propel the development of generative MedVQA, such as concept alignment, instruction tuning, and parameter-efficient fine-tuning (PEFT), alongside strategies for data augmentation and automated dataset creation. Finally, we propose future directions to enhance clinical reasoning and interpretability, build robust evaluation benchmarks and metrics, and employ scalable training strategies and deployment solutions. By analyzing the strengths and limitations of existing generative MedVQA approaches, we aim to provide valuable insights for researchers and practitioners working in this domain. Full article
(This article belongs to the Special Issue Feature Review Papers in "Computing and Artificial Intelligence")

12 pages, 1034 KB  
Article
Urinary Bladder Acute Inflammations and Nephritis of the Renal Pelvis: Diagnosis Using Fine-Tuned Large Language Models
by Mohammad Khaleel Sallam Ma’aitah, Abdulkader Helwan and Abdelrahman Radwan
J. Pers. Med. 2025, 15(2), 45; https://doi.org/10.3390/jpm15020045 - 24 Jan 2025
Cited by 3 | Viewed by 3087
Abstract
Background: Large language models (LLMs) have seen a significant boost recently in the field of natural language processing (NLP) due to their capabilities in analyzing words. These autoregressive models prove robust in classification tasks where texts need to be analyzed and classified. Objectives: In this paper, we explore the power of base LLMs such as Generative Pre-trained Transformer 2 (GPT-2), Bidirectional Encoder Representations from Transformers (BERT), Distill-BERT, and TinyBERT in diagnosing acute inflammations of the urinary bladder and nephritis of the renal pelvis. Materials and Methods: The LLMs were trained and tested using supervised fine-tuning (SFT) on a dataset of 120 examples that include symptoms that may indicate the occurrence of these two conditions. Results: By employing a supervised fine-tuning method and carefully crafted prompts to present the data, we demonstrate the feasibility of using minimal training data to achieve reasonable diagnostic performance, with overall testing accuracies of 100%, 100%, 94%, and 79% for GPT-2, BERT, Distill-BERT, and TinyBERT, respectively. Full article
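The training setup (supervised fine-tuning of a small pretrained model as a text classifier over symptom descriptions) follows a standard Hugging Face pattern; a minimal sketch is below. The example symptom strings, label encoding, and hyperparameters are illustrative assumptions, not the paper's 120-example dataset or settings.

```python
# Minimal sketch of supervised fine-tuning (SFT) of a pretrained encoder as a
# two-class symptom-text classifier. Example texts, labels and hyperparameters
# are illustrative placeholders, not the study's dataset or configuration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

records = [
    {"text": "Temperature 38.9, lumbar pain, nausea after urination", "label": 1},
    {"text": "Temperature 36.6, no lumbar pain, normal urination",    "label": 0},
]  # the paper uses 120 symptom examples covering both conditions

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Tokenize once, padding to a fixed length so the default collator can batch.
ds = Dataset.from_list(records).map(
    lambda r: tok(r["text"], truncation=True, padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=ds,
)
trainer.train()
```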

43 pages, 4570 KB  
Article
Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews
by Miehleketo Mathebula, Abiodun Modupe and Vukosi Marivate
Appl. Sci. 2024, 14(23), 10782; https://doi.org/10.3390/app142310782 - 21 Nov 2024
Cited by 6 | Viewed by 6252
Abstract
Sentiment analysis is a well-known task that has been used to analyse customer feedback reviews and media headlines to detect the sentimental personality or polarisation of a given text. With the growth of social media and other online platforms, like Twitter (now branded as X), Facebook, blogs, and others, it has been used in the investment community to monitor customer feedback, reviews, and news headlines about financial institutions’ products and services to ensure business success and prioritise aspects of customer relationship management. Supervised learning algorithms have been popularly employed for this task, but the performance of these models has been compromised due to the brevity of the content and the presence of idiomatic expressions, sound imitations, and abbreviations. Additionally, the pre-training of a larger language model (PTLM) struggles to capture bidirectional contextual knowledge learnt through word dependency because the sentence-level representation fails to take broad features into account. We develop a novel structure called language feature extraction and adaptation for reviews (LFEAR), an advanced natural language model that amalgamates retrieval-augmented generation (RAG) with a conversation format for an auto-regressive fine-tuning model (ARFT). This helps to overcome the limitations of lexicon-based tools and the reliance on pre-defined sentiment lexicons, which may not fully capture the range of sentiments in natural language and address questions on various topics and tasks. LFEAR is fine-tuned on Hellopeter reviews that incorporate industry-specific contextual information retrieval to show resilience and flexibility for various tasks, including analysing sentiments in reviews of restaurants, movies, politics, and financial products. The proposed model achieved an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69% based on Retrieval-Augmented Generation Assessment (RAGAS) metrics. The LFEAR model is effective in conducting sentiment analysis across various domains due to its adaptability and scalable inference mechanism. It considers unique language characteristics and patterns in specific domains to ensure accurate sentiment annotation. This is particularly beneficial for individuals in the financial sector, such as investors and institutions, including those listed on the Johannesburg Stock Exchange (JSE), which is the primary stock exchange in South Africa and plays a significant role in the country’s financial market. Future initiatives will focus on incorporating a wider range of data sources and improving the system’s ability to express nuanced sentiments effectively, enhancing its usefulness in diverse real-world scenarios. Full article
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence)
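The retrieve-then-prompt flow behind a RAG-style sentiment annotator can be sketched briefly. The toy keyword-overlap retriever, the conversation template, and the call_model wrapper below are placeholders, not the LFEAR pipeline.

```python
# Sketch of a retrieve-then-prompt sentiment annotator. The retriever, prompt
# template and call_model() wrapper are illustrative placeholders only.
def retrieve(review: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank corpus passages by word overlap with the review."""
    words = set(review.lower().split())
    return sorted(corpus,
                  key=lambda p: -len(words & set(p.lower().split())))[:k]


def call_model(messages: list[dict]) -> str:
    """Hypothetical wrapper around the fine-tuned auto-regressive model."""
    raise NotImplementedError("wire this to the fine-tuned model")


def annotate_sentiment(review: str, corpus: list[str]) -> str:
    """Build a conversation-format prompt from retrieved context and the review."""
    context = "\n".join(retrieve(review, corpus))
    messages = [
        {"role": "system",
         "content": "Label financial-service reviews as positive, negative or "
                    "neutral, using the retrieved context when relevant."},
        {"role": "user", "content": f"Context:\n{context}\n\nReview:\n{review}"},
    ]
    return call_model(messages)
```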

14 pages, 4102 KB  
Article
Electric Vehicle Sentiment Analysis Using Large Language Models
by Hemlata Sharma, Faiz Ud Din and Bayode Ogunleye
Analytics 2024, 3(4), 425-438; https://doi.org/10.3390/analytics3040023 - 1 Nov 2024
Cited by 5 | Viewed by 4879
Abstract
Sentiment analysis is a technique used to understand the public’s opinion towards an event, product, or organization. For example, sentiment analysis can be used to understand positive or negative opinions or attitudes towards electric vehicle (EV) brands. This provides companies with valuable insight into the public’s opinion of their products and brands. In the field of natural language processing (NLP), transformer models have shown great performance compared to traditional machine learning algorithms. However, these models have not been explored extensively in the EV domain. EV companies are becoming significant competitors in the automotive industry and are projected to cover up to 30% of the United States light vehicle market by 2030. In this study, we present a comparative study of large language models (LLMs) including bidirectional encoder representations from transformers (BERT), robustly optimised BERT (RoBERTa), and a generalised autoregressive pre-training method (XLNet) using Lucid Motors and Tesla Motors YouTube datasets. Results evidenced that LLMs like BERT and its variants are effective off-the-shelf algorithms for sentiment analysis, especially when fine-tuned. Furthermore, our findings highlight the need for domain adaptation whilst utilizing LLMs. Finally, the experimental results showed that RoBERTa achieved consistent performance across the EV datasets with an F1 score of at least 92%. Full article
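The evaluation pattern (running fine-tuned checkpoints over the same labelled comments and comparing F1 scores) can be sketched as below. The checkpoint name and the two example comments are placeholders; the study's data are YouTube comments about Lucid Motors and Tesla Motors, scored with BERT, RoBERTa, and XLNet.

```python
# Sketch of comparing sentiment checkpoints on the same labelled comments with a
# macro F1 score. The checkpoint and example comments are placeholders standing
# in for the study's fine-tuned BERT/RoBERTa/XLNet models and YouTube data.
from sklearn.metrics import f1_score
from transformers import pipeline

comments = ["Range and build quality look fantastic", "The charging network is a mess"]
gold = ["POSITIVE", "NEGATIVE"]

for checkpoint in ["distilbert-base-uncased-finetuned-sst-2-english"]:
    clf = pipeline("sentiment-analysis", model=checkpoint)
    preds = [out["label"] for out in clf(comments)]
    score = f1_score(gold, preds, average="macro", labels=["POSITIVE", "NEGATIVE"])
    print(checkpoint, round(score, 3))
```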

19 pages, 386 KB  
Article
A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning
by Minhyeok Lee
Mathematics 2023, 11(11), 2451; https://doi.org/10.3390/math11112451 - 25 May 2023
Cited by 20 | Viewed by 14193
Abstract
In this paper, we present a rigorous mathematical examination of generative pre-trained transformer (GPT) models and their autoregressive self-supervised learning mechanisms. We begin by defining natural language space and knowledge space, which are two key concepts for understanding the dimensionality reduction process in GPT-based large language models (LLMs). By exploring projection functions and their inverses, we establish a framework for analyzing the language generation capabilities of these models. We then investigate the GPT representation space, examining its implications for the models’ approximation properties. Finally, we discuss the limitations and challenges of GPT models and their learning mechanisms, considering trade-offs between complexity and generalization, as well as the implications of incomplete inverse projection functions. Our findings demonstrate that GPT models possess the capability to encode knowledge into low-dimensional vectors through their autoregressive self-supervised learning mechanism. This comprehensive analysis provides a solid mathematical foundation for future advancements in GPT-based LLMs, promising advancements in natural language processing tasks such as language translation, text summarization, and question answering due to improved understanding and optimization of model training and performance. Full article
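For reference, the autoregressive self-supervised objective the analysis builds on is the standard next-token factorization shown below; the paper's own notation for natural language space, knowledge space, and projection functions is not reproduced here.

```latex
% Standard autoregressive factorization and next-token training objective.
\[
  p_\theta(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p_\theta\!\left(x_t \mid x_{<t}\right),
  \qquad
  \mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right).
\]
```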
