Search Results (1,364)

Search Parameters:
Keywords = natural language generation

24 pages, 6260 KiB  
Article
Transforming Product Discovery and Interpretation Using Vision–Language Models
by Simona-Vasilica Oprea and Adela Bâra
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 191; https://doi.org/10.3390/jtaer20030191 - 1 Aug 2025
Abstract
In this work, the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce is investigated, focusing on two complementary models: ColQwen2 (vidore/colqwen2-v1.0) and ColPali (vidore/colpali-v1.2-hf). These models are integrated into two architectures and evaluated across various product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval based on natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, demonstrates strong performance, allowing end-to-end image querying and text response synthesis. It excels at identifying attributes such as brand, color or usage based solely on product images and responds fluently to user questions. In contrast, ColPali, which utilizes the PaliGemma backbone, is optimized for explainability. It delivers detailed visual-token alignment maps that reveal how specific regions of an image contribute to retrieval decisions, offering transparency ideal for diagnostics or educational applications. Through comparative experiments using footwear imagery, it is demonstrated that ColQwen2 is highly effective in generating accurate responses to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability. Full article

13 pages, 564 KiB  
Article
Enhanced Semantic Retrieval with Structured Prompt and Dimensionality Reduction for Big Data
by Donghyeon Kim, Minki Park, Jungsun Lee, Inho Lee, Jeonghyeon Jin and Yunsick Sung
Mathematics 2025, 13(15), 2469; https://doi.org/10.3390/math13152469 - 31 Jul 2025
Abstract
The exponential increase in textual data generated across sectors such as healthcare, finance, and smart manufacturing has intensified the need for effective Big Data analytics. Large language models (LLMs) have become critical tools because of their advanced language processing capabilities. However, their static nature limits their ability to incorporate real-time and domain-specific knowledge. Retrieval-augmented generation (RAG) addresses these limitations by enriching LLM outputs through external content retrieval. Nevertheless, traditional RAG systems remain inefficient, often exhibiting high retrieval latency, redundancy, and diminished response quality when scaled to large datasets. This paper proposes an innovative structured RAG framework specifically designed for large-scale Big Data analytics. The framework transforms unstructured partial prompts into structured semantically coherent partial prompts, leveraging element-specific embedding models and dimensionality reduction techniques, such as principal component analysis. To further improve the retrieval accuracy and computational efficiency, we introduce a multi-level filtering approach integrating semantic constraints and redundancy elimination. In the experiments, the proposed method was compared with structured-format RAG. After generating prompts utilizing two methods, silhouette scores were computed to assess the quality of embedding clusters. The proposed method outperformed the baseline by improving the clustering quality by 32.3%. These results demonstrate the effectiveness of the framework in enhancing LLMs for accurate, diverse, and efficient decision-making in complex Big Data environments. Full article
(This article belongs to the Special Issue Big Data Analysis, Computing and Applications)
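The abstract above evaluates prompt embeddings by reducing their dimensionality (e.g., with PCA) and scoring the resulting clusters with silhouette scores. A minimal pure-NumPy sketch of that evaluation loop, with invented toy embeddings standing in for the paper's prompt vectors:

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def mean_silhouette(X, labels):
    """Mean silhouette score: (b - a) / max(a, b) averaged over samples, where
    a is the mean intra-cluster distance and b the nearest other-cluster mean."""
    scores = []
    for i, li in enumerate(labels):
        d = np.linalg.norm(X - X[i], axis=1)
        same = d[labels == li]
        a = same.sum() / max(len(same) - 1, 1)  # exclude the point itself
        b = min(d[labels == lj].mean() for lj in set(labels) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Toy "prompt embeddings": two well-separated groups in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 50)), rng.normal(5, 0.1, (5, 50))])
labels = np.array([0] * 5 + [1] * 5)
score = mean_silhouette(pca_reduce(X, 2), labels)   # close to 1.0 here
```

Silhouette values lie in [-1, 1]; the paper's reported 32.3% improvement refers to exactly this kind of cluster-quality comparison between prompt-construction methods.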

34 pages, 2642 KiB  
Article
Strengths and Weaknesses of LLM-Based and Rule-Based NLP Technologies and Their Potential Synergies
by Nikitas N. Karanikolas, Eirini Manga, Nikoletta Samaridi, Vaios Stergiopoulos, Eleni Tousidou and Michael Vassilakopoulos
Electronics 2025, 14(15), 3064; https://doi.org/10.3390/electronics14153064 - 31 Jul 2025
Abstract
Large Language Models (LLMs) have been the cutting-edge technology in natural language processing (NLP) in recent years, making machine-generated text indistinguishable from human-generated text. On the other hand, “rule-based” Natural Language Generation (NLG) and Natural Language Understanding (NLU) algorithms were developed in earlier years and have performed well in certain areas of NLP. Today, an arduous task that arises is estimating the quality of the produced text. This process depends on the aspects of the text to be assessed, ranging from correct grammar and syntax to more intriguing aspects such as coherence and semantic fluency. Although the performance of LLMs is high, the challenge is whether LLMs can cooperate with rule-based NLG/NLU technology, leveraging the latter's assets to overcome LLMs' weak points. This paper presents the basics of these two families of technologies and the applications, strengths, and weaknesses of each approach; analyzes the different ways of evaluating machine-generated text; and, lastly, focuses on a first-level approach to possible combinations of the two approaches to enhance performance in specific tasks. Full article

18 pages, 919 KiB  
Article
Timing of Intervals Between Utterances in Typically Developing Infants and Infants Later Diagnosed with Autism Spectrum Disorder
by Zahra Poursoroush, Gordon Ramsay, Ching-Chi Yang, Eugene H. Buder, Edina R. Bene, Pumpki Lei Su, Hyunjoo Yoo, Helen L. Long, Cheryl Klaiman, Moira L. Pileggi, Natalie Brane and D. Kimbrough Oller
Brain Sci. 2025, 15(8), 819; https://doi.org/10.3390/brainsci15080819 - 30 Jul 2025
Abstract
Background: Understanding the origin and natural organization of early infant vocalizations is important for predicting communication and language abilities in later years. The very frequent production of speech-like vocalizations (hereafter “protophones”), occurring largely independently of interaction, is part of this developmental process. Objectives: This study aims to investigate the gap durations (time intervals) between protophones, comparing typically developing (TD) infants and infants later diagnosed with autism spectrum disorder (ASD) in a naturalistic setting where endogenous protophones occur frequently. Additionally, we explore potential age-related variations and sex differences in gap durations. Methods: We analyzed ~1500 five min recording segments from longitudinal all-day home recordings of 147 infants (103 TD infants and 44 autistic infants) during their first year of life. The data included over 90,000 infant protophones. Human coding was employed to ensure maximally accurate timing data. This method included the human judgment of gap durations specified based on time-domain and spectrographic displays. Results and Conclusions: Short gap durations occurred between protophones produced by infants, with a mode between 301 and 400 ms, roughly the length of an infant syllable, across all diagnoses, sex, and age groups. However, we found significant differences in the gap duration distributions between ASD and TD groups when infant-directed speech (IDS) was relatively frequent, as well as across age groups and sexes. The Generalized Linear Modeling (GLM) results confirmed these findings and revealed longer gap durations associated with higher IDS, female sex, older age, and TD diagnosis. Age-related differences and sex differences were highly significant for both diagnosis groups. Full article
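The central measurement in this study is the gap duration between consecutive utterances, binned in 100 ms steps (the reported mode falls between 301 and 400 ms). A small illustrative sketch of that computation; the utterance timestamps here are invented, not the study's data:

```python
from collections import Counter

def gap_durations(utterances):
    """Gaps (ms) between consecutive utterances, given (onset_ms, offset_ms) pairs."""
    utterances = sorted(utterances)
    return [nxt[0] - cur[1]
            for cur, nxt in zip(utterances, utterances[1:])
            if nxt[0] > cur[1]]          # skip overlapping vocalizations

def modal_bin(gaps, width=100):
    """Most frequent gap bin as an inclusive (lo_ms, hi_ms) range, e.g. (301, 400)."""
    counts = Counter((g - 1) // width for g in gaps)
    b, _ = counts.most_common(1)[0]
    return b * width + 1, (b + 1) * width

gaps = gap_durations([(0, 100), (150, 300), (650, 700)])   # -> [50, 350]
```

The study's distributional comparisons (by diagnosis, sex, and age) would then be run over histograms of exactly these per-infant gap lists.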

16 pages, 2431 KiB  
Article
AppHerb: Language Model for Recommending Traditional Thai Medicine
by Thanawat Piyasawetkul, Suppachai Tiyaworanant and Tarapong Srisongkram
AI 2025, 6(8), 170; https://doi.org/10.3390/ai6080170 - 29 Jul 2025
Abstract
Trust in Traditional Thai Medicine (TTM) among Thai people has been reduced due to a lack of objective standards and the susceptibility of the general population to false information. The emergence of generative artificial intelligence (Gen AI) has significantly impacted various industries, including traditional medicine. However, previous Gen AI models have primarily focused on prescription generation based on Traditional Chinese Medicine (TCM), leaving TTM unexplored. To address this gap, we propose a novel fast-learning fine-tuned language model fortified with TTM knowledge. We utilized textual data from two TTM textbooks, Wat Ratcha-orasaram Ratchaworawihan (WRO), and Tamra Osot Phra Narai (NR), to fine-tune Unsloth’s Gemma-2 with 9 billion parameters. We developed two specialized TTM tasks: treatment prediction (TrP) and herbal recipe generation (HRG). The TrP and HRG models achieved precision, recall, and F1 scores of 26.54%, 28.14%, and 24.00%, and 32.51%, 24.42%, and 24.84%, respectively. Performance evaluation against TCM-based generative models showed comparable precision, recall, and F1 results with a smaller knowledge corpus. We further addressed the challenges of utilizing Thai, a low-resource and linguistically complex language. Unlike English or Chinese, Thai lacks explicit sentence boundary markers and employs an abugida writing system without spaces between words, complicating text segmentation and generation. These characteristics pose significant difficulties for machine understanding and limit model accuracy. Despite these obstacles, our work establishes a foundation for further development of AI-assisted TTM applications and highlights both the opportunities and challenges in applying language models to traditional medicine knowledge systems in Thai language contexts. Full article
(This article belongs to the Section Medical & Healthcare AI)
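The precision, recall, and F1 figures quoted above are the standard set-overlap metrics between generated and reference outputs (e.g., herbs in a recipe). A minimal sketch, with illustrative item lists rather than the paper's data:

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and F1 between a generated item list
    (e.g. herbs in a generated recipe) and the reference list."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                       # items both lists agree on
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Two of three predicted items match the reference -> P = R = F1 = 2/3.
p, r, f1 = precision_recall_f1(["a", "b", "c"], ["b", "c", "d"])
```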

23 pages, 2002 KiB  
Article
Precision Oncology Through Dialogue: AI-HOPE-RTK-RAS Integrates Clinical and Genomic Insights into RTK-RAS Alterations in Colorectal Cancer
by Ei-Wen Yang, Brigette Waldrup and Enrique Velazquez-Villarreal
Biomedicines 2025, 13(8), 1835; https://doi.org/10.3390/biomedicines13081835 - 28 Jul 2025
Abstract
Background/Objectives: The RTK-RAS signaling cascade is a central axis in colorectal cancer (CRC) pathogenesis, governing cellular proliferation, survival, and therapeutic resistance. Somatic alterations in key pathway genes—including KRAS, NRAS, BRAF, and EGFR—are pivotal to clinical decision-making in precision oncology. However, the integration of these genomic events with clinical and demographic data remains hindered by fragmented resources and a lack of accessible analytical frameworks. To address this challenge, we developed AI-HOPE-RTK-RAS, a domain-specialized conversational artificial intelligence (AI) system designed to enable natural language-based, integrative analysis of RTK-RAS pathway alterations in CRC. Methods: AI-HOPE-RTK-RAS employs a modular architecture combining large language models (LLMs), a natural language-to-code translation engine, and a backend analytics pipeline operating on harmonized multi-dimensional datasets from cBioPortal. Unlike general-purpose AI platforms, this system is purpose-built for real-time exploration of RTK-RAS biology within CRC cohorts. The platform supports mutation frequency profiling, odds ratio testing, survival modeling, and stratified analyses across clinical, genomic, and demographic parameters. Validation included reproduction of known mutation trends and exploratory evaluation of co-alterations, therapy response, and ancestry-specific mutation patterns. Results: AI-HOPE-RTK-RAS enabled rapid, dialogue-driven interrogation of CRC datasets, confirming established patterns and revealing novel associations with translational relevance. Among early-onset CRC (EOCRC) patients, the prevalence of RTK-RAS alterations was significantly lower compared to late-onset disease (67.97% vs. 79.9%; OR = 0.534, p = 0.014), suggesting the involvement of alternative oncogenic drivers. In KRAS-mutant patients receiving Bevacizumab, early-stage disease (Stages I–III) was associated with superior overall survival relative to Stage IV (p = 0.0004). In contrast, BRAF-mutant tumors with microsatellite-stable (MSS) status displayed poorer prognosis despite higher chemotherapy exposure (OR = 7.226, p < 0.001; p = 0.0000). Among EOCRC patients treated with FOLFOX, RTK-RAS alterations were linked to worse outcomes (p = 0.0262). The system also identified ancestry-enriched noncanonical mutations—including CBL, MAPK3, and NF1—with NF1 mutations significantly associated with improved prognosis (p = 1 × 10⁻⁵). Conclusions: AI-HOPE-RTK-RAS exemplifies a new class of conversational AI platforms tailored to precision oncology, enabling integrative, real-time analysis of clinically and biologically complex questions. Its ability to uncover both canonical and ancestry-specific patterns in RTK-RAS dysregulation—especially in EOCRC and populations with disproportionate health burdens—underscores its utility in advancing equitable, personalized cancer care. This work demonstrates the translational potential of domain-optimized AI tools to accelerate biomarker discovery, support therapeutic stratification, and democratize access to multi-omic analysis. Full article
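The odds ratios reported above (e.g., OR = 0.534, OR = 7.226) come from standard 2x2 contingency-table testing. A minimal sketch of that calculation with a Wald confidence interval; the cell counts below are invented for illustration, not drawn from this cohort:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]], with a Haldane-Anscombe
    0.5 correction for empty cells, plus a Wald 95% confidence interval."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: 10/20 altered vs. 30/15 unaltered -> OR = 0.25.
or_, lo, hi = odds_ratio_ci(10, 20, 30, 15)
```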

50 pages, 9419 KiB  
Review
A Survey of Loss Functions in Deep Learning
by Caiyi Li, Kaishuai Liu and Shuai Liu
Mathematics 2025, 13(15), 2417; https://doi.org/10.3390/math13152417 - 27 Jul 2025
Abstract
Deep learning (DL), as a cutting-edge technology in artificial intelligence, has significantly impacted fields such as computer vision and natural language processing. The loss function determines the convergence speed and accuracy of a DL model and has a crucial impact on algorithm quality and model performance. However, most existing studies focus on improving loss functions for specific problems and lack a systematic summary and comparison, especially in computer vision and natural language processing tasks. Therefore, this paper reclassifies and summarizes the loss functions in DL and proposes a new category of metric loss. Furthermore, this paper conducts a fine-grained division of regression loss, classification loss, and metric loss, elaborating on existing problems and improvements. Finally, the new trends of compound loss and generative loss are anticipated. This paper provides a new perspective on loss function division and a systematic reference for researchers in the DL field. Full article
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
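The two loss families the survey divides first, regression loss and classification loss, can be sketched with their canonical members. A minimal NumPy illustration (the survey covers many more variants):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error: the canonical regression loss."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def softmax_cross_entropy(logits, target):
    """Cross-entropy for one sample: the canonical classification loss,
    computed in log-space for numerical stability."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                          # stabilize the exponentials
    log_probs = z - np.log(np.exp(z).sum())  # log-softmax
    return float(-log_probs[target])

loss_reg = mse([1.0, 2.0], [1.0, 4.0])            # -> 2.0
loss_cls = softmax_cross_entropy([0.0, 0.0], 0)   # -> log(2)
```

Metric losses (the category the paper proposes) instead score distances between embeddings, e.g., contrastive or triplet objectives built from the same primitives.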

17 pages, 8512 KiB  
Article
Interactive Holographic Display System Based on Emotional Adaptability and CCNN-PCG
by Yu Zhao, Zhong Xu, Ting-Yu Zhang, Meng Xie, Bing Han and Ye Liu
Electronics 2025, 14(15), 2981; https://doi.org/10.3390/electronics14152981 - 26 Jul 2025
Abstract
Against the backdrop of the rapid advancement of intelligent speech interaction and holographic display technologies, this paper introduces an interactive holographic display system. The system applies 2D-to-3D conversion during acquisition and uses a Complex-valued Convolutional Neural Network Point Cloud Gridding (CCNN-PCG) algorithm to generate a computer-generated hologram (CGH) with depth information from the point cloud data. When building the digital-human hologram, 2D-to-3D conversion yields high-precision point cloud data. The system uses ChatGLM for natural language processing and emotion-adaptive responses, enabling multi-turn voice dialogs and text-driven model generation. The CCNN-PCG algorithm reduces computational complexity and improves display quality. Simulations and experiments show that CCNN-PCG enhances reconstruction quality and speeds up computation by over 2.2 times. This research provides a theoretical framework and practical technology for holographic interactive systems, applicable to virtual assistants, educational displays, and other fields. Full article
(This article belongs to the Special Issue Artificial Intelligence, Computer Vision and 3D Display)

23 pages, 8564 KiB  
Article
VisRep: Towards an Automated, Reflective AI System for Documenting Visualisation Design Processes
by Aron E. Owen and Jonathan C. Roberts
Mach. Learn. Knowl. Extr. 2025, 7(3), 72; https://doi.org/10.3390/make7030072 - 25 Jul 2025
Abstract
VisRep (Visualisation Report) is an AI-powered system for capturing and structuring the early stages of the visualisation design process. It addresses a critical gap in predesign: the lack of tools that can naturally record, organise, and transform raw ideation, spoken thoughts, sketches, and evolving concepts into polished, shareable outputs. Users engage in talk-aloud sessions through a terminal-style interface supported by intelligent transcription and eleven structured questions that frame intent, audience, and output goals. These inputs are then processed by a large language model (LLM) guided by markdown-based output templates for reports, posters, and slides. The system aligns free-form ideas with structured communication using prompt engineering to ensure clarity, coherence, and visual consistency. VisRep not only automates the generation of professional deliverables but also enhances reflective practice by bridging spontaneous ideation and structured documentation. This paper introduces VisRep’s methodology, interface design, and AI-driven workflow, demonstrating how it improves the fidelity and transparency of the visualisation design process across academic, professional, and creative domains. Full article
(This article belongs to the Section Visualization)
18 pages, 516 KiB  
Article
A Nested Named Entity Recognition Model Robust in Few-Shot Learning Environments Using Label Description Information
by Hyunsun Hwang, Youngjun Jung, Changki Lee and Wooyoung Go
Appl. Sci. 2025, 15(15), 8255; https://doi.org/10.3390/app15158255 - 24 Jul 2025
Abstract
Nested named entity recognition (NER) is a task that identifies hierarchically structured entities, where one entity can contain other entities within its span. This study introduces a nested NER model for few-shot learning environments, addressing the difficulty of building extensive datasets for general named entities. We enhance the Biaffine nested NER model by modifying its output layer to incorporate label semantic information through a novel label description embedding (LDE) approach, improving performance with limited training data. Our method replaces the traditional biaffine classifier with a label attention mechanism that leverages comprehensive natural language descriptions of entity types, encoded using BERT to capture rich semantic relationships between labels and input spans. We conducted comprehensive experiments on four benchmark datasets: GENIA (nested NER), ACE 2004 (nested NER), ACE 2005 (nested NER), and CoNLL 2003 English (flat NER). Performance was evaluated across multiple few-shot scenarios (1-shot, 5-shot, 10-shot, and 20-shot) using F1-measure as the primary metric, with five different random seeds to ensure robust evaluation. We compared our approach against strong baselines including BERT-LSTM-CRF with nested tags, the original Biaffine model, and recent few-shot NER methods (FewNER, FIT, LPNER, SpanNER). Results demonstrate significant improvements across all few-shot scenarios. On GENIA, our LDE model achieves 45.07% F1 in 5-shot learning compared to 30.74% for the baseline Biaffine model (46.4% relative improvement). On ACE 2005, we obtain 44.24% vs. 32.38% F1 in 5-shot scenarios (36.6% relative improvement). The model shows consistent gains in 10-shot (57.19% vs. 49.50% on ACE 2005) and 20-shot settings (64.50% vs. 58.21% on ACE 2005). Ablation studies confirm that semantic information from label descriptions is the key factor enabling robust few-shot performance. Transfer learning experiments demonstrate the model’s ability to leverage knowledge from related domains. Our findings suggest that incorporating label semantic information can substantially enhance NER models in low-resource settings, opening new possibilities for applying NER in specialized domains or languages with limited annotated data. Full article
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)
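The label attention mechanism this abstract describes scores each candidate span against an embedding of every label's natural-language description. A minimal NumPy sketch of the scoring step; the paper uses BERT encodings, whereas the vectors here are toy placeholders:

```python
import numpy as np

def label_attention(span_vecs, label_desc_vecs):
    """Score candidate spans against label-description embeddings: dot-product
    logits followed by a softmax over labels, replacing a biaffine classifier."""
    logits = span_vecs @ label_desc_vecs.T
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)               # (spans, labels)

# Toy vectors: each span aligns with one of two label descriptions.
spans = np.array([[4.0, 0.0], [0.0, 4.0]])
label_descs = np.eye(2)
probs = label_attention(spans, label_descs)
```

Because labels are represented by description text rather than learned one-hot weights, unseen or rare labels still get meaningful scores, which is the source of the few-shot gains.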

17 pages, 609 KiB  
Article
GPT-Based Text-to-SQL for Spatial Databases
by Hui Wang, Li Guo, Yubin Liang, Le Liu and Jiajin Huang
ISPRS Int. J. Geo-Inf. 2025, 14(8), 288; https://doi.org/10.3390/ijgi14080288 - 24 Jul 2025
Abstract
Text-to-SQL for spatial databases enables the translation of natural language questions into corresponding SQL queries, allowing non-experts to easily access spatial data, which has gained increasing attention from researchers. Previous research has primarily focused on rule-based methods. However, these methods have limitations when dealing with complicated or unknown natural language questions. While advanced machine learning models can be trained, they typically require large labeled training datasets, which are severely lacking for spatial databases. Recently, Generative Pre-Trained Transformer (GPT) models have emerged as a promising paradigm for Text-to-SQL tasks in relational databases, driven by carefully designed prompts. In response to the severe lack of datasets for spatial databases, we have created a publicly available dataset that supports both English and Chinese. Furthermore, we propose a GPT-based method to construct prompts for spatial databases, which incorporates geographic and spatial database knowledge into the prompts and requires only a small number of training samples, such as 1, 3, or 5 examples. Extensive experiments demonstrate that incorporating geographic and spatial database knowledge into prompts improves the accuracy of Text-to-SQL tasks for spatial databases. Our proposed method can help non-experts access spatial databases more easily and conveniently. Full article
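The prompt-construction idea above, schema plus geographic/spatial-function knowledge plus 1, 3, or 5 examples, can be sketched as a simple assembler. The section headers, schema, and example pair below are hypothetical illustrations, not the paper's actual prompt format:

```python
def build_spatial_prompt(schema, hints, examples, question):
    """Assemble a few-shot Text-to-SQL prompt for a spatial database.
    `hints` carries geographic / spatial-function knowledge; `examples`
    is a short list of (question, sql) pairs (1, 3, or 5 shots)."""
    parts = ["### Schema", schema, "### Spatial knowledge", hints]
    for i, (q, sql) in enumerate(examples, 1):
        parts += [f"### Example {i}", f"Question: {q}", f"SQL: {sql}"]
    parts += ["### Task", f"Question: {question}", "SQL:"]
    return "\n".join(parts)

prompt = build_spatial_prompt(
    "CREATE TABLE city (name TEXT, geom GEOMETRY);",
    "Use ST_Distance for proximity queries.",
    [("How many cities are there?", "SELECT COUNT(*) FROM city;")],
    "Name every city.")
```

The trailing "SQL:" leaves the completion point for the GPT model; swapping `hints` in and out is how the paper measures the effect of injected geographic knowledge.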

20 pages, 2786 KiB  
Article
Inverse Kinematics-Augmented Sign Language: A Simulation-Based Framework for Scalable Deep Gesture Recognition
by Binghao Wang, Lei Jing and Xiang Li
Algorithms 2025, 18(8), 463; https://doi.org/10.3390/a18080463 - 24 Jul 2025
Abstract
In this work, we introduce IK-AUG, a unified algorithmic framework for kinematics-driven data augmentation tailored to sign language recognition (SLR). Departing from traditional augmentation techniques that operate at the pixel or feature level, our method integrates inverse kinematics (IK) and virtual simulation to synthesize anatomically valid gesture sequences within a structured 3D environment. The proposed system begins with sparse 3D keypoints extracted via a pose estimator and projects them into a virtual coordinate space. A differentiable IK solver based on forward-and-backward constrained optimization is then employed to reconstruct biomechanically plausible joint trajectories. To emulate natural signer variability and enhance data richness, we define a set of parametric perturbation operators spanning spatial displacement, depth modulation, and solver sensitivity control. These operators are embedded into a generative loop that transforms each original gesture sample into a diverse sequence cluster, forming a high-fidelity augmentation corpus. We benchmark our method across five deep sequence models (CNN3D, TCN, Transformer, Informer, and Sparse Transformer) and observe consistent improvements in accuracy and convergence. Notably, Informer achieves 94.1% validation accuracy with IK-AUG enhanced training, underscoring the framework’s efficacy. These results suggest that algorithmic augmentation via kinematic modeling offers a scalable, annotation free pathway for improving SLR systems and lays the foundation for future integration with multi-sensor inputs in hybrid recognition pipelines. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
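Of the parametric perturbation operators the abstract lists, the two that act directly on keypoints can be sketched simply. This is an illustrative reading, not the authors' implementation; parameter names and magnitudes are assumptions:

```python
import random

def perturb_keypoints(keypoints, shift=0.01, depth_scale=0.05, rng=None):
    """Apply two of the perturbation operators described above to (x, y, z)
    keypoints: per-point spatial displacement and a shared depth modulation.
    (Solver-sensitivity control would act on the IK solver itself, not here.)"""
    rng = rng or random.Random(0)
    s = 1.0 + rng.uniform(-depth_scale, depth_scale)   # one depth factor per sample
    out = []
    for x, y, z in keypoints:
        out.append((x + rng.uniform(-shift, shift),    # small spatial jitter
                    y + rng.uniform(-shift, shift),
                    z * s))                            # depth modulation
    return out
```

Each perturbed keypoint set would then be fed back through the IK solver, so the augmented gestures remain anatomically valid rather than merely noisy.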

16 pages, 1143 KiB  
Article
AI-Driven Automated Test Generation Framework for VCU: A Multidimensional Coupling Approach Integrating Requirements, Variables and Logic
by Guangyao Wu, Xiaoming Xu and Yiting Kang
World Electr. Veh. J. 2025, 16(8), 417; https://doi.org/10.3390/wevj16080417 - 24 Jul 2025
Abstract
This paper proposes an AI-driven automated test generation framework for vehicle control units (VCUs), integrating natural language processing (NLP) and dynamic variable binding. To address the critical limitation of traditional AI-generated test cases lacking executable variables, the framework establishes a closed-loop transformation from requirements to executable code through a five-layer architecture: (1) structured parsing of PDF requirements using domain-adaptive prompt engineering; (2) construction of a multidimensional variable knowledge graph; (3) semantic atomic decomposition of requirements and logic expression generation; (4) dynamic visualization of cause–effect graphs; (5) path-sensitization-driven optimization of test sequences. Validated on VCU software from a leading OEM, the method achieves 97.3% variable matching accuracy and 100% test case executability, reducing invalid cases by 63% compared to conventional NLP approaches. This framework provides an explainable and traceable automated solution for intelligent vehicle software validation, significantly enhancing efficiency and reliability in automotive testing. Full article
(This article belongs to the Special Issue Intelligent Electric Vehicle Control, Testing and Evaluation)

14 pages, 1129 KiB  
Article
Entropy-Guided KV Caching for Efficient LLM Inference
by Heekyum Kim and Yuchul Jung
Mathematics 2025, 13(15), 2366; https://doi.org/10.3390/math13152366 - 23 Jul 2025
Abstract
Large language models (LLMs), built upon Transformer architectures, have demonstrated remarkable performance in a wide range of natural language processing tasks. However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associated with managing the key–value (KV) cache during inference. Optimizing this process is therefore crucial for improving LLM efficiency and scalability. In this study, we propose a novel entropy-guided KV caching strategy that leverages the distribution characteristics of attention scores within each Transformer layer. Specifically, we compute the entropy of attention weights for each head and use the average entropy of all heads within a layer to assess the layer’s contextual importance. Higher-entropy layers—those exhibiting broader attention dispersion—are allocated larger KV cache budgets, while lower-entropy (sink-like) layers are assigned smaller budgets. Instead of selecting different key–value tokens per head, our method selects a common set of important tokens per layer, based on aggregated attention scores, and caches them uniformly across all heads within the same layer. This design preserves the structural integrity of multi-head attention while enabling efficient token selection during the prefilling phase. The experimental results demonstrate that our approach improves cache utilization and inference speed without compromising generation quality. For example, on the Qwen3 4B model, our method reduces memory usage by 4.18% while preserving ROUGE score, and on Mistral 7B v0.1, it reduces decoding time by 46.6%, highlighting entropy-guided layer analysis as a principled mechanism for scalable long-context language modeling. Full article
(This article belongs to the Special Issue Mathematics and Applications)
19 pages, 460 KiB  
Article
Refining Text2Cypher on Small Language Model with Reinforcement Learning Leveraging Semantic Information
by Quoc-Bao-Huy Tran, Aagha Abdul Waheed, Syed Mudasir and Sun-Tae Chung
Appl. Sci. 2025, 15(15), 8206; https://doi.org/10.3390/app15158206 - 23 Jul 2025
Abstract
Text2Cypher is a text-to-text task that converts natural language questions into Cypher queries. Recent research by Neo4j on Text2Cypher demonstrates that fine-tuning a baseline language model (a pretrained and instruction-tuned generative model) using a comprehensive Text2Cypher dataset can effectively enhance query generation performance. However, the improvement is still insufficient for effectively learning the syntax and semantics of complex natural texts, particularly when applied to unseen Cypher schema structures across diverse domains during training. To address this challenge, we propose a novel refinement training method based on baseline language models, employing reinforcement learning with Group Relative Policy Optimization (GRPO). This method leverages extracted semantic information, such as key-value properties and triple relationships from input texts, during the training process. Experimental results of the proposed refinement training method applied to a small-scale baseline language model (SLM) such as Qwen2.5-3B-Instruct demonstrate that it achieves competitive execution accuracy scores on unseen schemas across various domains. Furthermore, the proposed method significantly outperforms most baseline LMs with larger parameter sizes in terms of Google-BLEU and execution accuracy scores over Neo4j's comprehensive Text2Cypher dataset, with the exception of very large LLMs such as GPT-4o, GPT-4o-mini, and Gemini. Full article
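The core of the GRPO training loop mentioned in the abstract is a group-relative advantage: several candidate Cypher queries are sampled per question, each is scored by a reward, and rewards are normalized within the group so no learned value function is needed. A minimal sketch of that normalization follows; the reward design and function name are illustrative assumptions, not the paper's code.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scalar rewards for one question's sampled Cypher queries
    (e.g., a mix of execution accuracy and a BLEU-style match score).
    Returns per-sample advantages standardized within the group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Candidates that outperform their group mean receive positive advantages and are reinforced; below-average candidates are suppressed, which is how GRPO refines the policy without a critic model.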
