Search Results (93)

Search Parameters:
Keywords = multilingual transformers

27 pages, 3641 KiB  
Article
TriagE-NLU: A Natural Language Understanding System for Clinical Triage and Intervention in Multilingual Emergency Dialogues
by Béatrix-May Balaban, Ioan Sacală and Alina-Claudia Petrescu-Niţă
Future Internet 2025, 17(7), 314; https://doi.org/10.3390/fi17070314 - 18 Jul 2025
Viewed by 156
Abstract
Telemedicine in emergency contexts presents unique challenges, particularly in multilingual and low-resource settings where accurate, clinical understanding and triage decision support are critical. This paper introduces TriagE-NLU, a novel multilingual natural language understanding system designed to perform both semantic parsing and clinical intervention classification from emergency dialogues. The system is built on a federated learning architecture to ensure data privacy and adaptability across regions and is trained using TriageX, a synthetic, clinically grounded dataset covering five languages (English, Spanish, Romanian, Arabic, and Mandarin). TriagE-NLU integrates fine-tuned multilingual transformers with a hybrid rules-and-policy decision engine, enabling it to parse structured medical information (symptoms, risk factors, temporal markers) and recommend appropriate interventions based on recognized patterns. Evaluation against strong multilingual baselines, including mT5, mBART, and XLM-RoBERTa, demonstrates superior performance by TriagE-NLU, achieving F1 scores of 0.91 for semantic parsing and 0.89 for intervention classification, along with 0.92 accuracy and a BLEU score of 0.87. These results validate the system’s robustness in multilingual emergency telehealth and its ability to generalize across diverse input scenarios. This paper establishes a new direction for privacy-preserving, AI-assisted triage systems. Full article
(This article belongs to the Section Big Data and Augmented Intelligence)
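
The system itself is described above only at a high level; as a rough, hypothetical illustration of the fine-tuned multilingual transformer component (not the authors' implementation), a minimal intervention-classification baseline with XLM-RoBERTa could look like the following sketch. The label set and the example utterance are invented for illustration.

# Minimal sketch: multilingual sequence classification as a stand-in for the
# intervention-classification component described in the abstract (assumed setup).
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

LABELS = ["monitor_at_home", "schedule_consultation", "dispatch_ambulance"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)

# Illustrative multilingual emergency utterance (Spanish:
# "My father has had chest pain and trouble breathing for an hour.")
text = "Mi padre tiene dolor en el pecho y dificultad para respirar desde hace una hora."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[int(logits.argmax(dim=-1))])  # untrained head: fine-tuning on triage dialogues is required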

27 pages, 1817 KiB  
Article
A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media
by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez
Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025
Viewed by 694
Abstract
The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content—especially across diverse languages—remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection. To facilitate robust hate speech detection across diverse languages, this study makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques—joint multilingual and translation-based approaches—for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF–IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
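
For readers who want to see what the classical end of this comparison looks like in code, here is a minimal sketch of a TF-IDF baseline with scikit-learn, assuming placeholder texts and labels rather than the study's 10,193 annotated tweets; the character n-gram configuration is one reasonable choice and not necessarily the one used in the paper.

# Illustrative TF-IDF baseline of the kind compared against transformer and LLM models.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Placeholder multilingual examples; the real dataset has 10,193 annotated tweets.
texts  = ["example hateful tweet", "ejemplo de tuit neutral", "another toy example", "otro ejemplo"]
labels = [1, 0, 1, 0]  # 1 = hate speech, 0 = not hate speech

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams work across scripts
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

preds = clf.predict(texts)
print("macro F1:", f1_score(labels, preds, average="macro"))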

37 pages, 618 KiB  
Systematic Review
Interaction, Artificial Intelligence, and Motivation in Children’s Speech Learning and Rehabilitation Through Digital Games: A Systematic Literature Review
by Chra Abdoulqadir and Fernando Loizides
Information 2025, 16(7), 599; https://doi.org/10.3390/info16070599 - 12 Jul 2025
Viewed by 480
Abstract
The integration of digital serious games into speech learning (rehabilitation) has demonstrated significant potential in enhancing accessibility and inclusivity for children with speech disabilities. This review of the state of the art examines the role of serious games, Artificial Intelligence (AI), and Natural Language Processing (NLP) in speech rehabilitation, with a particular focus on interaction modalities, engagement autonomy, and motivation. We have reviewed 45 selected studies. Our key findings show how intelligent tutoring systems, adaptive voice-based interfaces, and gamified speech interventions can empower children to engage in self-directed speech learning, reducing dependence on therapists and caregivers. The diversity of interaction modalities, including speech recognition, phoneme-based exercises, and multimodal feedback, demonstrates how AI and Assistive Technology (AT) can personalise learning experiences to accommodate diverse needs. Furthermore, the incorporation of gamification strategies, such as reward systems and adaptive difficulty levels, has been shown to enhance children’s motivation and long-term participation in speech rehabilitation. The gaps identified show that despite advancements, challenges remain in achieving universal accessibility, particularly regarding speech recognition accuracy, multilingual support, and accessibility for users with multiple disabilities. This review advocates for interdisciplinary collaboration across educational technology, special education, cognitive science, and human–computer interaction (HCI). Our work contributes to the ongoing discourse on lifelong inclusive education, reinforcing the potential of AI-driven serious games as transformative tools for bridging learning gaps and promoting speech rehabilitation beyond clinical environments. Full article

22 pages, 818 KiB  
Article
Towards Reliable Fake News Detection: Enhanced Attention-Based Transformer Model
by Jayanti Rout, Minati Mishra and Manob Jyoti Saikia
J. Cybersecur. Priv. 2025, 5(3), 43; https://doi.org/10.3390/jcp5030043 - 9 Jul 2025
Viewed by 671
Abstract
The widespread rise of misinformation across digital platforms has increased the demand for accurate and efficient Fake News Detection (FND) systems. This study introduces an enhanced transformer-based architecture for FND, developed through comprehensive ablation studies and empirical evaluations on multiple benchmark datasets. The proposed model combines improved multi-head attention, dynamic positional encoding, and a lightweight classification head to effectively capture nuanced linguistic patterns while maintaining computational efficiency. To ensure robust training, techniques such as label smoothing, learning rate warm-up, and reproducibility protocols were incorporated. The model demonstrates strong generalization across three diverse datasets, namely FakeNewsNet, ISOT, and LIAR, achieving an average accuracy of 79.85%. Specifically, it attains 80% accuracy on FakeNewsNet, 100% on ISOT, and 59.56% on LIAR. With just 3.1 to 4.3 million parameters, the model achieves an 85% reduction in size compared to full-sized BERT architectures. These results highlight the model’s effectiveness in balancing high accuracy with resource efficiency, making it suitable for real-world applications such as social media monitoring and automated fact-checking. Future work will explore multilingual extensions, cross-domain generalization, and integration with multimodal misinformation detection systems. Full article
(This article belongs to the Special Issue Cyber Security and Digital Forensics—2nd Edition)
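
Two of the training techniques mentioned above, label smoothing and learning-rate warm-up, are straightforward to express in PyTorch. The sketch below is a generic illustration under assumed hyperparameters (a smoothing factor of 0.1 and 500 warm-up steps), not the paper's exact configuration.

# Generic PyTorch illustration of label smoothing and linear learning-rate warm-up.
import torch
import torch.nn as nn

model = nn.Linear(768, 2)                              # stand-in for the transformer classification head
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # assumed smoothing factor
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 500                                     # assumed warm-up length
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

features = torch.randn(8, 768)                         # dummy batch of pooled embeddings
targets = torch.randint(0, 2, (8,))                    # dummy fake/real labels
loss = criterion(model(features), targets)
loss.backward()
optimizer.step()
scheduler.step()                                       # ramps the learning rate up over the first 500 steps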

20 pages, 4752 KiB  
Article
Designing an AI-Supported Framework for Literary Text Adaptation in Primary Classrooms
by Savvas A. Chatzichristofis, Alexandros Tsopozidis, Avgousta Kyriakidou-Zacharoudiou, Salomi Evripidou and Angelos Amanatiadis
AI 2025, 6(7), 150; https://doi.org/10.3390/ai6070150 - 8 Jul 2025
Viewed by 567
Abstract
Background/Objectives: This paper introduces a pedagogically grounded framework for transforming canonical literary texts in primary education through generative AI. Guided by multiliteracies theory, Vygotskian pedagogy, and epistemic justice, the system aims to enhance interpretive literacy, developmental alignment, and cultural responsiveness among learners aged 7–12. Methods: The proposed system enables educators to perform age-specific text simplification, visual re-narration, lexical reinvention, and multilingual augmentation through a suite of modular tools. Central to the design is the Ethical–Pedagogical Validation Layer (EPVL), a GPT-powered auditing module that evaluates AI-generated content across four normative dimensions: developmental appropriateness, cultural sensitivity, semantic fidelity, and ethical transparency. Results: The framework was fully implemented and piloted with primary educators (N = 8). The pilot demonstrated high usability, curricular alignment, and perceived value for classroom application. Unlike commercial Large Language Models (LLMs), the system requires no prompt engineering and supports editable, policy-aligned controls for normative localization. Conclusions: By embedding ethical evaluation within the generative loop, the framework fosters calibrated trust in human–AI collaboration and mitigates cultural stereotyping and ideological distortion. It advances a scalable, inclusive model for educator-centered AI integration, offering a new pathway for explainable and developmentally appropriate AI use in literary education. Full article
(This article belongs to the Special Issue AI Bias in the Media and Beyond)
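
The Ethical–Pedagogical Validation Layer is described only at the design level. A minimal, hypothetical sketch of a GPT-powered audit step, assuming the OpenAI Python client and a JSON rubric mirroring the four dimensions named above, might look as follows; the prompt wording, model choice, and scoring scale are illustrative assumptions rather than details from the paper.

# Hypothetical sketch of an EPVL-style audit call; not the authors' implementation.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = ["developmental appropriateness", "cultural sensitivity",
          "semantic fidelity", "ethical transparency"]

def audit_adaptation(original: str, adapted: str) -> dict:
    prompt = (
        "Rate the adapted text against the original on a 1-5 scale for each "
        f"dimension in {RUBRIC}. Reply with a JSON object mapping dimension to score.\n\n"
        f"Original:\n{original}\n\nAdapted:\n{adapted}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)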

15 pages, 1701 KiB  
Article
An Analysis of the Training Data Impact for Domain-Adapted Tokenizer Performances—The Case of Serbian Legal Domain Adaptation
by Miloš Bogdanović, Milena Frtunić Gligorijević, Jelena Kocić and Leonid Stoimenov
Appl. Sci. 2025, 15(13), 7491; https://doi.org/10.3390/app15137491 - 3 Jul 2025
Viewed by 457
Abstract
Various areas of natural language processing (NLP) have greatly benefited from the development of large language models in recent years. This research addresses the challenge of developing efficient tokenizers for transformer-based domain-specific language models. Tokenization efficiency within transformer-based models is directly related to model efficiency, which motivated the research we present in this paper. Our goal in this research was to demonstrate that the appropriate selection of data used for tokenizer training has a significant impact on tokenizer performance. Subsequently, we will demonstrate that efficient tokenizers and models can be developed even if language resources are limited. To do so, we will present a domain-adapted large language model tokenizer developed for masked language modeling of the Serbian legal domain. In this paper, we will present a comparison of the tokenization performance of the domain-adapted tokenizer in version 2 of the SrBERTa language model we developed against the performances of five other tokenizers belonging to state-of-the-art multilingual, Slavic or Serbian-specific models—XLM-RoBERTa (base-sized), BERTić, Jerteh-81, SrBERTa v1, NER4Legal_SRB. The comparison is performed using a test dataset consisting of 275,660 samples of legal texts written in the Cyrillic alphabet gathered from the Official Gazette of the Republic of Serbia. This dataset contains 197,134 distinct words, while the overall word count is 5,265,352. We will show that our tokenizer, trained upon a domain-adapted dataset, outperforms the presented tokenizers by between 4.5% and 54.62% in terms of the number of tokens generated for the whole test dataset. In terms of tokenizer fertility, we will show that our tokenizer outperforms the compared tokenizers by between 6.39% and 56.8%. Full article
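
Token count and fertility, the two comparison criteria reported above, can be measured with a few lines of Hugging Face code. The sketch below compares two public tokenizers on a toy Cyrillic sentence; the model identifiers and the sample text are illustrative stand-ins, not the paper's 275,660-sample legal test set.

# Illustrative fertility comparison for two public multilingual tokenizers.
from transformers import AutoTokenizer

# Toy Serbian Cyrillic sentence ("The Ministry of Finance published a new regulation in the Official Gazette.")
sample = "Министарство финансија објавило је нову уредбу у Службеном гласнику."

for name in ["xlm-roberta-base", "bert-base-multilingual-cased"]:  # swap in domain-adapted tokenizers here
    tok = AutoTokenizer.from_pretrained(name)
    tokens = tok.tokenize(sample)
    words = sample.split()
    fertility = len(tokens) / len(words)   # tokens generated per whitespace-delimited word
    print(f"{name}: {len(tokens)} tokens, fertility = {fertility:.2f}")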

24 pages, 2410 KiB  
Article
UA-HSD-2025: Multi-Lingual Hate Speech Detection from Tweets Using Pre-Trained Transformers
by Muhammad Ahmad, Muhammad Waqas, Ameer Hamza, Sardar Usman, Ildar Batyrshin and Grigori Sidorov
Computers 2025, 14(6), 239; https://doi.org/10.3390/computers14060239 - 18 Jun 2025
Cited by 1 | Viewed by 696
Abstract
The rise of social media has improved communication but has also amplified the spread of hate speech, creating serious societal risks. Automated detection remains difficult due to subjectivity, linguistic diversity, and implicit language. While prior research focuses on high-resource languages, this study addresses the underexplored multilingual challenges of Arabic and Urdu hate speech through a comprehensive approach. To achieve this objective, this study makes four key contributions. First, we created a unique multilingual, manually annotated binary and multi-class dataset (UA-HSD-2025) sourced from X, which contains the five most important multi-class categories of hate speech. Second, we developed detailed annotation guidelines to ensure a robust, high-quality hate speech dataset. Third, we explore two strategies to address the challenges of multilingual data: a joint multilingual approach and a translation-based approach. The translation-based approach involves converting all input text into a single target language before applying a classifier. In contrast, the joint multilingual approach employs a unified model trained to handle multiple languages simultaneously, enabling it to classify text across different languages without translation. Finally, we conducted 54 experiments employing machine learning models with TF-IDF features, deep learning models with pre-trained word embeddings such as FastText and GloVe, and pre-trained language models with advanced contextual embeddings. Based on the analysis of the results, our language-based model (XLM-R) outperformed traditional supervised learning approaches, achieving 0.99 accuracy in binary classification for the Arabic, Urdu, and joint multilingual datasets, and 0.95, 0.94, and 0.94 accuracy in multi-class classification for the joint multilingual, Arabic, and Urdu datasets, respectively. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
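
The two strategies contrasted above can be outlined concretely: the translation-based route converts every input to a pivot language before classification, while the joint route feeds all languages to a single multilingual classifier. The sketch below uses public Hugging Face pipelines as stand-ins; the translation model and both classifiers are illustrative assumptions, not the UA-HSD-2025 setup.

# Illustrative contrast between the translation-based and joint multilingual strategies.
from transformers import pipeline

# Strategy 1 (translation-based): map Arabic input to a pivot language, then classify.
translate_ar_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")
english_classifier = pipeline(
    "text-classification", model="distilbert-base-uncased-finetuned-sst-2-english"  # stand-in classifier
)

def classify_translation_based(arabic_text: str):
    english_text = translate_ar_en(arabic_text)[0]["translation_text"]
    return english_classifier(english_text)

# Strategy 2 (joint multilingual): one model handles all languages without translation;
# the paper fine-tunes XLM-R for this role, here an untuned stand-in is loaded.
joint_classifier = pipeline("text-classification", model="xlm-roberta-base")

def classify_joint(text: str):
    return joint_classifier(text)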

20 pages, 1955 KiB  
Article
Text Similarity Detection in Agglutinative Languages: A Case Study of Kazakh Using Hybrid N-Gram and Semantic Models
by Svitlana Biloshchytska, Arailym Tleubayeva, Oleksandr Kuchanskyi, Andrii Biloshchytskyi, Yurii Andrashko, Sapar Toxanov, Aidos Mukhatayev and Saltanat Sharipova
Appl. Sci. 2025, 15(12), 6707; https://doi.org/10.3390/app15126707 - 15 Jun 2025
Viewed by 577
Abstract
This study presents an advanced hybrid approach for detecting near-duplicate texts in the Kazakh language, addressing the specific challenges posed by its agglutinative morphology. The proposed method combines statistical and semantic techniques, including N-gram analysis, TF-IDF, LSH, LSA, and LDA, and is benchmarked against the bert-base-multilingual-cased model. Experiments were conducted on the purpose-built Arailym-aitu/KazakhTextDuplicates corpus, which contains over 25,000 manually modified text fragments using typical techniques, such as paraphrasing, word order changes, synonym substitution, and morphological transformations. The results show that the hybrid model achieves a precision of 1.00, a recall of 0.73, and an F1-score of 0.84, significantly outperforming traditional N-gram and TF-IDF approaches and demonstrating comparable accuracy to the BERT model while requiring substantially lower computational resources. The hybrid model proved highly effective in detecting various types of near-duplicate texts, including paraphrased and structurally modified content, making it suitable for practical applications in academic integrity verification, plagiarism detection, and intelligent text analysis. Moreover, this study highlights the potential of lightweight hybrid architectures as a practical alternative to large transformer-based models, particularly for languages with limited annotated corpora and linguistic resources. It lays the foundation for future research in cross-lingual duplicate detection and deep model adaptation for the Kazakh language. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
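
The statistical half of the hybrid pipeline (N-gram and TF-IDF similarity) can be illustrated with a short scikit-learn sketch. Character n-grams are used here because they tolerate the suffix changes typical of agglutinative morphology; the example strings and the 0.8 decision threshold are assumptions, not values from the paper.

# Illustrative N-gram/TF-IDF near-duplicate check for agglutinative text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy Kazakh pair: "The education system of the Republic of Kazakhstan is developing rapidly."
original   = "Қазақстан Республикасының білім беру жүйесі қарқынды дамып келеді."
paraphrase = "Білім беру жүйесі Қазақстан Республикасында қарқынды дамуда."  # word order and morphology changed

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
matrix = vectorizer.fit_transform([original, paraphrase])

similarity = cosine_similarity(matrix[0], matrix[1])[0, 0]
print(f"cosine similarity: {similarity:.2f}")
print("near-duplicate" if similarity > 0.8 else "distinct")  # threshold is an assumption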

21 pages, 1108 KiB  
Article
Transformer-Based Abstractive Summarization of Legal Texts in Low-Resource Languages
by Salman Masih, Mehdi Hassan, Labiba Gillani Fahad and Bilal Hassan
Electronics 2025, 14(12), 2320; https://doi.org/10.3390/electronics14122320 - 6 Jun 2025
Viewed by 1258
Abstract
The emergence of large language models (LLMs) has revolutionized the trajectory of NLP research. Transformer architectures built on attention mechanisms, together with increased computational power and massive datasets, have led to the emergence of pre-trained large language models (PLLMs), which offer promising possibilities for multilingual applications in low-resource settings. However, the scarcity of annotated resources and suitably pre-trained models continues to pose a significant hurdle for low-resource abstractive text summarization of legal texts, particularly in Urdu. This study presents a transfer learning approach using pre-trained multilingual large models (mBART and mT5 in its Small, Base, and Large variants) to generate abstractive summaries of Urdu legal texts. A curated dataset was developed with legal experts, who produced ground-truth summaries. The models were fine-tuned on this domain-specific corpus to adapt them for low-resource legal summarization. The experimental results demonstrated that the mT5-Large, fine-tuned on Urdu legal texts, outperforms all other evaluated models across standard summarization metrics, achieving a ROUGE-1 score of 0.7889, a ROUGE-2 score of 0.5961, and a ROUGE-L score of 0.7813. This indicates its strong capacity to generate fluent, coherent, and legally accurate summaries. The mT5-Base model closely follows with ROUGE-1 = 0.7774, while the mT5-Small shows moderate performance (ROUGE-1 = 0.6406), with reduced fidelity in capturing legal structure. The mBART50 model, despite being fine-tuned on the same legal corpus, performs worse (ROUGE-1 = 0.5914), revealing its relative limitations in this domain. Notably, models trained or fine-tuned on non-legal, out-of-domain data, such as the urT5 (ROUGE-1 = 0.3912), the mT5-XLSUM (ROUGE-1 = 0.0582), and the mBART50 (XLSUM) (ROUGE-1 = 0.0545), exhibit poor generalization to legal summaries, underscoring the necessity of domain adaptation when working in low-resource legal contexts. These findings highlight the effectiveness of fine-tuning multilingual LLMs for domain-specific tasks. The gains in legal summarization demonstrate the practical value of transfer learning in low-resource settings and the broader potential of AI-driven tools for legal document processing, information retrieval, and decision support. Full article
(This article belongs to the Section Artificial Intelligence)
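
As a concrete picture of the evaluation pipeline described above, the sketch below generates an abstractive summary with a public mT5 checkpoint and scores it with the rouge_score package. The checkpoint, the placeholder strings, and the generation settings are assumptions; the study fine-tuned mT5 variants on a curated Urdu legal corpus that is not reproduced here.

# Illustrative mT5 summarization and ROUGE evaluation (placeholder data, untuned checkpoint).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from rouge_score import rouge_scorer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

document = "..."           # an Urdu legal judgment would go here
reference = "..."          # the expert-written ground-truth summary

inputs = tokenizer("summarize: " + document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
print(scorer.score(reference, summary))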

19 pages, 1662 KiB  
Article
Scoring German Alternate Uses Items Applying Large Language Models
by Janika Saretzki, Thomas Knopf, Boris Forthmann, Benjamin Goecke, Ann-Kathrin Jaggy, Mathias Benedek and Selina Weiss
J. Intell. 2025, 13(6), 64; https://doi.org/10.3390/jintelligence13060064 - 29 May 2025
Viewed by 696
Abstract
The alternate uses task (AUT) is the most popular measure when it comes to the assessment of creative potential. Since their implementation, AUT responses have been rated by humans, which is a laborious task and requires considerable resources. Large language models (LLMs) have shown promising performance in automatically scoring AUT responses in English as well as in other languages, but it is not clear which method works best for German data. Therefore, we investigated the performance of different LLMs for the automated scoring of German AUT responses. We compiled German data across five research groups including ~50,000 responses for 15 different alternate uses objects from eight lab and online survey studies (including ~2300 participants) to examine generalizability across datasets and assessment conditions. Following a pre-registered analysis plan, we compared the performance of two fine-tuned, multilingual LLM-based approaches [Cross-Lingual Alternate Uses Scoring (CLAUS) and the Open Creativity Scoring with Artificial Intelligence (OCSAI)] with the Generative Pre-trained Transformer (GPT-4) in scoring (a) the original German AUT responses and (b) the responses translated to English. We found that the LLM-based scorings were substantially correlated with human ratings, with higher relationships for OCSAI followed by GPT-4 and CLAUS. Response translation, however, had no consistent positive effect. We discuss the generalizability of the results across different items and studies and derive recommendations and future directions. Full article
(This article belongs to the Special Issue Generative AI: Reflections on Intelligence and Creativity)
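
The central comparison in this study, how closely automated scores track human ratings, reduces to a correlation between two score vectors. A toy sketch under invented data is shown below; the actual analysis aggregates roughly 50,000 responses across five research groups.

# Toy illustration of correlating automated AUT scores with human creativity ratings.
from scipy.stats import pearsonr, spearmanr

human_ratings = [1.0, 2.5, 3.0, 4.5, 2.0, 3.5]   # hypothetical mean human ratings per response
llm_scores    = [1.2, 2.4, 3.3, 4.1, 1.8, 3.6]   # hypothetical scores from an LLM-based scorer

r, _ = pearsonr(human_ratings, llm_scores)
rho, _ = spearmanr(human_ratings, llm_scores)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")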

28 pages, 5257 KiB  
Article
Comparative Evaluation of Sequential Neural Network (GRU, LSTM, Transformer) Within Siamese Networks for Enhanced Job–Candidate Matching in Applied Recruitment Systems
by Mateusz Łępicki, Tomasz Latkowski, Izabella Antoniuk, Michał Bukowski, Bartosz Świderski, Grzegorz Baranik, Bogusz Nowak, Robert Zakowicz, Łukasz Dobrakowski, Bogdan Act and Jarosław Kurek
Appl. Sci. 2025, 15(11), 5988; https://doi.org/10.3390/app15115988 - 26 May 2025
Viewed by 774
Abstract
Job–candidate matching is pivotal in recruitment, yet traditional manual or keyword-based methods can be laborious and prone to missing qualified candidates. In this study, we introduce the first Siamese framework that systematically contrasts GRU, LSTM, and Transformer sequential heads on top of a multilingual Sentence Transformer backbone, which is trained end-to-end with triplet loss on real-world recruitment data. This combination captures both long-range dependencies across document segments and global semantics, representing a substantial advance over approaches that rely solely on static embeddings. We compare the three heads using ranking metrics such as Top-K accuracy and Mean Reciprocal Rank (MRR). The Transformer-based model yields the best overall performance, with an MRR of 0.979 and a Top-100 accuracy of 87.20% on the test set. Visualization of learned embeddings (t-SNE) shows that self-attention more effectively clusters matching texts and separates them from irrelevant ones. These findings underscore the potential of combining multilingual base embeddings with specialized sequential layers to reduce manual screening efforts and improve recruitment efficiency. Full article
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)
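
The ranking metrics used above, Top-K accuracy and Mean Reciprocal Rank (MRR), have compact definitions. The sketch below computes both from the rank at which the correct candidate appears for each query; the ranks are made-up examples, not results from the paper.

# Minimal Top-K accuracy and MRR computation over per-query ranks of the true match.
def top_k_accuracy(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 1, 2, 5, 1, 120, 3, 1]   # hypothetical rank of the matching CV for each job posting
print("Top-100 accuracy:", top_k_accuracy(ranks, 100))
print("MRR:", round(mean_reciprocal_rank(ranks), 3))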

17 pages, 4114 KiB  
Article
Biomimetic Computing for Efficient Spoken Language Identification
by Gaurav Kumar and Saurabh Bhardwaj
Biomimetics 2025, 10(5), 316; https://doi.org/10.3390/biomimetics10050316 - 14 May 2025
Viewed by 537
Abstract
Spoken Language Identification (SLID)-based applications have become increasingly important in everyday life, driven by advancements in artificial intelligence and machine learning. Multilingual countries utilize SLID to facilitate speech detection, which is accomplished by determining the language of the spoken parts using language recognizers. However, when working with multilingual datasets, the presence of multiple languages that have a shared origin presents a significant challenge for accurately classifying languages using automatic techniques. A further challenge is the significant variance in speech signals caused by factors such as different speakers, content, acoustic settings, language differences, changes in voice modulation based on age and gender, and variations in speech patterns. In this study, we introduce the DBODL-MSLIS approach, which integrates biomimetic optimization techniques inspired by natural intelligence to enhance language classification. The proposed method employs Dung Beetle Optimization (DBO) with deep learning, simulating the beetle’s foraging behavior to optimize feature selection and classification performance. The proposed technique integrates speech preprocessing, which encompasses pre-emphasis, windowing, and frame blocking, followed by feature extraction utilizing pitch, energy, the Discrete Wavelet Transform (DWT), and the zero-crossing rate (ZCR). Feature selection is then performed by the DBO algorithm, which removes redundant features and helps to improve efficiency and accuracy. Spoken languages are classified using Bayesian optimization (BO) in conjunction with a long short-term memory (LSTM) network. The DBODL-MSLIS technique has been experimentally validated using the IIIT Spoken Language dataset. The results indicate an average accuracy of 95.54% and an F-score of 84.31%. This technique surpasses various other state-of-the-art models, such as SVM, MLP, LDA, DLA-ASLISS, HMHFS-IISLFAS, GA-based fusion, and VGG-16. We have evaluated the accuracy of our proposed technique against state-of-the-art biomimetic computing models such as GA, PSO, GWO, DE, and ACO. While ACO achieved up to 89.45% accuracy, our Bayesian optimization with LSTM outperformed all others, reaching a peak accuracy of 95.55%, demonstrating its effectiveness in enhancing spoken language identification. The suggested technique demonstrates promising potential for practical applications in the field of multilingual voice processing. Full article
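
Two of the frame-level features listed above, short-time energy and zero-crossing rate, can be computed directly with NumPy. The sketch below runs on a synthetic signal with assumed frame parameters and illustrates only the feature-extraction stage, not the full DBODL-MSLIS pipeline.

# Illustrative frame-wise energy and zero-crossing-rate extraction for a speech signal.
import numpy as np

sr = 16000                                   # assumed sampling rate (Hz)
t = np.linspace(0, 1.0, sr, endpoint=False)
signal = 0.5 * np.sin(2 * np.pi * 220 * t)   # synthetic stand-in for a speech waveform

frame_len, hop = 400, 160                    # 25 ms frames with a 10 ms hop at 16 kHz
features = []
for start in range(0, len(signal) - frame_len + 1, hop):
    frame = signal[start:start + frame_len]
    energy = float(np.sum(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    features.append((energy, zcr))

print(len(features), "frames; first frame (energy, ZCR):", features[0])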

32 pages, 2219 KiB  
Article
A New Large Language Model for Attribute Extraction in E-Commerce Product Categorization
by Mehmet Serhan Çiftlikçi, Yusuf Çakmak, Tolga Ahmet Kalaycı, Fatih Abut, Mehmet Fatih Akay and Mehmet Kızıldağ
Electronics 2025, 14(10), 1930; https://doi.org/10.3390/electronics14101930 - 9 May 2025
Viewed by 1818
Abstract
In the rapidly evolving field of e-commerce, precise and efficient attribute extraction from product descriptions is crucial for enhancing search functionality, improving customer experience, and streamlining the listing process for sellers. This study proposes a large language model (LLM)-based approach for automated attribute extraction on Trendyol’s e-commerce platform. For comparison purposes, a deep learning (DL) model is also developed, leveraging a transformer-based architecture to efficiently identify explicit attributes. In contrast, the LLM, built on the Mistral architecture, demonstrates superior contextual understanding, enabling the extraction of both explicit and implicit attributes from unstructured text. The models are evaluated on an extensive dataset derived from Trendyol’s Turkish-language product catalog, using performance metrics such as precision, recall, and F1-score. Results indicate that the proposed LLM outperforms the DL model across most metrics, demonstrating superiority not only in direct single-model comparisons but also in average performance across all evaluated categories. This advantage is particularly evident in handling complex linguistic structures and diverse product descriptions. The system has been integrated into Trendyol’s platform with a scalable backend infrastructure, employing Kubernetes and Nvidia Triton Inference Server for efficient bulk processing and real-time attribute suggestions during the product listing process. This study not only advances attribute extraction for Turkish-language e-commerce but also provides a scalable and efficient NLP-based solution applicable to large-scale marketplaces. The findings offer critical insights into the trade-offs between accuracy and computational efficiency in large-scale multilingual NLP applications, contributing to the broader field of automated product classification and information retrieval in e-commerce ecosystems. Full article
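
Attribute-extraction quality is typically scored by comparing predicted and gold (attribute, value) pairs, which is how precision, recall, and F1 apply here. The sketch below uses invented pairs for a single product and illustrates only the metric definitions, not Trendyol's evaluation harness.

# Set-based precision/recall/F1 for extracted (attribute, value) pairs of one product.
predicted = {("color", "red"), ("material", "cotton"), ("sleeve", "long")}
gold      = {("color", "red"), ("material", "cotton"), ("fit", "slim")}

true_positives = len(predicted & gold)
precision = true_positives / len(predicted)
recall = true_positives / len(gold)
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")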

17 pages, 2347 KiB  
Systematic Review
Risks of the Use of FinTech in the Financial Inclusion of the Population: A Systematic Review of the Literature
by Antonija Mandić, Biljana Marković and Iva Rosanda Žigo
J. Risk Financial Manag. 2025, 18(5), 250; https://doi.org/10.3390/jrfm18050250 - 6 May 2025
Cited by 1 | Viewed by 2013
Abstract
Financial technology (FinTech) has significantly changed access to financial services, particularly benefiting historically marginalized communities. While it offers many advantages, FinTech also brings substantial risks associated with this digital transformation. Recent studies highlight the significant impact of FinTech on financial inclusion, especially for marginalized populations. To investigate the benefits and drawbacks of FinTech and identify specific risks affecting users, particularly vulnerable groups, we employed the PRISMA method. A systematic literature review was conducted using the Web of Science database to explore recent research on FinTech and its relationship with financial inclusion, focusing on associated risks. The search covered 2010–2025; however, after applying inclusion criteria, the final dataset comprised publications from 2012 to 2025. Unlike previous bibliometric studies broadly addressing FinTech innovations, this review identifies and categorizes key risks affecting financial inclusion, emphasizing regulatory barriers, digital literacy, and socio-cultural challenges. The review is limited by the exclusive use of Web of Science and the English language, suggesting future research avenues using additional databases and multilingual sources. Findings reveal a notable increase in research activity surrounding FinTech and financial inclusion. This highlights challenges such as data privacy, regulation, and financial literacy. By mapping FinTech-related risks, this study aims to inform policymakers and stakeholders about effective strategies to mitigate these challenges and promote safe, inclusive financial ecosystems. Full article
(This article belongs to the Section Financial Technology and Innovation)

23 pages, 716 KiB  
Article
Christian Missionary Interpreters in the Open Port Period and the Japanese Colonial Era and Church Interpretation in Modern Korea
by Boae Kim
Religions 2025, 16(5), 590; https://doi.org/10.3390/rel16050590 - 2 May 2025
Viewed by 945
Abstract
This study examines the role of Christian missionary interpreters from the Open Port Period to the Japanese colonial era, highlighting their historical significance and influence. During the Open Port Period, missionaries relied on Korean language teachers to serve as interpreters, translators, evangelists, and preachers. Although their English proficiency was often limited, they played a crucial role in early Christian missions. In the Japanese colonial era, elite intellectuals who had studied abroad increasingly assumed interpretation roles, actively contributing to theological education and social reform. This study analyzes historical records, newspaper articles, and existing research to reconstruct the evolving role and broader impact of Christian interpreters. The findings suggest that missionary interpreters were not merely linguistic mediators but key figures in evangelism and social transformation. Furthermore, the study highlights the historical transition from consecutive interpretation to simultaneous interpretation in Korean churches and underscores the need for systematic training programs. Given the growing linguistic diversity in Korean congregations, churches must recognize the importance of trained interpreters in ensuring effective multilingual worship and uphold the legacy of missionary interpretation. Full article