Search Results (149)

Search Parameters:
Keywords = M-BERT

18 pages, 2861 KB  
Article
A Geometric Attribute Collaborative Method in Multi-Scale Polygonal Entity Matching Scenario: Integrating Sentence-BERT and Three-Branch Attention Network
by Zhuang Sun, Po Liu, Liang Zhai and Zutao Zhang
ISPRS Int. J. Geo-Inf. 2025, 14(11), 435; https://doi.org/10.3390/ijgi14110435 - 3 Nov 2025
Viewed by 302
Abstract
The cross-scale fusion and consistent representation of cross-source heterogeneous vector polygon data are fundamental tasks in the field of GIS, and they play an important role in areas such as the refined management of natural resources, territorial spatial planning, and the urban emergency response. However, the existing methods suffer from two key limitations: the insufficient utilization of semantic information, especially non-standardized attributes, and the lack of differentiated modeling for 1:1, 1:M, and M:N matching relationships. To address these issues, this study proposes a geometric–attribute collaborative matching method for multi-scale polygonal entities. First, matching relationships are classified into 1:1, 1:M, and M:N based on the intersection of polygons. Second, geometric similarities including spatial overlap, size, shape, and orientation are computed for each relationship type. Third, semantic similarity is enhanced by fine-tuning the pre-trained Sentence-BERT model, which effectively captures the complex semantic information from non-standardized descriptions. Finally, a three-branch attention network is constructed to specifically handle the three matching relationships, with adaptive feature weighting via attention mechanisms. The experimental results on datasets from Tunxi District, Huangshan City, China show that the proposed method outperforms the existing approaches including geometry–attribute fusion and BPNNs in precision, recall, and F1-score, with improvements of 3.38%, 1.32%, and 2.41% compared to the geometry–attribute method, and 2.91%, 0.27%, and 1.66% compared to BPNNs, respectively. A generalization experiment on Hefei City data further validates its robustness. This method effectively enhances the accuracy and adaptability of multi-scale polygonal entity matching, providing a valuable tool for multi-source GIS database integration. Full article
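To make the semantic-similarity step concrete, the sketch below scores two non-standardized attribute descriptions with an off-the-shelf Sentence-BERT model via the sentence-transformers library; the checkpoint name and example attributes are illustrative assumptions, not the paper's fine-tuned setup.

```python
# Minimal sketch: semantic similarity between non-standardized polygon
# attribute descriptions using a pre-trained Sentence-BERT model.
# Checkpoint and example strings are placeholders, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

attr_large_scale = ["residential land, mixed low-rise housing"]
attr_small_scale = ["urban residential area"]

emb_a = model.encode(attr_large_scale, convert_to_tensor=True)
emb_b = model.encode(attr_small_scale, convert_to_tensor=True)

# Cosine similarity becomes the semantic feature that is fed to the
# matching network alongside the geometric similarities.
print(util.cos_sim(emb_a, emb_b).item())
```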

12 pages, 717 KB  
Proceeding Paper
Leveraging Large Language Models and Data Augmentation in Cognitive Computing to Enhance Stock Price Predictions
by Nassera Habbat, Hicham Nouri and Zahra Berradi
Eng. Proc. 2025, 112(1), 40; https://doi.org/10.3390/engproc2025112040 - 17 Oct 2025
Viewed by 543
Abstract
Precise stock price forecasting is essential for informed decision-making in financial markets. This study examines the combination of large language models (LLMs) with data augmentation approaches, utilizing improvements in cognitive computing to enhance stock price prediction. Traditional methods rely on structured data and basic time-series analysis. However, new research shows that deep learning and transformer-based architectures can effectively process unstructured financial data, such as news articles and social media sentiment. This study employs models such as RNN, mBERT, RoBERTa, and GPT-4-based architectures to illustrate the efficacy of our suggested method in forecasting stock movements. The research applies data augmentation techniques, including synthetic data creation using Generative Pre-trained Transformers, to rectify imbalances in training datasets. We assess metrics such as accuracy, F1-score, recall, and precision to verify the models’ performance. We also investigate the influence of preprocessing methods such as text normalization and feature engineering. Extensive tests show that transformer models predict stock price movements far more accurately than traditional methods: the GPT-4-based model achieved an F1-score of 0.92 and an accuracy of 0.919, underscoring the potential of LLMs in financial applications. Full article
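As a hedged illustration of the evaluation protocol mentioned above (accuracy, precision, recall, F1), the following scikit-learn sketch uses placeholder labels for a binary up/down movement classifier; it is not the authors' code.

```python
# Minimal sketch of the reported evaluation metrics for a binary
# stock-movement classifier; label arrays are illustrative placeholders.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual movement: 1 = up, 0 = down
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]   # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_true, y_pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```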

25 pages, 9990 KB  
Article
Bidirectional Mamba-Enhanced 3D Human Pose Estimation for Accurate Clinical Gait Analysis
by Chengjun Wang, Wenhang Su, Jiabao Li and Jiahang Xu
Fractal Fract. 2025, 9(9), 603; https://doi.org/10.3390/fractalfract9090603 - 17 Sep 2025
Viewed by 1059
Abstract
Three-dimensional human pose estimation from monocular video remains challenging for clinical gait analysis due to high computational cost and the need for temporal consistency. We present Pose3DM, a bidirectional Mamba-based state-space framework that models intra-frame joint relations and inter-frame dynamics with linear computational complexity. Replacing transformer self-attention with state-space modeling improves efficiency without sacrificing accuracy. We further incorporate fractional-order total-variation regularization to capture long-range dependencies and memory effects, enhancing temporal and spatial coherence in gait dynamics. On Human3.6M, Pose3DM-L achieves 37.9 mm MPJPE under Protocol 1 (P1) and 32.1 mm P-MPJPE under Protocol 2 (P2), with 127 M MACs per frame and 30.8 G MACs in total. Relative to MotionBERT, P1 and P2 errors decrease by 3.3% and 2.4%, respectively, with 82.5% fewer parameters and 82.3% fewer MACs per frame. Compared with MotionAGFormer-L, Pose3DM-L improves P1 by 0.5 mm and P2 by 0.4 mm while using 60.6% less computation: 30.8 G vs. 78.3 G total MACs and 127 M vs. 322 M per frame. On AUST-VisGait across six gait patterns, Pose3DM consistently yields lower MPJPE, standard error, and maximum error, enabling reliable extraction of key gait parameters from monocular video. These results highlight state-space models as a cost-effective route to real-time gait assessment using a single RGB camera. Full article
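For readers unfamiliar with the Protocol 1 metric quoted above, the sketch below computes MPJPE (mean per-joint position error) over random placeholder poses; P-MPJPE additionally applies a rigid Procrustes alignment, which is omitted here.

```python
# Minimal sketch of MPJPE, the Protocol 1 metric reported for Pose3DM.
# The pose arrays are random placeholders, not model output.
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance per joint; pred, gt: (frames, joints, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.random.rand(100, 17, 3) * 1000   # 100 frames, 17 joints, in mm
gt = np.random.rand(100, 17, 3) * 1000
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```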

25 pages, 1380 KB  
Review
A Systematic Review and Experimental Evaluation of Classical and Transformer-Based Models for Urdu Abstractive Text Summarization
by Muhammad Azhar, Adeen Amjad, Deshinta Arrova Dewi and Shahreen Kasim
Information 2025, 16(9), 784; https://doi.org/10.3390/info16090784 - 9 Sep 2025
Viewed by 634
Abstract
The rapid growth of digital content in Urdu has created an urgent need for effective automatic text summarization (ATS) systems. While extractive methods have been widely studied, abstractive summarization for Urdu remains largely unexplored due to the language’s complex morphology and rich literary tradition. This paper systematically evaluates four transformer-based language models (BERT-Urdu, BART, mT5, and GPT-2) for Urdu abstractive summarization, comparing their performance against conventional machine learning and deep learning approaches. Using multiple Urdu datasets—including the Urdu Summarization Corpus, Fake News Dataset, and Urdu-Instruct-News—we show that fine-tuned Transformer Language Models (TLMs) consistently outperform traditional methods, with the multilingual mT5 model achieving a 0.42 absolute improvement in F1-score over the best baseline. Our analysis reveals that mT5’s architecture is particularly effective at handling Urdu-specific challenges such as right-to-left script processing, diacritic interpretation, and complex verb–noun compounding. Furthermore, we present empirically validated hyperparameter configurations and training strategies for Urdu ATS, establishing transformer-based approaches as the new state-of-the-art for Urdu summarization. Notably, mT5 outperforms Seq2Seq baselines by up to 20% in ROUGE-L, underscoring the efficacy of Transformer-based models for low-resource languages. This work contributes both a systematic review of prior research and a novel empirical benchmark for advancing Urdu abstractive summarization. Full article
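As a rough sketch of how a multilingual mT5 checkpoint produces an abstractive summary via Hugging Face Transformers (the checkpoint, prompt prefix, and generation settings are illustrative; the paper fine-tunes mT5 on Urdu corpora first):

```python
# Minimal sketch: abstractive summarization with a multilingual mT5 checkpoint.
# The paper's fine-tuned weights and hyperparameters are not reproduced here.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

model_name = "google/mt5-small"
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

urdu_article = "..."  # placeholder for an Urdu news article
inputs = tokenizer("summarize: " + urdu_article, return_tensors="pt",
                   truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```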

14 pages, 657 KB  
Article
Pretrained Models Against Traditional Machine Learning for Detecting Fake Hadith
by Jawaher Alghamdi, Adeeb Albukhari and Thair Al-Dala’in
Electronics 2025, 14(17), 3484; https://doi.org/10.3390/electronics14173484 - 31 Aug 2025
Viewed by 961
Abstract
The proliferation of fake news, particularly in sensitive domains like religious texts, necessitates robust authenticity verification methods. This study addresses the growing challenge of authenticating Hadith, where traditional methods relying on the analysis of the chain of narrators (Isnad) and the content (Matn) are increasingly strained by the sheer volume in circulation. To combat this issue, machine learning (ML) and natural language processing (NLP) techniques, specifically through transfer learning, are explored to automate Hadith classification into Genuine and Fake categories. This study utilizes an imbalanced dataset of 8544 Hadiths, with 7008 authentic and 1536 fake Hadiths, to systematically investigate the collective impact of both linguistic and contextual features, particularly the chain of narrators (Isnad), on Hadith authentication. For the first time in this specialized domain, state-of-the-art pre-trained language models (PLMs) such as Multilingual BERT (mBERT), CamelBERT, and AraBERT are evaluated alongside classical algorithms like logistic regression (LR) and support vector machine (SVM) for Hadith authentication. Our best-performing model, AraBERT, achieved a 99.94% F1-score when including the chain of narrators, demonstrating the profound effectiveness of contextual elements (Isnad) in significantly improving accuracy. These results provide novel insights into the indispensable role of computational methods in Hadith authentication and reinforce traditional scholarly emphasis. This research represents a significant advancement in combating misinformation in this important field. Full article
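To show the shape of the classical baseline and the Isnad-vs-no-Isnad comparison, here is a hedged scikit-learn sketch; the CSV file and column names (isnad, matn, label) are hypothetical, not the study's data release.

```python
# Minimal sketch of a classical baseline: TF-IDF + logistic regression,
# comparing matn-only input with isnad + matn. File and columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("hadith_dataset.csv")   # 8544 rows: isnad, matn, label

pipe = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000, class_weight="balanced"))

for name, text in [("matn only", df["matn"]),
                   ("isnad + matn", df["isnad"] + " " + df["matn"])]:
    f1 = cross_val_score(pipe, text, df["label"], cv=5, scoring="f1_macro")
    print(f"{name}: macro-F1 = {f1.mean():.4f}")
```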

15 pages, 678 KB  
Article
Local Health Department COVID-19 Vaccination Efforts and Associated Outcomes: Evidence from Jefferson County, Kentucky
by Shaminul H. Shakib, Seyed M. Karimi, J. Daniel McGeeney, Md Yasin Ali Parh, Hamid Zarei, Yuting Chen, Ben Goldman, Dana Novario, Michael Schurfranz, Ciara A. Warren, Demetra Antimisiaris, Bert B. Little, W. Paul McKinney and Angela J. Graham
Vaccines 2025, 13(9), 901; https://doi.org/10.3390/vaccines13090901 - 26 Aug 2025
Viewed by 1232
Abstract
Background: While disparities in vaccine uptake have been well documented, few studies have evaluated the impact of local vaccine programs on COVID-19 outcomes, namely cases, hospitalizations, and deaths. Objectives: Evaluate the impact of COVID-19 vaccine doses coordinated by the Louisville Metro Department of Public Health and Wellness (LMPHW) on COVID-19 outcomes by race across ZIP codes from December 2020 to May 2022 in Jefferson County, Kentucky. Methods: Fixed-effects longitudinal models with ZIP codes as ecological time-series units were estimated to measure the association between COVID-19 vaccine doses and outcomes with time lags of one week, two weeks, three weeks, four weeks, and one month. Models were adjusted for time (week or month of the year) and its interaction with ZIP code. Results: In the one-week lag model, significant negative associations were observed between LMPHW-coordinated vaccine doses and COVID-19 outcomes, indicating reductions of 11.6 cases, 0.4 hospitalizations, and 0.3 deaths per 100 doses administered. Vaccine doses were consistently associated with fewer deaths among White residents across all lags, with an average reduction of 0.2 deaths per 100 doses. No significant associations were found for Black residents. Temporal trends also indicated declines in COVID-19 outcomes when LMPHW’s vaccine administration program peaked, between March and May 2021. Conclusions: Timely uptake of COVID-19 vaccines remains critical in avoiding severe outcomes, especially with emerging variants. Racial disparities in vaccine–outcome associations emphasize the potential need for equitable, community-driven vaccine campaigns to improve population health outcomes. Full article
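A simplified statsmodels sketch of the lagged fixed-effects specification described above is given below; the panel file, column names, and the single one-week lag are assumptions, and the study's ZIP-by-time interaction terms are omitted.

```python
# Simplified sketch: weekly cases regressed on one-week-lagged vaccine doses
# with ZIP-code and week fixed effects. Data file and columns are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("zip_week_panel.csv")
panel = panel.sort_values(["zip_code", "week"])
panel["doses_lag1"] = panel.groupby("zip_code")["doses"].shift(1)
panel = panel.dropna(subset=["doses_lag1"])

# C(zip_code) and C(week) add unit and time fixed effects as dummy variables;
# standard errors are clustered by ZIP code.
fit = smf.ols("cases ~ doses_lag1 + C(zip_code) + C(week)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["zip_code"]})
print(fit.params["doses_lag1"])   # a negative sign indicates fewer cases per dose
```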

23 pages, 3836 KB  
Article
RUDA-2025: Depression Severity Detection Using Pre-Trained Transformers on Social Media Data
by Muhammad Ahmad, Pierpaolo Basile, Fida Ullah, Ildar Batyrshin and Grigori Sidorov
AI 2025, 6(8), 191; https://doi.org/10.3390/ai6080191 - 18 Aug 2025
Viewed by 1475
Abstract
Depression is a serious mental health disorder affecting cognition, emotions, and behavior. It impacts over 300 million people globally, with mental health care costs exceeding $1 trillion annually. Traditional diagnostic methods are often expensive, time-consuming, stigmatizing, and difficult to access. This study leverages NLP techniques to identify depressive cues in social media posts, focusing on both standard Urdu and code-mixed Roman Urdu, which are often overlooked in existing research. To the best of our knowledge, a script-conversion and combination-based approach for Roman Urdu and Nastaliq Urdu has not been explored earlier. To address this gap, our study makes four key contributions. First, we created a manually annotated dataset named RUDA-2025, containing posts in code-mixed Roman Urdu and Nastaliq Urdu for both binary and multiclass classification. The binary classes are “depression” and “not depression”, with the depression class further divided into the fine-grained categories of mild, moderate, and severe depression, alongside “not depression”. Second, we applied, for the first time, two novel techniques to the RUDA-2025 dataset: (1) a script-conversion approach that translates between code-mixed Roman Urdu and standard Urdu and (2) a combination-based approach that merges both scripts into a single dataset to address linguistic challenges in depression assessment. Finally, we conducted 60 experiments using a combination of traditional machine learning and deep learning techniques to find the best-fitting model for detecting the disorder. Based on our analysis, our proposed model (mBERT) with a custom attention mechanism outperformed the baseline (XGB) on both the combination-based dataset and the code-mixed Roman and Nastaliq Urdu script conversions. Full article
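For orientation, the sketch below loads multilingual BERT with a four-way severity head (not depression / mild / moderate / severe) using Hugging Face Transformers; the paper's custom attention mechanism and training loop are not reproduced, and the example post is a placeholder.

```python
# Minimal sketch: mBERT with a 4-way depression-severity classification head.
# The custom attention mechanism from the paper is not reproduced here.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=4)

post = "mujhe har waqt udaasi mehsoos hoti hai"   # placeholder Roman Urdu post
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))   # class probabilities (head still untrained)
```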

19 pages, 821 KB  
Article
Multimodal Multisource Neural Machine Translation: Building Resources for Image Caption Translation from European Languages into Arabic
by Roweida Mohammed, Inad Aljarrah, Mahmoud Al-Ayyoub and Ali Fadel
Computation 2025, 13(8), 194; https://doi.org/10.3390/computation13080194 - 8 Aug 2025
Viewed by 1217
Abstract
Neural machine translation (NMT) models combining textual and visual inputs generate more accurate translations compared with unimodal models. Moreover, translation models with an under-resourced target language benefit from multisource inputs (source sentences are provided in different languages). Building MultiModal MultiSource NMT (M3S-NMT) systems requires significant effort to curate datasets suitable for such a multifaceted task. This work uses image caption translation as an example of multimodal translation and presents a novel public dataset for translating captions from multiple European languages (viz., English, German, French, and Czech) into the distant and under-resourced Arabic language. It also presents multitask learning models trained and tested on this dataset to serve as solid baselines to help further research in this area. These models involve two parts: one for learning the visual representations of the input images, and the other for translating the textual input based on these representations. The translations are produced from a framework of attention-based encoder–decoder architectures. The visual features are learned from a pretrained convolutional neural network (CNN). These features are then integrated with textual features learned through the very basic yet well-known recurrent neural networks (RNNs) with GloVe or BERT word embeddings. Despite the challenges associated with the task at hand, the results of these systems are very promising, reaching METEOR scores of 34.57 and 42.52. Full article
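The visual branch described above can be sketched as follows: pooled features from a pretrained CNN are projected to the text decoder's hidden size. The ResNet-50 backbone and the 512-dimensional projection are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of the visual branch: pretrained CNN features projected to
# the decoder's hidden size. Backbone and dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision import models

cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn.fc = nn.Identity()                 # keep the 2048-d pooled features
cnn.eval()

proj = nn.Linear(2048, 512)            # map to the RNN decoder's hidden size

image = torch.rand(1, 3, 224, 224)     # placeholder caption image
with torch.no_grad():
    visual_feat = cnn(image)           # shape (1, 2048)
init_hidden = torch.tanh(proj(visual_feat))   # e.g., to initialize the decoder
print(init_hidden.shape)               # torch.Size([1, 512])
```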
(This article belongs to the Section Computational Social Science)

20 pages, 704 KB  
Review
Clinical Applications of Corneal Cells Derived from Induced Pluripotent Stem Cells
by Yixin Luan, Aytan Musayeva, Jina Kim, Debbie Le Blon, Bert van den Bogerd, Mor M. Dickman, Vanessa L. S. LaPointe, Sorcha Ni Dhubhghaill and Silke Oellerich
Biomolecules 2025, 15(8), 1139; https://doi.org/10.3390/biom15081139 - 7 Aug 2025
Cited by 1 | Viewed by 2285
Abstract
Corneal diseases are among the leading causes of blindness worldwide and the standard treatment is the transplantation of corneal donor tissue. Treatment for cornea-related visual impairment and blindness is, however, often constrained by the global shortage of suitable donor grafts. To alleviate the shortage of corneal donor tissue, new treatment options have been explored in the last decade. The discovery of induced pluripotent stem cells (iPSCs), which has revolutionized regenerative medicine, offers immense potential for corneal repair and regeneration. Using iPSCs can provide a renewable source for generating various corneal cell types, including corneal epithelial cells, stromal keratocytes, and corneal endothelial cells. To document the recent progress towards the clinical application of iPSC-derived corneal cells, this review summarizes the latest advancements in iPSC-derived corneal cell therapies, ranging from differentiation protocols and preclinical studies to the first clinical trials, and discusses the challenges for successful translation to the clinic. Full article

31 pages, 855 KB  
Article
A Comparative Evaluation of Transformer-Based Language Models for Topic-Based Sentiment Analysis
by Spyridon Tzimiris, Stefanos Nikiforos, Maria Nefeli Nikiforos, Despoina Mouratidis and Katia Lida Kermanidis
Electronics 2025, 14(15), 2957; https://doi.org/10.3390/electronics14152957 - 24 Jul 2025
Viewed by 2707
Abstract
This research investigates topic-based sentiment classification in Greek educational-related data using transformer-based language models. A comparative evaluation is conducted on GreekBERT, XLM-r-Greek, mBERT, and Palobert using three original sentiment-annotated datasets representing parents of students with functional diversity, school directors, and teachers, each capturing diverse educational perspectives. The analysis examines both overall sentiment performance and topic-specific evaluations across four thematic classes: (i) Material and Technical Conditions, (ii) Educational Dimension, (iii) Psychological/Emotional Dimension, and (iv) Learning Difficulties and Emergency Remote Teaching. Results indicate that GreekBERT consistently outperforms other models, achieving the highest overall F1 score (0.91), particularly excelling in negative sentiment detection (F1 = 0.95) and showing robust performance for positive sentiment classification. The Psychological/Emotional Dimension emerged as the most reliably classified category, with GreekBERT and mBERT demonstrating notably high accuracy and F1 scores. Conversely, Learning Difficulties and Emergency Remote Teaching presented significant classification challenges, especially for Palobert. This study contributes significantly to the field of sentiment analysis with Greek-language data by introducing original annotated datasets, pioneering the application of topic-based sentiment analysis within the Greek educational context, and offering a comparative evaluation of transformer models. Additionally, it highlights the superior performance of Greek-pretrained models in capturing emotional detail, and provides empirical evidence of the negative emotional responses toward Emergency Remote Teaching. Full article

22 pages, 2514 KB  
Article
High-Accuracy Recognition Method for Diseased Chicken Feces Based on Image and Text Information Fusion
by Duanli Yang, Zishang Tian, Jianzhong Xi, Hui Chen, Erdong Sun and Lianzeng Wang
Animals 2025, 15(15), 2158; https://doi.org/10.3390/ani15152158 - 22 Jul 2025
Viewed by 636
Abstract
Poultry feces, a critical biomarker for health assessment, requires timely and accurate pathological identification for food safety. Conventional visual-only methods face limitations due to environmental sensitivity and high visual similarity among feces from different diseases. To address this, we propose MMCD (Multimodal Chicken-feces Diagnosis), a ResNet50-based multimodal fusion model leveraging semantic complementarity between images and descriptive text to enhance diagnostic precision. Key innovations include the following: (1) Integrating MASA (Manhattan self-attention) and DSConv (depthwise separable convolution) into the backbone network to mitigate feature confusion. (2) Utilizing a pre-trained BERT to extract textual semantic features, reducing annotation dependency and cost. (3) Designing a lightweight Gated Cross-Attention (GCA) module for dynamic multimodal fusion, achieving a 41% parameter reduction versus cross-modal transformers. Experiments demonstrate that MMCD significantly outperforms single-modal baselines in Accuracy (+8.69%), Recall (+8.72%), Precision (+8.67%), and F1 score (+8.72%). It surpasses simple feature concatenation by 2.51–2.82% and reduces parameters by 7.5M and computations by 1.62 GFLOPs versus the base ResNet50. This work validates multimodal fusion’s efficacy in pathological fecal detection, providing a theoretical and technical foundation for agricultural health monitoring systems. Full article
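The following PyTorch module is an illustrative sketch of gated cross-attention fusion in the spirit of the GCA module described above; the dimensions, head count, and gating form are assumptions, not the authors' implementation.

```python
# Illustrative gated cross-attention fusion of image and text features.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img_feats, txt_feats):
        # Image tokens attend to BERT text tokens.
        attended, _ = self.attn(query=img_feats, key=txt_feats, value=txt_feats)
        # A sigmoid gate controls how much textual evidence is mixed in.
        g = self.gate(torch.cat([img_feats, attended], dim=-1))
        return img_feats + g * attended

fusion = GatedCrossAttention()
img = torch.rand(2, 49, 512)    # e.g., 7x7 ResNet50 feature-map tokens
txt = torch.rand(2, 32, 512)    # projected BERT token embeddings
print(fusion(img, txt).shape)   # torch.Size([2, 49, 512])
```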
(This article belongs to the Section Animal Welfare)

28 pages, 2518 KB  
Article
Enhancing Keyword Spotting via NLP-Based Re-Ranking: Leveraging Semantic Relevance Feedback in the Handwritten Domain
by Stergios Papazis, Angelos P. Giotis and Christophoros Nikou
Electronics 2025, 14(14), 2900; https://doi.org/10.3390/electronics14142900 - 20 Jul 2025
Viewed by 1110
Abstract
Handwritten Keyword Spotting (KWS) remains a challenging task, particularly in segmentation-free scenarios where word images must be retrieved and ranked based on their similarity to a query without relying on prior page-level segmentation. Traditional KWS methods primarily focus on visual similarity, often overlooking the underlying semantic relationships between words. In this work, we propose a novel NLP-driven re-ranking approach that refines the initial ranked lists produced by state-of-the-art KWS models. By leveraging semantic embeddings from pre-trained BERT-like Large Language Models (LLMs, e.g., RoBERTa, MPNet, and MiniLM), we introduce a relevance feedback mechanism that improves both verbatim and semantic keyword spotting. Our framework operates in two stages: (1) projecting retrieved word image transcriptions into a semantic space via LLMs and (2) re-ranking the retrieval list using a weighted combination of semantic and exact relevance scores based on pairwise similarities with the query. We evaluate our approach on the widely used George Washington (GW) and IAM collections using two cutting-edge segmentation-free KWS models, which are further integrated into our proposed pipeline. Our results show consistent gains in Mean Average Precision (mAP), with improvements of up to 2.3% (from 94.3% to 96.6%) on GW and 3% (from 79.15% to 82.12%) on IAM. Even when mAP gains are smaller, qualitative improvements emerge: semantically relevant but inexact matches are retrieved more frequently without compromising exact match recall. We further examine the effect of fine-tuning transformer-based OCR (TrOCR) models on historical GW data to align textual and visual features more effectively. Overall, our findings suggest that semantic feedback can enhance retrieval effectiveness in KWS pipelines, paving the way for lightweight hybrid vision-language approaches in handwritten document analysis. Full article
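The re-ranking rule can be illustrated with a short sketch: each retrieved transcription receives a weighted sum of a semantic score (cosine similarity of sentence embeddings) and an exact-match score. The embedding checkpoint, weight, and toy retrieval list are assumptions, not the paper's configuration.

```python
# Minimal sketch of semantic re-ranking: weighted combination of an exact
# (verbatim) score and an embedding-based semantic score.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
alpha = 0.7                                  # weight on the semantic score

query = "letter"
retrieved = ["letters", "better", "message", "ledger"]   # KWS transcriptions

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(retrieved, convert_to_tensor=True)
semantic = util.cos_sim(q_emb, d_emb)[0]

scores = [alpha * semantic[i].item() + (1 - alpha) * float(w == query)
          for i, w in enumerate(retrieved)]
print(sorted(zip(retrieved, scores), key=lambda x: -x[1]))
```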
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)

29 pages, 1234 KB  
Article
Automatic Detection of the CaRS Framework in Scholarly Writing Using Natural Language Processing
by Olajide Omotola, Nonso Nnamoko, Charles Lam, Ioannis Korkontzelos, Callum Altham and Joseph Barrowclough
Electronics 2025, 14(14), 2799; https://doi.org/10.3390/electronics14142799 - 11 Jul 2025
Viewed by 821
Abstract
Many academic introductions suffer from inconsistencies and a lack of comprehensive structure, often failing to effectively outline the core elements of the research. This not only impacts the clarity and readability of the article but also hinders the communication of its significance and objectives to the intended audience. This study aims to automate the CaRS (Creating a Research Space) model using machine learning and natural language processing techniques. We conducted a series of experiments using a custom-developed corpus of 50 biology research article introductions, annotated with rhetorical moves and steps. The dataset was used to evaluate the performance of four classification algorithms: Prototypical Network (PN), Support Vector Machines (SVM), Naïve Bayes (NB), and Random Forest (RF), in combination with six embedding models: Word2Vec, GloVe, BERT, GPT-2, Llama-3.2-3B, and TEv3-small. Multiple experiments were carried out to assess performance at both the move and step levels using 5-fold cross-validation. Evaluation metrics included accuracy and weighted F1-score, with comprehensive results provided. Results show that the SVM classifier, when paired with Llama-3.2-3B embeddings, consistently achieved the highest performance across multiple tasks when trained on the preprocessed dataset, with 79% accuracy and weighted F1-score on rhetorical moves and strong results on M2 steps (75% accuracy and weighted F1-score). While other combinations showed promise, particularly NB and RF with newer embeddings, none matched the consistency of the SVM–Llama pairing. Compared to existing benchmarks, our model achieves similar or better performance; however, direct comparison is limited due to differences in datasets and experimental setups. Despite the unavailability of the benchmark dataset, our findings indicate that SVM is an effective choice for rhetorical classification, even in few-shot learning scenarios. Full article
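At a high level, the best-performing pipeline pairs sentence embeddings with a linear SVM under 5-fold cross-validation; the sketch below uses a small sentence-embedding stand-in rather than Llama-3.2-3B, and the annotation file is hypothetical.

```python
# Minimal sketch: SVM over sentence embeddings of introduction sentences,
# 5-fold cross-validated. Small embedding model and data file are stand-ins.
import json
from sentence_transformers import SentenceTransformer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical file: list of [sentence, CaRS_move_label] pairs.
sentences, moves = zip(*json.load(open("cars_annotations.json")))

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(list(sentences))
scores = cross_val_score(SVC(kernel="linear"), X, list(moves), cv=5,
                         scoring="f1_weighted")
print(f"5-fold weighted F1: {scores.mean():.3f}")
```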

22 pages, 3822 KB  
Article
Human Extravillous Trophoblasts Require SRC-2 for Sustained Viability, Migration, and Invasion
by Vineet K. Maurya, Pooja Popli, Bryan C. Nikolai, David M. Lonard, Ramakrishna Kommagani, Bert W. O’Malley and John P. Lydon
Cells 2025, 14(13), 1024; https://doi.org/10.3390/cells14131024 - 4 Jul 2025
Viewed by 1016
Abstract
Defective placentation is a recognized etiology for several gestational complications that include early pregnancy loss, preeclampsia, and intrauterine growth restriction. Sustained viability, migration, and invasion are essential cellular properties for embryonic extravillous trophoblasts to execute their roles in placental development and function, while derailment of these cellular processes is linked to placental disorders. Although the cellular functions of extravillous trophoblasts are well recognized, our understanding of the pivotal molecular determinants of these functions is incomplete. Using the HTR-8/SVneo immortalized human extravillous trophoblast cell line, we report that steroid receptor coactivator-2 (SRC-2), a coregulator of transcription factor-mediated gene expression, is essential for extravillous trophoblast cell viability, motility, and invasion. Genome-scale transcriptomics identified an SRC-2-dependent transcriptome in HTR-8/SVneo cells that encodes a diverse spectrum of proteins involved in placental tissue development and function. Underscoring the utility of this transcriptomic dataset, we demonstrate that WNT family member 9A (WNT 9A) is not only regulated by SRC-2 but is also crucial for maintaining many of the above SRC-2-dependent cellular functions of human extravillous trophoblasts. Full article

19 pages, 914 KB  
Article
RU-OLD: A Comprehensive Analysis of Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning, Deep Learning, and Transformer Models
by Muhammad Zain, Nisar Hussain, Amna Qasim, Gull Mehak, Fiaz Ahmad, Grigori Sidorov and Alexander Gelbukh
Algorithms 2025, 18(7), 396; https://doi.org/10.3390/a18070396 - 28 Jun 2025
Cited by 2 | Viewed by 969
Abstract
The detection of abusive language in Roman Urdu is important for secure digital interaction. This work investigates machine learning (ML), deep learning (DL), and transformer-based methods for detecting offensive language in Roman Urdu comments collected from YouTube news channels. Features are extracted with TF-IDF and Count Vectorizer over unigrams, bigrams, and trigrams. Four ML models were evaluated: Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Naïve Bayes (NB); of these, the SVM achieved the best performance. Among the DL models, Bi-LSTM and CNN were evaluated, with the CNN outperforming the Bi-LSTM. Moreover, transformer variants such as LLaMA 2 and ModernBERT (MBERT) were fine-tuned with LoRA (Low-Rank Adaptation) for better efficiency. LoRA is designed to make fine-tuning large language models (LLMs) efficient, achieving strong results at extremely low computational cost. According to the experimental results, LLaMA 2 with LoRA attained the highest F1-score of 96.58%, greatly exceeding the performance of the other approaches. LoRA-optimized transformers capture subtle linguistic nuances well, which makes them well suited to Roman Urdu offensive language detection. The study compares the performance of conventional and contemporary NLP methods, highlighting the relevance of effective fine-tuning methods. Our findings pave the way for scalable and accurate automated moderation systems for online platforms supporting multiple languages. Full article
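The LoRA step can be sketched with the PEFT library as below; a lightweight multilingual BERT stand-in is used in place of LLaMA 2 / ModernBERT, and the rank, alpha, and target modules are illustrative assumptions.

```python
# Minimal sketch of a LoRA fine-tuning setup for offensive-language classification.
# The base checkpoint and LoRA hyperparameters are illustrative stand-ins.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)   # offensive vs. not offensive

lora_cfg = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                      lora_dropout=0.05, target_modules=["query", "value"])
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # only low-rank adapters are trainable
```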
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
