Search Results (5,084)

Search Parameters:
Keywords = pretraining

23 pages, 336 KB  
Review
Training Methods for Large Language Models: Current Approaches and Challenges
by Dimitris Karydas, Dimosthenis Margaritis and Helen C. Leligou
Technologies 2026, 14(2), 133; https://doi.org/10.3390/technologies14020133 - 19 Feb 2026
Abstract
Large Language Models (LLMs) have emerged as a dominant paradigm in natural language processing, demonstrating strong performance across a wide range of generation and reasoning tasks. These systems depend on multi-stage training pipelines that integrate large-scale self-supervised pre-training, supervised fine-tuning, and alignment techniques. This paper presents a systematic mapping study of contemporary LLM training methodologies, emphasizing transformer-based architectures, optimization objectives, and data curation strategies as well as emerging sparse architectures such as Mixture-of-Experts (MoE) models. We analyze parameter-efficient fine-tuning approaches, retrieval-augmented generation frameworks, and multimodal training techniques, which we organize into a unified comparative taxonomy. We discuss key technical challenges such as scalability constraints, hallucination, bias amplification, and alignment–capability tradeoffs, then identify emerging research directions such as reasoning-centric training. This work provides a concise technical reference for researchers and practitioners working on scalable and reliable language model training. Full article
(This article belongs to the Section Information and Communication Technologies)
19 pages, 20762 KB  
Article
Asymmetric Explicit Synergy for Multi-Modal 3D Gaussian Pre-Training in Autonomous Driving
by Dingwei Zhang, Jie Ji, Chengjun Huang, Bichun Li, Chennian Yu, Chenhui Qu, Zhengyuan Yang, Chen Hua and Biao Yu
World Electr. Veh. J. 2026, 17(2), 102; https://doi.org/10.3390/wevj17020102 - 19 Feb 2026
Abstract
Generative pre-training via neural rendering has become a cornerstone for scaling 3D perception in autonomous driving. However, prevalent approaches relying on implicit Neural Radiance Fields (NeRFs) face two fundamental limitations: the shape-radiance ambiguity inherent in vision-centric optimization and the prohibitive computational overhead of volumetric ray marching. To address these challenges, we propose AES-Gaussian, a novel multi-modal pre-training framework grounded in the efficient 3D Gaussian Splatting (3DGS) representation. Diverging from symmetric fusion paradigms, our core innovation is an Asymmetric Encoder architecture that couples a deep semantic vision backbone with a lightweight, physics-aware LiDAR branch. In this framework, LiDAR data serve not merely for semantic extraction, but as sparse physical anchors. By employing a novel Explicit Feature Synergy mechanism, we directly inject raw LiDAR intensity and depth priors into the Gaussian decoding process, thereby rigidly constraining scene geometry in open-world environments. Extensive empirical validation on the nuScenes dataset demonstrates the superiority of our approach. AES-Gaussian achieves state-of-the-art transfer performance, yielding a substantial 7.0% improvement in NDS for 3D Object Detection and a 4.8% mIoU gain in 3D semantic occupancy prediction compared to baselines. Notably, our method reduces geometric reconstruction error by over 50% while significantly improving training and inference efficiency, attributed to the streamlined asymmetric design and rapid Gaussian rasterization. Ultimately, by enhancing both perception accuracy and system efficiency, this work contributes to the development of safer and more reliable autonomous driving systems. Full article
(This article belongs to the Section Automated and Connected Vehicles)
16 pages, 585 KB  
Article
TDA-Phys: Temporal Difference Adaptation of Video Foundation Model for Remote Photoplethysmography
by Wei Chen, Yinghao Ding, Kunze Bu, Ming Yu and Hang Wu
Appl. Sci. 2026, 16(4), 2038; https://doi.org/10.3390/app16042038 - 19 Feb 2026
Abstract
Remote photoplethysmography (rPPG) enables noncontact estimation of vital signs, particularly heart rate, by analyzing subtle periodic skin color variations in facial videos. While deep learning has advanced rPPG signal extraction, existing methods rely on carefully designed task-specific architectures that are costly to develop and generalize poorly. In this work, we demonstrate that the general video foundation model VideoMAE v2 can be effectively adapted to the rPPG signal regression task by introducing only a lightweight adapter, without modifying its pretrained backbone. We freeze the entire VideoMAE v2 encoder and introduce a Temporal Difference Convolutional Adapter to capture the subtle interframe intensity differences. To address the mismatch between VideoMAE v2's short input window (16 frames) and the long temporal context typically required for robust rPPG extraction (e.g., 160 frames), we adopt an overlapping sliding window strategy for segmented inference and reconstruct the full signal through weighted temporal aggregation. On the COHFACE and UBFC-rPPG datasets, our method achieves mean absolute errors (MAEs) of 0.90 and 1.55, reducing the error by more than 55% and 42%, respectively, compared to PhysFormer (2.00 and 2.70). Furthermore, on challenging real-world datasets such as BUAA-MIHR, which features strong illumination variations, and VIPL-HR, which involves significant head movements, our approach achieves MAEs of 6.68 and 8.23, respectively, despite incorporating no task-specific robustness modules. These results demonstrate stable rPPG signal recovery and validate the feasibility of leveraging general video foundation models for physiological signal perception. Full article
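The overlapping sliding-window inference described in this abstract can be illustrated with a minimal sketch: a long facial video is split into 16-frame clips, each clip is passed through the adapted model, and the per-window outputs are overlap-added with a tapering weight. The stride, the Hann taper, and the predict_window stub are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def predict_window(frames):
    """Stand-in for the adapted VideoMAE v2 forward pass: takes a 16-frame
    clip and returns a 16-sample rPPG segment (zeros here as a placeholder)."""
    return np.zeros(len(frames))

def sliding_window_rppg(video, win=16, stride=8):
    """Reconstruct a full-length rPPG signal by weighted overlap-add of windowed predictions."""
    n = len(video)
    signal = np.zeros(n)
    weight_sum = np.zeros(n)
    taper = np.hanning(win)                       # weight centre frames more than window edges
    for start in range(0, n - win + 1, stride):
        seg = predict_window(video[start:start + win])
        signal[start:start + win] += taper * seg
        weight_sum[start:start + win] += taper
    return signal / np.maximum(weight_sum, 1e-8)  # weighted temporal aggregation
```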
29 pages, 14455 KB  
Review
Few-Shot Semantic Segmentation in Remote Sensing: A Review on Definitions, Methods, Datasets, Advances and Future Trends
by Marko Petrov, Ema Pandilova, Ivica Dimitrovski, Dimitar Trajanov, Vlatko Spasev and Ivan Kitanovski
Remote Sens. 2026, 18(4), 637; https://doi.org/10.3390/rs18040637 - 18 Feb 2026
Viewed by 47
Abstract
Semantic segmentation in remote sensing images, which is the task of classifying each pixel of the image in a specific category, is widely used in areas such as disaster management, environmental monitoring, precision agriculture, and many others. However, traditional semantic segmentation methods face a major challenge: they require large amounts of annotated data to train effectively. To tackle this challenge, few-shot semantic segmentation has been introduced, where the models can learn and adapt quickly to new classes from just a few annotated samples. This paper presents a comprehensive review of recent advances in few-shot semantic segmentation (FSSS) for remote sensing, covering datasets, methods, and emerging research directions. We first outline the fundamental principles of few-shot learning and summarize commonly used remote-sensing benchmarks, emphasizing their scale, geographic diversity, and relevance to episodic evaluation. Next, we categorize FSSS methods into major families (meta-learning, conditioning-based, and foundation-assisted approaches) and analyze how architectural choices, pretraining strategies, and inference protocols influence performance. The discussion highlights empirical trends across datasets, the behavior of different conditioning mechanisms, the impact of self-supervised and multimodal pretraining, and the role of reproducibility and evaluation design. Finally, we identify key challenges and future trends, including benchmark standardization, integration with foundation and multimodal models, efficiency at scale, and uncertainty-aware adaptation. Collectively, they signal a shift toward unified, adaptive models capable of segmenting novel classes across sensors, regions, and temporal domains with minimal supervision. Full article
12 pages, 2260 KB  
Article
PDCG: A Diffusion Model Guided by Pre-Training for Molecular Conformation Generation
by Yanchen Liu, Yameng Zheng, Amina Tariq, Xiaofei Nan, Lingbo Qu and Jinshuai Song
Chemistry 2026, 8(2), 29; https://doi.org/10.3390/chemistry8020029 - 18 Feb 2026
Viewed by 47
Abstract
Background: While machine learning has advanced molecular conformation generation, existing models often suffer from limited generalization and inaccuracies, especially for complex molecular structures. These limitations hinder their reliability in downstream applications. Methods: We propose PDCG, a molecular conformation generation model that combines a molecular graph pre-training module with a diffusion model. Feature embeddings obtained from a pre-trained model are concatenated with the molecular graph information, and the fused features are used to generate conformations. The model was trained and evaluated on the GEOM-QM9 and GEOM-Drugs datasets. Results: PDCG significantly outperforms existing baselines. Furthermore, in downstream molecular property prediction tasks, conformations generated by PDCG yield results comparable to those derived from DFT-optimized geometries. Conclusions: Our work provides a robust and generalizable model for accurate conformation generation. PDCG offers a reliable tool for downstream computational tasks, such as the virtual screening of functional materials and drug-like molecules. Full article
(This article belongs to the Special Issue AI and Big Data in Chemistry)
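As a rough illustration of the fusion step described in the abstract (pre-trained feature embeddings concatenated with molecular graph information before conformation generation), the PyTorch sketch below concatenates the two feature sets and projects them for a denoising step. All dimensions and the denoiser head are assumptions; this is not the authors' architecture.

```python
import torch
import torch.nn as nn

class FusionDenoiser(nn.Module):
    """Toy fusion module: pre-trained embeddings + graph features -> per-atom noise prediction."""
    def __init__(self, pretrain_dim=256, graph_dim=128, hidden=256):
        super().__init__()
        self.proj = nn.Linear(pretrain_dim + graph_dim, hidden)
        self.head = nn.Sequential(nn.SiLU(), nn.Linear(hidden, 3))  # predicts 3D coordinate noise per atom

    def forward(self, pretrain_emb, graph_feat):
        # pretrain_emb: (num_atoms, pretrain_dim); graph_feat: (num_atoms, graph_dim)
        fused = torch.cat([pretrain_emb, graph_feat], dim=-1)
        return self.head(self.proj(fused))

noise = FusionDenoiser()(torch.randn(20, 256), torch.randn(20, 128))
print(noise.shape)  # torch.Size([20, 3])
```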
24 pages, 6631 KB  
Article
Application of Computer Vision to the Automated Extraction of Metadata from Natural History Specimen Labels: A Case Study on Herbarium Specimens
by Jacopo Zacchigna, Weiwei Liu, Felice Andrea Pellegrino, Adriano Peron, Francesco Roma-Marzio, Lorenzo Peruzzi and Stefano Martellos
Plants 2026, 15(4), 637; https://doi.org/10.3390/plants15040637 - 17 Feb 2026
Viewed by 104
Abstract
Metadata extraction from natural history collection labels is a pivotal task for the online publication of digitized specimens. However, given the scale of these collections—which are estimated to host more than 2 billion specimens worldwide, including ca. 400 million herbarium specimens—manual metadata extraction is an extremely time-consuming task. Automated data extraction from digital images of specimens and their labels is therefore a promising application of state-of-the-art computer vision techniques. Extracting information from herbarium specimen labels normally involves three main steps: text segmentation, multilingual and handwriting recognition, and data parsing. The primary bottleneck in this workflow lies in the limitations of Optical Character Recognition (OCR) systems. This study explores how the general knowledge embedded in multimodal Transformer models can be transferred to the specific task of herbarium specimen label digitization. The final goal is to develop an easy-to-use, end-to-end solution to mitigate the limitations of classic OCR approaches while offering greater flexibility to adapt to different label formats. Donut-base, a pre-trained visual document understanding (VDU) transformer, was the base model selected for fine-tuning. A dataset from the University of Pisa served as a test bed. The initial attempt achieved an accuracy of 85%, measured using the Tree Edit Distance (TED), demonstrating the feasibility of fine-tuning for this task. Cases with low accuracies were also investigated to identify limitations of the approach. In particular, specimens with multiple labels, especially if combining handwritten and typewritten text, proved to be the most challenging. Strategies aimed at addressing these weaknesses are discussed. Full article
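For readers who want to experiment with the general approach, the sketch below loads the publicly available Donut-base checkpoint with Hugging Face transformers and runs it on a label image. The task prompt token, the field names, and the example file path are illustrative assumptions; the paper's own fine-tuning configuration is not reproduced here.

```python
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

image = Image.open("specimen_label.jpg").convert("RGB")        # hypothetical example image
pixel_values = processor(image, return_tensors="pt").pixel_values

# A Donut-style task prompt steering generation toward a structured record.
task_prompt = "<s_herbarium>"                                   # hypothetical task token added during fine-tuning
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False,
                                        return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids,
                             max_length=512)

# token2json converts Donut's tagged output into a nested dict,
# e.g. {"taxon": ..., "collector": ..., "date": ..., "locality": ...} after fine-tuning.
record = processor.token2json(processor.batch_decode(outputs)[0])
print(record)
```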
16 pages, 668 KB  
Article
Evaluation of a Company’s Media Reputation Based on the Articles Published on News Portals
by Algimantas Venčkauskas, Vacius Jusas and Dominykas Barisas
Appl. Sci. 2026, 16(4), 1987; https://doi.org/10.3390/app16041987 - 17 Feb 2026
Viewed by 88
Abstract
A company’s reputation is an important, intangible asset, which is heavily influenced by media reputation. We developed a method to measure a company’s reputation based on sentiments detected in online articles. The sentiment of each sentence was evaluated and categorized into one of three polarities: positive, negative, or neutral. Then, we developed another method to assess a company’s media reputation using all available online articles about the company. The company’s media reputation is presented as a tuple consisting of their media reputation on a scale from 0 to 100, the number of articles related to the company, and the margin of error. Experiments were conducted using articles written in Lithuanian published on major news portals. We used two different tools to assess the sentiments of the articles: Stanford CoreNLP v.4.5.10, combined with Google API, and the pre-trained transformer model XLM-RoBERTa. Google API was used for translation into English, as Stanford CoreNLP does not support the Lithuanian language. The results obtained were compared with those of existing methods, based on the coefficients of media endorsement and media favorableness, showing that the results of the proposed method are less moderate than the coefficient of media favorableness and less extreme than the coefficient of media endorsement. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
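The reputation tuple described in this abstract (a 0-100 score, the article count, and a margin of error) can be sketched as follows. The specific scoring rule used here, the share of positive sentences among polar sentences with a normal-approximation margin of error, is an assumption for illustration and is not the formula published by the authors.

```python
import math

def media_reputation(article_sentiments, confidence_z=1.96):
    """article_sentiments: list of per-article lists of 'pos'/'neg'/'neu' sentence labels."""
    pos = sum(s.count("pos") for s in article_sentiments)
    neg = sum(s.count("neg") for s in article_sentiments)
    polar = pos + neg
    if polar == 0:
        return (50.0, len(article_sentiments), 0.0)              # neutral coverage only
    p = pos / polar
    margin = confidence_z * math.sqrt(p * (1 - p) / polar) * 100  # normal-approximation error (assumed)
    return (round(p * 100, 1), len(article_sentiments), round(margin, 1))

# Example: two articles, mostly positive coverage -> (75.0, 2, 42.4)
print(media_reputation([["pos", "pos", "neu"], ["neg", "pos"]]))
```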
8 pages, 6865 KB  
Proceeding Paper
Evaluating Semantic Segmentation Performance Using DeepLabv3+ with Pretrained ResNet Backbones and Multi-Class Annotations
by Matej Spajić, Marija Habijan, Danijel Marinčić and Irena Galić
Eng. Proc. 2026, 125(1), 23; https://doi.org/10.3390/engproc2026125023 - 16 Feb 2026
Viewed by 108
Abstract
Semantic segmentation is a critical task in computer vision, enabling dense classification of image regions. This work investigates the effectiveness of the DeepLabv3+ architecture for binary semantic segmentation using annotated image data. A pretrained ResNet-101 backbone is employed to extract deep features, while Atrous Spatial Pyramid Pooling (ASPP) and a decoder module refine the segmentation outputs. The dataset provides per-image annotations indicating class presence, which are leveraged to approximate segmentation masks for training purposes. Various data augmentation techniques and training strategies were applied to support effective learning and reduce overfitting. Experimental results on the MHIST dataset show that the proposed pipeline achieves strong performance despite the lack of pixel-level annotations, with a mean Intersection-over-Union (mIoU) of 0.76 and a mean Dice coefficient of 0.84. These results confirm the potential of weakly supervised segmentation using class-aware class activation maps (CAMs) and deep pretrained encoders for structured pixel-level prediction tasks in medical imaging. Full article
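As a rough stand-in for the backbone setup described above: torchvision ships DeepLabv3 (without the DeepLabv3+ decoder) with a pretrained ResNet-101 backbone, so the sketch below only illustrates swapping the segmentation head for a two-class task. The class count and input size are assumptions.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet101
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

model = deeplabv3_resnet101(weights="DEFAULT")        # loads pretrained DeepLabV3/ResNet-101 weights
model.classifier = DeepLabHead(2048, num_classes=2)   # ASPP head re-initialised for 2 classes

x = torch.randn(1, 3, 512, 512)                       # dummy image batch
model.eval()
with torch.no_grad():
    logits = model(x)["out"]                          # (1, 2, 512, 512) per-pixel logits
print(logits.shape)
```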
19 pages, 1381 KB  
Article
Mitigating Hallucinations in Knowledge Graph Completion via Embedding-Guided Instruction Tuning
by Pengfei Zhang, Xing Xu, Junying Wu, Xin Lu, Jiahao Shi, Xiaodong Zhang, Dezhi Cui, Xiuxian Peng, Sihao He, Ping Zong, Guoxin Zhang, Zhonghong Ou, Meina Song and Yifan Zhu
Information 2026, 17(2), 207; https://doi.org/10.3390/info17020207 - 16 Feb 2026
Viewed by 110
Abstract
Real-world Knowledge Graphs (KGs) are inherently incomplete, which hinders effective downstream reasoning. While Large Language Models (LLMs) possess powerful semantic capabilities, directly applying them to Knowledge Graph Completion (KGC) often leads to hallucinations and a lack of structural awareness. To address these challenges, we propose Embedding-Guided Instruction Tuning (EGIT), a novel framework that synergizes the structural precision of embedding models with the semantic reasoning of LLMs. Our approach operates in three key stages: (1) utilizing pre-trained embedding models to automatically synthesize high-quality, annotation-free instruction data; (2) fine-tuning the LLM with these structure-aware instructions to adapt it to the KGC task; and (3) employing a joint inference mechanism where the embedding model retrieves candidates and the fine-tuned LLM performs the final selection, thereby significantly reducing hallucinations. In extensive experiments, the best variant of EGIT achieves 7.0% and 2.5% improvements in Hits@1 on the FB15k-237 and WN18RR datasets, respectively. Full article
(This article belongs to the Section Artificial Intelligence)
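The joint inference stage described in the abstract, where the embedding model retrieves candidates and the fine-tuned LLM makes the final choice, can be sketched as below. score_tail and llm_generate are stand-ins for the authors' embedding scorer and instruction-tuned LLM; the prompt wording and the fallback rule are assumptions.

```python
def joint_inference(head, relation, entities, score_tail, llm_generate, k=10):
    """Two-stage KGC inference: embedding-based candidate retrieval, then LLM selection."""
    # Stage 1: the embedding model ranks every candidate tail entity.
    ranked = sorted(entities, key=lambda e: score_tail(head, relation, e), reverse=True)
    candidates = ranked[:k]

    # Stage 2: the fine-tuned LLM selects among the retrieved candidates only,
    # which constrains generation and reduces hallucinated entities.
    prompt = (
        f"Complete the triple ({head}, {relation}, ?).\n"
        f"Choose exactly one answer from: {', '.join(candidates)}.\nAnswer:"
    )
    answer = llm_generate(prompt).strip()
    return answer if answer in candidates else candidates[0]   # fall back to the top-ranked candidate
```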
30 pages, 2117 KB  
Article
Automated Structuring and Analysis of Unstructured Equipment Maintenance Text Data in Manufacturing Using Generative AI Models: A Comparative Study of Pre-Trained Language Models
by Yongju Cho
Appl. Sci. 2026, 16(4), 1969; https://doi.org/10.3390/app16041969 - 16 Feb 2026
Viewed by 173
Abstract
Manufacturing companies face significant challenges in leveraging artificial intelligence for equipment management due to high infrastructure costs and limited availability of labeled data for failures. While most manufacturing AI applications focus on structured sensor data, vast amounts of unstructured textual information containing valuable maintenance knowledge remain underutilized. This study presents a practical generative AI-based framework for structured information extraction that automatically converts unstructured equipment maintenance texts into predefined semantic fields to support predictive maintenance in manufacturing environments. We adopted and evaluated three representative generative models—Bidirectional and Auto-Regressive Transformers (BART) with KoBART, Text-to-Text Transfer Transformer (T5) with pko-t5-base, and the large language model Qwen—to generate structured outputs by extracting three predefined fields: failed components, failure types, and corrective actions. The framework enables the structuring of equipment management text data from Manufacturing Execution Systems (MES) to build predictive maintenance support systems. We validated the approach using a large-scale MES dataset consisting of 29,736 equipment maintenance records from a major automotive parts manufacturer, from which curated subsets were used for model training and evaluation. Our methodology employs Generative Pre-trained Transformer 4 (GPT-4) for initial dataset construction, followed by domain expert validation to ensure data quality. The trained models achieved promising performance when evaluated using extraction-aligned metrics, including exact match (EM) and token-level precision, recall, and F1-score, which directly assess field-level extraction correctness. ROUGE scores are additionally reported as a supplementary indicator of lexical overlap. Among the evaluated models, Qwen consistently outperformed BART and T5 across all extracted fields. The structured outputs are further processed through domain-specific dictionaries and regular expressions to create a comprehensive analytical database supporting predictive maintenance strategies. We implemented a web-based analytics platform enabling time-series analysis, correlation analysis, frequency analysis, and anomaly detection for equipment maintenance optimization. The proposed system converts tacit knowledge embedded in maintenance texts into explicit, actionable insights without requiring additional sensor installations or infrastructure investments. This research contributes to the manufacturing AI field by demonstrating a comprehensive application of generative language models to equipment maintenance text analysis, providing a cost-effective approach for digital transformation in manufacturing environments. The framework’s scalability and cloud-based deployment model present significant opportunities for widespread adoption in the manufacturing sector, supporting the transition from reactive to predictive maintenance strategies. Full article
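A minimal sketch of the field-extraction step, assuming a fine-tuned seq2seq checkpoint and a delimiter-based output format (both of which are illustrative; the study fine-tunes KoBART, pko-t5-base, and Qwen on expert-validated data), might look as follows.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical fine-tuned checkpoint name, used only for illustration.
tokenizer = AutoTokenizer.from_pretrained("my-org/maintenance-extractor")
model = AutoModelForSeq2SeqLM.from_pretrained("my-org/maintenance-extractor")

record = "Hydraulic pump pressure dropped during shift; replaced worn seal and refilled oil."
inputs = tokenizer("extract fields: " + record, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Assumed output format: "component: hydraulic pump | failure: pressure drop | action: seal replacement"
fields = dict(part.split(":", 1) for part in decoded.split("|"))
print({k.strip(): v.strip() for k, v in fields.items()})
```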
24 pages, 11174 KB  
Article
JMSC: Joint Spatial–Temporal Modeling with Semantic Completion for Audio–Visual Learning
by Xinfu Xu, Fan Yang and Zhibin Yu
Sensors 2026, 26(4), 1288; https://doi.org/10.3390/s26041288 - 16 Feb 2026
Viewed by 176
Abstract
Audio–visual learning seeks to achieve holistic scene understanding by integrating auditory and visual cues. Early research focused on fully fine-tuning pre-trained models, incurring high computational costs. Consequently, recent studies have adopted parameter-efficient tuning methods to adapt large-scale vision models to the audio–visual domain. Despite the competitive performance of existing methods, several challenges persist. Firstly, effectively leveraging the complementary semantics between the audio and visual modalities remains difficult, as these two modalities capture fundamentally different aspects of a video. Secondly, comprehending dynamic video context is challenging because both spatial attributes (such as scale) and temporal characteristics (such as motion) of objects co-evolve over time, making semantic comprehension more complex. To address these challenges, we propose a novel framework, named Joint Spatial–Temporal Modeling with Semantic Completion (JMSC). JMSC introduces cross-modal latent reconstruction, which moves beyond shallow correlation by encouraging the model to reconstruct one modality's complete semantic summary from a masked version of its counterpart. Furthermore, JMSC learns a unified representation of video spatial attributes and temporal changes by jointly modeling them under audio guidance, enabling accurate localization and consistent tracking in dynamic video scenes. Experimental results demonstrate that JMSC achieves state-of-the-art performance across multiple downstream tasks while maintaining high computational efficiency. Full article
17 pages, 3413 KB  
Article
DRAG: Dual-Channel Retrieval-Augmented Generation for Hybrid-Modal Document Understanding
by Zhe Xin, Shuyuan Xia and Xin Guo
Electronics 2026, 15(4), 843; https://doi.org/10.3390/electronics15040843 - 16 Feb 2026
Viewed by 103
Abstract
Large Language Models (LLMs) have acquired vast amounts of knowledge during pre-training. However, they face many challenges when deployed in real-world applications, such as poor interpretability, hallucinations, and the inability to reference private data. To address these issues, Retrieval-Augmented Generation (RAG) has been proposed. Traditional RAG with text-based retrievers often converts documents using Optical Character Recognition (OCR) before retrieval, but in practice it tends to overlook tables and images contained within the documents. RAG relying on vision-based retrievers, in contrast, often loses information on text-dense pages. To address these limitations, we propose DRAG: Dual-channel Retrieval-Augmented Generation for Hybrid-Modal Document Understanding, a novel retrieval paradigm. The DRAG method proposed in this paper primarily comprises two core improvements: first, a parallel dual-channel processing architecture is adopted to separately extract and preserve the visual structural information and deep semantic information of documents, thereby effectively enhancing information integrity; second, a novel dynamic weighted fusion mechanism is proposed to integrate the retrieval results from both channels, enabling precise screening of the most relevant information segments. Empirical results demonstrate that our method achieves competitive performance across multiple general benchmarks. Furthermore, performance on biomedical datasets (e.g., BioM) specifically highlights its potential in specialized, vertical domains such as elderly care and rehabilitation, where documents are characterized by dense hybrid-modal information. Full article
(This article belongs to the Special Issue AI-Driven Intelligent Systems in Energy, Healthcare, and Beyond)
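The dynamic weighted fusion of the two retrieval channels can be sketched as below. The weighting rule used here, a per-query confidence derived from each channel's top-score margin, is an assumption for illustration rather than the paper's exact mechanism.

```python
def fuse_channels(text_scores, vision_scores, top_k=5):
    """text_scores, vision_scores: {segment_id: similarity} from each retrieval channel."""
    def confidence(scores):
        vals = sorted(scores.values(), reverse=True)
        return max(vals[0] - vals[1], 1e-6) if len(vals) > 1 else 1.0   # top-1 vs top-2 margin

    ct, cv = confidence(text_scores), confidence(vision_scores)
    w_text = ct / (ct + cv)                                             # dynamic weight per query
    ids = set(text_scores) | set(vision_scores)
    fused = {i: w_text * text_scores.get(i, 0.0)
                + (1 - w_text) * vision_scores.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

# Example: the vision channel is more decisive for this query, so it dominates the ranking.
print(fuse_channels({"p1": 0.71, "p2": 0.70}, {"p1": 0.40, "p3": 0.90}))
```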
21 pages, 1722 KB  
Article
Cyberbullying Detection Based on Hybrid Neural Networks and Multi-Feature Fusion
by Junkuo Cao, Yunpeng Xiong, Weiquan Wang and Guolian Chen
Information 2026, 17(2), 205; https://doi.org/10.3390/info17020205 - 16 Feb 2026
Viewed by 83
Abstract
Cyberbullying demonstrates notable metaphorical and contextual traits, characterized by a high-dimensional sparse semantic space and dynamic evolution. Pre-trained models utilize extensive textual data for learning and employ transformer-based word vector generation techniques to accurately capture intricate semantics and nuanced syntax in text. However, although a single pre-trained model demonstrates strong performance in contextual modeling, it still faces challenges including inadequate feature representation and limited generalization capability in classifying cyberbullying texts. This study proposes a cyberbullying detection model employing BERT-BiGRU-CNN (BBGC) to address this issue. The BBGC model initially employs BERT to produce word embeddings, subsequently inputs them into a BiGRU layer to acquire sequence features, and finally utilizes a CNN for the extraction of local features. The features derived from BERT, BiGRU, and CNN are integrated, followed by the application of the softmax function to yield the final outcome of cyberbullying detection. Experimental findings indicate that the BBGC fusion model surpasses individual pre-trained models in the task of detecting cyberbullying text. Furthermore, in comparison to hybrid neural network models utilizing RoBERTa, ALBERT, DistilBERT and other pre-trained models, the BBGC model demonstrates considerable advantages. Full article
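The BBGC pipeline as described, BERT embeddings feeding a BiGRU, a CNN over the BiGRU output, and concatenation of pooled BERT/BiGRU/CNN features before softmax, can be sketched in PyTorch as follows. Hidden sizes, the kernel width, and the two-class output are assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BBGC(nn.Module):
    """Sketch of a BERT-BiGRU-CNN classifier with feature-level fusion."""
    def __init__(self, hidden=128, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bigru = nn.GRU(768, hidden, batch_first=True, bidirectional=True)
        self.cnn = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        self.fc = nn.Linear(768 + 2 * hidden + hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        h_bert = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h_gru, _ = self.bigru(h_bert)                          # (B, T, 2*hidden) sequence features
        h_cnn = torch.relu(self.cnn(h_gru.transpose(1, 2)))    # (B, hidden, T) local features
        fused = torch.cat([h_bert.mean(1), h_gru.mean(1), h_cnn.max(dim=2).values], dim=-1)
        return torch.softmax(self.fc(fused), dim=-1)           # class probabilities
```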
31 pages, 5533 KB  
Article
Comparative Evaluation of Fusion Strategies Using Multi-Pretrained Deep Learning Fusion-Based (MPDLF) Model for Histopathology Image Classification
by Fatma Alshohoumi and Abdullah Al-Hamdani
Appl. Sci. 2026, 16(4), 1964; https://doi.org/10.3390/app16041964 - 16 Feb 2026
Viewed by 112
Abstract
Histopathological image analysis remains the cornerstone of cancer diagnosis; however, manual assessment is challenged by stain variability, differences in imaging magnification, and complex morphological patterns. The proposed multi-pretrained deep learning fusion (MPDLF) approach combines two widely used CNN architectures: ResNet50, which captures deeper semantic representations, and VGG16, which extracts fine-grained details. This work differs from previous fusion studies by providing a controlled evaluation of early, intermediate, and late fusion for integrating two pretrained CNN backbones (ResNet50 and VGG16) under single-modality histopathology constraints. To isolate the fusion effect, identical training settings are used across three public H&E datasets. Early fusion achieved the best test performance for the two primary tasks reported here: breast cancer binary classification (accuracy = 0.9070, 95% CI: 0.8742–0.9404; AUC = 0.9707, 95% CI: 0.9541–0.9844) and renal clear cell carcinoma (RCCC) five-class grading (accuracy = 0.8792, 95% CI: 0.8529–0.9041; AUC (OvR, macro) = 0.9895, 95% CI: 0.9859–0.9927). Future work will extend these experiments to additional magnification levels (100×, 200×, and 400×) for breast cancer histopathology images and explore advanced hybrid fusion strategies across different histopathology datasets. Full article
(This article belongs to the Special Issue AI for Medical Systems: Algorithms, Applications, and Challenges)
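In the spirit of the fusion comparison above, the sketch below concatenates pooled features from pretrained ResNet50 and VGG16 backbones before a shared classifier. How this maps onto the paper's early, intermediate, and late fusion variants is an assumption, and the head size is illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

class DualBackboneFusion(nn.Module):
    """Concatenates pooled ResNet50 and VGG16 features for classification."""
    def __init__(self, num_classes=2):
        super().__init__()
        resnet = models.resnet50(weights="DEFAULT")
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])   # -> (B, 2048, 1, 1)
        self.vgg = models.vgg16(weights="DEFAULT").features          # -> (B, 512, H', W')
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(2048 + 512, num_classes)

    def forward(self, x):
        f1 = self.resnet(x).flatten(1)              # deeper semantic features
        f2 = self.pool(self.vgg(x)).flatten(1)      # fine-grained texture features
        return self.head(torch.cat([f1, f2], dim=1))

logits = DualBackboneFusion()(torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 2])
```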
18 pages, 2759 KB  
Article
Research on Lightweight Rose Disease Detection Based on Transferable Feature Representation
by Li Liu, Tao Yin, Yuyan Bai, Bingjie Yang and Jianping Yang
Plants 2026, 15(4), 623; https://doi.org/10.3390/plants15040623 - 16 Feb 2026
Viewed by 110
Abstract
Rose leaf diseases severely reduce yield and product quality, and traditional disease monitoring relies on manual visual inspection by experts, which is inefficient for large-scale cultivation. However, deploying accurate and lightweight detectors in field environments remains challenging due to two main obstacles. First, models trained under controlled laboratory conditions suffer performance degradation due to domain shift when deployed in complex field environments. Second, the computational capacity of hardware deployable in the field is often limited. To address these problems, this study proposes a practical knowledge distillation approach based on transferable feature representations from a pre-trained teacher model, rather than on a complex distillation architecture. A high-capacity YOLOv12-L teacher, pre-trained on laboratory images, guided the training of a compact YOLOv12-N student using field images. The distilled YOLOv12-N student model achieved an mAP@50 of 81.1% on the field test set, representing a 3.5% improvement over the baseline YOLOv12-N model, while maintaining a highly efficient architecture of only 2.56 million parameters and 6.3 GFLOPs. Several ablation studies confirm the core contribution of this work, namely that the performance gains in lightweight detection stem primarily from the transfer of the teacher model's feature representations, rather than from modifications to the distillation algorithm or the student model's architecture, thus clarifying the importance of high-quality feature transfer in cross-domain agricultural vision tasks. This approach provides a generalizable and efficient solution for real-time rose leaf disease detection in precision agriculture. Full article
(This article belongs to the Section Plant Modeling)
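A generic sketch of the feature-transfer idea: a frozen, laboratory-pretrained teacher supplies intermediate feature maps that the lightweight student learns to imitate on field images, on top of its usual detection loss. The 1x1 alignment adapter, loss weight, and feature shapes are assumptions; YOLOv12 internals are not reproduced here.

```python
import torch
import torch.nn as nn

class FeatureDistiller(nn.Module):
    """Adds a feature-imitation term to the student's detection loss."""
    def __init__(self, student_channels=64, teacher_channels=256, alpha=0.5):
        super().__init__()
        # 1x1 conv aligns the student's thinner feature maps with the teacher's.
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.alpha = alpha

    def forward(self, student_feat, teacher_feat, detection_loss):
        distill = nn.functional.mse_loss(self.align(student_feat), teacher_feat.detach())
        return detection_loss + self.alpha * distill

# Example with dummy feature maps from matching pyramid levels.
distiller = FeatureDistiller()
loss = distiller(torch.randn(2, 64, 40, 40), torch.randn(2, 256, 40, 40),
                 detection_loss=torch.tensor(1.2))
loss.backward()
```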