Search Results (868)

Search Parameters:
Keywords = multi-modal transformer

25 pages, 732 KiB  
Article
Accuracy-Aware MLLM Task Offloading and Resource Allocation in UAV-Assisted Satellite Edge Computing
by Huabing Yan, Hualong Huang, Zijia Zhao, Zhi Wang and Zitian Zhao
Drones 2025, 9(7), 500; https://doi.org/10.3390/drones9070500 - 16 Jul 2025
Abstract
This paper presents a novel framework for optimizing multimodal large language model (MLLM) inference through task offloading and resource allocation in UAV-assisted satellite edge computing (SEC) networks. MLLMs leverage transformer architectures to integrate heterogeneous data modalities for IoT applications, particularly real-time monitoring in remote areas. However, cloud computing dependency introduces latency, bandwidth, and privacy challenges, while IoT device limitations require efficient distributed computing solutions. SEC, utilizing low-earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), extends mobile edge computing to provide ubiquitous computational resources for remote IoT devices. We formulate the joint optimization of MLLM task offloading and resource allocation as a mixed-integer nonlinear programming (MINLP) problem, minimizing latency and energy consumption while optimizing offloading decisions, power allocation, and UAV trajectories. To address the dynamic SEC environment characterized by satellite mobility, we propose an action-decoupled soft actor–critic (AD-SAC) algorithm with discrete–continuous hybrid action spaces. The simulation results demonstrate that our approach significantly outperforms conventional deep reinforcement learning baselines in both convergence and system cost reduction.
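
To make the discrete–continuous hybrid action space concrete: an action-decoupled actor must emit a discrete offloading choice and a continuous power allocation from the same state. Below is a minimal PyTorch sketch of such a policy head; the state dimension, layer sizes, and target counts are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    """Sketch of a hybrid policy head: one categorical (offloading) and one
    Gaussian (power) output. All sizes are assumed for illustration."""
    def __init__(self, state_dim=32, n_offload_targets=4, n_power_dims=1):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        # Discrete head: which node (local / UAV / satellite ...) gets the task.
        self.offload_logits = nn.Linear(128, n_offload_targets)
        # Continuous head: mean and log-std of the transmit power allocation.
        self.power_mu = nn.Linear(128, n_power_dims)
        self.power_log_std = nn.Linear(128, n_power_dims)

    def forward(self, state):
        h = self.trunk(state)
        offload_dist = torch.distributions.Categorical(logits=self.offload_logits(h))
        power_dist = torch.distributions.Normal(
            self.power_mu(h), self.power_log_std(h).clamp(-5, 2).exp())
        return offload_dist, power_dist

policy = HybridPolicy()
state = torch.randn(8, 32)                   # batch of 8 observed network states
offload_dist, power_dist = policy(state)
offload = offload_dist.sample()              # discrete offloading decision
power = torch.sigmoid(power_dist.rsample())  # continuous power, squashed to (0, 1)
```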

39 pages, 4034 KiB  
Article
Three-Dimensional Modeling and AI-Assisted Contextual Narratives in Digital Heritage Education: Course for Enhancing Design Skill, Cultural Awareness, and User Experience
by Yaojiong Yu and Weifeng Hu
Heritage 2025, 8(7), 280; https://doi.org/10.3390/heritage8070280 - 15 Jul 2025
Abstract
This study introduces an educational framework that merges 3D modeling with AI-assisted narrative interaction to apply digital technology in cultural heritage education, exemplified by an ancient carriage culture. Through immersive tasks and contextual narratives, the course notably improved learners’ professional skills and cultural awareness. Experimental results revealed significant knowledge acquisition among participants post-engagement. Additionally, the user experience improved, with increased satisfaction in the narrative interaction design course. These enhancements led to heightened interest in cultural heritage and deeper knowledge acquisition. Utilizing Norman’s three-layer interaction model, Ryan’s contextual narrative theory, and Falk and Dierking’s museum learning experience model, the study developed a systematic course for multi-sensory design and contextual interaction, confirming the positive impact of multimodal interaction on learning outcomes. This research provides theoretical support for the digital transformation of cultural education and practical examples for educational practitioners and cultural institutions to implement in virtual presentations and online learning.
(This article belongs to the Special Issue Progress in Heritage Education: Evolving Techniques and Methods)
20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
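
As a concrete illustration of per-modality Self-Attention followed by late fusion, here is a minimal PyTorch sketch; the feature dimensions (768-d Wav2Vec2/BERT frames, 99-d landmark vectors), sequence lengths, and layer sizes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Self-attention over one modality's time series, then mean-pool."""
    def __init__(self, feat_dim, d_model=64, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                 # x: (batch, time, feat_dim)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)         # self-attention highlights salient frames
        return h.mean(dim=1)              # one summary vector per modality

audio_enc = ModalityEncoder(feat_dim=768)   # e.g. Wav2Vec2 frame features
video_enc = ModalityEncoder(feat_dim=99)    # e.g. 33 landmarks x (x, y, z)
text_enc  = ModalityEncoder(feat_dim=768)   # e.g. BERT token embeddings

# Late fusion: concatenate per-modality summaries, score the Big Five traits.
classifier = nn.Linear(64 * 3, 5)

a = torch.randn(2, 100, 768)
v = torch.randn(2, 100, 99)
t = torch.randn(2, 50, 768)
fused = torch.cat([audio_enc(a), video_enc(v), text_enc(t)], dim=-1)
logits = classifier(fused)                  # (2, 5) trait scores
```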

21 pages, 4147 KiB  
Article
AgriFusionNet: A Lightweight Deep Learning Model for Multisource Plant Disease Diagnosis
by Saleh Albahli
Agriculture 2025, 15(14), 1523; https://doi.org/10.3390/agriculture15141523 - 15 Jul 2025
Abstract
Timely and accurate identification of plant diseases is critical to mitigating crop losses and enhancing yield in precision agriculture. This paper proposes AgriFusionNet, a lightweight and efficient deep learning model designed to diagnose plant diseases using multimodal data sources. The framework integrates RGB and multispectral drone imagery with IoT-based environmental sensor data (e.g., temperature, humidity, soil moisture), recorded over six months across multiple agricultural zones. Built on the EfficientNetV2-B4 backbone, AgriFusionNet incorporates Fused-MBConv blocks and Swish activation to improve gradient flow, capture fine-grained disease patterns, and reduce inference latency. The model was evaluated using a comprehensive dataset composed of real-world and benchmarked samples, showing superior performance with 94.3% classification accuracy, 28.5 ms inference time, and a 30% reduction in model parameters compared to state-of-the-art models such as Vision Transformers and InceptionV4. Extensive comparisons with both traditional machine learning and advanced deep learning methods underscore its robustness, generalization, and suitability for deployment on edge devices. Ablation studies and confusion matrix analyses further confirm its diagnostic precision, even in visually ambiguous cases. The proposed framework offers a scalable, practical solution for real-time crop health monitoring, contributing toward smart and sustainable agricultural ecosystems.
(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)
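
A minimal sketch of the image-plus-sensor fusion idea follows, using torchvision's EfficientNetV2-S as a stand-in for the paper's EfficientNetV2-B4 backbone (the B4 variant does not ship with torchvision); the sensor feature set, class count, and fusion head are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s

# Image branch: EfficientNetV2-S stand-in; a true multispectral input would
# need a modified stem, so 3-band patches are assumed here.
backbone = efficientnet_v2_s(weights=None)
backbone.classifier = nn.Identity()          # keep the 1280-d pooled features

# Sensor branch: small MLP over IoT readings (temperature, humidity, soil
# moisture); SiLU is the same function as the Swish activation named above.
sensor_mlp = nn.Sequential(nn.Linear(3, 32), nn.SiLU(), nn.Linear(32, 32))

# Fusion head: concatenate both branches and classify (class count assumed).
head = nn.Linear(1280 + 32, 10)

images = torch.randn(4, 3, 224, 224)         # drone image patches
sensors = torch.tensor([[24.1, 0.61, 0.33]] * 4)
logits = head(torch.cat([backbone(images), sensor_mlp(sensors)], dim=-1))
```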

25 pages, 1657 KiB  
Review
Integrating New Technologies in Lipidology: A Comprehensive Review
by Carlos Escobar-Cervantes, Jesús Saldaña-García, Ana Torremocha-López, Cristina Contreras-Lorenzo, Alejandro Lara-García, Lucía Canales-Muñoz, Ricardo Martínez-González, Joaquín Vila-García and Maciej Banach
J. Clin. Med. 2025, 14(14), 4984; https://doi.org/10.3390/jcm14144984 - 14 Jul 2025
Abstract
Cardiovascular disease remains the world’s leading cause of death, and even when patients reach guideline low-density lipoprotein cholesterol targets, a substantial “residual risk” persists, underscoring the need for more nuanced assessment and intervention. At the same time, rapid advances in high-resolution lipidomics, connected point-of-care diagnostics, and RNA- or gene-based lipid-modifying therapies are transforming what clinicians can measure, monitor, and treat. Integrating multimodal data through machine learning algorithms capable of handling high-dimensional datasets has the potential to improve cardiovascular risk prediction and re-stratification compared to traditional models. This narrative review therefore sets out to (i) trace how these emerging technologies expand our understanding of dyslipidemia beyond the traditional lipid panel, (ii) examine their potential to enable earlier, more personalized and durable cardiovascular risk reduction, and (iii) highlight the scientific, regulatory and ethical hurdles that must be cleared before such innovations can deliver widespread, equitable benefit.

23 pages, 3492 KiB  
Article
A Multimodal Deep Learning Framework for Accurate Biomass and Carbon Sequestration Estimation from UAV Imagery
by Furkat Safarov, Ugiloy Khojamuratova, Misirov Komoliddin, Xusinov Ibragim Ismailovich and Young Im Cho
Drones 2025, 9(7), 496; https://doi.org/10.3390/drones9070496 - 14 Jul 2025
Abstract
Accurate quantification of above-ground biomass (AGB) and carbon sequestration is vital for monitoring terrestrial ecosystem dynamics, informing climate policy, and supporting carbon neutrality initiatives. However, conventional methods—ranging from manual field surveys to remote sensing techniques based solely on 2D vegetation indices—often fail to capture the intricate spectral and structural heterogeneity of forest canopies, particularly at fine spatial resolutions. To address these limitations, we introduce ForestIQNet, a novel end-to-end multimodal deep learning framework designed to estimate AGB and associated carbon stocks from UAV-acquired imagery with high spatial fidelity. ForestIQNet combines dual-stream encoders for processing multispectral UAV imagery and a voxelized Canopy Height Model (CHM), fused via a Cross-Attentional Feature Fusion (CAFF) module, enabling fine-grained interaction between spectral reflectance and 3D structure. A lightweight Transformer-based regression head then performs multitask prediction of AGB and CO₂e, capturing long-range spatial dependencies and enhancing generalization. The proposed method achieves an R² of 0.93 and an RMSE of 6.1 kg for AGB prediction, compared to an R² of 0.78 and an RMSE of 11.7 kg for XGBoost, and an R² of 0.73 and an RMSE of 13.2 kg for Random Forest. Despite its architectural complexity, ForestIQNet maintains a low inference cost (27 ms per patch) and generalizes well across species, terrain, and canopy structures. These results establish a new benchmark for UAV-enabled biomass estimation and provide scalable, interpretable tools for climate monitoring and forest management.
(This article belongs to the Special Issue UAVs for Nature Conservation Tasks in Complex Environments)
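
The cross-attentional fusion idea (spectral tokens attending to structural tokens) can be sketched in a few lines of PyTorch; the token counts, dimensions, residual wiring, and pooling/regression head below are assumptions for illustration, not the published CAFF module.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Spectral tokens query structural (CHM) tokens, in the spirit of a
    cross-attentional feature fusion block. Sizes are assumed."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, spectral, structure):
        # Queries from spectral reflectance; keys/values from 3D structure.
        fused, _ = self.attn(query=spectral, key=structure, value=structure)
        return self.norm(spectral + fused)    # residual connection

fusion = CrossAttentionFusion()
spectral = torch.randn(2, 196, 64)    # multispectral patch tokens
structure = torch.randn(2, 196, 64)   # voxelized canopy-height tokens
tokens = fusion(spectral, structure)

# Lightweight head for the two regression targets (AGB, CO2e).
head = nn.Linear(64, 2)
agb_co2e = head(tokens.mean(dim=1))   # (2, 2) multitask prediction
```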

25 pages, 4948 KiB  
Review
A Review of Visual Grounding on Remote Sensing Images
by Ziyan Wang, Lei Liu, Gang Wan, Wei Zhang, Binjian Zhong, Haiyang Chang, Xinyi Li, Xiaoxuan Liu and Guangde Sun
Electronics 2025, 14(14), 2815; https://doi.org/10.3390/electronics14142815 - 13 Jul 2025
Abstract
Remote sensing visual grounding, a pivotal technology bridging natural language and high-resolution remote sensing images, holds significant application value in disaster monitoring, urban planning, and related fields. However, it faces critical challenges due to the inherent scale heterogeneity, semantic complexity, and annotation scarcity of remote sensing data. This paper first reviews the development history of remote sensing visual grounding, providing an overview of the basic background knowledge, including fundamental concepts, datasets, and evaluation metrics. It then categorizes methods according to whether they employ large language models as a backbone, and provides in-depth analyses of the innovations and limitations of Transformer-based and multimodal large language model-based methods. Furthermore, focusing on the characteristics of remote sensing imagery, it discusses cutting-edge techniques such as cross-modal feature fusion, language-guided visual optimization, multi-scale and hierarchical feature processing, open-set expansion, and efficient fine-tuning. Finally, it outlines current bottlenecks and proposes valuable directions for future research. As the first comprehensive review dedicated to remote sensing visual grounding, this work serves as a reference resource for researchers to grasp domain-specific concepts and track the latest developments.

37 pages, 618 KiB  
Systematic Review
Interaction, Artificial Intelligence, and Motivation in Children’s Speech Learning and Rehabilitation Through Digital Games: A Systematic Literature Review
by Chra Abdoulqadir and Fernando Loizides
Information 2025, 16(7), 599; https://doi.org/10.3390/info16070599 - 12 Jul 2025
Abstract
The integration of digital serious games into speech learning and rehabilitation has demonstrated significant potential in enhancing accessibility and inclusivity for children with speech disabilities. This review of the state of the art examines the role of serious games, Artificial Intelligence (AI), and Natural Language Processing (NLP) in speech rehabilitation, with a particular focus on interaction modalities, engagement, autonomy, and motivation. We reviewed 45 selected studies. Our key findings show how intelligent tutoring systems, adaptive voice-based interfaces, and gamified speech interventions can empower children to engage in self-directed speech learning, reducing dependence on therapists and caregivers. The diversity of interaction modalities, including speech recognition, phoneme-based exercises, and multimodal feedback, demonstrates how AI and Assistive Technology (AT) can personalise learning experiences to accommodate diverse needs. Furthermore, the incorporation of gamification strategies, such as reward systems and adaptive difficulty levels, has been shown to enhance children’s motivation and long-term participation in speech rehabilitation. The gaps identified show that despite advancements, challenges remain in achieving universal accessibility, particularly regarding speech recognition accuracy, multilingual support, and accessibility for users with multiple disabilities. This review advocates for interdisciplinary collaboration across educational technology, special education, cognitive science, and human–computer interaction (HCI). Our work contributes to the ongoing discourse on lifelong inclusive education, reinforcing the potential of AI-driven serious games as transformative tools for bridging learning gaps and promoting speech rehabilitation beyond clinical environments.

19 pages, 2641 KiB  
Article
MSFF-Net: Multi-Sensor Frequency-Domain Feature Fusion Network with Lightweight 1D CNN for Bearing Fault Diagnosis
by Miao Dai, Hangyeol Jo, Moonsuk Kim and Sang-Woo Ban
Sensors 2025, 25(14), 4348; https://doi.org/10.3390/s25144348 - 11 Jul 2025
Abstract
This study proposes MSFF-Net, a lightweight deep learning framework for bearing fault diagnosis based on frequency-domain multi-sensor fusion. The vibration and acoustic signals are initially converted into the frequency domain using the fast Fourier transform (FFT), enabling the extraction of temporally invariant spectral features. These features are processed by a compact one-dimensional convolutional neural network, where modality-specific representations are fused at the feature level to capture complementary fault-related information. Experiments on a publicly available dataset show that the proposed model attains an average accuracy of 99.73%, outperforming state-of-the-art (SOTA) methods in both accuracy and stability under full and scarce data conditions alike. With only about 70.3% of the parameters of the SOTA model, it offers faster inference and reduced computational cost. Ablation studies confirm that multi-sensor fusion improves all classification metrics over single-sensor setups. Under few-shot conditions with 20 samples per class, the model retains 94.69% accuracy, highlighting its strong generalization in data-limited scenarios. These results validate the effectiveness, computational efficiency, and practical applicability of the model for deployment in data-constrained industrial environments.
(This article belongs to the Special Issue Condition Monitoring in Manufacturing with Advanced Sensors)
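
A minimal sketch of the frequency-domain, feature-level fusion pipeline: each raw signal becomes a magnitude spectrum via FFT, passes through a compact 1D CNN branch, and the branch outputs are concatenated before classification. Window lengths, channel counts, and the number of fault classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def to_spectrum(signal):
    """Magnitude spectrum via FFT, mirroring the frequency-domain input step."""
    return torch.fft.rfft(signal, dim=-1).abs()

class Branch1D(nn.Module):
    """Compact 1D CNN over one sensor's spectrum."""
    def __init__(self, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.fc = nn.Linear(32, out_dim)

    def forward(self, spec):                   # spec: (batch, length)
        return self.fc(self.net(spec.unsqueeze(1)))

vib_branch, ac_branch = Branch1D(), Branch1D()
classifier = nn.Linear(64, 10)                 # fault classes (count assumed)

vibration = torch.randn(8, 2048)               # raw vibration window
acoustic = torch.randn(8, 2048)                # raw acoustic window
features = torch.cat([vib_branch(to_spectrum(vibration)),
                      ac_branch(to_spectrum(acoustic))], dim=-1)
logits = classifier(features)                  # feature-level fusion
```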

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies that challenge conventional global alignment methods. Extensive experiments on the UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T: the framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, with the largest gains on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception.
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
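
To illustrate region-level adversarial calibration, the sketch below applies a fully convolutional domain discriminator that outputs one logit per spatial cell of the feature map, so the alignment pressure is local rather than global. Channel counts and the training wiring (e.g., a gradient reversal layer) are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class RegionDiscriminator(nn.Module):
    """Per-region domain classifier: scores partitioned feature-map cells
    instead of a single globally pooled vector."""
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, kernel_size=1))      # one logit per spatial cell

    def forward(self, fmap):                      # fmap: (B, C, H, W)
        return self.net(fmap)                     # (B, 1, H, W) domain logits

disc = RegionDiscriminator()
bce = nn.BCEWithLogitsLoss()

sim_feat = torch.randn(2, 256, 8, 8)              # simulated-domain features
real_feat = torch.randn(2, 256, 8, 8)             # real-domain features

# The discriminator learns to tell domains apart region by region; the feature
# extractor would be trained against the reversed objective (e.g. via gradient
# reversal) so local texture/illumination statistics align across domains.
d_loss = bce(disc(sim_feat), torch.zeros(2, 1, 8, 8)) + \
         bce(disc(real_feat), torch.ones(2, 1, 8, 8))
```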

34 pages, 947 KiB  
Review
Multimodal Artificial Intelligence in Medical Diagnostics
by Bassem Jandoubi and Moulay A. Akhloufi
Information 2025, 16(7), 591; https://doi.org/10.3390/info16070591 - 9 Jul 2025
Abstract
The integration of artificial intelligence into healthcare has advanced rapidly in recent years, with multimodal approaches emerging as promising tools for improving diagnostic accuracy and clinical decision making. These approaches combine heterogeneous data sources such as medical images, electronic health records, physiological signals, and clinical notes to better capture the complexity of disease processes. Despite this progress, only a limited number of studies offer a unified view of multimodal AI applications in medicine. In this review, we provide a comprehensive and up-to-date analysis of machine learning and deep learning-based multimodal architectures, fusion strategies, and their performance across a range of diagnostic tasks. We begin by summarizing publicly available datasets and examining the preprocessing pipelines required for harmonizing heterogeneous medical data. We then categorize key fusion strategies used to integrate information from multiple modalities and survey representative model architectures, from hybrid designs and transformer-based vision-language models to optimization-driven and EHR-centric frameworks. Finally, we highlight the challenges that remain in existing work. Our analysis shows that multimodal approaches tend to outperform unimodal systems in diagnostic performance, robustness, and generalization. This review provides a unified view of the field and opens up future research directions aimed at building clinically usable, interpretable, and scalable multimodal diagnostic systems.
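
For readers new to the fusion taxonomy this review categorizes, the short sketch below contrasts early (feature-level) and late (decision-level) fusion of an imaging embedding with EHR-derived features; all dimensions and classifier heads are illustrative assumptions.

```python
import torch
import torch.nn as nn

img_feat = torch.randn(4, 512)     # e.g. CNN embedding of a chest X-ray
ehr_feat = torch.randn(4, 64)      # e.g. encoded labs/vitals from the EHR

# Early (feature-level) fusion: concatenate representations, then classify.
early_head = nn.Linear(512 + 64, 2)
early_logits = early_head(torch.cat([img_feat, ehr_feat], dim=-1))

# Late (decision-level) fusion: independent classifiers, averaged probabilities.
img_head, ehr_head = nn.Linear(512, 2), nn.Linear(64, 2)
late_probs = 0.5 * (img_head(img_feat).softmax(-1) +
                    ehr_head(ehr_feat).softmax(-1))
```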

28 pages, 1727 KiB  
Review
Computational and Imaging Approaches for Precision Characterization of Bone, Cartilage, and Synovial Biomolecules
by Rahul Kumar, Kyle Sporn, Vibhav Prabhakar, Ahab Alnemri, Akshay Khanna, Phani Paladugu, Chirag Gowda, Louis Clarkson, Nasif Zaman and Alireza Tavakkoli
J. Pers. Med. 2025, 15(7), 298; https://doi.org/10.3390/jpm15070298 - 9 Jul 2025
Abstract
Background/Objectives: Degenerative joint diseases (DJDs) involve intricate molecular disruptions within bone, cartilage, and synovial tissues, often preceding overt radiographic changes. These tissues exhibit complex biomolecular architectures and their degeneration leads to microstructural disorganization and inflammation that are challenging to detect with conventional imaging techniques. This review aims to synthesize recent advances in imaging, computational modeling, and sequencing technologies that enable high-resolution, non-invasive characterization of joint tissue health. Methods: We examined advanced modalities including high-resolution MRI (e.g., T1ρ, sodium MRI), quantitative and dual-energy CT (qCT, DECT), and ultrasound elastography, integrating them with radiomics, deep learning, and multi-scale modeling approaches. We also evaluated RNA-seq, spatial transcriptomics, and mass spectrometry-based proteomics for omics-guided imaging biomarker discovery. Results: Emerging technologies now permit detailed visualization of proteoglycan content, collagen integrity, mineralization patterns, and inflammatory microenvironments. Computational frameworks ranging from convolutional neural networks to finite element and agent-based models enhance diagnostic granularity. Multi-omics integration links imaging phenotypes to gene and protein expression, enabling predictive modeling of tissue remodeling, risk stratification, and personalized therapy planning. Conclusions: The convergence of imaging, AI, and molecular profiling is transforming musculoskeletal diagnostics. These synergistic platforms enable early detection, multi-parametric tissue assessment, and targeted intervention. Widespread clinical integration requires robust data infrastructure, regulatory compliance, and physician education, but offers a pathway toward precision musculoskeletal care.
(This article belongs to the Special Issue Cutting-Edge Diagnostics: The Impact of Imaging on Precision Medicine)

22 pages, 818 KiB  
Article
Towards Reliable Fake News Detection: Enhanced Attention-Based Transformer Model
by Jayanti Rout, Minati Mishra and Manob Jyoti Saikia
J. Cybersecur. Priv. 2025, 5(3), 43; https://doi.org/10.3390/jcp5030043 - 9 Jul 2025
Abstract
The widespread rise of misinformation across digital platforms has increased the demand for accurate and efficient Fake News Detection (FND) systems. This study introduces an enhanced transformer-based architecture for FND, developed through comprehensive ablation studies and empirical evaluations on multiple benchmark datasets. The proposed model combines improved multi-head attention, dynamic positional encoding, and a lightweight classification head to effectively capture nuanced linguistic patterns while maintaining computational efficiency. To ensure robust training, techniques such as label smoothing, learning rate warm-up, and reproducibility protocols were incorporated. The model demonstrates strong generalization across three diverse datasets (FakeNewsNet, ISOT, and LIAR), achieving an average accuracy of 79.85%. Specifically, it attains 80% accuracy on FakeNewsNet, 100% on ISOT, and 59.56% on LIAR. With just 3.1 to 4.3 million parameters, the model achieves an 85% reduction in size compared to full-sized BERT architectures. These results highlight the model’s effectiveness in balancing high accuracy with resource efficiency, making it suitable for real-world applications such as social media monitoring and automated fact-checking. Future work will explore multilingual extensions, cross-domain generalization, and integration with multimodal misinformation detection systems.
(This article belongs to the Special Issue Cyber Security and Digital Forensics—2nd Edition)
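
Two of the training techniques named above, label smoothing and learning-rate warm-up, can be wired up directly in PyTorch; the toy model, smoothing factor, and warm-up horizon below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # label smoothing
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Linear warm-up: lr scales from ~0 up to the base rate, then holds steady.
warmup_steps = 500
schedule = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

x, y = torch.randn(16, 128), torch.randint(0, 2, (16,))
for _ in range(3):                      # training-loop excerpt
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    schedule.step()                     # advance the warm-up schedule
```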

28 pages, 2047 KiB  
Article
Multimodal-Based Non-Contact High Intraocular Pressure Detection Method
by Zibo Lan, Ying Hu, Shuang Yang, Jiayun Ren and He Zhang
Sensors 2025, 25(14), 4258; https://doi.org/10.3390/s25144258 - 8 Jul 2025
Abstract
This study proposes a deep learning-based, non-contact method for detecting elevated intraocular pressure (IOP) by integrating Scheimpflug images with corneal biomechanical features. Glaucoma, the leading cause of irreversible blindness worldwide, requires accurate IOP monitoring for early diagnosis and effective treatment. Traditional IOP measurements are often influenced by corneal biomechanical variability, leading to inaccurate readings. To address these limitations, we present a multi-modal framework incorporating CycleGAN for data augmentation, Swin Transformer for visual feature extraction, and the Kolmogorov–Arnold Network (KAN) for efficient fusion of heterogeneous data. KAN approximates complex nonlinear relationships with fewer parameters, making it effective in small-sample scenarios with intricate variable dependencies. A diverse dataset was constructed and augmented to alleviate data scarcity and class imbalance. By combining Scheimpflug imaging with clinical parameters, the model effectively integrates multi-source information to improve high IOP prediction accuracy. Experiments on a real-world private hospital dataset show that the model achieves a diagnostic accuracy of 0.91, outperforming traditional approaches. Grad-CAM visualizations identify critical anatomical regions, such as corneal thickness and anterior chamber depth, that correlate with IOP changes. These findings underscore the role of corneal structure in IOP regulation and suggest new directions for non-invasive, biomechanics-informed IOP screening.
(This article belongs to the Collection Medical Image Classification)
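
A minimal sketch of the multimodal fusion pattern described above, concatenating an image embedding with clinical parameters: a tiny CNN stands in for the Swin Transformer and a small MLP stands in for the KAN fusion module, and all shapes are assumed for illustration only.

```python
import torch
import torch.nn as nn

img_encoder = nn.Sequential(                 # stand-in for the Swin Transformer
    nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))

# Fusion over heterogeneous inputs; the paper uses a KAN here, for which a
# small MLP serves only as a structural placeholder in this sketch.
fusion = nn.Sequential(nn.Linear(64 + 4, 32), nn.ReLU(), nn.Linear(32, 2))

scheimpflug = torch.randn(4, 1, 128, 128)    # grayscale Scheimpflug slices
clinical = torch.randn(4, 4)                 # e.g. corneal thickness, chamber depth
logits = fusion(torch.cat([img_encoder(scheimpflug), clinical], dim=-1))
```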

24 pages, 2692 KiB  
Article
Fine-Grained Dismantling Decision-Making for Distribution Transformers Based on Knowledge Graph Subgraph Contrast and Multimodal Fusion Perception
by Li Wang, Yujia Hu, Zhiyao Zheng, Guangqiang Wu, Jianqin Lin, Jialing Li and Kexin Zhang
Electronics 2025, 14(14), 2754; https://doi.org/10.3390/electronics14142754 - 8 Jul 2025
Abstract
Distribution transformers serve as critical nodes in smart grids, and management of their recycling plays a vital role in the full life-cycle management of electrical equipment. However, traditional manual dismantling methods often exhibit low metal recovery efficiency and high levels of hazardous substance residue. To facilitate green, cost-effective, and fine-grained recycling of distribution transformers, this study proposes a fine-grained dismantling decision-making system based on knowledge graph subgraph comparison and multimodal fusion perception. First, a standardized dismantling process is designed to achieve refined transformer decomposition. Second, a comprehensive set of multi-dimensional evaluation metrics is established to assess the effectiveness of various recycling strategies for different transformers. Finally, through the integration of multimodal perception with knowledge graph technology, the system achieves automated sequencing of the dismantling operations. The experimental results demonstrate that the proposed method attains 99% accuracy in identifying recyclable transformers and 97% accuracy in auction-based pricing. The residual oil rate in dismantled transformers is reduced to below 1%, while the metal recovery efficiency increases by 40%. Furthermore, environmental sustainability and economic value are improved by 23% and 40%, respectively. This approach significantly enhances the recycling value and environmental safety of distribution transformers, providing effective technical support for smart grid development and environmental protection.
