Breast cancer remains one of the most significant challenges in modern oncology, while advances in artificial intelligence (AI) are creating new opportunities to improve diagnosis, prognosis, and treatment personalization. The aim of this review was to summarize current and emerging applications of AI in the comprehensive care of patients with breast cancer. This study was conducted as a structured narrative review with elements of integrative evidence synthesis based on publications retrieved from PubMed/MEDLINE, Scopus, Web of Science, and Embase. The review included studies evaluating machine learning and deep learning approaches, such as support vector machines, random forests, convolutional neural networks, Vision Transformers, foundation models, self-supervised learning, federated learning, and multimodal AI systems. The strongest clinical evidence currently concerns AI-supported mammographic screening, where large prospective and real-world studies suggest improvements in cancer detection and workflow efficiency. Applications involving MRI, ultrasound, histopathology, molecular prediction, treatment-response assessment, and treatment selection have shown promising performance, but most remain investigational because of limited prospective multicenter validation. Emerging approaches integrating imaging, pathological, molecular, and clinical data show considerable potential for precision oncology. AI may also support treatment selection, patient monitoring, and survivorship care. Despite promising results, widespread clinical implementation remains limited by challenges related to data heterogeneity, model interpretability, external validation, and integration into clinical workflows. Further prospective multicenter studies are required to establish the safety, reliability, and clinical utility of AI-driven systems in breast cancer care. Full article

(This article belongs to the Section Algorithms and Mathematical Models for Computer-Assisted Diagnostic Systems)

37 pages, 6376 KB

Open Access Article

PA-DFNet: Polarity-Aware Attention Network with Feature Dynamic Fusion for Point Cloud Classification and Semantic Segmentation

by Zhigang Su, Kai Jin, Jingtang Hao and Bing Han

Sensors 2026, 26(13), 4108; https://doi.org/10.3390/s26134108 (registering DOI) - 28 Jun 2026

Abstract

Point cloud segmentation constitutes a core task in 3D computer vision. However, prevailing models suffer from inherent limitations, including the absence of polarity correlation (i.e., spatial attribute-containing features derived from the separation and calculation of positive/negative correlations within point cloud query–key pairs), inefficient feature fusion, loss of fine-grained geometric details, and excessive computational complexity in self-attention mechanisms. These deficiencies constrain both the performance and practical deployment of such models. To address these challenges, the Polarity-Aware Attention and Feature Dynamic Fusion Network (PA-DFNet) is proposed in this paper. Built upon the PointNet++ framework, PA-DFNet replaces the original Multilayer Perceptron (MLP) with a Polarity-Aware Network (PAN). The PAN enhances key semantic interactions by explicitly separating positive and negative correlations from point cloud query–key pairs, generates adaptive neighborhood weights via integration with a linear attention mechanism, and introduces a learnable power function to perform nonlinear scaling of attention, thereby improving the model’s structural perception capability. Additionally, a Point Cloud Feature Dynamic Fusion (PFF) module is proposed to enable adaptive fusion of encoder–decoder features, preserving rich geometric details. Experimental results demonstrate that, on the ModelNet40 classification task, the overall accuracy (OA) and mean accuracy (mAcc) of PA-DFNet are improved by 2.4% and 2.2%, respectively, compared with PointNet++. On the S3DIS semantic segmentation task, PA-DFNet achieves an mAcc of 72.8% and a mean Intersection over Union (mIoU) of 66.2%, while exhibiting a shorter training time than Point Transformer. In summary, PA-DFNet achieves an optimal balance between segmentation performance and efficiency by effectively controlling the number of model parameters and computational complexity. Full article
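The polarity separation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so the decomposition below (splitting each vector into its non-negative and negative parts) is an assumption, and the names `polarity_split`, `same_sign`, and `opposite_sign` are hypothetical. The sketch only shows that a query-key dot product can be written as a same-sign term minus an opposite-sign term; it does not reproduce the PAN module, its linear attention, or its learnable power function.

```python
def positive_part(vector):
    """Keep the non-negative components and zero out the rest."""
    return [max(x, 0.0) for x in vector]


def negative_part(vector):
    """Return the magnitudes of the negative components, zero elsewhere."""
    return [max(-x, 0.0) for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def polarity_split(query, key):
    """Split a query-key dot product into same-sign and opposite-sign terms.

    Assumed decomposition (not taken from the paper):
        q = q_pos - q_neg,  k = k_pos - k_neg
        q . k = (q_pos . k_pos + q_neg . k_neg) - (q_pos . k_neg + q_neg . k_pos)
    """
    q_pos, q_neg = positive_part(query), negative_part(query)
    k_pos, k_neg = positive_part(key), negative_part(key)
    same_sign = dot(q_pos, k_pos) + dot(q_neg, k_neg)
    opposite_sign = dot(q_pos, k_neg) + dot(q_neg, k_pos)
    return same_sign, opposite_sign


query = [1.0, -2.0, 0.5]
key = [0.5, 1.0, -1.0]
same_sign, opposite_sign = polarity_split(query, key)
# The two terms recombine to the ordinary dot product.
recombined = same_sign - opposite_sign
```

Both terms are non-negative, which is what would allow them to be weighted separately; how PA-DFNet actually weights them is not stated in the abstract.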

(This article belongs to the Section Sensor Networks)

31 pages, 5285 KB

Open Access Article

Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection

by Yuyue Jiang, Zhuohan Wang, Yazhou Zhao, Weidong Zhou and Guoyang Liu

Diagnostics 2026, 16(13), 2012; https://doi.org/10.3390/diagnostics16132012 (registering DOI) - 27 Jun 2026

Abstract

Background/Objectives: Reliable detection of epileptic seizures using electroencephalography (EEG) is crucial for clinical diagnosis and for alleviating clinicians’ workload. However, existing studies still make insufficient use of phase information, and the synergy between local time–frequency pattern extraction and global dependency modeling remains limited. Methods: We propose a seizure detection framework based on the continuous wavelet transform (CWT), a three-dimensional convolutional neural network (3D-CNN), and a vision transformer (ViT). First, multichannel EEG segments are preprocessed, after which CWT is used to generate power spectrograms and phase spectrograms. These representations are then fused along the depth dimension into a unified power-phase volume and fed into a hybrid network composed of a 3D-CNN feature extractor and a single-layer ViT encoder to jointly learn local time–frequency–channel coupling patterns and higher-level global dependencies. Finally, seizure detection is completed by combining moving-average filtering, thresholding, and collar correction. Results: On the public CHB-MIT dataset and the clinical SH-SDU dataset, the proposed method achieved average segment-level sensitivities of 98.68% and 92.05%, specificities of 98.33% and 97.53%, accuracies of 98.49% and 96.37%, and AUC values of 97.26% and 92.89%, respectively. In event-level evaluation, the average sensitivities were 99.13% and 96.08%, with false detection rates of 0.88/h and 0.69/h, respectively. Further multi-stage ablation experiments together with t-SNE and Grad-CAM visualizations provided qualitative and experimental support for the design rationale of the joint power-phase input and the hybrid 3D-CNN-ViT architecture. 
Conclusions: The proposed framework effectively exploits the complementary discriminative value of power and phase information in epileptic EEG and demonstrates strong detection performance under patient-specific evaluation on both public and clinically collected datasets.
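The post-processing chain named in the Methods (moving-average filtering, thresholding, and collar correction) can be sketched as follows. This is a minimal illustration under assumptions: the window length, the threshold of 0.5, and the collar of one segment are hypothetical values, and the function names are invented for this example. The abstract does not state the settings the authors used.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def apply_collar(labels, collar):
    """Extend every positive segment by `collar` segments on both sides."""
    extended = list(labels)
    for i, label in enumerate(labels):
        if label:
            for j in range(max(0, i - collar), min(len(labels), i + collar + 1)):
                extended[j] = 1
    return extended


def postprocess(probabilities, window=3, threshold=0.5, collar=1):
    """Smooth segment-level seizure probabilities, binarize, then apply the collar."""
    smoothed = moving_average(probabilities, window)
    binary = [1 if p >= threshold else 0 for p in smoothed]
    return apply_collar(binary, collar)


# One isolated spike (index 1) and one sustained run (indices 4 to 6).
probabilities = [0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.9, 0.0, 0.0, 0.0, 0.0]
decisions = postprocess(probabilities)
```

In this toy input the isolated spike is smoothed away, while the sustained run survives and is widened by the collar, which compensates for the shortening that smoothing introduces at event boundaries.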

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

25 pages, 31983 KB

Open Access Article

Wide + Tiles Vision Transformer Framework for Smartphone-Based Grassland Biomass Prediction in Heterogeneous Field Conditions

by Ranida Arystanova, Darkhan Zeinulla, Gulnara Kabzhanova, Anuarbek Bissembayev, Roza Bekseitova, Dani Sarsekova, Bakhbayeva Saule, Asset Arystanov, Janay Sagin and Margulan Nurtay

Agriculture 2026, 16(13), 1401; https://doi.org/10.3390/agriculture16131401 (registering DOI) - 27 Jun 2026

Abstract

This study addresses the issue of accurate and rapid aboveground biomass estimation in rangeland ecosystems, as traditional grazing methods are labor-intensive, while modern remote sensing techniques often require expensive equipment and controlled conditions. The goal of this work is to develop an efficient and accessible approach for biomass estimation of natural pastures based on ground-level RGB images captured with smartphones. For this purpose, a dataset consisting of 1196 field images and corresponding biomass values collected from 40 districts in southern Kazakhstan was used, and a wide + tiles architecture based on the DINOv3 model of Vision Transformer was proposed. The model utilized attention pooling and feature fusion mechanisms to integrate both global and local features, and various preprocessing and augmentation strategies were comparatively examined. Experimental results demonstrated that the proposed method exhibits high accuracy (with the best result being R² = 0.733, MAE ≈ 0.779 c/ha), where the DINOv3 model showed clear advantages over ConvNeXtV2. Furthermore, the impact of preprocessing strategies was minimal, and the importance of high-resolution images was clearly established. The obtained results show that the proposed method performs consistently under heterogeneous field conditions and allows for reliable biomass estimation without the need for specialized equipment. This makes it a practical tool for monitoring pastures, planning forage supply, and supporting agronomic decision-making. Full article
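The two regression metrics quoted in the abstract, R² and MAE, follow standard definitions. The sketch below shows how they are computed; the sample values are invented for illustration and are not taken from the study's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Invented biomass values (same unit for both lists), for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

MAE is expressed in the unit of the target (c/ha in the abstract), whereas R² is unitless, so the two are usually reported together.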

(This article belongs to the Topic Advances in Smart Agriculture with Remote Sensing as the Core and Its Applications in Crops Field, 2nd Edition)

19 pages, 4246 KB

Open Access Article

Implementation of Image-Based Artificial Intelligence Is Associated with Increased Case Volume in a High-Acuity, 15-Room Cardiothoracic Operating Suite at a Tertiary Academic Hospital

by Ngoc-Anh A. Nguyen, Grace Lee, Sarah Sossong, Jannika V. Machnik, Sarah Pletcher and Roberta Schwartz

J. Imaging 2026, 12(7), 283; https://doi.org/10.3390/jimaging12070283 (registering DOI) - 27 Jun 2026

Abstract

Background: Operating rooms generate substantial visual data that is rarely captured systematically. Image-based AI (IBAI) systems using computer vision offer a new approach to real-time perioperative workflow monitoring, but evidence of their impact on surgical case volume remains limited. The aim of this study was to evaluate the association between deployment of an IBAI system and monthly surgical case volume in a high-acuity cardiothoracic operating suite, using synthetic control with difference-in-differences estimation. Methods: We deployed an IBAI system with wall-mounted cameras and a YOLO-based (You Only Look Once) object detection model coupled with a transformer-based event detector in a 15-room cardiothoracic suite at Houston Methodist Hospital (HMH), the tertiary academic hospital of Houston Methodist health system. The deployment was conducted under an IRB-determined quality improvement framework with patient consent for ambient video capture, defined retention limits, and restricted access to recordings. Over a 16-month period spanning 6 months pre-deployment and 10 months post-deployment, the system monitored 5417 surgical cases and automatically detected additional perioperative events including patient entry, draping, and room turnover. Using a synthetic control methodology, we compared post-deployment outcomes at the intervention site against a weighted combination drawn from a pool of 11 Houston Methodist sites that did not yet implement IBAI (116,098 cases across the comparison sites; 121,515 cases in the full analytic dataset). Results: The synthetic control analysis with difference-in-differences estimation showed a statistically significant increase of approximately 25 cases per month (95% CI 8.3 to 41.0; p < 0.01; Bonferroni-adjusted p < 0.05), corresponding to a 7% increase in monthly case volume relative to baseline. 
Conclusions: Our findings suggest that IBAI can meaningfully improve OR efficiency and support data-driven perioperative management. Future work should evaluate whether case volume gains generalize across other surgical specialties, assess changes in operational outcomes such as turnover time and first-case on-time starts, and examine clinicians’ perceptions of IBAI.
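The difference-in-differences idea behind the reported estimate can be shown with a short worked example. This is a plain two-group, two-period sketch: it does not construct the synthetic control weights used in the study, and every number below is invented for illustration rather than drawn from the Houston Methodist data.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented monthly case counts, for illustration only.
treated_pre = [98.0, 100.0, 102.0]
treated_post = [110.0, 112.0, 114.0]
control_pre = [88.0, 90.0, 92.0]
control_post = [93.0, 95.0, 97.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
```

Subtracting the control group's change removes trends shared by both groups, so only the additional change at the intervention site is attributed to the deployment. In a synthetic control design, the control series is a weighted combination of comparison sites rather than a single site.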

(This article belongs to the Section Computer Vision and Pattern Recognition)

22 pages, 3822 KB

Open Access Article

Research on Fish Recognition in Complex Backgrounds Using ViT-Enhanced YOLOv11

by Xiangshuo He, Shenglong Yang, Wei Wang, Kai Zhu, Shengmao Zhang, Yang Dai, Keji Jiang and Fei Wang

Fishes 2026, 11(7), 385; https://doi.org/10.3390/fishes11070385 (registering DOI) - 27 Jun 2026

Abstract

To address the common challenges in fish recognition tasks under complex backgrounds, such as target overlap, occlusion, and chaotic spatial distribution, an improved YOLOv11 recognition model based on the Vision Transformer (ViT) is proposed. Traditional Convolutional Neural Networks (CNNs) and the YOLO series models are limited by their local receptive fields, making it difficult to capture global semantic correlations in dense and heavily occluded fish target detection, which often leads to feature confusion and false detections. By embedding ViT modules at the beginning of the Head and at the end of the Backbone of YOLOv11, the self-attention mechanism of ViT is leveraged to capture global dependencies in the image, re-integrate and enhance multi-scale features from the Backbone and Neck, thus constructing two improved ViT models. Comparative experiments are conducted on the FishRecognition-2025 dataset, which contains 955 high-resolution RGB images covering nine common coastal fish species across four categories: single fish species, multiple classes separated, slight overlap of multiple fish species, and severe overlap of multiple fish species. Under identical training strategies and evaluation metrics, the four models—original YOLOv11, traditional CNN, ViT-Head, and ViT-Backbone—are compared. The results show that the second improved ViT model (with ViT placed at the end of the Backbone) outperformed the first improved model (with ViT placed at the beginning of the Head) in terms of mAP50 and mAP50-95. Moreover, its overall accuracy across the four test data categories (single fish species, multiple classes separated, slight overlap of multiple fish species, and severe overlap of multiple fish species) surpassed that of YOLOv11, CNN, and the first ViT model. 
Although its accuracy in single fish species and multiple classes separated scenarios was slightly lower than that of the CNN model, it demonstrated significant advantages in scenarios with slight overlap of multiple fish species and severe overlap of multiple fish species. These findings validate the effectiveness of the ViT module in global feature modeling and adaptability to complex backgrounds, suggesting a promising technical direction for future real-time recognition in fishery field operations.
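The mAP50 and mAP50-95 metrics used in the comparison both rest on the Intersection over Union (IoU) between a predicted and a ground-truth box: mAP50 counts a detection as a match at IoU of at least 0.5, and mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of the IoU step, assuming boxes in `(x1, y1, x2, y2)` corner format (the box format and function name are assumptions for this example):

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
overlap = box_iou(predicted, ground_truth)
is_match_at_50 = overlap >= 0.5
```

Heavily overlapping fish make this matching step harder, because one predicted box can overlap several ground-truth boxes at once.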

(This article belongs to the Special Issue Application of Remote Sensing to Fisheries)

21 pages, 5044 KB

Open Access Review

Scoping Review of Recent Trends and Challenges in Artificial Intelligence Based Medical Ultrasound Denoising

by Mizanu Zelalem Degu, Midhila Madhusoodanan, Medha Chippa and Abhilash Hareendranathan

AI Med. 2026, 1(3), 18; https://doi.org/10.3390/aimed1030018 (registering DOI) - 26 Jun 2026

Abstract

(1) Background: Ultrasound (US) imaging is widely used in clinical diagnosis but is often degraded by speckle noise, which reduces image quality and can hinder interpretation. Deep learning (DL) has emerged as a promising approach for US denoising, yet its clinical applicability remains unclear. (2) Methods: A scoping review of studies published in the last four years on DL-based US denoising was conducted following PRISMA-ScR guidelines. Searches were performed in IEEE Xplore, PubMed, ScienceDirect, Scopus, Web of Science, and Google Scholar. Data were extracted on anatomy, noise type, learning paradigm, network architecture, datasets, evaluation metrics, and performance outcomes. (3) Results: From 951 records retrieved, 36 studies were included. Most focused on breast, fetal, cardiac, and abdominal US. Convolutional neural networks (CNNs), particularly U-Net, were the most common approach, while generative adversarial networks, vision transformers, and variational autoencoders were less explored. Reported peak signal-to-noise ratio ranged from 30 to 45 dB and structural similarity index measure from 0.85 to 0.97. Most studies (34 out of 36) relied on synthetic noise, 2D images, and paired datasets, with limited evaluation on real clinical images. (4) Conclusion: Supervised CNN-based methods dominate US image denoising, but clinical translation is limited by reliance on synthetic data. Non-paired and no-ground-truth learning approaches remain underexplored despite their suitability for US imaging. Progress is further hindered by inconsistent evaluation protocols, limited robustness assessment on clinical tasks, and restricted dataset access. Future work should focus on standardized, clinically meaningful evaluation, openly available datasets, and clinical validation to improve reliability and generalizability of DL-based US denoising methods.
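The peak signal-to-noise ratio reported across the reviewed studies follows the standard definition PSNR = 10 log10(MAX² / MSE). A minimal sketch, assuming 8-bit images stored as nested lists (the image values below are invented for illustration):

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    flat_ref = [pixel for row in reference for pixel in row]
    flat_test = [pixel for row in test for pixel in row]
    if len(flat_ref) != len(flat_test):
        raise ValueError("images must contain the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[100, 100], [100, 100]]
denoised = [[101, 99], [101, 99]]
score_db = psnr(clean, denoised)
```

PSNR needs a clean reference image, which is one reason studies that rely on it tend to use synthetic noise added to a known image rather than real clinical scans.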

23 pages, 5223 KB

Open Access Article

A Multi-Task Deep Learning Framework for Characterizing Beating Behavior and Synchrony in Cardiomyocyte Clusters

by Tianxin Wang, Xinjie Liu, Fangshuo Zhang, Qianwen Guo, Xiaoyu Li, Yuanyuan Sun and Jingjing Xu

Bioengineering 2026, 13(7), 742; https://doi.org/10.3390/bioengineering13070742 (registering DOI) - 25 Jun 2026

Abstract

Beat-level synchrony among cardiomyocyte clusters is a critical indicator of cardiac electromechanical function. Traditional invasive approaches have substantial limitations, and conventional computer vision methods are poorly suited for resolving densely packed, adherent clusters. To address these challenges, we developed an analysis framework to characterize the beating characteristics of cardiomyocyte clusters from microscopic imaging data. Specifically, we propose CardioSegNet, a multi-task deep learning model that combines attention mechanisms with three prediction heads (semantic segmentation, contour detection, and distance transform), followed by a watershed algorithm to achieve high-accuracy cluster-level segmentation of cardiomyocyte clusters. The Pixel-Difference method is applied to extract time-series beating signals from each segmented cluster and compute several dynamic parameters, including beating amplitude, period, frequency, and the Beat Rate Irregularity (BRI). We further introduce PeriodAwareNAPTD_ij to quantify the beating synchrony among different clusters. Our experimental results show that CardioSegNet achieves a Dice coefficient of 0.8868 and an HD95 of 93.02 µm on an independent test set, demonstrating strong segmentation performance. The cardiomyocyte populations are not uniformly globally synchronized; rather, they consist of multiple local subgroups with high internal synchrony, and the degree of synchronization between clusters is positively correlated with their physical distance. This label-free analytical pipeline provides an efficient tool for myocardial function evaluation and cardiotoxicity screening in vitro. Full article
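The Dice coefficient used to report segmentation quality is defined as twice the overlap between two masks divided by their combined size. A minimal sketch on flattened binary masks (the masks below are invented, and treating two empty masks as a perfect match is a convention chosen for this example):

```python
def dice_coefficient(predicted_mask, true_mask):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary masks of 0s and 1s."""
    if len(predicted_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(predicted_mask, true_mask))
    total = sum(predicted_mask) + sum(true_mask)
    if total == 0:
        return 1.0  # both masks empty: treated here as a perfect match
    return 2.0 * intersection / total


predicted = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
score = dice_coefficient(predicted, truth)
```

Dice measures area overlap only, which is why it is reported alongside HD95, a boundary-distance measure.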

(This article belongs to the Section Biosignal Processing)

22 pages, 4062 KB

Open Access Article

WGTMM: WGAN with Transformer Feature Matching for Generating fMRI Data in MCI Patients

by Bocheng Wang

Brain Sci. 2026, 16(7), 665; https://doi.org/10.3390/brainsci16070665 (registering DOI) - 25 Jun 2026

Abstract

Background: The emergence of generative adversarial networks has laid the groundwork for data augmentation, addressing challenges of missing training data in various research scenarios. However, simulating functional magnetic resonance imaging (fMRI) data remains particularly challenging, especially for populations with varying degrees of mild cognitive impairment (MCI). Effectively characterizing and capturing the mechanisms of brain function variations poses a critical issue in cognitive neuroscience. This study aims to simulate and analyze synthetic fMRI blood-oxygen-level-dependent (BOLD) signals across four cognitive stages: healthy control (HC), early MCI (EMCI), late MCI (LMCI), and Alzheimer’s disease (AD). Methods: We propose WGTMM, an innovative method that integrates the Vision Transformer for fMRI (VTFF) into a generative adversarial network architecture. Crucially, WGTMM directly generates fMRI time-series data from pink noise rather than modeling in a latent space, thereby preserving rich temporal dynamics. The framework incorporates a Wasserstein GAN (WGAN) with feature matching to enhance generation quality and mitigate mode collapse. Results: The results demonstrate that WGTMM-generated fMRI data exhibit lower Kullback-Leibler (KL) divergence compared to traditional GAN and WGAN models, indicating a closer resemblance to real datasets from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Furthermore, when applied to data augmentation, the synthetic data substantially improve multi-class classification performance. Conclusions: WGTMM not only enriches training datasets but also provides new insights into spatial biomarkers of cognitive decline.
By leveraging VTFF to investigate class token attention patterns across 360 brain regions, this study reveals monotonic weight variations along disease stages in key cortical areas, including the rostral Area 6, the primary sensory cortex, and PFm near Wernicke’s area, offering a fine-grained exploration of disease progression.
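The Kullback-Leibler divergence used to compare generated and real data is defined, for discrete distributions, as KL(P || Q) = Σ p_i log(p_i / q_i). The sketch below assumes both signals have already been summarized as normalized histograms over the same bins; the abstract does not say how the distributions were estimated, so that step and the example values are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a zero-probability bin in P contributes nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Invented two-bin histograms, for illustration only.
real_hist = [0.5, 0.5]
generated_hist = [0.25, 0.75]
divergence = kl_divergence(real_hist, generated_hist)
```

A value of zero means the two histograms are identical, so a lower divergence indicates generated data whose distribution lies closer to the real data. The measure is not symmetric, so the order of the arguments matters.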

(This article belongs to the Section Computational Neuroscience, Neuroinformatics, and Neurocomputing)

27 pages, 1221 KB

Open Access Article

Digital and Remote Interventions for Musculoskeletal Aging: Real-Time Muscle Strain Severity Detection Using Artificial Intelligence

by Zulaikha Fatima, Abdullah, Nida Hafeez, Rolando Quintero Téllez, Miguel Jesús Torres Ruiz, Carlos Guzmán Sánchez Mejorada, Miguel Félix Mata-Rivera and Roberto Zagal-Flores

Biosensors 2026, 16(7), 354; https://doi.org/10.3390/bios16070354 - 25 Jun 2026

Abstract

As global populations grow and technology advances, daily life is increasingly shaped by digital systems such as computers and smart devices. However, prolonged device use has contributed to increasing physical and mental health concerns, particularly those associated with poor sitting posture. Posture-related strain is frequently overlooked and contributes to musculoskeletal discomfort, including back, neck, shoulder, and wrist pain, and may also be associated with sleep disturbances and elevated stress levels. To the best of our knowledge and based on the existing literature, this is the first study to introduce a machine learning-based framework for advanced muscle strain severity classification using Internet of Things (IoT) devices that integrates posture monitoring and muscle strain detection into a unified low-cost framework ($23 hardware cost). The primary objective of this work is accurate classification of muscle strain severity, while real-time alerts serve as a secondary ergonomic feedback mechanism. Specifically, this study makes four major contributions. First, we created a novel dataset through real-time acquisition of electromyography (EMG) and posture signals from participants in hospital and industrial environments, capturing diverse muscle strain patterns validated against clinical assessment procedures. Second, we designed a two-part hardware architecture consisting of posture detection (PD) and strain detection (SD) modules using a NodeMCU ESP8266, HC-SR04 ultrasonic sensor, EMG sensor, and buzzer for real-time physiological monitoring, incorporating EMG-specific preprocessing including band-pass filtering, rectification, and RMS smoothing. Third, we proposed and evaluated a hybrid machine learning framework integrating Vision Transformer (ViT) and XGBoost to classify strain severity into three study-specific categories: baseline (EMG RMS < 40 µV), compensatory strain (40–59 µV), and overload (≥60 µV). 
These categories were used as reproducible severity proxies for machine learning annotation and should not be interpreted as universal biomarkers of structural tissue damage. Finally, the proposed framework achieved a classification accuracy of 99.0% (95% CI: 98.5–99.5%) with an inference latency of 15.2 ms.
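The severity categories stated in the abstract can be expressed as a simple thresholding rule on the EMG RMS value. The sketch below covers only the rectification, RMS smoothing, and labeling steps; band-pass filtering and the ViT and XGBoost models are not shown. The function names and the window length are hypothetical, and values between 59 and 60 µV, which the abstract leaves unassigned, are treated here as compensatory strain.

```python
import math


def rectify(samples):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(s) for s in samples]


def moving_rms(samples, window):
    """Root-mean-square over a sliding window; returns len(samples) - window + 1 values."""
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and the number of samples")
    return [
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(len(samples) - window + 1)
    ]


def classify_severity(rms_microvolts):
    """Map an EMG RMS value (in µV) to the study-specific category from the abstract."""
    if rms_microvolts < 40.0:
        return "baseline"
    if rms_microvolts < 60.0:  # assumption: the 59-60 µV gap is assigned here
        return "compensatory strain"
    return "overload"


# Rectification does not change a subsequent RMS value (squaring removes the sign);
# it is included only to mirror the steps listed in the abstract.
signal_uv = [30.0, -45.0, 50.0, -70.0, 65.0]
envelope = moving_rms(rectify(signal_uv), window=2)
labels = [classify_severity(value) for value in envelope]
```

These labels are the annotation targets described in the abstract, not a clinical diagnosis.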

(This article belongs to the Special Issue Biosensors for Physiological Signal Monitoring)

38 pages, 1879 KB

Open Access Systematic Review

Precision Livestock Farming and Biomedical Engineering: Assessing Feed Quality, Animal Health, and Behavior Using Machine Learning for Sensor Data

by Nikolay Kiktev, Danylo Hradoboiev, Mykola Pravilov, Ievgen Antypov, Yuliia Meish, Liliia Stroianovska, Pawel Kielbasa and Taras Hutsol

Sensors 2026, 26(13), 4015; https://doi.org/10.3390/s26134015 - 24 Jun 2026

Abstract

This review analyses and logically structures modern intelligent sensor technologies in the context of animal husbandry, feed production, and veterinary medicine. The main research discussed in the article focuses on machine learning based on modern neural network models, computer vision, and sensor systems that are transforming the methods for assessing the health, behavior, and nutrition of farm animals. The first part examines modern approaches to quality control and optimization of mineral and vitamin premixes, including visual inspection using visual sensors and neural networks. Key roles are played by precise dosing, component stability (minerals, vitamins), and the transition to more bioefficient organic forms of micronutrients to reduce environmental impact. Improvements in feed and premix production are analyzed, including automation, energy management, and the use of machine learning for non-destructive quality control, defect detection, mixing homogeneity assessment, and vitamin stability prediction. The second part analyzes methods for animal location and behavior detection. This article presents computer vision-based systems, including modifications of YOLO, for automatically tracking and classifying key behavioral patterns (lying down, standing, feeding, and aggression) in cattle and pigs, even in crowded conditions. It also discusses the use of ultra-wideband (UWB) systems and accelerometers combined with machine learning for high-precision positioning and detection of specific behavioral anomalies, such as lameness and playfulness. The third section focuses on the application of machine learning in veterinary diagnostics, including the automated interpretation of medical images (X-ray, ultrasound, and MRI) as sensor data streams for the diagnosis of cardiovascular, oncological, and orthopedic diseases in farm and small animals. 
Furthermore, the article examines the use of machine learning models for proactive disease diagnosis in farm animals and poultry based on multimodal data and image analysis. Considerable attention is given to methods and tools for radiometric diagnosis of animal diseases at an early stage using microwave sensors, as well as laser therapy and surgery in veterinary medicine. The review concludes that the integration of intelligent systems enables a transition to data-driven livestock management, significantly improving animal welfare and, consequently, the efficiency and sustainability of agricultural production.

(This article belongs to the Section Smart Agriculture)

25 pages, 8611 KB

Open Access Article

Enhancing Plunger Lift Anomaly Detection: A Vision Transformer-Based Approach Leveraging Pretrained Models and Graphic Data Augmentation

by Jianjun Zhu, Yujun Liu, Haoyu Wang, Mai Chen, Nan Li, Guangqiang Cao, Ruizhi Zhong and Haiwen Zhu

Processes 2026, 14(13), 2045; https://doi.org/10.3390/pr14132045 - 24 Jun 2026

Abstract

Plunger lift systems are vital for optimizing production in gas wells, but their performance can be compromised by various operational anomalies. Traditional diagnostic methods and conventional convolutional neural network (CNN) approaches often struggle with the complex, transient data from these systems, particularly in capturing long-range temporal dependencies and generalizing from limited, imbalanced datasets. This study presents an enhanced diagnostic framework for plunger lift anomaly detection by leveraging the strengths of a pre-trained Vision Transformer (ViT). The methodology transforms one-dimensional time-series pressure data into two-dimensional image representations using the element-wise summation of Gramian Angular Summation Field (GASF) and Gramian Angular Difference Field (GADF), which simultaneously preserves global operational trends and local transient dynamics for vision model analysis. The ViT model, initialized with pre-trained weights, is further optimized using Bayesian optimization (BO) for hyperparameter tuning, and a tailored data augmentation pipeline is employed to improve robustness. Comparative evaluations demonstrate that the proposed ViT-based approach, particularly the ViT + GAF + BO model, significantly outperforms baseline CNN models and their optimized variants, achieving the highest Precision, Recall, and F1-score, with an F1-score of 0.93. Visualizations using t-SNE confirm the ViT’s superior capability in learning discriminative features, showcasing well-separated clusters for different operational conditions compared to CNNs. This research underscores the potential of pre-trained ViTs combined with appropriate data representation and optimization techniques for achieving accurate and reliable anomaly detection in plunger lift systems. Full article
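The time-series-to-image step can be sketched with the standard Gramian Angular Field definitions: the series is rescaled to [-1, 1], each value is mapped to an angle φ = arccos(x), and then GASF[i][j] = cos(φ_i + φ_j) and GADF[i][j] = sin(φ_i − φ_j). The element-wise sum follows the description in the abstract. The min-max rescaling, the function names, and the toy pressure values are assumptions for this example; the authors' preprocessing may differ.

```python
import math


def rescale_to_unit_interval(series):
    """Min-max rescale a series to [-1, 1]."""
    low, high = min(series), max(series)
    if high == low:
        raise ValueError("a constant series cannot be rescaled")
    return [(2.0 * x - high - low) / (high - low) for x in series]


def gaf_sum_image(series):
    """Element-wise sum of the GASF and GADF matrices of a 1D series."""
    scaled = rescale_to_unit_interval(series)
    # Clamp before arccos to guard against tiny floating-point overshoot.
    angles = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    size = len(angles)
    return [
        [
            math.cos(angles[i] + angles[j]) + math.sin(angles[i] - angles[j])
            for j in range(size)
        ]
        for i in range(size)
    ]


# Invented pressure readings, for illustration only.
pressure = [0.0, 1.0, 2.0]
image = gaf_sum_image(pressure)
```

The GASF part is symmetric and the GADF part is antisymmetric, so their sum is generally not symmetric: the upper and lower triangles of the resulting image carry different information about each pair of time steps.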

(This article belongs to the Special Issue Hybrid Artificial Intelligence for Smart Process Control)

16 pages, 1370 KB

Open Access Article

CPM-XNet: Annotation-Efficient Deep-Learning Framework for Detecting Tuberculosis in Chest X-Ray Images

by Tzu-Chin Yang, Bing-Yen Wang, Jin-Yu Li, Yu-Kang Chang, Shih-Huan Lin, Chi-Chang Chang and Yen-Wei Chu

Diagnostics 2026, 16(13), 1947; https://doi.org/10.3390/diagnostics16131947 - 23 Jun 2026

Abstract

Background/Objectives: Chest X-ray (CXR) images are a widely used first-line screening tool for pulmonary tuberculosis (TB) detection but are difficult to interpret, which has increased demand for an automated screening tool. Deep-learning-based computer-aided diagnosis systems have demonstrated a classification performance comparable to that of trained radiologists, but they rely on dense annotations such as lesion-level or pixel-level labels, which are costly and difficult to obtain in routine clinical workflows. We developed CPM-XNet, an annotation-efficient framework for lesion-annotation-free downstream TB classification in CXR images. Methods: CPM-XNet incorporates a compressing–projecting mask (CPM) to provide soft lung-aware modulation while preserving global contextual information. The CPM-modulated images are then used for downstream classification with multiple convolutional neural network backbones and a vision transformer baseline. Results: Experiments were conducted using an internal hospital dataset and public TB datasets, and CPM-XNet showed improved performance compared with baseline models trained on unmodulated images. In a repeated-seed evaluation of the main ResNet-101 configuration on the Tung cohort, CPM-ResNet101 showed higher and more stable performance than the non-CPM counterpart and demonstrated significant paired improvement using McNemar’s exact test. An ablation analysis indicated that CPM modulation was the main contributor to performance improvement while data augmentation and the classifier architecture further influenced the overall robustness. Conclusions: CPM-XNet provides an annotation-efficient strategy for lesion-annotation-free downstream TB classification in CXR images. The findings support preliminary technical feasibility, although larger, naturally imbalanced, cross-institutional validation is required before clinical deployment can be inferred. Full article
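McNemar's exact test, mentioned in the Results, compares two classifiers evaluated on the same cases by looking only at the discordant pairs: cases that one model classified correctly and the other did not. Under the null hypothesis the discordant pairs split evenly, so the two-sided p-value comes from a Binomial(n, 0.5) distribution. A minimal sketch with invented counts (the study's discordant counts are not given in the abstract):

```python
from math import comb


def mcnemar_exact_p(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    smaller = min(only_a_correct, only_b_correct)
    # Probability of a split at least as lopsided as the observed one, on one side.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Invented counts: model A alone correct on 1 case, model B alone correct on 9.
p_value = mcnemar_exact_p(1, 9)
```

Cases that both models get right or both get wrong do not enter the test, which is why it suits paired comparisons on a fixed test set.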

(This article belongs to the Special Issue Advances in Disease Prediction—2nd Edition)
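
The abstract above reports a paired comparison using McNemar's exact test. The following is a minimal sketch of how such a test is commonly computed for two classifiers evaluated on the same cases; it is not the authors' code. The function names and the toy labels are hypothetical, and the two-sided binomial formulation is assumed.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of the two models is correct.

    Returns (b, c): b = only model A correct, c = only model B correct.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant case is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy example (hypothetical data): model B fixes nine of model A's errors
# and introduces one new error.
y_true = [1] * 10
pred_a = [1] + [0] * 9
pred_b = [0] + [1] * 9
b, c = discordant_counts(y_true, pred_a, pred_b)
p_value = mcnemar_exact(b, c)
print(b, c, p_value)
```

Only the discordant cases enter the test, which is why it suits paired evaluations in which both models are scored on the same images.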


18 pages, 1889 KB

Open Access Article

Vision Transformer with Spatial 2D Multi-Channel Tokens

by Sirui Zheng, Yu Li, Zhongxiang Zhang and Dequn Zhao

Electronics 2026, 15(13), 2752; https://doi.org/10.3390/electronics15132752 - 23 Jun 2026


Abstract

The Vision Transformer (ViT) has been widely adopted in the computer vision community. However, the standard ViT often contains many parameters, usually performs poorly when trained from scratch on medium-scale datasets, and does not explicitly preserve the local spatial and channel-wise structures within each token. This work proposes a novel model called the Token-Shared Convolutional Projection Vision Transformer (TSCP-ViT). The core idea of TSCP-ViT is to integrate convolutional layers into the multi-head attention mechanism and to apply the same convolutional operation independently to each token, where each token exhibits spatial 2D multi-channel characteristics. In addition, this work introduces a Transformer decoder immediately after each Transformer encoder, enabling the classification tokens to aggregate information from all tokens and be updated using statistical information. A trainable Non-Reversing Gate GELU (NRG-GELU) activation is also proposed. Comparative experiments on CIFAR-100, Food-101, and ImageNet100 show that, under comparable parameter counts and without pretraining or knowledge distillation, TSCP-ViT substantially surpasses ViT, outperforms CvT, outperforms ResNet on Food-101, and approaches ResNet on CIFAR-100 and ImageNet100, although with considerably higher FLOPs.

(This article belongs to the Topic Transformer and Deep Learning Applications in Image Processing)


17 pages, 7171 KB

Open Access Article

V3Reg: Model Integrating Visual Information for Extreme Low Overlap Point Cloud Registration

by Yaxiong Li, Yifan Hou, Qisong Yang and Dongdong Guan

Remote Sens. 2026, 18(12), 2050; https://doi.org/10.3390/rs18122050 - 21 Jun 2026


Abstract

Extremely low overlap leads to severely scarce local geometric correspondences across frame pairs. Pure geometric descriptors—encoding merely low-level shape signatures—inherently fail to impose sufficient constraints for reliable transformation estimation when matches become critically sparse, rendering registration fundamentally fragile. While recent red-green-blue-depth (RGB-D) attempts have explored visual augmentation, they predominantly rely on low-level chromatic statistics or shallow convolutional neural network (CNN) features, underutilizing the rich hierarchical semantics inherent in RGB imagery. We present V3Reg, a robust registration framework that pioneers the integration of large-scale vision foundation models (DINOv3) with adaptive cross-modal fusion. Specifically, we extract mid-to-deep semantic features (Layer 11) from DINOv3 to transcend low-level texture limitations, and propose a Task-Aware Channel-Wise Gated Adaptive Fusion (TACGAF) module that dynamically calibrates geometric-visual contributions via registration-error-guided channel-wise gating. To rigorously evaluate ultra-low-overlap robustness, we reconstruct RGBD-ZeroMatch, a benchmark with controllable overlap ratios ranging from 1% to 20%. Extensive experiments demonstrate that V3Reg achieves 99.6% Feature Matching Recall and 96.3% Registration Recall on standard benchmarks. Notably, it maintains 50.2% Registration Recall at merely 5% overlap, outperforming prior methods by over 18 percentage points.

(This article belongs to the Special Issue Point Cloud Data Analysis and Applications)




