Search Results (775)

Search Parameters:
Keywords = Swin Transformer

29 pages, 6113 KB  
Article
Intensity-Texture Enhanced Swin Fusion for Bacterial Contamination Detection in Alocasia Explants
by Jiatian Liu, Wenjie Chen and Xiangyang Yu
Sensors 2026, 26(7), 2103; https://doi.org/10.3390/s26072103 (registering DOI) - 28 Mar 2026
Abstract
Non-destructive and automated detection of bacterial contamination is a critical prerequisite for ensuring high efficiency production and quality control in plant tissue culture. In this study, we developed a multispectral image acquisition system for Alocasia explants and proposed a novel image fusion model, termed Intensity-Texture enhanced Swin Fusion (ITSF). The ITSF framework employs convolutional neural networks to extract texture and intensity features from visible and near-infrared channels. Subsequently, a Swin Transformer-based module is integrated to model long-range spatial dependencies, ensuring cross-domain integration between the texture and intensity features. We formulated a composite loss function to guide the fusion process toward optimal results. This objective function integrates texture loss, entropy weighted structural similarity index (SSIM) and intensity aware dynamic gain guided loss. Experimental results demonstrate that the proposed method significantly enhances the visual saliency of bacteria and achieves superior quantitative performance across a comprehensive range of objective image fusion metrics. The detection performance reached a mean Average Precision (mAP50) of 0.949 with the fused images, satisfying industrial requirements for high-precision inspection, which provides a critical technical solution for the industrialization of automated micropropagation. Full article
(This article belongs to the Section Intelligent Sensors)
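
As an illustration of the composite fusion objective described in this abstract, the sketch below combines a gradient-based texture term with an intensity term that pulls the fused image toward the brighter source channel. The weighting scheme and coefficients are assumptions for illustration only; they are stand-ins for the entropy-weighted SSIM and intensity-aware dynamic gain terms of ITSF, not the authors' exact formulation.

```python
# Minimal sketch of a composite image-fusion loss (texture + intensity terms).
import torch
import torch.nn.functional as F

def gradient_magnitude(x):
    """Approximate image gradients with Sobel kernels (x: B x 1 x H x W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def fusion_loss(fused, vis, nir, w_texture=1.0, w_intensity=1.0):
    # Texture term: fused gradients should follow the stronger of the two source gradients.
    target_grad = torch.maximum(gradient_magnitude(vis), gradient_magnitude(nir))
    texture = F.l1_loss(gradient_magnitude(fused), target_grad)
    # Intensity term: per-pixel pull toward the brighter source channel
    # (a simple stand-in for the "intensity-aware dynamic gain" idea).
    target_int = torch.maximum(vis, nir)
    intensity = F.l1_loss(fused, target_int)
    return w_texture * texture + w_intensity * intensity

# Usage on dummy single-channel patches
vis, nir = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
fused = (vis + nir) / 2
print(fusion_loss(fused, vis, nir))
```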

37 pages, 3540 KB  
Article
A Multimodal Time-Frequency Fusion Architecture for Fault Diagnosis in Rotating Machinery
by Hui Wang, Congming Wu, Yong Jiang, Yanqing Ouyang, Chongguang Ren, Xianqiong Tang and Wei Zhou
Appl. Sci. 2026, 16(7), 3269; https://doi.org/10.3390/app16073269 - 27 Mar 2026
Abstract
Accurate fault diagnosis of rotating machinery in complex industrial environments demands an optimal trade-off between feature representation capability and computational efficiency. Existing single-modality models relying solely on 1D time-series signals or heavy 2D time-frequency images often fail to simultaneously capture high-frequency transient impacts and long-range degradation trends. CLiST (Complementary Lightweight Spatiotemporal Network), a novel lightweight multimodal framework driven by time-frequency fusion, was proposed to overcome this limitation. The architecture of CLiST employs a synergistic dual-stream design: a LightTS module efficiently extracts global operational trends from 1D vibration signals with linear complexity, while a structurally pruned LiteSwin integrated with Triplet Attention captures local high-frequency textures from 2D continuous wavelet transform (CWT) images. This mechanism establishes explicit cross-dimensional dependencies, effectively eliminating feature blind spots without excessive computational overhead. The experimental results show that CLiST not only achieves perfect accuracy on the fundamental CWRU benchmark but also exhibits exceptional spatial generalization when independently evaluated on non-dominant sensor axes of the XJTUGearbox dataset. Furthermore, validation on the real-world dataset (Guangzhou port) proves that the framework has excellent robustness to the attenuation of the signal transmission path and reduces the performance fluctuation between remote measurement points. Ultimately, CLiST delivers highly reliable AI-driven image and signal-processing solutions for vibration monitoring in industrial equipment. Full article
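
The dual-stream idea in this abstract can be sketched as below: one branch summarizes the raw 1D vibration signal, the other extracts features from a 2D time-frequency image (e.g., a CWT scalogram), and the two embeddings are fused for fault classification. The branch designs are simplified stand-ins, not the LightTS and LiteSwin modules from the paper.

```python
# Minimal sketch of a dual-stream (1D signal + 2D time-frequency image) classifier.
import torch
import torch.nn as nn

class DualStreamFaultNet(nn.Module):
    def __init__(self, num_classes=10, embed_dim=64):
        super().__init__()
        # 1D branch: cheap global-trend summary of the raw signal.
        self.signal_branch = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten(),
            nn.Linear(16 * 16, embed_dim), nn.ReLU(),
        )
        # 2D branch: local texture features from the time-frequency image.
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim), nn.ReLU(),
        )
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, signal, tf_image):
        z = torch.cat([self.signal_branch(signal), self.image_branch(tf_image)], dim=1)
        return self.head(z)

model = DualStreamFaultNet()
logits = model(torch.randn(4, 1, 1024), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 10])
```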
15 pages, 2219 KB  
Article
One Patch Is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues
by Sindhuja Penchala, Gavin Money, Gabriel Marques, Samuel Wood, Jessica Kirschman, Travis Atkison, Shahram Rahimi and Noorbakhsh Amiri Golilarz
Sensors 2026, 26(7), 2083; https://doi.org/10.3390/s26072083 - 27 Mar 2026
Viewed by 37
Abstract
Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation and material perception. However, most existing methods rely on dense or full scene observations, limiting their effectiveness in constrained or partial view environments. This gap highlights the need for models capable of inferring surfaces’ properties from extremely limited visual information. To address this challenge, we introduce SMARC, a unified model for Surface MAterial Reconstruction and Classification from minimal visual input. Given only a single 10% contiguous patch of the image, SMARC reconstructs the full RGB surface while simultaneously classifying the material category. Our architecture combines a Partial Convolutional U-Net with a classification head, enabling both spatial inpainting and semantic understanding under extreme observation sparsity. We compared SMARC against five models including convolutional autoencoders, Vision Transformer (ViT), Masked Autoencoder (MAE), Swin Transformer and DETR using the Touch and Go dataset of real-world surface textures. SMARC achieves the highest performance among the evaluated methods with a PSNR of 17.55 dB and a surface classification accuracy of 85.10%. These results validate the effectiveness of SMARC for surface material understanding and highlight its potential for deployment in robotic perception tasks where visual access is inherently limited. Full article
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)
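
The joint reconstruction-and-classification setup can be sketched as follows: a shared encoder feeds both a reconstruction decoder and a material-class head, trained on images where only one contiguous ~10% patch is visible. The plain convolutional encoder/decoder is a simplified stand-in for the Partial Convolutional U-Net named in the abstract, and the loss weights are assumptions.

```python
# Minimal sketch of joint surface reconstruction and classification from one visible patch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointReconClassifier(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),   # input: RGB + visibility mask
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(64, num_classes))

    def forward(self, masked_rgb, mask):
        feats = self.encoder(torch.cat([masked_rgb, mask], dim=1))
        return self.decoder(feats), self.cls_head(feats)

def joint_loss(recon, logits, target_rgb, labels, w_cls=0.5):
    return F.l1_loss(recon, target_rgb) + w_cls * F.cross_entropy(logits, labels)

# Keep only one contiguous ~10% patch of the image visible.
img = torch.rand(2, 3, 64, 64)
mask = torch.zeros(2, 1, 64, 64)
mask[:, :, 20:40, 20:40] = 1.0          # 20x20 of 64x64 ~= 10% of pixels
recon, logits = JointReconClassifier()(img * mask, mask)
labels = torch.randint(0, 20, (2,))
print(joint_loss(recon, logits, img, labels))
```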

22 pages, 1692 KB  
Article
A Novel AAF-SwinT Model for Automatic Recognition of Abnormal Goat Lung Sounds
by Shengli Kou, Decao Zhang, Jiadong Yu, Yanling Yin, Weizheng Shen and Qiutong Cen
Animals 2026, 16(7), 1021; https://doi.org/10.3390/ani16071021 - 26 Mar 2026
Viewed by 116
Abstract
In abnormal goat lung sound recognition, high inter-class similarity and large intra-class variability pose significant challenges. To address this issue and improve recognition performance, we propose a deep learning model, AAF-SwinT, based on an improved Swin Transformer. The model replaces the original Swin Transformer self-attention module with Axial Decomposed Attention (ADA), modeling the temporal and frequency axes separately and integrating attention weights to mitigate inter-class feature similarity. Adaptive Spatial Aggregation for Patch Merging (ASAP) is designed to emphasize key time-frequency regions, and a Frequency-Aware Multi-Layer Perceptron (FAM) is introduced to model features across different frequency bands, further enhancing the discriminative ability for abnormal lung sounds. Experiments on a self-constructed goat lung sound dataset demonstrate that AAF-SwinT achieves an accuracy of 88.21%, outperforming existing mainstream Transformer-based models by 2.68–5.98%. Ablation studies further confirm the effectiveness of each proposed module, improving the accuracy of the baseline Swin Transformer model from 85.53% to 88.21%. These results indicate that the proposed approach exhibits strong robustness and practical potential for abnormal lung sound recognition in goats, providing technical support for early diagnosis and management of respiratory diseases in large-scale goat farming. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications for Veterinary Medicine)
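
Axial time/frequency-decomposed attention, as described for the ADA module, can be sketched with two attention passes over a spectrogram feature map: one mixes information along the time axis, the other along the frequency axis, and the results are combined. This is a generic axial-attention layer, not the authors' exact module.

```python
# Minimal sketch of axial (time/frequency-decomposed) self-attention.
import torch
import torch.nn as nn

class AxialDecomposedAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):            # x: (B, F, T, C) time-frequency feature map
        b, f, t, c = x.shape
        # Attention along the time axis: each frequency row is an independent sequence.
        xt = x.reshape(b * f, t, c)
        xt, _ = self.time_attn(xt, xt, xt)
        xt = xt.reshape(b, f, t, c)
        # Attention along the frequency axis: each time column is an independent sequence.
        xf = x.permute(0, 2, 1, 3).reshape(b * t, f, c)
        xf, _ = self.freq_attn(xf, xf, xf)
        xf = xf.reshape(b, t, f, c).permute(0, 2, 1, 3)
        return x + xt + xf           # residual combination of the two axial passes

ada = AxialDecomposedAttention(dim=32)
print(ada(torch.randn(2, 16, 40, 32)).shape)  # torch.Size([2, 16, 40, 32])
```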
17 pages, 26938 KB  
Article
Dual-SwinOrd: A Dual-Head Swin Transformer with Semantic Prior Injection for Ordinal Diabetic Retinopathy Grading
by Wenjuan Yu, Xiaonan Si and Jingxiang Zhong
Bioengineering 2026, 13(4), 374; https://doi.org/10.3390/bioengineering13040374 - 24 Mar 2026
Viewed by 206
Abstract
Diabetic retinopathy (DR) is the largest cause of permanent vision loss in the working-age population, making automated grading critical for timely therapeutic intervention. While recent deep learning algorithms have improved feature discrimination, modern state-of-the-art systems have two fundamental drawbacks. First, most models rely on standard Convolutional Neural Networks, which struggle to capture long-range relationships and lack semantic reasoning, resulting in visual findings that do not correlate with clinical knowledge. Second, present approaches often consider grading as a nominal classification or a pure ordinal regression task, failing to strike a compromise between high classification accuracy and severity-consistent predictions (Quadratic Weighted Kappa). To address these challenges, we propose Dual-SwinOrd, a novel framework that integrates a hierarchical Vision Transformer with a semantically guided dual-head mechanism. Specifically, we use a Swin Transformer backbone to extract hierarchical features, effectively capturing global retinal structures. To handle diverse lesion scales, we incorporate a Progressive Lesion-aware Kernel Attention (PLKA) module and a Semantic Prior Modulation (SPM) module guided by PubMedCLIP, bridging the gap between visual features and medical linguistic priors. In addition, we propose a Dual-Head learning strategy that decouples the optimization objective into two parallel streams: a Classification Head to maximize diagnostic accuracy and an Ordinal Regression Head (DPE) to enforce rank-consistency. This design effectively mitigates the trade-off between precision and ordinality. Extensive experiments on the APTOS 2019 and DDR datasets demonstrate that Dual-SwinOrd achieves state-of-the-art performance, yielding an Accuracy of 87.98% and a Quadratic Weighted Kappa (QWK) of 0.9370 on the APTOS 2019 dataset, as well as an Accuracy of 86.54% and a QWK of 0.9040 on the DDR dataset. Full article
(This article belongs to the Special Issue AI-Driven Approaches to Diseases Detection and Diagnosis)
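
The dual-head decoupling described above can be sketched as shared backbone features feeding a softmax classification head and an ordinal head that predicts cumulative "grade >= k" probabilities. The cumulative-logit (CORAL-style) formulation below is a common stand-in for ordinal regression; the paper's DPE head and loss weights are not specified here, so those details are assumptions.

```python
# Minimal sketch of a dual-head grading model: classification + ordinal regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadGrader(nn.Module):
    def __init__(self, feat_dim=256, num_grades=5):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, num_grades)
        self.ord_head = nn.Linear(feat_dim, num_grades - 1)  # P(grade >= k), k = 1..4

    def forward(self, feats):
        return self.cls_head(feats), self.ord_head(feats)

def dual_head_loss(cls_logits, ord_logits, labels, w_ord=1.0):
    # Cumulative binary targets: label 3 with 5 grades -> [1, 1, 1, 0].
    thresholds = torch.arange(1, cls_logits.size(1), device=labels.device)
    ord_targets = (labels.unsqueeze(1) >= thresholds).float()
    return F.cross_entropy(cls_logits, labels) + \
           w_ord * F.binary_cross_entropy_with_logits(ord_logits, ord_targets)

feats = torch.randn(8, 256)                      # e.g., pooled backbone features
labels = torch.randint(0, 5, (8,))
cls_logits, ord_logits = DualHeadGrader()(feats)
print(dual_head_loss(cls_logits, ord_logits, labels))
```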

24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Viewed by 207
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and emotion benchmark(Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
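
The augmentation strategy itself is easy to express as a dataset wrapper that, with some probability, substitutes a pre-computed cartoonized (style-abstracted) variant for the original clip. The dataset layout and the mixing probability `p_stylized` are assumptions for illustration; the paper controls the proportion of stylized data in the training set.

```python
# Minimal sketch of style-abstraction-based data augmentation as a dataset wrapper.
import random
import torch
from torch.utils.data import Dataset

class StyleAbstractionDataset(Dataset):
    def __init__(self, original_samples, stylized_samples, labels, p_stylized=0.5):
        assert len(original_samples) == len(stylized_samples) == len(labels)
        self.original = original_samples      # e.g., video clips as tensors
        self.stylized = stylized_samples      # cartoonized versions of the same clips
        self.labels = labels
        self.p_stylized = p_stylized

    def __len__(self):
        return len(self.original)

    def __getitem__(self, idx):
        # Replace the clip by its style-abstracted variant with probability p_stylized.
        use_stylized = random.random() < self.p_stylized
        clip = self.stylized[idx] if use_stylized else self.original[idx]
        return clip, self.labels[idx]

# Usage with dummy clips (T x C x H x W tensors)
orig = [torch.rand(8, 3, 32, 32) for _ in range(4)]
styl = [torch.rand(8, 3, 32, 32) for _ in range(4)]
ds = StyleAbstractionDataset(orig, styl, labels=[0, 1, 0, 1], p_stylized=0.5)
print(ds[0][0].shape, ds[0][1])
```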

35 pages, 5649 KB  
Article
Cross-Dataset Benchmarking of Deep Learning Models for Surface Defect Classification in Metal Parts
by Fábio Mendes da Silva, João Manuel R. S. Tavares, António Mendes Lopes and Antonio Ramos Silva
Appl. Sci. 2026, 16(6), 3022; https://doi.org/10.3390/app16063022 - 20 Mar 2026
Viewed by 177
Abstract
Accurate surface defect classification is critical for industrial quality control. Although Deep Learning achieves strong results on individual datasets, most prior studies benchmark only a narrow set of models under inconsistent pipelines, limiting comparability and industrial relevance. This work introduces the first systematic benchmark of ten architectures—CNNs (CNN, ResNet18/50), lightweight models (MobileNetV2, SuperSimpleNet, GhostNet, EfficientNetV2), Vision Transformers (Swin Transformer), a hybrid CNN–Transformer (CoAtNet), and a one-stage detector (YOLOv12)—across five public defect datasets (NEU-DET, X-SDD, KolektorSDD2, DAGM, MTDD) under a unified pipeline. Results show that Swin Transformer and CoAtNet achieve the best performance (mean F1-scores 90.8% and 85.5%), while EfficientNetV2 underperformed (41.9%), underscoring the need for domain-specific benchmarks. Lightweight models such as MobileNetV2, GhostNet, and SuperSimpleNet deliver competitive accuracy at much lower cost, offering practical solutions for edge deployment. By bridging the gap between academic benchmarks and manufacturing requirements, this study provides actionable guidance for selecting defect detection models in automated inspection. Full article
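
A unified benchmarking pipeline of the kind described above amounts to evaluating every architecture with the same data pipeline and the same macro-averaged F1 metric. The sketch below uses a few torchvision models as stand-ins for the paper's ten-architecture lineup, and `defect_loaders` (dataset name to test DataLoader) is an assumed helper built elsewhere with identical preprocessing for every dataset.

```python
# Minimal sketch of a cross-dataset benchmark loop with a shared metric.
import torch
from sklearn.metrics import f1_score
from torchvision import models

def evaluate(model, loader, device="cpu"):
    model.eval().to(device)
    preds, targets = [], []
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            preds.extend(logits.argmax(dim=1).cpu().tolist())
            targets.extend(labels.tolist())
    return f1_score(targets, preds, average="macro")

model_zoo = {
    "resnet18": lambda: models.resnet18(num_classes=6),
    "mobilenet_v2": lambda: models.mobilenet_v2(num_classes=6),
    "swin_t": lambda: models.swin_t(num_classes=6),
}

def run_benchmark(defect_loaders):
    """defect_loaders: dict mapping dataset name -> test DataLoader (assumed)."""
    results = {}
    for model_name, build in model_zoo.items():
        for ds_name, loader in defect_loaders.items():
            results[(model_name, ds_name)] = evaluate(build(), loader)
    return results
```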

15 pages, 5710 KB  
Article
Prediction of Cataract Severity Using Slit Lamp Images from a Portable Smartphone Device: A Pilot Study
by David Z. Chen, Changshuo Liu, Junran Wu, Lei Zhu and Beng Chin Ooi
Sensors 2026, 26(6), 1954; https://doi.org/10.3390/s26061954 - 20 Mar 2026
Viewed by 308
Abstract
Cataract diagnosis requires a comprehensive dilated examination by an ophthalmologist using a slit lamp; there is currently no effective means to objectively screen for cataracts in the community using portable devices without dilation. We hypothesized that it would be possible to predict cataract severity using deep learning on images taken using a portable smartphone-based slit lamp prototype, with and without dilation. In this prospective cross-sectional pilot study, slit lamp images were captured from eligible patients with cataracts in a tertiary clinic using a portable slit lamp prototype attached to a smartphone. The Pentacam nuclear staging score (PNS, Pentacam®, Oculus, Inc., Arlington, WA, USA) was taken from the dilated pupils and served as ground truth. A transformer prototypical network with the Swin transformer on the images was trained to assign the class label corresponding to the highest predicted probability. Heat maps were generated based on attribution masks to identify the anatomical areas of concern. A total of 1900 images from 198 eyes of 99 patients were captured. The average age was 65.3 ± 10.4 years (range, 41.0 to 88.0 years) and the average PNS score was 1.57 ± 0.81 (range, 0 to 4). The model achieved an average accuracy of 81.25% and 74.38% for undilated and dilated eyes, respectively. Heat map visualization using the integrated gradient method successfully identified the anatomical area of interest in certain images. This study suggests the possibility of estimating cataract density using a portable smartphone slit lamp device without dilation. Further work is under way to validate this technique in a larger and more diverse group of eyes with cataracts. Full article
(This article belongs to the Special Issue Smartphone Sensors and Their Applications)
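
Prototypical-network classification, as used here for grading, can be sketched in a few lines: class prototypes are the mean embeddings of support images, and a query image is assigned the grade whose prototype is nearest. The embedding function is assumed to come from a pretrained backbone (a Swin transformer in the paper); the episode sizes below are dummy values.

```python
# Minimal sketch of prototypical-network classification over image embeddings.
import torch

def prototypes(support_embeddings, support_labels, num_classes):
    """Mean embedding per class; support_embeddings: (N, D), support_labels: (N,)."""
    return torch.stack([
        support_embeddings[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])

def classify(query_embeddings, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    dists = torch.cdist(query_embeddings, protos) ** 2       # (Q, num_classes)
    probs = torch.softmax(-dists, dim=1)
    return probs.argmax(dim=1), probs

# Dummy 5-grade episode with 128-d embeddings
support = torch.randn(50, 128)
labels = torch.arange(50) % 5          # ensure every grade appears in the support set
protos = prototypes(support, labels, num_classes=5)
pred, probs = classify(torch.randn(8, 128), protos)
print(pred)
```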

27 pages, 2697 KB  
Article
S2A-Swin: Spectral Smoothing–Guided Spectral–Spatial Windows with Generative Augmentation for Hyperspectral Image Classification Under Class Imbalance and Limited Labels
by Baisen Liu, Jianxin Chen, Wulin Zhang, Zhiming Dang, Xinyao Li and Weili Kong
Remote Sens. 2026, 18(6), 935; https://doi.org/10.3390/rs18060935 - 19 Mar 2026
Viewed by 150
Abstract
Hyperspectral image (HSI) classification faces the challenges of scarce labeled data and severe class imbalance, which limits the effective training and generalization capabilities of models. To address these issues, we propose S2A-Swin, a joint spatial–spectral hybrid Swin Transformer framework. First, we develop a spectral–spatial conditional generative adversarial network (SSC-cGAN), which combines spectral and spatial smoothing regularizers to synthesize class-specific image patches, thus alleviating the problems of data scarcity and class imbalance while maintaining spectral continuity and local spatial structure consistent with real data. Second, we introduce a dimension-aware hybrid Transformer module, which adds local windows along the spectral dimension to the standard spatial window, thereby facilitating cross-dimensional feature interactions and ensuring that each spectral band is modeled using the local spatial context for more efficient joint spatial–spectral modeling. In this module, attention mechanisms for spectral and spatial windows are applied alternately (“cross-sequence” attention mechanisms), the execution order of which is guided by hyperspectral prior knowledge to enhance cross-dimensional representation learning. This module is embedded in the lightweight Swin backbone and extends the traditional spatial window mechanism through spectral window attention, capturing spectral continuity while maintaining spatial structure consistency. Extensive experiments on multiple datasets demonstrate that, compared to mainstream CNN and Transformer baselines on four benchmark datasets, the proposed method achieves overall accuracy (OA) improvements of 2.45%, 7.05%, 5.17%, and 0.85%. Full article

21 pages, 2957 KB  
Article
Automated Single-Slice Lumbar QCT HU Value Measurement with Clinical Workflow
by Zhe-Yu Ye, Jun-Mu Peng, Bing-Qian Lu and Tamotsu Kamishima
Mach. Learn. Knowl. Extr. 2026, 8(3), 77; https://doi.org/10.3390/make8030077 - 19 Mar 2026
Viewed by 130
Abstract
Manual single-slice lumbar quantitative computed tomography (QCT) depends on operator-driven slice selection and trabecular region-of-interest (ROI) placement. We developed a fully automated single-slice workflow for vertebral trabecular Hounsfield unit (HU) measurement that combines unsuitable-slice prescreening, dual-purpose segmentation, intra-patient slice-quality ranking, and a deterministic inner ROI rule. The pipeline includes an Eligibility Gate, QC-Envelope segmentation for broad, vertebral- and usability-preserving delineation, PairRank-Swin for best-slice selection, and dedicated trabecular segmentation for final quantitative analysis. In the independent external cohort, 4 cases were considered non-evaluable by both manual review and the pipeline, and 2 additional borderline-quality cases were manually measured but rejected by the pipeline; therefore, paired HU agreement analysis included 44 evaluable cases. Agreement remained high, with Pearson’s r = 0.987, Lin’s CCC = 0.985, mean bias −0.44 HU, and limits of agreement from −14.88 to +13.99 HU. Coverage was 84.1% within ±10 HU and 97.7% within ±15 HU. Ablation analysis showed that slice ranking and ROI erosion were the most critical components. In an open module-level baseline comparison, QC-Envelope segmentation substantially outperformed TotalSegmentator. This workflow provides high agreement with expert HU measurement while preserving reviewable intermediate outputs. Full article
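
Intra-patient slice-quality ranking of the kind PairRank-Swin performs can be sketched with a pairwise margin ranking loss: a scoring network is trained so the better slice of a pair receives the higher score, and at inference the top-scoring slice is selected. The toy linear scorer and margin below are assumptions standing in for the Swin-based ranker.

```python
# Minimal sketch of pairwise slice-quality ranking and best-slice selection.
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))   # toy slice scorer
ranking_loss = nn.MarginRankingLoss(margin=0.5)
optimizer = torch.optim.Adam(score_net.parameters(), lr=1e-3)

def train_step(better_slice, worse_slice):
    """One pairwise update: the 'better' slice should out-score the 'worse' one."""
    s_better = score_net(better_slice).squeeze(1)
    s_worse = score_net(worse_slice).squeeze(1)
    loss = ranking_loss(s_better, s_worse, torch.ones_like(s_better))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def select_best_slice(candidate_slices):
    """Pick the index of the highest-scoring slice within one patient."""
    with torch.no_grad():
        return score_net(candidate_slices).squeeze(1).argmax().item()

print(train_step(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)))
print(select_best_slice(torch.rand(10, 1, 64, 64)))
```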

22 pages, 2762 KB  
Article
Automated Classification of Medical Image Modality and Anatomy
by Jean de Smidt, Kian Anderson and Andries Engelbrecht
Algorithms 2026, 19(3), 222; https://doi.org/10.3390/a19030222 - 16 Mar 2026
Viewed by 205
Abstract
Radiological departments face challenges in efficiency and diagnostic consistency. The interpretation of radiographs remains highly variable between practitioners, which creates potential disparities in patient care. This study explores how artificial intelligence (AI), specifically transfer learning techniques, can automate parts of the radiological workflow to improve service quality and efficiency. Transfer learning methods were applied to various convolutional neural network (CNN) architectures and compared to classify medical images across different modalities, i.e., X-rays, ultrasound, magnetic resonance imaging (MRI), and angiography, through a two-component model: medical image modality prediction and anatomical region prediction. Several publicly available datasets were combined to create a representative dataset to evaluate residual networks (ResNet), dense networks (DenseNet), efficient networks (EfficientNet), and the Swin Transformer (Swin-T). The models were evaluated through accuracy, precision, recall, and F1-score metrics with macro-averaging to account for class imbalance. The results demonstrate that lightweight transfer learning methods effectively classify medical imagery, with an accuracy of 97.21% on test data for the combined transfer learning pipeline. EfficientNet-B4 demonstrated the best performance on both components of the proposed pipeline and achieved a 99.6% accuracy for modality prediction and 99.21% accuracy for anatomical region prediction on unseen test data. This approach offers the potential for streamlined radiological workflows while maintaining diagnostic quality. The strong model performance across diverse modalities and anatomical regions indicates robust generalisability for practical implementation in clinical settings. Full article
(This article belongs to the Special Issue Advances in Deep Learning-Based Data Analysis)
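
The two-component transfer-learning pipeline can be sketched as two fine-tuned classifiers, one for modality and one for anatomical region, each reusing a pretrained backbone with only the final layer replaced. EfficientNet-B4 is used here because the abstract reports it as the strongest performer, but the class counts and freezing strategy are illustrative assumptions.

```python
# Minimal sketch of the two-stage (modality, then anatomy) transfer-learning pipeline.
import torch
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes):
    model = models.efficientnet_b4(weights=models.EfficientNet_B4_Weights.DEFAULT)
    for p in model.parameters():          # freeze the pretrained feature extractor
        p.requires_grad = False
    in_feats = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_feats, num_classes)   # trainable head
    return model

modality_model = build_transfer_model(num_classes=4).eval()  # X-ray / US / MRI / angiography
anatomy_model = build_transfer_model(num_classes=8).eval()   # example anatomical regions

def predict(image):
    """Run the two-stage pipeline on one preprocessed image tensor (1, 3, H, W)."""
    with torch.no_grad():
        modality = modality_model(image).argmax(dim=1)
        anatomy = anatomy_model(image).argmax(dim=1)
    return modality.item(), anatomy.item()
```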

29 pages, 7769 KB  
Article
Efficient Deep Learning Models Integrated with a Smart Web Application for Classifying Heart Diseases Based on ECG Signals
by Saeed Mohsen, Ahmed F. Ibrahim, Osama F. Hassan, Norah Alnaim, Noorah Albehaijan and M. Abdel-Aziz
Computers 2026, 15(3), 191; https://doi.org/10.3390/computers15030191 - 16 Mar 2026
Viewed by 301
Abstract
Recent advancements in the accuracy of deep learning (DL) hold significant promise for improving the classification of heart patients. Nevertheless, continued refinement is essential to achieve even greater levels of precision in DL techniques. This paper proposes three efficient DL models: Swin Transformer (Swin-T), Visual Geometry Group (VGG)-19, and Vision Transformer (ViT), which are implemented to classify different types of heart patients. The three DL models are trained on a balanced dataset comprising 600 electrocardiogram (ECG) samples. This dataset contains three classes: Arrhythmia Patient, Myocardic Patient, and Normal Patient. The DL models are applied using a PyTorch framework v2.10.0, with fine-tuning for the models’ hyperparameters to maximize the classification accuracy, and data augmentation techniques are implemented for the ECG samples. Additionally, a smart web application is designed for classifying heart patients into three different diagnostic categories. The performance of the three models is assessed by several metrics such as area under precision-recall (AUPR) curves and normalized confusion matrices (NCMs). The proposed three models achieve high testing accuracy for the classification of heart patients. Regarding testing loss (TL), the Swin-T, VGG-19, and ViT achieve rates of 0.0707, 0.4138, and 0.0015, respectively. Also, the ViT achieves an F1-score, true positive rate (TPR), and AUPR of 100%. Full article
(This article belongs to the Special Issue AI in Bioinformatics)
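
A fine-tuning setup of the kind described above can be sketched as a pretrained Swin-T backbone with its head replaced for three diagnostic classes, trained on augmented ECG images. The augmentation choices and hyperparameters are illustrative assumptions, not the paper's tuned configuration.

```python
# Minimal sketch of fine-tuning Swin-T on ECG images with simple data augmentation.
import torch
import torch.nn as nn
from torchvision import models, transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

model = models.swin_t(weights=models.Swin_T_Weights.DEFAULT)
model.head = nn.Linear(model.head.in_features, 3)     # arrhythmia / myocardial / normal
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train_epoch(loader, device="cpu"):
    """One pass over a DataLoader built with train_tf (assumed)."""
    model.train().to(device)
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
```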

17 pages, 2662 KB  
Article
A Swin-Transformer-Based Network for Adaptive Backlight Optimization
by Jin Li, Rui Pu, Junbang Jiang and Man Zhu
Symmetry 2026, 18(3), 502; https://doi.org/10.3390/sym18030502 - 15 Mar 2026
Viewed by 190
Abstract
Mini-LED local dimming systems commonly suffer from luminance discontinuity, halo artifacts, and temporal instability in dynamic scenes. Traditional heuristic-based methods and standard convolutional neural networks often fail to capture long-range spatial dependencies and struggle to balance spatial smoothness, content fidelity, and real-time performance under hardware constraints. To address these challenges, this paper proposes SwinLightNet, an efficient adaptive backlight optimization network tailored for Mini-LED displays. Built upon a Swin Transformer framework tailored for Mini-LED backlight optimization, SwinLightNet integrates five hardware-aware design strategies: (i) a lightweight Swin variant (window size = 8, MLP ratio = 2.0) for efficient global context modeling; (ii) CNN encoder–decoder integration for multi-scale feature extraction; (iii) a partition-level alignment module ensuring spatial consistency; (iv) a backlight constraint module enforcing local luminance consistency and contrast preservation; (v) a change-aware temporal decision framework stabilizing dynamic sequences. These components synergistically resolve core limitations: global modeling suppresses halo artifacts while preserving content fidelity; alignment and constraint modules eliminate luminance discontinuity without compromising contrast; and the temporal framework guarantees flicker-free output under motion. Evaluated on DIV2K (static images) and a custom 2K-resolution video dataset (dynamic scenes), SwinLightNet demonstrates robust reconstruction quality while maintaining only 1.18 million parameters and 0.088 GFLOPs (Computational Cost). The results confirm SwinLightNet’s effectiveness in holistically addressing spatial, temporal, and hardware constraints, demonstrating strong potential for practical deployment in resource-constrained Mini-LED backlight control systems. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Optimization Algorithms and Control Systems)
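
The backlight-constraint idea can be illustrated as follows: the frame is reduced to a coarse LED-zone grid, and the predicted backlight map is pushed toward local luminance peaks (contrast preservation) while large jumps between neighbouring zones are penalized (luminance continuity / halo suppression). The terms, grid size, and weights below are illustrative assumptions, not SwinLightNet's actual constraint module.

```python
# Minimal sketch of a backlight constraint: zone fidelity + zone-to-zone smoothness.
import torch
import torch.nn.functional as F

def backlight_constraint_loss(pred_backlight, frame, grid=(16, 16), w_smooth=0.1):
    """pred_backlight: (B, 1, gh, gw) in [0, 1]; frame: (B, 1, H, W) luminance."""
    # Contrast preservation: each zone's backlight should cover the zone's peak luminance.
    zone_peak = F.adaptive_max_pool2d(frame, grid)
    fidelity = F.l1_loss(pred_backlight, zone_peak)
    # Luminance continuity: penalize differences between adjacent zones.
    dh = (pred_backlight[:, :, 1:, :] - pred_backlight[:, :, :-1, :]).abs().mean()
    dw = (pred_backlight[:, :, :, 1:] - pred_backlight[:, :, :, :-1]).abs().mean()
    return fidelity + w_smooth * (dh + dw)

frame = torch.rand(2, 1, 256, 256)
pred = torch.rand(2, 1, 16, 16)
print(backlight_constraint_loss(pred, frame))
```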

30 pages, 3316 KB  
Article
A Novel Hybrid CNN-ViT-Based Bi-Directional Cross-Guidance Fusion-Driven Breast Cancer Detection Model
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Life 2026, 16(3), 474; https://doi.org/10.3390/life16030474 - 14 Mar 2026
Viewed by 286
Abstract
Accurate and early identification of breast cancer from mammography is key to reducing breast cancer mortality, and automated analysis is challenging due to subtle lesion appearances, heterogeneous breast density, and the variance caused by modality. Standard Convolutional Neural Networks (CNNs) are excellent at capturing localized textures, whereas Vision Transformers (ViTs) capture long-range dependencies; however, both often struggle to produce a unified representation that consistently supports diagnostic decision-making. To address these limitations, this study presents a dual-stream framework integrating ConvNeXt for high-fidelity local feature extraction with Swin Transformer V2 for hierarchical global context modeling. A Bi-Directional Cross-Guidance (BDCG) mechanism is added to harmonize interactions between the two feature domains and ensure mutual information learning in the representations. Furthermore, a Prototype-Anchored Similarity Head (PASH) is used to stabilize classification using distance-based reasoning instead of using linear separation. Comprehensive experiments show the effectiveness of the proposed method using two benchmark datasets. On Dataset 1, the model achieves accuracy: 98.8%, precision: 98.7%, recall: 98.6%, and F1 score: 97.2%, outperforming existing models based on CNN, ViTs, and hybrid architectures, and provides a lower inference time (8.3 ms/image). On the more heterogeneous Dataset 2, the model maintains strong performance, with an accuracy of 97.0%, precision of 95.4%, recall of 94.8%, and F1-score of 95.1%, demonstrating its resilience to domain shift and imaging variability. These results underscore the value of structural multi-scale feature interaction and prototype-driven classification for robust mammographic analysis. The consistent performance across internal and external evaluations indicates the potential for the proposed framework to be reliably applied in computer-aided screening systems. Full article
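
Bi-directional cross-guidance between the two feature streams can be sketched as cross-attention in both directions: CNN-stream tokens attend to transformer-stream tokens and vice versa, and the two guided streams are concatenated into a joint representation. This is a generic cross-attention fusion block used to illustrate the BDCG idea (assuming both streams carry the same number of tokens), not the authors' exact module.

```python
# Minimal sketch of bi-directional cross-attention fusion between two token streams.
import torch
import torch.nn as nn

class BiDirectionalCrossGuidance(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.local_to_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, local_tokens, global_tokens):   # both: (B, N, dim)
        # Local features query global context; global features query local detail.
        l_guided, _ = self.local_to_global(local_tokens, global_tokens, global_tokens)
        g_guided, _ = self.global_to_local(global_tokens, local_tokens, local_tokens)
        fused = torch.cat([local_tokens + l_guided, global_tokens + g_guided], dim=-1)
        return self.fuse(fused)                        # (B, N, dim) joint representation

block = BiDirectionalCrossGuidance(dim=96)
out = block(torch.randn(2, 49, 96), torch.randn(2, 49, 96))
print(out.shape)  # torch.Size([2, 49, 96])
```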

24 pages, 7664 KB  
Article
Deep Learning-Based Evaluation of Offshore Wind Energy Resources in Southeastern China for the Future
by Chengguang Lai, Peilin Zeng, Zifeng Deng, Zhaoli Wang and Xuezhi Tan
Energies 2026, 19(6), 1447; https://doi.org/10.3390/en19061447 - 13 Mar 2026
Viewed by 259
Abstract
The evaluation of offshore wind energy resources is important to the construction of offshore wind power facilities. In this paper, using four models from CMIP6 and the ERA5 reanalysis dataset, a deep learning model termed SwinWind was developed and proposed to evaluate future offshore wind energy resources in Southeastern China for the periods 2020–2050 and 2070–2100. The feature extraction capability of the Swin Transformer was utilized to construct a bias correction and downscaling framework. This approach achieves performance comparable to existing high-cost models while significantly reducing computational costs and complexity. The SwinWind model corrected most of the biases and effectively learned spatial relationships, successfully performing the downscaling task. Based on future wind speed projections derived from the SwinWind model, this study presents a comprehensive evaluation of offshore wind resources, examining five critical dimensions: resource abundance, efficiency, stability, the impact of extreme winds, and economic feasibility. It is projected that offshore wind resources around Shanghai, Jiangsu and Zhejiang will experience a decline in the 21st century, while offshore wind resources around the Guangdong, Fujian and the Beibu Gulf show an increasing trend. The evaluation index shows that the coastal areas of Guangdong and the southern coastline of Taiwan are the most suitable locations for wind power exploitation. The Taiwan Strait, which has the highest wind energy density, is not the best spot due to its extreme wind speed and unstable wind resources. This study provides an important reference for the location of wind farms with practical application value. Full article
