Search Results (62)

Search Parameters:
Keywords = swintransformer

26 pages, 102536 KB  
Article
SPOD-YOLO: A Modular Approach for Small and Oriented Aircraft Detection in Satellite Remote Sensing Imagery
by Jiajian Chen, Pengyu Guo, Yong Liu, Lu Cao, Dechao Ran, Kai Wang, Wei Hu and Liyang Wan
Remote Sens. 2025, 17(24), 3963; https://doi.org/10.3390/rs17243963 - 8 Dec 2025
Viewed by 352
Abstract
The accurate detection of small, densely packed, and arbitrarily oriented aircraft in high-resolution remote sensing imagery remains highly challenging due to significant variations in object scale, orientation, and background complexity. Existing detection frameworks often struggle with insufficient representation of small objects, unstable rotated-bounding-box regression, and an inability to adapt to complex backgrounds. To address these limitations, we propose SPOD-YOLO, a novel detection framework specifically designed for small aircraft in remote sensing images. The method builds on YOLOv11 and combines it with the Swin Transformer attention mechanism, with targeted improvements in cross-scale feature modelling, dynamic convolutional adaptation, and rotational geometry optimization. Additionally, we constructed a new dataset of satellite remote sensing images featuring a high density of small aircraft with rotated bounding box annotations, providing a more realistic and challenging evaluation setting. Extensive experiments on MAR20, UCAS-AOD, and the constructed dataset demonstrate that our method achieves consistent performance gains over state-of-the-art approaches. SPOD-YOLO achieves a 4.54% increase in mAP50 and an 11.78% gain in mAP50:95 with only 3.77 million parameters on the constructed dataset. These results validate the effectiveness and robustness of our approach in complex remote sensing scenarios, offering a practical advancement for the detection of small objects in aerospace imagery.
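A minimal sketch of window-based self-attention in the Swin Transformer style, the attention mechanism the abstract says is combined with YOLOv11; the module name, window size, and head count below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention computed independently inside non-overlapping windows."""
    def __init__(self, dim: int, window_size: int = 7, num_heads: int = 4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), with H and W divisible by window_size
        b, c, h, w = x.shape
        ws = self.window_size
        # Partition the feature map into ws x ws windows -> (B*nWindows, ws*ws, C).
        windows = (x.view(b, c, h // ws, ws, w // ws, ws)
                     .permute(0, 2, 4, 3, 5, 1)
                     .reshape(-1, ws * ws, c))
        attended, _ = self.attn(windows, windows, windows)  # attention within each window
        # Reverse the partition back to (B, C, H, W).
        return (attended.view(b, h // ws, w // ws, ws, ws, c)
                         .permute(0, 5, 1, 3, 2, 4)
                         .reshape(b, c, h, w))

feat = torch.randn(1, 96, 56, 56)                 # a mid-level feature map
print(WindowAttention(dim=96)(feat).shape)        # torch.Size([1, 96, 56, 56])
```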

22 pages, 5462 KB  
Article
Ship Motion State Recognition Using Trajectory Image Modeling and CNN-Lite
by Shuaibing Zhao, Zongshun Tian, Yuefeng Lu, Peng Xie, Xueyuan Li, Yu Yan and Bo Liu
J. Mar. Sci. Eng. 2025, 13(12), 2327; https://doi.org/10.3390/jmse13122327 - 8 Dec 2025
Viewed by 294
Abstract
Intelligent recognition of ship motion states is a key technology for smart maritime supervision and optimized port scheduling. To enhance both the modeling efficiency and recognition accuracy of AIS trajectory data, this paper proposes a ship behavior recognition method that integrates trajectory-to-image conversion with a convolutional neural network (CNN) to classify three typical motion states: mooring, anchoring, and sailing. First, a multi-step preprocessing pipeline is established, incorporating trajectory cleaning, interpolation-based completion, and segmentation to ensure data completeness and consistency. Second, dynamic features, including speed, heading, and temporal progression, are encoded into a three-channel RGB image, which preserves the original spatial and temporal information of the trajectory while enriching its feature representation. Third, a lightweight CNN architecture (CNN-Lite) is designed to automatically extract spatial motion patterns from these images, with data augmentation further enhancing robustness and generalization across diverse scenarios. Finally, comprehensive comparative experiments are conducted to evaluate the proposed method. On a real-world AIS dataset, it achieves an accuracy of 91.54%, precision of 91.51%, recall of 91.54%, and F1-score of 91.52%, demonstrating superior or highly competitive performance compared with SVM, KNN, MLSTM, ResNet-50, and Swin-Transformer in both classification accuracy and model stability. These results confirm that constructing dynamic-feature-enriched RGB trajectory images and designing a lightweight CNN can effectively improve ship behavior recognition and provide a practical, efficient solution for abnormal anchoring detection, maritime traffic monitoring, and the development of intelligent shipping systems.
(This article belongs to the Special Issue Advanced Ship Trajectory Prediction and Route Planning)
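For readers unfamiliar with the trajectory-to-image step described above, here is a minimal sketch that rasterizes an AIS track onto a fixed grid and writes speed, heading, and temporal progression into the R, G, and B channels; the grid size and normalization choices are assumptions for illustration only.

```python
import numpy as np

def trajectory_to_rgb(lat, lon, sog, cog, size: int = 64) -> np.ndarray:
    lat, lon, sog, cog = map(np.asarray, (lat, lon, sog, cog))
    img = np.zeros((size, size, 3), dtype=np.float32)
    # Map coordinates into pixel indices inside the image.
    rows = ((lat - lat.min()) / (np.ptp(lat) + 1e-9) * (size - 1)).astype(int)
    cols = ((lon - lon.min()) / (np.ptp(lon) + 1e-9) * (size - 1)).astype(int)
    t = np.linspace(0.0, 1.0, len(lat))            # temporal progression, 0 -> 1
    img[rows, cols, 0] = sog / (sog.max() + 1e-9)  # R: speed over ground
    img[rows, cols, 1] = cog / 360.0               # G: course over ground
    img[rows, cols, 2] = t                         # B: position along the track
    return img

# Toy track: a ship accelerating along a gentle turn.
n = 200
img = trajectory_to_rgb(lat=np.linspace(30.0, 30.1, n),
                        lon=np.linspace(122.0, 122.2, n) + 0.01 * np.sin(np.linspace(0, 3, n)),
                        sog=np.linspace(0.5, 12.0, n),
                        cog=np.linspace(45.0, 90.0, n))
print(img.shape)  # (64, 64, 3)
```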

25 pages, 2684 KB  
Article
Railway Signal Relay Voiceprint Fault Diagnosis Method Based on Swin-Transformer and Fusion of Gaussian-Laplacian Pyramid
by Yi Liu, Liang Chen, Zhen Wang, Shangmin Zhou and Bobo Zhao
Mathematics 2025, 13(23), 3846; https://doi.org/10.3390/math13233846 - 1 Dec 2025
Viewed by 210
Abstract
Fault diagnosis of railway signal relays is crucial for the operational safety and efficiency of railway systems. With the continuous advancement of deep learning techniques, voiceprint-based fault diagnosis has emerged as a research hotspot, facilitating the transition from failure-based repair to condition-based maintenance. However, this approach still faces challenges such as the limited feature extraction capability of single voiceprint features and poor discriminability when features are highly concentrated. To address these issues, this paper proposes a voiceprint-based fault diagnosis method for railway signal relays that utilizes a Gaussian–Laplacian pyramid fusion rule and an improved Swin Transformer. The enhanced Swin Transformer integrates the original architecture with a saliency feature map used as a masking strategy. Experimental results demonstrate that the proposed method reduces the number of parameters by 54.8% compared to the Vision Transformer while achieving nearly the same accuracy.
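A minimal sketch of Laplacian-pyramid image fusion, the generic technique underlying the Gaussian–Laplacian fusion rule mentioned above; the max-absolute detail selection and averaged base level are common defaults assumed here, not necessarily the paper's exact rule.

```python
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 4):
    gauss = [img.astype(np.float32)]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    lap = []
    for i in range(levels):
        up = cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
        lap.append(gauss[i] - up)          # band-pass detail at this level
    lap.append(gauss[-1])                  # coarsest Gaussian level as the base
    return lap

def fuse(a: np.ndarray, b: np.ndarray, levels: int = 4) -> np.ndarray:
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    # Keep the stronger detail coefficient per pixel; average the coarse base.
    fused = [np.where(np.abs(la) >= np.abs(lb), la, lb) for la, lb in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))
    out = fused[-1]
    for lap in reversed(fused[:-1]):       # collapse the pyramid back to full resolution
        out = cv2.pyrUp(out, dstsize=lap.shape[1::-1]) + lap
    return np.clip(out, 0.0, 255.0)

a = np.random.rand(128, 128).astype(np.float32) * 255   # e.g. one spectrogram representation
b = np.random.rand(128, 128).astype(np.float32) * 255   # e.g. another representation
print(fuse(a, b).shape)  # (128, 128)
```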

17 pages, 3889 KB  
Article
STGAN: A Fusion of Infrared and Visible Images
by Liuhui Gong, Yueping Han and Ruihong Li
Electronics 2025, 14(21), 4219; https://doi.org/10.3390/electronics14214219 - 29 Oct 2025
Viewed by 569
Abstract
The fusion of infrared and visible images provides critical value in computer vision by integrating their complementary information, especially in industrial detection, where it provides a more reliable data basis for subsequent defect recognition. This paper presents STGAN, a novel Generative Adversarial Network framework based on a Swin Transformer for high-quality infrared and visible image fusion. First, the generator employs a Swin Transformer backbone within a U-Net architecture for feature extraction, and an improved W-MSA is introduced into the bottleneck layer to enhance local attention and improve the expression of cross-modal features. Second, a Markovian discriminator is used to distinguish generated images from real ones. The GAN framework then ensures that both infrared thermal radiation and visible-light texture details are retained in the generated image, improving the clarity and contrast of the fused result. Finally, simulation results showed that the method ranked in the top two on six of seven indicators, achieving optimal or suboptimal values on key metrics such as PSNR, VIF, MI, and EN. Experimental results on a general dataset show that the method outperforms advanced methods in both subjective visual quality and objective indicators, and it can effectively enhance fine structures and thermal anomaly information in the image, giving it great potential for industrial surface defect detection.
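A minimal sketch of a Markovian (PatchGAN-style) discriminator of the kind the abstract mentions: rather than a single real/fake score per image, it outputs a grid of scores, each judging one local patch; the channel widths and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 1):
        super().__init__()
        def block(cin, cout, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(in_ch, 64, norm=False),
            *block(64, 128),
            *block(128, 256),
            nn.Conv2d(256, 1, 4, stride=1, padding=1),  # one logit per receptive-field patch
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

fused = torch.randn(2, 1, 128, 128)                # a batch of fused grayscale images
print(PatchDiscriminator()(fused).shape)           # torch.Size([2, 1, 15, 15])
```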

12 pages, 30038 KB  
Article
An Online Blast Furnace Condition Recognition Method Based on Spatiotemporal Texture Feature Coupling and Diffusion Networks
by Xiao Ji, Jie Han, Jianjun He and Weihua Gui
Processes 2025, 13(11), 3416; https://doi.org/10.3390/pr13113416 - 24 Oct 2025
Viewed by 401
Abstract
Real-time and accurate identification of the blast furnace (BF) condition is essential for maintaining stability and improving energy efficiency in steelmaking. However, the harsh environment inside the BF makes direct acquisition of the BF condition extremely difficult. To address this challenge, this study proposes an online BF condition recognition method based on spatiotemporal texture feature coupling and diffusion networks (STFC-DN). The method employs a multi-domain Swin-Transformer module (MDSTM) combined with wavelet decomposition and channel attention to extract the gas flow region. A temporal feature pyramid network module (T-FPNM) is then used to capture both the global and local spatiotemporal characteristics of this region. Heuristic clustering and an idempotent generative network (IGN) are introduced to obtain standardized BF condition features, enabling intelligent classification through multi-metric similarity analysis. Experimental results show that the proposed STFC-DN achieves an average accuracy exceeding 98% when identifying four BF conditions (normal, hanging, oblique stockline, and collapsing), with an inference speed of approximately 28 FPS. This approach demonstrates both high accuracy and real-time capability, showing strong potential for advancing the intelligent and sustainable development of the steel industry.
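A minimal sketch of squeeze-and-excitation-style channel attention, the generic mechanism referenced in the MDSTM description; the reduction ratio and placement are assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze: global average pool -> (B, C)
        return x * w.view(b, c, 1, 1)          # excite: rescale each channel

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(x).shape)           # torch.Size([2, 64, 32, 32])
```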

26 pages, 3973 KB  
Article
ViT-DCNN: Vision Transformer with Deformable CNN Model for Lung and Colon Cancer Detection
by Aditya Pal, Hari Mohan Rai, Joon Yoo, Sang-Ryong Lee and Yooheon Park
Cancers 2025, 17(18), 3005; https://doi.org/10.3390/cancers17183005 - 15 Sep 2025
Cited by 1 | Viewed by 1088
Abstract
Background/Objectives: Lung and colon cancers remain among the most prevalent and fatal diseases worldwide, and their early detection is a serious challenge. The data used in this study were obtained from the Lung and Colon Cancer Histopathological Images Dataset, which comprises five classes of image data, namely colon adenocarcinoma, colon normal, lung adenocarcinoma, lung normal, and lung squamous cell carcinoma, split into training (80%), validation (10%), and test (10%) subsets. In this study, we propose the ViT-DCNN (Vision Transformer with Deformable CNN) model, with the aim of improving cancer detection and classification using medical images. Methods: The combination of the ViT's self-attention capabilities with deformable convolutions allows for improved feature extraction, enabling the model to learn both holistic contextual information and fine-grained localized spatial details. Results: On the test set, the model performed remarkably well, with an accuracy of 94.24%, an F1 score of 94.23%, recall of 94.24%, and precision of 94.37%, confirming its robustness in detecting cancerous tissues. Furthermore, the proposed ViT-DCNN model outperforms several state-of-the-art models, including ResNet-152, EfficientNet-B7, Swin Transformer, DenseNet-201, ConvNeXt, TransUNet, CNN-LSTM, MobileNetV3, and NASNet-A, across all major performance metrics. Conclusions: By using deep learning and advanced image analysis, this model enhances the efficiency of cancer detection, representing a valuable tool for radiologists and clinicians. This study demonstrates that the proposed ViT-DCNN model can reduce diagnostic inaccuracies and improve detection efficiency. Future work will focus on dataset enrichment and enhancing the model's interpretability to evaluate its clinical applicability. This paper demonstrates the promise of artificial-intelligence-driven diagnostic models in transforming lung and colon cancer detection and improving patient diagnosis.
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers: 2nd Edition)
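A minimal sketch of a deformable convolution block like the "Deformable CNN" component named above, using torchvision's DeformConv2d; the offset-predictor wiring shown is a common pattern assumed for illustration, not the authors' exact module.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, cin: int, cout: int, k: int = 3):
        super().__init__()
        # A small conv predicts per-location (dy, dx) offsets for every kernel tap,
        # letting the sampling grid deform toward irregular tissue structures.
        self.offset = nn.Conv2d(cin, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(cin, cout, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

x = torch.randn(1, 32, 56, 56)
print(DeformBlock(32, 64)(x).shape)   # torch.Size([1, 64, 56, 56])
```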

40 pages, 7941 KB  
Article
Synergistic Hierarchical AI Framework for USV Navigation: Closing the Loop Between Swin-Transformer Perception, T-ASTAR Planning, and Energy-Aware TD3 Control
by Haonan Ye, Hongjun Tian, Qingyun Wu, Yihong Xue, Jiayu Xiao, Guijie Liu and Yang Xiong
Sensors 2025, 25(15), 4699; https://doi.org/10.3390/s25154699 - 30 Jul 2025
Cited by 1 | Viewed by 1170
Abstract
Autonomous Unmanned Surface Vehicle (USV) operations in complex ocean engineering scenarios necessitate robust navigation, guidance, and control technologies. These systems require reliable sensor-based object detection and efficient, safe, and energy-aware path planning. To address these multifaceted challenges, this paper proposes a novel synergistic AI framework. The framework integrates (1) a novel adaptation of the Swin-Transformer that generates a dense semantic risk map from raw visual data, enabling the system to interpret ambiguous marine conditions such as sun glare and choppy water and providing the real-time environmental understanding crucial for guidance; (2) a Transformer-enhanced A-star (T-ASTAR) algorithm with spatio-temporal attentional guidance to generate globally near-optimal, energy-aware static paths; (3) a domain-adapted TD3 agent with a novel energy-aware reward function that accounts for USV hydrodynamic constraints, making it suitable for long-endurance missions; the agent performs dynamic local path optimization and real-time obstacle avoidance and forms a key control element; and (4) CUDA acceleration to meet the computational demands of real-time ocean engineering applications. Simulations and real-world data verify the framework's superiority over benchmarks such as A* and RRT, achieving 30% shorter routes, 70% fewer turns, 64.7% fewer dynamic collisions, and a 215-fold speed improvement in map generation via CUDA acceleration. This research underscores the importance of integrating powerful AI components within a hierarchical synergy that spans AI-based perception, hierarchical decision planning for guidance, and multi-stage optimal search algorithms for control. The proposed solution significantly advances USV autonomy, addressing critical ocean engineering challenges such as navigation in dynamic environments, obstacle avoidance, and energy-constrained operations for unmanned maritime systems.
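A minimal sketch of grid A* over a per-cell risk map, illustrating how a perception-derived risk score could feed a planner of the kind described above; the 4-connected grid and risk weighting are assumptions, and this is plain A*, not the paper's T-ASTAR.

```python
import heapq
import numpy as np

def astar_on_risk(risk: np.ndarray, start, goal, risk_weight: float = 5.0):
    h, w = risk.shape
    def heuristic(p):  # Manhattan distance to the goal (admissible for unit step cost)
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(heuristic(start), 0.0, start, None)]
    came_from, g_best = {}, {start: 0.0}
    while open_heap:
        _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:          # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:               # reconstruct the path back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < h and 0 <= nxt[1] < w):
                continue
            ng = g + 1.0 + risk_weight * risk[nxt]   # step cost plus risk penalty
            if ng < g_best.get(nxt, float("inf")):
                g_best[nxt] = ng
                heapq.heappush(open_heap, (ng + heuristic(nxt), ng, nxt, cur))
    return None

risk_map = np.zeros((20, 20))
risk_map[5:15, 10] = 1.0                              # a risky strip to route around
print(len(astar_on_risk(risk_map, (0, 0), (19, 19))))  # number of waypoints in the path
```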

18 pages, 3368 KB  
Article
Segmentation-Assisted Fusion-Based Classification for Automated CXR Image Analysis
by Shilu Kang, Dongfang Li, Jiaxin Xu, Aokun Mei and Hua Huo
Sensors 2025, 25(15), 4580; https://doi.org/10.3390/s25154580 - 24 Jul 2025
Cited by 1 | Viewed by 1080
Abstract
Accurate classification of chest X-ray (CXR) images is crucial for diagnosing lung diseases in medical imaging. Existing deep learning models for CXR image classification face challenges in distinguishing non-lung features. In this work, we propose a new segmentation-assisted fusion-based classification method. The method involves two stages: first, we use a lightweight segmentation model, the Partial Convolutional Segmentation Network (PCSNet), built on an encoder–decoder architecture, to accurately obtain lung masks from CXR images. Then, the masked CXR image is fused with the original image and classified using an improved lightweight ShuffleNetV2 model. The proposed method is trained and evaluated on segmentation datasets including the Montgomery County Dataset (MC) and Shenzhen Hospital Dataset (SH), and on classification datasets such as Chest X-Ray Images for Pneumonia (CXIP) and COVIDx. Compared with seven segmentation models (U-Net, Attention-Net, SegNet, FPNNet, DANet, DMNet, and SETR), five classification models (ResNet34, ResNet50, DenseNet121, Swin-Transformer, and ShuffleNetV2), and state-of-the-art methods, our PCSNet model achieved high segmentation performance on CXR images. Compared to the state-of-the-art Attention-Net model, the accuracy of PCSNet increased by 0.19% (98.94% vs. 98.75%) and the boundary accuracy improved by 0.3% (97.86% vs. 97.56%), while requiring 62% fewer parameters. For pneumonia classification on the CXIP dataset, the proposed strategy outperforms the current best model by 0.14% in accuracy (98.55% vs. 98.41%). For COVID-19 classification on the COVIDx dataset, the model reached an accuracy of 97.50%, an absolute improvement of 0.1% over CovXNet, and the clinical metrics show more significant gains: specificity increased from 94.7% to 99.5%. These results highlight the model's effectiveness in medical image analysis, demonstrating clinically meaningful improvements over state-of-the-art approaches.
(This article belongs to the Special Issue Vision- and Image-Based Biomedical Diagnostics—2nd Edition)
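A minimal sketch of the segmentation-assisted fusion step described above, under the assumption that the lung-masked CXR is stacked with the original image as extra input channels for the classifier; the paper may fuse them differently.

```python
import torch

def fuse_masked_cxr(image: torch.Tensor, lung_mask: torch.Tensor) -> torch.Tensor:
    # image: (B, 1, H, W) grayscale CXR; lung_mask: (B, 1, H, W) with values in [0, 1]
    masked = image * lung_mask                 # keep only lung-region intensities
    return torch.cat([image, masked], dim=1)   # (B, 2, H, W) input for the classifier

cxr = torch.rand(4, 1, 224, 224)
mask = (torch.rand(4, 1, 224, 224) > 0.5).float()
print(fuse_masked_cxr(cxr, mask).shape)        # torch.Size([4, 2, 224, 224])
```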

19 pages, 3888 KB  
Article
Swin-GAT Fusion Dual-Stream Hybrid Network for High-Resolution Remote Sensing Road Extraction
by Hongkai Zhang, Hongxuan Yuan, Minghao Shao, Junxin Wang and Suhong Liu
Remote Sens. 2025, 17(13), 2238; https://doi.org/10.3390/rs17132238 - 29 Jun 2025
Cited by 2 | Viewed by 1078
Abstract
This paper introduces a novel dual-stream collaborative architecture for remote sensing road segmentation, designed to overcome multi-scale feature conflicts, limited dynamic adaptability, and compromised topological integrity. Our network employs a parallel “local–global” encoding scheme: the local stream uses depth-wise separable convolutions to capture fine-grained details, while the global stream integrates a Swin-Transformer with a graph-attention module (Swin-GAT) to model long-range contextual and topological relationships. By decoupling detailed feature extraction from global context modeling, the proposed framework more faithfully represents complex road structures. Comprehensive experiments on multiple aerial datasets demonstrate that our approach outperforms conventional baselines, especially under shadow occlusion and for thin-road delineation, while achieving real-time inference at 31 FPS. Ablation studies further confirm the critical roles of the Swin Transformer and GAT components in preserving topological continuity. Overall, this dual-stream dynamic-fusion network sets a new benchmark for remote sensing road extraction and holds promise for real-world, real-time applications.
(This article belongs to the Section AI Remote Sensing)
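A minimal sketch of a single-head graph-attention (GAT) layer, the kind of module the Swin-GAT stream uses to relate road regions; the dense-adjacency formulation and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.attn_src = nn.Linear(out_dim, 1, bias=False)
        self.attn_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        h = self.lin(x)                                       # (N, out_dim)
        scores = self.attn_src(h) + self.attn_dst(h).T        # (N, N) pairwise logits
        scores = F.leaky_relu(scores, 0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        alpha = torch.softmax(scores, dim=-1)
        return F.elu(alpha @ h)                               # aggregate neighbour features

x = torch.randn(5, 16)
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)  # a path graph
print(GATLayer(16, 32)(x, adj).shape)   # torch.Size([5, 32])
```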

17 pages, 4478 KB  
Article
A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
by Zhenqiang Zhao, Helong Shen, Meng Wang and Yufei Wang
J. Mar. Sci. Eng. 2025, 13(7), 1204; https://doi.org/10.3390/jmse13071204 - 21 Jun 2025
Cited by 1 | Viewed by 838
Abstract
The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general-purpose models often struggle to perform effectively in complex maritime environments due to limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The proposed model uses a Swin Transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure independently encodes the grid and segmentation features, which are subsequently fused through a feature fusion module for implicit integration. A decoder with a cross-attention mechanism is then employed to generate descriptive captions for maritime images. Extensive experiments were conducted using the constructed maritime semantic segmentation and maritime image captioning datasets. The results demonstrate that the proposed TDSI model outperforms existing mainstream methods on several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in enhancing image captioning performance in maritime environments.

19 pages, 4970 KB  
Article
LGFUNet: A Water Extraction Network in SAR Images Based on Multiscale Local Features with Global Information
by Xiaowei Bai, Yonghong Zhang and Jujie Wei
Sensors 2025, 25(12), 3814; https://doi.org/10.3390/s25123814 - 18 Jun 2025
Viewed by 686
Abstract
To address existing issues in deep learning-based water extraction from SAR images, such as confusion between mountain shadows and water bodies and difficulty in extracting complex boundary details of continuous water bodies, the LGFUNet model is proposed. The LGFUNet model consists of three parts: the encoder–decoder, the DECASPP module, and the LGFF module. In the encoder–decoder, Swin-Transformer modules are used instead of convolution kernels for feature extraction, enhancing the learning of global information and improving the model's ability to capture the spatial features of continuous water bodies. The DECASPP module is employed to extract and select multiscale features, focusing on complex water body boundary details. Additionally, a series of LGFF modules are inserted between the encoder and decoder to reduce the semantic gap between the encoder and decoder feature maps and the spatial information loss caused by the encoder's downsampling, improving the model's ability to learn detailed information. Sentinel-1 SAR data from the Qinghai–Tibet Plateau region are selected, and the water extraction performance of the proposed LGFUNet model is compared with that of existing methods such as U-Net, Swin-UNet, and SCUNet++. The results show that the LGFUNet model achieves the best performance among the compared methods.
(This article belongs to the Section Remote Sensors)

19 pages, 3487 KB  
Article
Cross-Modal Weakly Supervised RGB-D Salient Object Detection with a Focus on Filamentary Structures
by Yifan Ding, Weiwei Chen, Guomin Zhang, Zhaoming Feng and Xuan Li
Sensors 2025, 25(10), 2990; https://doi.org/10.3390/s25102990 - 9 May 2025
Viewed by 1207
Abstract
Current weakly supervised salient object detection (SOD) methods for RGB-D images mostly rely on image-level labels and sparse annotations, which makes it difficult to completely contour object boundaries in complex scenes, especially when detecting objects with filamentary structures. To address these issues, we propose a novel cross-modal weakly supervised SOD framework. The framework can adequately exploit the advantages of cross-modal weak labels to generate high-quality pseudo-labels, and it can fully couple the multi-scale features of RGB and depth images for precise saliency prediction. The framework mainly consists of a cross-modal pseudo-label generation network (CPGN) and an asymmetric salient-region prediction network (ASPN). The CPGN leverages the precise pixel-level guidance provided by point labels and the enhanced semantic supervision provided by text labels to generate high-quality pseudo-labels, which are used to supervise the subsequent training of the ASPN. To better capture the contextual information and geometric features from RGB and depth images, the ASPN, an asymmetrically progressive network, gradually extracts multi-scale features from RGB and depth images using Swin-Transformer and CNN encoders, respectively. This significantly enhances the model's ability to perceive detailed structures. Additionally, an edge constraint module (ECM) is designed to sharpen the edges of the predicted salient regions. The experimental results demonstrate that the method depicts salient objects, especially filamentary structures, better than other weakly supervised SOD methods.
(This article belongs to the Section Optical Sensors)

19 pages, 6509 KB  
Article
Optimized Faster R-CNN with Swintransformer for Robust Multi-Class Wildfire Detection
by Sugi Choi, Sunghwan Kim and Haiyoung Jung
Fire 2025, 8(5), 180; https://doi.org/10.3390/fire8050180 - 30 Apr 2025
Cited by 2 | Viewed by 1888
Abstract
Wildfires are a critical global threat, emphasizing the need for efficient detection systems capable of identifying fires and distinguishing fire-related from non-fire events in their early stages. This study integrates the Swin Transformer into the Faster R-CNN backbone to overcome challenges in detecting small flames and smoke and in distinguishing complex scenarios such as fog/haze and chimney smoke. The proposed model was evaluated on a dataset comprising five classes: flames, smoke, clouds, fog/haze, and chimney smoke. Experimental results demonstrate that the Swin Transformer-based models outperform ResNet-based Faster R-CNN models, achieving a maximum mAP50 of 0.841. The model exhibited superior performance in detecting small and dynamic objects while reducing misclassification between similar classes, such as smoke and chimney smoke. Precision–recall analysis further validated the model's robustness across diverse scenarios. However, slightly lower recall for specific classes and a lower FPS compared with ResNet models suggest a need for further optimization for real-time applications. This study highlights the Swin Transformer's potential to enhance wildfire detection systems by effectively handling both fire and non-fire events. Future research will focus on optimizing real-time performance and improving recall for challenging scenarios, thereby contributing to the development of robust and reliable wildfire detection systems.
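A minimal sketch of plugging a Swin backbone into torchvision's Faster R-CNN, the combination the abstract describes; the single-scale (no FPN) wiring, anchor sizes, and class count are simplifying assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

class SwinFeatures(nn.Module):
    """Swin-T feature extractor returning a standard NCHW feature map."""
    def __init__(self):
        super().__init__()
        self.body = torchvision.models.swin_t(weights=None).features
        self.out_channels = 768                      # channel width of the last Swin-T stage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x).permute(0, 3, 1, 2)      # Swin emits NHWC; the detector wants NCHW

anchors = AnchorGenerator(sizes=((32, 64, 128, 256),), aspect_ratios=((0.5, 1.0, 2.0),))
model = FasterRCNN(SwinFeatures(), num_classes=6, rpn_anchor_generator=anchors)  # 5 classes + background
model.eval()
with torch.no_grad():
    preds = model([torch.rand(3, 512, 512)])
print(preds[0].keys())  # dict_keys(['boxes', 'labels', 'scores'])
```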

25 pages, 7765 KB  
Article
A Novel Swin-Transformer with Multi-Source Information Fusion for Online Cross-Domain Bearing RUL Prediction
by Zaimi Xie, Chunmei Mo and Baozhu Jia
J. Mar. Sci. Eng. 2025, 13(5), 842; https://doi.org/10.3390/jmse13050842 - 24 Apr 2025
Viewed by 1149
Abstract
Accurate remaining useful life (RUL) prediction of rolling bearings plays a critical role in predictive maintenance. However, existing methods face challenges in extracting and fusing multi-source spatiotemporal features, addressing distribution differences between intra-domain and inter-domain features, and balancing global and local feature attention. To overcome these limitations, this paper proposes an online cross-domain RUL prediction method based on a Swin Transformer with multi-source information fusion. The method uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture temporal features, which are transformed into 2D images using Gramian Angular Fields (GAF) for spatial feature extraction by a 2D Convolutional Neural Network (CNN). A self-attention mechanism further integrates the multi-source features, while an adversarial Multi-Kernel Maximum Mean Discrepancy (MK-MMD) combined with a relational network mitigates feature distribution differences across domains. Additionally, an offline-online Swin Transformer with a dynamic weight-updating strategy enhances cross-domain feature learning. Experimental results demonstrate that the proposed method significantly reduces Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), outperforming published methods in prediction accuracy and robustness.
(This article belongs to the Special Issue Ship Wireless Sensor)
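A minimal sketch of the Gramian Angular Field step mentioned above, converting a 1-D signal into a 2-D image; the summation (GASF) variant and min-max rescaling are standard choices assumed here.

```python
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    x = np.asarray(series, dtype=float)
    x = 2.0 * (x - x.min()) / (np.ptp(x) + 1e-12) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                 # encode values as angles
    return np.cos(phi[:, None] + phi[None, :])             # pairwise angular sums -> image

# Toy 1-D signal standing in for a bearing degradation feature.
signal = np.sin(np.linspace(0, 8 * np.pi, 128)) * np.linspace(1.0, 0.2, 128)
image = gasf(signal)
print(image.shape)  # (128, 128)
```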

22 pages, 9435 KB  
Article
Enhanced Liver and Tumor Segmentation Using a Self-Supervised Swin-Transformer-Based Framework with Multitask Learning and Attention Mechanisms
by Zhebin Chen, Meng Dou, Xu Luo and Yu Yao
Appl. Sci. 2025, 15(7), 3985; https://doi.org/10.3390/app15073985 - 4 Apr 2025
Cited by 2 | Viewed by 2468
Abstract
Automatic liver and tumor segmentation in contrast-enhanced magnetic resonance imaging (CE-MRI) is of great value in clinical practice, as it can reduce surgeons' workload and increase the probability of surgical success. However, this remains a challenging task due to the complex background, irregular shapes, and low contrast between organ and lesion. In addition, the size, number, shape, and spatial location of liver tumors vary from person to person, and existing automatic segmentation models are unable to achieve satisfactory results. In this work, drawing inspiration from self-attention mechanisms and multitask learning, we propose a segmentation network that leverages a Swin-Transformer as its backbone and incorporates self-supervised learning strategies to enhance performance. Accurately segmenting the boundaries and spatial locations of liver tumors is the greatest challenge. To address this, we propose a multitask learning strategy based on segmentation and a signed distance map (SDM), incorporating an attention gate into the skip connections. The strategy performs liver tumor segmentation and SDM regression simultaneously. The SDM regression branch effectively improves detection and segmentation of small objects, since it imposes additional shape and global constraints on the network. We performed comprehensive quantitative and qualitative evaluations of our approach. The proposed model outperforms existing state-of-the-art models in terms of DSC, 95HD, and ASD metrics. This research provides a valuable solution that lessens the burden on surgeons and improves the chances of successful surgery.
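A minimal sketch of constructing the signed distance map (SDM) regression target from a binary mask, as used by the multitask branch described above; the sign convention (negative inside, positive outside) and normalization are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    mask = mask.astype(bool)
    if not mask.any() or mask.all():
        return np.zeros(mask.shape, dtype=np.float32)
    dist_out = distance_transform_edt(~mask)    # distance to the object, measured outside it
    dist_in = distance_transform_edt(mask)      # distance to the boundary, measured inside it
    sdm = dist_out - dist_in                    # negative inside, positive outside, 0 at the boundary
    return (sdm / np.abs(sdm).max()).astype(np.float32)   # normalize to [-1, 1]

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 25:45] = 1                           # a toy square "tumor"
sdm = signed_distance_map(mask)
print(sdm.min(), sdm.max())                      # negative inside, up to +1 far outside
```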
