Search Results (51)

Search Parameters:
Keywords = synergistic fusion framework

26 pages, 4572 KiB  
Article
Transfer Learning-Based Ensemble of CNNs and Vision Transformers for Accurate Melanoma Diagnosis and Image Retrieval
by Murat Sarıateş and Erdal Özbay
Diagnostics 2025, 15(15), 1928; https://doi.org/10.3390/diagnostics15151928 - 31 Jul 2025
Viewed by 175
Abstract
Background/Objectives: Melanoma is an aggressive type of skin cancer that poses serious health risks if not detected in its early stages. Although early diagnosis enables effective treatment, delays can result in life-threatening consequences. Traditional diagnostic processes predominantly rely on the subjective expertise of dermatologists, which can lead to variability and time inefficiencies. Consequently, there is an increasing demand for automated systems that can accurately classify melanoma lesions and retrieve visually similar cases to support clinical decision-making. Methods: This study proposes a transfer learning (TL)-based deep learning (DL) framework for the classification of melanoma images and the enhancement of content-based image retrieval (CBIR) systems. Pre-trained models including DenseNet121, InceptionV3, Vision Transformer (ViT), and Xception were employed to extract deep feature representations. These features were integrated using a weighted fusion strategy and classified through an Ensemble learning approach designed to capitalize on the complementary strengths of the individual models. The performance of the proposed system was evaluated using classification accuracy and mean Average Precision (mAP) metrics. Results: Experimental evaluations demonstrated that the proposed Ensemble model significantly outperformed each standalone model in both classification and retrieval tasks. The Ensemble approach achieved a classification accuracy of 95.25%. In the CBIR task, the system attained an mAP score of 0.9538, indicating high retrieval effectiveness. The performance gains were attributed to the synergistic integration of features from diverse model architectures through the ensemble and fusion strategies. Conclusions: The findings underscore the effectiveness of TL-based DL models in automating melanoma image classification and enhancing CBIR systems. The integration of deep features from multiple pre-trained models using an Ensemble approach not only improved accuracy but also demonstrated robustness in feature generalization. This approach holds promise for integration into clinical workflows, offering improved diagnostic accuracy and efficiency in the early detection of melanoma. Full article
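
A minimal PyTorch sketch of the weighted feature fusion the abstract describes, using two of the four named backbones for brevity. The learnable softmax weights, the 512-d projection, and the classifier head are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class WeightedFusionEnsemble(nn.Module):
    """Fuse deep features from pre-trained backbones with learned weights."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.densenet = models.densenet121(weights="DEFAULT")
        self.densenet.classifier = nn.Identity()    # expose 1024-d features
        self.vit = models.vit_b_16(weights="DEFAULT")
        self.vit.heads = nn.Identity()              # expose 768-d features
        self.proj_d = nn.Linear(1024, 512)
        self.proj_v = nn.Linear(768, 512)
        self.w = nn.Parameter(torch.ones(2))        # learnable fusion weights
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        a = torch.softmax(self.w, dim=0)            # normalize the weights
        fused = a[0] * self.proj_d(self.densenet(x)) \
              + a[1] * self.proj_v(self.vit(x))
        return self.classifier(fused), fused        # logits + CBIR embedding

logits, embedding = WeightedFusionEnsemble()(torch.randn(1, 3, 224, 224))
```

Returning the fused vector alongside the logits reflects the dual use described in the abstract: the same embedding that feeds the classifier can index a retrieval database for the CBIR task.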

25 pages, 4296 KiB  
Article
StripSurface-YOLO: An Enhanced Yolov8n-Based Framework for Detecting Surface Defects on Strip Steel in Industrial Environments
by Haomin Li, Huanzun Zhang and Wenke Zang
Electronics 2025, 14(15), 2994; https://doi.org/10.3390/electronics14152994 - 27 Jul 2025
Viewed by 349
Abstract
Recent advances in precision manufacturing and high-end equipment technologies have imposed ever more stringent requirements on the accuracy, real-time performance, and lightweight design of online steel strip surface defect detection systems. To reconcile the persistent trade-off between detection precision and inference efficiency in complex industrial environments, this study proposes StripSurface-YOLO, a novel real-time defect detection framework built upon YOLOv8n. The core architecture integrates an Efficient Cross-Stage Local Perception module (ResGSCSP), which synergistically combines GSConv lightweight convolutions with a one-shot aggregation strategy, thereby markedly reducing both model parameters and computational complexity. To further enhance multi-scale feature representation, this study introduces an Efficient Multi-Scale Attention (EMA) mechanism at the feature-fusion stage, enabling the network to more effectively attend to critical defect regions. Moreover, conventional nearest-neighbor upsampling is replaced by DySample, which produces deeper, high-resolution feature maps enriched with semantic content, improving both inference speed and fusion quality. To heighten sensitivity to small-scale and low-contrast defects, the model adopts Focal Loss, dynamically adjusting to sample difficulty. Extensive evaluations on the NEU-DET dataset demonstrate that StripSurface-YOLO reduces FLOPs by 11.6% and parameter count by 7.4% relative to the baseline YOLOv8n, while achieving respective improvements of 1.4%, 3.1%, 4.1%, and 3.0% in precision, recall, mAP50, and mAP50:95. Under adverse conditions, including contrast variations, brightness fluctuations, and Gaussian noise, StripSurface-YOLO outperforms the baseline model, delivering improvements of 5.0% in mAP50 and 4.7% in mAP50:95, attesting to the model’s robust interference resistance. These findings underscore the potential of StripSurface-YOLO to meet the rigorous performance demands of real-time surface defect detection in the metal forging industry. Full article
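
The difficulty-adaptive loss mentioned at the end of the abstract is standard focal loss; a minimal sketch follows, with the common default alpha and gamma values rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights easy examples."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(8, 1), torch.randint(0, 2, (8, 1)).float())
```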

17 pages, 2072 KiB  
Article
Barefoot Footprint Detection Algorithm Based on YOLOv8-StarNet
by Yujie Shen, Xuemei Jiang, Yabin Zhao and Wenxin Xie
Sensors 2025, 25(15), 4578; https://doi.org/10.3390/s25154578 - 24 Jul 2025
Viewed by 284
Abstract
This study proposes an optimized footprint recognition model based on an enhanced StarNet architecture for biometric identification in the security, medical, and criminal investigation fields. Conventional image recognition algorithms exhibit limitations in processing barefoot footprint images characterized by concentrated feature distributions and rich texture patterns. To address this, our framework integrates an improved StarNet into the backbone of the YOLOv8 architecture. Leveraging the unique advantages of element-wise multiplication, the redesigned backbone efficiently maps inputs to a high-dimensional nonlinear feature space without increasing channel dimensions, achieving enhanced representational capacity with low computational latency. Subsequently, an Encoder layer facilitates feature interaction within the backbone through multi-scale feature fusion and attention mechanisms, effectively extracting rich semantic information while maintaining computational efficiency. In the feature-fusion stage, a feature modulation block processes multi-scale features by synergistically combining global and local information, thereby reducing redundant computations and decreasing both parameter count and computational complexity to achieve model lightweighting. Experimental evaluations on a proprietary barefoot footprint dataset demonstrate that the proposed model exhibits significant advantages in terms of parameter efficiency, recognition accuracy, and computational complexity. The parameter count has been reduced by 0.73 million, further improving the model’s speed, and GFLOPs have been reduced by 1.5, lowering the hardware requirements for model deployment. Recognition accuracy has reached 99.5%, with further improvements in model precision. Future research will explore how to capture shoeprint images with complex backgrounds from shoes worn at crime scenes, aiming to further enhance the model’s recognition capabilities in more forensic scenarios. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
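
The element-wise multiplication the abstract credits for StarNet's implicit high-dimensional mapping can be sketched as a residual block with two parallel 1x1 projections multiplied together; the expansion ratio and block layout below are simplifying assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """Two parallel projections fused by element-wise product (the "star" op)."""
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.f1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.f2 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.out = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        # The product of two linear maps yields pairwise feature interactions,
        # an implicitly high-dimensional nonlinearity, without widening the
        # block's output channels.
        return x + self.out(self.f1(x) * self.f2(x))

y = StarBlock(64)(torch.randn(1, 64, 32, 32))   # spatial shape preserved
```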

16 pages, 2721 KiB  
Article
An Adapter and Segmentation Network-Based Approach for Automated Atmospheric Front Detection
by Xinya Ding, Xuan Peng, Yanguang Xue, Liang Zhang, Tianying Wang and Yunpeng Zhang
Appl. Sci. 2025, 15(14), 7855; https://doi.org/10.3390/app15147855 - 14 Jul 2025
Viewed by 159
Abstract
This study presents AD-MRCNN, an advanced deep learning framework for automated atmospheric front detection that addresses two critical limitations in existing methods. First, current approaches directly input raw meteorological data without optimizing feature compatibility, potentially hindering model performance. Second, they typically only provide frontal category information without identifying individual frontal systems. Our solution integrates two key innovations: 1. An intelligent adapter module that performs adaptive feature fusion, automatically weighting and combining multi-source meteorological inputs (including temperature, wind fields, and humidity data) to maximize their synergistic effects while minimizing feature conflicts; the adapter-equipped network achieves an average improvement of over 4% across various metrics. 2. An enhanced instance segmentation network based on the Mask R-CNN architecture that simultaneously achieves (1) precise frontal type classification (cold/warm/stationary/occluded), (2) accurate spatial localization, and (3) identification of distinct frontal systems. Comprehensive evaluation using ERA5 reanalysis data (2009–2018) demonstrates significant improvements, including an 85.1% F1-score, outperforming traditional methods (TFP: 63.1%) and deep learning approaches (U-Net: 83.3%), and a 31% reduction in false alarms compared to semantic segmentation methods. The framework’s modular design allows for potential application to other meteorological feature detection tasks. Future work will focus on incorporating temporal dynamics for frontal evolution prediction. Full article
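
A hedged sketch of the adapter idea: per-source gates, learned jointly with the detector, weight each meteorological field before a 1x1 mixing convolution. The source count, channel width, and gate design are illustrative assumptions only:

```python
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    """Adaptively weight multi-source inputs before feature fusion."""
    def __init__(self, n_sources: int = 3, channels: int = 16):
        super().__init__()
        self.gates = nn.Parameter(torch.ones(n_sources))  # one gate per source
        self.mix = nn.Conv2d(n_sources * channels, channels, kernel_size=1)

    def forward(self, sources):                     # list of (B, C, H, W)
        w = torch.softmax(self.gates, dim=0)        # normalized source weights
        weighted = [w[i] * s for i, s in enumerate(sources)]
        return self.mix(torch.cat(weighted, dim=1))

temp, wind, humidity = (torch.randn(1, 16, 64, 64) for _ in range(3))
fused = FusionAdapter()([temp, wind, humidity])
```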

23 pages, 3614 KiB  
Article
A Multimodal Semantic-Enhanced Attention Network for Fake News Detection
by Weijie Chen, Yuzhuo Dang and Xin Zhang
Entropy 2025, 27(7), 746; https://doi.org/10.3390/e27070746 - 12 Jul 2025
Viewed by 523
Abstract
The proliferation of social media platforms has triggered an unprecedented increase in multimodal fake news, creating pressing challenges for content authenticity verification. Current fake news detection systems predominantly rely on isolated unimodal analysis (text or image), failing to exploit critical cross-modal correlations or leverage latent social context cues. To bridge this gap, we introduce the SCCN (Semantic-enhanced Cross-modal Co-attention Network), a novel framework that synergistically combines multimodal features with refined social graph signals. Our approach innovatively combines text, image, and social relation features through a hierarchical fusion framework. First, we extract modality-specific features and enhance semantics by identifying entities in both text and visual data. Second, an improved co-attention mechanism selectively integrates social relations while removing irrelevant connections to reduce noise and explore latent informative links. Finally, the model is optimized via cross-entropy loss with entropy minimization. Experimental results on the benchmark datasets PHEME and Weibo show that SCCN consistently outperforms existing approaches, achieving relative accuracy improvements of 1.7% and 1.6% over the best-performing baseline method on each dataset. Full article
(This article belongs to the Section Multidisciplinary Applications)
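
A minimal sketch of the cross-modal co-attention at the heart of such a design: text tokens query image regions and vice versa before pooling. The embedding size, head count, and mean pooling are assumptions; the paper's semantic-enhancement and social-graph branches are omitted:

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.t2i = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.i2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, image):           # (B, Lt, D) and (B, Li, D)
        t, _ = self.t2i(text, image, image)   # text attends to image regions
        i, _ = self.i2t(image, text, text)    # image attends to text tokens
        return torch.cat([t.mean(dim=1), i.mean(dim=1)], dim=-1)   # (B, 2D)

joint = CoAttention()(torch.randn(2, 20, 256), torch.randn(2, 49, 256))
```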

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Viewed by 285
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies which challenge conventional global alignment methods. Extensive experiments on UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T. The framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, particularly on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception. Full article
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
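
The region-level adversarial calibration can be pictured as a PatchGAN-style discriminator: every cell of its output grid scores one partition of the feature map as simulated or real. The channel sizes and depth below are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Each output cell is a sim-vs-real logit for one local region of the map.
region_discriminator = nn.Sequential(
    nn.Conv2d(256, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, kernel_size=1),
)

feat = torch.randn(2, 256, 32, 32)            # backbone feature map (toy)
region_logits = region_discriminator(feat)    # (2, 1, 8, 8): local verdicts
```

Training the feature extractor to fool these local verdicts aligns texture, shadow, and illumination statistics region by region rather than through a single global decision.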

42 pages, 4946 KiB  
Article
Enhanced AUV Autonomy Through Fused Energy-Optimized Path Planning and Deep Reinforcement Learning for Integrated Navigation and Dynamic Obstacle Detection
by Kaijie Zhang, Yuchen Ye, Kaihao Chen, Zao Li and Kangshun Li
J. Mar. Sci. Eng. 2025, 13(7), 1294; https://doi.org/10.3390/jmse13071294 - 30 Jun 2025
Viewed by 300
Abstract
Autonomous Underwater Vehicles (AUVs) operating in dynamic, constrained underwater environments demand sophisticated navigation and detection fusion capabilities that traditional methods often fail to provide. This paper introduces a novel hybrid framework that synergistically fuses a Multithreaded Energy-Optimized Batch Informed Trees (MEO-BIT*) algorithm with Deep Q-Networks (DQN) to achieve robust AUV autonomy. The MEO-BIT* component delivers efficient global path planning through (1) a multithreaded batch sampling mechanism for rapid state-space exploration, (2) heuristic-driven search accelerated by KD-tree spatial indexing for optimized path discovery, and (3) an energy-aware cost function balancing path length and steering effort for enhanced endurance. Critically, the DQN component facilitates dynamic obstacle detection and adaptive local navigation, enabling the AUV to adjust its trajectory intelligently in real time. This integrated approach leverages the strengths of both algorithms. The global path intelligence of MEO-BIT* is dynamically informed and refined by the DQN’s learned perception. This allows the DQN to make effective decisions to avoid moving obstacles. Experimental validation in a simulated Achao waterway (Chile) demonstrates the MEO-BIT* + DQN system’s superiority, achieving a 46% reduction in collision rates (directly reflecting improved detection and avoidance fusion), a 15.7% improvement in path smoothness, and a 78.9% faster execution time compared to conventional RRT* and BIT* methods. This work presents a robust solution that effectively fuses two key components: the computational efficiency of MEO-BIT* and the adaptive capabilities of DQN. This fusion significantly advances the integration of navigation with dynamic obstacle detection. Ultimately, it enhances AUV operational performance and autonomy in complex maritime scenarios. Full article
(This article belongs to the Special Issue Navigation and Detection Fusion for Autonomous Underwater Vehicles)
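
An energy-aware cost function balancing path length against steering effort, as the abstract describes for MEO-BIT*, might look like the following; the weights and 2D waypoints are illustrative assumptions, not the paper's parameterization:

```python
import math

def path_cost(waypoints, w_len: float = 1.0, w_steer: float = 0.5):
    """Weighted sum of segment lengths and absolute heading changes."""
    length, steer = 0.0, 0.0
    for i in range(1, len(waypoints)):
        (x0, y0), (x1, y1) = waypoints[i - 1], waypoints[i]
        length += math.hypot(x1 - x0, y1 - y0)
        if i >= 2:
            xp, yp = waypoints[i - 2]
            h_prev = math.atan2(y0 - yp, x0 - xp)
            h_curr = math.atan2(y1 - y0, x1 - x0)
            d = abs(h_curr - h_prev)
            steer += min(d, 2 * math.pi - d)   # wrap heading change to [0, pi]
    return w_len * length + w_steer * steer

print(path_cost([(0, 0), (1, 0), (2, 1), (3, 1)]))
```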

19 pages, 11482 KiB  
Article
BiCA-LI: A Cross-Attention Multi-Task Deep Learning Model for Time Series Forecasting and Anomaly Detection in IDC Equipment
by Zhongxing Sun, Yuhao Zhou, Zheng Gong, Cong Wen, Zhenyu Cai and Xi Zeng
Appl. Sci. 2025, 15(13), 7168; https://doi.org/10.3390/app15137168 - 25 Jun 2025
Viewed by 374
Abstract
To accurately monitor the operational state of Internet Data Centers (IDCs) and fulfill integrated management objectives, this paper introduces a bidirectional cross-attention LSTM–Informer with uncertainty-aware multi-task learning framework (BiCA-LI) for time series analysis. The architecture employs dual-branch temporal encoders—long short-term memory (LSTM) and Informer—to extract local transient dynamics and global long-term dependencies, respectively. A bidirectional cross-attention module is subsequently designed to synergistically fuse multi-scale temporal representations. Finally, task-specific regression and classification heads generate predictive outputs and anomaly detection results, while an uncertainty-aware dynamic loss weighting strategy adaptively balances task-specific gradients during training. Experimental results validate BiCA-LI’s superior performance across dual objectives. In regression tasks, it achieves an MAE of 0.086, MSE of 0.014, and RMSE of 0.117. For classification, the model attains 99.5% accuracy, 100% precision, and an AUC score of 0.950, demonstrating substantial improvements over standalone LSTM and Informer baselines. The dual-encoder design, coupled with cross-modal attention fusion and gradient-aware loss optimization, enables robust joint modeling of heterogeneous temporal patterns. This methodology establishes a scalable paradigm for intelligent IDC operations, enabling real-time anomaly mitigation and resource orchestration in energy-intensive infrastructures. Full article
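
The uncertainty-aware dynamic loss weighting reads like homoscedastic-uncertainty weighting in the style of Kendall et al.; treating it that way is an assumption based on the abstract, sketched below:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Balance regression and classification losses via learned log-variances."""
    def __init__(self):
        super().__init__()
        self.log_var = nn.Parameter(torch.zeros(2))   # one log-variance per task

    def forward(self, loss_reg, loss_cls):
        precision = torch.exp(-self.log_var)   # lower uncertainty => larger weight
        return (precision[0] * loss_reg + precision[1] * loss_cls
                + self.log_var.sum())          # regularizer keeps variances finite

total = UncertaintyWeightedLoss()(torch.tensor(0.3), torch.tensor(0.7))
```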

19 pages, 2410 KiB  
Article
MAK-Net: A Multi-Scale Attentive Kolmogorov–Arnold Network with BiGRU for Imbalanced ECG Arrhythmia Classification
by Cong Zhao, Bingwei Lai, Yongzheng Xu, Yiping Wang and Haorong Dong
Sensors 2025, 25(13), 3928; https://doi.org/10.3390/s25133928 - 24 Jun 2025
Viewed by 558
Abstract
Accurate classification of electrocardiogram (ECG) signals is vital for reliable arrhythmia diagnosis and informed clinical decision-making, yet real-world datasets often suffer severe class imbalance that degrades recall and F1-score. To address these limitations, we introduce MAK-Net, a hybrid deep learning framework that combines: (1) a four-branch multiscale convolutional module for comprehensive feature extraction across diverse waveform morphologies; (2) an efficient channel attention mechanism for adaptive weighting of clinically salient segments; (3) bidirectional gated recurrent units (BiGRU) to capture long-range temporal dependencies; and (4) Kolmogorov–Arnold Network (KAN) layers with learnable spline activations for enhanced nonlinear representation and interpretability. We further mitigate imbalance by synergistically applying focal loss and the Synthetic Minority Oversampling Technique (SMOTE). On the MIT-BIH arrhythmia database, MAK-Net attains state-of-the-art performance—0.9980 accuracy, 0.9888 F1-score, 0.9871 recall, 0.9905 precision, and 0.9991 specificity—demonstrating superior robustness to imbalanced classes compared with existing methods. These findings validate the efficacy of multiscale feature fusion, attention-guided learning, and KAN-based nonlinear mapping for automated, clinically reliable arrhythmia detection. Full article
(This article belongs to the Section Biomedical Sensors)
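
Of the two imbalance remedies named, SMOTE is the data-side half; a toy sketch with imbalanced-learn follows (the array shapes stand in for flattened beat windows and are not the MIT-BIH pipeline). The loss-side half, focal loss, is sketched under the StripSurface-YOLO entry above:

```python
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.randn(1000, 187)            # flattened heartbeat windows (toy)
y = np.array([0] * 950 + [1] * 50)        # severely imbalanced labels

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))                 # [950 950]: minority class synthesized up
```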

27 pages, 12000 KiB  
Article
Multi-Model Synergistic Satellite-Derived Bathymetry Fusion Approach Based on Mamba Coral Reef Habitat Classification
by Xuechun Zhang, Yi Ma, Feifei Zhang, Zhongwei Li and Jingyu Zhang
Remote Sens. 2025, 17(13), 2134; https://doi.org/10.3390/rs17132134 - 21 Jun 2025
Viewed by 388
Abstract
As fundamental geophysical information, high-precision shallow-water bathymetry provides critical data support for the utilization of island resources and the delimitation of coral reef protection zones. In recent years, the combination of active and passive remote sensing technologies has led to a revolutionary breakthrough in satellite-derived bathymetry (SDB). Optical SDB extracts bathymetry by quantifying light–water–bottom interactions; the apparent differences in the reflectance of different bottom types in specific wavelength bands are therefore a core component of SDB. In this study, refined classification was performed for complex seafloor sediment and geomorphic features in coral reef habitats, and a multi-model synergistic SDB fusion approach constrained by coral reef habitat classification was constructed based on the deep learning framework Mamba. The dual error of a global single model was suppressed by exploiting sediment and geomorphic partitions, as well as the complementary accuracy of different models. Based on Sentinel-2 multispectral remote sensing imagery and Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) active spaceborne lidar bathymetry data, wide-range and high-accuracy coral reef habitat classification results and bathymetry information were obtained for the Yuya Shoal (0–23 m) and Niihau Island (0–40 m). The overall Mean Absolute Errors (MAEs) in the two study areas were 0.2 m and 0.5 m and the Mean Absolute Percentage Errors (MAPEs) were 9.77% and 6.47%, respectively, while R2 reached 0.98 in both areas. The estimated error of the SDB fusion strategy based on coral reef habitat classification was reduced by more than 90% compared with classical SDB models and a single machine learning method, thereby improving the capability of SDB in complex geomorphic ocean areas. Full article
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
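
The reported error metrics (MAE, MAPE, R2) are standard and reproducible with scikit-learn; the toy depths below are placeholders, not values from the Yuya Shoal or Niihau Island sites:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)

depth_true = np.array([2.0, 5.5, 10.0, 18.0, 22.5])   # reference depths (m, toy)
depth_pred = np.array([2.1, 5.2, 10.4, 17.5, 22.9])   # fused SDB output (m, toy)

print(mean_absolute_error(depth_true, depth_pred))             # MAE in metres
print(mean_absolute_percentage_error(depth_true, depth_pred))  # MAPE as a fraction
print(r2_score(depth_true, depth_pred))                        # R2
```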

24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 474
Abstract
The problem of inadequate object detection accuracy in complex remote sensing scenarios has been identified as a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and a Dilated Encoder network significantly improves detection and recognition performance for space remote sensing objects. The approach abandons the original Feature Pyramid Network (FPN) structure in favor of an adaptive fusion strategy built on multi-level backbone features, enhances the representation of multi-scale objects through upsampling and feature stacking, and reconstructs the FPN. Local features extracted by convolutional neural networks are mapped to graph-structured data, and the nodal attention mechanism of the GAT captures the global topological associations of space objects, compensating for the convolution operation’s limitations in weight allocation. The Dilated Encoder network covers targets of different scales through differentiated receptive fields, and feature weight allocation is optimized in combination with a Convolutional Block Attention Module (CBAM). Reflecting the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images was constructed, covering a variety of lighting, attitude, and scale scenes and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts, and ablation studies further validate the synergistic effect between the GAT and the Dilated Encoder. The results indicate that the model maintains high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments. Full article
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
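
The CNN-features-to-graph step can be sketched by treating each spatial cell of a feature map as a node and running one graph attention layer; PyTorch Geometric's GATConv and plain grid connectivity are assumptions standing in for the paper's construction:

```python
import torch
from torch_geometric.nn import GATConv
from torch_geometric.utils import grid

feat = torch.randn(1, 256, 16, 16)        # backbone feature map (toy)
x = feat.flatten(2).squeeze(0).t()        # 16*16 = 256 nodes, 256-d features
edge_index, _ = grid(16, 16)              # grid-neighbourhood edges

gat = GATConv(in_channels=256, out_channels=64, heads=4)  # concat heads -> 256-d
node_out = gat(x, edge_index)             # attention-weighted node features
```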

36 pages, 26627 KiB  
Article
NSA-CHG: An Intelligent Prediction Framework for Real-Time TBM Parameter Optimization in Complex Geological Conditions
by Youliang Chen, Wencan Guan, Rafig Azzam and Siyu Chen
AI 2025, 6(6), 127; https://doi.org/10.3390/ai6060127 - 16 Jun 2025
Viewed by 1608
Abstract
This study proposes an intelligent prediction framework integrating native sparse attention (NSA) with the Chen-Guan (CHG) algorithm to optimize tunnel boring machine (TBM) operations in heterogeneous geological environments. The framework resolves critical limitations of conventional experience-driven approaches that inadequately address the nonlinear coupling between the spatial heterogeneity of rock mass parameters and mechanical system responses. Three principal innovations are introduced: (1) a hardware-compatible sparse attention architecture achieving O(n) computational complexity while preserving high-fidelity geological feature extraction capabilities; (2) an adaptive kernel function optimization mechanism that reduces confidence interval width by 41.3% through synergistic integration of boundary likelihood-driven kernel selection with Chebyshev inequality-based posterior estimation; and (3) a physics-enhanced modelling methodology combining non-Hertzian contact mechanics with eddy field evolution equations. Validation experiments employing field data from the Pujiang Town Plot 125-2 Tunnel Project demonstrated superior performance metrics, including 92.4% ± 1.8% warning accuracy for fractured zones, ≤28 ms optimization response time, and ≤4.7% relative error in energy dissipation analysis. Comparative analysis revealed a 32.7% reduction in root mean square error (p < 0.01) and 4.8-fold inference speed acceleration relative to conventional methods, establishing a novel data–physics fusion paradigm for TBM control with substantial implications for intelligent tunnelling in complex geological formations. Full article
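
The Chebyshev-inequality side of the interval estimation admits a worked one-liner: for any distribution, P(|X - mu| >= k*sigma) <= 1/k^2, so solving 1/k^2 = 1 - confidence gives a distribution-free (and deliberately conservative) interval around a posterior mean estimate:

```python
import math

def chebyshev_interval(mean: float, std: float, confidence: float = 0.95):
    """Distribution-free bound: P(|X - mean| >= k*std) <= 1/k**2."""
    k = math.sqrt(1.0 / (1.0 - confidence))   # 95% confidence -> k ~ 4.47
    return mean - k * std, mean + k * std

print(chebyshev_interval(mean=12.0, std=1.5))   # wide, assumption-free bounds
```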

19 pages, 3735 KiB  
Article
Hybrid Hydrological Forecasting Through a Physical Model and a Weather-Informed Transformer Model: A Case Study in Greek Watershed
by Haris Ampas, Ioannis Refanidis and Vasilios Ampas
Appl. Sci. 2025, 15(12), 6679; https://doi.org/10.3390/app15126679 - 13 Jun 2025
Viewed by 1039
Abstract
This study explores a hybrid AI framework for streamflow forecasting that integrates physically based hydrological modeling, bias correction, and deep learning. HEC-HMS simulations generate synthetic discharge, which a machine learning-based bias correction model adjusts for irrigation-induced discrepancies—improving the Nash–Sutcliffe Efficiency (NSE) from 0.55 to 0.84, the Kling–Gupta Efficiency (KGE) from 0.67 to 0.89, and reducing the RMSE from 1.084 to 0.301 m3/s. The corrected discharge is used as input to a Temporal Fusion Transformer (TFT) trained on hourly meteorological data to predict streamflow at 24-, 48-, and 72-h horizons. In a semi-arid, irrigated basin in Northern Greece, the TFT achieves NSEs of 0.84, 0.78, and 0.71 and RMSEs of 0.301, 0.743, and 0.980 m3/s, respectively. Probabilistic forecasts deliver uncertainty bounds with coverage near nominal levels. In addition, the model’s built-in interpretability reveals temporal and meteorological influences—such as precipitation—that enhance predictive performance. This framework demonstrates the synergistic benefits of combining physically based modeling with state-of-the-art deep learning to support robust, multi-horizon forecasts in irrigation-influenced, data-scarce environments. Full article
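
Both skill scores quoted for the bias-correction step follow directly from their standard definitions; a self-contained sketch on toy discharge series:

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 means no better than the mean."""
    return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation, variability, and bias ratios."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()             # variability ratio
    beta = sim.mean() / obs.mean()            # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.2, 2.5, 3.1, 2.2, 1.8])    # observed discharge (m3/s, toy)
sim = np.array([1.0, 2.7, 3.0, 2.4, 1.6])    # bias-corrected simulation (toy)
print(nse(sim, obs), kge(sim, obs))
```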

39 pages, 2810 KiB  
Review
A Survey of Deep Learning-Driven 3D Object Detection: Sensor Modalities, Technical Architectures, and Applications
by Xiang Zhang, Hai Wang and Haoran Dong
Sensors 2025, 25(12), 3668; https://doi.org/10.3390/s25123668 - 11 Jun 2025
Viewed by 1734
Abstract
This review presents a comprehensive survey on deep learning-driven 3D object detection, focusing on the synergistic innovation between sensor modalities and technical architectures. Through a dual-axis “sensor modality–technical architecture” classification framework, it systematically analyzes detection methods based on RGB cameras, LiDAR, and multimodal fusion. From the sensor perspective, the study reveals the evolutionary paths of monocular depth estimation optimization, LiDAR point cloud processing from voxel-based to pillar-based modeling, and three-level cross-modal fusion paradigms (data-level alignment, feature-level interaction, and result-level verification). Regarding technical architectures, the paper examines structured representation optimization in traditional convolutional networks, spatiotemporal modeling breakthroughs in bird’s-eye view (BEV) methods, voxel-level modeling advantages of occupancy networks for irregular objects, and dynamic scene understanding capabilities of temporal fusion architectures. The applications in autonomous driving and agricultural robotics are discussed, highlighting future directions including depth perception enhancement, open-scene modeling, and lightweight deployment to advance 3D perception systems toward higher accuracy and stronger generalization. Full article
(This article belongs to the Section Remote Sensors)

20 pages, 25324 KiB  
Article
DGSS-YOLOv8s: A Real-Time Model for Small and Complex Object Detection in Autonomous Vehicles
by Siqiang Cheng, Lingshan Chen and Kun Yang
Algorithms 2025, 18(6), 358; https://doi.org/10.3390/a18060358 - 11 Jun 2025
Viewed by 1413
Abstract
Object detection in complex road scenes is vital for autonomous driving, facing challenges such as object occlusion, small target sizes, and irregularly shaped targets. To address these issues, this paper introduces DGSS-YOLOv8s, a model designed to enhance detection accuracy and high-FPS performance within the You Only Look Once version 8 small (YOLOv8s) framework. The key innovation lies in the synergistic integration of several architectural enhancements: the DCNv3_LKA_C2f module, leveraging Deformable Convolution v3 (DCNv3) and Large Kernel Attention (LKA) for better capture of complex object shapes; an Optimized Feature Pyramid Network structure (Optimized-GFPN) for improved multi-scale feature fusion; the Detect_SA module, incorporating spatial Self-Attention (SA) at the detection head for broader context awareness; and an Inner-Shape Intersection over Union (IoU) loss function to improve bounding box regression accuracy. These components collectively target the aforementioned challenges in road environments. Evaluations on the Berkeley DeepDrive 100K (BDD100K) and Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) datasets demonstrate the model’s effectiveness. Compared to baseline YOLOv8s, DGSS-YOLOv8s achieves mean Average Precision (mAP)@50 improvements of 2.4% (BDD100K) and 4.6% (KITTI). Significant gains were observed for challenging categories, notably 87.3% mAP@50 for cyclists on KITTI, and small object detection (AP-small) improved by up to 9.7% on KITTI. Crucially, DGSS-YOLOv8s achieved high processing speeds suitable for autonomous driving, operating at 103.1 FPS (BDD100K) and 102.5 FPS (KITTI) on an NVIDIA GeForce RTX 4090 GPU. These results highlight that DGSS-YOLOv8s effectively balances enhanced detection accuracy for complex scenarios with high processing speed, demonstrating its potential for demanding autonomous driving applications. Full article
(This article belongs to the Special Issue Advances in Computer Vision: Emerging Trends and Applications)
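
The quantity the Inner-Shape IoU loss refines is plain axis-aligned IoU; a minimal sketch of that base quantity follows (the Inner-Shape weighting itself is not reproduced here):

```python
import torch

def box_iou(a, b):
    """IoU between paired boxes given as (x1, y1, x2, y2) rows."""
    tl = torch.max(a[:, :2], b[:, :2])            # intersection top-left
    br = torch.min(a[:, 2:], b[:, 2:])            # intersection bottom-right
    inter = (br - tl).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter)

pred = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
gt = torch.tensor([[5.0, 5.0, 15.0, 15.0]])
print(1 - box_iou(pred, gt))    # IoU loss ~ 0.857 for this pair
```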
