Search Results (347)

Search Parameters:
Keywords = depth map fusion

2669 KB
Article
Multimodal Guidewire 3D Reconstruction Based on Magnetic Field Data
by Wenbin Jiang, Qian Zheng, Dong Yang, Jiaqian Li and Wei Wei
Sensors 2026, 26(2), 545; https://doi.org/10.3390/s26020545 - 13 Jan 2026
Viewed by 72
Abstract
Accurate 3D reconstruction of guidewires is crucial in minimally invasive surgery and interventional procedures. Traditional biplanar X-ray–based reconstruction methods can achieve reasonable accuracy but involve high radiation doses, limiting their clinical applicability; meanwhile, single-view images inherently lack reliable depth cues. To address these issues, this paper proposes a multimodal guidewire 3D reconstruction approach that integrates magnetic field information. The method first employs the MiDaS v3 network to estimate an initial depth map from a single image and then incorporates tri-axial magnetic field measurements to enrich and refine the spatial information. To effectively fuse the two modalities, we design a multi-stage strategy combining K-nearest-neighbor (KNN) matching with a cross-modal attention (Cross-Attention) mechanism, enabling accurate alignment and fusion of image and magnetic features. The fused representation is subsequently fed into a PointNet-based regressor to generate the final 3D coordinates of the guidewire. Experimental results demonstrate that our method achieves a root-mean-square error of 2.045 mm, a mean absolute error of 1.738 mm, and a z-axis MAE of 0.285 mm on the test set. These findings indicate that the proposed multimodal framework improves 3D reconstruction accuracy under single-view imaging and offers enhanced visualization support for interventional procedures.
(This article belongs to the Section Biomedical Sensors)
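The fusion strategy the abstract describes, KNN matching followed by cross-attention, lends itself to a compact sketch. The following PyTorch snippet pairs each image-derived point with its k nearest magnetic measurements and lets the image features attend to them; all dimensions, module choices, and the value of k are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative KNN matching + cross-attention fusion of image-derived
    features with tri-axial magnetic field measurements."""
    def __init__(self, dim=64, k=4):
        super().__init__()
        self.k = k
        self.mag_proj = nn.Linear(3, dim)  # tri-axial measurement -> feature
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, img_feat, img_xyz, mag_feat, mag_xyz):
        # img_feat: (B, N, dim) features at N image points with coords img_xyz (B, N, 3)
        # mag_feat: (B, M, 3) tri-axial readings at sensor coords mag_xyz (B, M, 3)
        d = torch.cdist(img_xyz, mag_xyz)                # (B, N, M) distances
        knn_idx = d.topk(self.k, largest=False).indices  # (B, N, k) nearest sensors
        mag = self.mag_proj(mag_feat)                    # (B, M, dim)
        B, N, _ = knn_idx.shape
        batch = torch.arange(B, device=mag.device).view(B, 1, 1)
        matched = mag[batch, knn_idx].reshape(B, N * self.k, -1)
        # Image features query the KNN-matched magnetic features.
        fused, _ = self.attn(img_feat, matched, matched)
        return fused  # (B, N, dim), ready for a PointNet-style coordinate regressor

fused = CrossModalFusion()(torch.randn(2, 128, 64), torch.randn(2, 128, 3),
                           torch.randn(2, 16, 3), torch.randn(2, 16, 3))
print(fused.shape)  # torch.Size([2, 128, 64])
```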

20 pages, 4726 KB  
Article
Enhancing SeeGround with Relational Depth Text for 3D Visual Grounding
by Hyun-Sik Jeon, Seong-Hui Kang and Jong-Eun Ha
Appl. Sci. 2026, 16(2), 652; https://doi.org/10.3390/app16020652 - 8 Jan 2026
Viewed by 140
Abstract
Three-dimensional visual grounding is a core technology that identifies specific objects within complex 3D scenes based on natural language instructions, enhancing human–machine interaction in robotics and augmented reality. Traditional approaches have focused on supervised learning, which relies on annotated data; however, zero-shot methodologies are emerging due to the high cost of data construction and limitations in generalization. SeeGround achieves state-of-the-art performance by integrating 2D rendered images and spatial text descriptions. Nevertheless, SeeGround struggles to discern relative depth relationships clearly, owing to the implicit depth representation of its 2D views. This study proposes the relational depth text (RDT) technique to overcome these limitations: a monocular depth estimation model extracts depth maps from the rendered 2D images, and the K-Nearest Neighbors algorithm converts inter-object relative depth relations into natural-language descriptions that are incorporated into the Vision–Language Model (VLM) prompt. The method augments spatial reasoning while preserving SeeGround's existing pipeline. It yields a 3.54% improvement in the Acc@0.25 metric on the Nr3D dataset in a 7B VLM environment roughly 10.3 times lighter than the original model, along with a 6.74% increase in Unique cases on the ScanRefer dataset, albeit with a 1.70% decline in Multiple cases. The proposed technique enhances grounding robustness through viewpoint anchoring and candidate discrimination in complex query scenarios, and future multi-view fusion and conditional execution optimizations are expected to improve its efficiency in practical applications.
(This article belongs to the Special Issue Advances in Computer Graphics and 3D Technologies)
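The RDT step, turning inter-object relative depths into sentences for the VLM prompt, can be sketched as follows. The median-depth readout, neighbor count, and sentence wording are assumptions for illustration; the abstract does not specify the exact rules.

```python
import numpy as np

def relational_depth_text(depth, boxes, names, k=2):
    """Convert pairwise relative depth of detected objects into text
    (an RDT-style illustration, not the paper's exact formulation)."""
    z = np.array([np.median(depth[y0:y1, x0:x1]) for x0, y0, x1, y1 in boxes])
    centers = np.array([((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes])
    lines = []
    for i, name in enumerate(names):
        dist = np.linalg.norm(centers - centers[i], axis=1)
        for j in np.argsort(dist)[1:k + 1]:  # k nearest neighbors in the view
            rel = "closer to the camera than" if z[i] < z[j] else "farther from the camera than"
            lines.append(f"The {name} is {rel} the {names[j]}.")
    return " ".join(lines)

# Toy depth map (near on the left, far on the right) and two hypothetical boxes.
depth = np.tile(np.linspace(1.0, 5.0, 64), (64, 1)).astype(np.float32)
print(relational_depth_text(depth, [(2, 2, 10, 10), (50, 50, 60, 60)],
                            ["chair", "table"], k=1))
```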

19 pages, 17699 KB  
Article
Research on a Method for Identifying and Localizing Goji Berries Based on Binocular Stereo Vision Technology
by Juntao Shi, Changyong Li, Zehui Zhao and Shunchun Zhang
AgriEngineering 2026, 8(1), 6; https://doi.org/10.3390/agriengineering8010006 - 1 Jan 2026
Viewed by 242
Abstract
To address the issue of low depth estimation accuracy in complex goji berry orchards, this paper proposes a method for identifying and locating goji berries that combines the YOLO-VitBiS object detection network with stereo vision technology. Based on the YOLO11n backbone network, the C3K2 module in the backbone is first improved using the AdditiveBlock module to enhance its detail-capturing capability in complex environments. The AdditiveBlock introduces lightweight long-range interactions via residual additive operations, thereby strengthening global context modeling without significantly increasing computation. Subsequently, a weighted bidirectional feature pyramid network is introduced into the Neck to enable more flexible and efficient feature fusion. Finally, a lightweight shared detail-enhanced detection head is proposed to further reduce the network's computational complexity and parameter count. The enhanced model is integrated with binocular stereo vision technology, employing the CREStereo depth estimation algorithm for disparity calculation during stereo matching to derive the three-dimensional spatial coordinates of the goji berry target. This approach enables efficient and precise positioning. Experimental results demonstrate that the YOLO-VitBiS model achieves a detection accuracy of 96.6%, with a model size of 4.3 MB and only 1.856 M parameters. Compared to the traditional SGBM method and other deep learning approaches such as UniMatch, the CREStereo algorithm generates superior depth maps under complex conditions. Within a distance range of 400 mm to 1000 mm, the average relative error between the estimated and actual depth measurements is 2.42%, meeting the detection and ranging accuracy requirements for field operations and providing reliable recognition and localization support for subsequent goji berry harvesting robots.
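CREStereo supplies the disparity; recovering 3D coordinates from a matched pixel then follows the standard rectified-stereo relations Z = fx * B / d, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy. A minimal sketch with hypothetical calibration values:

```python
import numpy as np

def pixel_to_xyz(u, v, disparity, fx, fy, cx, cy, baseline):
    """Triangulate a rectified stereo match into camera coordinates."""
    Z = fx * baseline / disparity  # depth from disparity
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

# Hypothetical calibration: 800 px focal length, 60 mm baseline.
print(pixel_to_xyz(u=420, v=300, disparity=48.0, fx=800.0, fy=800.0,
                   cx=320.0, cy=240.0, baseline=60.0))
# -> [125., 75., 1000.] in mm, inside the 400-1000 mm working range above
```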

31 pages, 10819 KB  
Article
Research on High-Precision Localization Method of Curved Surface Feature Points Based on RGB-D Data Fusion
by Enguo Wang, Rui Zou and Chengzhi Su
Sensors 2026, 26(1), 137; https://doi.org/10.3390/s26010137 - 25 Dec 2025
Viewed by 274
Abstract
Although RGB images contain rich details, they lack 3D depth information. Depth data, while providing spatial positioning, is often affected by noise and suffers from sparsity or missing data at key feature points, leading to low accuracy and high computational complexity in traditional visual localization. To address these issues, this paper proposes a high-precision, sub-pixel-level localization method for workpiece feature points based on RGB-D data fusion. The method specifically targets two types of localization objects: planar corner keypoints and sharp-corner keypoints. It employs the YOLOv10 model combined with a Background Misdetection Filtering Module (BMFM) to classify and identify feature points in RGB images. An improved Prewitt operator (using 5 × 5 convolution kernels in 8 directions) and sub-pixel refinement techniques are utilized to enhance 2D localization accuracy. The 2D feature boundaries are then mapped into 3D point cloud space based on camera extrinsic parameters. After coarse error detection in the point cloud and local quadric surface fitting, 3D localization is achieved by intersecting spatial rays with the fitted surfaces. Experimental results demonstrate that the proposed method achieves a mean absolute error (MAE) of 0.17 mm for localizing flat, free-form, and grooved components, with a maximum error of less than 0.22 mm, meeting the requirements of high-precision industrial applications such as precision manufacturing and quality inspection.
(This article belongs to the Section Navigation and Positioning)
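The abstract's improved Prewitt operator uses 5 × 5 kernels in 8 directions but does not give the coefficients, so the sketch below builds plausible directional difference kernels from signed projections and keeps the strongest response per pixel; treat it as an assumption-laden stand-in rather than the paper's operator.

```python
import numpy as np
from scipy.ndimage import convolve

def directional_edge_response(img, size=5):
    """Max response over eight 5x5 directional difference kernels."""
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]             # pixel offsets
    responses = []
    for theta in np.arange(0, 2 * np.pi, np.pi / 4):  # 8 directions, 45 deg apart
        proj = np.cos(theta) * xx + np.sin(theta) * yy
        kernel = np.sign(np.round(proj, 6))           # +1 / 0 / -1 plate kernel
        responses.append(convolve(img.astype(float), kernel, mode="nearest"))
    return np.max(np.stack(responses), axis=0)

img = np.zeros((32, 32)); img[:, 16:] = 1.0           # vertical step edge
resp = directional_edge_response(img)
print(resp.max(), resp[:, :10].max())                 # strong at the edge, 0 far from it
```

Sub-pixel refinement would then fit a parabola (or similar) to the response profile around each maximum, which is how such pipelines typically push corner localization below one pixel.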

29 pages, 5902 KB  
Article
MSLCP-DETR: A Multi-Scale Linear Attention and Sparse Fusion Framework for Infrared Small Target Detection in Vehicle-Mounted Systems
by Fu Li, Meimei Zhu, Ming Zhao, Yuxin Sun and Wangyu Wu
Mathematics 2026, 14(1), 67; https://doi.org/10.3390/math14010067 - 24 Dec 2025
Viewed by 224
Abstract
Detecting small infrared targets in vehicle-mounted systems remains challenging due to weak thermal radiation, cross-scale feature loss, and dynamic background interference. To address these issues, this paper proposes MSLCP-DETR, an enhanced RT-DETR-based framework that integrates multi-scale linear attention and sparse fusion mechanisms. The model introduces three novel components: a Multi-Scale Linear Attention Encoder (MSLA-AIFI), which combines multi-branch depth-wise convolution with linear attention to efficiently capture cross-scale features while reducing computational complexity; a Cross-Scale Small Object Feature Optimization module (CSOFO), which enhances the localization of small targets in dense scenes through spatial rearrangement and dynamic modeling; and a Pyramid Sparse Transformer (PST), which replaces traditional dense fusion with a dual-branch sparse attention mechanism to improve both accuracy and real-time performance. Extensive experiments on the M3FD and FLIR datasets demonstrate that MSLCP-DETR achieves an excellent balance between accuracy and efficiency, with its precision, mAP@50, and mAP@50:95 reaching 90.3%, 79.5%, and 86.0%, respectively. Ablation studies and visual analysis further validate the effectiveness of the proposed modules and the overall design strategy.
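The MSLA-AIFI encoder builds on linear attention, whose defining identity is worth sketching: with the kernel feature map phi(x) = elu(x) + 1, the aggregate K^T V is computed once, so the cost drops from O(N^2 d) for softmax attention to O(N d^2). This is the generic formulation (after Katharopoulos et al.), not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Linearized attention: phi(Q) (phi(K)^T V) with row normalization."""
    q, k = F.elu(q) + 1, F.elu(k) + 1        # positive feature maps, (B, N, d)
    kv = torch.einsum("bnd,bne->bde", k, v)  # aggregate K^T V once: (B, d, e)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

q = k = v = torch.randn(2, 196, 64)          # 14x14 token grid, 64-dim
print(linear_attention(q, k, v).shape)       # torch.Size([2, 196, 64])
```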

13 pages, 3595 KB  
Article
Study on the Application of Machine Learning of Melt Pool Geometries in Silicon Steel Fabricated by Powder Bed Fusion
by Ho Sung Jang, Sujeong Kim, Jong Bae Jeon, Donghwi Kim, Yoon Suk Choi and Sunmi Shin
Materials 2026, 19(1), 68; https://doi.org/10.3390/ma19010068 - 24 Dec 2025
Viewed by 465
Abstract
In this study, regression-based machine learning models were developed to predict the melt pool width and depth formed during the Laser Powder Bed Fusion (LPBF) process for Fe-3.4Si and Fe-6Si alloys. Based on experimentally obtained melt pool width and depth data, a total of 11 regression models were trained and evaluated, and hyperparameters were optimized via Bayesian optimization. Key process parameters were identified through data preprocessing and feature engineering, and SHAP analysis confirmed that the input energy had the strongest influence on both melt pool width and depth. The comparison of prediction performance revealed that the support vector regressor with a linear kernel (SVR_lin) exhibited the best performance for predicting melt pool width, while the multilayer perceptron (MLP) model achieved the best results for predicting melt pool depth. Based on these trained models, a power–velocity (P-V) process map was constructed, incorporating boundary conditions such as the overlap ratio and the melt pool morphology. The optimal input energy range was derived as 0.45 to 0.60 J/mm, ensuring stable melt pool formation. Specimens manufactured under the derived conditions were analyzed using 3D X-ray CT, revealing porosity levels ranging from 0.29% to 2.89%. In particular, the lowest porosity was observed under conduction mode conditions when the melt pool depth was approximately 1.0 to 1.5 times the layer thickness. Conversely, porosity tended to increase in the transition mode and lack of fusion regions, consistent with the model predictions. Therefore, this study demonstrated that a machine learning-based regression model can reliably predict melt pool characteristics in the LPBF process of Fe-Si alloys, contributing to the development of process maps and optimization strategies.
(This article belongs to the Special Issue Intelligent Processing Technology of Materials)
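The modeling setup, regressors mapping process parameters to melt pool geometry, can be reproduced in miniature with scikit-learn. The data below are synthetic stand-ins for the paper's measurements, and Bayesian hyperparameter optimization is omitted; only the two winning model families are shown.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in data: laser power (W) and scan speed (mm/s).
X = rng.uniform([100, 400], [370, 1200], size=(80, 2))
energy = X[:, 0] / X[:, 1]                     # input energy ~ P / v (J/mm)
y = 250.0 * energy + rng.normal(0.0, 5.0, 80)  # toy melt pool width response (um)

models = {
    "SVR_lin": make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 32),
                                      max_iter=5000, random_state=0)),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```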

25 pages, 4823 KB  
Article
Improving Shielding Gas Flow Distribution to Enhance Quality and Consistency in Metal Laser Powder Bed Fusion Processes
by H. Hugo Estrada Medinilla, Christopher J. Elkins, Jorge Mireles, Andres Estrada and Ryan B. Wicker
J. Manuf. Mater. Process. 2026, 10(1), 3; https://doi.org/10.3390/jmmp10010003 - 23 Dec 2025
Viewed by 581
Abstract
Shielding gas flow in metal Laser Powder Bed Fusion (PBF-LB/M) removes ejecta and byproducts from the build plate and the optical path, preventing laser interference and loss of part quality. Previous research conducted on an EOS M290 used Magnetic Resonance Velocimetry (MRV) to resolve the three-component, three-dimensional flow field and identified a region of recirculation below the lower vent. The present work demonstrates the correction of this recirculation through practical chamber modifications: raising the build platform and optical assembly, and redesigning the recoater and the lower inlet to reflect the new build plate position. MRV was leveraged to generate flow distribution maps and velocity profiles of the modified configuration, showing a marked change in the overall flow field. Plate scans across the build area characterized the impact of gas flow improvements on process response. Specimens from the original configuration showed progressively shallower melt pools toward the vent, whereas those from the modified configuration exhibited a ~10% higher average melt pool depth in the region most affected by prior recirculation. Qualification artifacts built under both conditions provided preliminary evidence of improved part performance via enhanced gas flow distribution. These results highlight potential benefits of uniform gas flow distribution across the build plate through simple EOS M290 chamber modifications.
(This article belongs to the Special Issue Progress and Perspectives in Metal Laser Additive Manufacturing)

25 pages, 6176 KB  
Article
Audiovisual Brain Activity Recognition Based on Symmetric Spatio-Temporal–Frequency Feature Association Vectors
by Yang Xi, Lu Zhang, Chenxue Wu, Bingjie Shi and Cunzhen Li
Symmetry 2025, 17(12), 2175; https://doi.org/10.3390/sym17122175 - 17 Dec 2025
Viewed by 233
Abstract
The neural mechanisms of auditory and visual processing are not only a core research focus in cognitive neuroscience but also hold critical importance for the development of brain–computer interfaces, neurological disease diagnosis, and human–computer interaction technologies. However, EEG-based studies on classifying auditory and visual brain activities largely overlook the in-depth utilization of the spatial distribution patterns and frequency-specific characteristics inherent in such activities. This paper proposes an analytical framework that constructs symmetrical spatio-temporal–frequency feature association vectors to represent brain activities by computing EEG microstates across multiple frequency bands and brain functional connectivity networks. We then construct an Adaptive Tensor Fusion Network (ATFN) that leverages the feature association vectors to recognize brain activities related to auditory, visual, and audiovisual processing. The ATFN comprises a feature fusion and selection module based on differential feature enhancement, a feature encoding module enhanced with attention mechanisms, and a multilayer perceptron classifier. The results show that the classification accuracy for auditory, visual, and audiovisual brain activity reaches 96.97% using the ATFN, demonstrating that the proposed symmetric spatio-temporal–frequency feature association vectors effectively characterize visual, auditory, and audiovisual brain activities. These vectors establish a computable mapping that captures the intrinsic correlations among temporal, spatial, and frequency features, offering a more interpretable representation of brain activities. The proposed ATFN provides an effective recognition framework for brain activity, with potential applications in brain–computer interfaces and neurological disease diagnosis.
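As a rough illustration of the recognition pipeline, the sketch below encodes per-band feature association vectors with a self-attention layer and classifies them with an MLP head. Band count, feature dimension, and the three-class output are assumptions; the actual ATFN adds differential feature enhancement and adaptive fusion.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Minimal ATFN-style stand-in: attention encoding + MLP classifier."""
    def __init__(self, n_bands=5, feat_dim=64, n_classes=3):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                                  batch_first=True)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(n_bands * feat_dim, 128),
                                  nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, x):  # x: (B, n_bands, feat_dim) association vectors
        return self.head(self.encoder(x))

logits = FusionClassifier()(torch.randn(8, 5, 64))
print(logits.shape)  # torch.Size([8, 3]): auditory / visual / audiovisual
```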

22 pages, 1188 KB  
Article
EFDepth: A Monocular Depth Estimation Model for Multi-Scale Feature Optimization
by Fengchun Liu, Xinying Shao, Chunying Zhang, Liya Wang, Lu Liu and Jing Ren
Sensors 2025, 25(23), 7379; https://doi.org/10.3390/s25237379 - 4 Dec 2025
Viewed by 595
Abstract
To address the accuracy issues in monocular depth estimation caused by insufficient feature extraction and inadequate context modeling, a multi-scale feature optimization model named EFDepth was proposed to improve prediction performance. This framework adopted an encoder–decoder structure: the encoder (EC-Net) was composed of MobileNetV3-E and ETFBlock, and its features were optimized through multi-scale dilated convolution; the decoder (LapFA-Net) combined the Laplacian pyramid and the FMA module to enhance cross-scale feature fusion and output accurate depth maps. Comparative experiments between EFDepth and algorithms including Lite-mono, Hr-depth, and Lapdepth were conducted on the KITTI dataset. The results show that, for the three error metrics—RMSE (Root Mean Square Error), AbsRel (Absolute Relative Error), and SqRel (Squared Relative Error)—EFDepth is 1.623, 0.030, and 0.445 lower than the average values of the comparison algorithms, respectively, and for the three accuracy metrics, it is 0.052, 0.023, and 0.011 higher than the average values of the comparison algorithms, respectively. Experimental results indicate that EFDepth outperforms the comparison methods in most metrics, providing an effective reference for monocular depth estimation and 3D reconstruction of complex scenes.
(This article belongs to the Section Sensing and Imaging)
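The multi-scale dilated convolution used to optimize the encoder features is a standard construction: parallel 3 x 3 branches with growing dilation enlarge the receptive field at constant resolution. A minimal PyTorch version with assumed dilation rates:

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel dilated 3x3 convolutions fused by a 1x1 conv, as a residual."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 48, 160)
print(MultiScaleDilatedBlock(32)(x).shape)  # spatial size preserved: (1, 32, 48, 160)
```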

20 pages, 26260 KB  
Article
AFMNet: A Dual-Domain Collaborative Network with Frequency Prior Guidance for Low-Light Image Enhancement
by Qianqian An and Long Ma
Entropy 2025, 27(12), 1220; https://doi.org/10.3390/e27121220 - 1 Dec 2025
Viewed by 477
Abstract
Low-light image enhancement (LLIE) degradation arises from insufficient illumination, reflectance occlusion, and noise coupling, and it manifests in the frequency domain as suppressed amplitudes with relatively stable phases. To address the fact that pure spatial mappings struggle to balance brightness enhancement and detail fidelity, whereas pure frequency-domain processing lacks semantic modeling, we propose AFMNet—a dual-domain collaborative enhancement network guided by an information-theoretic frequency prior. This prior regularizes global illumination, while spatial branches restore local details. First, a Multi-Scale Amplitude Estimator (MSAE) adaptively generates fine-grained amplitude-modulation maps via multi-scale fusion, encouraging higher output entropy through adaptive spectral-energy redistribution. Next, a Dual-Branch Spectral–Spatial Attention (DBSSA) module—comprising a Frequency-Modulated Attention Block (FMAB) and a Scale-Variable Depth Attention Block (SVDAB)—is employed: FMAB injects the modulation map as a frequency-domain prior into the attention mechanism to conditionally modulate the amplitude of value features while keeping the phase unchanged, thereby helping to preserve structural information in the enhanced output; SVDAB uses multi-scale depthwise-separable convolutions with scale attention to produce adaptively enhanced spatial features. Finally, a Spectral-Gated Feed-Forward Network (SGFFN) applies learnable spectral filters to local features for band-wise selective enhancement. This collaborative design achieves a favorable balance between illumination correction and detail preservation, and AFMNet delivers state-of-the-art performance on multiple low-light enhancement benchmarks.
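The core frequency-domain operation, scaling Fourier amplitudes while leaving phases untouched, is easy to make concrete. In the sketch below the gain map is just a constant tensor; in AFMNet it would come from a module like the MSAE.

```python
import torch

def modulate_amplitude(x, gain):
    """Scale the Fourier amplitude of x by gain while keeping its phase."""
    spec = torch.fft.rfft2(x, norm="ortho")          # complex spectrum
    amp, phase = spec.abs(), spec.angle()
    spec_mod = (amp * gain) * torch.exp(1j * phase)  # amplitude changed, phase kept
    return torch.fft.irfft2(spec_mod, s=x.shape[-2:], norm="ortho")

x = torch.rand(1, 3, 64, 64) * 0.2                   # a dark image
gain = torch.ones_like(torch.fft.rfft2(x).real) * 1.5
out = modulate_amplitude(x, gain)
print(x.mean().item(), out.mean().item())            # brightness scaled by 1.5
```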

23 pages, 9200 KB  
Article
GC-HG Gaussian Splatting Single-View 3D Reconstruction Method Based on Depth Prior and Pseudo-Triplane
by Hua Gong, Peide Wang, Yuanjing Ma and Yong Zhang
Algorithms 2025, 18(12), 761; https://doi.org/10.3390/a18120761 - 30 Nov 2025
Viewed by 1099
Abstract
3D Gaussian Splatting (3DGS) is a multi-view 3D reconstruction method that relies solely on image loss for supervision, lacking explicit constraints on the geometric consistency of the rendered model. It uses a multi-view, scene-by-scene training paradigm, which limits generalization to unknown scenes under single-view input. To address these issues, this paper proposes Geometric Consistency-High Generalization (GC-HG), a single-view 3DGS reconstruction framework integrating a depth prior and a pseudo-triplane. First, we utilize the VGGT 3D geometry pre-trained model to derive depth priors, back-projecting them into point clouds to construct a dual-modal input alongside the image. Second, we introduce a pseudo-triplane mechanism with a learnable Z-plane token for feature decoupling and pseudo-triplane feature fusion, thereby enhancing geometry perception and consistency. Finally, we integrate a parent–child hierarchical Gaussian renderer into the feed-forward 3DGS framework, combining depth and 3D offsets to model depth and geometry information, while mapping parent and child Gaussians into a linear structure through an MLP. Evaluations on the RealEstate10K dataset show improvements in Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS), demonstrating the method's advantages in geometric consistency modeling and cross-scene generalization for single-view reconstruction.
(This article belongs to the Special Issue Artificial Intelligence in Modeling and Simulation (2nd Edition))
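Back-projecting the VGGT depth prior into a point cloud, the step that builds the second half of the dual-modal input, follows the pinhole camera model; the intrinsics below are hypothetical.

```python
import torch

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    z = depth.flatten()
    x = (u.flatten() - cx) * z / fx
    y = (v.flatten() - cy) * z / fy
    return torch.stack([x, y, z], dim=1)

pts = depth_to_pointcloud(torch.full((240, 320), 2.0),
                          fx=300.0, fy=300.0, cx=160.0, cy=120.0)
print(pts.shape)  # torch.Size([76800, 3])
```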

24 pages, 9828 KB  
Article
A Novel Object Detection Algorithm Combined YOLOv11 with Dual-Encoder Feature Aggregation
by Haisong Chen, Pengfei Yuan, Wenbai Liu, Fuling Li and Aili Wang
Sensors 2025, 25(23), 7270; https://doi.org/10.3390/s25237270 - 28 Nov 2025
Cited by 1 | Viewed by 722
Abstract
To address the limitations of unimodal visual detection in complex scenarios involving low illumination, occlusion, and texture-sparse environments, this paper proposes an improved YOLOv11-based dual-branch RGB-D fusion framework. The symmetric architecture processes RGB images and depth maps in parallel, integrating a Dual-Encoder Cross-Attention (DECA) module for cross-modal feature weighting and a Dual-Encoder Feature Aggregation (DEPA) module for hierarchical fusion—where the RGB branch captures texture semantics while the depth branch extracts geometric priors. To comprehensively validate the effectiveness and generalization capability of the proposed framework, we designed a multi-stage evaluation strategy leveraging complementary benchmark datasets. On the M3FD dataset, the model was evaluated under both RGB-depth and RGB-infrared configurations to verify core fusion performance and extensibility to diverse modalities. Additionally, the VOC2007 dataset was augmented with pseudo-depth maps generated by Depth Anything, assessing adaptability under monocular input constraints. Experimental results demonstrate that our method achieves mAP50 scores of 82.59% on VOC2007 and 81.14% on M3FD in RGB-infrared mode, outperforming the baseline YOLOv11 by 5.06% and 9.15%, respectively. Notably, in the RGB-depth configuration on M3FD, the model attains a mAP50 of 77.37% with precision of 88.91%, highlighting its robustness in geometric-aware detection tasks. Ablation studies confirm the critical roles of the Dynamic Branch Enhancement (DBE) module in adaptive feature calibration and the Dual-Encoder Attention (DEA) mechanism in multi-scale fusion, significantly enhancing detection stability under challenging conditions. With only 2.47M parameters, the framework provides an efficient and scalable solution for high-precision spatial perception in autonomous driving and robotics applications.
(This article belongs to the Section Sensing and Imaging)
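One simple way to realize adaptive weighting between the RGB and depth branches is a learned per-channel gate, sketched below. This is a deliberate simplification for illustration, not the paper's DECA/DEPA modules.

```python
import torch
import torch.nn as nn

class GatedRGBDFusion(nn.Module):
    """Per-channel convex blend of RGB and depth features via a learned gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(2 * channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, 1, 1)
        return w * rgb_feat + (1 - w) * depth_feat

rgb, dep = torch.randn(2, 64, 80, 80), torch.randn(2, 64, 80, 80)
print(GatedRGBDFusion(64)(rgb, dep).shape)  # torch.Size([2, 64, 80, 80])
```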

27 pages, 3424 KB  
Article
Reciprocating Pump Fault Diagnosis Using Enhanced Deep Learning Model with Hybrid Attention Mechanism and Dynamic Temporal Convolutional Networks
by Liming Zhang, Yanlong Xu, Tian Tan, Ling Chen and Xiangyu Guo
Processes 2025, 13(12), 3786; https://doi.org/10.3390/pr13123786 - 24 Nov 2025
Viewed by 430
Abstract
Fault diagnosis is critical for ensuring the reliability of reciprocating pumps in industrial settings. However, challenges such as strong noise interference and unbalanced conditions of existing methods persist. To address these issues, this paper proposes a novel fusion framework integrating a multiple-branch residual module and a hybrid attention module for reciprocating pump fault diagnosis. The framework introduces a multiple-branch residual module with parallel depth-wise separable convolution, dilated convolution, and direct mapping paths to capture complementary features across different scales. A hybrid attention module is designed to achieve adaptive fusion of channel and spatial attention information while reducing computational overhead through learnable gate mechanisms. Experimental validation on the reciprocating pump dataset demonstrates that the proposed framework outperforms existing methods, achieving an average diagnostic accuracy exceeding 98% even in low signal-to-noise ratio (SNR = −3 dB) environments. This research provides a new perspective for mechanical fault diagnosis, offering significant advantages in diagnostic accuracy, robustness, and industrial applicability.
(This article belongs to the Section Process Control and Monitoring)
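The multiple-branch residual idea, parallel depth-wise separable, dilated, and direct-mapping paths summed as a residual, fits in a few lines of PyTorch; kernel sizes and channel counts here are assumptions.

```python
import torch
import torch.nn as nn

class MultiBranchResidual(nn.Module):
    """Depth-wise separable + dilated + identity branches on a 1D signal."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.separable = nn.Sequential(
            nn.Conv1d(channels, channels, 9, padding=4, groups=channels),
            nn.Conv1d(channels, channels, 1))
        self.dilated = nn.Conv1d(channels, channels, 3,
                                 padding=dilation, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, C, T) vibration segment
        return self.act(x + self.separable(x) + self.dilated(x))

x = torch.randn(4, 16, 1024)
print(MultiBranchResidual(16)(x).shape)  # torch.Size([4, 16, 1024])
```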

18 pages, 11993 KB  
Article
Spatiotemporal Coupling Analysis of Street Vitality and Built Environment: A Multisource Data-Driven Dynamic Assessment Model
by Caijian Hua, Wei Lv and Yan Zhang
Sustainability 2025, 17(21), 9517; https://doi.org/10.3390/su17219517 - 26 Oct 2025
Viewed by 638
Abstract
To overcome the limited accuracy of existing street vitality assessments under dense occlusion and their lack of dynamic, multi-source data fusion, this study proposes an integrated dynamic model that couples an enhanced YOLOv11 with heterogeneous spatiotemporal datasets. The network introduces a two-backbone architecture for stronger multi-scale fusion, Spatial Pyramid Depth Convolution (SPDConv) for richer urban scene features, and Dynamic Sparse Sampling (DySample) for robust occlusion handling. Validated in Yibin, the model achieves 90.4% precision, 67.3% recall, and 77.2% mAP@50, gains of 6.5%, 5.3%, and 5.1% over the baseline. By fusing Baidu heatmaps, street-view imagery, road networks, and POI data, a spatial coupling framework quantifies the interplay between commercial facilities and street vitality, enabling dynamic, multi-source assessment of urban activity and offering insights for targeted retail regulation and adaptive traffic management. By enabling continuous monitoring of urban space use, the model improves the allocation of public resources and cuts energy waste from idle traffic, thereby advancing urban sustainability via improved commercial planning and responsive traffic control. The work provides a methodological foundation for shifting urban resource allocation from static planning to dynamic, responsive systems.
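If SPDConv follows the common space-to-depth convolution pattern, which is an assumption since the abstract does not spell the layer out, it downsamples losslessly by rearranging 2 x 2 pixel blocks into channels before convolving, which helps retain small or heavily occluded objects:

```python
import torch
import torch.nn as nn

class SpaceToDepthConv(nn.Module):
    """Lossless 2x downsampling (pixel unshuffle) followed by a 3x3 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(2)  # (B, C, H, W) -> (B, 4C, H/2, W/2)
        self.conv = nn.Conv2d(4 * in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        return self.conv(self.unshuffle(x))

x = torch.randn(1, 32, 128, 128)
print(SpaceToDepthConv(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```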

37 pages, 14970 KB  
Article
Research on Strawberry Visual Recognition and 3D Localization Based on Lightweight RAFS-YOLO and RGB-D Camera
by Kaixuan Li, Xinyuan Wei, Qiang Wang and Wuping Zhang
Agriculture 2025, 15(21), 2212; https://doi.org/10.3390/agriculture15212212 - 24 Oct 2025
Viewed by 983
Abstract
Improving the accuracy and real-time performance of strawberry recognition and localization algorithms remains a major challenge in intelligent harvesting. To address this, this study presents an integrated approach for strawberry maturity detection and 3D localization that combines a lightweight deep learning model with an RGB-D camera. Built upon the YOLOv11 framework, an enhanced RAFS-YOLO model is developed, incorporating three core modules to strengthen multi-scale feature fusion and spatial modeling capabilities. Specifically, the CRA module enhances spatial relationship perception through cross-layer attention, the HSFPN module performs hierarchical semantic filtering to suppress redundant features, and the DySample module dynamically optimizes the upsampling process to improve computational efficiency. By integrating the trained model with RGB-D depth data, the method achieves precise 3D localization of strawberries through coordinate mapping based on detection box centers. Experimental results indicate that RAFS-YOLO surpasses YOLOv11n, improving precision, recall, and mAP@50 by 4.2%, 3.8%, and 2.0%, respectively, while reducing parameters by 36.8% and computational cost by 23.8%. The 3D localization attains millimeter-level precision, with average RMSE values ranging from 0.21 to 0.31 cm across all axes. Overall, the proposed approach achieves a balance between detection accuracy, model efficiency, and localization precision, providing a reliable perception framework for intelligent strawberry-picking robots.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
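The localization step, mapping a detection box center plus a robust depth readout to camera coordinates, is compact enough to show directly; the intrinsics, box, and median-depth choice below are hypothetical.

```python
import numpy as np

def localize_from_box(depth_map, box, fx, fy, cx, cy):
    """Back-project a detection box center to 3D using a median box depth."""
    x0, y0, x1, y1 = box
    u, v = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    patch = depth_map[y0:y1, x0:x1]
    Z = float(np.median(patch[patch > 0]))  # ignore invalid zero-depth pixels
    return (u - cx) * Z / fx, (v - cy) * Z / fy, Z

depth = np.full((480, 640), 0.45, dtype=np.float32)  # toy RGB-D frame, 0.45 m
print(localize_from_box(depth, (300, 200, 340, 250),
                        fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```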