Search Results (63)

Search Parameters:
Keywords = depth upsampling

33 pages, 10391 KB  
Article
Computational Method for Predicting Visual Attention in Older Adults with Age-Related Features
by Xiangdong Li, Xinchi Shi, Haoyu Gu, Tianai Shen, Shiwei Cheng and Jing Wang
Multimodal Technol. Interact. 2026, 10(6), 63; https://doi.org/10.3390/mti10060063 - 1 Jun 2026
Viewed by 246
Abstract
Age-related changes in visual perception alter attentional deployment, yet computational models of visual attention have been validated almost exclusively on younger populations. This limits both the theoretical investigation of age-specific mechanisms and practical applications in age-inclusive design, where researchers depend on specialised eye-tracking equipment to observe such differences. Therefore, we present the Elderly Visual Attention Estimation (EVAE) model, a computational framework that predicts early visual attentional orienting in older adults by combining stimulus-driven image features with age-specific top-down priors. The framework models six dimensions of elderly visual attention from cross-age eye-tracking data: colour brightness sensitivity, centre bias, foreground–background differentiation, depth detection, early attentional prior, and sustained-attention spatial prior. On public datasets, EVAE achieves an AUC-Judd of 0.92, which outperforms existing saliency models and deep learning approaches such as DeepGaze II. The framework is optimised for an input resolution of 128 × 96 pixels, producing fixation probability maps that are upsampled to match the original stimulus resolution for practical interface evaluation. Cross-age validation confirms the model’s specificity, as EVAE predicts attentional behaviour in older adults but does not generalise to younger adults. An ablation study shows that image features and top-down spatial priors each contribute independently to prediction accuracy, and that bottom-up saliency alone cannot account for age-related attentional patterns. Centre bias and early attentional prior are the strongest predictors, indicating that visual ageing involves greater reliance on spatial strategies and compensatory processing. As an alternative to hardware-based eye-tracking, EVAE widens the scope of empirical research into older adults’ visual attention and informs the design of accessible digital interfaces.

23 pages, 2533 KB  
Article
Attention-Enhanced Segmentation for Vegetation and Snow Cover Extraction Supporting Grassland Fire Danger Factor Monitoring
by Weiping Liu, Shuye Chen, Yun Yang and Yili Zheng
Fire 2026, 9(5), 210; https://doi.org/10.3390/fire9050210 - 20 May 2026
Viewed by 417
Abstract
Grassland fire is one of the major disasters threatening regional ecological security. Its occurrence, development, and spread are closely related to the spatial distribution and coverage of surface vegetation and snow cover across grassland areas. Vegetation is the primary combustible fuel source: higher coverage increases fuel load and continuity, thereby directly determining grassland fire danger levels and accelerating fire spread. In contrast, snow cover imposes an indirect regulatory effect on the spatiotemporal pattern of fire danger factors: it lowers surface temperature, raises near-surface humidity, and restricts the germination and growth of herbaceous vegetation in cold seasons, which reduces available combustible materials and weakens regional fire hazard conditions. Therefore, accurately obtaining the coverage status of vegetation (direct combustible fuel factor) and snow cover (indirect fire-regulating factor) in complex grassland scenarios is the essential premise for reliable grassland fire danger monitoring, early warning, disaster prevention and control, and regional ecological management. Complex grassland scenarios (undulating terrain, uneven vegetation growth, large differences in snow depth, and complex lighting conditions) make vegetation and snow-covered areas difficult to extract, blur their boundaries, and lower the accuracy of coverage calculation, all of which constrain precise monitoring of grassland fire danger factors. To address these problems, this study takes near-ground images collected by grassland fire danger factor monitoring stations as the core data source and proposes an improved UNet image segmentation model that extracts vegetation and snow-covered areas precisely and calculates coverage efficiently in complex scenarios. To improve feature extraction and boundary localization accuracy while reducing model parameters and computational overhead, the CBAM-ASPP (Convolutional Block Attention Module–Atrous Spatial Pyramid Pooling) module is integrated at the end of the encoding path. The attention mechanism increases the weight of key features, and the multi-scale receptive field of atrous spatial pyramid pooling strengthens the model’s ability to fuse features of vegetation and snow areas at different scales. A residual attention mechanism is introduced in the upsampling stage to alleviate the vanishing gradient problem, locate the boundaries of vegetation and snow areas more accurately, and reduce segmentation errors. During training, a dynamically weighted hybrid loss function adjusts its weights according to the segmentation difficulty of different sample types, improving segmentation accuracy and generalization ability. Experiments were conducted on a dataset of near-ground images of typical complex grassland scenarios, and the proposed model was evaluated through comparative experiments. In the vegetation segmentation task, the model reaches a mean Intersection over Union (mIoU) of 84.70% and an accuracy of 91.28%, which are 1.48 and 1.58 percentage points higher than those of the standard UNet model, respectively. In the snow segmentation task, it reaches an mIoU of 92.74% and an accuracy of 94.19%, which are 2.39 and 2.36 percentage points higher than those of the standard UNet model, respectively. The model also has 12.85% fewer parameters than the standard UNet, and its overall performance is significantly better than that of mainstream image segmentation models such as FCN, SegNet, and DeepLabv3+. Based on the standardized time-series data retrieved by the optimized segmentation model, this study further constructs a Grassland Fire Risk Index (GFRI) using the Analytic Hierarchy Process (AHP). Pearson correlation analysis confirms that the GFRI has a highly significant positive correlation with historical fire frequency and captures the seasonal dynamics of regional grassland fire occurrence. This integrated framework of intelligent segmentation and fire risk quantification provides a reliable technical solution for grassland fire factor monitoring, dynamic fire risk assessment, early warning systems, and refined regional ecological management.
(This article belongs to the Special Issue Forest Fuel Treatment and Fire Risk Assessment, 2nd Edition)
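The mIoU and accuracy figures quoted in this abstract are standard segmentation metrics. As a minimal sketch of how they are usually computed from a confusion matrix (the function names and the toy two-class masks are illustrative assumptions, not taken from the article):

```python
def confusion_matrix(pred, truth, num_classes):
    """Count pixels; rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious)


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted class equals the true class."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy masks: 0 = background, 1 = vegetation (labels are illustrative).
truth = [[0, 0, 1, 1],
         [0, 1, 1, 1]]
pred = [[0, 0, 0, 1],
        [0, 1, 1, 1]]

m = confusion_matrix(pred, truth, num_classes=2)
print(m)                  # [[3, 0], [1, 4]]
print(mean_iou(m))        # IoU is 3/4 for class 0 and 4/5 for class 1
print(pixel_accuracy(m))  # 7 of 8 pixels are correct
```

Coverage of a class, as used for fire danger monitoring, would then simply be the share of pixels predicted as that class.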

24 pages, 6157 KB  
Article
ACL-Net: A Lane Detection Method Based on Coordinate Attention and Multi-Scale Context Enhancement
by Yunyao Zhu, Siqi Lai, Lin Chai, Ruofan Kang, Man Bai and Hua Yang
Appl. Sci. 2026, 16(10), 5098; https://doi.org/10.3390/app16105098 - 20 May 2026
Viewed by 182
Abstract
Lane detection is a crucial perception task for autonomous driving, but existing methods often struggle with spatial information loss, feature upsampling artifacts, and prediction discontinuities under complex scenarios such as occlusions or poor lighting. To address these limitations, this paper proposes ACL-Net, an end-to-end lane detection network integrating attention mechanisms and context enhancement based on the Cross Layer Refinement Network framework. First, a coordinate attention module is embedded at the output of the backbone network to recalibrate spatial position information and mitigate depth-induced detail loss. Second, the feature pyramid network is reconstructed utilizing a dynamic upsampling operator and an additional bottom-up pathway to prevent edge distortion and preserve fine-grained geometric features. Finally, a lane-aware atrous spatial pyramid pooling module with asymmetric convolutions is designed to aggregate multi-scale global context, effectively reconnecting fragmented lane lines caused by visual occlusions. Extensive experiments on the TuSimple and CULane datasets demonstrate the superiority of the proposed approach. ACL-Net achieves an accuracy of 96.98% on TuSimple and a total F1-measure of 80.34% on CULane, outperforming the baseline Cross Layer Refinement Network while maintaining a real-time inference speed of 61.90 FPS. The results indicate that ACL-Net significantly improves the utilization of geometric features and exhibits enhanced robustness in challenging road conditions, including severe occlusions, nighttime, and large-curvature curves.
(This article belongs to the Section Computing and Artificial Intelligence)

25 pages, 1539 KB  
Article
RFE-YOLO: A Lightweight Receptive Field-Enhanced Network for UAV Imagery Object Detection
by Yimo Peng and Xiangyu Ge
Sensors 2026, 26(9), 2903; https://doi.org/10.3390/s26092903 - 6 May 2026
Viewed by 860
Abstract
Object detection in unmanned aerial vehicle (UAV) remote sensing imagery remains a formidable challenge due to the diminutive scale of targets, complex background clutter, and extreme variability in target morphology. Standard convolutional neural networks typically suffer from irreversible fine-grained information loss during downsampling, as strided operations discard critical spatial details essential for the localization of tiny objects. To address these issues, we propose RFE-YOLO, a lightweight receptive field-enhanced network specifically tailored for high-precision small object detection in UAV scenarios. First, the Cross-Scale Receptive Field Enhancement (CSRE) module is designed to mitigate intrinsic information loss by integrating space-to-depth convolution (SPD-Conv), which preserves spatial details by migrating them into the channel dimension. This module further employs an energy-based adaptive weight generation mechanism to distinguish target signals from environmental noise. Second, we propose the C3k2-Dynamic Inception Mixer Block (C3k2-DIMB), which adaptively captures anisotropic features (such as slender vehicles) via dynamic kernel weighting and multi-shape inception kernels. Third, the Shuffled Upsampling for Resolution Enhancement (SURE) module is introduced to maintain spatial fidelity during resolution recovery, utilizing a channel shuffle mechanism to overcome information isolation. Finally, the Multi-feature Fusion Module (MFM) replaces conventional static concatenation with a dynamic softmax-based competition mechanism, effectively bridging the semantic gap between multi-level features while suppressing background distractors. Experimental results on the VisDrone dataset demonstrate that RFE-YOLO significantly enhances the representation capability for small objects. Specifically, the proposed model achieves a state-of-the-art mAP50 of 42.70%, representing a substantial 9.3% improvement over the baseline YOLO11n. Furthermore, our architecture maintains an exceptionally lightweight profile with only 1.91 M parameters, demonstrating that high-precision detection can be achieved through structural intelligence rather than excessive parameter scaling. This makes RFE-YOLO highly suitable for real-time inference on edge-deployed UAV platforms.
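The space-to-depth step mentioned in this abstract moves each small spatial neighbourhood into the channel dimension, so resolution drops without discarding any values. A rough pure-Python sketch of that rearrangement alone follows; the function name, the output channel ordering, and the block size of 2 are assumptions, and the convolution that SPD-Conv applies afterwards is omitted.

```python
def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) nested list into (C * block**2, H // block, W // block).

    Each output channel holds one of the block*block sub-grids of an input
    channel, so every input value appears exactly once in the output.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for di in range(block):          # row offset inside each block
        for dj in range(block):      # column offset inside each block
            for ch in range(channels):
                out.append([[x[ch][i][j] for j in range(dj, width, block)]
                            for i in range(di, height, block)])
    return out


# One 4x4 channel becomes four 2x2 channels.
feature = [[[0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11],
            [12, 13, 14, 15]]]
packed = space_to_depth(feature, block=2)
print(len(packed), len(packed[0]), len(packed[0][0]))  # 4 2 2
print(packed[0])  # [[0, 2], [8, 10]]
print(packed[3])  # [[5, 7], [13, 15]]
```

In contrast to a stride-2 convolution or pooling, nothing is thrown away here; the following convolution decides which of the rearranged values matter.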

25 pages, 260979 KB  
Article
RDAH-Net: Bridging Relative Depth and Absolute Height for Monocular Height Estimation in Remote Sensing
by Liting Jiang, Feng Wang, Niangang Jiao, Jingxing Zhu, Yuming Xiang and Hongjian You
Remote Sens. 2026, 18(7), 1024; https://doi.org/10.3390/rs18071024 - 29 Mar 2026
Viewed by 600
Abstract
Generating high-precision normalized digital surface models (nDSMs) from a single remote sensing image remains a challenging and ill-posed problem due to the absence of reliable geometric constraints. In this work, we show that monocular depth provides structurally stable cues of local geometry but lacks the global scale and vertical reference required for absolute height recovery. This intrinsic mismatch limits direct depth-to-height regression, particularly when transferring across heterogeneous terrains, land-cover compositions, and imaging conditions. Building on this idea, we propose the Relative Depth–Absolute Height Prediction Network (RDAH-Net), a framework that exploits relative depth as a geometry-aware prior while learning terrain-dependent height mappings from image appearance to absolute height. As the backbone, we employ a lightweight MobileNetV2 enhanced with a Convolutional Block Attention Module (CBAM), and further incorporate a cross-modal bidirectional attention fusion scheme with positional encoding to achieve a deep and effective fusion of image appearance and depth prior cues. Finally, a PixelShuffle-based upsampling strategy is used to sharpen prediction details and mitigate typical upsampling artifacts. Extensive experiments across diverse regions demonstrate that RDAH-Net achieves robust and generalizable height estimation, providing a practical alternative for large-scale mapping and rapid update scenarios.
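PixelShuffle-style upsampling, as named in this abstract, interleaves groups of r² channels into a spatial grid that is r times larger in each direction. The sketch below assumes the channel layout used by common deep learning frameworks, where output[c][h*r + i][w*r + j] = input[c*r*r + i*r + j][h][w]; the function name is hypothetical and this is not the authors' code.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C * r**2, H, W) nested list into (C, H * r, W * r)."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):           # row offset inside each r x r output cell
            for j in range(r):       # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 2x2 channels become one 4x4 channel.
low_res = [[[0, 1], [2, 3]],
           [[4, 5], [6, 7]],
           [[8, 9], [10, 11]],
           [[12, 13], [14, 15]]]
high_res = pixel_shuffle(low_res, r=2)
print(high_res[0][0])  # [0, 4, 1, 5]
print(high_res[0][1])  # [8, 12, 9, 13]
```

Because every output pixel comes from a learned channel rather than from interpolation or overlapping transposed-convolution kernels, this kind of upsampling is often chosen to reduce checkerboard-like artifacts.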

22 pages, 7392 KB  
Article
Recursive Deep Feature Learning for Hyperspectral Image Super-Resolution
by Jiming Liu, Chen Yi and Hehuan Li
Appl. Sci. 2026, 16(2), 1060; https://doi.org/10.3390/app16021060 - 20 Jan 2026
Viewed by 491
Abstract
The advancement of hyperspectral image super-resolution (HSI-SR) has been significantly propelled by deep learning techniques. However, current methods predominantly rely on 2D or 3D convolutional networks, which are inherently local and thus limited in modeling long-range spectral–depth interactions. This work introduces a novel network architecture designed to address this gap through recursive deep feature learning. Our model initiates with 3D convolutions to extract preliminary spectral–spatial features, which are progressively refined via densely connected grouped convolutions. A core innovation is a recursively formulated generalized self-attention mechanism, which captures long-range dependencies across the spectral dimension with linear complexity. To reconstruct fine spatial details across multiple scales, a progressive upsampling strategy is further incorporated. Evaluations on several public benchmarks demonstrate that the proposed approach outperforms existing state-of-the-art methods in both quantitative metrics and visual quality.
(This article belongs to the Special Issue Remote Sensing Image Processing and Application, 2nd Edition)

27 pages, 12605 KB  
Article
YOLOv11n-CGSD: Lightweight Detection of Dairy Cow Body Temperature from Infrared Thermography Images in Complex Barn Environments
by Zhongwei Kang, Hang Song, Hang Xue, Miao Wu, Derui Bao, Chuang Yan, Hang Shi, Jun Hu and Tomas Norton
Agriculture 2026, 16(2), 229; https://doi.org/10.3390/agriculture16020229 - 15 Jan 2026
Viewed by 867
Abstract
Dairy cow body temperature is a key physiological indicator that reflects metabolic level, immune status, and environmental stress responses, and it has been widely used for early disease recognition. Infrared thermography (IRT), a non-contact imaging technique that remotely acquires the surface radiation temperature distribution of animals, is regarded as a powerful alternative to traditional temperature measurement methods. Under practical cowshed conditions, however, IRT images of dairy cows are easily affected by complex background interference and generally suffer from low resolution, poor contrast, indistinct boundaries, weak structural perception, and insufficient texture information, which significantly degrade target detection and temperature extraction performance. To address these issues, a lightweight detection model named YOLOv11n-CGSD is proposed for dairy cow IRT images, aiming to improve the accuracy and robustness of region of interest (ROI) detection and body temperature extraction under complex background conditions. At the architectural level, a C3Ghost lightweight module based on the Ghost concept is first constructed to reduce redundant feature extraction, lowering computational cost while enhancing the network’s ability to preserve fine-grained features during feature propagation. Subsequently, a space-to-depth convolution module is introduced to perform spatial rearrangement of feature maps and achieve channel compression via non-strided convolution, thereby improving the sensitivity of the model to local temperature variations and structural details. Finally, a dynamic sampling mechanism is embedded in the neck of the network, where the upsampling and scale alignment processes are adaptively driven by feature content, enhancing the model’s response to boundary temperature changes and weak-texture regions. Experimental results indicate that the YOLOv11n-CGSD model can effectively shift attention from irrelevant background regions to ROI contour boundaries and increase attention coverage within the ROI. Under complex IRT conditions, the model achieves precision (P), recall (R), and mAP50 values of 89.11%, 86.80%, and 91.94%, which represent improvements of 3.11%, 5.14%, and 4.08%, respectively, compared with the baseline model. Using Tmax as the temperature extraction parameter, the maximum error (Max. Error) and mean error (MAE) in the lower udder (LU) region are reduced by 33.3% and 25.7%, respectively, while in the around-the-anus (AA) region they are reduced by 87.5% and 95.0%, respectively. These findings demonstrate that, under complex backgrounds and low-quality IRT imaging conditions, the proposed model achieves lightweight and high-performance detection for both the LU and AA regions and provides a methodological reference and technical support for non-contact body temperature measurement of dairy cows in practical cowshed production environments.
(This article belongs to the Section Farm Animal Production)

22 pages, 92351 KB  
Article
Robust Self-Supervised Monocular Depth Estimation via Intrinsic Albedo-Guided Multi-Task Learning
by Genki Higashiuchi, Tomoyasu Shimada, Xiangbo Kong and Hiroyuki Tomiyama
Appl. Sci. 2026, 16(2), 714; https://doi.org/10.3390/app16020714 - 9 Jan 2026
Viewed by 739
Abstract
Self-supervised monocular depth estimation has demonstrated high practical utility, as it can be trained using a photometric image reconstruction loss between the original image and a reprojected image generated from the estimated depth and relative pose, thereby alleviating the burden of large-scale label creation. However, this photometric image reconstruction loss relies on the Lambertian reflectance assumption. Under non-Lambertian conditions such as specular reflections or strong illumination gradients, pixel values fluctuate depending on the lighting and viewpoint, which often misguides training and leads to large depth errors. To address this issue, we propose a multitask learning framework that integrates albedo estimation as a supervised auxiliary task. The proposed framework is implemented on top of representative self-supervised monocular depth estimation backbones, including Monodepth2 and Lite-Mono, by adopting a multi-head architecture in which the shared encoder–decoder branches at each upsampling block into a Depth Head and an Albedo Head. Furthermore, we apply Intrinsic Image Decomposition to generate albedo images and design an albedo supervision loss that uses these albedo maps as training targets for the Albedo Head. We then integrate this loss term into the overall training objective, explicitly exploiting illumination-invariant albedo components to suppress erroneous learning in reflective regions and areas with strong illumination gradients. Experiments on the ScanNetV2 dataset demonstrate that, for the lightweight backbone Lite-Mono, our method achieves an average reduction of 18.5% over the four standard depth error metrics and consistently improves accuracy metrics, without increasing the number of parameters and FLOPs at inference time.
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)

17 pages, 9683 KB  
Article
Combined Infinity Laplacian and Non-Local Means Models Applied to Depth Map Restoration
by Vanel Lazcano, Mabel Vega-Rojas and Felipe Calderero
Signals 2026, 7(1), 2; https://doi.org/10.3390/signals7010002 - 7 Jan 2026
Viewed by 756
Abstract
Scene depth information is a key component of any robotic mobile application. Range sensors, such as LiDAR, sonar, or radar, capture depth data of a scene. However, the data captured by these sensors frequently presents missing regions or information with a low confidence level. These missing regions can be large, which makes decision-making difficult, for instance, for an autonomous vehicle. Recovering depth data has become a primary activity for computer vision applications. This work proposes and evaluates an interpolation model to infer dense depth maps from a Lab color space reference picture and an incomplete-depth image embedded in a completion pipeline. The complete proposed pipeline comprises convolutional layers and a convex combination of the infinity Laplacian and non-local means model. The proposed model infers dense depth maps by considering depth data and utilizing clues from a color picture of the scene, along with a metric for computing differences between two pixels. The work contributes (i) the convex combination of the two models to interpolate the data, and (ii) the proposal of a class of functions suitable for balancing between different models. The obtained results show that the model outperforms similar models on the KITTI dataset and outperforms our previous implementation on the NYU_v2 dataset, reducing the MSE by 34.86%, 3.35%, and 34.42% for 4×, 8×, 16× upsampling tasks, respectively.
(This article belongs to the Special Issue Recent Development of Signal Detection and Processing)

37 pages, 14970 KB  
Article
Research on Strawberry Visual Recognition and 3D Localization Based on Lightweight RAFS-YOLO and RGB-D Camera
by Kaixuan Li, Xinyuan Wei, Qiang Wang and Wuping Zhang
Agriculture 2025, 15(21), 2212; https://doi.org/10.3390/agriculture15212212 - 24 Oct 2025
Cited by 5 | Viewed by 1650
Abstract
Improving the accuracy and real-time performance of strawberry recognition and localization algorithms remains a major challenge in intelligent harvesting. To address this, this study presents an integrated approach for strawberry maturity detection and 3D localization that combines a lightweight deep learning model with an RGB-D camera. Built upon the YOLOv11 framework, an enhanced RAFS-YOLO model is developed, incorporating three core modules to strengthen multi-scale feature fusion and spatial modeling capabilities. Specifically, the CRA module enhances spatial relationship perception through cross-layer attention, the HSFPN module performs hierarchical semantic filtering to suppress redundant features, and the DySample module dynamically optimizes the upsampling process to improve computational efficiency. By integrating the trained model with RGB-D depth data, the method achieves precise 3D localization of strawberries through coordinate mapping based on detection box centers. Experimental results indicate that RAFS-YOLO surpasses YOLOv11n, improving precision, recall, and mAP@50 by 4.2%, 3.8%, and 2.0%, respectively, while reducing parameters by 36.8% and computational cost by 23.8%. The 3D localization attains millimeter-level precision, with average RMSE values ranging from 0.21 to 0.31 cm across all axes. Overall, the proposed approach achieves a balance between detection accuracy, model efficiency, and localization precision, providing a reliable perception framework for intelligent strawberry-picking robots.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
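The coordinate mapping described in this abstract, from a detection box centre plus an RGB-D depth reading to a 3D position, is commonly done with pinhole back-projection. A minimal sketch, assuming a depth image aligned to the colour image and known camera intrinsics fx, fy, cx, cy; the numeric values are made up for illustration, and the article may use a different procedure.

```python
def box_center(x_min, y_min, x_max, y_max):
    """Pixel coordinates (u, v) of the centre of a detection box."""
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z into camera coordinates.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    The result uses the same length unit as `depth`.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


# Hypothetical intrinsics for a 640x480 colour stream (not from the article).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

u, v = box_center(360, 200, 400, 280)  # centre is (380.0, 240.0)
depth_m = 0.5                          # depth read at (u, v), in metres
point = pixel_to_camera(u, v, depth_m, FX, FY, CX, CY)
print(point)  # roughly 5 cm to the right of the optical axis, 0.5 m away
```

In practice the depth at a single centre pixel can be missing or noisy, so a median over a small window around the centre is a common safeguard.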

19 pages, 49708 KB  
Article
MonoLENS: Monocular Lightweight Efficient Network with Separable Convolutions for Self-Supervised Monocular Depth Estimation
by Genki Higashiuchi, Tomoyasu Shimada, Xiangbo Kong, Haimin Yan and Hiroyuki Tomiyama
Appl. Sci. 2025, 15(19), 10393; https://doi.org/10.3390/app151910393 - 25 Sep 2025
Cited by 3 | Viewed by 1586
Abstract
Self-supervised monocular depth estimation is gaining significant attention because it can learn depth from video without needing expensive ground-truth data. However, many self-supervised models remain too heavy for edge devices, and simply shrinking them tends to degrade accuracy. To address this trade-off, we present MonoLENS, an extension of Lite-Mono. MonoLENS follows a design that reduces computation while preserving geometric fidelity (relative depth relations, boundaries, and planar structures). MonoLENS advances Lite-Mono by suppressing computation on paths with low geometric contribution, focusing compute and attention on layers rich in structural cues, and pruning redundant operations in later stages. Our model incorporates two new modules, the DS-Upsampling Block and the MCACoder, along with a simplified encoder. Specifically, the DS-Upsampling Block uses depthwise separable convolutions throughout the decoder, which greatly lowers floating-point operations (FLOPs). Furthermore, the MCACoder applies Multidimensional Collaborative Attention (MCA) to the output of the second encoder stage, helping to make edge details sharper in high-resolution feature maps. Additionally, we simplified the encoder’s architecture by reducing the number of blocks in its fourth stage from 10 to 4, which resulted in a further reduction of model parameters. When tested on both the KITTI and Cityscapes benchmarks, MonoLENS achieved leading performance. On the KITTI benchmark, MonoLENS reduced the number of model parameters by 42% (1.8M) compared with Lite-Mono, while simultaneously improving the squared relative error by approximately 4.5%.
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
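The saving from depthwise separable convolutions mentioned in this abstract can be checked with simple arithmetic. The sketch below counts weights and multiply-accumulate operations (MACs) for a single layer, ignoring bias terms; the layer sizes are arbitrary illustrative values, not MonoLENS's actual configuration.

```python
def standard_conv_cost(c_in, c_out, k, height, width):
    """Weights and MACs of a k x k convolution with stride 1 and 'same' padding."""
    params = k * k * c_in * c_out
    macs = params * height * width
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, height, width):
    """Weights and MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * height * width
    return params, macs


# Illustrative layer: 64 -> 64 channels, 3x3 kernel, 32x32 feature map.
std_params, std_macs = standard_conv_cost(64, 64, 3, 32, 32)
ds_params, ds_macs = depthwise_separable_cost(64, 64, 3, 32, 32)

print(std_params, ds_params)   # 36864 4672
print(ds_macs / std_macs)      # about 0.127, i.e. 1/c_out + 1/k**2
```

For this layer the separable form needs roughly one eighth of the weights and operations, which is why applying it throughout a decoder lowers FLOPs substantially.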

14 pages, 3062 KB  
Article
Self-Supervised Monocular Depth Estimation Based on Differential Attention
by Ming Zhou, Hancheng Yu, Zhongchen Li and Yupu Zhang
Algorithms 2025, 18(9), 590; https://doi.org/10.3390/a18090590 - 19 Sep 2025
Cited by 1 | Viewed by 1906
Abstract
Depth estimation algorithms are widely applied in various fields, including 3D reconstruction, autonomous driving, and industrial robotics. Monocular self-supervised algorithms for depth prediction offer a cost-effective alternative to acquiring depth through hardware devices such as LiDAR. However, current depth prediction networks, predominantly based on conventional encoder–decoder architectures, often encounter two critical limitations: insufficient feature fusion mechanisms during the upsampling phase and constrained receptive fields. These limitations result in the loss of high-frequency details in the predicted depth maps. To overcome these issues, we introduce differential attention operators to enhance global feature representation and refine locally upsampled features within the depth decoder. Furthermore, we equip the decoder with a deformable bin-structured prediction head; this lightweight design enables per-pixel dynamic aggregation of local depth distributions via adaptive receptive field modulation and deformable sampling, enhancing the decoder’s fine-grained detail processing by capturing local geometry and holistic structures. Experimental results on the KITTI and Make3D datasets demonstrate that our proposed method produces more accurate depth maps with finer details compared to existing approaches.
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))

23 pages, 172516 KB  
Article
YOLOv8n-Pose-DSW: A Precision Picking Point Localization Model for Zucchini in Complex Greenhouse Environments
by Hongxiong Su, Sa Wang, Honglin Su, Fumin Ma, Yanwen Li and Juxia Li
Agriculture 2025, 15(18), 1954; https://doi.org/10.3390/agriculture15181954 - 16 Sep 2025
Cited by 2 | Viewed by 1354
Abstract
Zucchini growth in greenhouse environments presents significant challenges for fruit recognition and picking point localization due to characteristics such as foliage occlusion, high density, structural complexity, and diverse fruit morphologies. Current recognition and localization algorithms exhibit limitations including low accuracy, restricted applicability, and procedural complexity, falling short of the requirements for precise and robust intelligent harvesting. To address these issues, this study constructs a zucchini dataset of 942 images using an Intel RealSense D455 depth camera and a smartphone, and proposes a novel keypoint detection model named YOLOv8n-Pose-DSW. The model introduces three key enhancements compared with YOLOv8n-Pose. First, the conventional upsample operator is replaced with an adaptive point sampling operator called Dysample, improving detection accuracy while reducing GPU memory consumption. Second, a Slim-Neck structure is designed to decrease computational overhead through lightweight bottleneck architecture, while preserving robust feature representation. Third, the WIoU-v3 loss is adopted to optimize bounding box regression for object detection, thereby enhancing localization accuracy. Experimental results demonstrate that YOLOv8n-Pose-DSW achieves a zucchini detection P, R, mAP@50, and mAP@50–95 of 92.1%, 90.7%, 94.0%, and 71.4%, respectively. These metrics represent improvements of 3.3%, 11.7%, 7.4%, and 15.4%, respectively, over the original model. For picking point localization, the improved model attains a P of 93.1%, R of 89.5%, mAP@50 of 95.6%, and mAP@50–95 of 95.2%, corresponding to gains of 8.8%, 11.0%, 11.3%, and 27.9% over the original model. Further error analysis shows that picking point localization errors are concentrated within the 0–4-pixel range, demonstrating enhanced localization precision critical for practical harvesting applications. The proposed algorithm effectively addresses greenhouse environmental challenges and provides essential technical support for intelligent zucchini harvesting systems.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

43 pages, 3753 KB  
Review
Comprehensive Review of Deep Learning Approaches for Single-Image Super-Resolution
by Zirun Liu, Shijie Jiang, Shuhan Feng, Qirui Song and Ji Zhang
Sensors 2025, 25(18), 5768; https://doi.org/10.3390/s25185768 - 16 Sep 2025
Viewed by 3876
Abstract
Single-image super-resolution (SISR) is a core challenge in the field of image processing, aiming to overcome the physical limitations of imaging systems and improve their resolution. This article systematically introduces the SISR method based on deep learning, proposes a method-oriented classification framework, and explores it from three aspects: theoretical basis, technological evolution, and domain-specific applications. Firstly, the basic concepts, development trajectory, and practical value of SISR are introduced. Secondly, in-depth research is conducted on key technical components, including benchmark dataset construction, a multi-scale upsampling strategy, objective function optimization, and quality assessment indicators. Thirdly, some classic SISR model reconstruction results are listed and compared. Finally, the limitations of SISR research are pointed out, and some prospective research directions are proposed. This article provides a systematic knowledge framework for researchers and offers important reference value for the future development of SISR.

28 pages, 19790 KB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Cited by 1 | Viewed by 1735
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments.
(This article belongs to the Section Sensing and Imaging)
