Advances in Computer Vision and Digital Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 December 2026 | Viewed by 9708

Special Issue Information

Dear Colleagues,

Computer vision and digital image processing are vital to intelligent systems, empowering machines to interpret and interact with the visual world. Rapid advances in artificial intelligence, edge computing, and sensor technologies are driving innovation across domains like healthcare, manufacturing, agriculture, security, and autonomous systems. Vision-based applications now match or even surpass human performance in specialized areas such as medical diagnosis and surveillance.

Recent breakthroughs include transformer-based vision models, which leverage attention mechanisms from natural language processing to achieve state-of-the-art results. Self-supervised learning is also reshaping the field by enabling models to learn from unlabeled data, reducing the need for manual annotation. Furthermore, edge computing brings powerful image analysis to resource-constrained devices, supporting real-time applications in smartphones, drones, and more. Additionally, generative AI is pushing boundaries in content synthesis, image augmentation, and reconstruction.

Topics of interest include, but are not limited to, the following:

  • Image and video analysis;
  • Two-dimensional and three-dimensional object detection and recognition;
  • Medical image processing;
  • Deep learning and neural networks for vision;
  • Scene understanding and segmentation;
  • Multimodal data fusion;
  • Real-time and embedded vision systems;
  • Vision-based human–computer interaction;
  • Generative AI for image synthesis and processing;
  • Industrial and biomedical vision applications.

The Special Issue, “Advances in Computer Vision and Digital Image Processing”, aims to bring together cutting-edge research and recent developments in the field of computer vision and image processing. Original research papers are strongly encouraged; review articles and surveys are also welcome. We seek contributions that highlight novel algorithms, efficient methods, and emerging techniques, as well as interdisciplinary approaches that demonstrate the practical impact of vision systems in real-world scenarios.

Prof. Dr. Panayiotis Vlamos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • digital image processing
  • deep learning
  • object detection
  • image segmentation
  • generative AI
  • medical imaging
  • scene understanding
  • multimodal data fusion
  • embedded vision systems
  • human–computer interaction
  • real-time image analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)

Research

25 pages, 11059 KB  
Article
Few-Shot Open-Set Object Detection with a Synthesized Monument Guided by Contrastive Distilled Prompts
by Hao Chen and Ying Chen
Appl. Sci. 2026, 16(7), 3474; https://doi.org/10.3390/app16073474 - 2 Apr 2026
Viewed by 310
Abstract
Few-shot open-set object detection (FS-OSOD) remains challenging in real-world scenarios, where detectors must accurately recognize known objects from few examples while reliably rejecting vast unknown categories. Under this setting, decision boundaries between known and unknown classes are easily distorted by data scarcity and background clutter, leading to severe overfitting on base classes and overconfident misclassification of unknowns. Recent research attempts to alleviate these issues by regularizing detection heads to suppress base-class bias, or by leveraging vision–language priors through open-vocabulary alignment and prompt tuning to enhance semantic transferability. However, these solutions often overlook explicit modeling of truly out-of-set unknowns and the instability of prompt adaptation in low-data regimes, which can cause boundary drift and allow unknown proposals to be absorbed by similar seen classes or even suppressed as background. To alleviate these issues, a guided prompt–monument network (GPMN) is proposed that jointly enhances prompt learning and feature representation learning for FS-OSOD. First, the contrastive distilled prompts (CDP) module employs a teacher–student prompt framework to decouple optimization across base, novel, and unknown classes. This strategy preserves transferability between zero-shot and few-shot settings while enhancing discrimination on base categories. Second, a synthesized monument module (SMM) maintains class-centered memory with momentum-updated prototypes and a non-parametric classifier, which compresses the overlap between seen and unseen distributions and provides a stable rejection margin for unknowns with strong co-occurrence and background noise. Compared with existing head-regularization and open-vocabulary prompt-tuning pipelines, GPMN explicitly targets both base-class bias and seen–unseen overlap at the region level. Extensive experiments on VOC10-5-5 and VOC-COCO benchmarks demonstrate that GPMN consistently improves unknown recall and few-shot mAP over representative FS-OSOD baselines. These results suggest that prompt-level decoupling mitigates base-class bias, whereas memory-anchored regularization enlarges the seen–unseen margin, jointly supporting reliable unknown rejection in scarce-supervision regimes. Full article
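
For readers unfamiliar with the memory mechanism referenced above, the following Python sketch illustrates a momentum-updated class-prototype memory with a non-parametric cosine classifier. It is a minimal, generic illustration of this family of techniques; the class name, momentum value, and unknown-rejection rule are assumptions, not the authors' implementation.

    # Illustrative sketch (not the GPMN/SMM code): EMA class prototypes plus a
    # non-parametric cosine classifier; low maximum similarity can flag "unknown".
    import torch
    import torch.nn.functional as F

    class PrototypeMemory:
        def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
            self.momentum = momentum                       # assumed EMA factor
            self.prototypes = torch.zeros(num_classes, dim)

        @torch.no_grad()
        def update(self, features: torch.Tensor, labels: torch.Tensor):
            """EMA-update each class prototype with the mean of its region features."""
            for c in labels.unique():
                mean_feat = features[labels == c].mean(dim=0)
                self.prototypes[c] = (
                    self.momentum * self.prototypes[c] + (1 - self.momentum) * mean_feat
                )

        def classify(self, features: torch.Tensor) -> torch.Tensor:
            """Cosine similarity to all prototypes; threshold the maximum to reject unknowns."""
            return F.cosine_similarity(
                features.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
            )  # shape (N, num_classes)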

22 pages, 8847 KB  
Article
DGAGaze: Gaze Estimation with Dual-Stream Differential Attention and Geometry-Aware Temporal Alignment
by Wei Zhang and Pengcheng Li
Appl. Sci. 2026, 16(7), 3298; https://doi.org/10.3390/app16073298 - 29 Mar 2026
Viewed by 388
Abstract
Gaze estimation plays a crucial role in human-computer interaction and behavior analysis. However, in dynamic scenes, rigid head movements and rapid gaze shifts pose significant challenges to accurate gaze prediction. Most existing methods either process single-frame images independently or rely on long video sequences, making it difficult to simultaneously achieve strong performance and high computational efficiency. To address this issue, we propose DGAGaze, a gaze estimation framework based on a difference-driven spatiotemporal attention mechanism. This framework uses a geometry-aware temporal alignment module to mitigate interference from rigid head movements, compensating for them through pose estimation and affine feature warping, thereby achieving explicit decoupling between global head motion and local eye motion. Based on the aligned features, inter-frame differences are used to adjust spatial and channel attention weights, enhancing motion-sensitive representations without introducing an additional temporal modeling layer. Extensive experiments on the EyeDiap and Gaze360 datasets demonstrate the effectiveness of the proposed approach. DGAGaze achieves improved gaze estimation accuracy while maintaining a lightweight architecture based on a ResNet-18 backbone, outperforming existing state-of-the-art methods. Full article
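
The difference-driven attention idea summarized above can be illustrated with a short PyTorch sketch in which the inter-frame feature difference re-weights channel and spatial attention. The module name, layer sizes, and gating form are illustrative assumptions, not the DGAGaze architecture.

    # Illustrative sketch: inter-frame difference modulates channel and spatial gates.
    import torch
    import torch.nn as nn

    class DifferenceAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
            self.spatial_gate = nn.Sequential(
                nn.Conv2d(1, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),
            )

        def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
            diff = feat_t - feat_prev                             # motion-sensitive signal
            ca = self.channel_gate(diff)                          # (B, C, 1, 1)
            sa = self.spatial_gate(diff.mean(dim=1, keepdim=True))  # (B, 1, H, W)
            return feat_t * ca * sa                               # re-weighted current features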

24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Viewed by 382
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity, as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and an emotion benchmark (Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
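
As a rough illustration of this kind of augmentation pipeline, the sketch below mixes style-abstracted (e.g., cartoonized) clips into training at a chosen proportion. The cartoonize transform, dataset layout, and sampling rule are placeholders, not the authors' code.

    # Illustrative sketch: replace a fraction of clips with style-abstracted variants.
    import random
    from torch.utils.data import Dataset

    class StyleAbstractedVideos(Dataset):
        def __init__(self, clips, labels, cartoonize, stylized_ratio: float = 0.5):
            self.clips, self.labels = clips, labels
            self.cartoonize = cartoonize          # any style-abstraction transform (assumed)
            self.stylized_ratio = stylized_ratio  # fraction of samples replaced

        def __len__(self):
            return len(self.clips)

        def __getitem__(self, idx):
            clip = self.clips[idx]
            if random.random() < self.stylized_ratio:
                clip = self.cartoonize(clip)      # keep behaviour cues, drop style bias
            return clip, self.labels[idx]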

21 pages, 2720 KB  
Article
Adaptive Multi-Branch Feature Fusion for Low-Light Image Enhancement
by Serdar Çiftçi
Appl. Sci. 2026, 16(6), 2712; https://doi.org/10.3390/app16062712 - 12 Mar 2026
Viewed by 400
Abstract
Low-light image enhancement (LLIE) remains a challenging problem due to spatially varying illumination degradation, compressed tonal distributions, and structural detail loss. This paper presents Adaptive Multi-Branch Feature Fusion (AMBFF), a unified framework that formulates LLIE as a multi-domain representation alignment task. The proposed architecture explicitly models complementary feature domains, including hierarchical spatial context, luminance–chrominance decoupling, edge–texture structures, frequency-domain information, and differentiable tonal histogram representations. A spatially adaptive gating mechanism dynamically weights multi-feature branches through a convex fusion strategy, enabling location-aware illumination correction while preserving structural integrity and color fidelity. Extensive evaluations on widely used benchmark datasets demonstrate that AMBFF consistently outperforms representative conventional and deep learning-based approaches in terms of PSNR, SSIM, and LPIPS. Ablation analyses confirm the complementarity of the proposed feature domains and the robustness benefits of adaptive fusion. Despite its multi-branch design, AMBFF maintains a favorable performance–complexity trade-off, highlighting the effectiveness of structured multi-domain modeling for low-light image enhancement. Full article
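
The spatially adaptive convex fusion described above can be sketched as a per-pixel softmax gate over the feature branches, which guarantees a convex combination at every location. The branch count and gate design below are illustrative assumptions, not the AMBFF implementation.

    # Illustrative sketch: per-pixel softmax weights give a convex mix of K branches.
    import torch
    import torch.nn as nn

    class ConvexBranchFusion(nn.Module):
        def __init__(self, channels: int, num_branches: int):
            super().__init__()
            self.gate = nn.Conv2d(channels * num_branches, num_branches, kernel_size=1)

        def forward(self, branches):                  # list of (B, C, H, W) tensors
            weights = torch.softmax(self.gate(torch.cat(branches, dim=1)), dim=1)
            stacked = torch.stack(branches, dim=1)    # (B, K, C, H, W)
            return (weights.unsqueeze(2) * stacked).sum(dim=1)  # convex per-pixel fusion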

23 pages, 4020 KB  
Article
Structure-Aware Pixel Art Scaling via Block Size Detection
by Jun Won Seo, Jun Won Lee, Jong Hyuck Lee, Jun Beom Kim and Jin-Woo Jung
Appl. Sci. 2026, 16(5), 2314; https://doi.org/10.3390/app16052314 - 27 Feb 2026
Viewed by 458
Abstract
Standard interpolation methods degrade pixel art through blurring or geometric distortion. We propose a lossless scaling algorithm that detects the intrinsic block size to normalize the image grid, thereby expanding the set of valid scaling factors beyond standard integer multiples. This approach enables precise, distortion-free resizing closer to user-specified scales. To validate this approach, we introduce a novel evaluation framework consisting of Color Loss (CL), Block Size Consistency (BSC), and reversibility (REV) tests. Experimental results demonstrate that the proposed method maintains the original palette and grid structure without introducing interpolation artifacts. Furthermore, the reversibility tests confirm that the scaling process remains mathematically lossless, ensuring the genre’s structural and chromatic integrity. Full article
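
One plausible reading of the block-size detection step is to take the greatest common divisor of same-colour run lengths along rows and columns and then resize by pixel replication only. The sketch below follows that reading, assuming an (H, W, 3) RGB array, and is not the paper's algorithm.

    # Illustrative sketch: GCD of run lengths gives the intrinsic block size; resizing
    # then samples one pixel per block and replicates it, avoiding interpolation.
    from math import gcd
    from functools import reduce
    import numpy as np

    def run_lengths(line: np.ndarray):
        """Lengths of consecutive runs of identical colours along one row/column."""
        change = np.flatnonzero(np.any(line[1:] != line[:-1], axis=-1)) + 1
        bounds = np.concatenate(([0], change, [len(line)]))
        return np.diff(bounds)

    def detect_block_size(img: np.ndarray) -> int:
        runs = []
        for row in img:                       # rows of an (H, W, 3) image
            runs.extend(run_lengths(row))
        for col in img.transpose(1, 0, 2):    # columns
            runs.extend(run_lengths(col))
        return reduce(gcd, runs)

    def rescale(img: np.ndarray, target_block: int) -> np.ndarray:
        b = detect_block_size(img)
        base = img[::b, ::b]                  # one sample per block (lossless)
        return np.repeat(np.repeat(base, target_block, axis=0), target_block, axis=1)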

20 pages, 6322 KB  
Article
Automated Procedure for Centre Localization, Noise Removal, and Background Suppression in Two-Dimensional X-Ray Diffraction Patterns
by Massimo Ladisa
Appl. Sci. 2026, 16(4), 1776; https://doi.org/10.3390/app16041776 - 11 Feb 2026
Viewed by 342
Abstract
In this paper, we present a comprehensive and automated methodology for processing two-dimensional X-ray diffraction (2D-XRD) patterns. The proposed workflow involves three sequential stages: (i) precise localization of the diffraction center, (ii) removal of high-frequency noise, and (iii) suppression of non-physical background signals. This method enables improved data quality for subsequent quantitative analysis such as radial integration, phase identification, and structural refinement. Application to experimental datasets from both the Synchrotron Radiation Facility and a table-top X-ray diffractometer demonstrates the method’s robustness, accuracy, and computational efficiency. Full article
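
As a point of reference for the first two stages, the sketch below locates a rough pattern centre as the intensity-weighted centroid of the brightest pixels and removes high-frequency noise with a median filter. It is a generic baseline under those assumptions, not the automated procedure proposed in the paper.

    # Illustrative baseline for centre localization and denoising of a 2D-XRD pattern.
    import numpy as np
    from scipy.ndimage import median_filter

    def locate_centre(pattern: np.ndarray, quantile: float = 0.99):
        """Intensity-weighted centroid of the brightest pixels (rough beam centre)."""
        mask = pattern >= np.quantile(pattern, quantile)
        ys, xs = np.nonzero(mask)
        w = pattern[ys, xs].astype(float)
        return np.average(ys, weights=w), np.average(xs, weights=w)

    def denoise(pattern: np.ndarray, size: int = 3) -> np.ndarray:
        """Median filter suppresses isolated high-frequency (e.g., hot-pixel) noise."""
        return median_filter(pattern, size=size)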

23 pages, 3059 KB  
Article
Research on Ship Target Detection in Complex Sea Surface Scenarios Based on Improved YOLOv7
by Zhuang Cai and Weina Zhou
Appl. Sci. 2026, 16(4), 1769; https://doi.org/10.3390/app16041769 - 11 Feb 2026
Viewed by 421
Abstract
Ship target detection plays a crucial role in safeguarding maritime transportation. However, affected by factors such as ocean waves, extreme weather, and target diversity (e.g., large size differences, arbitrary rotation, and occlusion), existing deep learning-based detection methods struggle to achieve a satisfactory balance among accuracy, speed, and model size in complex marine environments. To address this challenge, this paper proposes a real-time ship detection algorithm (C-YOLO) integrating global perception and multi-scale feature enhancement. First, a Transformer encoder is added before the detection head, which suppresses interference from sea clutter and cloud mist occlusion through long-range dependency modeling, improving the detection of small and occluded ships. Second, a Dual-Effect Focused Residual Fusion Module is designed to replace the backbone’s multi-scale pooling structure, combining the advantages of CBAM (background noise suppression) and SK-Net (dynamic scale adaptation) to simultaneously capture features of ships of different sizes. Finally, a CZIoU loss function is proposed, which integrates constraints on angle, center point, vertex, and area to address rotation, deformation, and multi-scale issues in ship detection. Experimental results on the SeaShips 7000 dataset show that the proposed C-YOLO achieves a Recall of 0.842, mAP@50 of 0.797, and mAP@50:95 of 0.552, outperforming mainstream algorithms such as YOLOv7 (Recall = 0.785, mAP@50 = 0.781), YOLOv9s (Recall = 0.819, mAP@50 = 0.755), and SSD (Recall = 0.802, mAP@50 = 0.833). With 76.75 M parameters and an inference speed of 119 FPS, the model maintains efficient real-time performance while ensuring detection accuracy. This method effectively reduces false detection and missed detection rates in complex scenarios such as port monitoring and maritime traffic control, providing a reliable technical solution for intelligent maritime surveillance and safe navigation, with significant practical value for improving maritime transportation efficiency and reducing safety risks. Full article
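
The CZIoU formulation itself is not given in the abstract; as background, the sketch below shows the standard distance-IoU (DIoU) construction from which losses of this kind extend, assuming axis-aligned boxes in (x1, y1, x2, y2) format.

    # Illustrative DIoU loss (not CZIoU): 1 - IoU plus a normalised centre-distance penalty.
    import torch

    def diou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """pred, target: (N, 4) boxes as (x1, y1, x2, y2); returns the mean DIoU loss."""
        # intersection and union
        ix1 = torch.max(pred[:, 0], target[:, 0])
        iy1 = torch.max(pred[:, 1], target[:, 1])
        ix2 = torch.min(pred[:, 2], target[:, 2])
        iy2 = torch.min(pred[:, 3], target[:, 3])
        inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + 1e-7)
        # squared centre distance, normalised by the enclosing-box diagonal
        cp = (pred[:, :2] + pred[:, 2:]) / 2
        ct = (target[:, :2] + target[:, 2:]) / 2
        ex1 = torch.min(pred[:, 0], target[:, 0])
        ey1 = torch.min(pred[:, 1], target[:, 1])
        ex2 = torch.max(pred[:, 2], target[:, 2])
        ey2 = torch.max(pred[:, 3], target[:, 3])
        diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7
        dist = ((cp - ct) ** 2).sum(dim=1)
        return (1 - iou + dist / diag).mean()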

17 pages, 7122 KB  
Article
Feature Enhancement Method for RGB-D Image Through Convolution over Plane Residuals and Plane Parameters
by Dong-seok Lee and Soon-kak Kwon
Appl. Sci. 2026, 16(2), 1036; https://doi.org/10.3390/app16021036 - 20 Jan 2026
Viewed by 268
Abstract
We propose a feature enhancement method for depth images that applies convolutions over residuals and parameters obtained from the dominant plane. The proposed method can obtain initial features of depth images that are less sensitive to surface orientation and more representative of intrinsic geometric properties. Specifically, the features are obtained through a plane-based convolution that operates on residuals with respect to the dominant plane within a local patch of the depth image. For each patch, a dominant plane is fitted to the corresponding depth pixel values using a least-squares method. Then, convolutional operations are performed on plane residuals computed between the original depth values and the corresponding depth values on the dominant plane. In addition, standard convolution is applied to the dominant plane parameters to capture local variations and spatial consistency of surface orientation. A plane-based convolution module incorporating these convolutions is attached to the initial layer of the existing feature extractor in parallel to supplementarily obtain surface geometric features. Experimental results demonstrate that the proposed method consistently achieves performance gains on both segmentation and classification tasks. Full article
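
The per-patch step described above can be sketched directly: fit a dominant plane z = a·x + b·y + c to a depth patch by least squares and compute the residuals that the plane-based convolution then operates on. Patch handling and naming are illustrative assumptions.

    # Illustrative sketch: least-squares plane fit per depth patch and its residual map.
    import numpy as np

    def plane_residuals(depth_patch: np.ndarray):
        """Return (plane parameters (a, b, c), residual map) for one depth patch."""
        h, w = depth_patch.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
        z = depth_patch.ravel()
        params, *_ = np.linalg.lstsq(A, z, rcond=None)    # dominant plane (a, b, c)
        residuals = (z - A @ params).reshape(h, w)        # input to the residual convolution
        return params, residuals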

33 pages, 5188 KB  
Article
Geometric Feature Enhancement for Robust Facial Landmark Detection in Makeup Paper Templates
by Cheng Chang, Yong-Yi Fanjiang and Chi-Huang Hung
Appl. Sci. 2026, 16(2), 977; https://doi.org/10.3390/app16020977 - 18 Jan 2026
Viewed by 851
Abstract
Traditional scoring of makeup face templates in beauty skill assessments heavily relies on manual judgment, leading to inconsistencies and subjective bias. Hand-drawn templates often exhibit proportion distortions, asymmetry, and occlusions that reduce the accuracy of conventional facial landmark detection algorithms. This study proposes a novel approach that integrates Geometric Feature Enhancement (GFE) with Dlib’s 68-landmark detection to improve the robustness and precision of landmark localization. A comprehensive comparison among Haar Cascade, MTCNN-MobileNetV2, and Dlib was conducted using a curated dataset of 11,600 hand-drawn facial templates. The proposed GFE-enhanced Dlib achieved 60.5% accuracy—outperforming MTCNN (23.4%) and Haar (20.3%) by approximately 37 percentage points, with precision and F1-score improvements exceeding 20% and 25%, respectively. The results demonstrate that the proposed method significantly enhances detection accuracy and scoring consistency, providing a reliable framework for automated beauty skill evaluation, and laying a solid foundation for future applications such as digital archiving and style-guided synthesis. Full article
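
For context, the sketch below shows standard usage of Dlib's 68-point landmark predictor, which the study builds on; the Geometric Feature Enhancement step itself is not specified in the abstract and is therefore omitted here. The model file path is the conventional one and is assumed to be available locally.

    # Illustrative sketch of Dlib's 68-landmark detection (GFE preprocessing not shown).
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_landmarks(gray: np.ndarray):
        """Return a (68, 2) array of landmark coordinates for the first detected face."""
        faces = detector(gray, 1)            # 1 = upsample once to find smaller faces
        if not faces:
            return None
        shape = predictor(gray, faces[0])
        return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])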

26 pages, 23681 KB  
Article
Semantic-Guided Spatial and Temporal Fusion Framework for Enhancing Monocular Video Depth Estimation
by Hyunsu Kim, Yeongseop Lee, Hyunseong Ko, Junho Jeong and Yunsik Son
Appl. Sci. 2026, 16(1), 212; https://doi.org/10.3390/app16010212 - 24 Dec 2025
Viewed by 1110
Abstract
Despite advancements in deep learning-based Monocular Depth Estimation (MDE), applying these models to video sequences remains challenging due to geometric ambiguities in texture-less regions and temporal instability caused by independent per-frame inference. To address these limitations, we propose STF-Depth, a novel post-processing framework that enhances depth quality by logically fusing heterogeneous information—geometric, semantic, and panoptic—without requiring additional retraining. Our approach introduces a robust RANSAC-based Vanishing Point Estimation to guide Dynamic Depth Gradient Correction for background separation, alongside Adaptive Instance Re-ordering to clarify occlusion relationships. Experimental results on the KITTI, NYU Depth V2, and TartanAir datasets demonstrate that STF-Depth functions as a universal plug-and-play module. Notably, it achieved a 25.7% reduction in Absolute Relative error (AbsRel) and significantly enhanced temporal consistency compared to state-of-the-art backbone models. These findings confirm the framework’s practicality for real-world applications requiring geometric precision and video stability, such as autonomous driving, robotics, and augmented reality (AR). Full article
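
The RANSAC-based vanishing point stage mentioned above can be sketched as follows: intersect random pairs of detected lines in homogeneous coordinates and keep the hypothesis with the most inliers. Line detection, the distance threshold, and the scoring rule are assumptions; this is not the STF-Depth implementation.

    # Illustrative RANSAC vanishing point estimation from homogeneous line coefficients.
    import numpy as np

    def ransac_vanishing_point(lines, iters: int = 500, thresh: float = 2.0, seed: int = 0):
        """lines: (N, 3) array of (a, b, c) with a*x + b*y + c = 0, normalised so a**2 + b**2 = 1."""
        rng = np.random.default_rng(seed)
        best_vp, best_count = None, -1
        for _ in range(iters):
            i, j = rng.choice(len(lines), size=2, replace=False)
            vp = np.cross(lines[i], lines[j])       # intersection of the two lines
            if abs(vp[2]) < 1e-9:
                continue                            # near-parallel pair, no finite intersection
            vp = vp / vp[2]
            dists = np.abs(lines @ vp)              # point-line distances (a, b normalised)
            count = int((dists < thresh).sum())
            if count > best_count:
                best_vp, best_count = vp[:2], count
        return best_vp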

26 pages, 5101 KB  
Article
Cross-Modal Adaptive Fusion and Multi-Scale Aggregation Network for RGB-T Crowd Density Estimation and Counting
by Jian Liu, Zuodong Niu, Yufan Zhang and Lin Tang
Appl. Sci. 2026, 16(1), 161; https://doi.org/10.3390/app16010161 - 23 Dec 2025
Viewed by 678
Abstract
Crowd counting is a significant task in computer vision. By combining the rich texture information from RGB images with the insensitivity to illumination changes offered by thermal imaging, the applicability of models in real-world complex scenarios can be enhanced. Current research on RGB-T crowd counting primarily focuses on feature fusion strategies, multi-scale structures, and the exploration of novel network architectures such as Vision Transformer and Mamba. However, existing approaches face two key challenges: limited robustness to illumination shifts and insufficient handling of scale discrepancies. To address these challenges, this study develops a robust RGB-T crowd counting framework that remains stable under illumination shifts by introducing two key innovations beyond existing fusion and multi-scale approaches: (1) a cross-modal adaptive fusion module (CMAFM) that actively evaluates and fuses reliable cross-modal features under varying scenarios by simulating a dynamic feature selection and trust allocation mechanism; and (2) a multi-scale aggregation module (MSAM) that unifies features with different receptive fields to an intermediate scale and performs weighted fusion to enhance modeling capability for cross-modal scale variations. The proposed method achieves relative improvements of 1.57% in GAME(0) and 0.78% in RMSE on the DroneRGBT dataset compared to existing methods, and improvements of 2.48% and 1.59% on the RGBT-CC dataset, respectively. It also demonstrates higher stability and robustness under varying lighting conditions. This research provides an effective solution for building stable and reliable all-weather crowd counting systems, with significant application prospects in smart city security and management. Full article
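
For reference, the GAME(L) metric reported above is the Grid Average Mean absolute Error: each image is divided into 4^L cells and per-cell absolute count errors are summed, so GAME(0) reduces to the ordinary MAE. A minimal sketch, assuming per-image (H, W) density maps:

    # Illustrative GAME(L) computation for a single image's predicted and ground-truth density maps.
    import numpy as np

    def game(pred: np.ndarray, gt: np.ndarray, level: int) -> float:
        """Sum of absolute count errors over a 2**level x 2**level grid (GAME(0) == MAE)."""
        n = 2 ** level
        h, w = pred.shape
        err = 0.0
        for r in range(n):
            for c in range(n):
                ys = slice(r * h // n, (r + 1) * h // n)
                xs = slice(c * w // n, (c + 1) * w // n)
                err += abs(pred[ys, xs].sum() - gt[ys, xs].sum())
        return err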

14 pages, 11858 KB  
Article
Few-Shot Fine-Grained Image Classification with Residual Reconstruction Network Based on Feature Enhancement
by Ying Liu, Haibin Zhang and Weidong Zhang
Appl. Sci. 2025, 15(18), 9953; https://doi.org/10.3390/app15189953 - 11 Sep 2025
Cited by 1 | Viewed by 1638
Abstract
In recent years, few-shot fine-grained image classification has shown great potential in addressing data scarcity and distinguishing highly similar categories. However, existing unidirectional reconstruction methods, while enhancing inter-class differences, fail to effectively suppress intra-class variations; bidirectional reconstruction methods, although alleviating intra-class variations, inevitably introduce background noise. To overcome these limitations, this paper proposes a Bidirectional Feature Reconstruction Network that incorporates a Feature Enhancement Attention Module (FEAM) to highlight discriminative regions and suppress background interference, while integrating a Channel-Aware Spatial Attention (CASA) module to strengthen local feature modeling and compensate for the Transformer’s tendency to overemphasize global information. This joint design not only enhances inter-class separability but also effectively reduces intra-class variation. Extensive experiments on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches, validating its effectiveness and robustness in few-shot fine-grained image classification. Full article
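
The feature-reconstruction idea underlying this family of few-shot methods can be sketched with a closed-form ridge regression that rebuilds query features from the support pool (and, for the bidirectional variant, support features from the query). The regularizer value and tensor shapes below are assumptions, not the paper's configuration.

    # Illustrative sketch: ridge-regression reconstruction of query features from support features;
    # the reconstruction error typically drives the few-shot classification decision.
    import torch

    def reconstruct(query: torch.Tensor, support: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
        """query: (Nq, D) local features; support: (Ns, D); returns the (Nq, D) reconstruction."""
        gram = support @ support.t()                                   # (Ns, Ns)
        weights = query @ support.t() @ torch.linalg.inv(
            gram + lam * torch.eye(support.shape[0])
        )                                                              # (Nq, Ns) ridge solution
        return weights @ support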

12 pages, 3508 KB  
Article
Improvement of the Cross-Scale Multi-Feature Stereo Matching Algorithm
by Nan Chen, Dongri Shan and Peng Zhang
Appl. Sci. 2025, 15(11), 5837; https://doi.org/10.3390/app15115837 - 22 May 2025
Cited by 1 | Viewed by 1316
Abstract
With the continuous advancement of industrialization and intelligentization, stereo-vision-based measurement technology for large-scale components has become a prominent research focus. To address weak-textured regions in large-scale component images and reduce mismatches in stereo matching, we propose a cross-scale multi-feature stereo matching algorithm. In the cost-computation stage, the sum of absolute differences (SAD), census, and modified census cost aggregation are employed as cost-calculation methods. During the cost-aggregation phase, cross-scale theory is introduced to fuse multi-scale cost volumes using distinct aggregation parameters through a cross-scale framework. Experimental results on both benchmark and real-world datasets demonstrate that the enhanced algorithm achieves an average mismatch rate of 12.25%, exhibiting superior robustness compared to conventional census transform and semi-global matching (SGM) algorithms. Full article
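
The two matching costs combined in the cost-computation stage can be sketched as follows: a census transform whose matching cost is a Hamming distance, and a per-pixel absolute difference that is aggregated over a window for SAD. Window size and data layout are illustrative, and the cross-scale aggregation itself is not shown.

    # Illustrative sketch of census and SAD costs for grayscale rectified stereo pairs.
    import numpy as np

    def census(img: np.ndarray, win: int = 5) -> np.ndarray:
        """Per-pixel bit code: 1 where a neighbour is darker than the centre pixel."""
        r = win // 2
        code = np.zeros(img.shape, dtype=np.uint64)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                code = (code << np.uint64(1)) | (shifted < img).astype(np.uint64)
        return code

    def census_cost(left_code: np.ndarray, right_code: np.ndarray) -> np.ndarray:
        """Census matching cost: Hamming distance between census codes."""
        xored = np.bitwise_xor(left_code, right_code)
        return np.vectorize(lambda v: bin(int(v)).count("1"))(xored)

    def sad_cost(left: np.ndarray, right: np.ndarray, d: int) -> np.ndarray:
        """Absolute difference at disparity d; aggregate over a window to obtain SAD."""
        shifted = np.roll(right, d, axis=1)
        return np.abs(left.astype(np.int32) - shifted.astype(np.int32))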
