Advances in Computer Vision and Digital Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 December 2026 | Viewed by 9708

Special Issue Information

Dear Colleagues,

Computer vision and digital image processing are vital to intelligent systems, empowering machines to interpret and interact with the visual world. Rapid advances in artificial intelligence, edge computing, and sensor technologies are driving innovation across domains like healthcare, manufacturing, agriculture, security, and autonomous systems. Vision-based applications now match or even surpass human performance in specialized areas such as medical diagnosis and surveillance.

Recent breakthroughs include transformer-based vision models, which leverage attention mechanisms from natural language processing to achieve state-of-the-art results. Self-supervised learning is also reshaping the field by enabling models to learn from unlabeled data, reducing the need for manual annotation. Furthermore, edge computing brings powerful image analysis to resource-constrained devices, supporting real-time applications in smartphones, drones, and more. Additionally, generative AI is pushing boundaries in content synthesis, image augmentation, and reconstruction.

Topics of interest include, but are not limited to, the following:

  • Image and video analysis;
  • Two-dimensional and three-dimensional object detection and recognition;
  • Medical image processing;
  • Deep learning and neural networks for vision;
  • Scene understanding and segmentation;
  • Multimodal data fusion;
  • Real-time and embedded vision systems;
  • Vision-based human–computer interaction;
  • Generative AI for image synthesis and processing;
  • Industrial and biomedical vision applications.

The Special Issue, “Advances in Computer Vision and Digital Image Processing”, aims to bring together cutting-edge research and recent developments in the field of computer vision and image processing. Original research papers are strongly encouraged; review articles and surveys are also welcome. We seek contributions that highlight novel algorithms, efficient methods, and emerging techniques, as well as interdisciplinary approaches that demonstrate the practical impact of vision systems in real-world scenarios.

Prof. Dr. Panayiotis Vlamos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • digital image processing
  • deep learning
  • object detection
  • image segmentation
  • generative AI
  • medical imaging
  • scene understanding
  • multimodal data fusion
  • embedded vision systems
  • human–computer interaction
  • real-time image analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)

Research

25 pages, 11059 KB  
Article
Few-Shot Open-Set Object Detection with a Synthesized Monument Guided by Contrastive Distilled Prompts
by Hao Chen and Ying Chen
Appl. Sci. 2026, 16(7), 3474; https://doi.org/10.3390/app16073474 - 2 Apr 2026
Viewed by 310
Abstract
Few-shot open-set object detection (FS-OSOD) remains challenging in real-world scenarios, where detectors must accurately recognize known objects from few examples while reliably rejecting vast unknown categories. Under this setting, decision boundaries between known and unknown classes are easily distorted by data scarcity and background clutter, leading to severe overfitting on base classes and overconfident misclassification of unknowns. Recent research attempts to alleviate these issues by regularizing detection heads to suppress base-class bias, or by leveraging vision–language priors through open-vocabulary alignment and prompt tuning to enhance semantic transferability. However, these solutions often overlook explicit modeling of truly out-of-set unknowns and the instability of prompt adaptation in low-data regimes, which can cause boundary drift and allow unknown proposals to be absorbed by similar seen classes or even suppressed as background. To alleviate these issues, a guided prompt–monument network (GPMN) is proposed that jointly enhances prompt learning and feature representation learning for FS-OSOD. First, the contrastive distilled prompts (CDP) module employs a teacher–student prompt framework to decouple optimization across base, novel, and unknown classes. This strategy preserves transferability between zero-shot and few-shot settings while enhancing discrimination on base categories. Second, a synthesized monument module (SMM) maintains class-centered memory with momentum-updated prototypes and a non-parametric classifier, which compresses the overlap between seen and unseen distributions and provides a stable rejection margin for unknowns with strong co-occurrence and background noise. Compared with existing head-regularization and open-vocabulary prompt-tuning pipelines, GPMN explicitly targets both base-class bias and seen–unseen overlap at the region level. Extensive experiments on VOC10-5-5 and VOC-COCO benchmarks demonstrate that GPMN consistently improves unknown recall and few-shot mAP over representative FS-OSOD baselines. These results suggest that prompt-level decoupling mitigates base-class bias, whereas memory-anchored regularization enlarges the seen–unseen margin, jointly supporting reliable unknown rejection in scarce-supervision regimes. Full article
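
For readers unfamiliar with the memory mechanism referenced above, the following Python sketch illustrates a momentum-updated class-prototype memory with a non-parametric cosine classifier. It is a minimal, generic illustration of this family of techniques; the class name, momentum value, and unknown-rejection rule are assumptions, not the authors' implementation.

    # Illustrative sketch (not the GPMN/SMM code): EMA class prototypes plus a
    # non-parametric cosine classifier; low maximum similarity can flag "unknown".
    import torch
    import torch.nn.functional as F

    class PrototypeMemory:
        def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
            self.momentum = momentum                       # assumed EMA factor
            self.prototypes = torch.zeros(num_classes, dim)

        @torch.no_grad()
        def update(self, features: torch.Tensor, labels: torch.Tensor):
            """EMA-update each class prototype with the mean of its region features."""
            for c in labels.unique():
                mean_feat = features[labels == c].mean(dim=0)
                self.prototypes[c] = (
                    self.momentum * self.prototypes[c] + (1 - self.momentum) * mean_feat
                )

        def classify(self, features: torch.Tensor) -> torch.Tensor:
            """Cosine similarity to all prototypes; threshold the maximum to reject unknowns."""
            return F.cosine_similarity(
                features.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
            )  # shape (N, num_classes)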

22 pages, 8847 KB  
Article
DGAGaze: Gaze Estimation with Dual-Stream Differential Attention and Geometry-Aware Temporal Alignment
by Wei Zhang and Pengcheng Li
Appl. Sci. 2026, 16(7), 3298; https://doi.org/10.3390/app16073298 - 29 Mar 2026
Viewed by 388
Abstract
Gaze estimation plays a crucial role in human-computer interaction and behavior analysis. However, in dynamic scenes, rigid head movements and rapid gaze shifts pose significant challenges to accurate gaze prediction. Most existing methods either process single-frame images independently or rely on long video sequences, making it difficult to simultaneously achieve strong performance and high computational efficiency. To address this issue, we propose DGAGaze, a gaze estimation framework based on a difference-driven spatiotemporal attention mechanism. This framework uses a geometry-aware temporal alignment module to mitigate interference from rigid head movements, compensating for them through pose estimation and affine feature warping, thereby achieving explicit decoupling between global head motion and local eye motion. Based on the aligned features, inter-frame differences are used to adjust spatial and channel attention weights, enhancing motion-sensitive representations without introducing an additional temporal modeling layer. Extensive experiments on the EyeDiap and Gaze360 datasets demonstrate the effectiveness of the proposed approach. DGAGaze achieves improved gaze estimation accuracy while maintaining a lightweight architecture based on a ResNet-18 backbone, outperforming existing state-of-the-art methods. Full article
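
The difference-driven attention idea summarized above can be illustrated with a short PyTorch sketch in which the inter-frame feature difference re-weights channel and spatial attention. The module name, layer sizes, and gating form are illustrative assumptions, not the DGAGaze architecture.

    # Illustrative sketch: inter-frame difference modulates channel and spatial gates.
    import torch
    import torch.nn as nn

    class DifferenceAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
            self.spatial_gate = nn.Sequential(
                nn.Conv2d(1, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),
            )

        def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
            diff = feat_t - feat_prev                             # motion-sensitive signal
            ca = self.channel_gate(diff)                          # (B, C, 1, 1)
            sa = self.spatial_gate(diff.mean(dim=1, keepdim=True))  # (B, 1, H, W)
            return feat_t * ca * sa                               # re-weighted current features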

24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Viewed by 382
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity, as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and an emotion benchmark (Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
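
As a rough illustration of this kind of augmentation pipeline, the sketch below mixes style-abstracted (e.g., cartoonized) clips into training at a chosen proportion. The cartoonize transform, dataset layout, and sampling rule are placeholders, not the authors' code.

    # Illustrative sketch: replace a fraction of clips with style-abstracted variants.
    import random
    from torch.utils.data import Dataset

    class StyleAbstractedVideos(Dataset):
        def __init__(self, clips, labels, cartoonize, stylized_ratio: float = 0.5):
            self.clips, self.labels = clips, labels
            self.cartoonize = cartoonize          # any style-abstraction transform (assumed)
            self.stylized_ratio = stylized_ratio  # fraction of samples replaced

        def __len__(self):
            return len(self.clips)

        def __getitem__(self, idx):
            clip = self.clips[idx]
            if random.random() < self.stylized_ratio:
                clip = self.cartoonize(clip)      # keep behaviour cues, drop style bias
            return clip, self.labels[idx]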

21 pages, 2720 KB  
Article
Adaptive Multi-Branch Feature Fusion for Low-Light Image Enhancement
by Serdar Çiftçi
Appl. Sci. 2026, 16(6), 2712; https://doi.org/10.3390/app16062712 - 12 Mar 2026
Viewed by 400
Abstract
Low-light image enhancement (LLIE) remains a challenging problem due to spatially varying illumination degradation, compressed tonal distributions, and structural detail loss. This paper presents Adaptive Multi-Branch Feature Fusion (AMBFF), a unified framework that formulates LLIE as a multi-domain representation alignment task. The proposed architecture explicitly models complementary feature domains, including hierarchical spatial context, luminance–chrominance decoupling, edge–texture structures, frequency-domain information, and differentiable tonal histogram representations. A spatially adaptive gating mechanism dynamically weights multi-feature branches through a convex fusion strategy, enabling location-aware illumination correction while preserving structural integrity and color fidelity. Extensive evaluations on widely used benchmark datasets demonstrate that AMBFF consistently outperforms representative conventional and deep learning-based approaches in terms of PSNR, SSIM, and LPIPS. Ablation analyses confirm the complementarity of the proposed feature domains and the robustness benefits of adaptive fusion. Despite its multi-branch design, AMBFF maintains a favorable performance–complexity trade-off, highlighting the effectiveness of structured multi-domain modeling for low-light image enhancement. Full article
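
The spatially adaptive convex fusion described above can be sketched as a per-pixel softmax gate over the feature branches, which guarantees a convex combination at every location. The branch count and gate design below are illustrative assumptions, not the AMBFF implementation.

    # Illustrative sketch: per-pixel softmax weights give a convex mix of K branches.
    import torch
    import torch.nn as nn

    class ConvexBranchFusion(nn.Module):
        def __init__(self, channels: int, num_branches: int):
            super().__init__()
            self.gate = nn.Conv2d(channels * num_branches, num_branches, kernel_size=1)

        def forward(self, branches):                  # list of (B, C, H, W) tensors
            weights = torch.softmax(self.gate(torch.cat(branches, dim=1)), dim=1)
            stacked = torch.stack(branches, dim=1)    # (B, K, C, H, W)
            return (weights.unsqueeze(2) * stacked).sum(dim=1)  # convex per-pixel fusion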

23 pages, 4020 KB  
Article
Structure-Aware Pixel Art Scaling via Block Size Detection
by Jun Won Seo, Jun Won Lee, Jong Hyuck Lee, Jun Beom Kim and Jin-Woo Jung
Appl. Sci. 2026, 16(5), 2314; https://doi.org/10.3390/app16052314 - 27 Feb 2026
Viewed by 458
Abstract
Standard interpolation methods degrade pixel art through blurring or geometric distortion. We propose a lossless scaling algorithm that detects the intrinsic block size to normalize the image grid, thereby expanding the set of valid scaling factors beyond standard integer multiples. This approach enables precise, distortion-free resizing closer to user-specified scales. To validate this approach, we introduce a novel evaluation framework consisting of Color Loss (CL), Block Size Consistency (BSC), and reversibility (REV) tests. Experimental results demonstrate that the proposed method maintains the original palette and grid structure without introducing interpolation artifacts. Furthermore, the reversibility tests confirm that the scaling process remains mathematically lossless, ensuring the genre’s structural and chromatic integrity. Full article
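
One plausible reading of the block-size detection step is to take the greatest common divisor of same-colour run lengths along rows and columns and then resize by pixel replication only. The sketch below follows that reading, assuming an (H, W, 3) RGB array, and is not the paper's algorithm.

    # Illustrative sketch: GCD of run lengths gives the intrinsic block size; resizing
    # then samples one pixel per block and replicates it, avoiding interpolation.
    from math import gcd
    from functools import reduce
    import numpy as np

    def run_lengths(line: np.ndarray):
        """Lengths of consecutive runs of identical colours along one row/column."""
        change = np.flatnonzero(np.any(line[1:] != line[:-1], axis=-1)) + 1
        bounds = np.concatenate(([0], change, [len(line)]))
        return np.diff(bounds)

    def detect_block_size(img: np.ndarray) -> int:
        runs = []
        for row in img:                       # rows of an (H, W, 3) image
            runs.extend(run_lengths(row))
        for col in img.transpose(1, 0, 2):    # columns
            runs.extend(run_lengths(col))
        return reduce(gcd, runs)

    def rescale(img: np.ndarray, target_block: int) -> np.ndarray:
        b = detect_block_size(img)
        base = img[::b, ::b]                  # one sample per block (lossless)
        return np.repeat(np.repeat(base, target_block, axis=0), target_block, axis=1)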

20 pages, 6322 KB  
Article
Automated Procedure for Centre Localization, Noise Removal, and Background Suppression in Two-Dimensional X-Ray Diffraction Patterns
by Massimo Ladisa
Appl. Sci. 2026, 16(4), 1776; https://doi.org/10.3390/app16041776 - 11 Feb 2026
Viewed by 342
Abstract
In this paper, we present a comprehensive and automated methodology for processing two-dimensional X-ray diffraction (2D-XRD) patterns. The proposed workflow involves three sequential stages: (i) precise localization of the diffraction center, (ii) removal of high-frequency noise, and (iii) suppression of non-physical background signals. This method enables improved data quality for subsequent quantitative analysis such as radial integration, phase identification, and structural refinement. Application to experimental datasets from both the Synchrotron Radiation Facility and a table-top X-ray diffractometer demonstrates the method’s robustness, accuracy, and computational efficiency. Full article
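
As a point of reference for the first two stages, the sketch below locates a rough pattern centre as the intensity-weighted centroid of the brightest pixels and removes high-frequency noise with a median filter. It is a generic baseline under those assumptions, not the automated procedure proposed in the paper.

    # Illustrative baseline for centre localization and denoising of a 2D-XRD pattern.
    import numpy as np
    from scipy.ndimage import median_filter

    def locate_centre(pattern: np.ndarray, quantile: float = 0.99):
        """Intensity-weighted centroid of the brightest pixels (rough beam centre)."""
        mask = pattern >= np.quantile(pattern, quantile)
        ys, xs = np.nonzero(mask)
        w = pattern[ys, xs].astype(float)
        return np.average(ys, weights=w), np.average(xs, weights=w)

    def denoise(pattern: np.ndarray, size: int = 3) -> np.ndarray:
        """Median filter suppresses isolated high-frequency (e.g., hot-pixel) noise."""
        return median_filter(pattern, size=size)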

23 pages, 3059 KB  
Article
Research on Ship Target Detection in Complex Sea Surface Scenarios Based on Improved YOLOv7
by Zhuang Cai and Weina Zhou
Appl. Sci. 2026, 16(4), 1769; https://doi.org/10.3390/app16041769 - 11 Feb 2026
Viewed by 421
Abstract
Ship target detection plays a crucial role in safeguarding maritime transportation. However, affected by factors such as ocean waves, extreme weather, and target diversity (e.g., large size differences, arbitrary rotation, and occlusion), existing deep learning-based detection methods struggle to achieve a satisfactory balance among accuracy, speed, and model size in complex marine environments. To address this challenge, this paper proposes a real-time ship detection algorithm (C-YOLO) integrating global perception and multi-scale feature enhancement. First, a Transformer encoder is added before the detection head, which suppresses interference from sea clutter and cloud mist occlusion through long-range dependency modeling, improving the detection of small and occluded ships. Second, a Dual-Effect Focused Residual Fusion Module is designed to replace the backbone’s multi-scale pooling structure, combining the advantages of CBAM (background noise suppression) and SK-Net (dynamic scale adaptation) to simultaneously capture features of ships of different sizes. Finally, a CZIoU loss function is proposed, which integrates constraints on angle, center point, vertex, and area to address rotation, deformation, and multi-scale issues in ship detection. Experimental results on the SeaShips 7000 dataset show that the proposed C-YOLO achieves a Recall of 0.842, mAP@50 of 0.797, and mAP@50:95 of 0.552, outperforming mainstream algorithms such as YOLOv7 (Recall = 0.785, mAP@50 = 0.781), YOLOv9s (Recall = 0.819, mAP@50 = 0.755), and SSD (Recall = 0.802, mAP@50 = 0.833). With 76.75 M parameters and an inference speed of 119 FPS, the model maintains efficient real-time performance while ensuring detection accuracy. This method effectively reduces false detection and missed detection rates in complex scenarios such as port monitoring and maritime traffic control, providing a reliable technical solution for intelligent maritime surveillance and safe navigation, with significant practical value for improving maritime transportation efficiency and reducing safety risks. Full article
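
The CZIoU formulation itself is not given in the abstract; as background, the sketch below shows the standard distance-IoU (DIoU) construction from which losses of this kind extend, assuming axis-aligned boxes in (x1, y1, x2, y2) format.

    # Illustrative DIoU loss (not CZIoU): 1 - IoU plus a normalised centre-distance penalty.
    import torch

    def diou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """pred, target: (N, 4) boxes as (x1, y1, x2, y2); returns the mean DIoU loss."""
        # intersection and union
        ix1 = torch.max(pred[:, 0], target[:, 0])
        iy1 = torch.max(pred[:, 1], target[:, 1])
        ix2 = torch.min(pred[:, 2], target[:, 2])
        iy2 = torch.min(pred[:, 3], target[:, 3])
        inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + 1e-7)
        # squared centre distance, normalised by the enclosing-box diagonal
        cp = (pred[:, :2] + pred[:, 2:]) / 2
        ct = (target[:, :2] + target[:, 2:]) / 2
        ex1 = torch.min(pred[:, 0], target[:, 0])
        ey1 = torch.min(pred[:, 1], target[:, 1])
        ex2 = torch.max(pred[:, 2], target[:, 2])
        ey2 = torch.max(pred[:, 3], target[:, 3])
        diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7
        dist = ((cp - ct) ** 2).sum(dim=1)
        return (1 - iou + dist / diag).mean()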

17 pages, 7122 KB  
Article
Feature Enhancement Method for RGB-D Image Through Convolution over Plane Residuals and Plane Parameters
by Dong-seok Lee and Soon-kak Kwon
Appl. Sci. 2026, 16(2), 1036; https://doi.org/10.3390/app16021036 - 20 Jan 2026
Viewed by 268
Abstract
We propose a feature enhancement method for depth images that applies convolutions over residuals and parameters obtained from the dominant plane. The proposed method can obtain initial features of depth images that are less sensitive to surface orientation and more representative of intrinsic geometric properties. Specifically, the features are obtained through a plane-based convolution that operates on residuals with respect to the dominant plane within a local patch of the depth image. For each patch, a dominant plane is fitted to the corresponding depth pixel values using a least-squares method. Then, convolutional operations are performed on plane residuals computed between the original depth values and the corresponding depth values on the dominant plane. In addition, standard convolution is applied to the dominant plane parameters to capture local variations and spatial consistency of surface orientation. A plane-based convolution module incorporating these convolutions is attached to the initial layer of the existing feature extractor in parallel to supplementarily obtain surface geometric features. Experimental results demonstrate that the proposed method consistently achieves performance gains on both segmentation and classification tasks. Full article
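
The per-patch step described above can be sketched directly: fit a dominant plane z = a·x + b·y + c to a depth patch by least squares and compute the residuals that the plane-based convolution then operates on. Patch handling and naming are illustrative assumptions.

    # Illustrative sketch: least-squares plane fit per depth patch and its residual map.
    import numpy as np

    def plane_residuals(depth_patch: np.ndarray):
        """Return (plane parameters (a, b, c), residual map) for one depth patch."""
        h, w = depth_patch.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
        z = depth_patch.ravel()
        params, *_ = np.linalg.lstsq(A, z, rcond=None)    # dominant plane (a, b, c)
        residuals = (z - A @ params).reshape(h, w)        # input to the residual convolution
        return params, residuals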

33 pages, 5188 KB  
Article
Geometric Feature Enhancement for Robust Facial Landmark Detection in Makeup Paper Templates
by Cheng Chang, Yong-Yi Fanjiang and Chi-Huang Hung
Appl. Sci. 2026, 16(2), 977; https://doi.org/10.3390/app16020977 - 18 Jan 2026
Viewed by 851
Abstract
Traditional scoring of makeup face templates in beauty skill assessments heavily relies on manual judgment, leading to inconsistencies and subjective bias. Hand-drawn templates often exhibit proportion distortions, asymmetry, and occlusions that reduce the accuracy of conventional facial landmark detection algorithms. This study proposes a novel approach that integrates Geometric Feature Enhancement (GFE) with Dlib’s 68-landmark detection to improve the robustness and precision of landmark localization. A comprehensive comparison among Haar Cascade, MTCNN-MobileNetV2, and Dlib was conducted using a curated dataset of 11,600 hand-drawn facial templates. The proposed GFE-enhanced Dlib achieved 60.5% accuracy—outperforming MTCNN (23.4%) and Haar (20.3%) by approximately 37 percentage points, with precision and F1-score improvements exceeding 20% and 25%, respectively. The results demonstrate that the proposed method significantly enhances detection accuracy and scoring consistency, providing a reliable framework for automated beauty skill evaluation, and laying a solid foundation for future applications such as digital archiving and style-guided synthesis. Full article
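
For context, the sketch below shows standard usage of Dlib's 68-point landmark predictor, which the study builds on; the Geometric Feature Enhancement step itself is not specified in the abstract and is therefore omitted here. The model file path is the conventional one and is assumed to be available locally.

    # Illustrative sketch of Dlib's 68-landmark detection (GFE preprocessing not shown).
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_landmarks(gray: np.ndarray):
        """Return a (68, 2) array of landmark coordinates for the first detected face."""
        faces = detector(gray, 1)            # 1 = upsample once to find smaller faces
        if not faces:
            return None
        shape = predictor(gray, faces[0])
        return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])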

26 pages, 23681 KB  
Article
Semantic-Guided Spatial and Temporal Fusion Framework for Enhancing Monocular Video Depth Estimation
by Hyunsu Kim, Yeongseop Lee, Hyunseong Ko, Junho Jeong and Yunsik Son
Appl. Sci. 2026, 16(1), 212; https://doi.org/10.3390/app16010212 - 24 Dec 2025
Viewed by 1110
Abstract
Despite advancements in deep learning-based Monocular Depth Estimation (MDE), applying these models to video sequences remains challenging due to geometric ambiguities in texture-less regions and temporal instability caused by independent per-frame inference. To address these limitations, we propose STF-Depth, a novel post-processing framework that enhances depth quality by logically fusing heterogeneous information—geometric, semantic, and panoptic—without requiring additional retraining. Our approach introduces a robust RANSAC-based Vanishing Point Estimation to guide Dynamic Depth Gradient Correction for background separation, alongside Adaptive Instance Re-ordering to clarify occlusion relationships. Experimental results on the KITTI, NYU Depth V2, and TartanAir datasets demonstrate that STF-Depth functions as a universal plug-and-play module. Notably, it achieved a 25.7% reduction in Absolute Relative error (AbsRel) and significantly enhanced temporal consistency compared to state-of-the-art backbone models. These findings confirm the framework’s practicality for real-world applications requiring geometric precision and video stability, such as autonomous driving, robotics, and augmented reality (AR). Full article
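
The RANSAC-based vanishing point stage mentioned above can be sketched as follows: intersect random pairs of detected lines in homogeneous coordinates and keep the hypothesis with the most inliers. Line detection, the distance threshold, and the scoring rule are assumptions; this is not the STF-Depth implementation.

    # Illustrative RANSAC vanishing point estimation from homogeneous line coefficients.
    import numpy as np

    def ransac_vanishing_point(lines, iters: int = 500, thresh: float = 2.0, seed: int = 0):
        """lines: (N, 3) array of (a, b, c) with a*x + b*y + c = 0, normalised so a**2 + b**2 = 1."""
        rng = np.random.default_rng(seed)
        best_vp, best_count = None, -1
        for _ in range(iters):
            i, j = rng.choice(len(lines), size=2, replace=False)
            vp = np.cross(lines[i], lines[j])       # intersection of the two lines
            if abs(vp[2]) < 1e-9:
                continue                            # near-parallel pair, no finite intersection
            vp = vp / vp[2]
            dists = np.abs(lines @ vp)              # point-line distances (a, b normalised)
            count = int((dists < thresh).sum())
            if count > best_count:
                best_vp, best_count = vp[:2], count
        return best_vp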

26 pages, 5101 KB  
Article
Cross-Modal Adaptive Fusion and Multi-Scale Aggregation Network for RGB-T Crowd Density Estimation and Counting
by Jian Liu, Zuodong Niu, Yufan Zhang and Lin Tang
Appl. Sci. 2026, 16(1), 161; https://doi.org/10.3390/app16010161 - 23 Dec 2025
Viewed by 678
Abstract
Crowd counting is a significant task in computer vision. By combining the rich texture information from RGB images with the insensitivity to illumination changes offered by thermal imaging, the applicability of models in real-world complex scenarios can be enhanced. Current research on RGB-T crowd counting primarily focuses on feature fusion strategies, multi-scale structures, and the exploration of novel network architectures such as Vision Transformer and Mamba. However, existing approaches face two key challenges: limited robustness to illumination shifts and insufficient handling of scale discrepancies. To address these challenges, this study develops a robust RGB-T crowd counting framework that remains stable under illumination shifts by introducing two key innovations beyond existing fusion and multi-scale approaches: (1) a cross-modal adaptive fusion module (CMAFM) that actively evaluates and fuses reliable cross-modal features under varying scenarios by simulating a dynamic feature selection and trust allocation mechanism; and (2) a multi-scale aggregation module (MSAM) that unifies features with different receptive fields to an intermediate scale and performs weighted fusion to enhance modeling capability for cross-modal scale variations. The proposed method achieves relative improvements of 1.57% in GAME(0) and 0.78% in RMSE on the DroneRGBT dataset compared to existing methods, and improvements of 2.48% and 1.59% on the RGBT-CC dataset, respectively. It also demonstrates higher stability and robustness under varying lighting conditions. This research provides an effective solution for building stable and reliable all-weather crowd counting systems, with significant application prospects in smart city security and management. Full article
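
For reference, the GAME(L) metric reported above is the Grid Average Mean absolute Error: each image is divided into 4^L cells and per-cell absolute count errors are summed, so GAME(0) reduces to the ordinary MAE. A minimal sketch, assuming per-image (H, W) density maps:

    # Illustrative GAME(L) computation for a single image's predicted and ground-truth density maps.
    import numpy as np

    def game(pred: np.ndarray, gt: np.ndarray, level: int) -> float:
        """Sum of absolute count errors over a 2**level x 2**level grid (GAME(0) == MAE)."""
        n = 2 ** level
        h, w = pred.shape
        err = 0.0
        for r in range(n):
            for c in range(n):
                ys = slice(r * h // n, (r + 1) * h // n)
                xs = slice(c * w // n, (c + 1) * w // n)
                err += abs(pred[ys, xs].sum() - gt[ys, xs].sum())
        return err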

14 pages, 11858 KB  
Article
Few-Shot Fine-Grained Image Classification with Residual Reconstruction Network Based on Feature Enhancement
by Ying Liu, Haibin Zhang and Weidong Zhang
Appl. Sci. 2025, 15(18), 9953; https://doi.org/10.3390/app15189953 - 11 Sep 2025
Cited by 1 | Viewed by 1638
Abstract
In recent years, few-shot fine-grained image classification has shown great potential in addressing data scarcity and distinguishing highly similar categories. However, existing unidirectional reconstruction methods, while enhancing inter-class differences, fail to effectively suppress intra-class variations; bidirectional reconstruction methods, although alleviating intra-class variations, inevitably introduce background noise. To overcome these limitations, this paper proposes a Bidirectional Feature Reconstruction Network that incorporates a Feature Enhancement Attention Module (FEAM) to highlight discriminative regions and suppress background interference, while integrating a Channel-Aware Spatial Attention (CASA) module to strengthen local feature modeling and compensate for the Transformer’s tendency to overemphasize global information. This joint design not only enhances inter-class separability but also effectively reduces intra-class variation. Extensive experiments on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches, validating its effectiveness and robustness in few-shot fine-grained image classification. Full article
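
The feature-reconstruction idea underlying this family of few-shot methods can be sketched with a closed-form ridge regression that rebuilds query features from the support pool (and, for the bidirectional variant, support features from the query). The regularizer value and tensor shapes below are assumptions, not the paper's configuration.

    # Illustrative sketch: ridge-regression reconstruction of query features from support features;
    # the reconstruction error typically drives the few-shot classification decision.
    import torch

    def reconstruct(query: torch.Tensor, support: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
        """query: (Nq, D) local features; support: (Ns, D); returns the (Nq, D) reconstruction."""
        gram = support @ support.t()                                   # (Ns, Ns)
        weights = query @ support.t() @ torch.linalg.inv(
            gram + lam * torch.eye(support.shape[0])
        )                                                              # (Nq, Ns) ridge solution
        return weights @ support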

12 pages, 3508 KB  
Article
Improvement of the Cross-Scale Multi-Feature Stereo Matching Algorithm
by Nan Chen, Dongri Shan and Peng Zhang
Appl. Sci. 2025, 15(11), 5837; https://doi.org/10.3390/app15115837 - 22 May 2025
Cited by 1 | Viewed by 1316
Abstract
With the continuous advancement of industrialization and intelligentization, stereo-vision-based measurement technology for large-scale components has become a prominent research focus. To address weak-textured regions in large-scale component images and reduce mismatches in stereo matching, we propose a cross-scale multi-feature stereo matching algorithm. In the cost-computation stage, the sum of absolute differences (SAD), census, and modified census cost aggregation are employed as cost-calculation methods. During the cost-aggregation phase, cross-scale theory is introduced to fuse multi-scale cost volumes using distinct aggregation parameters through a cross-scale framework. Experimental results on both benchmark and real-world datasets demonstrate that the enhanced algorithm achieves an average mismatch rate of 12.25%, exhibiting superior robustness compared to conventional census transform and semi-global matching (SGM) algorithms. Full article
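
The two matching costs combined in the cost-computation stage can be sketched as follows: a census transform whose matching cost is a Hamming distance, and a per-pixel absolute difference that is aggregated over a window for SAD. Window size and data layout are illustrative, and the cross-scale aggregation itself is not shown.

    # Illustrative sketch of census and SAD costs for grayscale rectified stereo pairs.
    import numpy as np

    def census(img: np.ndarray, win: int = 5) -> np.ndarray:
        """Per-pixel bit code: 1 where a neighbour is darker than the centre pixel."""
        r = win // 2
        code = np.zeros(img.shape, dtype=np.uint64)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                code = (code << np.uint64(1)) | (shifted < img).astype(np.uint64)
        return code

    def census_cost(left_code: np.ndarray, right_code: np.ndarray) -> np.ndarray:
        """Census matching cost: Hamming distance between census codes."""
        xored = np.bitwise_xor(left_code, right_code)
        return np.vectorize(lambda v: bin(int(v)).count("1"))(xored)

    def sad_cost(left: np.ndarray, right: np.ndarray, d: int) -> np.ndarray:
        """Absolute difference at disparity d; aggregate over a window to obtain SAD."""
        shifted = np.roll(right, d, axis=1)
        return np.abs(left.astype(np.int32) - shifted.astype(np.int32))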
