Advances in Image Recognition and Processing Technologies

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 May 2025

Special Issue Editors

College of Information Science and Technology, Beijing University of Chemical Technology (BUCT) and Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
Interests: pattern recognition; detection and tracking; visual intelligence
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Interests: computer vision; pattern recognition; image processing; edge intelligence

Special Issue Information

Dear Colleagues,

Image Recognition and Processing Technologies have contributed to significant advances in many fields in recent years. However, many challenges remain due to the inherent complexity of computer vision, limiting performance in various applications. This Special Issue has therefore been assembled to share in-depth research results related to image recognition and processing methods, including, but not limited to, object detection, object tracking, image super-resolution, depth estimation, and semantic segmentation. We hope that these advanced methods can boost the application of these technologies in the real world.

It is our pleasure to invite you to join this Special Issue, entitled “Advances in Image Recognition and Processing Technologies”, to which you are welcome to contribute a manuscript presenting your research progress. Thank you very much.

Dr. Yang Zhang
Dr. Shuai Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • image recognition
  • image processing
  • image super-resolution
  • depth estimation
  • semantic segmentation
  • object detection
  • object tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (17 papers)


Research

26 pages, 16194 KiB  
Article
Defect R-CNN: A Novel High-Precision Method for CT Image Defect Detection
by Zirou Jiang, Jintao Fu, Tianchen Zeng, Renjie Liu, Peng Cong, Jichen Miao and Yuewen Sun
Appl. Sci. 2025, 15(9), 4825; https://doi.org/10.3390/app15094825 - 26 Apr 2025
Abstract
Defect detection in industrial computed tomography (CT) images remains challenging due to small defect sizes, low contrast, and noise interference. To address these issues, we propose Defect R-CNN, a novel detection framework designed to capture the structural characteristics of defects in CT images. The model incorporates an edge-prior convolutional block (EPCB) that guides the network to focus on extracting edge information, particularly along defect boundaries, improving both localization and classification. Additionally, we introduce a custom backbone, edge-prior net (EP-Net), to capture features across multiple spatial scales, enhancing the recognition of subtle and complex defect patterns. During inference, the multi-branch structure is consolidated into a single-branch equivalent to accelerate detection without compromising accuracy. Experiments conducted on a CT dataset of nuclear graphite components from a high-temperature gas-cooled reactor (HTGR) demonstrate that Defect R-CNN achieves average precision (AP) exceeding 0.9 for all defect types. Moreover, the model attains mean average precision (mAP) scores of 0.983 for bounding boxes (mAP-bbox) and 0.956 for segmentation masks (mAP-segm), surpassing established methods such as Faster R-CNN, Mask R-CNN, EfficientNet, RT-DETR, and YOLOv11. The inference speed reaches 76.2 frames per second (FPS), representing an optimal balance between accuracy and efficiency. This study demonstrates that Defect R-CNN offers a robust and reliable approach for industrial scenarios that require high-precision and real-time defect detection.
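
The abstract notes that the multi-branch structure is consolidated into a single-branch equivalent at inference time. The paper's exact procedure is not given here, but this kind of consolidation is commonly realized through structural re-parameterization (RepVGG-style), where parallel convolution branches are algebraically folded into one kernel. A minimal PyTorch sketch under that assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_parallel_convs(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    """Fold a parallel 3x3 + 1x1 convolution pair (summed outputs) into one 3x3 convolution.

    Assumes both branches share in/out channels and stride, with padding=1 on the 3x3 branch.
    """
    fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels, kernel_size=3,
                      stride=conv3x3.stride, padding=1, bias=True)
    # A 1x1 kernel padded to 3x3 (value kept at the center) is equivalent to the original 1x1 conv.
    fused.weight.data = conv3x3.weight.data + F.pad(conv1x1.weight.data, [1, 1, 1, 1])
    bias3 = conv3x3.bias.data if conv3x3.bias is not None else torch.zeros(fused.out_channels)
    bias1 = conv1x1.bias.data if conv1x1.bias is not None else torch.zeros(fused.out_channels)
    fused.bias.data = bias3 + bias1
    return fused

# Sanity check: the fused convolution reproduces the sum of the two branches.
x = torch.randn(1, 8, 32, 32)
a, b = nn.Conv2d(8, 16, 3, padding=1), nn.Conv2d(8, 16, 1)
print(torch.allclose(merge_parallel_convs(a, b)(x), a(x) + b(x), atol=1e-5))
```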

20 pages, 49431 KiB  
Article
Generative Adversarial Network-Based Lightweight High-Dynamic-Range Image Reconstruction Model
by Gustavo de Souza Ferreti, Thuanne Paixão and Ana Beatriz Alvarez
Appl. Sci. 2025, 15(9), 4801; https://doi.org/10.3390/app15094801 - 25 Apr 2025
Abstract
The generation of High-Dynamic-Range (HDR) images is essential for capturing details at various brightness levels, but current reconstruction methods based on deep learning often require significant computational resources, limiting their applicability on devices with moderate resources. In this context, this paper presents a lightweight architecture for reconstructing HDR images from three Low-Dynamic-Range (LDR) inputs. The proposed model is based on Generative Adversarial Networks and replaces traditional convolutions with depthwise separable convolutions, reducing the number of parameters while maintaining high visual quality and minimizing luminance artifacts. The proposal is evaluated through quantitative, qualitative, and computational cost analyses based on the number of parameters and FLOPs. For the qualitative analysis, the models were compared on samples that present reconstruction challenges. The proposed model achieves a PSNR-μ of 43.51 dB and an SSIM-μ of 0.9917, quality metrics comparable to those of HDR-GAN, while reducing the computational cost by 6× in FLOPs and 7× in the number of parameters and using approximately half the GPU memory, demonstrating an effective balance between visual fidelity and efficiency.
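
The efficiency gain comes from replacing standard convolutions with depthwise separable convolutions, which factorize a convolution into a per-channel spatial filter followed by a 1×1 pointwise mixing step. A minimal PyTorch sketch of the substitution (channel sizes are illustrative, not taken from the paper):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison against a standard 3x3 convolution with the same channel sizes.
std = nn.Conv2d(64, 128, kernel_size=3, padding=1)
sep = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std), count(sep))  # the separable variant uses roughly 8x fewer parameters
```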

23 pages, 5525 KiB  
Article
Automatic Identification and Segmentation of Overlapping Fog Droplets Using XGBoost and Image Segmentation
by Dongde Liao, Xiongfei Chen, Muhua Liu, Yihan Zhou, Peng Fang, Jinlong Lin, Zhaopeng Liu and Xiao Wang
Appl. Sci. 2025, 15(5), 2847; https://doi.org/10.3390/app15052847 - 6 Mar 2025
Abstract
Water-sensitive paper (WSP) has been widely used to assess the quality of pesticide sprays. However, fog droplets tend to overlap on WSP. To accurately measure droplet size and characterize the droplet distribution pattern, this study proposes a method based on an optimized XGBoost classification model combined with improved concave-point matching to achieve multi-level overlapping-droplet segmentation. For each type of overlapping droplet, a correspondingly improved segmentation algorithm is used to improve segmentation accuracy: for parallel overlapping droplets, the centre-of-mass segmentation method is used; for non-parallel overlapping droplets, the minimum-distance segmentation method is used; and for strong overlap at a single concave point, the vertical-linkage segmentation method is used. Complex overlapping droplets were segmented iteratively until a single droplet was obtained or no further segmentation was possible, and ellipse fitting was then used to obtain the final single-droplet profile. A total of 105 WSPs were collected in an orchard field through drone spraying experiments and used to validate the effectiveness of the method. The experimental results show that the classification model proposed in this paper achieves an average accuracy of 98% in identifying overlapping-droplet types, which effectively meets the needs of subsequent segmentation. The overall segmentation accuracy of the method is 91.35%, which is significantly better than the contour-solidity and watershed-based algorithm (76.19%) and the improved-concave-point-segmentation algorithm (68.82%). In conclusion, the method proposed in this paper provides an efficient and accurate new approach for pesticide spraying quality assessment.
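
The pipeline first classifies each overlapping-droplet blob by type and then dispatches it to a type-specific splitter before ellipse fitting. The features and splitter routines below are hypothetical placeholders, since the abstract does not specify them; the sketch only illustrates the classify, dispatch, and fit loop with XGBoost and OpenCV:

```python
import cv2
import numpy as np
from xgboost import XGBClassifier

def contour_features(contour):
    """Simple shape descriptors used here as stand-ins for the paper's droplet features."""
    area = cv2.contourArea(contour)
    hull_area = max(cv2.contourArea(cv2.convexHull(contour)), 1e-6)
    perimeter = cv2.arcLength(contour, closed=True)
    return [area, area / hull_area, perimeter]

# Hypothetical overlap-type labels: 0 = single, 1 = parallel, 2 = non-parallel, 3 = single concave point.
clf = XGBClassifier(n_estimators=200, max_depth=4)
rng = np.random.default_rng(0)
clf.fit(rng.random((40, 3)), rng.integers(0, 4, 40))    # placeholder fit; real labelled blobs go here

def segment_droplets(contour, splitters):
    """Recursively split an overlapping blob until only single droplets remain, then fit ellipses."""
    if len(contour) < 5:                                 # too few points to fit an ellipse or split
        return []
    label = int(clf.predict(np.array([contour_features(contour)]))[0])
    if label == 0:                                       # classified as a single droplet
        return [cv2.fitEllipse(contour)]
    parts = splitters[label](contour)                    # type-specific splitter (placeholder callables)
    if len(parts) <= 1:                                  # no further segmentation possible
        return [cv2.fitEllipse(contour)]
    return [e for part in parts for e in segment_droplets(part, splitters)]
```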

16 pages, 436 KiB  
Article
Improved Localization and Recognition of Handwritten Digits on MNIST Dataset with ConvGRU
by Yalin Wen, Wei Ke and Hao Sheng
Appl. Sci. 2025, 15(1), 238; https://doi.org/10.3390/app15010238 - 30 Dec 2024
Abstract
Video location prediction for handwritten digits presents unique challenges in computer vision due to the complex spatiotemporal dependencies and the need to maintain digit legibility across predicted frames. While existing deep learning-based video prediction models have shown promise, they often struggle with preserving local details and typically achieve clear predictions for only a limited number of frames. In this paper, we present a novel video location prediction model based on Convolutional Gated Recurrent Units (ConvGRU) that specifically addresses these challenges in the context of handwritten digit sequences. Our approach introduces three key innovations. First, a specialized decoupling model using modified Generative Adversarial Networks (GANs) effectively separates background and foreground information, significantly improving prediction accuracy. Second, an enhanced ConvGRU architecture replaces traditional linear operations with convolutional operations in the gating mechanism, substantially reducing spatiotemporal information loss. Finally, an optimized parameter-tuning strategy ensures continuous feature transmission while maintaining computational efficiency. Extensive experiments on both the MNIST dataset and custom mobile datasets demonstrate the effectiveness of our approach. Our model achieves a structural similarity index of 0.913 between predicted and actual sequences, surpassing current state-of-the-art methods by 1.2%. Furthermore, we demonstrate superior performance in long-term prediction stability, with consistent accuracy maintained across extended sequences. Notably, our model reduces training time by 9.5% compared to existing approaches while maintaining higher prediction accuracy. These results establish new benchmarks for handwritten digit video prediction and provide practical solutions for real-world applications in digital education, document processing, and real-time handwriting recognition systems.
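
The key architectural change, replacing the linear (fully connected) operations in the GRU gates with convolutions so that spatial structure is preserved, can be illustrated with a single ConvGRU cell. A minimal PyTorch sketch (channel counts and kernel size are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """GRU cell whose gate transforms are 2D convolutions instead of linear layers."""
    def __init__(self, in_ch: int, hid_ch: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)  # update + reset
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)       # candidate state
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = torch.zeros(x.size(0), self.hid_ch, x.size(2), x.size(3), device=x.device)
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], dim=1))), 2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Roll the cell over a short frame sequence (batch, time, channels, H, W).
cell = ConvGRUCell(in_ch=1, hid_ch=16)
frames = torch.randn(2, 5, 1, 64, 64)
h = None
for t in range(frames.size(1)):
    h = cell(frames[:, t], h)
print(h.shape)  # torch.Size([2, 16, 64, 64])
```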

20 pages, 10271 KiB  
Article
High-Frequency Workpiece Image Recognition Model Based on Hybrid Attention Mechanism
by Jiaqi Deng, Chenglong Sun, Xin Liu, Gang Du, Liangzhong Jiang and Xu Yang
Appl. Sci. 2025, 15(1), 94; https://doi.org/10.3390/app15010094 - 26 Dec 2024
Abstract
High-frequency workpieces are specialized items characterized by complex internal textures and minimal variance in properties. Under intricate lighting conditions, existing mainstream image recognition models struggle with low precision when applied to the identification of high-frequency workpiece images. This paper introduces a high-frequency workpiece image recognition model based on a hybrid attention mechanism, HAEN. Initially, the high-frequency workpiece dataset is enhanced through geometric transformations, random noise, and random lighting adjustments to augment the model’s generalization capabilities. Subsequently, lightweight convolution, including one-dimensional and dilated convolutions, is employed to enhance convolutional attention and reduce the model’s parameter count, extracting original image features with robustness to strong lighting and mitigating the impact of lighting conditions on recognition outcomes. Finally, lightweight re-estimation attention modules are integrated at various model levels to reassess spatial information in feature maps and enhance the model’s representation of depth channel features. Experimental results demonstrate that the proposed model effectively extracts features from high-frequency workpiece images under complex lighting, outperforming existing models in image classification tasks with a precision of 97.23%.
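
The first stage, augmenting the workpiece dataset with geometric transformations, random noise, and random lighting adjustments, can be expressed as a small augmentation pipeline. A hedged torchvision sketch (the specific transforms and ranges are illustrative choices, not the paper's):

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add zero-mean Gaussian noise to an image tensor (a stand-in for the paper's random-noise step)."""
    def __init__(self, std: float = 0.02):
        self.std = std

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                 # geometric transformation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # geometric transformation
    transforms.ColorJitter(brightness=0.4, contrast=0.3),  # random lighting adjustment
    transforms.ToTensor(),
    AddGaussianNoise(std=0.02),                            # random noise
])
```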

18 pages, 4253 KiB  
Article
RSTSRN: Recursive Swin Transformer Super-Resolution Network for Mars Images
by Fanlu Wu, Xiaonan Jiang, Tianjiao Fu, Yao Fu, Dongdong Xu and Chunlei Zhao
Appl. Sci. 2024, 14(20), 9286; https://doi.org/10.3390/app14209286 - 12 Oct 2024
Abstract
High-resolution optical images will provide planetary geology researchers with finer, more detailed image data. In order to maximize scientific output, it is necessary to further increase the resolution of acquired images, so image super-resolution (SR) reconstruction techniques have become the best choice. To address the large parameter counts and high computational complexity of current deep learning-based image SR reconstruction methods, we propose a novel Recursive Swin Transformer Super-Resolution Network (RSTSRN) for image SR. The RSTSRN improves upon the LapSRN, which we use as our backbone architecture. A Residual Swin Transformer Block (RSTB) is used for more efficient residual learning, which consists of stacked Swin Transformer Blocks (STBs) with a residual connection. Moreover, the idea of parameter sharing was introduced to reduce the number of parameters, and a multi-scale training strategy was designed to accelerate convergence speed. Experimental results show that the proposed RSTSRN achieves performance superior to that of state-of-the-art methods with similar parameter counts on 2×, 4× and 8× SR tasks, and its advantage is especially pronounced on high-magnification SR tasks. Compared to the LapSRN network, for 2×, 4× and 8× Mars image SR tasks, the RSTSRN network has increased PSNR values by 0.35 dB, 0.88 dB and 1.22 dB, and SSIM values by 0.0048, 0.0114 and 0.0311, respectively.
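
Two of the ideas here, LapSRN-style progressive reconstruction and parameter sharing across pyramid levels, can be sketched by reusing a single feature block at every 2× stage. The block below is a plain convolutional stand-in for the Residual Swin Transformer Block, since a full Swin implementation is beyond a short example:

```python
import torch
import torch.nn as nn

class SharedResidualBlock(nn.Module):
    """Stand-in for the RSTB: a small residual feature extractor reused at every pyramid level."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ProgressiveSR(nn.Module):
    """Laplacian-pyramid style SR: the same (shared) block and upsampler are applied at each 2x level."""
    def __init__(self, ch: int = 32, levels: int = 3):   # 3 levels -> up to 8x upscaling
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)
        self.block = SharedResidualBlock(ch)              # parameters shared across levels
        self.up = nn.Sequential(nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2))
        self.tail = nn.Conv2d(ch, 1, 3, padding=1)
        self.levels = levels

    def forward(self, x):
        feat = self.head(x)
        outputs = []
        for _ in range(self.levels):
            feat = self.up(self.block(feat))
            outputs.append(self.tail(feat))               # 2x, 4x, 8x reconstructions
        return outputs

sr = ProgressiveSR()
print([o.shape for o in sr(torch.randn(1, 1, 32, 32))])
```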

20 pages, 5228 KiB  
Article
Remote Sensing Image Change Detection Based on Deep Learning: Multi-Level Feature Cross-Fusion with 3D-Convolutional Neural Networks
by Sibo Yu, Chen Tao, Guang Zhang, Yubo Xuan and Xiaodong Wang
Appl. Sci. 2024, 14(14), 6269; https://doi.org/10.3390/app14146269 - 18 Jul 2024
Abstract
Change detection (CD) in high-resolution remote sensing imagery remains challenging due to the complex nature of objects and varying spectral characteristics across different times and locations. Convolutional neural networks (CNNs) have shown promising performance in CD tasks by extracting meaningful semantic features. However, traditional 2D-CNNs may struggle to accurately integrate deep features from multi-temporal images, limiting their ability to improve CD accuracy. This study proposes a Multi-level Feature Cross-Fusion (MFCF) network with 3D-CNNs for remote sensing image change detection. The network aims to effectively extract and fuse deep features from multi-temporal images to identify surface changes. To bridge the semantic gap between high-level and low-level features, a MFCF module is introduced. A channel attention mechanism (CAM) is also integrated to enhance model performance, interpretability, and generalization capabilities. The proposed methodology is validated on the LEVIR change detection dataset (LEVIR-CD). The experimental results demonstrate superior performance compared to the current state of the art on evaluation metrics including recall, F1 score, and IoU. The MFCF network, which combines 3D-CNNs and a CAM, effectively utilizes multi-temporal information and deep feature fusion, resulting in precise and reliable change detection in remote sensing imagery. This study significantly contributes to the advancement of change detection methods, facilitating more efficient management and decision making across various domains such as urban planning, natural resource management, and environmental monitoring.
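
Channel attention mechanisms of the kind mentioned here usually follow a squeeze-and-excitation pattern: globally pool each channel, pass the result through a small bottleneck MLP, and reweight the channels with the resulting sigmoid scores. A minimal PyTorch sketch of such a module (the paper's exact CAM variant may differ):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: reweight channels by globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C, H, W) feature map
        w = self.fc(x.mean(dim=(2, 3)))            # squeeze: global average pool -> (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)   # excite: per-channel reweighting

feat = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```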

18 pages, 3407 KiB  
Article
GenTrajRec: A Graph-Enhanced Trajectory Recovery Model Based on Signaling Data
by Hongyao Huang, Haozhi Xie, Zihang Xu, Mingzhe Liu, Yi Xu and Tongyu Zhu
Appl. Sci. 2024, 14(13), 5934; https://doi.org/10.3390/app14135934 - 8 Jul 2024
Abstract
Signaling data are records of the interactions of users’ mobile phones with their nearest cellular stations, which can provide long-term, continuous-time location data for large numbers of citizens, and therefore have great potential in intelligent transportation, smart cities, and urban sensing. However, utilizing the raw signaling data often suffers from two problems: (1) Low positioning accuracy. Since the signaling data only describe the interaction between the user and the mobile base station, they can only restore users’ approximate geographical location. (2) Poor data quality. Due to the limitations of mobile signals, user signaling may be missing or drift. To address the above issues, we propose a graph-enhanced trajectory recovery network, GenTrajRec, to recover precise trajectories from signaling data. GenTrajRec encodes signaling data through spatiotemporal encoders and enhances the traveling semantics by constructing a signaling transition graph. By fusing the spatiotemporal information with the deep traveling semantics, GenTrajRec can effectively tackle the challenge of poor data quality and recover precise trajectories from raw signaling data. Extensive experiments have been conducted on two real-world datasets from Mobile Signaling and Geolife, and the results confirm the effectiveness of our approach: the positioning accuracy of the signaling data is improved from 315 m per point to 82 m per point using our network.
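
The signaling transition graph described here can be built by counting how often users move between consecutive cell stations. A small sketch with networkx, assuming each record is a (user_id, timestamp, station_id) tuple (the record format is an assumption):

```python
from collections import defaultdict
import networkx as nx

def build_transition_graph(records):
    """Build a weighted directed graph of cell-station transitions from signaling records.

    records: iterable of (user_id, timestamp, station_id) tuples, in any order.
    """
    by_user = defaultdict(list)
    for user, ts, station in records:
        by_user[user].append((ts, station))

    graph = nx.DiGraph()
    for visits in by_user.values():
        visits.sort()                                   # order each user's records by time
        for (_, a), (_, b) in zip(visits, visits[1:]):
            if a != b:                                  # count transitions between distinct stations
                w = graph.get_edge_data(a, b, default={"weight": 0})["weight"]
                graph.add_edge(a, b, weight=w + 1)
    return graph

g = build_transition_graph([("u1", 0, "A"), ("u1", 5, "B"), ("u1", 9, "B"), ("u1", 12, "C"),
                            ("u2", 1, "A"), ("u2", 4, "B")])
print(list(g.edges(data=True)))  # [('A', 'B', {'weight': 2}), ('B', 'C', {'weight': 1})]
```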

15 pages, 4267 KiB  
Article
Steel Surface Defect Detection Algorithm Based on Improved YOLOv8n
by Tian Zhang, Pengfei Pan, Jie Zhang and Xiaochen Zhang
Appl. Sci. 2024, 14(12), 5325; https://doi.org/10.3390/app14125325 - 20 Jun 2024
Abstract
The traditional detection methods of steel surface defects have some problems, such as a lack of feature extraction ability, sluggish detection speed, and subpar detection performance. In this paper, a YOLOv8-based DDI-YOLO model is proposed for effective steel surface defect detection. First, in the backbone network, the extended residual module (DWR) is fused with the C2f module to obtain C2f_DWR; a two-step approach is used to effectively extract multiscale contextual information, and the feature maps formed from the multiscale receptive fields are then fused to enhance the capacity for feature extraction. Building on this, an extended heavy-parameter module (DRB) is added to the C2f_DWR structure to compensate for C2f’s limited ability to capture small-scale defect patterns during training and to enhance the training fluency of the model. Finally, the Inner-IoU loss function is employed to enhance the regression accuracy and training speed of the model. The experimental results show that, on the NEU-DET dataset, DDI-YOLO improves mAP by 2.4%, accuracy by 3.3%, and FPS by 59 frames/s compared with the original YOLOv8n. Therefore, this paper’s proposed model offers superior mAP, accuracy, and FPS in identifying surface defects in steel.
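
The Inner-IoU loss mentioned here computes IoU on auxiliary boxes obtained by scaling the predicted and ground-truth boxes about their centers by a ratio, which adjusts the gradient behavior of the regression loss. A simplified sketch of that general idea (the ratio and exact formulation are not necessarily the paper's settings):

```python
import torch

def inner_iou(pred, target, ratio: float = 0.7, eps: float = 1e-7):
    """IoU between center-scaled ('inner') versions of predicted and target boxes.

    pred, target: (N, 4) tensors in (x1, y1, x2, y2) format; ratio shrinks each box about its center.
    """
    def scale(boxes):
        cx, cy = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
        w, h = (boxes[:, 2] - boxes[:, 0]) * ratio, (boxes[:, 3] - boxes[:, 1]) * ratio
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

    p, t = scale(pred), scale(target)
    lt = torch.maximum(p[:, :2], t[:, :2])               # intersection top-left corner
    rb = torch.minimum(p[:, 2:], t[:, 2:])               # intersection bottom-right corner
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    return inter / (area_p + area_t - inter + eps)

loss = 1 - inner_iou(torch.tensor([[0., 0., 10., 10.]]), torch.tensor([[2., 2., 12., 12.]]))
print(loss)
```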

20 pages, 10215 KiB  
Article
RANDnet: Vehicle Re-Identification with Relation Attention and Nuance–Disparity Masks
by Yang Huang, Hao Sheng and Wei Ke
Appl. Sci. 2024, 14(11), 4929; https://doi.org/10.3390/app14114929 - 6 Jun 2024
Abstract
Vehicle re-identification (vehicle ReID) is designed to recognize all instances of a specific vehicle across various camera viewpoints, facing significant challenges such as high similarity among different vehicles from the same viewpoint and substantial variance for the same vehicle across different viewpoints. In this paper, we introduce RANDnet, which is equipped with relation attention mechanisms and nuance and disparity masks to tackle these issues effectively. The disparity mask specifically targets the automatic suppression of irrelevant foreground and background noise, while the nuance mask reveals less obvious, sub-discriminative regions to enhance the overall feature robustness. Additionally, our relation attention module, which incorporates an advanced transformer architecture, significantly reduces intra-class distances, thereby improving the accuracy of vehicle identification across diverse viewpoints. The performance of our approach has been thoroughly evaluated on widely recognized datasets such as VeRi-776 and VehicleID, where it demonstrates superior effectiveness and competes robustly with other leading methods.

18 pages, 5405 KiB  
Article
Expressway Vehicle Trajectory Prediction Based on Fusion Data of Trajectories and Maps from Vehicle Perspective
by Yuning Duan, Jingdong Jia, Yuhui Jin, Haitian Zhang and Jian Huang
Appl. Sci. 2024, 14(10), 4181; https://doi.org/10.3390/app14104181 - 15 May 2024
Abstract
Research on vehicle trajectory prediction based on road monitoring video data often utilizes a global map as an input, disregarding the fact that drivers rely on the road structures observable from their own positions for path planning. This oversight reduces the accuracy of prediction. To address this, we propose the CVAE-VGAE model, a novel trajectory prediction approach. Initially, our method transforms global perspective map data into vehicle-centric map data, representing it through a graph structure. Subsequently, Variational Graph Auto-Encoders (VGAEs), an unsupervised learning framework tailored for graph-structured data, are employed to extract road environment features specific to each vehicle’s location from the map data. Finally, a prediction network based on the Conditional Variational Autoencoder (CVAE) structure is designed, which first predicts the driving endpoint and then fits the complete future trajectory. The proposed CVAE-VGAE model integrates a self-attention mechanism into its encoding and decoding modules to infer endpoint intent and incorporate road environment features for precise trajectory prediction. Through a series of ablation experiments, we demonstrate the efficacy of our method in enhancing vehicle trajectory prediction metrics. Furthermore, we compare our model with traditional and frontier approaches, highlighting significant improvements in prediction accuracy.
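
The first step, converting global map data into vehicle-centric data, essentially translates map points to the vehicle's position and rotates them into its heading frame before the graph is built. A minimal NumPy sketch of that coordinate change (an illustration of the idea, not the paper's exact preprocessing):

```python
import numpy as np

def to_vehicle_frame(points_xy: np.ndarray, vehicle_xy: np.ndarray, heading_rad: float) -> np.ndarray:
    """Transform global 2D map points into a vehicle-centric frame.

    points_xy: (N, 2) global coordinates; vehicle_xy: (2,) vehicle position; heading_rad: vehicle yaw.
    In the returned frame the vehicle sits at the origin with its heading along +x.
    """
    c, s = np.cos(-heading_rad), np.sin(-heading_rad)
    rotation = np.array([[c, -s], [s, c]])
    return (points_xy - vehicle_xy) @ rotation.T

lane_points = np.array([[10.0, 7.0], [10.0, 9.0]])
ego = to_vehicle_frame(lane_points, vehicle_xy=np.array([10.0, 5.0]), heading_rad=np.pi / 2)
print(ego.round(3))  # points ahead of a north-facing vehicle land on the +x axis
```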

23 pages, 2717 KiB  
Article
Why Not Both? An Attention-Guided Transformer with Pixel-Related Deconvolution Network for Face Super-Resolution
by Zhe Zhang and Chun Qi
Appl. Sci. 2024, 14(9), 3793; https://doi.org/10.3390/app14093793 - 29 Apr 2024
Abstract
Transformer-based encoder-decoder networks for face super-resolution (FSR) have achieved promising success in delivering stunningly clear and detailed facial images by capturing local and global dependencies. However, these methods have certain limitations. Specifically, the deconvolution in upsampling layers neglects the relationship between adjacent pixels, which is crucial in facial structure reconstruction. Additionally, raw feature maps are fed to the transformer blocks directly without mining their potential feature information, resulting in suboptimal face images. To circumvent these problems, we propose an attention-guided transformer with a pixel-related deconvolution network for FSR. Firstly, we devise a novel Attention-Guided Transformer Module (AGTM), which is composed of an Attention-Guiding Block (AGB) and a Channel-wise Multi-head Transformer Block (CMTB). AGTM at the top of the encoder-decoder network (AGTM-T) promotes both local facial details and global facial structures, while AGTM at the bottleneck side (AGTM-B) optimizes the encoded features. Secondly, a Pixel-Related Deconvolution (PRD) layer is specially designed to establish direct relationships among adjacent pixels in the upsampling process. Lastly, we develop a Multi-scale Feature Fusion Module (MFFM) to fuse multi-scale features for better network flexibility and reconstruction results. Quantitative and qualitative experimental results on various datasets demonstrate that the proposed method outperforms other state-of-the-art FSR methods.

16 pages, 8740 KiB  
Article
Dynamic Downsampling Algorithm for 3D Point Cloud Map Based on Voxel Filtering
by Wenqi Lyu, Wei Ke, Hao Sheng, Xiao Ma and Huayun Zhang
Appl. Sci. 2024, 14(8), 3160; https://doi.org/10.3390/app14083160 - 9 Apr 2024
Abstract
In response to the challenge of handling large-scale 3D point cloud data, downsampling is a common approach, yet it often leads to the problem of feature loss. We present a dynamic downsampling algorithm for 3D point cloud maps based on an improved voxel filtering approach. The algorithm consists of two modules, namely, dynamic downsampling and point cloud edge extraction. The former adapts voxel downsampling according to the features of the point cloud, while the latter preserves edge information within the 3D point cloud map. Comparative experiments with voxel downsampling, grid downsampling, clustering-based downsampling, random downsampling, uniform downsampling, and farthest-point downsampling were conducted. The proposed algorithm exhibited favorable downsampling simplification results, with a processing time of 0.01289 s and a simplification rate of 91.89%. Additionally, it demonstrated faster downsampling speed and improved overall performance, underscoring the efficiency and practicality of the approach.
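
Plain voxel filtering, the baseline that the dynamic algorithm adapts, quantizes points into a regular grid and keeps one representative per occupied voxel, here the centroid. A compact NumPy sketch (the voxel size is illustrative):

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points falling in the same voxel with their centroid.

    points: (N, 3) array of xyz coordinates.
    """
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)          # integer voxel coordinates
    _, inverse, counts = np.unique(voxel_idx, axis=0, return_inverse=True, return_counts=True)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)                               # sum the points in each voxel
    return centroids / counts[:, None]                                  # centroid = sum / count

cloud = np.random.rand(100_000, 3) * 10.0
down = voxel_downsample(cloud, voxel_size=0.5)
print(cloud.shape[0], "->", down.shape[0])
```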

12 pages, 3423 KiB  
Article
AI Somatotype System Using 3D Body Images: Based on Deep-Learning and Transfer Learning
by Jiwun Yoon, Sang-Yong Lee and Ji-Yong Lee
Appl. Sci. 2024, 14(6), 2608; https://doi.org/10.3390/app14062608 - 20 Mar 2024
Abstract
Humans share a similar body structure, but each individual possesses unique characteristics, which we define as one’s body type. Various classification methods have been devised to understand and assess these body types. Recent research has applied artificial intelligence technology utilizing noninvasive measurement tools, such as 3D body scanners, which minimize physical contact. The purpose of this study was to develop an artificial intelligence somatotype system capable of predicting the three body types proposed by Heath-Carter’s somatotype theory using 3D body images collected with a 3D body scanner. To classify body types, measurements were taken to determine the three somatotype components (endomorphy, mesomorphy, and ectomorphy). MobileNetV2 was utilized as the transfer learning model. The results of this study are as follows. First, the AI somatotype model showed good performance, with a training accuracy of around 91% and a validation accuracy of around 72%. The respective loss values were 0.26 for the training set and 0.69 for the validation set. Second, validation of the model’s performance using test data resulted in accurate predictions for 18 out of 21 new data points, with prediction errors occurring in three cases, indicating approximately 85% classification accuracy. This study provides foundational data for subsequent research aiming to predict 13 detailed body types across the three body types. Furthermore, it is hoped that the outcomes of this research can be applied in practical settings, enabling anyone with a smartphone camera to identify various body types based on captured images and predict obesity and diseases.
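
The transfer-learning setup described, a pretrained MobileNetV2 backbone with a new head predicting the three somatotype classes, can be sketched in a few lines. A torchvision version (the paper does not state its framework, so this is an illustrative equivalent):

```python
import torch.nn as nn
from torchvision import models

def build_somatotype_model(num_classes: int = 3, freeze_backbone: bool = True) -> nn.Module:
    """MobileNetV2 pretrained on ImageNet with a new classification head for 3 somatotype classes."""
    model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        for p in model.features.parameters():   # keep the pretrained feature extractor fixed
            p.requires_grad = False
    model.classifier[1] = nn.Linear(model.last_channel, num_classes)   # replace the 1000-way head
    return model

model = build_somatotype_model()
```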

15 pages, 563 KiB  
Article
Camouflaged Object Detection Based on Deep Learning with Attention-Guided Edge Detection and Multi-Scale Context Fusion
by Yalin Wen, Wei Ke and Hao Sheng
Appl. Sci. 2024, 14(6), 2494; https://doi.org/10.3390/app14062494 - 15 Mar 2024
Abstract
In nature, objects that use camouflage have features like colors and textures that closely resemble their background. This creates visual illusions that help them hide and protect themselves from predators. This similarity also makes the task of detecting camouflaged objects very challenging. Methods for camouflaged object detection (COD), which rely on deep neural networks, are increasingly gaining attention. These methods focus on improving model performance and computational efficiency by extracting edge information and using multi-layer feature fusion. Our improvement is based on researching ways to enhance efficiency in the encode–decode process. We have developed a variant model that combines Swin Transformer (Swin-T) and EfficientNet-B7. This model integrates the strengths of both Swin-T and EfficientNet-B7, and it employs an attention-guided tracking module to efficiently extract edge information and identify objects in camouflaged environments. Additionally, we have incorporated dense skip links to enhance the aggregation of deep-level feature information. A boundary-aware attention module has been incorporated into the final layer of the initial shallow information recognition phase. This module utilizes the Fourier transform to quickly relay specific edge information from the initially obtained shallow semantics to subsequent stages, thereby more effectively achieving feature recognition and edge extraction. In the later phase, which is focused on deep semantic extraction, we employ a dense skip joint attention module to enhance the decoder’s performance and efficiency, ensuring accurate capture of deep-level information, feature recognition, and edge extraction. This module efficiently identifies the specifics and edge information of undetected camouflaged objects across channels and spaces. Differing from previous methods, we introduce an adaptive pixel strength loss function for handling key captured information. Compared to 26 previously proposed methods across 4 measurement metrics, our approach shows strong, competitive performance on three current benchmark datasets (CHAMELEON, CAMO, COD10K).
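
The boundary-aware module reportedly uses the Fourier transform to relay edge information forward quickly. One generic way to obtain edge-like detail in the frequency domain is a high-pass filter, since edges concentrate in high frequencies; the sketch below shows only that generic operation (the cutoff and filtering details are assumptions, not the paper's design):

```python
import numpy as np

def fft_highpass_edges(image: np.ndarray, cutoff_radius: int = 8) -> np.ndarray:
    """Suppress low frequencies of a grayscale image so that mostly edge-like detail remains.

    image: 2D float array; cutoff_radius: radius (in frequency bins) of the blocked low-frequency disk.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist2 = (yy - h // 2) ** 2 + (xx - w // 2) ** 2
    spectrum[dist2 <= cutoff_radius ** 2] = 0              # zero out the low-frequency disk
    edges = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
    return np.abs(edges)

img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0          # a synthetic square with strong edges
print(fft_highpass_edges(img).max() > fft_highpass_edges(np.zeros((64, 64))).max())  # True
```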

20 pages, 6853 KiB  
Article
GLBRF: Group-Based Lightweight Human Behavior Recognition Framework in Video Camera
by Young-Chan Lee, So-Yeon Lee, Byeongchang Kim and Dae-Young Kim
Appl. Sci. 2024, 14(6), 2424; https://doi.org/10.3390/app14062424 - 13 Mar 2024
Abstract
Behavioral recognition is an important technique for recognizing actions by analyzing human behavior. It is used in various fields, such as anomaly detection and health estimation. For this purpose, deep learning models are used to recognize and classify the features and patterns of each behavior. However, video-based behavior recognition models require a lot of computational power as they are trained using large datasets. Therefore, there is a need for a lightweight learning framework that can efficiently recognize various behaviors. In this paper, we propose a group-based lightweight human behavior recognition framework (GLBRF) that achieves both low computational burden and high accuracy in video-based behavior recognition. The GLBRF system utilizes a relatively small dataset to reduce computational cost using a 2D CNN model and improves behavior recognition accuracy by applying location-based grouping to recognize interaction behaviors between people. This enables efficient recognition of multiple behaviors in various services. With grouping, the accuracy was as high as 98%, while without grouping, the accuracy was relatively low at 68%.
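
Location-based grouping, assigning people who stand close together to the same group so that their interactions are recognized jointly, can be approximated by linking detections whose centroids fall within a distance threshold. A small union-find sketch of such a grouping step (the threshold and input format are assumptions):

```python
from itertools import combinations

def group_by_location(centroids, max_dist: float = 1.5):
    """Group person centroids (x, y) whose pairwise distance is below max_dist (transitively linked)."""
    parent = list(range(len(centroids)))

    def find(i):                          # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(centroids)), 2):
        (xi, yi), (xj, yj) = centroids[i], centroids[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2:
            parent[find(i)] = find(j)     # merge the two groups

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

print(group_by_location([(0, 0), (1, 0), (5, 5), (5.5, 5.2)]))  # [[0, 1], [2, 3]]
```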

20 pages, 7751 KiB  
Article
SCGFormer: Semantic Chebyshev Graph Convolution Transformer for 3D Human Pose Estimation
by Jiayao Liang and Mengxiao Yin
Appl. Sci. 2024, 14(4), 1646; https://doi.org/10.3390/app14041646 - 18 Feb 2024
Abstract
With the rapid advancement of deep learning, 3D human pose estimation has largely freed itself from reliance on manually annotated methods. The effective utilization of joint features has become significant, as effectively leveraging 2D human joint information can improve the accuracy of 3D human skeleton prediction. In this paper, we propose the SCGFormer model to reduce the error in predicting human skeletal poses in three-dimensional space. The network architecture of SCGFormer encompasses Transformer and two distinct types of graph convolution, organized into two interconnected modules: SGraAttention and AcChebGconv. SGraAttention extracts global feature information from each 2D human joint, thereby augmenting local feature learning by integrating prior knowledge of human joint relationships. Simultaneously, AcChebGconv broadens the receptive field for graph structure information and constructs implicit joint relationships to aggregate more valuable adjacent features. SCGFormer is tested on widely recognized benchmark datasets such as Human3.6M and MPI-INF-3DHP and achieves excellent results. In particular, on Human3.6M, our method achieves the best results in 9 actions (out of a total of 15 actions), with an overall average error reduction of about 1.5 points compared to state-of-the-art methods, demonstrating the excellent performance of SCGFormer.
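
AcChebGconv builds on Chebyshev graph convolution, which approximates a spectral graph filter with Chebyshev polynomials of the rescaled Laplacian via the recurrence T_k(L~)x = 2 L~ T_{k-1}(L~)x - T_{k-2}(L~)x. A compact NumPy sketch of a K-order Chebyshev propagation over joint features (the graph and weights are illustrative, not the paper's):

```python
import numpy as np

def cheb_graph_conv(x, adj, weights):
    """K-order Chebyshev graph convolution: y = sum_k T_k(L_tilde) @ x @ W_k.

    x: (N, F_in) node features; adj: (N, N) adjacency; weights: list of K (F_in, F_out) matrices.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    laplacian = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt   # symmetric normalized Laplacian
    l_tilde = laplacian - np.eye(len(adj))                         # rescaled, assuming lambda_max ~ 2

    t_prev, t_curr = x, l_tilde @ x                                # T_0 x = x, T_1 x = L_tilde x
    out = t_prev @ weights[0] + t_curr @ weights[1]
    for k in range(2, len(weights)):
        t_prev, t_curr = t_curr, 2 * l_tilde @ t_curr - t_prev     # Chebyshev recurrence
        out = out + t_curr @ weights[k]
    return out

# Toy skeleton graph with 4 joints, 2-dim input features, 3rd-order filter.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
print(cheb_graph_conv(rng.random((4, 2)), adj, [rng.random((2, 8)) for _ in range(3)]).shape)  # (4, 8)
```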
