

Computer Vision and Pattern Recognition: Advanced Techniques and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 May 2025) | Viewed by 10221

Special Issue Editors


Guest Editor
Division of Freight, Transit, and Heavy Vehicle Safety, Virginia Tech Transportation Institute, Blacksburg, VA 24061, USA
Interests: statistical data analysis; statistical modeling; computer vision; machine learning; deep learning; signal processing; affective computing

Guest Editor
Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA
Interests: computer vision; image processing; biometrics; sensing for autonomous vehicles

Special Issue Information

Dear Colleagues,

We are thrilled to announce a Special Issue of Applied Sciences titled “Computer Vision and Pattern Recognition: Advanced Techniques and Applications”. Computer vision and pattern recognition are driving transformative advances across many domains, from healthcare and autonomous vehicles to robotics and augmented reality. The field continues to evolve through innovations in sensors, algorithms, and model architectures; the last few years alone have seen advances in vision transformers, foundational models, 3D scene understanding, explainability, and self-supervised learning. These advances have the potential to make positive impacts in related fields. This Special Issue seeks to showcase the most innovative and impactful research in this rapidly evolving landscape.

We welcome contributions that bridge the gap between computer vision and other domains, fostering interdisciplinary collaboration and driving real-world applications. We invite submissions on a broad range of topics, including but not limited to:

  • Deep learning for computer vision;
  • Object detection and recognition;
  • Image and video analysis;
  • 3D vision and reconstruction;
  • Scene understanding and segmentation;
  • Sensor fusion for 3D scene understanding;
  • Pattern recognition and machine learning;
  • Robotics and vision-based navigation;
  • Medical imaging and healthcare applications;
  • Autonomous vehicles and drones;
  • Human–computer interactions;
  • Vision transformers and applications;
  • Foundational models and applications.

Dr. Abhijit Sarkar
Prof. Dr. Lynn Abbott
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • pattern recognition
  • 3D vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (8 papers)


Research

Jump to: Review

15 pages, 37521 KiB  
Article
Harnessing Spatial-Frequency Information for Enhanced Image Restoration
by Cheol-Hoon Park, Hyun-Duck Choi and Myo-Taeg Lim
Appl. Sci. 2025, 15(4), 1856; https://doi.org/10.3390/app15041856 - 11 Feb 2025
Viewed by 704
Abstract
Image restoration aims to recover high-quality, clear images from those that have suffered visibility loss due to various types of degradation. Numerous deep learning-based approaches for image restoration have shown substantial improvements. However, there are two notable limitations: (a) despite substantial spectral mismatches in the frequency domain between clean and degraded images, only a few approaches leverage information from the frequency domain; (b) variants of attention mechanisms have been proposed for high-resolution images in low-level vision tasks, but these methods still incur inherently high computational costs. To address these issues, we propose a Frequency-Aware Network (FreANet) for image restoration, which consists of two simple yet effective modules. We utilize a multi-branch/domain module that integrates latent features from the frequency and spatial domains using the discrete Fourier transform (DFT) and complex convolutional neural networks. Furthermore, we introduce a multi-scale pooling attention mechanism that employs average pooling along the row and column axes. We conducted extensive experiments on image restoration tasks, including defocus deblurring, motion deblurring, dehazing, and low-light enhancement. The proposed FreANet demonstrates remarkable results compared to previous approaches to these tasks.
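The multi-scale pooling attention described above, built on average pooling along the row and column axes, can be sketched as follows. This is a hypothetical NumPy simplification; the paper's actual module, learned weights, and fusion details are not reproduced here:

```python
import numpy as np

def axis_pooling_attention(feat):
    """Toy attention built from average pooling along row and column axes.

    feat: (H, W, C) feature map; returns a reweighted map of the same shape.
    """
    row_desc = feat.mean(axis=1, keepdims=True)  # (H, 1, C): pooled across columns
    col_desc = feat.mean(axis=0, keepdims=True)  # (1, W, C): pooled across rows
    # Broadcast the two 1-D descriptors back to (H, W, C) and gate with a sigmoid.
    attn = 1.0 / (1.0 + np.exp(-(row_desc + col_desc)))
    return feat * attn

feat = np.random.randn(8, 8, 4)
out = axis_pooling_attention(feat)
assert out.shape == feat.shape
```

The appeal of axis-wise pooling over full spatial attention is cost: the descriptors are linear in H + W rather than quadratic in the number of pixels, which matters for the high-resolution inputs the abstract mentions.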

16 pages, 3816 KiB  
Article
Automated Dead Chicken Detection in Poultry Farms Using Knowledge Distillation and Vision Transformers
by Ridip Khanal, Wenqin Wu and Joonwhoan Lee
Appl. Sci. 2025, 15(1), 136; https://doi.org/10.3390/app15010136 - 27 Dec 2024
Viewed by 1223
Abstract
Detecting dead chickens in broiler farms is critical for maintaining animal welfare and preventing disease outbreaks. This study presents an automated system that leverages CCTV footage to detect dead chickens, utilizing a two-step approach to improve detection accuracy and efficiency. First, stationary regions in the footage—likely representing dead chickens—are identified. Then, a deep learning classifier, enhanced through knowledge distillation, confirms whether the detected stationary object is indeed a chicken. EfficientNet-B0 is employed as the teacher model, while DeiT-Tiny functions as the student model, balancing high accuracy and computational efficiency. A dynamic frame selection strategy optimizes resource usage by adjusting monitoring intervals based on the chickens’ age, ensuring real-time performance in resource-constrained environments. This method addresses key challenges such as the lack of explicit annotations for dead chickens, along with common farm issues like lighting variations, occlusions, cluttered backgrounds, chicken growth, and camera distortions. The experimental results demonstrate validation accuracies of 99.3% for the teacher model and 98.7% for the student model, with significant reductions in computational demands. The system’s robustness and scalability make it suitable for large-scale farm deployment, minimizing the need for labor-intensive manual inspections. Future work will explore integrating deep learning methods that incorporate temporal attention mechanisms and automated removal processes.
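The teacher–student setup above follows the standard knowledge distillation recipe: the student is trained against the teacher's temperature-softened predictions as well as the hard labels. A minimal NumPy sketch of that objective (generic Hinton-style distillation, not the authors' exact training code; `T` and `alpha` are illustrative hyperparameters):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence between
    temperature-softened teacher and student distributions."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(teacher || student), rescaled by T^2 to keep gradient magnitudes stable.
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1).mean() * T * T
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1.0 - alpha) * kd
```

The temperature softens both distributions so the student learns the teacher's relative ranking of wrong classes, not just the argmax; this is what lets a small model like DeiT-Tiny approach the accuracy of a larger teacher.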

19 pages, 12541 KiB  
Article
Advanced Hybrid Neural Networks for Accurate Recognition of the Extended Alphabet and Dynamic Signs in Mexican Sign Language (MSL)
by Arturo Lara-Cázares, Marco A. Moreno-Armendáriz and Hiram Calvo
Appl. Sci. 2024, 14(22), 10186; https://doi.org/10.3390/app142210186 - 6 Nov 2024
Viewed by 903
Abstract
The Mexican deaf community primarily uses Mexican Sign Language (MSL) for communication, but significant barriers arise when interacting with hearing individuals unfamiliar with the language. Learning MSL requires a substantial commitment of at least 18 months, which is often impractical for many hearing people. To address this gap, we present an MSL-to-Spanish translation system that facilitates communication through a spelling-based approach, enabling deaf individuals to convey any idea while simplifying the AI’s task by limiting the number of signs to be recognized. Unlike previous systems that focus exclusively on static signs for individual letters, our solution incorporates dynamic signs, such as “k”, “rr”, and “ll”, to better capture the nuances of MSL and enhance expressiveness. The proposed Hybrid Neural Network-based algorithm integrates these dynamic elements effectively, achieving an F1 score of 90.91%, precision of 91.25%, recall of 91.05%, and accuracy of 91.09% in the extended alphabet classification. These results demonstrate the system’s potential to improve accessibility and inclusivity for the Mexican deaf community.
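The reported F1, precision, recall, and accuracy are standard multi-class metrics. For reference, a generic macro-averaged computation looks like this (a sketch of the usual definitions, not the authors' evaluation code):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, F1, plus overall accuracy."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precs.append(p)
        recs.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return float(np.mean(precs)), float(np.mean(recs)), float(np.mean(f1s)), acc
```

Macro averaging weights every sign class equally, which is the appropriate choice when rare dynamic signs such as “rr” must not be drowned out by frequent static letters.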

16 pages, 23702 KiB  
Article
SMS-Net: Bridging the Gap Between High Accuracy and Low Computational Cost in Pose Estimation
by Won-Jun Noh, Ki-Ryum Moon and Byoung-Dai Lee
Appl. Sci. 2024, 14(22), 10143; https://doi.org/10.3390/app142210143 - 6 Nov 2024
Cited by 1 | Viewed by 1045
Abstract
Human pose estimation identifies and classifies key joints of the human body in images or videos. Existing pose estimation methods can precisely capture human movements in real time but require significant computational time and resources, which restricts their usage in specific conditions. Thus, we propose a lightweight pose estimation model—SMS-Net—based on the sequentially stacked structure of the hourglass network. The proposed model uses various lightweight techniques to enable high-speed pose estimation while requiring minimal storage space and computation. Specifically, a shuffle-gated block was introduced to reduce the computational load and number of parameters during the feature extraction process of the encoder composing each hourglass network. A multi-dilation block was used in the decoder to secure receptive fields of various scales without increasing the computational load. The performance of the proposed model was assessed on the MPII and Common Objects in Context (COCO) pose estimation datasets using standard performance metrics, and compared with state-of-the-art lightweight pose estimation models. Furthermore, an ablation study was performed to assess the impact of each module on network performance and efficiency. The results demonstrate that the proposed model achieved an improved balance between computational efficiency and performance compared to existing models in human pose estimation. Overall, the study findings can provide a basis for applications in computer vision technology.
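The multi-dilation block's claim, larger receptive fields at no extra computational cost, rests on dilated (atrous) convolution, where the kernel taps are spaced apart rather than adjacent. A 1-D NumPy illustration (the actual block's structure and weights are not given in the abstract):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution with holes: taps spaced `dilation` apart.

    The receptive field grows to (len(kernel) - 1) * dilation + 1 samples,
    while the weights and multiply-adds per output stay fixed at len(kernel).
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out
```

With dilation 1 this reduces to ordinary correlation; running parallel branches with dilations 1, 2, 4, and so on covers multiple scales, which is the idea behind the decoder described above.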

29 pages, 14445 KiB  
Article
The Development of a Prototype Solution for Detecting Wear and Tear in Pedestrian Crossings
by Gonçalo J. M. Rosa, João M. S. Afonso, Pedro D. Gaspar, Vasco N. G. J. Soares and João M. L. P. Caldeira
Appl. Sci. 2024, 14(15), 6462; https://doi.org/10.3390/app14156462 - 24 Jul 2024
Viewed by 1117
Abstract
Crosswalks play a fundamental role in road safety. However, over time, many suffer wear and tear that makes them difficult to see. This project presents a solution based on the use of computer vision techniques for identifying and classifying the level of wear on crosswalks. The proposed system uses a convolutional neural network (CNN) to analyze images of crosswalks, determining their wear status. The design includes a prototype system mounted on a vehicle, equipped with cameras and processing units to collect and analyze data in real time as the vehicle traverses traffic routes. The collected data are then transmitted to a web application for further analysis and reporting. The prototype was validated through extensive tests in a real urban environment, comparing its assessments with manual inspections conducted by experts. Results from these tests showed that the system could accurately classify crosswalk wear with a high degree of accuracy, demonstrating its potential for aiding maintenance authorities in efficiently prioritizing interventions.

15 pages, 6299 KiB  
Article
Study on a Landslide Segmentation Algorithm Based on Improved High-Resolution Networks
by Hui Sun, Shuguang Yang, Rui Wang and Kaixin Yang
Appl. Sci. 2024, 14(15), 6459; https://doi.org/10.3390/app14156459 - 24 Jul 2024
Cited by 1 | Viewed by 978
Abstract
Landslides are a kind of geological hazard with great destructive potential. When a landslide event occurs, a reliable landslide segmentation method is important for assessing the extent of the disaster and preventing secondary disasters. Although deep learning methods have been applied to improve the efficiency of landslide segmentation, some problems remain, such as poor segmentation caused by the similarity between old landslide areas and background features, and missed detections of small-scale landslides. To tackle these challenges, this paper proposes a high-resolution semantic segmentation algorithm for landslide scenes that enhances the accuracy of landslide segmentation and addresses the missed detection of small-scale landslides. The network is based on the high-resolution network (HR-Net) and integrates the efficient channel attention (ECA) mechanism to enhance the representation quality of the feature maps. Moreover, the primary backbone of the high-resolution network is further enhanced to extract deeper semantic information. To improve the network’s ability to perceive small-scale landslides, atrous spatial pyramid pooling (ASPP) with ECA modules is introduced. Furthermore, to address inadequate training and reduced accuracy arising from the unequal distribution of positive and negative samples, the network employs a combined loss function, which effectively supervises training. Finally, the Loess Plateau landslide dataset is enriched using a fractional-order-based image enhancement approach, and experimental comparisons on this enriched dataset evaluate the enhanced network’s performance. The experimental findings show that the proposed methodology achieves higher segmentation accuracy than other networks.
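Efficient channel attention (ECA), used above, replaces the fully connected layers of squeeze-and-excitation with a single small 1-D convolution across channels. A hypothetical NumPy sketch, with a fixed smoothing kernel standing in for the learned convolution weights:

```python
import numpy as np

def eca(feat, kernel=(0.25, 0.5, 0.25)):
    """Sketch of efficient channel attention on an (H, W, C) feature map.

    Global average pooling yields one descriptor per channel; a 1-D
    convolution over neighboring channels produces sigmoid gates that
    reweight the map. The kernel is a stand-in for learned weights.
    """
    kernel = np.asarray(kernel)
    k = len(kernel)
    c = feat.shape[-1]
    desc = feat.mean(axis=(0, 1))                 # (C,) channel descriptor
    padded = np.pad(desc, k // 2, mode='edge')    # same-length cross-channel conv
    mixed = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-mixed))           # gates in (0, 1)
    return feat * gate                            # broadcast over H and W
```

Because the cross-channel convolution has only a handful of weights regardless of C, ECA adds channel attention at almost no parameter cost, which is why it pairs well with an already heavy backbone such as HR-Net.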

14 pages, 1852 KiB  
Article
Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models
by Guangzi Zhang, Yulin Qian, Juntao Deng and Xingquan Cai
Appl. Sci. 2024, 14(8), 3338; https://doi.org/10.3390/app14083338 - 15 Apr 2024
Viewed by 2278
Abstract
Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As the demand for customized models grows, various methods have emerged to capture appearance features. However, the exploration of relations between entities, another crucial aspect of images, has been limited. This study focuses on enabling models to capture and generate high-level semantic images with specific relation concepts, which is a challenging task. To this end, we introduce the Inv-ReVersion framework, which uses inverse relations text expansion to separate the feature fusion of multiple entities in images. Additionally, we employ a weighted contrastive loss to emphasize part of speech, helping the model learn more abstract relation concepts. We also propose a high-frequency suppressor to reduce the time spent on learning low-frequency details, enhancing the model’s ability to generate image relations. Compared to existing baselines, our approach can more accurately generate relation concepts between entities without additional computational costs, especially in capturing abstract relation concepts.
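A high-frequency suppressor of the kind mentioned above can be understood as low-pass filtering in the Fourier domain: coefficients beyond a cutoff radius are zeroed so that less capacity is spent on fine detail. A toy NumPy version (the cutoff scheme and its integration into the diffusion pipeline are assumptions for illustration, not the paper's specification):

```python
import numpy as np

def suppress_high_frequencies(img, keep_ratio=0.25):
    """Zero Fourier coefficients outside a centered low-frequency disc."""
    spec = np.fft.fftshift(np.fft.fft2(img))     # DC moved to the array center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)    # distance from DC
    cutoff = keep_ratio * min(h, w) / 2
    spec[radius > cutoff] = 0.0                  # drop high-frequency content
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
```

Slowly varying structure (layout, coarse relations between entities) survives the filter, while texture-level detail is removed, matching the stated goal of steering learning toward relation concepts.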

Review

Jump to: Research

29 pages, 8544 KiB  
Review
Innovative Approaches to Traffic Anomaly Detection and Classification Using AI
by Borja Pérez, Mario Resino, Teresa Seco, Fernando García and Abdulla Al-Kaff
Appl. Sci. 2025, 15(10), 5520; https://doi.org/10.3390/app15105520 - 15 May 2025
Viewed by 449
Abstract
Video anomaly detection plays a crucial role in intelligent transportation systems by enhancing urban mobility and safety. This review provides a comprehensive analysis of recent advancements in artificial intelligence methods applied to traffic anomaly detection, including convolutional and recurrent neural networks (CNNs and RNNs), autoencoders, Transformers, generative adversarial networks (GANs), and multimodal large language models (MLLMs). We compare their performance across real-world applications, highlighting patterns such as the superiority of Transformer-based models in temporal context understanding and the growing use of multimodal inputs for robust detection. Key challenges identified include dependence on large labeled datasets, high computational costs, and limited model interpretability. The review outlines how recent research is addressing these issues through semi-supervised learning, model compression techniques, and explainable AI. We conclude with future directions focusing on scalable, real-time, and interpretable solutions for practical deployment.
