

Computer Vision and Pattern Recognition: Advanced Techniques and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 May 2025) | Viewed by 10221

Special Issue Editors


Guest Editor
Division of Freight, Transit, and Heavy Vehicle Safety, Virginia Tech Transportation Institute, Blacksburg, VA 24061, USA
Interests: statistical data analysis; statistical modeling; computer vision; machine learning; deep learning; signal processing; affective computing

Guest Editor
Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA
Interests: computer vision; image processing; biometrics; sensing for autonomous vehicles

Special Issue Information

Dear Colleagues,

We are thrilled to announce a Special Issue of Applied Sciences titled “Computer Vision and Pattern Recognition: Advanced Techniques and Applications”. Computer vision and pattern recognition are driving transformative advances across many domains, from healthcare and autonomous vehicles to robotics and augmented reality. The field continues to evolve through innovations in sensors, algorithms, and model architectures; the last few years alone have seen advances in vision transformers, foundational models, 3D scene understanding, explainability, and self-supervised learning. These advances have the potential to make positive impacts in related fields. This Special Issue seeks to showcase the most innovative and impactful research in this rapidly evolving landscape.

We welcome contributions that bridge the gap between computer vision and other domains, fostering interdisciplinary collaboration and driving real-world applications. We invite submissions on a broad range of topics, including but not limited to:

  • Deep learning for computer vision;
  • Object detection and recognition;
  • Image and video analysis;
  • 3D vision and reconstruction;
  • Scene understanding and segmentation;
  • Sensor fusion for 3D scene understanding;
  • Pattern recognition and machine learning;
  • Robotics and vision-based navigation;
  • Medical imaging and healthcare applications;
  • Autonomous vehicles and drones;
  • Human–computer interactions;
  • Vision transformers and applications;
  • Foundational models and applications.

Dr. Abhijit Sarkar
Prof. Dr. Lynn Abbott
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • pattern recognition
  • 3D vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (8 papers)


Research

Jump to: Review

15 pages, 37521 KiB  
Article
Harnessing Spatial-Frequency Information for Enhanced Image Restoration
by Cheol-Hoon Park, Hyun-Duck Choi and Myo-Taeg Lim
Appl. Sci. 2025, 15(4), 1856; https://doi.org/10.3390/app15041856 - 11 Feb 2025
Viewed by 704
Abstract
Image restoration aims to recover high-quality, clear images from those that have suffered visibility loss due to various types of degradation. Numerous deep learning-based approaches for image restoration have shown substantial improvements. However, there are two notable limitations: (a) despite substantial spectral mismatches in the frequency domain between clean and degraded images, only a few approaches leverage information from the frequency domain; (b) variants of attention mechanisms have been proposed for high-resolution images in low-level vision tasks, but these methods still incur inherently high computational costs. To address these issues, we propose a Frequency-Aware Network (FreANet) for image restoration, which consists of two simple yet effective modules. We utilize a multi-branch/domain module that integrates latent features from the frequency and spatial domains using the discrete Fourier transform (DFT) and complex convolutional neural networks. Furthermore, we introduce a multi-scale pooling attention mechanism that employs average pooling along the row and column axes. We conducted extensive experiments on image restoration tasks, including defocus deblurring, motion deblurring, dehazing, and low-light enhancement. The proposed FreANet demonstrates remarkable results compared to previous approaches to these tasks.
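The multi-scale pooling attention described above, built on average pooling along the row and column axes, can be sketched as follows. This is a hypothetical NumPy simplification; the paper's actual module, learned weights, and fusion details are not reproduced here:

```python
import numpy as np

def axis_pooling_attention(feat):
    """Toy attention built from average pooling along row and column axes.

    feat: (H, W, C) feature map; returns a reweighted map of the same shape.
    """
    row_desc = feat.mean(axis=1, keepdims=True)  # (H, 1, C): pooled across columns
    col_desc = feat.mean(axis=0, keepdims=True)  # (1, W, C): pooled across rows
    # Broadcast the two 1-D descriptors back to (H, W, C) and gate with a sigmoid.
    attn = 1.0 / (1.0 + np.exp(-(row_desc + col_desc)))
    return feat * attn

feat = np.random.randn(8, 8, 4)
out = axis_pooling_attention(feat)
assert out.shape == feat.shape
```

The appeal of axis-wise pooling over full spatial attention is cost: the descriptors are linear in H + W rather than quadratic in the number of pixels, which matters for the high-resolution inputs the abstract mentions.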

16 pages, 3816 KiB  
Article
Automated Dead Chicken Detection in Poultry Farms Using Knowledge Distillation and Vision Transformers
by Ridip Khanal, Wenqin Wu and Joonwhoan Lee
Appl. Sci. 2025, 15(1), 136; https://doi.org/10.3390/app15010136 - 27 Dec 2024
Viewed by 1223
Abstract
Detecting dead chickens in broiler farms is critical for maintaining animal welfare and preventing disease outbreaks. This study presents an automated system that leverages CCTV footage to detect dead chickens, utilizing a two-step approach to improve detection accuracy and efficiency. First, stationary regions in the footage—likely representing dead chickens—are identified. Then, a deep learning classifier, enhanced through knowledge distillation, confirms whether the detected stationary object is indeed a chicken. EfficientNet-B0 is employed as the teacher model, while DeiT-Tiny functions as the student model, balancing high accuracy and computational efficiency. A dynamic frame selection strategy optimizes resource usage by adjusting monitoring intervals based on the chickens’ age, ensuring real-time performance in resource-constrained environments. This method addresses key challenges such as the lack of explicit annotations for dead chickens, along with common farm issues like lighting variations, occlusions, cluttered backgrounds, chicken growth, and camera distortions. The experimental results demonstrate validation accuracies of 99.3% for the teacher model and 98.7% for the student model, with significant reductions in computational demands. The system’s robustness and scalability make it suitable for large-scale farm deployment, minimizing the need for labor-intensive manual inspections. Future work will explore integrating deep learning methods that incorporate temporal attention mechanisms and automated removal processes.
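The teacher–student setup above follows the standard knowledge distillation recipe: the student is trained against the teacher's temperature-softened predictions as well as the hard labels. A minimal NumPy sketch of that objective (generic Hinton-style distillation, not the authors' exact training code; `T` and `alpha` are illustrative hyperparameters):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence between
    temperature-softened teacher and student distributions."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(teacher || student), rescaled by T^2 to keep gradient magnitudes stable.
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1).mean() * T * T
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1.0 - alpha) * kd
```

The temperature softens both distributions so the student learns the teacher's relative ranking of wrong classes, not just the argmax; this is what lets a small model like DeiT-Tiny approach the accuracy of a larger teacher.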

19 pages, 12541 KiB  
Article
Advanced Hybrid Neural Networks for Accurate Recognition of the Extended Alphabet and Dynamic Signs in Mexican Sign Language (MSL)
by Arturo Lara-Cázares, Marco A. Moreno-Armendáriz and Hiram Calvo
Appl. Sci. 2024, 14(22), 10186; https://doi.org/10.3390/app142210186 - 6 Nov 2024
Viewed by 903
Abstract
The Mexican deaf community primarily uses Mexican Sign Language (MSL) for communication, but significant barriers arise when interacting with hearing individuals unfamiliar with the language. Learning MSL requires a substantial commitment of at least 18 months, which is often impractical for many hearing people. To address this gap, we present an MSL-to-Spanish translation system that facilitates communication through a spelling-based approach, enabling deaf individuals to convey any idea while simplifying the AI’s task by limiting the number of signs to be recognized. Unlike previous systems that focus exclusively on static signs for individual letters, our solution incorporates dynamic signs, such as “k”, “rr”, and “ll”, to better capture the nuances of MSL and enhance expressiveness. The proposed Hybrid Neural Network-based algorithm integrates these dynamic elements effectively, achieving an F1 score of 90.91%, precision of 91.25%, recall of 91.05%, and accuracy of 91.09% in the extended alphabet classification. These results demonstrate the system’s potential to improve accessibility and inclusivity for the Mexican deaf community.
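The reported F1, precision, recall, and accuracy are standard multi-class metrics. For reference, a generic macro-averaged computation looks like this (a sketch of the usual definitions, not the authors' evaluation code):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, F1, plus overall accuracy."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precs.append(p)
        recs.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return float(np.mean(precs)), float(np.mean(recs)), float(np.mean(f1s)), acc
```

Macro averaging weights every sign class equally, which is the appropriate choice when rare dynamic signs such as “rr” must not be drowned out by frequent static letters.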

16 pages, 23702 KiB  
Article
SMS-Net: Bridging the Gap Between High Accuracy and Low Computational Cost in Pose Estimation
by Won-Jun Noh, Ki-Ryum Moon and Byoung-Dai Lee
Appl. Sci. 2024, 14(22), 10143; https://doi.org/10.3390/app142210143 - 6 Nov 2024
Cited by 1 | Viewed by 1045
Abstract
Human pose estimation identifies and classifies key joints of the human body in images or videos. Existing pose estimation methods can precisely capture human movements in real time but require significant computational time and resources, which restricts their usage in specific conditions. Thus, we propose a lightweight pose estimation model—SMS-Net—based on the sequentially stacked structure of the hourglass network. The proposed model uses various lightweight techniques to enable high-speed pose estimation while requiring minimal storage space and computation. Specifically, a shuffle-gated block was introduced to reduce the computational load and number of parameters during the feature extraction process of the encoder composing each hourglass network. A multi-dilation block was used in the decoder to secure receptive fields of various scales without increasing the computational load. The performance of the proposed model was assessed on the MPII and Common Objects in Context (COCO) pose estimation datasets using standard performance metrics, and compared with state-of-the-art lightweight pose estimation models. Furthermore, an ablation study was performed to assess the impact of each module on network performance and efficiency. The results demonstrate that the proposed model achieved an improved balance between computational efficiency and performance compared to existing models in human pose estimation. Overall, the study findings can provide a basis for applications in computer vision technology.
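The multi-dilation block's claim, larger receptive fields at no extra computational cost, rests on dilated (atrous) convolution, where the kernel taps are spaced apart rather than adjacent. A 1-D NumPy illustration (the actual block's structure and weights are not given in the abstract):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution with holes: taps spaced `dilation` apart.

    The receptive field grows to (len(kernel) - 1) * dilation + 1 samples,
    while the weights and multiply-adds per output stay fixed at len(kernel).
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out
```

With dilation 1 this reduces to ordinary correlation; running parallel branches with dilations 1, 2, 4, and so on covers multiple scales, which is the idea behind the decoder described above.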

29 pages, 14445 KiB  
Article
The Development of a Prototype Solution for Detecting Wear and Tear in Pedestrian Crossings
by Gonçalo J. M. Rosa, João M. S. Afonso, Pedro D. Gaspar, Vasco N. G. J. Soares and João M. L. P. Caldeira
Appl. Sci. 2024, 14(15), 6462; https://doi.org/10.3390/app14156462 - 24 Jul 2024
Viewed by 1117
Abstract
Crosswalks play a fundamental role in road safety. However, over time, many suffer wear and tear that makes them difficult to see. This project presents a solution based on the use of computer vision techniques for identifying and classifying the level of wear on crosswalks. The proposed system uses a convolutional neural network (CNN) to analyze images of crosswalks, determining their wear status. The design includes a prototype system mounted on a vehicle, equipped with cameras and processing units to collect and analyze data in real time as the vehicle traverses traffic routes. The collected data are then transmitted to a web application for further analysis and reporting. The prototype was validated through extensive tests in a real urban environment, comparing its assessments with manual inspections conducted by experts. Results from these tests showed that the system could accurately classify crosswalk wear with a high degree of accuracy, demonstrating its potential for aiding maintenance authorities in efficiently prioritizing interventions.

15 pages, 6299 KiB  
Article
Study on a Landslide Segmentation Algorithm Based on Improved High-Resolution Networks
by Hui Sun, Shuguang Yang, Rui Wang and Kaixin Yang
Appl. Sci. 2024, 14(15), 6459; https://doi.org/10.3390/app14156459 - 24 Jul 2024
Cited by 1 | Viewed by 978
Abstract
Landslides are a kind of geological hazard with great destructive potential. When a landslide event occurs, a reliable landslide segmentation method is important for assessing the extent of the disaster and preventing secondary disasters. Although deep learning methods have been applied to improve the efficiency of landslide segmentation, some problems remain, such as poor segmentation caused by the similarity between old landslide areas and background features, and missed detections of small-scale landslides. To tackle these challenges, this paper proposes a high-resolution semantic segmentation algorithm for landslide scenes that enhances the accuracy of landslide segmentation and addresses the missed detection of small-scale landslides. The network is based on the high-resolution network (HR-Net) and integrates the efficient channel attention (ECA) mechanism to enhance the representation quality of the feature maps. Moreover, the primary backbone of the high-resolution network is further enhanced to extract deeper semantic information. To improve the network’s ability to perceive small-scale landslides, atrous spatial pyramid pooling (ASPP) with ECA modules is introduced. Furthermore, to address inadequate training and reduced accuracy arising from the unequal distribution of positive and negative samples, the network employs a combined loss function, which effectively supervises training. Finally, the Loess Plateau landslide dataset is enriched using a fractional-order-based image enhancement approach, and experimental comparisons on this enriched dataset evaluate the enhanced network’s performance. The experimental findings show that the proposed methodology achieves higher segmentation accuracy than other networks.
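Efficient channel attention (ECA), used above, replaces the fully connected layers of squeeze-and-excitation with a single small 1-D convolution across channels. A hypothetical NumPy sketch, with a fixed smoothing kernel standing in for the learned convolution weights:

```python
import numpy as np

def eca(feat, kernel=(0.25, 0.5, 0.25)):
    """Sketch of efficient channel attention on an (H, W, C) feature map.

    Global average pooling yields one descriptor per channel; a 1-D
    convolution over neighboring channels produces sigmoid gates that
    reweight the map. The kernel is a stand-in for learned weights.
    """
    kernel = np.asarray(kernel)
    k = len(kernel)
    c = feat.shape[-1]
    desc = feat.mean(axis=(0, 1))                 # (C,) channel descriptor
    padded = np.pad(desc, k // 2, mode='edge')    # same-length cross-channel conv
    mixed = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-mixed))           # gates in (0, 1)
    return feat * gate                            # broadcast over H and W
```

Because the cross-channel convolution has only a handful of weights regardless of C, ECA adds channel attention at almost no parameter cost, which is why it pairs well with an already heavy backbone such as HR-Net.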

14 pages, 1852 KiB  
Article
Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models
by Guangzi Zhang, Yulin Qian, Juntao Deng and Xingquan Cai
Appl. Sci. 2024, 14(8), 3338; https://doi.org/10.3390/app14083338 - 15 Apr 2024
Viewed by 2278
Abstract
Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As the demand for customized models grows, various methods have emerged to capture appearance features. However, the exploration of relations between entities, another crucial aspect of images, has been limited. This study focuses on enabling models to capture and generate high-level semantic images with specific relation concepts, which is a challenging task. To this end, we introduce the Inv-ReVersion framework, which uses inverse relations text expansion to separate the feature fusion of multiple entities in images. Additionally, we employ a weighted contrastive loss to emphasize part of speech, helping the model learn more abstract relation concepts. We also propose a high-frequency suppressor to reduce the time spent on learning low-frequency details, enhancing the model’s ability to generate image relations. Compared to existing baselines, our approach can more accurately generate relation concepts between entities without additional computational costs, especially in capturing abstract relation concepts.
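A high-frequency suppressor of the kind mentioned above can be understood as low-pass filtering in the Fourier domain: coefficients beyond a cutoff radius are zeroed so that less capacity is spent on fine detail. A toy NumPy version (the cutoff scheme and its integration into the diffusion pipeline are assumptions for illustration, not the paper's specification):

```python
import numpy as np

def suppress_high_frequencies(img, keep_ratio=0.25):
    """Zero Fourier coefficients outside a centered low-frequency disc."""
    spec = np.fft.fftshift(np.fft.fft2(img))     # DC moved to the array center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)    # distance from DC
    cutoff = keep_ratio * min(h, w) / 2
    spec[radius > cutoff] = 0.0                  # drop high-frequency content
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
```

Slowly varying structure (layout, coarse relations between entities) survives the filter, while texture-level detail is removed, matching the stated goal of steering learning toward relation concepts.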

Review

Jump to: Research

29 pages, 8544 KiB  
Review
Innovative Approaches to Traffic Anomaly Detection and Classification Using AI
by Borja Pérez, Mario Resino, Teresa Seco, Fernando García and Abdulla Al-Kaff
Appl. Sci. 2025, 15(10), 5520; https://doi.org/10.3390/app15105520 - 15 May 2025
Viewed by 449
Abstract
Video anomaly detection plays a crucial role in intelligent transportation systems by enhancing urban mobility and safety. This review provides a comprehensive analysis of recent advancements in artificial intelligence methods applied to traffic anomaly detection, including convolutional and recurrent neural networks (CNNs and RNNs), autoencoders, Transformers, generative adversarial networks (GANs), and multimodal large language models (MLLMs). We compare their performance across real-world applications, highlighting patterns such as the superiority of Transformer-based models in temporal context understanding and the growing use of multimodal inputs for robust detection. Key challenges identified include dependence on large labeled datasets, high computational costs, and limited model interpretability. The review outlines how recent research is addressing these issues through semi-supervised learning, model compression techniques, and explainable AI. We conclude with future directions focusing on scalable, real-time, and interpretable solutions for practical deployment.
