Applications of Computer Vision, 3rd Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 20 October 2025 | Viewed by 6568

Special Issue Editor


Dr. Eva Cernadas
Guest Editor
Centro Singular de Investigación en Tecnoloxías Intelixentes (CITIUS, Research Center of Intelligent Systems), University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Interests: image segmentation; texture analysis; classification; regression; pattern recognition; applications of computer vision

Special Issue Information

Dear Colleagues,

Computer vision (CV) techniques are widely used by practicing engineers to solve a broad range of real-world vision problems. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible, covering practical applications of CV methods in all branches of science and engineering. Submitted papers should report a novel aspect of CV applied to a real-world engineering problem and should be validated on appropriate data sets. There is no restriction on the length of papers. Electronic files and software containing the full details of a calculation or experimental procedure that cannot be published in the usual way can be deposited as supplementary electronic material.

Focal points of the Special Issue include, but are not limited to, innovative applications of:

  • Medical and biological imaging;
  • Industrial inspection;
  • Robotics;
  • Photo and video interpretation;
  • Image retrieval;
  • Video analysis and annotation;
  • Multimedia;
  • Sensors and more.

Dr. Eva Cernadas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video segmentation
  • image classification
  • video analysis
  • pattern recognition
  • image and video understanding

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

18 pages, 4185 KiB  
Article
An Empirical Study on Pointing Gestures Used in Communication in Household Settings
by Tymon Kukier, Alicja Wróbel, Barbara Sienkiewicz, Julia Klimecka, Antonio Galiza Cerdeira Gonzalez, Paweł Gajewski and Bipin Indurkhya
Electronics 2025, 14(12), 2346; https://doi.org/10.3390/electronics14122346 - 8 Jun 2025
Abstract
Gestures play an integral role in human communication. Our research aims to develop a gesture understanding system that allows for better interpretation of human instructions in household robotics settings. We conducted an experiment with 34 participants who used pointing gestures to teach concepts to an assistant. Gesture data were analyzed using manual annotation (MAXQDA) and the computational methods of pose estimation and k-means clustering. The study revealed that participants tend to maintain consistent pointing styles, with one-handed pointing and index-finger gestures being the most common. Gaze and pointing often co-occur, as do leaning forward and pointing. Using our gesture categorization algorithm, we analyzed the information value of gestures. As the experiment progressed, the information value of gestures remained stable, although trends varied between participants and were associated with factors such as age and gender. These findings underscore the need for gesture recognition systems to balance generalization with personalization for more effective human–robot interaction.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
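
For readers curious about the clustering step mentioned in the abstract, below is a minimal sketch of categorizing pose keypoints with k-means. The data layout, cluster count, and function name are hypothetical illustrations, not taken from the paper; keypoint extraction by a pose estimator is assumed to have already happened.

```python
# Hypothetical sketch: clustering pose keypoints to categorize pointing gestures,
# in the spirit of the pose-estimation + k-means pipeline described above.
# Assumes a (n_frames, n_keypoints * 2) array of normalized (x, y) coordinates.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def categorize_gestures(keypoints: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    """Assign each frame's pose to one of n_clusters gesture categories."""
    features = StandardScaler().fit_transform(keypoints)  # zero-mean, unit-variance
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(features)  # one cluster label per frame

# Example with synthetic data: 200 frames, 17 keypoints (COCO layout), x/y each.
labels = categorize_gestures(np.random.rand(200, 17 * 2))
print(np.bincount(labels))  # frames per gesture cluster
```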

22 pages, 2567 KiB  
Article
FA-YOLO: A Pedestrian Detection Algorithm with Feature Enhancement and Adaptive Sparse Self-Attention
by Hang Sui, Huiyan Han, Yuzhu Cui, Menglong Yang and Binwei Pei
Electronics 2025, 14(9), 1713; https://doi.org/10.3390/electronics14091713 - 23 Apr 2025
Viewed by 456
Abstract
Pedestrian detection aims to identify pedestrians within the field of view and is widely used in smart cities, public safety surveillance, and other scenarios. However, in real-world complex scenes, challenges such as high pedestrian density, occlusion, and low lighting lead to blurred image boundaries, which significantly reduce detection accuracy. To address these challenges, we propose a novel pedestrian detection algorithm, FA-YOLO. First, to address the limited effective information extraction in the backbone network and the insufficient representation of feature maps, we propose a feature enhancement module (FEM) that integrates both global and local features of the feature map, thereby enhancing the network’s feature representation capability. Then, to reduce redundant information and improve adaptability to complex scenes, an adaptive sparse self-attention (ASSA) module is designed to suppress noisy interactions in irrelevant regions and eliminate feature redundancy across both spatial and channel dimensions. Finally, to further enhance the model’s focus on target features, we propose cross-stage partial with adaptive sparse self-attention (C3ASSA), which improves overall detection performance by reinforcing the importance of target features during the final detection stage. Additionally, a scalable intersection over union (SIoU) loss function is introduced to account for the vector angle difference between predicted and ground-truth bounding boxes. Extensive experiments on the WiderPerson and RTTS datasets demonstrate that FA-YOLO achieves state-of-the-art performance, with precision improvements of 3.5% on WiderPerson and 3.0% on RTTS over YOLOv11.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
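
The core idea of suppressing low-scoring attention interactions can be conveyed with a generic top-k sparse self-attention. The sketch below is a deliberate simplification under stated assumptions (shared projections, a fixed keep ratio), not the paper's ASSA or C3ASSA module.

```python
# Illustrative sketch only: a generic "sparse" self-attention in which low-scoring
# attention entries are pruned before softmax, conveying the idea of suppressing
# noisy interactions in irrelevant regions. Not the paper's actual module.
import torch
import torch.nn.functional as F

def sparse_self_attention(x: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """x: (batch, tokens, dim). Each query attends only to its top-k keys."""
    b, n, d = x.shape
    q, k, v = x, x, x                                    # learned projections omitted for brevity
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (b, n, n) scaled dot product
    k_keep = max(1, int(n * keep_ratio))                 # keys retained per query
    kth = scores.topk(k_keep, dim=-1).values[..., -1:]   # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))  # drop everything below it
    return F.softmax(scores, dim=-1) @ v                 # sparse attention output

out = sparse_self_attention(torch.randn(2, 64, 128))
print(out.shape)  # torch.Size([2, 64, 128])
```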

19 pages, 9044 KiB  
Article
PixCon: Pixel-Level Contrastive Learning Revisited
by Zongshang Pang, Yuta Nakashima, Mayu Otani and Hajime Nagahara
Electronics 2025, 14(8), 1623; https://doi.org/10.3390/electronics14081623 - 17 Apr 2025
Viewed by 359
Abstract
Contrastive image representation learning has been essential for pre-training vision foundation models that deliver excellent transfer learning performance. It was originally developed around instance discrimination, which focuses on instance-level recognition tasks. Lately, the focus has shifted to working directly on dense spatial features to improve transfer performance on dense prediction tasks such as object detection and semantic segmentation, for which pixel-level and region-level contrastive learning methods have been proposed. Region-level methods usually employ region-mining algorithms to capture holistic regional semantics and to address the issue of semantically inconsistent scene image crops, assuming that pixel-level learning struggles with both. In this paper, we revisit the potential of pixel-level learning and show that (1) it can learn holistic regional semantics effectively and more efficiently and (2) it intrinsically provides tools to mitigate the impact of the semantically inconsistent views involved in scene-level training images. We demonstrate this by proposing PixCon, a pixel-level contrastive learning framework, and testing different positive matching strategies within this framework to rediscover the potential of pixel-level learning. Additionally, we propose a novel semantic reweighting approach tailored to pixel-level scene image pre-training, which outperforms or matches previous region-level methods in object detection and semantic segmentation tasks across multiple benchmarks.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
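
A minimal sketch of what a pixel-level contrastive objective looks like in general: an InfoNCE loss over spatially matched pixel embeddings from two augmented views. The matching step, PixCon's positive-matching strategies, and its semantic reweighting are assumed away here.

```python
# Hedged sketch of a pixel-level InfoNCE loss: row i of each view is assumed to be
# a spatially matched positive pair; all other pixels serve as negatives.
# This only illustrates the basic objective, not PixCon's full method.
import torch
import torch.nn.functional as F

def pixel_infonce(feat_a: torch.Tensor, feat_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """feat_a, feat_b: (num_pixels, dim) pixel embeddings from two views."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / tau                  # (num_pixels, num_pixels) cosine similarities
    targets = torch.arange(a.size(0))         # diagonal entries are the positive pairs
    return F.cross_entropy(logits, targets)

loss = pixel_infonce(torch.randn(256, 128), torch.randn(256, 128))
print(loss.item())
```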

15 pages, 2232 KiB  
Article
Mixed Label Assignment Realizes End-to-End Object Detection
by Jiaquan Chen, Changbin Shao and Zhen Su
Electronics 2024, 13(23), 4856; https://doi.org/10.3390/electronics13234856 - 9 Dec 2024
Cited by 1 | Viewed by 1051
Abstract
Currently, detectors have made significant progress in inference speed and accuracy. However, they require Non-Maximum Suppression (NMS) during the post-processing stage to eliminate redundant boxes, which limits the optimization of model inference speed. We first analyzed the reason for this dependence on NMS and found that the score loss in one-to-many label assignment leads to high-quality redundant boxes that are difficult to remove. To realize end-to-end object detection and simplify the detection pipeline, we propose a mixed label assignment (MLA) training method: one-to-many label assignment provides rich supervision signals and alleviates performance degradation, while one-to-one label assignment eliminates the need for NMS in the post-processing stage. Additionally, a window feature propagation block (WFPB) is introduced, exploiting the inductive bias of images to enable feature sharing in local regions. Using these methods, we conducted experiments on the VOC and DUO datasets; our end-to-end detector MA-YOLOX achieved 66.0 mAP and 52.6 mAP, respectively, outperforming YOLOX by 1.7 and 1.6 mAP. Our model also ran faster than other real-time detectors without NMS.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
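
One-to-one assignment, the general mechanism that lets end-to-end detectors skip NMS, is commonly implemented with Hungarian matching. The sketch below illustrates that mechanism with a simplified center-distance cost; the cost terms actually used by MLA are not specified in the abstract, so everything here is an assumption.

```python
# Sketch of one-to-one label assignment via Hungarian matching: each ground-truth
# box is matched to exactly one prediction, so duplicates never need NMS removal.
# The center-distance cost is a placeholder; real detectors add IoU and class terms.
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(pred_boxes: np.ndarray, gt_boxes: np.ndarray) -> list[tuple[int, int]]:
    """Boxes as (x1, y1, x2, y2); returns (prediction index, ground-truth index) pairs."""
    pred_c = (pred_boxes[:, :2] + pred_boxes[:, 2:]) / 2          # prediction centers
    gt_c = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2                # ground-truth centers
    cost = np.linalg.norm(pred_c[:, None] - gt_c[None, :], axis=-1)  # (n_pred, n_gt)
    rows, cols = linear_sum_assignment(cost)                      # optimal one-to-one match
    return list(zip(rows, cols))

preds = np.array([[0, 0, 10, 10], [50, 50, 60, 60], [5, 5, 15, 15]], dtype=float)
gts = np.array([[1, 1, 11, 11], [49, 49, 61, 61]], dtype=float)
print(one_to_one_assign(preds, gts))  # each GT gets one unique prediction
```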

16 pages, 6180 KiB  
Article
Textile Fabric Defect Detection Using Enhanced Deep Convolutional Neural Network with Safe Human–Robot Collaborative Interaction
by Syed Ali Hassan, Michail J. Beliatis, Agnieszka Radziwon, Arianna Menciassi and Calogero Maria Oddo
Electronics 2024, 13(21), 4314; https://doi.org/10.3390/electronics13214314 - 2 Nov 2024
Cited by 2 | Viewed by 2682
Abstract
The emergence of modern robotic technology and artificial intelligence (AI) enables a transformation of the textile sector. Manual fabric defect inspection is time-consuming, error-prone, and labor-intensive. This offers a great opportunity to apply AI-trained automated processes with safe human–robot interaction (HRI), reducing the risks of work accidents and occupational illnesses and enhancing the environmental sustainability of the processes. In this experimental study, we developed, implemented, and tested a novel algorithm that detects fabric defects using enhanced deep convolutional neural networks (DCNNs). The proposed method integrates advanced DCNN architectures to automatically classify and detect 13 different types of fabric defects, such as double-ends, holes, and broken ends, ensuring high accuracy and efficiency in the inspection process. The dataset was created through augmentation techniques, and the model was fine-tuned on a large dataset of annotated images using transfer learning. The experiment used an anthropomorphic robot programmed to move above the fabric; a camera attached to the robot detected defects in the fabric and triggered an alarm. A photoelectric sensor installed on the conveyor belt and linked to the robot notified it of incoming fabric. The CNN model architecture was enhanced to increase performance. Experimental findings show that the presented system can detect fabric defects with a 97.49% mean Average Precision (mAP).
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
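
A minimal sketch of the transfer-learning setup the abstract describes: a pretrained backbone with a new classification head for the 13 defect types. The backbone choice, optimizer, and hyperparameters below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: fine-tuning a pretrained CNN to classify 13 fabric defect types
# via transfer learning. Architecture and training details are assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_DEFECT_CLASSES = 13  # e.g., double-ends, holes, broken ends, ...

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():          # freeze the pretrained feature extractor
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_DEFECT_CLASSES)  # new trainable head

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of fabric patches.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_DEFECT_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```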

21 pages, 14937 KiB  
Article
American Football Play Type and Player Position Recognition
by Audrey Hong, Benjamin Orr, Ephraim Pan and Dah-Jye Lee
Electronics 2024, 13(18), 3628; https://doi.org/10.3390/electronics13183628 - 12 Sep 2024
Viewed by 1448
Abstract
American football is one of the most popular team sports in the United States. There are approximately 16,000 high school and 890 college football teams, and each team plays around 10–14 games per season. Contrary to most casual fans’ views, American football is more than speed and power; it requires preparation and strategy. Coaches analyze hours of video of their own and opponents’ games to extract important information, such as offensive play formations, personnel packages, and opposing coaches’ tendencies, to gain competitive advantages. This slow, time-consuming process, called “tagging”, takes time away from coaches’ other duties and limits the players’ time for preparation and training. In this work, we created three datasets for our experiments to demonstrate the importance of player detection accuracy, which is easily affected by camera placement and player occlusion. We applied a unique data augmentation technique to generate data for each specific experiment. Our model achieved a remarkable 98.52% accuracy in play type recognition and 92.38% accuracy in player position recognition in the experiment that assumes no missing players and no occlusion, conditions that could be achieved by placing the camera high above the football field.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
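
The abstract does not detail the augmentation technique itself; as a purely hypothetical illustration of the missing-player conditions the experiments compare, one could randomly drop detected players from a formation, as sketched below. The player record format and function name are invented for illustration.

```python
# Hypothetical sketch: randomly removing player annotations to simulate the
# occlusion / missed-detection conditions discussed above. Not the paper's method.
import random

def simulate_occlusion(players: list[dict], drop_prob: float = 0.15,
                       seed: int | None = None) -> list[dict]:
    """Drop each detected player with probability drop_prob to mimic occlusion."""
    rng = random.Random(seed)
    return [p for p in players if rng.random() > drop_prob]

# Assumed format: position label plus field coordinates from a player detector.
formation = [{"pos": "QB", "xy": (60, 26)}, {"pos": "WR", "xy": (60, 5)},
             {"pos": "C", "xy": (59, 26)}, {"pos": "RB", "xy": (63, 26)}]
print(simulate_occlusion(formation, drop_prob=0.25, seed=42))
```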
