
Applications, Challenges and Promises of Computer Vision and Digital Imaging Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 December 2024) | Viewed by 14616

Special Issue Editors


Guest Editor
Department of Industrial Engineering, Universidad de La Laguna, 38203 San Cristóbal de La Laguna, Spain
Interests: smart sensor networks; FPGA image processing; Internet of Things; autonomous driving; sustainable electric mobility

Guest Editor
Department of Computer Engineering and Systems, Universidad de La Laguna, 38203 San Cristóbal de La Laguna, Spain
Interests: image and video processing; computer vision; virtual reality

Guest Editor
Departamento de Tecnología Electrónica y de las Comunicaciones, Universidad Autónoma de Madrid, 28049 Madrid, Spain
Interests: high-performance computing (HPC)

Special Issue Information

Dear Colleagues,

Computer vision has recently experienced a significant increase in adoption, ranging from facial recognition in smartphones to navigation in self-driving vehicles. Part of this success can be attributed to the integration of techniques from other areas, such as Artificial Intelligence (AI), into the field of Image Processing.

Real-world applications demand new ideas and techniques that solve practical problems. Consequently, this Special Issue is intended to present innovative solutions in this field, from their conception and analysis to their implementation.

Topics of interest for this Special Issue include but are not limited to the following:

  • Low-Level Vision Techniques
  • Detection, Recognition, Classification and Localization in 2D/3D
  • Shape Estimation
  • 3D and Multiview Processing and Sensors
  • Motion and Tracking
  • Image and Video Understanding
  • Image/Video Synthesis and Generative Models
  • Integration of AI Techniques

In addition, application areas of interest include but are not limited to:

  • Agriculture
  • Healthcare
  • Manufacturing
  • Remote Sensing
  • Retail
  • Robotics
  • Security
  • Sports
  • Transport
  • Virtual Reality

Dr. Manuel Jesús Rodríguez Valido
Dr. Fernando Perez Nava
Prof. Dr. Gustavo Sutter
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image and video processing
  • image and video understanding
  • computer vision applications
  • machine learning
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

22 pages, 10697 KiB  
Article
Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN
by Ali Erbey and Necaattin Barışçı
Appl. Sci. 2025, 15(2), 563; https://doi.org/10.3390/app15020563 - 8 Jan 2025
Viewed by 774
Abstract
Understanding others correctly is of great importance for maintaining effective communication. Factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading offers an effective solution to these challenges. With the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip reading dataset for Turkish digit recognition and to conduct predictive analyses. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models, including C3D, I3D, and 3DCNN+BiLSTM, were used. While LSTM models are effective in processing temporal data, 3DCNN-based models, which can process both spatial and temporal information, achieved higher accuracy in this study. Experimental results showed that the dataset containing only the lip region performed better; accuracy rates for CNN, LSTM, C3D, and I3D on the lip region were 67.12%, 75.53%, 86.32%, and 93.24%, respectively. The 3DCNN-based models achieved higher accuracy due to their ability to process spatio-temporal data. Furthermore, an additional 1.23% improvement was achieved through ensemble learning, with the best result reaching 94.53% accuracy. Ensemble learning, by combining the strengths of different models, provided a meaningful improvement in overall performance. These results demonstrate that 3DCNN architectures and ensemble learning methods yield high success in addressing the problem of lip reading in the Turkish language. While our study focuses on Turkish digit recognition, the proposed methods have the potential to be successful in other languages or broader lip reading applications.
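The ensemble gain reported in this abstract can be illustrated with a simple soft-voting scheme. The paper does not state its exact combination rule, so the function and the probability values below are purely illustrative:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class-probability vectors and pick the argmax
    (soft voting). A minimal sketch, not the authors' exact method."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return int(np.argmax(avg)), avg

# Three hypothetical models scoring four digit classes for one clip
p1 = np.array([0.10, 0.60, 0.20, 0.10])
p2 = np.array([0.05, 0.55, 0.30, 0.10])
p3 = np.array([0.20, 0.40, 0.30, 0.10])
label, avg = ensemble_predict([p1, p2, p3])
```

Averaging smooths out errors that individual models make on different clips, which is one common explanation for the accuracy improvement ensembles provide.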

24 pages, 9631 KiB  
Article
Effect of Camera Choice on Image-Classification Inference
by Jason Brown, Andy Nguyen and Nawin Raj
Appl. Sci. 2025, 15(1), 246; https://doi.org/10.3390/app15010246 - 30 Dec 2024
Viewed by 735
Abstract
The field of image classification using Convolutional Neural Networks (CNNs) to predict the principal object in an image has seen many recent innovations. One aspect that has not been extensively explored is the effect of the camera employed to acquire images for inference. We investigate this by capturing comparable images of five drinking vessels using six cameras in various scenarios. We examine the classification ranking of object classes when these images are input to an independently pretrained ResNet-18 model based on the ImageNet-1k dataset. We find that the camera used can affect the top prediction of object class, particularly in scenarios with a more complex background. This is the case even when the cameras have similar fields of view. We also introduce a metric called selectivity, defined as the mean absolute difference between prediction probabilities of similar relevant object classes (such as cups and mugs). We show that the effect of the camera is largest when the selectivity of the pretrained model between these object classes is small. The effect of camera choice is also demonstrated quantitatively by examining Cohen's Kappa (κ) statistic. Finally, we make recommendations on mitigating the effect of the camera on image-classification inference.
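The selectivity metric, as defined in this abstract, reduces to a mean of absolute probability differences over pairs of similar classes. A small sketch of that reading, with hypothetical class indices and probabilities:

```python
import numpy as np

def selectivity(probs, class_pairs):
    """Mean absolute difference of prediction probabilities over pairs of
    similar classes (e.g. cup vs. mug). A small value means the model
    barely separates the pair. Illustrative reading of the paper's metric."""
    return float(np.mean([abs(probs[a] - probs[b]) for a, b in class_pairs]))

# Hypothetical softmax output over five classes; pair (0, 1) stands in
# for a cup/mug-style confusable pair
probs = np.array([0.40, 0.35, 0.15, 0.05, 0.05])
s = selectivity(probs, [(0, 1)])
```

When this value is near zero, the top prediction can flip between the paired classes under small input changes, which matches the paper's finding that camera choice matters most at low selectivity.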

26 pages, 29170 KiB  
Article
Real-Time Video Processing for Measuring Zigzag Length of Pantograph–Catenary Systems Based on GPS Correlation
by Caius Panoiu, Gabriel Militaru and Manuela Panoiu
Appl. Sci. 2024, 14(20), 9252; https://doi.org/10.3390/app14209252 - 11 Oct 2024
Viewed by 1032
Abstract
Recent years have seen outstanding developments in research and technology, highlighting the importance of railway transportation, especially the implementation of high-speed trains, which is increasingly challenging. This facilitates extensive research into the science and technology of the electrical interaction between the components of pantograph–catenary systems (PCSs). Problems regarding the PCS can result in infrastructure incidents, potentially stopping train operations. A common cause of failure in electrified railway PCSs is a contact wire's zigzag length that exceeds the prescribed technical limit, which can be caused by missing droppers or faults in the mounting mechanism. This work proposes a video camera-based monitoring technique for zigzag geometry measurement that additionally employs a Global Positioning System (GPS) sensor to record the geographical position of each zigzag length measurement. Two image-processing techniques for measuring the zigzag length are proposed: in the first, previously recorded images are analyzed in the laboratory; in the second, the images are analyzed in real time. Based on the results, we propose a model for predicting zigzag length using hybrid deep neural networks.
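At its core, an image-based zigzag measurement converts the detected contact-wire position in the frame into a lateral offset from the track centre line. The function name and calibration factor below are illustrative assumptions, not values from the paper:

```python
def zigzag_offset_mm(wire_col_px, centre_col_px, mm_per_px):
    """Lateral (zigzag) offset of the contact wire from the image centre
    column, given a pixel-to-millimetre calibration factor. A sketch of
    the general idea; the paper's pipeline is more involved."""
    return (wire_col_px - centre_col_px) * mm_per_px

# Wire detected 50 px right of centre with an assumed 1.5 mm/px scale
offset = zigzag_offset_mm(1010, 960, 1.5)
```

Pairing each such measurement with a GPS fix, as the paper describes, then localizes any out-of-limit zigzag along the track.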

29 pages, 9127 KiB  
Article
A Remote Access Server with Chatbot User Interface for Coffee Grinder Burr Wear Level Assessment Based on Imaging Granule Analysis and Deep Learning Techniques
by Chih-Yung Chen, Shang-Feng Lin, Yuan-Wei Tseng, Zhe-Wei Dong and Cheng-Han Cai
Appl. Sci. 2024, 14(3), 1315; https://doi.org/10.3390/app14031315 - 5 Feb 2024
Viewed by 2367
Abstract
Coffee chains are very popular around the world. Because overly worn coffee grinder burrs can downgrade the taste of coffee, coffee experts and professional cuppers in an anonymous coffee chain have developed a manual method to classify coffee grinder burr wear so that worn burrs can be replaced in time to maintain the good taste of coffee. In this paper, a remote access server system that can mimic the ability of those recognized coffee experts and professional cuppers to classify coffee grinder burr wear has been developed. Users only need to first upload a photo of coffee granules ground by a grinder to the system through a chatbot interface; then, they can receive the burr wear classification result from the remote server in a minute. The system first uses image processing to obtain the coffee granules' size distribution. Based on the size distributions, unified length data inputs are then obtained to train and test the deep learning model so that it can classify the burr wear level into initial wear, normal wear, and severe wear with more than 96% accuracy. As only a mobile phone is needed to use this service, the proposed system is very suitable for both coffee chains and coffee lovers.
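The "unified length data inputs" step — turning a variable number of measured granule sizes into a fixed-length vector for the classifier — can be sketched as a normalized histogram. The bin count and area range below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def size_histogram(areas_px, n_bins=32, max_area=500):
    """Turn a variable-length list of granule areas (in pixels) into a
    fixed-length, normalized histogram usable as classifier input.
    A sketch under assumed bin settings, not the authors' exact encoding."""
    hist, _ = np.histogram(np.clip(areas_px, 0, max_area),
                           bins=n_bins, range=(0, max_area))
    return hist / max(hist.sum(), 1)

# Hypothetical granule areas from one ground-coffee photo
areas = [12, 15, 14, 200, 210, 480, 9, 11]
x = size_histogram(areas)
```

A worn burr tends to shift mass in such a histogram toward uneven or extreme sizes, which is the signal a classifier can learn from.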

17 pages, 2294 KiB  
Article
Action Detection for Wildlife Monitoring with Camera Traps Based on Segmentation with Filtering of Tracklets (SWIFT) and Mask-Guided Action Recognition (MAROON)
by Frank Schindler, Volker Steinhage, Suzanne T. S. van Beeck Calkoen and Marco Heurich
Appl. Sci. 2024, 14(2), 514; https://doi.org/10.3390/app14020514 - 6 Jan 2024
Cited by 7 | Viewed by 2756
Abstract
Behavioral analysis of animals in the wild plays an important role for ecological research and conservation and has been mostly performed by researchers. We introduce an action detection approach that automates this process by detecting animals and performing action recognition on the detected animals in camera trap videos. Our action detection approach is based on SWIFT (segmentation with filtering of tracklets), which we have already shown to successfully detect and track animals in wildlife videos, and MAROON (mask-guided action recognition), an action recognition network that we are introducing here. The basic ideas of MAROON are the exploitation of the instance masks detected by SWIFT and a triple-stream network. The instance masks enable more accurate action recognition, especially if multiple animals appear in a video at the same time. The triple-stream approach extracts features for the motion and appearance of the animal. We evaluate the quality of our action recognition on two self-generated datasets, from an animal enclosure and from the wild. These datasets contain videos of red deer, fallow deer and roe deer, recorded both during the day and night. MAROON improves the action recognition accuracy compared to other state-of-the-art approaches by an average of 10 percentage points on all analyzed datasets and achieves an accuracy of 69.16% on the Rolandseck Daylight dataset, in which 11 different action classes occur. Our action detection system makes it possible to drastically reduce the manual work of ecologists and at the same time gain new insights through standardized results.
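The mask-guided idea — restricting the recognition network's input to the detected animal via its instance mask — can be sketched as a simple element-wise masking step. Array shapes and values here are illustrative, not MAROON's actual architecture:

```python
import numpy as np

def mask_guided_input(frame, mask):
    """Zero out every pixel outside the instance mask so the recognition
    stream sees only the detected animal. frame: (H, W, 3) image;
    mask: (H, W) binary instance mask. A sketch of the general idea."""
    return frame * mask[..., None]

# Tiny 2x2 example: mask keeps only the diagonal pixels
frame = np.ones((2, 2, 3), dtype=np.float32)
mask = np.array([[1, 0], [0, 1]], dtype=np.float32)
masked = mask_guided_input(frame, mask)
```

Suppressing the background this way is what lets the recognizer stay accurate when several animals appear in the same frame, since each animal can be scored on its own masked crop.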

23 pages, 19246 KiB  
Article
Comparison of the Performance of Convolutional Neural Networks and Vision Transformer-Based Systems for Automated Glaucoma Detection with Eye Fundus Images
by Silvia Alayón, Jorge Hernández, Francisco J. Fumero, Jose F. Sigut and Tinguaro Díaz-Alemán
Appl. Sci. 2023, 13(23), 12722; https://doi.org/10.3390/app132312722 - 27 Nov 2023
Cited by 9 | Viewed by 2745
Abstract
Glaucoma, a disease that damages the optic nerve, is the leading cause of irreversible blindness worldwide. The early detection of glaucoma is a challenge, which in recent years has driven the study and application of Deep Learning (DL) techniques in the automatic classification of eye fundus images. Among these intelligent systems, Convolutional Neural Networks (CNNs) stand out, although alternatives have recently appeared, such as Vision Transformers (ViTs) or hybrid systems, which are also highly efficient in image processing. The question that arises in the face of so many emerging methods is whether all these new techniques are really more efficient for the problem of glaucoma diagnosis than the CNNs that have been used so far. In this article, we present a comprehensive comparative study of these DL models in glaucoma detection, with the aim of elucidating which strategies are significantly better. Our main conclusion is that there are no significant differences in efficiency between the two DL strategies for the medical diagnostic problem addressed.

16 pages, 6034 KiB  
Article
Computer-Aided Visual Inspection of Glass-Coated Tableware Ceramics for Multi-Class Defect Detection
by Rafaela Carvalho, Ana C. Morgado, João Gonçalves, Anil Kumar, Alberto Gil e Sá Rolo, Rui Carreira and Filipe Soares
Appl. Sci. 2023, 13(21), 11708; https://doi.org/10.3390/app132111708 - 26 Oct 2023
Cited by 2 | Viewed by 1687
Abstract
Quality control procedures in the manufacturing of tableware ceramics require a demanding, monotonous, subjective, and faulty human manual inspection. This paper presents two machine learning strategies and the results of a semi-automated visual inspection of ceramics tableware applied to a private dataset acquired during the VAICeramics project. In one method, an anomaly detection step was integrated to pre-select possible defective patches before passing them through an object detector and defect classifier. In the alternative method, all patches are provided directly to the object detector and then go through the classification phase. Contrary to expectations, the inclusion of the anomaly detector demonstrated a slight reduction in the performance of the pipeline, which may result from error propagation. Regarding the proposed methodology for defect detection, it exhibits average performance in monochromatic images with more than 600 real defects in total, efficiently identifying the most common defect classes in highly reflective surfaces. However, when applied to newly acquired images, the pipeline encounters challenges, revealing a lack of generalization ability and limitations in detecting specific defect classes due to their appearance and the limited samples available for training. Only two defect types presented high classification performance, namely Dots and Cracked defects.

15 pages, 6249 KiB  
Article
Software Application for Automatic Detection and Analysis of Biomass in Underwater Videos
by Manuel Rodríguez Valido, Peña Fabiani Bendicho, Miguel Martín Reyes and Alicia Rodríguez-Juncá
Appl. Sci. 2023, 13(19), 10870; https://doi.org/10.3390/app131910870 - 30 Sep 2023
Cited by 1 | Viewed by 1505
Abstract
The use of underwater recording is widely implemented across different marine ecology studies as a substitute for more invasive techniques. This is the case of the Deep Scattering Layer (DSL), a biomass-rich layer in the ocean located between 400 and 600 m deep. The data processing of underwater videos has usually been carried out manually or targets organisms above a certain size. Marine snow, or macroscopic amorphous aggregates, plays a major role in nutrient cycles and in the supply of organic material for organisms living in the deeper layers of the ocean. Marine snow, therefore, should be taken into account when estimating biomass abundance in the water column. The main objective of this project is to develop a new software application for the automatic detection and analysis of biomass abundance relative to time in underwater videos, taking into consideration small items. The application software is based on a pipeline and client-server architecture, developed in Python and using open source libraries. The software was trained with underwater videos of the DSL recorded with low-cost equipment. A usability study carried out with end-users shows satisfaction with the user-friendly interface and the expected results. The software application developed is capable of automatically detecting small items captured by underwater videos. In addition, it can be easily adapted to a web application.
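Estimating biomass abundance over time from video frames can be sketched, in its simplest form, as background subtraction followed by a per-frame foreground fraction. This is a minimal illustration of the idea, not the project's actual detection pipeline; threshold and array sizes are assumed:

```python
import numpy as np

def abundance_series(frames, background, thresh=30):
    """For each frame, report the fraction of pixels that differ from a
    reference background frame by more than `thresh` — a crude proxy for
    biomass abundance over time, including small items like marine snow."""
    series = []
    for f in frames:
        diff = np.abs(f.astype(np.int32) - background.astype(np.int32))
        series.append(float((diff > thresh).mean()))
    return series

# Tiny 4x4 example: an empty frame, then one bright "particle"
bg = np.zeros((4, 4), dtype=np.uint8)
f1 = bg.copy()
f1[0, 0] = 200
series = abundance_series([bg, f1], bg)
```

Plotting such a series against frame timestamps gives the abundance-versus-time output the abstract describes.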
