
Applications, Challenges and Promises of Computer Vision and Digital Imaging Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 December 2024) | Viewed by 14616

Special Issue Editors


Guest Editor
Department of Industrial Engineering, Universidad de La Laguna, 38203 San Cristóbal de La Laguna, Spain
Interests: smart sensor networks; FPGA image processing; Internet of Things; autonomous driving; sustainable electric mobility

Guest Editor
Department of Computer Engineering and Systems, Universidad de La Laguna, 38203 San Cristóbal de La Laguna, Spain
Interests: image and video processing; computer vision; virtual reality

Guest Editor
Departamento de Tecnología Electrónica y de las Comunicaciones, Universidad Autónoma de Madrid, 28049 Madrid, Spain
Interests: high-performance computing (HPC)

Special Issue Information

Dear Colleagues,

Computer vision has recently experienced a significant increase in adoption, ranging from facial recognition in smartphones to navigation in self-driving vehicles. Part of this success can be attributed to the integration of techniques from other areas, such as Artificial Intelligence (AI), into the field of Image Processing.

Real-world applications demand new ideas and techniques that solve practical problems. Consequently, this Special Issue is intended to present innovative solutions in this field, from their conception and analysis to their implementation.

Topics of interest for this Special Issue include but are not limited to the following:

  • Low-Level Vision Techniques
  • Detection, Recognition, Classification and Localization in 2D/3D
  • Shape Estimation
  • 3D and Multiview Processing and Sensors
  • Motion and Tracking
  • Image and Video Understanding
  • Image/Video Synthesis and Generative Models
  • Integration of AI Techniques

In addition, application areas of interest include but are not limited to:

  • Agriculture
  • Healthcare
  • Manufacturing
  • Remote Sensing
  • Retail
  • Robotics
  • Security
  • Sports
  • Transport
  • Virtual Reality

Dr. Manuel Jesús Rodríguez Valido
Dr. Fernando Perez Nava
Prof. Dr. Gustavo Sutter
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image and video processing
  • image and video understanding
  • computer vision applications
  • machine learning
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

22 pages, 10697 KiB  
Article
Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN
by Ali Erbey and Necaattin Barışçı
Appl. Sci. 2025, 15(2), 563; https://doi.org/10.3390/app15020563 - 8 Jan 2025
Viewed by 774
Abstract
Understanding others correctly is of great importance for maintaining effective communication. Factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading offers an effective solution to these challenges. With the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip reading dataset for Turkish digit recognition and to conduct predictive analyses. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models, including C3D, I3D, and 3DCNN+BiLSTM, were used. While LSTM models are effective in processing temporal data, 3DCNN-based models, which can process both spatial and temporal information, achieved higher accuracy in this study. Experimental results showed that the dataset containing only the lip region performed better; accuracy rates for CNN, LSTM, C3D, and I3D on the lip region were 67.12%, 75.53%, 86.32%, and 93.24%, respectively. The 3DCNN-based models achieved higher accuracy due to their ability to process spatio-temporal data. Furthermore, an additional 1.23% improvement was achieved through ensemble learning, with the best result reaching 94.53% accuracy. Ensemble learning, by combining the strengths of different models, provided a meaningful improvement in overall performance. These results demonstrate that 3DCNN architectures and ensemble learning methods yield high success in addressing the problem of lip reading in the Turkish language. While our study focuses on Turkish digit recognition, the proposed methods have the potential to be successful in other languages or broader lip reading applications.
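The ensemble gain reported in this abstract can be illustrated with a simple soft-voting scheme. The paper does not state its exact combination rule, so the function and the probability values below are purely illustrative:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class-probability vectors and pick the argmax
    (soft voting). A minimal sketch, not the authors' exact method."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return int(np.argmax(avg)), avg

# Three hypothetical models scoring four digit classes for one clip
p1 = np.array([0.10, 0.60, 0.20, 0.10])
p2 = np.array([0.05, 0.55, 0.30, 0.10])
p3 = np.array([0.20, 0.40, 0.30, 0.10])
label, avg = ensemble_predict([p1, p2, p3])
```

Averaging smooths out errors that individual models make on different clips, which is one common explanation for the accuracy improvement ensembles provide.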

24 pages, 9631 KiB  
Article
Effect of Camera Choice on Image-Classification Inference
by Jason Brown, Andy Nguyen and Nawin Raj
Appl. Sci. 2025, 15(1), 246; https://doi.org/10.3390/app15010246 - 30 Dec 2024
Viewed by 735
Abstract
The field of image classification using Convolutional Neural Networks (CNNs) to predict the principal object in an image has seen many recent innovations. One aspect that has not been extensively explored is the effect of the camera employed to acquire images for inference. We investigate this by capturing comparable images of five drinking vessels using six cameras in various scenarios. We examine the classification ranking of object classes when these images are input to an independently pretrained ResNet-18 model based on the ImageNet-1k dataset. We find that the camera used can affect the top prediction of object class, particularly in scenarios with a more complex background. This is the case even when the cameras have similar fields of view. We also introduce a metric called selectivity, defined as the mean absolute difference between prediction probabilities of similar relevant object classes (such as cups and mugs). We show that the effect of the camera is largest when the selectivity of the pretrained model between these object classes is small. The effect of camera choice is also demonstrated quantitatively by examining Cohen's Kappa (κ) statistic. Finally, we make recommendations on mitigating the effect of the camera on image-classification inference.
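The selectivity metric, as defined in this abstract, reduces to a mean of absolute probability differences over pairs of similar classes. A small sketch of that reading, with hypothetical class indices and probabilities:

```python
import numpy as np

def selectivity(probs, class_pairs):
    """Mean absolute difference of prediction probabilities over pairs of
    similar classes (e.g. cup vs. mug). A small value means the model
    barely separates the pair. Illustrative reading of the paper's metric."""
    return float(np.mean([abs(probs[a] - probs[b]) for a, b in class_pairs]))

# Hypothetical softmax output over five classes; pair (0, 1) stands in
# for a cup/mug-style confusable pair
probs = np.array([0.40, 0.35, 0.15, 0.05, 0.05])
s = selectivity(probs, [(0, 1)])
```

When this value is near zero, the top prediction can flip between the paired classes under small input changes, which matches the paper's finding that camera choice matters most at low selectivity.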

26 pages, 29170 KiB  
Article
Real-Time Video Processing for Measuring Zigzag Length of Pantograph–Catenary Systems Based on GPS Correlation
by Caius Panoiu, Gabriel Militaru and Manuela Panoiu
Appl. Sci. 2024, 14(20), 9252; https://doi.org/10.3390/app14209252 - 11 Oct 2024
Viewed by 1032
Abstract
Recent years have seen outstanding developments in research and technology, highlighting the importance of railway transportation, especially the implementation of high-speed trains, which is increasingly challenging. This facilitates extensive research into the science and technology of the electrical interaction between the components of pantograph–catenary systems (PCSs). Problems regarding the PCS can result in infrastructure incidents, potentially stopping train operations. A common cause of failure in electrified railway PCSs is a contact wire's zigzag length that exceeds the prescribed technical limit, which can be caused by missing droppers or faults in the mounting mechanism. This work proposes a video camera-based monitoring technique for zigzag geometry measurement that additionally employs a Global Positioning System (GPS) sensor to record the geographical position of each zigzag length measurement. Two image-processing techniques for measuring the zigzag length are proposed: in the first, previously recorded images are analyzed in the laboratory; in the second, the images are analyzed in real time. Based on the results, we propose a model for predicting zigzag length using hybrid deep neural networks.
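At its core, an image-based zigzag measurement converts the detected contact-wire position in the frame into a lateral offset from the track centre line. The function name and calibration factor below are illustrative assumptions, not values from the paper:

```python
def zigzag_offset_mm(wire_col_px, centre_col_px, mm_per_px):
    """Lateral (zigzag) offset of the contact wire from the image centre
    column, given a pixel-to-millimetre calibration factor. A sketch of
    the general idea; the paper's pipeline is more involved."""
    return (wire_col_px - centre_col_px) * mm_per_px

# Wire detected 50 px right of centre with an assumed 1.5 mm/px scale
offset = zigzag_offset_mm(1010, 960, 1.5)
```

Pairing each such measurement with a GPS fix, as the paper describes, then localizes any out-of-limit zigzag along the track.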

29 pages, 9127 KiB  
Article
A Remote Access Server with Chatbot User Interface for Coffee Grinder Burr Wear Level Assessment Based on Imaging Granule Analysis and Deep Learning Techniques
by Chih-Yung Chen, Shang-Feng Lin, Yuan-Wei Tseng, Zhe-Wei Dong and Cheng-Han Cai
Appl. Sci. 2024, 14(3), 1315; https://doi.org/10.3390/app14031315 - 5 Feb 2024
Viewed by 2367
Abstract
Coffee chains are very popular around the world. Because overly worn coffee grinder burrs can downgrade the taste of coffee, coffee experts and professional cuppers in an anonymous coffee chain have developed a manual method to classify coffee grinder burr wear so that worn burrs can be replaced in time to maintain the good taste of coffee. In this paper, a remote access server system that can mimic the ability of those recognized coffee experts and professional cuppers to classify coffee grinder burr wear has been developed. Users only need to first upload a photo of coffee granules ground by a grinder to the system through a chatbot interface; then, they can receive the burr wear classification result from the remote server in a minute. The system first uses image processing to obtain the coffee granules' size distribution. Based on the size distributions, unified length data inputs are then obtained to train and test the deep learning model so that it can classify the burr wear level into initial wear, normal wear, and severe wear with more than 96% accuracy. As only a mobile phone is needed to use this service, the proposed system is very suitable for both coffee chains and coffee lovers.
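The "unified length data inputs" step — turning a variable number of measured granule sizes into a fixed-length vector for the classifier — can be sketched as a normalized histogram. The bin count and area range below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def size_histogram(areas_px, n_bins=32, max_area=500):
    """Turn a variable-length list of granule areas (in pixels) into a
    fixed-length, normalized histogram usable as classifier input.
    A sketch under assumed bin settings, not the authors' exact encoding."""
    hist, _ = np.histogram(np.clip(areas_px, 0, max_area),
                           bins=n_bins, range=(0, max_area))
    return hist / max(hist.sum(), 1)

# Hypothetical granule areas from one ground-coffee photo
areas = [12, 15, 14, 200, 210, 480, 9, 11]
x = size_histogram(areas)
```

A worn burr tends to shift mass in such a histogram toward uneven or extreme sizes, which is the signal a classifier can learn from.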

17 pages, 2294 KiB  
Article
Action Detection for Wildlife Monitoring with Camera Traps Based on Segmentation with Filtering of Tracklets (SWIFT) and Mask-Guided Action Recognition (MAROON)
by Frank Schindler, Volker Steinhage, Suzanne T. S. van Beeck Calkoen and Marco Heurich
Appl. Sci. 2024, 14(2), 514; https://doi.org/10.3390/app14020514 - 6 Jan 2024
Cited by 7 | Viewed by 2756
Abstract
Behavioral analysis of animals in the wild plays an important role for ecological research and conservation and has been mostly performed by researchers. We introduce an action detection approach that automates this process by detecting animals and performing action recognition on the detected animals in camera trap videos. Our action detection approach is based on SWIFT (segmentation with filtering of tracklets), which we have already shown to successfully detect and track animals in wildlife videos, and MAROON (mask-guided action recognition), an action recognition network that we are introducing here. The basic ideas of MAROON are the exploitation of the instance masks detected by SWIFT and a triple-stream network. The instance masks enable more accurate action recognition, especially if multiple animals appear in a video at the same time. The triple-stream approach extracts features for the motion and appearance of the animal. We evaluate the quality of our action recognition on two self-generated datasets, from an animal enclosure and from the wild. These datasets contain videos of red deer, fallow deer and roe deer, recorded both during the day and night. MAROON improves the action recognition accuracy compared to other state-of-the-art approaches by an average of 10 percentage points on all analyzed datasets and achieves an accuracy of 69.16% on the Rolandseck Daylight dataset, in which 11 different action classes occur. Our action detection system makes it possible to drastically reduce the manual work of ecologists and at the same time gain new insights through standardized results.
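The mask-guided idea — restricting the recognition network's input to the detected animal via its instance mask — can be sketched as a simple element-wise masking step. Array shapes and values here are illustrative, not MAROON's actual architecture:

```python
import numpy as np

def mask_guided_input(frame, mask):
    """Zero out every pixel outside the instance mask so the recognition
    stream sees only the detected animal. frame: (H, W, 3) image;
    mask: (H, W) binary instance mask. A sketch of the general idea."""
    return frame * mask[..., None]

# Tiny 2x2 example: mask keeps only the diagonal pixels
frame = np.ones((2, 2, 3), dtype=np.float32)
mask = np.array([[1, 0], [0, 1]], dtype=np.float32)
masked = mask_guided_input(frame, mask)
```

Suppressing the background this way is what lets the recognizer stay accurate when several animals appear in the same frame, since each animal can be scored on its own masked crop.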

23 pages, 19246 KiB  
Article
Comparison of the Performance of Convolutional Neural Networks and Vision Transformer-Based Systems for Automated Glaucoma Detection with Eye Fundus Images
by Silvia Alayón, Jorge Hernández, Francisco J. Fumero, Jose F. Sigut and Tinguaro Díaz-Alemán
Appl. Sci. 2023, 13(23), 12722; https://doi.org/10.3390/app132312722 - 27 Nov 2023
Cited by 9 | Viewed by 2745
Abstract
Glaucoma, a disease that damages the optic nerve, is the leading cause of irreversible blindness worldwide. The early detection of glaucoma is a challenge, which in recent years has driven the study and application of Deep Learning (DL) techniques in the automatic classification of eye fundus images. Among these intelligent systems, Convolutional Neural Networks (CNNs) stand out, although alternatives have recently appeared, such as Vision Transformers (ViTs) or hybrid systems, which are also highly efficient in image processing. The question that arises in the face of so many emerging methods is whether all these new techniques are really more efficient for the problem of glaucoma diagnosis than the CNNs that have been used so far. In this article, we present a comprehensive comparative study of these DL models in glaucoma detection, with the aim of elucidating which strategies are significantly better. Our main conclusion is that there are no significant differences in efficiency between the two DL strategies for the medical diagnostic problem addressed.

16 pages, 6034 KiB  
Article
Computer-Aided Visual Inspection of Glass-Coated Tableware Ceramics for Multi-Class Defect Detection
by Rafaela Carvalho, Ana C. Morgado, João Gonçalves, Anil Kumar, Alberto Gil e Sá Rolo, Rui Carreira and Filipe Soares
Appl. Sci. 2023, 13(21), 11708; https://doi.org/10.3390/app132111708 - 26 Oct 2023
Cited by 2 | Viewed by 1687
Abstract
Quality control procedures in the manufacturing of tableware ceramics require a demanding, monotonous, subjective, and faulty human manual inspection. This paper presents two machine learning strategies and the results of a semi-automated visual inspection of ceramics tableware applied to a private dataset acquired during the VAICeramics project. In one method, an anomaly detection step was integrated to pre-select possible defective patches before passing them through an object detector and defect classifier. In the alternative method, all patches are provided directly to the object detector and then go through the classification phase. Contrary to expectations, the inclusion of the anomaly detector demonstrated a slight reduction in the performance of the pipeline, which may result from error propagation. Regarding the proposed methodology for defect detection, it exhibits average performance in monochromatic images with more than 600 real defects in total, efficiently identifying the most common defect classes in highly reflective surfaces. However, when applied to newly acquired images, the pipeline encounters challenges, revealing a lack of generalization ability and limitations in detecting specific defect classes due to their appearance and the limited samples available for training. Only two defect types presented high classification performance, namely Dots and Cracked defects.

15 pages, 6249 KiB  
Article
Software Application for Automatic Detection and Analysis of Biomass in Underwater Videos
by Manuel Rodríguez Valido, Peña Fabiani Bendicho, Miguel Martín Reyes and Alicia Rodríguez-Juncá
Appl. Sci. 2023, 13(19), 10870; https://doi.org/10.3390/app131910870 - 30 Sep 2023
Cited by 1 | Viewed by 1505
Abstract
The use of underwater recording is widely implemented across different marine ecology studies as a substitute for more invasive techniques. This is the case of the Deep Scattering Layer (DSL), a biomass-rich layer in the ocean located between 400 and 600 m deep. The data processing of underwater videos has usually been carried out manually or targets organisms above a certain size. Marine snow, or macroscopic amorphous aggregates, plays a major role in nutrient cycles and in the supply of organic material for organisms living in the deeper layers of the ocean. Marine snow, therefore, should be taken into account when estimating biomass abundance in the water column. The main objective of this project is to develop a new software application for the automatic detection and analysis of biomass abundance relative to time in underwater videos, taking into consideration small items. The application software is based on a pipeline and client-server architecture, developed in Python and using open source libraries. The software was trained with underwater videos of the DSL recorded with low-cost equipment. A usability study carried out with end-users shows satisfaction with the user-friendly interface and the expected results. The software application developed is capable of automatically detecting small items captured by underwater videos. In addition, it can be easily adapted to a web application.
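Estimating biomass abundance over time from video frames can be sketched, in its simplest form, as background subtraction followed by a per-frame foreground fraction. This is a minimal illustration of the idea, not the project's actual detection pipeline; threshold and array sizes are assumed:

```python
import numpy as np

def abundance_series(frames, background, thresh=30):
    """For each frame, report the fraction of pixels that differ from a
    reference background frame by more than `thresh` — a crude proxy for
    biomass abundance over time, including small items like marine snow."""
    series = []
    for f in frames:
        diff = np.abs(f.astype(np.int32) - background.astype(np.int32))
        series.append(float((diff > thresh).mean()))
    return series

# Tiny 4x4 example: an empty frame, then one bright "particle"
bg = np.zeros((4, 4), dtype=np.uint8)
f1 = bg.copy()
f1[0, 0] = 200
series = abundance_series([bg, f1], bg)
```

Plotting such a series against frame timestamps gives the abundance-versus-time output the abstract describes.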
