Search Results (6)

Search Parameters:
Keywords = visemes

29 pages, 8321 KiB  
Article
Sperm YOLOv8E-TrackEVD: A Novel Approach for Sperm Detection and Tracking
by Chongming Zhang, Yaxuan Zhang, Zhanyuan Chang and Chuanjiang Li
Sensors 2024, 24(11), 3493; https://doi.org/10.3390/s24113493 - 28 May 2024
Cited by 5 | Viewed by 3673
Abstract
Male infertility is a global health issue, with 40–50% attributed to sperm abnormalities. The subjectivity and irreproducibility of existing detection methods pose challenges to sperm assessment, making the design of automated semen analysis algorithms crucial for enhancing the reliability of sperm evaluations. This paper proposes a comprehensive sperm tracking algorithm (Sperm YOLOv8E-TrackEVD) that combines an enhanced YOLOv8 small object detection algorithm (SpermYOLOv8-E) with an improved DeepOCSORT tracking algorithm (SpermTrack-EVD) to detect human sperm in a microscopic field of view and track healthy sperm in a sample in a short period effectively. Firstly, we trained the improved YOLOv8 model on the VISEM-Tracking dataset for accurate sperm detection. To enhance the detection of small sperm objects, we introduced an attention mechanism, added a small object detection layer, and integrated the SPDConv and Detect_DyHead modules. Furthermore, we used a new distance metric method and chose IoU loss calculation. Ultimately, we achieved a 1.3% increase in precision, a 1.4% increase in recall rate, and a 2.0% improvement in mAP@0.5:0.95. We applied SpermYOLOv8-E combined with SpermTrack-EVD for sperm tracking. On the VISEM-Tracking dataset, we achieved 74.303% HOTA and 71.167% MOTA. These results show the effectiveness of the designed Sperm YOLOv8E-TrackEVD approach in sperm tracking scenarios.
(This article belongs to the Section Sensing and Imaging)
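The modified detector (SpermYOLOv8-E) and tracker (SpermTrack-EVD) described above are not reproduced in this listing. As a rough orientation only, a minimal detect-and-track loop using the stock Ultralytics YOLOv8 package and its built-in ByteTrack tracker as stand-ins (file and weight names below are placeholders) might look like this:

```python
# Illustrative sketch only: the paper's SpermYOLOv8-E weights and SpermTrack-EVD
# tracker are not public here. This shows the generic detect-then-track loop with
# stock Ultralytics YOLOv8 and its built-in ByteTrack tracker as stand-ins.
from ultralytics import YOLO

# Placeholder weights: substitute a model fine-tuned on VISEM-Tracking.
model = YOLO("yolov8s.pt")

# Stream a microscopy video frame by frame; persist=True keeps track IDs
# across frames so each sperm keeps a stable identity.
for result in model.track(source="visem_sample.mp4",
                          tracker="bytetrack.yaml",
                          persist=True,
                          stream=True,
                          conf=0.25):
    if result.boxes.id is None:                   # no confirmed tracks this frame
        continue
    boxes = result.boxes.xywh.cpu().numpy()       # (cx, cy, w, h) per detection
    ids = result.boxes.id.int().cpu().tolist()
    for (cx, cy, w, h), track_id in zip(boxes, ids):
        print(f"track {track_id}: centre=({cx:.1f}, {cy:.1f})")
```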

16 pages, 1170 KiB  
Article
The McGurk Illusion: A Default Mechanism of the Auditory System
by Zunaira J. Iqbal, Antoine J. Shahin, Heather Bortfeld and Kristina C. Backer
Brain Sci. 2023, 13(3), 510; https://doi.org/10.3390/brainsci13030510 - 19 Mar 2023
Cited by 3 | Viewed by 3318
Abstract
Recent studies have questioned past conclusions regarding the mechanisms of the McGurk illusion, especially how McGurk susceptibility might inform our understanding of audiovisual (AV) integration. We previously proposed that the McGurk illusion is likely attributable to a default mechanism, whereby either the visual system, auditory system, or both default to specific phonemes—those implicated in the McGurk illusion. We hypothesized that the default mechanism occurs because visual stimuli with an indiscernible place of articulation (like those traditionally used in the McGurk illusion) lead to an ambiguous perceptual environment and thus a failure in AV integration. In the current study, we tested the default hypothesis as it pertains to the auditory system. Participants performed two tasks. One task was a typical McGurk illusion task, in which individuals listened to auditory-/ba/ paired with visual-/ga/ and judged what they heard. The second task was an auditory-only task, in which individuals transcribed trisyllabic words with a phoneme replaced by silence. We found that individuals’ transcription of missing phonemes often defaulted to ‘/d/t/th/’, the same phonemes often experienced during the McGurk illusion. Importantly, individuals’ default rate was positively correlated with their McGurk rate. We conclude that the McGurk illusion arises when people fail to integrate visual percepts with auditory percepts, due to visual ambiguity, thus leading the auditory system to default to phonemes often implicated in the McGurk illusion.
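The key quantitative claim above is the positive correlation between each participant's auditory default rate and their McGurk rate. A minimal sketch of that analysis with made-up per-participant numbers (SciPy's Pearson correlation stands in for whatever statistics the authors actually used) could be:

```python
# Illustrative sketch with hypothetical numbers: correlating each participant's
# auditory "default" rate (proportion of missing phonemes transcribed as
# /d/, /t/, or /th/) with their McGurk illusion rate, as described in the abstract.
import numpy as np
from scipy import stats

default_rate = np.array([0.22, 0.35, 0.41, 0.18, 0.50, 0.29])  # hypothetical data
mcgurk_rate = np.array([0.30, 0.48, 0.55, 0.21, 0.66, 0.38])   # hypothetical data

r, p = stats.pearsonr(default_rate, mcgurk_rate)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")  # a positive r mirrors the reported finding
```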

14 pages, 949 KiB  
Article
Study on Sperm-Cell Detection Using YOLOv5 Architecture with Labaled Dataset
by Michal Dobrovolny, Jakub Benes, Jaroslav Langer, Ondrej Krejcar and Ali Selamat
Genes 2023, 14(2), 451; https://doi.org/10.3390/genes14020451 - 9 Feb 2023
Cited by 14 | Viewed by 5594
Abstract
Infertility has recently emerged as a severe medical problem. The essential elements in male infertility are sperm morphology, sperm motility, and sperm density. In order to analyze sperm motility, density, and morphology, laboratory experts perform a semen analysis. However, subjective interpretation based on laboratory observation is prone to error. In this work, a computer-aided sperm count estimation approach is suggested to lessen the reliance on expert judgment in semen analysis. Object detection techniques concentrating on sperm motility estimate the number of active sperm in the semen. This study also provides an overview of related techniques for comparison. The VISEM dataset from the Association for Computing Machinery was used to test the proposed strategy. We created a labelled dataset to demonstrate that our network can detect sperm in images. The best result without extensive hyperparameter tuning is an mAP of 72.15.
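The fine-tuned YOLOv5 weights are not part of this listing; a minimal inference sketch using the public ultralytics/yolov5 hub entry point, with a hypothetical checkpoint name standing in for the sperm-cell detector, might look like this:

```python
# Illustrative sketch only: inference with a custom YOLOv5 checkpoint via torch.hub,
# standing in for the fine-tuned sperm-cell detector described in the abstract.
import torch

# Hypothetical weights file: in practice this would be a checkpoint trained on
# the labelled VISEM-derived dataset mentioned in the abstract.
model = torch.hub.load("ultralytics/yolov5", "custom", path="sperm_yolov5s.pt")
model.conf = 0.25  # confidence threshold for reported detections

results = model("microscopy_frame.png")   # run detection on a single frame
detections = results.pandas().xyxy[0]     # one row per detected sperm cell
print(len(detections), "sperm cells detected")
print(detections[["xmin", "ymin", "xmax", "ymax", "confidence"]].head())
```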

28 pages, 3976 KiB  
Article
An Effective Conversion of Visemes to Words for High-Performance Automatic Lipreading
by Souheil Fenghour, Daqing Chen, Kun Guo, Bo Li and Perry Xiao
Sensors 2021, 21(23), 7890; https://doi.org/10.3390/s21237890 - 26 Nov 2021
Cited by 9 | Viewed by 3328
Abstract
As an alternative approach, viseme-based lipreading systems have demonstrated promising performance results in decoding videos of people uttering entire sentences. However, the overall performance of such systems has been significantly affected by the efficiency of the conversion of visemes to words during the lipreading process. As shown in the literature, the issue has become a bottleneck of such systems where the system’s performance can decrease dramatically from a high classification accuracy of visemes (e.g., over 90%) to a comparatively very low classification accuracy of words (e.g., only just over 60%). The underlying cause of this phenomenon is that roughly half of the words in the English language are homophemes, i.e., a set of visemes can map to multiple words, e.g., “time” and “some”. In this paper, aiming to tackle this issue, a deep learning network model with an Attention based Gated Recurrent Unit is proposed for efficient viseme-to-word conversion and compared against three other approaches. The proposed approach features strong robustness, high efficiency, and short execution time. The approach has been verified with analysis and practical experiments of predicting sentences from benchmark LRS2 and LRS3 datasets. The main contributions of the paper are as follows: (1) A model is developed, which is effective in converting visemes to words, discriminating between homopheme words, and is robust to incorrectly classified visemes; (2) the model proposed uses a few parameters and, therefore, little overhead and time are required to train and execute; and (3) an improved performance in predicting spoken sentences from the LRS2 dataset with an attained word accuracy rate of 79.6%—an improvement of 15.0% compared with the state-of-the-art approaches.
(This article belongs to the Section Sensors and Robotics)
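The authors' released model is not reproduced here, but a toy PyTorch sketch of the general idea, an attention-weighted GRU encoder that maps a viseme sequence to word logits, with placeholder vocabulary sizes and dimensions, could look like this:

```python
# Illustrative sketch, not the paper's model: a small bidirectional GRU encoder with
# additive attention that maps a sequence of viseme IDs to a word class, mirroring
# the viseme-to-word conversion idea described in the abstract.
import torch
import torch.nn as nn

class VisemeToWord(nn.Module):
    def __init__(self, n_visemes=40, n_words=5000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_visemes, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # additive attention scores
        self.out = nn.Linear(2 * hidden, n_words)   # word classifier

    def forward(self, viseme_ids):                  # (batch, seq_len) int64 IDs
        h, _ = self.gru(self.embed(viseme_ids))     # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        context = (weights * h).sum(dim=1)          # weighted sum of GRU states
        return self.out(context)                    # word logits

model = VisemeToWord()
logits = model(torch.randint(0, 40, (2, 12)))       # two sequences of 12 visemes
print(logits.shape)                                 # torch.Size([2, 5000])
```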

14 pages, 5976 KiB  
Article
Deep Learning Based Evaluation of Spermatozoid Motility for Artificial Insemination
by Viktorija Valiuškaitė, Vidas Raudonis, Rytis Maskeliūnas, Robertas Damaševičius and Tomas Krilavičius
Sensors 2021, 21(1), 72; https://doi.org/10.3390/s21010072 - 24 Dec 2020
Cited by 38 | Viewed by 8440
Abstract
We propose a deep learning method based on the Region Based Convolutional Neural Networks (R-CNN) architecture for the evaluation of sperm head motility in human semen videos. The neural network performs the segmentation of sperm heads, while the proposed central coordinate tracking algorithm allows us to calculate the movement speed of sperm heads. We have achieved 91.77% (95% CI, 91.11–92.43%) accuracy of sperm head detection on the VISEM (A Multimodal Video Dataset of Human Spermatozoa) sperm sample video dataset. The mean absolute error (MAE) of sperm head vitality prediction was 2.92 (95% CI, 2.46–3.37), while the Pearson correlation between actual and predicted sperm head vitality was 0.969. The experimental results demonstrate the applicability of the proposed method in an automated artificial insemination workflow.
(This article belongs to the Section Intelligent Sensors)
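The R-CNN segmentation stage is not shown here; a minimal sketch of the central-coordinate idea, matching each detected head to the nearest centroid in the next frame and converting displacement to speed (frame rate and pixel calibration are placeholder values), might be:

```python
# Illustrative sketch, assuming head detections are already available per frame:
# a simple central-coordinate tracker that matches each sperm head to the nearest
# centroid in the next frame and converts displacement to speed.
import numpy as np

def track_speeds(centroids_t, centroids_t1, fps=50.0, um_per_px=0.5, max_jump=30.0):
    """Match centroids between consecutive frames and return speeds in um/s.

    fps, um_per_px and max_jump are placeholder calibration values.
    """
    speeds = []
    for c in centroids_t:
        dists = np.linalg.norm(centroids_t1 - c, axis=1)  # distance to every candidate
        j = int(np.argmin(dists))
        if dists[j] <= max_jump:                          # reject implausible jumps
            speeds.append(dists[j] * um_per_px * fps)
    return speeds

frame_a = np.array([[10.0, 12.0], [40.0, 55.0]])   # hypothetical head centres (px)
frame_b = np.array([[12.5, 13.0], [41.0, 57.5]])
print(track_speeds(frame_a, frame_b))
```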

26 pages, 1993 KiB  
Article
Alternative Visual Units for an Optimized Phoneme-Based Lipreading System
by Helen L. Bear and Richard Harvey
Appl. Sci. 2019, 9(18), 3870; https://doi.org/10.3390/app9183870 - 15 Sep 2019
Cited by 12 | Viewed by 3814
Abstract
Lipreading is understanding speech from observed lip movements. An observed series of lip motions is an ordered sequence of visual lip gestures. These gestures are commonly known, but as yet are not formally defined, as ‘visemes’. In this article, we describe a structured approach which allows us to create speaker-dependent visemes with a fixed number of visemes within each set. We create sets of visemes for sizes two to 45. Each set of visemes is based upon clustering phonemes, thus each set has a unique phoneme-to-viseme mapping. We first present an experiment using these maps and the Resource Management Audio-Visual (RMAV) dataset which shows the effect of changing the viseme map size in speaker-dependent machine lipreading and demonstrate that word recognition with phoneme classifiers is possible. Furthermore, we show that there are intermediate units between visemes and phonemes which are better still. Second, we present a novel two-pass training scheme for phoneme classifiers. This approach uses our new intermediary visual units from our first experiment in the first pass as classifiers; before using the phoneme-to-viseme maps, we retrain these into phoneme classifiers. This method significantly improves on previous lipreading results with RMAV speakers.
(This article belongs to the Section Computing and Artificial Intelligence)
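The RMAV data and the authors' clustering pipeline are not included in this listing; a small sketch of the underlying idea, hierarchically clustering phonemes by a (here randomly generated) confusion-based distance to obtain a phoneme-to-viseme map of a chosen size, could look like this:

```python
# Illustrative sketch, assuming a phoneme confusion matrix is available: cluster
# phonemes into a chosen number of viseme classes, giving one phoneme-to-viseme map
# per set size. The confusion matrix here is random placeholder data, not RMAV.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

phonemes = ["p", "b", "m", "f", "v", "t", "d", "s", "z", "k", "g"]
rng = np.random.default_rng(0)
confusion = rng.random((len(phonemes), len(phonemes)))   # placeholder counts
similarity = (confusion + confusion.T) / 2               # symmetrise
distance = 1.0 - similarity / similarity.max()           # convert to a distance

def viseme_map(n_visemes):
    """Return a phoneme -> viseme-class dict for a set of size n_visemes."""
    condensed = distance[np.triu_indices(len(phonemes), k=1)]  # condensed form
    labels = fcluster(linkage(condensed, method="average"),
                      t=n_visemes, criterion="maxclust")
    return dict(zip(phonemes, labels.tolist()))

print(viseme_map(5))   # e.g. {'p': 3, 'b': 3, 'm': 1, ...}
```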
