Research on Machine Learning in Computer Vision

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 25 July 2025

Special Issue Editors


Dr. Eleonora Iotti
Guest Editor
Department of Mathematical, Physical and Computer Sciences, University of Parma, 43124 Parma, Italy
Interests: computer science; feature extraction; deep learning; meta-learning; computer vision

Special Issue Information

Dear Colleagues,

This Special Issue is dedicated to exploring the latest advances in Machine Learning (ML) as applied to computer vision. The rapid progress and adoption of ML techniques have significantly enhanced the capabilities of computer vision systems, enabling them to interpret visual data with unprecedented effectiveness.

The aim of this Special Issue is to examine how the most recent ML approaches, including but not limited to deep learning, are being successfully applied to computer vision tasks such as object detection, image retrieval, segmentation, and recognition.

Of particular interest are ML techniques such as meta-learning, reinforcement learning, and unsupervised and semi-supervised learning. We especially welcome contributions that address the challenges encountered in deploying these techniques, such as the demand for large datasets and high computational power, and that discuss and propose potential solutions, with a specific focus on one-shot or few-shot approaches. Contributions that highlight the impact of these advancements on application domains such as healthcare, autonomous vehicles, and surveillance are also welcome.

Dr. Eleonora Iotti
Prof. Dr. João M. F. Rodrigues
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • computer vision
  • one- and few-shot learning
  • meta-learning
  • reinforcement learning
  • unsupervised and semi-supervised learning
  • ML-based computer vision applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

21 pages, 2876 KiB  
Article
A Computer-Aided Approach to Canine Hip Dysplasia Assessment: Measuring Femoral Head–Acetabulum Distance with Deep Learning
by Pedro Franco-Gonçalo, Pedro Leite, Sofia Alves-Pimenta, Bruno Colaço, Lio Gonçalves, Vítor Filipe, Fintan McEvoy, Manuel Ferreira and Mário Ginja
Appl. Sci. 2025, 15(9), 5087; https://doi.org/10.3390/app15095087 - 3 May 2025
Abstract
Canine hip dysplasia (CHD) screening relies on radiographic assessment, but traditional scoring methods often lack consistency due to inter-rater variability. This study presents an AI-driven system for automated measurement of the femoral head center to dorsal acetabular edge (FHC/DAE) distance, a key metric in CHD evaluation. Unlike most AI models that directly classify CHD severity using convolutional neural networks, this system provides an interpretable, measurement-based output to support a more transparent evaluation. The system combines a keypoint regression model for femoral head center localization with a U-Net-based segmentation model for acetabular edge delineation. It was trained on 7967 images for hip joint detection, 571 for keypoints, and 624 for acetabulum segmentation, all from ventrodorsal hip-extended radiographs. On a test set of 70 images, the keypoint model achieved high precision (Euclidean Distance = 0.055 mm; Mean Absolute Error = 0.0034 mm; Mean Squared Error = 2.52 × 10⁻⁵ mm²), while the segmentation model showed strong performance (Dice Score = 0.96; Intersection over Union = 0.92). Comparison with expert annotations demonstrated strong agreement (Intraclass Correlation Coefficients = 0.97 and 0.93; Weighted Kappa = 0.86 and 0.79; Standard Error of Measurement = 0.92 to 1.34 mm). By automating anatomical landmark detection, the system enhances standardization, reproducibility, and interpretability in CHD radiographic assessment. Its strong alignment with expert evaluations supports its integration into CHD screening workflows for more objective and efficient diagnosis and CHD scoring.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
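The overlap metrics reported above (Dice Score and Intersection over Union) are standard and easy to reproduce. Below is a minimal NumPy sketch using hypothetical masks in place of the acetabular edge segmentations; it illustrates the metrics only and is not the authors' code.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute Dice Score and IoU for two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

# Hypothetical masks standing in for acetabular edge segmentations.
pred = np.zeros((256, 256), dtype=np.uint8); pred[100:150, 80:200] = 1
gt = np.zeros((256, 256), dtype=np.uint8); gt[105:150, 85:200] = 1
print(dice_and_iou(pred, gt))
```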

20 pages, 2777 KiB  
Article
Video Human Action Recognition Based on Motion-Tempo Learning and Feedback Attention
by Yalong Liu, Chengwu Liang, Songqi Jiang and Peiwang Zhu
Appl. Sci. 2025, 15(8), 4186; https://doi.org/10.3390/app15084186 - 10 Apr 2025
Abstract
In video human action-recognition tasks, motion tempo describes the dynamic patterns and temporal scales of human motion. Different categories of actions are typically composed of sub-actions with varying motion tempos. Effectively capturing sub-actions with different motion tempos and distinguishing category-specific sub-actions are crucial for improving action-recognition performance. Convolutional Neural Network (CNN)-based methods have attempted to address this challenge by embedding feedforward attention modules to enhance the action's dynamic representation learning. However, feedforward attention modules rely only on local information from low-level features and lack the contextual information needed to generate attention weights. Therefore, we propose a Sub-action Motion information Enhancement Network (SMEN) based on motion-tempo learning and feedback attention, which consists of a Multi-Granularity Adaptive Fusion Module (MgAFM) and a Feedback Attention-Guided Module (FAGM). MgAFM enhances the model's ability to capture crucial intrinsic sub-action information by extracting and adaptively fusing motion dynamic features at different granularities. FAGM leverages high-level features that contain contextual information in a feedback manner to guide low-level features in generating attention weights, enhancing the model's ability to extract more discriminative spatio-temporal and channel-wise features. Experiments are conducted on three datasets: the proposed SMEN achieves top-1 accuracies of 52.4% and 63.3% on the Something-Something V1 and V2 datasets, respectively, and 76.9% on the Kinetics-400 dataset. Ablation studies, evaluations, and visualizations demonstrate that the proposed SMEN is effective for sub-action motion tempo and representation learning and outperforms competing methods for video action recognition.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
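As a rough illustration of the feedback idea (high-level, context-rich features guiding attention over low-level features), here is a toy PyTorch module. It is a simplified sketch under our own assumptions, not the published FAGM; the module name and shapes are invented for the example.

```python
import torch
import torch.nn as nn

class FeedbackChannelAttention(nn.Module):
    """Toy feedback attention: context from high-level features generates
    channel weights that re-weight low-level features (a sketch of the
    general idea, not the paper's FAGM)."""
    def __init__(self, high_channels: int, low_channels: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(high_channels, low_channels),
            nn.Sigmoid(),
        )

    def forward(self, low_feat, high_feat):
        context = high_feat.mean(dim=(2, 3))        # global context: (B, C_high)
        weights = self.fc(context)                  # channel weights: (B, C_low)
        return low_feat * weights[:, :, None, None] # broadcast over H, W

# Example with random feature maps.
low = torch.randn(2, 64, 56, 56)
high = torch.randn(2, 256, 7, 7)
out = FeedbackChannelAttention(256, 64)(low, high)
print(out.shape)  # torch.Size([2, 64, 56, 56])
```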

16 pages, 6883 KiB  
Article
Integrated AI System for Real-Time Sports Broadcasting: Player Behavior, Game Event Recognition, and Generative AI Commentary in Basketball Games
by Sunghoon Jung, Hanmoe Kim, Hyunseo Park and Ahyoung Choi
Appl. Sci. 2025, 15(3), 1543; https://doi.org/10.3390/app15031543 - 3 Feb 2025
Abstract
This study presents an AI-based sports broadcasting system capable of real-time game analysis and automated commentary. The model first acquires essential background knowledge, including the court layout, game rules, team information, and player details. YOLO-based segmentation is applied to the local camera view to enhance court recognition accuracy. Player action detection and ball tracking are performed with YOLO models: in each frame, a YOLO detection model produces bounding boxes for the players, and our tracking algorithm computes the IoU between detections in consecutive frames and links them to follow each player's movement path. Player behavior recognition uses the R(2+1)D action recognition model, covering actions such as running, dribbling, shooting, and blocking. The system demonstrates high performance, achieving an average accuracy of 97% in court calibration, 92.5% in player and object detection, and 85.04% in action recognition. Key game events are identified based on positional and action data, with broadcast lines generated using GPT APIs and converted to natural audio commentary via Text-to-Speech (TTS). This system offers a comprehensive framework for automating sports broadcasting with advanced AI techniques.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
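The IoU-based association step described above is a common tracking-by-detection pattern. The sketch below shows one plausible greedy implementation in plain Python; the threshold, box format, and function names are our assumptions, not the authors' code.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_detections(prev_tracks, detections, threshold=0.3):
    """Greedily assign each detection to the unused previous-frame track
    with the highest IoU, linking boxes across frames."""
    assignments, used = {}, set()
    for det_idx, det in enumerate(detections):
        scores = [(iou(trk, det), trk_idx)
                  for trk_idx, trk in enumerate(prev_tracks)
                  if trk_idx not in used]
        if scores:
            best_score, best_idx = max(scores)
            if best_score >= threshold:
                assignments[det_idx] = best_idx
                used.add(best_idx)
    return assignments  # detection index -> matched track index

prev = [(10, 10, 50, 90), (200, 40, 240, 120)]
curr = [(12, 12, 52, 92), (205, 42, 245, 122)]
print(link_detections(prev, curr))  # {0: 0, 1: 1}
```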

23 pages, 20134 KiB  
Article
The Development and Validation of an Artificial Intelligence Model for Estimating Thumb Range of Motion Using Angle Sensors and Machine Learning: Targeting Radial Abduction, Palmar Abduction, and Pronation Angles
by Yutaka Ehara, Atsuyuki Inui, Yutaka Mifune, Kohei Yamaura, Tatsuo Kato, Takahiro Furukawa, Shuya Tanaka, Masaya Kusunose, Shunsaku Takigami, Shin Osawa, Daiji Nakabayashi, Shinya Hayashi, Tomoyuki Matsumoto, Takehiko Matsushita and Ryosuke Kuroda
Appl. Sci. 2025, 15(3), 1296; https://doi.org/10.3390/app15031296 - 27 Jan 2025
Abstract
An accurate assessment of thumb range of motion is crucial for diagnosing musculoskeletal conditions, evaluating functional impairments, and planning effective rehabilitation strategies. In this study, we aimed to enhance the accuracy of estimating thumb range of motion using a combination of MediaPipe, an AI-based posture estimation library, and machine learning methods, taking the values obtained using angle sensors as the ground truth. Radial abduction, palmar abduction, and pronation angles were estimated using MediaPipe based on coordinates detected from videos of 18 healthy participants (nine males and nine females with an age range of 30–49 years) selected to reflect a balanced distribution of height and other physical characteristics. A conical thumb movement model was constructed, and parameters were generated based on the coordinate data. Five machine learning models were evaluated, with LightGBM achieving the highest accuracy across all metrics. Specifically, for radial abduction, palmar abduction, and pronation, the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), and correlation coefficient were 4.67°, 3.41°, 0.94, and 0.97; 4.63°, 3.41°, 0.95, and 0.98; and 5.69°, 4.17°, 0.88, and 0.94, respectively. These results demonstrate that, when estimating thumb range of motion, the AI model trained on angle sensor data with LightGBM achieved accuracy that was high and comparable to that of prior methods involving MediaPipe and a protractor.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
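To make the evaluation pipeline concrete, here is a hedged sketch of training a LightGBM regressor and computing RMSE, MAE, and R² with scikit-learn. The synthetic features stand in for the MediaPipe-derived parameters of the conical thumb movement model, which the abstract does not spell out.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for geometric features and angle-sensor ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=2.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LGBMRegressor(n_estimators=200).fit(X_tr, y_tr)
pred = model.predict(X_te)

rmse = mean_squared_error(y_te, pred) ** 0.5
print(f"RMSE={rmse:.2f}  MAE={mean_absolute_error(y_te, pred):.2f}  "
      f"R2={r2_score(y_te, pred):.2f}")
```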

16 pages, 2038 KiB  
Article
Enhancing Colony Detection of Microorganisms in Agar Dishes Using SAM-Based Synthetic Data Augmentation in Low-Data Scenarios
by Kim Mennemann, Nikolas Ebert, Laurenz Reichardt and Oliver Wasenmüller
Appl. Sci. 2025, 15(3), 1260; https://doi.org/10.3390/app15031260 - 26 Jan 2025
Abstract
In many medical and pharmaceutical processes, continuous hygiene monitoring relies on manual detection of microorganisms in agar dishes by skilled personnel. While deep learning offers the potential for automating this task, it often faces limitations due to insufficient training data, a common issue in colony detection. To address this, we propose a simple yet efficient SAM-based pipeline for Copy-Paste data augmentation to enhance detection performance, even with limited data. This paper explores a method where annotated microbial colonies from real images were copied and pasted into empty agar dish images to create new synthetic samples. These new samples inherited the annotations of the colonies inserted into them so that no further labeling was required. The resulting synthetic datasets were used to train a YOLOv8 detection model, which was then fine-tuned on just 10 to 1000 real images. The best fine-tuned model, trained on only 1000 real images, achieved an mAP of 60.6, while a base model trained on 5241 real images achieved 64.9. Although far fewer real images were used, the fine-tuned model performed comparably well, demonstrating the effectiveness of the SAM-based Copy-Paste augmentation. This approach matches or even exceeds the performance of the current state of the art in synthetic data generation in colony detection and can be expanded to include more microbial species and agar dishes.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
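The Copy-Paste step can be pictured as pasting masked colony pixels onto an empty dish image while carrying the annotation along. The following NumPy sketch shows one simple way to do this; in the paper the colony masks come from SAM, whereas here the images and mask are fabricated for illustration.

```python
import numpy as np

def paste_colony(background, crop, mask, x, y):
    """Paste the masked colony pixels of `crop` onto `background` at (x, y)
    and return the augmented image with the inherited bounding box."""
    h, w = crop.shape[:2]
    out = background.copy()
    region = out[y:y + h, x:x + w]      # view into the output image
    region[mask > 0] = crop[mask > 0]   # copy only the colony pixels
    return out, (x, y, x + w, y + h)    # box in (x1, y1, x2, y2) format

# Fabricated 8-bit RGB dish and colony crop; in the paper the mask
# would come from SAM rather than being all-ones.
dish = np.full((512, 512, 3), 200, dtype=np.uint8)
colony = np.full((20, 20, 3), 90, dtype=np.uint8)
mask = np.ones((20, 20), dtype=np.uint8)
augmented, box = paste_colony(dish, colony, mask, x=100, y=150)
print(box)  # (100, 150, 120, 170)
```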

26 pages, 1303 KiB  
Article
On Explainability of Reinforcement Learning-Based Machine Learning Agents Trained with Proximal Policy Optimization That Utilizes Visual Sensor Data
by Tomasz Hachaj and Marcin Piekarczyk
Appl. Sci. 2025, 15(2), 538; https://doi.org/10.3390/app15020538 - 8 Jan 2025
Abstract
In this paper, we address the explainability of reinforcement learning-based machine learning agents trained with Proximal Policy Optimization (PPO) that utilize visual sensor data. We propose an algorithm that allows an effective and intuitive approximation of the PPO-trained neural network (NN), and we conduct several experiments to confirm its effectiveness. Our proposed method works well for scenarios where semantic clustering of the scene is possible. The approach is based on the solid theoretical foundations of Gradient-weighted Class Activation Mapping (GradCAM) and Classification and Regression Trees (CART), with additional proxy geometry heuristics. It excels at explaining behavior in a virtual simulation system based on relatively low-resolution video input. Depending on the convolutional feature extractor of the PPO-trained neural network, our method approximates the black-box model with an accuracy of 0.945 to 0.968. The proposed method has important practical applications: through its use, it is possible to estimate the causes of specific decisions made by the neural network given the current state of the observed environment. This estimation makes it possible to determine whether the network makes decisions as expected (i.e., decision-making is related to the model's observation of objects belonging to different semantic classes in the environment) and to detect unexpected, seemingly chaotic behavior that might be, for example, the result of data bias, bad design of the reward function, or insufficient generalization ability of the model. We publish all source code so that our experiments can be reproduced.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
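The decision-tree approximation of a black-box policy can be illustrated with scikit-learn: fit a CART model on features describing the agent's observations and measure how faithfully it reproduces the agent's actions. Everything below (the features, the stand-in policy, the tree depth) is a hypothetical setup, not the authors' pipeline.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: each row holds per-class activation strengths
# (e.g., GradCAM energy per semantic cluster); each label is the action
# the PPO-trained network chose for that observation.
rng = np.random.default_rng(1)
activations = rng.random((2000, 5))
agent_actions = (activations[:, 0] > activations[:, 1]).astype(int)  # stand-in policy

tree = DecisionTreeClassifier(max_depth=4).fit(activations, agent_actions)
fidelity = accuracy_score(agent_actions, tree.predict(activations))
print(f"approximation fidelity: {fidelity:.3f}")  # analogous to the reported 0.945-0.968
```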

13 pages, 1853 KiB  
Article
Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Appl. Sci. 2025, 15(1), 422; https://doi.org/10.3390/app15010422 - 5 Jan 2025
Cited by 3
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
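The transfer-learning stage of such a pipeline can be sketched in TensorFlow 2 as below; the FPGA-specific quantization, compression, and compilation are performed with vendor tooling (Vitis-AI) and are omitted here. Layer sizes and training settings are illustrative assumptions, not the authors' configuration.

```python
import tensorflow as tf

# Transfer-learning stage only: reuse frozen ImageNet features from VGG16
# and train a small classification head on CIFAR-10.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_small = tf.keras.applications.vgg16.preprocess_input(
    x_train[:5000].astype("float32"))  # small subset keeps the demo light
y_small = y_train[:5000]

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(32, 32, 3))
base.trainable = False  # freeze the convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_small, y_small, epochs=1, batch_size=128)  # short demo run
```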

15 pages, 1426 KiB  
Article
Attention Score Enhancement Model Through Pairwise Image Comparison
by Yeong Seok Ju, Zong Woo Geem and Joon Shik Lim
Appl. Sci. 2024, 14(21), 9928; https://doi.org/10.3390/app14219928 - 30 Oct 2024
Abstract
This study proposes the Pairwise Attention Enhancement (PAE) model to address the limitations of the Vision Transformer (ViT). While the ViT effectively models global relationships between image patches, it encounters challenges in medical image analysis, where fine-grained local features are crucial. Although the ViT excels at capturing global interactions within the entire image, it may underperform due to its inadequate representation of local features such as color, texture, and edges. The proposed PAE model enhances local features by calculating the cosine similarity between the attention maps of training and reference images and integrating attention maps in regions with high similarity. This approach complements the ViT's global capture capability, allowing for a more accurate reflection of subtle visual differences. Experiments using Clock Drawing Test data demonstrated that the PAE model achieved a precision of 0.9383, a recall of 0.8916, an F1-score of 0.9133, and an accuracy of 92.69%, a 12% improvement over API-Net and a 1% improvement over the ViT. This study suggests that the PAE model can enhance performance in computer vision fields where local features are crucial by overcoming the limitations of the ViT.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
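One simplified reading of the PAE mechanism is: compute patch-wise cosine similarity between two attention maps and blend them where the similarity is high. The NumPy sketch below encodes that reading; the threshold, blend weight, and shapes are our assumptions rather than the paper's implementation.

```python
import numpy as np

def enhance_attention(train_attn, ref_attn, threshold=0.8, alpha=0.5):
    """Blend a reference attention map into a training image's attention map
    wherever their patch-wise cosine similarity is high (a simplified
    reading of the PAE idea, not the authors' implementation)."""
    # train_attn, ref_attn: (num_patches, dim) per-patch attention vectors.
    num = (train_attn * ref_attn).sum(axis=1)
    denom = (np.linalg.norm(train_attn, axis=1) *
             np.linalg.norm(ref_attn, axis=1) + 1e-9)
    sim = num / denom                      # cosine similarity per patch
    enhanced = train_attn.copy()
    high = sim >= threshold
    enhanced[high] = (1 - alpha) * train_attn[high] + alpha * ref_attn[high]
    return enhanced

train = np.random.rand(196, 64)   # hypothetical ViT attention per patch
ref = np.random.rand(196, 64)
print(enhance_attention(train, ref).shape)  # (196, 64)
```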

15 pages, 1919 KiB  
Article
A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images
by Euiju Jeong, Xinzhe Li, Angela (Eunyoung) Kwon, Seonu Park, Qinglong Li and Jaekyeong Kim
Appl. Sci. 2024, 14(20), 9206; https://doi.org/10.3390/app14209206 - 10 Oct 2024
Cited by 3
Abstract
Online reviews that consist of texts and images are an essential source of information for alleviating data sparsity in recommender system studies. Although texts and images convey different types of information, they can offer complementary or substitutive advantages. However, most studies make limited use of the complementary effect between texts and images in recommender systems. Specifically, they have overlooked the informational value of images and proposed recommender systems based solely on textual representations. To address this research gap, this study proposes a novel recommender model that captures the dependence between texts and images. This study uses the RoBERTa and VGG-16 models to extract textual and visual information from online reviews and applies a co-attention mechanism to capture the complementarity between the two modalities. Extensive experiments were conducted using Amazon datasets, confirming the superiority of the proposed model. Our findings suggest that the complementarity of texts and images is crucial for enhancing recommendation accuracy and performance.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
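A minimal co-attention block of the kind described can be written as a pair of cross-modal softmax attentions over an affinity matrix. The PyTorch sketch below is generic; the feature dimensions and the projection of RoBERTa/VGG-16 outputs to a shared space are assumed, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def co_attention(text_feat, img_feat):
    """Minimal co-attention: an affinity matrix between text tokens and image
    regions yields cross-modal attention in both directions (a generic
    sketch; the paper builds on RoBERTa and VGG-16 features)."""
    # text_feat: (B, T, D), img_feat: (B, R, D), already in a shared space.
    affinity = torch.bmm(text_feat, img_feat.transpose(1, 2))  # (B, T, R)
    attended_img = torch.bmm(F.softmax(affinity, dim=-1), img_feat)   # (B, T, D)
    attended_text = torch.bmm(F.softmax(affinity.transpose(1, 2), dim=-1),
                              text_feat)                              # (B, R, D)
    return attended_img, attended_text

text = torch.randn(2, 50, 256)   # e.g., projected RoBERTa token features
image = torch.randn(2, 49, 256)  # e.g., projected VGG-16 7x7 region features
a_img, a_text = co_attention(text, image)
print(a_img.shape, a_text.shape)
```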
