Search Results (43)

Search Parameters:
Keywords = deep neural network moving object detection

34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Viewed by 400
Abstract
Background/Objectives: Accurate detection of thyroid cancer using ultrasound remains a challenge, as malignant nodules can be microscopic and heterogeneous, easily confused with point clusters and borderline-featured tissues. Current studies in deep learning demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local features and global relational context during learning rather than through delayed integration. Methods: The proposed framework integrates enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design allows feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing the robust basic hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across various thyroid cancers: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low and diffuse activation (PTC = 0.002). Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting improved confidence in AI-assisted ultrasound diagnosis.
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)
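
As an illustrative aside, the sketch below shows how a Grad-CAM heatmap is typically computed over the last convolutional block of a DenseNet169 backbone, the CNN encoder named in the abstract. This is a minimal generic sketch, not the authors' code; the input size, untrained weights, and hook placement are assumptions.

```python
# Generic Grad-CAM sketch over a DenseNet169 backbone (assumed setup).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.densenet169(weights=None)  # assumption: untrained backbone for demo
model.eval()

acts, grads = {}, {}

def fwd_hook(module, inputs, output):
    acts["v"] = output                    # activations of the last conv block

def bwd_hook(module, grad_input, grad_output):
    grads["v"] = grad_output[0]           # gradients w.r.t. those activations

model.features.register_forward_hook(fwd_hook)
model.features.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed ultrasound image
score = model(x)[0].max()                 # class score to explain
score.backward()

w = grads["v"].mean(dim=(2, 3), keepdim=True)        # channel weights = GAP of gradients
cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize heatmap to [0, 1]
```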

23 pages, 23944 KB  
Article
Video SAR Enhanced Imaging Using a Self-Supervised Super-Resolution Reconstruction Network
by Xuejun Huang, Yan Zhang, Chao Zhong, Jinshan Ding and Liwu Wen
Remote Sens. 2026, 18(5), 670; https://doi.org/10.3390/rs18050670 - 24 Feb 2026
Viewed by 614
Abstract
Video synthetic aperture radar (SAR) enables observation of moving targets by leveraging temporal information across successive frames. In particular, dynamic shadows in video SAR image sequences provide critical cues for detecting moving objects whose energy is smeared or Doppler-shifted. To achieve high-resolution imaging at a high frame rate for effective dynamic scene monitoring, video SAR systems typically operate at extremely high frequencies or even in the terahertz band, rather than the microwave band. However, terahertz video SAR suffers from significant signal attenuation due to atmospheric absorption. We present a deep learning framework to achieve high-frame-rate and high-resolution imaging for microwave video SAR systems. In this framework, the problem of microwave video SAR imaging is formulated as an image super-resolution reconstruction task for low-resolution yet high-frame-rate image sequences from microwave video SAR. We develop a simple yet effective image super-resolution reconstruction network built entirely upon convolutional neural networks. The network takes a low-resolution image sequence and the corresponding high-resolution image with blurred shadows as input, and produces a high-resolution image sequence in which shadows are clearly visible. Furthermore, the network is trained in a self-supervised manner and thus does not require high-resolution image sequences with unblurred shadows as ground truth, which is appealing for practical applications. Processing results on real data from two different video SAR systems show good performance of the proposed approach and convincing generalization ability.
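
As a rough illustration of the kind of network the abstract describes, here is a minimal, purely convolutional super-resolution sketch with a self-supervised reconstruction loss. The layer sizes, the 2x scale factor, and the loss form are assumptions for illustration, not the paper's design.

```python
# Minimal CNN super-resolution sketch with a self-supervised loss (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRNet(nn.Module):
    """Tiny CNN that fuses a low-res frame with a blurred high-res guide image."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),                 # sub-pixel upsampling to high-res
        )

    def forward(self, lr, hr_blurred):
        guide = F.avg_pool2d(hr_blurred, self.scale)   # bring the guide onto the LR grid
        return self.net(torch.cat([lr, guide], dim=1))

net = SRNet()
lr = torch.randn(1, 1, 64, 64)                  # low-res, high-frame-rate frame
hr_blur = torch.randn(1, 1, 128, 128)           # high-res frame with blurred shadows
hr = net(lr, hr_blur)
# Self-supervised idea: the output, re-degraded to low resolution, should match the
# observed low-res frame, so no clean high-res ground truth is required.
loss = F.mse_loss(F.avg_pool2d(hr, net.scale), lr)
```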

19 pages, 13879 KB  
Article
RGB-Based Staircase Detection for Quadrupedal Robots: Implementation and Analysis
by Piotr Wozniak, Paweł Penar and Damian Bielecki
Sensors 2025, 25(23), 7247; https://doi.org/10.3390/s25237247 - 27 Nov 2025
Viewed by 1106
Abstract
In this paper, we present a stair detection algorithm verified on various platforms, including a real quadruped robot. The proposed approach detects dangerous situations, such as proximity to stairs, in real time, enabling the robot to move safely. In our research, we utilized data collected by a moving four-legged robot, recorded in 18 sequences containing more than 26,000 color images from two cameras positioned at different perspectives. To address the challenge, we utilize a deep neural network with RGB inputs for object detection, complemented by preprocessing and post-processing. A key feature of this approach is its adaptability to varying camera views, including both front and bottom perspectives on the robot, with training that incorporates multi-camera images from both views. We implemented and tested this algorithm on the Unitree Go1 robot, as well as on other embedded platforms. Training the YOLO version 11n network on a single sequence and testing on 17 sequences, we achieved an average mAP@50 of 51.30 for images containing only stairs and 87.56 for all images. This method enables early hazard detection during stair navigation. The proposed evaluation scenario tests the model's adaptation from a single training sequence to multiple unseen sequences, extending existing stair detection methods for quadrupedal robots. The dataset presents high variability in stair appearance due to the robot's perspective, and the platform offers limited real-time processing capacity.
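
For orientation, a minimal detection step with the Ultralytics YOLO11n model family the paper fine-tunes might look like the sketch below. The weights file, image path, and confidence threshold are placeholders, and a stair class would come from the authors' custom training, not the pretrained checkpoint.

```python
# Minimal inference sketch with Ultralytics YOLO (assumed setup, not the paper's code).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")               # pretrained weights; the paper fine-tunes on stair data
results = model.predict("frame.jpg", conf=0.25)

for box in results[0].boxes:             # iterate over detected boxes in the first image
    name = results[0].names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{name}: ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}) conf={float(box.conf):.2f}")
```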

37 pages, 4859 KB  
Review
Eyes of the Future: Decoding the World Through Machine Vision
by Svetlana N. Khonina, Nikolay L. Kazanskiy, Ivan V. Oseledets, Roman M. Khabibullin and Artem V. Nikonorov
Technologies 2025, 13(11), 507; https://doi.org/10.3390/technologies13110507 - 7 Nov 2025
Cited by 2 | Viewed by 6349
Abstract
Machine vision (MV) is reshaping numerous industries by giving machines the ability to understand what they “see” and respond without human intervention. This review brings together the latest developments in deep learning (DL), image processing, and computer vision (CV). It focuses on how these technologies are being applied in real operational environments. We examine core methodologies such as feature extraction, object detection, image segmentation, and pattern recognition. These techniques are accelerating innovation in key sectors, including healthcare, manufacturing, autonomous systems, and security. A major emphasis is placed on the deepening integration of artificial intelligence (AI) and machine learning (ML) into MV. We particularly consider the impact of convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformer architectures on the evolution of visual recognition capabilities. Beyond surveying advances, this review also takes a hard look at the field’s persistent roadblocks, above all the scarcity of high-quality labeled data, the heavy computational load of modern models, and the unforgiving time limits imposed by real-time vision applications. In response to these challenges, we examine a range of emerging fixes: leaner algorithms, purpose-built hardware (like vision processing units and neuromorphic chips), and smarter ways to label or synthesize data that sidestep the need for massive manual operations. What distinguishes this paper, however, is its emphasis on where MV is headed next. We spotlight nascent directions, including edge-based processing that moves intelligence closer to the sensor, early explorations of quantum methods for visual tasks, and hybrid AI systems that fuse symbolic reasoning with DL, not as speculative futures but as tangible pathways already taking shape. Ultimately, the goal is to connect cutting-edge research with actual deployment scenarios, offering a grounded, actionable guide for those working at the front lines of MV today.
(This article belongs to the Section Information and Communication Technologies)

24 pages, 6113 KB  
Article
Vision-Based Reinforcement Learning for Robotic Grasping of Moving Objects on a Conveyor
by Yin Cao, Xuemei Xu and Yazheng Zhang
Machines 2025, 13(10), 973; https://doi.org/10.3390/machines13100973 - 21 Oct 2025
Cited by 1 | Viewed by 3287
Abstract
This study introduces an autonomous framework for grasping moving objects on a conveyor belt, enabling unsupervised detection, grasping, and categorization. The work focuses on two common object shapes—cylindrical cans and rectangular cartons—transported at a constant speed of 3–7 cm/s on the conveyor, emulating typical scenarios. The proposed framework combines a vision-based neural network for object detection, a target localization algorithm, and a deep reinforcement learning model for robotic control. Specifically, a YOLO-based neural network detects the 2D positions of target objects; these positions are then converted to 3D coordinates, followed by pose estimation and error correction. A Proximal Policy Optimization (PPO) algorithm then provides continuous control decisions for the robotic arm. A tailored reinforcement learning environment was developed using the Gymnasium interface, and training and validation were conducted on a 7-degree-of-freedom (7-DOF) robotic arm model in the PyBullet physics simulation engine. By leveraging transfer learning and curriculum learning strategies, the robotic agent effectively learned to grasp multiple categories of moving objects. Simulation experiments and randomized trials show that the proposed method enables the 7-DOF robotic arm to consistently grasp conveyor belt objects, achieving an approximately 80% success rate at conveyor speeds of 0.03–0.07 m/s. These results demonstrate the potential of the framework for deployment in automated handling applications.
(This article belongs to the Special Issue AI-Integrated Advanced Robotics Towards Industry 5.0)
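
To make the Gymnasium detail concrete, a skeleton of a custom environment of this kind is sketched below. Every name, shape, and the reward term are invented for illustration; the paper's actual environment wraps a PyBullet simulation of the 7-DOF arm.

```python
# Toy Gymnasium environment skeleton (assumed shapes and reward, not the paper's env).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ConveyorGraspEnv(gym.Env):
    """Stand-in for a conveyor grasping environment (the real one wraps PyBullet)."""

    def __init__(self, belt_speed: float = 0.05):
        super().__init__()
        self.belt_speed = belt_speed                                  # m/s, within 0.03-0.07
        self.action_space = spaces.Box(-1.0, 1.0, (7,), np.float32)   # 7-DOF joint commands
        self.observation_space = spaces.Box(-np.inf, np.inf, (10,), np.float32)
        self.obj_x = 0.0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.obj_x = 0.0
        return self._obs(), {}

    def step(self, action):
        self.obj_x += self.belt_speed * 0.02          # object advances each 20 ms step
        reward = -abs(self.obj_x - 0.5)               # placeholder shaping: meet it mid-belt
        terminated = self.obj_x > 1.0                 # object left the workspace
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        # Real observations would include joint states and the object's estimated 3D pose.
        return np.array([self.obj_x] + [0.0] * 9, dtype=np.float32)
```

A PPO implementation such as Stable-Baselines3 could then be trained against this interface; that pairing is an assumption here, since the abstract names only the algorithm and the Gymnasium interface.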

26 pages, 4049 KB  
Article
A Versatile UAS Development Platform Able to Support a Novel Tracking Algorithm in Real-Time
by Dan-Marius Dobrea and Matei-Ștefan Dobrea
Aerospace 2025, 12(8), 649; https://doi.org/10.3390/aerospace12080649 - 22 Jul 2025
Viewed by 1774
Abstract
A primary objective of this research entails the development of an innovative algorithm capable of tracking a drone in real-time. This objective serves as a fundamental requirement across various applications, including collision avoidance, formation flying, and the interception of moving targets. Nonetheless, regardless of the efficacy of any detection algorithm, achieving 100% performance remains unattainable. Deep neural networks (DNNs) were employed to enhance this performance. To facilitate real-time operation, the DNN must be executed within a Deep Learning Processing Unit (DPU), Neural Processing Unit (NPU), Tensor Processing Unit (TPU), or Graphics Processing Unit (GPU) system on board the UAV. Given the constraints of these processing units, it may be necessary to quantize the DNN or utilize a less complex variant, resulting in an additional reduction in performance. However, precise target detection at each control step is imperative for effective flight path control. By integrating multiple algorithms, the developed system can effectively track UAVs with improved detection performance. Furthermore, this paper aims to establish a versatile Unmanned Aerial System (UAS) development platform constructed using open-source components and possessing the capability to adapt and evolve seamlessly throughout the development and post-production phases.
(This article belongs to the Section Aeronautics)
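
One common way to "integrate multiple algorithms" so that tracking survives missed detections is to let a motion model bridge detector dropouts. The sketch below shows a generic constant-velocity Kalman filter used that way; it is a standard technique offered for illustration, not the paper's specific algorithm, and the frame rate and noise covariances are assumptions.

```python
# Generic Kalman fallback: predict the target when the DNN detector misses a frame.
import numpy as np

dt = 1 / 30                                   # assumed frame interval (30 fps)
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q, R = np.eye(4) * 1e-2, np.eye(2) * 1.0      # assumed process / measurement noise
x, P = np.zeros(4), np.eye(4)                 # state: (cx, cy, vx, vy)

def track_step(detection):
    """detection: (cx, cy) pixel center from the DNN, or None if it missed."""
    global x, P
    x, P = F @ x, F @ P @ F.T + Q             # predict with constant-velocity model
    if detection is not None:                 # update only when the detector fires
        z = np.asarray(detection, float)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x[:2]                              # tracked center, detected or predicted
```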

21 pages, 7041 KB  
Article
Synergy of Internet of Things and Software Engineering Approach for Enhanced Copy–Move Image Forgery Detection Model
by Mohammed Assiri
Electronics 2025, 14(4), 692; https://doi.org/10.3390/electronics14040692 - 11 Feb 2025
Cited by 2 | Viewed by 1355
Abstract
The fast development of digital images and the improvements required in security measures have recently increased the demand for innovative image analysis methods. Image analysis identifies, classifies, and monitors people, events, or objects in images or videos. It significantly improves security by identifying and preventing attacks on security applications through digital images, and it is crucial in diverse security fields, comprising video analysis, anomaly detection, biometrics, object recognition, surveillance, and forensic investigations. By integrating advanced software engineering models with IoT capabilities, this technique revolutionizes copy–move image forgery detection. IoT devices collect and transmit real-world data, improving software solutions that detect and analyze image tampering with exceptional accuracy and efficiency. This combination enhances detection abilities and provides scalable and adaptive solutions to counter cutting-edge forgery methods. Copy–move forgery detection (CMFD) has become a major active research domain in blind image forensics. Most existing approaches depend on block-based methods, key-point methods, or a combination of the two. A few deep convolutional neural network (DCNN) techniques have been applied to image hashing, image forensics, image retrieval, and image classification, and have outperformed conventional methods. To accomplish robust CMFD, this study develops a fusion of soft computing with a deep learning-based CMFD approach (FSCDL-CMFDA) to secure digital images. The FSCDL-CMFDA approach aims to integrate the benefits of metaheuristics with the DL model for an enhanced CMFD process. In the FSCDL-CMFDA method, histogram equalization is initially performed to improve image quality. The Siamese convolutional neural network (SCNN) model is then used to learn complex features from the pre-processed images, with its hyperparameters chosen by the golden jackal optimization (GJO) model. For the CMFD process, the FSCDL-CMFDA technique employs the regularized extreme learning machine (RELM) classifier. Finally, the detection performance of the RELM method is improved by the beluga whale optimization (BWO) technique. To demonstrate the enhanced performance of the FSCDL-CMFDA method, a comprehensive outcome analysis is conducted using the MNIST and CIFAR datasets. The experimental validation of the FSCDL-CMFDA method showed a superior accuracy of 98.12% over existing models.
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)
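
To illustrate the Siamese building block named in the abstract, here is a minimal patch-similarity Siamese CNN: two patches pass through a shared encoder, and a small embedding distance suggests a copy–move pair. The layer sizes, patch size, and distance head are assumptions for illustration, not the FSCDL-CMFDA architecture.

```python
# Minimal Siamese CNN for patch similarity (assumed architecture, shared weights).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 8 * 8, 64),   # 32x32 patches -> 64-d embedding
        )

    def forward(self, a, b):
        fa, fb = self.encoder(a), self.encoder(b)      # same encoder for both patches
        return F.pairwise_distance(fa, fb)             # small distance = likely duplicate

net = SiameseCNN()
p1, p2 = torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32)
distances = net(p1, p2)                                # one distance per patch pair
```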

18 pages, 13936 KB  
Article
Method for Preprocessing Video Data for Training Deep-Learning Models for Identifying Behavioral Events in Bio-Objects
by Marina Barulina, Alexander Andreev, Ilya Kovalenko, Ilya Barmin, Eduard Titov and Danil Kirillov
Mathematics 2024, 12(24), 3978; https://doi.org/10.3390/math12243978 - 18 Dec 2024
Cited by 3 | Viewed by 4117
Abstract
Monitoring moving bio-objects is currently of great interest for both fundamental and practical research. The advent of deep-learning algorithms has made it possible to automate the qualitative and quantitative analysis of the behavior of bio-objects recorded in video format. When processing such data, it is necessary to consider additional factors, such as background noise in the frame, the speed of the bio-object, and the need to reflect information about the previous (past) and subsequent (future) pose of the bio-object in one video frame. The preprocessed dataset must also be suitable for verification by experts. This article proposes a method for preprocessing data to identify the behavior of a bio-object, a representative example being video-recorded experiments on laboratory animals. The method is based on combining information about a behavioral event presented in a sequence of frames with the addition of a native image, followed by boundary detection using the Sobel filter. The resulting representation of a behavioral event is easily perceived by both human experts and neural networks of various architectures. The article presents the results of training several neural networks on the obtained dataset and proposes an effective neural network architecture (F1-score = 0.95) for identifying discrete events in the behavior of biological objects.
(This article belongs to the Special Issue Artificial Intelligence for Biomedical Applications)
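
A rough sketch of this style of preprocessing follows: a short frame sequence is blended into a motion trace, combined with a native frame, and overlaid with Sobel boundaries. The blending weights and kernel size are assumptions; the paper's exact combination scheme may differ.

```python
# Sketch: blend a frame sequence with a native frame and overlay Sobel edges.
import cv2
import numpy as np

def encode_event(frames, native_idx=-1):
    """frames: list of grayscale uint8 frames covering one behavioral event."""
    trace = np.mean(np.stack(frames, axis=0), axis=0).astype(np.uint8)  # motion trace
    native = frames[native_idx]                                         # the 'native' image
    blend = cv2.addWeighted(trace, 0.5, native, 0.5, 0)                 # assumed 50/50 mix
    gx = cv2.Sobel(blend, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(blend, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))                  # boundary map
    return cv2.addWeighted(blend, 0.7, edges, 0.3, 0)                   # assumed overlay

frames = [np.random.randint(0, 255, (240, 320), np.uint8) for _ in range(8)]
event_img = encode_event(frames)   # single image encoding past and future poses
```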

26 pages, 6549 KB  
Article
Reinforcement-Learning-Based Multi-UAV Cooperative Search for Moving Targets in 3D Scenarios
by Yifei Liu, Xiaoshuai Li, Jian Wang, Feiyu Wei and Junan Yang
Drones 2024, 8(8), 378; https://doi.org/10.3390/drones8080378 - 6 Aug 2024
Cited by 30 | Viewed by 9078
Abstract
Most existing multi-UAV collaborative search methods only consider scenarios of two-dimensional path planning or static target search. To better reflect practical scenarios, this paper proposes a path planning method based on an action-mask-based multi-agent proximal policy optimization (AM-MAPPO) algorithm for multiple UAVs searching for moving targets in three-dimensional (3D) environments. In particular, a multi-UAV high–low altitude collaborative search architecture is introduced that takes into account both the extensive detection range of high-altitude UAVs and the superior detection quality of low-altitude UAVs. The optimization objective of the search task is to minimize the uncertainty of the search area while maximizing the number of captured moving targets. The path planning problem for moving target search in a 3D environment is formulated and addressed using the AM-MAPPO algorithm. The proposed method incorporates a state representation mechanism based on field-of-view encoding to handle dynamic changes in neural network input dimensions, and develops a rule-based target capture mechanism and an action-mask-based collision avoidance mechanism to enhance the algorithm's convergence speed. Experimental results demonstrate that the proposed algorithm significantly reduces regional uncertainty and increases the number of captured moving targets compared to other deep reinforcement learning methods. Ablation studies further indicate that the action mask mechanism, target capture mechanism, and collision avoidance mechanism improve the algorithm's effectiveness, target capture capability, and UAV safety, respectively.
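
Action masking of the kind the abstract names is usually realized by assigning invalid actions a log-probability of negative infinity before sampling, so the policy can never choose them. A generic sketch, not the AM-MAPPO implementation (the action set and mask values are invented):

```python
# Generic action-mask sketch: invalid actions get -inf logits before sampling.
import torch

def masked_policy(logits, valid_mask):
    """logits: (batch, n_actions); valid_mask: bool tensor, True = allowed."""
    masked = logits.masked_fill(~valid_mask, float("-inf"))
    return torch.distributions.Categorical(logits=masked)

logits = torch.randn(2, 6)                      # e.g., 6 discrete flight actions
mask = torch.tensor([[1, 1, 0, 1, 0, 1],        # 0 = action would cause a collision
                     [1, 0, 1, 1, 1, 0]], dtype=torch.bool)
dist = masked_policy(logits, mask)
actions = dist.sample()                          # only valid actions can be drawn
```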

15 pages, 10265 KB  
Article
Multi-Directional Long-Term Recurrent Convolutional Network for Road Situation Recognition
by Cyreneo Dofitas, Joon-Min Gil and Yung-Cheol Byun
Sensors 2024, 24(14), 4618; https://doi.org/10.3390/s24144618 - 17 Jul 2024
Cited by 7 | Viewed by 2236
Abstract
Understanding road conditions is essential for implementing effective road safety measures and driving solutions. Road situations encompass the day-to-day conditions of roads, including the presence of vehicles and pedestrians. Surveillance cameras strategically placed along streets have been instrumental in monitoring road situations and providing valuable information on pedestrians, moving vehicles, and objects within road environments. However, these video data are stored in large volumes, making analysis tedious and time-consuming. Deep learning models are increasingly utilized to monitor vehicles and to identify and evaluate road and driving comfort situations, but recognizing such situations requires models that operate on time-series video data. In this paper, we introduce a multi-directional detection model for road situations that upholds high accuracy. Deep learning methods often integrate long short-term memory (LSTM) into long-term recurrent network architectures; this approach combines recurrent neural networks, which capture temporal dependencies, with convolutional neural networks (CNNs), which extract features from extensive video data. In our proposed method, we form a multi-directional long-term recurrent convolutional network (LRCN) with two CNN-equipped groups and two LSTM layers. Additionally, we compare road situation recognition using convolutional neural networks, long short-term memory networks, and long-term recurrent convolutional networks. The paper presents a method for detecting and recognizing multi-directional road contexts using a modified LRCN. After balancing the dataset through data augmentation, which increased the number of video files, our model achieved 91% accuracy, a significant improvement over the original dataset.
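
The LRCN pattern the abstract relies on, a per-frame CNN feeding a stacked LSTM, can be sketched minimally as below. Channel counts, clip length, and the four-class head are assumptions, though the two LSTM layers mirror the abstract.

```python
# Minimal LRCN sketch: per-frame CNN features aggregated by a two-layer LSTM.
import torch
import torch.nn as nn

class LRCN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, 64, num_layers=2, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):                     # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))     # encode every frame independently
        feats = feats.view(b, t, -1)              # restore the time dimension
        out, _ = self.lstm(feats)                 # model temporal dependencies
        return self.head(out[:, -1])              # classify from the last time step

model = LRCN()
logits = model(torch.randn(2, 16, 3, 112, 112))   # two 16-frame clips -> (2, 4)
```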

19 pages, 2999 KB  
Article
Novel Deep Learning Domain Adaptation Approach for Object Detection Using Semi-Self Building Dataset and Modified YOLOv4
by Ahmed Gomaa and Ahmad Abdalrazik
World Electr. Veh. J. 2024, 15(6), 255; https://doi.org/10.3390/wevj15060255 - 12 Jun 2024
Cited by 107 | Viewed by 5829
Abstract
Moving object detection is a vital research area that plays an essential role in intelligent transportation systems (ITSs) and various applications in computer vision. Recently, researchers have utilized convolutional neural networks (CNNs) to develop new techniques in object detection and recognition. However, with the increasing number of machine learning strategies used for object detection, there has been a growing need for large datasets with accurate ground truth for training, which usually demands manual labeling. Moreover, most of these deep strategies are supervised, applicable only to specific scenes, and demand large computational resources. Alternatively, other object detection techniques, such as classical background subtraction, need low computational resources and can be used with general scenes. In this paper, we propose a new, reliable semi-automatic method that combines a modified version of the You Only Look Once v4 (YOLOv4) CNN detector with a background subtraction technique to perform unsupervised object detection for surveillance videos. In this strategy, background subtraction based on low-rank decomposition is first applied to extract the moving objects. A clustering method is then adopted to refine the background subtraction (BS) result. Finally, the refined results are used to fine-tune the modified YOLOv4 before using it for the detection and classification of objects. The main contribution of this work is a new detection framework that overcomes manual labeling by creating an automatic labeler that uses motion information to supply labeled training data (background and foreground) directly from the detection video. Extensive experiments using real-world object monitoring benchmarks indicate that the suggested framework obtains a considerable increase in mAP compared to state-of-the-art results on both the CDnet 2014 and UA-DETRAC datasets.
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)
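
The auto-labeling idea can be illustrated with a plain background-subtraction loop that converts moving-object blobs into YOLO-format label lines. This sketch uses OpenCV's MOG2 subtractor rather than the paper's low-rank decomposition, and the video path, area threshold, and single class id are assumptions:

```python
# Sketch: background subtraction proposes boxes that become YOLO training labels.
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")       # assumed input video
bs = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bs.apply(frame)                                        # foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    h, w = frame.shape[:2]
    for c in contours:
        if cv2.contourArea(c) < 400:                              # drop noise blobs
            continue
        x, y, bw, bh = cv2.boundingRect(c)
        # YOLO label line: class x_center y_center width height (all normalized)
        print(0, (x + bw / 2) / w, (y + bh / 2) / h, bw / w, bh / h)
```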

27 pages, 5655 KB  
Article
MCSNet+: Enhanced Convolutional Neural Network for Detection and Classification of Tribolium and Sitophilus Sibling Species in Actual Wheat Storage Environments
by Haiying Yang, Yanyu Li, Liyong Xin, Shyh Wei Teng, Shaoning Pang, Huiyi Zhao, Yang Cao and Xiaoguang Zhou
Foods 2023, 12(19), 3653; https://doi.org/10.3390/foods12193653 - 3 Oct 2023
Cited by 6 | Viewed by 3189
Abstract
Insect pests like Tribolium and Sitophilus siblings are major threats to grain storage and processing, causing quality and quantity losses that endanger food security. These closely related species, having very similar morphological and biological characteristics, often exhibit variations in biology and pesticide resistance, complicating control efforts. Accurate pest species identification is essential for effective control, but workplace safety concerns in grain bins, associated with grain deterioration, clumping, fumigant hazards, and poor air quality, create challenges for manual inspection. There is therefore a pressing need for an online automated detection system. In this work, we enriched the stored-grain pest sibling image dataset, which includes 25,032 annotated Tribolium samples of two species and five geographical strains from real warehouses and another 1774 from the laboratory. As previously demonstrated on the Sitophilus family, convolutional neural networks (CNNs) show distinct advantages over other model architectures in detecting Tribolium. Our CNN model, MCSNet+, integrates Soft-NMS for better recall in dense object detection, a Position-Sensitive Prediction Model to handle translation issues, and anchor parameter fine-tuning for improved matching and speed. This approach significantly enhances mean Average Precision (mAP) for Sitophilus and Tribolium, reaching a minimum of 92.67 ± 1.74% and 94.27 ± 1.02%, respectively. Moreover, MCSNet+ exhibits significant improvements in prediction speed, advancing from 0.055 s/img to 0.133 s/img, and elevates the recognition rates of moving insect sibling species in real wheat storage under visible light, rising from 2.32% to 2.53%. The detection performance of the model on laboratory-captured images surpasses that on images from real storage facilities, with better results for Tribolium than for Sitophilus. Although inter-strain variances are less pronounced, the model achieves acceptable detection results across different Tribolium geographical strains, with a minimum recognition rate of 82.64 ± 1.27%. In real-time monitoring videos of grain storage facilities with wheat backgrounds, the enhanced CNN-based deep learning model successfully detects and identifies closely related stored-grain pest images. This achievement provides a viable solution for establishing an online pest management system in real storage facilities.
(This article belongs to the Section Food Analytical Methods)
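
Soft-NMS, the first component MCSNet+ integrates, replaces hard suppression with score decay so that overlapping true positives in dense scenes survive. A minimal Gaussian-decay sketch follows; the sigma and score threshold are the commonly used defaults, assumed here rather than taken from the paper.

```python
# Gaussian Soft-NMS sketch: decay overlapping scores instead of deleting boxes.
import numpy as np

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """boxes: (N, 4) arrays as x1,y1,x2,y2; returns kept indices and decayed scores."""
    boxes, scores = boxes.astype(float), scores.astype(float).copy()
    keep, idx = [], list(range(len(scores)))
    while idx:
        m = max(idx, key=lambda i: scores[i])       # current highest-scoring box
        keep.append(m)
        idx.remove(m)
        for i in idx:
            xx1 = max(boxes[m, 0], boxes[i, 0]); yy1 = max(boxes[m, 1], boxes[i, 1])
            xx2 = min(boxes[m, 2], boxes[i, 2]); yy2 = min(boxes[m, 3], boxes[i, 3])
            inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
            a = (boxes[m, 2] - boxes[m, 0]) * (boxes[m, 3] - boxes[m, 1])
            b = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            iou = inter / (a + b - inter)
            scores[i] *= np.exp(-iou * iou / sigma)  # Gaussian decay by overlap
        idx = [i for i in idx if scores[i] >= score_thresh]
    return keep, scores

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])
keep, decayed = soft_nms(boxes, np.array([0.9, 0.8, 0.7]))
```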

17 pages, 4724 KB  
Article
Multi-Object Detection and Tracking Using Reptile Search Optimization Algorithm with Deep Learning
by Ramachandran Alagarsamy and Dhamodaran Muneeswaran
Symmetry 2023, 15(6), 1194; https://doi.org/10.3390/sym15061194 - 2 Jun 2023
Cited by 5 | Viewed by 2913
Abstract
Multiple-Object Tracking (MOT) has become more popular because of its commercial and academic potential. Although various techniques have been devised for this task, it remains challenging because of factors such as severe object occlusions and abrupt appearance changes. Tracking yields the best outcomes when objects move uniformly, without occlusion, and in a constant direction. However, this is generally not a real scenario, particularly in complicated scenes such as dance or sporting events, where many players are tracked while moving quickly, varying their speed and direction as well as their distance and position from the camera and the activity they are executing. In dynamic scenes, MOT remains difficult due to the symmetrical shape, structure, and size of the objects. Therefore, this study develops a new reptile search optimization algorithm with deep learning-based multiple object detection and tracking (RSOADL–MODT) technique. The presented RSOADL–MODT model aims to recognize and track objects through position estimation, tracking, and action recognition, following a series of processes: object detection, object classification, and object tracking. At the initial stage, the presented RSOADL–MODT technique applies a path-augmented RetinaNet-based (PA–RetinaNet) object detection module, which improves the feature extraction process. To improve the network capability of the PA–RetinaNet method, the RSOA is utilized as a hyperparameter optimizer. Finally, the quasi-recurrent neural network (QRNN) classifier is exploited for the classification procedure. A wide-ranging experimental validation process takes place on the DanceTrack and MOT17 datasets to examine the object detection outcomes of the RSOADL–MODT algorithm. The simulation values confirmed the enhancements of the RSOADL–MODT method over other DL approaches.
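
Metaheuristic hyperparameter optimization of the kind RSOA performs boils down to proposing candidate configurations, scoring each on a validation metric, and keeping the best. The sketch below uses plain random search as a stand-in; the search space and scoring function are invented, and the RSOA update rule itself is more elaborate than this.

```python
# Stand-in for metaheuristic hyperparameter search (random search, assumed space).
import random

SPACE = {                                  # assumed detector hyperparameter ranges
    "lr": (1e-5, 1e-2),
    "anchor_scale": (2.0, 6.0),
    "nms_iou": (0.3, 0.7),
}

def sample():
    return {k: random.uniform(*rng) for k, rng in SPACE.items()}

def validation_map(cfg):
    """Stand-in for training the detector with cfg and measuring validation mAP."""
    return -((cfg["lr"] - 1e-3) ** 2) - (cfg["nms_iou"] - 0.5) ** 2   # toy surrogate

best_cfg, best_score = None, float("-inf")
for _ in range(50):                        # a metaheuristic would bias these proposals
    cfg = sample()
    score = validation_map(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score

print(best_cfg, best_score)
```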

18 pages, 2743 KB  
Article
Analysis of Movement and Activities of Handball Players Using Deep Neural Networks
by Kristina Host, Miran Pobar and Marina Ivasic-Kos
J. Imaging 2023, 9(4), 80; https://doi.org/10.3390/jimaging9040080 - 13 Apr 2023
Cited by 27 | Viewed by 8429
Abstract
This paper focuses on image and video content analysis of handball scenes, applying deep learning methods to detect and track the players and recognize their activities. Handball is an indoor team sport played between two teams with a ball, with well-defined goals and rules. The game is dynamic, with fourteen players moving quickly throughout the field in different directions, changing positions and roles from defensive to offensive, and performing different techniques and actions. Such dynamic team sports present challenging and demanding scenarios for object detectors, tracking algorithms, and other computer vision tasks, such as action recognition and localization, with much room for improvement of existing algorithms. The aim of the paper is to explore computer vision-based solutions for recognizing player actions that can be applied in unconstrained handball scenes with no additional sensors and with modest requirements, allowing a broader adoption of computer vision applications in both professional and amateur settings. This paper presents the semi-manual creation of a custom handball action dataset based on automatic player detection and tracking, and models for handball action recognition and localization using Inflated 3D Networks (I3D). For the task of player and ball detection, different configurations of You Only Look Once (YOLO) and Mask Region-Based Convolutional Neural Network (Mask R-CNN) models fine-tuned on custom handball datasets are compared to the original YOLOv7 model to select the best detector for the tracking-by-detection algorithms. For player tracking, the DeepSORT and Bag of Tricks for SORT (BoT SORT) algorithms with Mask R-CNN and YOLO detectors were tested and compared. For the task of action recognition, an I3D multi-class model and an ensemble of binary I3D models are trained with different input frame lengths and frame selection strategies, and the best solution is proposed for handball action recognition. The obtained action recognition models perform well on the test set with nine handball action classes, with average F1 measures of 0.69 and 0.75 for the ensemble and multi-class classifiers, respectively. They can be used to automatically index handball videos and facilitate retrieval. Finally, open issues, challenges in applying deep learning methods in such a dynamic sports environment, and directions for future development are discussed.
(This article belongs to the Section Computer Vision and Pattern Recognition)
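
At the core of the tracking-by-detection algorithms compared here (DeepSORT, BoT SORT) is an association step that matches existing tracks to new detections. The sketch below shows a bare greedy IoU matcher in that spirit; real trackers add Kalman motion prediction and appearance embeddings, and the threshold here is an assumption.

```python
# Bare greedy IoU association, the skeleton of tracking-by-detection matching.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, thresh=0.3):
    """Returns (track_idx, detection_idx) pairs, best overlaps first."""
    pairs = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for s, ti, di in pairs:
        if s < thresh or ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti); used_d.add(di)
    return matches

print(associate([(0, 0, 10, 10)], [(1, 1, 11, 11), (50, 50, 60, 60)]))  # [(0, 0)]
```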

21 pages, 13877 KB  
Article
Recognition and Counting of Apples in a Dynamic State Using a 3D Camera and Deep Learning Algorithms for Robotic Harvesting Systems
by R. M. Rasika D. Abeyrathna, Victor Massaki Nakaguchi, Arkar Minn and Tofael Ahamed
Sensors 2023, 23(8), 3810; https://doi.org/10.3390/s23083810 - 7 Apr 2023
Cited by 48 | Viewed by 8049
Abstract
Recognition and 3D positional estimation of apples during harvesting from a robotic platform on a moving vehicle are still challenging. Fruit clusters, branches, foliage, low resolution, and varying illumination are unavoidable and cause errors under different environmental conditions. Therefore, this research aimed to develop a recognition system based on training datasets from an augmented, complex apple orchard. The recognition system was evaluated using deep learning algorithms established from a convolutional neural network (CNN). The dynamic accuracy of modern artificial neural networks in providing 3D coordinates for deploying robotic arms at different forward-moving speeds of an experimental vehicle was investigated to compare recognition and tracking localization accuracy. In this study, a RealSense D455 RGB-D camera was selected to acquire the 3D coordinates of each detected and counted apple attached to artificial trees placed in the field, and to propose a specially designed structure for ease of robotic harvesting. A 3D camera together with the state-of-the-art YOLO (You Only Look Once) models YOLOv4, YOLOv5, and YOLOv7, as well as EfficientDet, was utilized for object detection. The Deep SORT algorithm was employed for tracking and counting detected apples at perpendicular, 15°, and 30° orientations. The 3D coordinates were obtained for each tracked apple when the on-board camera in the vehicle passed the reference line set in the middle of the image frame. To optimize harvesting at three different speeds (0.052 m/s, 0.069 m/s, and 0.098 m/s), the accuracy of the 3D coordinates was compared for the three forward-moving speeds and three camera angles (15°, 30°, and 90°). The mean average precision (mAP@0.5) values of YOLOv4, YOLOv5, YOLOv7, and EfficientDet were 0.84, 0.86, 0.905, and 0.775, respectively. The lowest root mean square error (RMSE) was 1.54 cm for the apples detected by EfficientDet at a 15° orientation and a speed of 0.098 m/s. In terms of counting apples, YOLOv5 and YOLOv7 showed a higher number of detections in outdoor dynamic conditions, achieving a counting accuracy of 86.6%. We concluded that the EfficientDet deep learning algorithm in 3D coordinates at a 15° orientation can be employed for further robotic arm development for harvesting apples in a specially designed orchard.
(This article belongs to the Special Issue 3D Reconstruction with RGB-D Cameras and Multi-sensors)
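
The step from a tracked apple's pixel position plus depth to a camera-frame 3D coordinate is the standard pinhole back-projection, sketched below; an RGB-D pipeline such as this one would apply it per detection. The intrinsics here are made up, and librealsense provides an equivalent built-in deprojection routine.

```python
# Pinhole back-projection: depth pixel -> camera-frame 3D point (assumed intrinsics).
def deproject(u, v, depth_m, fx=640.0, fy=640.0, cx=640.0, cy=360.0):
    """Pixel (u, v) plus depth in meters -> camera-frame XYZ in meters."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# e.g., a detected apple centered at pixel (700, 300), 1.2 m from the camera:
print(deproject(700, 300, 1.2))   # -> (0.1125, -0.1125, 1.2)
```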
