Search Results (39)

Search Parameters:
Keywords = sport actions recognition

16 pages, 6883 KiB  
Article
Integrated AI System for Real-Time Sports Broadcasting: Player Behavior, Game Event Recognition, and Generative AI Commentary in Basketball Games
by Sunghoon Jung, Hanmoe Kim, Hyunseo Park and Ahyoung Choi
Appl. Sci. 2025, 15(3), 1543; https://doi.org/10.3390/app15031543 - 3 Feb 2025
Viewed by 4747
Abstract
This study presents an AI-based sports broadcasting system capable of real-time game analysis and automated commentary. The model first acquires essential background knowledge, including the court layout, game rules, team information, and player details. YOLO-based segmentation is applied to the local camera view to enhance court recognition accuracy, and player action and ball tracking are performed with YOLO algorithms. In each frame, a YOLO detection model locates the players' bounding boxes; our tracking algorithm then computes the IoU between detections in consecutive frames and links them to trace each player's movement path. Player behavior is recognized with the R(2+1)D action recognition model, covering actions such as running, dribbling, shooting, and blocking. The system demonstrates high performance, achieving an average accuracy of 97% in court calibration, 92.5% in player and object detection, and 85.04% in action recognition. Key game events are identified from positional and action data, with broadcast lines generated using GPT APIs and converted to natural audio commentary via text-to-speech (TTS). This system offers a comprehensive framework for automating sports broadcasting with advanced AI techniques.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
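The IoU-based frame-to-frame linking described in the abstract above can be sketched as follows. This is a minimal illustration of the general technique, not the authors' exact implementation; the `(x1, y1, x2, y2)` box format and the greedy matching threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def link_tracks(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily match each current detection to the unused previous
    detection with the highest IoU above the threshold."""
    matches, used = {}, set()
    for i, cur in enumerate(curr_boxes):
        best_j, best_iou = None, threshold
        for j, prev in enumerate(prev_boxes):
            if j in used:
                continue
            score = iou(cur, prev)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches  # curr index -> prev index; unmatched boxes start new tracks
```

Chaining these matches frame by frame yields each player's movement path.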

28 pages, 6569 KiB  
Article
A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection
by Majid Joudaki, Mehdi Imani and Hamid R. Arabnia
Technologies 2025, 13(2), 53; https://doi.org/10.3390/technologies13020053 - 1 Feb 2025
Cited by 3 | Viewed by 2495
Abstract
Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human–computer interaction. While deep learning models such as 3D convolutional neural networks (CNNs) and recurrent neural networks (RNNs) deliver promising results, they often struggle with computational inefficiencies and inadequate spatial–temporal feature extraction, hindering scalability to larger datasets or high-resolution videos. To address these limitations, we propose a novel model combining a two-dimensional convolutional restricted Boltzmann machine (2D Conv-RBM) with a long short-term memory (LSTM) network. The 2D Conv-RBM efficiently extracts spatial features such as edges, textures, and motion patterns while preserving spatial relationships and reducing parameters via weight sharing. These features are subsequently processed by the LSTM to capture temporal dependencies across frames, enabling effective recognition of both short- and long-term action patterns. Additionally, a smart frame selection mechanism minimizes frame redundancy, significantly lowering computational costs without compromising accuracy. Evaluation on the KTH, UCF Sports, and HMDB51 datasets demonstrated superior performance, achieving accuracies of 97.3%, 94.8%, and 81.5%, respectively. Compared to traditional approaches like 2D RBM and 3D CNN, our method offers notable improvements in both accuracy and computational efficiency, presenting a scalable solution for real-time applications in surveillance, video security, and sports analytics.
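The abstract does not detail the smart frame selection mechanism. One common way to reduce frame redundancy is to keep a frame only when it differs sufficiently from the last kept frame; the sketch below illustrates that idea on flattened grayscale frames. The difference measure and threshold are assumptions for illustration, not the paper's method.

```python
def select_frames(frames, threshold=12.0):
    """Keep a frame only if its mean absolute pixel difference from the
    last kept frame exceeds the threshold, discarding redundant frames
    before they reach the recognition model."""
    if not frames:
        return []
    def mean_abs_diff(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) >= threshold:
            kept.append(i)
    return kept  # indices of retained frames
```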

24 pages, 2096 KiB  
Article
Human Activity Recognition Using Graph Structures and Deep Neural Networks
by Abed Al Raoof K. Bsoul
Computers 2025, 14(1), 9; https://doi.org/10.3390/computers14010009 - 30 Dec 2024
Cited by 2 | Viewed by 1349
Abstract
Human activity recognition (HAR) systems are essential in healthcare, surveillance, and sports analytics, enabling automated movement analysis. This research presents a novel HAR system combining graph structures with deep neural networks to capture both spatial and temporal patterns in activities. While CNN-based models excel at spatial feature extraction, they struggle with temporal dynamics, limiting their ability to classify complex actions. To address this, we applied the Firefly Optimization Algorithm to fine-tune the hyperparameters of both the graph-based model and a CNN baseline for comparison. The optimized graph-based system, evaluated on the UCF101 and Kinetics-400 datasets, achieved 88.9% accuracy with balanced precision, recall, and F1-scores, outperforming the baseline. It demonstrated robustness across diverse activities, including sports, household routines, and musical performances. This study highlights the potential of graph-based HAR systems for real-world applications, with future work focused on multi-modal data integration and improved handling of occlusions to enhance adaptability and performance.

16 pages, 5772 KiB  
Article
Optimizing Football Formation Analysis via LSTM-Based Event Detection
by Benjamin Orr, Ephraim Pan and Dah-Jye Lee
Electronics 2024, 13(20), 4105; https://doi.org/10.3390/electronics13204105 - 18 Oct 2024
Cited by 2 | Viewed by 2043
Abstract
The process of manually annotating sports footage is a demanding one. In American football alone, coaches spend thousands of hours reviewing and analyzing videos each season. We aim to automate this process by developing a system that generates comprehensive statistical reports from full-length football game videos. Having previously demonstrated a proof of concept for our system, we present here optimizations to our preprocessing techniques along with an inventive method for multi-person event detection in sports videos. Employing a long short-term memory (LSTM)-based architecture to detect the snap in American football, we achieve an outstanding Levenshtein similarity index (LSI) of 0.9445, indicating a normalized difference of less than 0.06 between predictions and ground-truth labels. We also illustrate the utility of snap detection as a means of identifying when the offensive players assume their formation. Our results demonstrate not only the success of our approach and its underlying optimizations but also its potential for continued robustness as we develop the remaining system components.
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
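The Levenshtein similarity index (LSI) reported above is one minus the edit distance normalized by the longer sequence length, which is why an LSI of 0.9445 corresponds to a normalized difference below 0.06. A minimal sketch follows; the exact normalization the authors use is an assumption.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def lsi(pred, truth):
    """Levenshtein similarity index: 1 - normalized edit distance."""
    longest = max(len(pred), len(truth)) or 1
    return 1.0 - levenshtein(pred, truth) / longest
```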

16 pages, 3440 KiB  
Article
Towards Automatic Object Detection and Activity Recognition in Indoor Climbing
by Hana Vrzáková, Jani Koskinen, Sami Andberg, Ahreum Lee and Mary Jean Amon
Sensors 2024, 24(19), 6479; https://doi.org/10.3390/s24196479 - 8 Oct 2024
Cited by 2 | Viewed by 2082
Abstract
Rock climbing has grown from a niche pursuit into a mainstream free-time activity and an Olympic sport. Moreover, climbing can be studied as an example of a high-stakes perception-action task. However, understanding what constitutes an expert climber is not simple or straightforward. As a dynamic and high-risk activity, climbing requires a precise interplay between cognition, perception, and action execution. While prior research has predominantly focused on the movement aspect of climbing (i.e., skeletal posture and individual limb movements), recent studies have also examined the climber's visual attention and its links to performance. Associating the climber's attention with their actions, however, has traditionally required frame-by-frame manual coding of recorded eye-tracking videos. To overcome this challenge and automatically contextualize the analysis of eye movements in indoor climbing, we present deep learning-driven (YOLOv5) hold detection that facilitates automatic grasp recognition. To demonstrate the framework, we examined an expert climber's eye movements and egocentric perspective acquired from eye-tracking glasses (SMI and Tobii Glasses 2). Using the framework, we observed that the expert climber's grasping duration was positively correlated with total fixation duration (r = 0.807) and fixation count (r = 0.864) but negatively correlated with fixation rate (r = −0.402) and saccade rate (r = −0.344). The findings indicate moments of cognitive processing and visual search during decision making and route prospecting. Our work contributes to research on eye–body performance and coordination in high-stakes contexts, informs sport science, and expands applications in areas such as training optimization, injury prevention, and coaching.

14 pages, 4122 KiB  
Article
A Smart Ski Pole for Skiing Pattern Recognition and Quantification Application
by Yangyanhao Guo, Renjie Ju, Kunru Li, Zhiqiang Lan, Lixin Niu, Xiaojuan Hou, Shuo Qian, Wei Chen, Xinyu Liu, Gang Li, Jian He and Xiujian Chou
Sensors 2024, 24(16), 5291; https://doi.org/10.3390/s24165291 - 15 Aug 2024
Cited by 3 | Viewed by 2008
Abstract
In cross-country skiing, ski poles play a crucial role in technique, propulsion, and overall performance. Their kinematic parameters carry valuable information about a skier's technique, which is of great significance for coaches and athletes seeking to improve performance. In this work, a new smart ski pole is proposed that combines a uniaxial load cell and an inertial measurement unit (IMU) to provide comprehensive, easy-to-use data acquisition as a training aid. The pole directly collects data tied to skiing technique, such as pole force, pole angle, and inertial signals, and its wireless-transmission design keeps the system simple and efficient to use. In this experiment, characteristic data obtained from the poles during Double Poling by three skiers were extracted and compared with sample t-tests. The results showed significant differences among the three skiers in pole force, pole angle, and poling time. Spearman correlation analysis of the best-performing skier's data showed that pole force was significantly correlated with speed (r = 0.71) and with pole support angle (r = 0.76). In addition, this study combined the commonly used inertial sensor data with the load cell data as input to a ski technique recognition algorithm; recognition accuracy for five cross-country skiing techniques (Diagonal Stride (DS), Double Poling (DP), Kick Double Poling (KDP), Two-stroke Glide (G2), and Five-stroke Glide (G5)) reached 99.5%, a marked improvement over comparable recognition systems. The equipment is therefore expected to be a valuable training tool, helping coaches and athletes better understand and improve ski technique.
(This article belongs to the Special Issue Sensors for Human Posture and Movement)

18 pages, 10031 KiB  
Article
Action Recognition of Taekwondo Unit Actions Using Action Images Constructed with Time-Warped Motion Profiles
by Junghwan Lim, Chenglong Luo, Seunghun Lee, Young Eun Song and Hoeryong Jung
Sensors 2024, 24(8), 2595; https://doi.org/10.3390/s24082595 - 18 Apr 2024
Cited by 3 | Viewed by 2018
Abstract
Taekwondo has evolved from a traditional martial art into an official Olympic sport. This study introduces a novel action recognition model tailored for Taekwondo unit actions, utilizing joint-motion data acquired via wearable inertial measurement unit (IMU) sensors. The utilization of IMU sensor-measured motion data facilitates the capture of the intricate and rapid movements characteristic of Taekwondo techniques. The model, underpinned by a conventional convolutional neural network (CNN)-based image classification framework, synthesizes action images to represent individual Taekwondo unit actions. These action images are generated by mapping joint-motion profiles onto the RGB color space, thus encapsulating the motion dynamics of a single unit action within a solitary image. To further refine the representation of rapid movements within these images, a time-warping technique was applied, adjusting motion profiles in relation to the velocity of the action. The effectiveness of the proposed model was assessed using a dataset compiled from 40 Taekwondo experts, yielding remarkable outcomes: an accuracy of 0.998, a precision of 0.983, a recall of 0.982, and an F1 score of 0.982. These results underscore the time-warping technique's contribution to enhancing feature representation, as well as the proposed method's scalability and effectiveness in recognizing Taekwondo unit actions.
(This article belongs to the Section Wearables)
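The mapping of joint-motion profiles onto the RGB color space can be sketched as below: each joint becomes one pixel row, time runs along the columns, and the three motion axes are normalized into the R, G, and B channels. The per-joint min-max normalization is an assumption for illustration; the paper's exact mapping (and its time-warping step) may differ.

```python
def to_action_image(profiles):
    """Map joint-motion profiles to an RGB 'action image'.
    profiles: {joint_name: list of (x, y, z) samples over time}.
    Each joint becomes one pixel row; time runs left to right; the three
    axes are normalized per channel to the 0-255 range as R, G, B."""
    image = []
    for joint, samples in profiles.items():
        channels = []
        for axis in range(3):
            values = [s[axis] for s in samples]
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0  # avoid division by zero for flat profiles
            channels.append([round(255 * (v - lo) / span) for v in values])
        # zip the three channels back into per-pixel (r, g, b) tuples
        image.append(list(zip(*channels)))
    return image  # rows = joints, columns = time steps
```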

31 pages, 4082 KiB  
Article
Validation of the Gaming Skills Questionnaire in Adolescence: Effects of Gaming Skills on Cognitive and Affective Functioning
by Triantafyllia Zioga, Chrysanthi Nega, Petros Roussos and Panagiotis Kourtesis
Eur. J. Investig. Health Psychol. Educ. 2024, 14(3), 722-752; https://doi.org/10.3390/ejihpe14030048 - 19 Mar 2024
Cited by 5 | Viewed by 5142
Abstract
Given the widespread popularity of videogames, research has attempted to assess their effects on cognitive and affective abilities, especially in children and adolescents. Despite numerous correlational studies, robust evidence on the causal relationship between videogames and cognition remains scarce, hindered by the absence of a comprehensive assessment tool for gaming skills across genres. In a sample of 347 adolescents, this study aimed to develop and validate the Gaming Skills Questionnaire (GSQ) and assess the impact of gaming skills in six genres (sport, first-person shooters, role-playing games, action-adventure, strategy, and puzzle games) on adolescents' cognitive and affective abilities. The GSQ exhibited strong reliability and validity, highlighting its potential as a valuable tool. Gaming skills positively affected executive function, memory, overall cognition, cognitive flexibility, and emotion recognition, but not empathy. Genres differed in their effects: verbal fluency was influenced mainly by sports games; executive functions by action, strategy, and puzzle games; and emotion recognition was positively impacted by action and puzzle games but negatively by sports and strategy games. Both age and gaming skills influenced cognitive flexibility, with gaming having the greater effect. These genre-specific effects on cognitive and affective functioning warrant further research, to which the GSQ can contribute.

15 pages, 1207 KiB  
Article
From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition
by Kimji N. Pellano, Inga Strümke and Espen A. F. Ihlen
Sensors 2024, 24(6), 1940; https://doi.org/10.3390/s24061940 - 18 Mar 2024
Cited by 8 | Viewed by 2221
Abstract
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field: the lack of testing of the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. To address this, we tested two established XAI metrics, faithfulness and stability, on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM). This study introduces a perturbation method that produces variations within the error tolerance of motion sensor tracking, ensuring the resultant skeletal data points remain within the plausible output range of human movement as captured by the tracking device. We used the NTU RGB+D 60 dataset and the EfficientGCN architecture for HAR model training and testing. The evaluation systematically perturbed the 3D skeleton data with controlled displacements of different magnitudes to assess the impact on XAI metric performance across multiple action classes. Our findings reveal that faithfulness may not consistently serve as a reliable metric across all classes for the EfficientGCN model, indicating limited applicability in certain contexts. In contrast, stability proves more robust, showing dependability across perturbation magnitudes. Additionally, CAM and Grad-CAM yielded almost identical explanations, leading to closely similar metric outcomes. This suggests a need to explore additional metrics and apply more diverse XAI methods to broaden the understanding and effectiveness of XAI in skeleton-based HAR.
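The bounded-perturbation idea, displacing each joint by at most the sensor's error tolerance, can be sketched as follows. The uniform sampling scheme and the magnitude value are assumptions for illustration; the paper's perturbation method may differ.

```python
import random

def perturb_skeleton(joints, magnitude=0.01, seed=None):
    """Displace each 3D joint by a random vector of length at most
    `magnitude` (e.g., metres), keeping the perturbed pose within the
    tracking device's plausible error tolerance."""
    rng = random.Random(seed)
    perturbed = []
    for x, y, z in joints:
        # pick a random direction, then scale it to a random length <= magnitude
        dx, dy, dz = (rng.uniform(-1, 1) for _ in range(3))
        norm = (dx * dx + dy * dy + dz * dz) ** 0.5 or 1.0
        scale = rng.uniform(0, magnitude) / norm
        perturbed.append((x + dx * scale, y + dy * scale, z + dz * scale))
    return perturbed
```

Applying this at several magnitudes and re-running the XAI metrics is the kind of controlled-displacement sweep the abstract describes.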

27 pages, 3440 KiB  
Article
Sparse Representations Optimization with Coupled Bayesian Dictionary and Dictionary Classifier for Efficient Classification
by Muhammad Riaz-ud-din, Salman Abdul Ghafoor and Faisal Shafait
Appl. Sci. 2024, 14(1), 306; https://doi.org/10.3390/app14010306 - 29 Dec 2023
Viewed by 1831
Abstract
Among the numerous techniques for learning a linear classifier through discriminative dictionary and sparse representation learning, those that jointly and discriminatively learn a nonparametric Bayesian classifier with the dictionary and the corresponding sparse representations have drawn considerable attention from researchers. These techniques learn two sets of sparse representations: one for the training samples over the dictionary, and one for the corresponding labels over the dictionary classifier. At the prediction stage, however, the representations of test samples computed over the learned dictionary do not truly represent the corresponding labels, exposing a weakness in the joint-learning claim of these techniques. We mitigate this problem and strengthen the joint learning by learning a single set of weights over the dictionary to represent the training data, and further optimizing the same weights over the dictionary classifier to represent the labels of the corresponding classes. At prediction time, the representation weights of test samples computed over the learned dictionary then also represent the labels of their classes, so the learned dictionary classifier reconstructs those labels accurately. A reduction in the size of the Bayesian model's parameters also improves training time. We analytically derived the posterior conditional probabilities of the model from its overall joint probability using Bayes' theorem, and used a Gibbs sampler to perform inference with the derived conditionals, which supports our claim of efficient optimization of the coupled dictionaries and the sparse representation parameters. We demonstrated the effectiveness of our approach on standard datasets: the Extended YaleB and AR face databases for face recognition, the Caltech-101 and Fifteen Scene Category databases for categorization, and the UCF sports action database for action recognition, comparing the results with state-of-the-art methods in the area. Our classification accuracies of 93.25%, 89.27%, 94.81%, 98.10%, and 95.00% on these datasets represent increases of 0.5 to 2% on average. The overall average error margin of the confidence intervals in our approach is 0.24, compared with 0.34 for the second-best approach, JBDC. Our AUC–ROC scores of 0.98 and 0.992 also exceed those of the others (0.960 and 0.98, respectively), and our approach is computationally efficient.
(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)

16 pages, 4264 KiB  
Article
Design and Development of an Imitation Detection System for Human Action Recognition Using Deep Learning
by Noura Alhakbani, Maha Alghamdi and Abeer Al-Nafjan
Sensors 2023, 23(24), 9889; https://doi.org/10.3390/s23249889 - 18 Dec 2023
Cited by 3 | Viewed by 1833
Abstract
Human action recognition (HAR) is a rapidly growing field with numerous applications in various domains. HAR involves the development of algorithms and techniques to automatically identify and classify human actions from video data. Accurate recognition of human actions has significant implications in fields such as surveillance, sports analysis, and health care. This paper presents a study on the design and development of an imitation detection system using a deep learning-based HAR algorithm. The study explores deep learning models, such as a single-frame convolutional neural network (CNN) and a pretrained VGG-16, for the accurate classification of human actions. The proposed models were evaluated on the benchmark KTH dataset, and their performance was compared with that of classical classifiers, including K-Nearest Neighbors, Support Vector Machine, and Random Forest. The results showed that the VGG-16 model achieved higher accuracy than the single-frame CNN, with a 98% accuracy rate.

22 pages, 4831 KiB  
Article
Human Action Recognition Based on Hierarchical Multi-Scale Adaptive Conv-Long Short-Term Memory Network
by Qian Huang, Weiliang Xie, Chang Li, Yanfang Wang and Yanwei Liu
Appl. Sci. 2023, 13(19), 10560; https://doi.org/10.3390/app131910560 - 22 Sep 2023
Cited by 2 | Viewed by 1687
Abstract
Recently, human action recognition has gained widespread use in fields such as human–robot interaction, healthcare, and sports. With the popularity of wearable devices, sensor data of human actions are readily available for recognition tasks. However, extracting spatio-temporal motion patterns from sensor data and capturing fine-grained action processes remain a challenge. To address this problem, we propose a novel hierarchical multi-scale adaptive Conv-LSTM network called HMA Conv-LSTM. The spatial information of sensor signals is extracted by hierarchical multi-scale convolution with finer-grained features, and the multi-channel features are fused by adaptive channel feature fusion to retain important information and improve model efficiency. A dynamic channel-selection LSTM based on the attention mechanism captures the temporal context and long-term dependence of the sensor signals. Experimental results show that the proposed model achieves macro F1-scores of 0.68, 0.91, 0.53, and 0.96 on four public datasets: Opportunity, PAMAP2, USC-HAD, and Skoda, respectively. Our model demonstrates competitive performance compared with several state-of-the-art approaches.
(This article belongs to the Special Issue Human Activity Recognition (HAR) in Healthcare)

17 pages, 4724 KiB  
Article
Multi-Object Detection and Tracking Using Reptile Search Optimization Algorithm with Deep Learning
by Ramachandran Alagarsamy and Dhamodaran Muneeswaran
Symmetry 2023, 15(6), 1194; https://doi.org/10.3390/sym15061194 - 2 Jun 2023
Cited by 3 | Viewed by 2449
Abstract
Multiple-Object Tracking (MOT) has become more popular because of its commercial and academic potential. Though various techniques have been devised to manage this issue, it remains challenging because of factors such as severe object occlusions and abrupt appearance changes. Tracking yields optimal outcomes when an object moves uniformly, without occlusion, and in a constant direction. This is rarely the case in practice, particularly in complicated scenes such as dance or sporting events, where many players must be tracked as they move quickly, vary their speed and direction, and change their distance and position relative to the camera and the activity they are executing. In dynamic scenes, MOT remains difficult because of the symmetrical shape, structure, and size of the objects. Therefore, this study develops a new reptile search optimization algorithm with a deep learning-based multiple object detection and tracking (RSOADL–MODT) technique. The presented RSOADL–MODT model is intended to recognize and track objects with position estimation, tracking, and action recognition. It follows a series of processes, namely object detection, object classification, and object tracking. At the initial stage, the RSOADL–MODT technique applies a path-augmented RetinaNet-based (PA–RetinaNet) object detection module, which improves the feature extraction process. To improve the network capability of the PA–RetinaNet method, the RSOA is utilized as a hyperparameter optimizer. Finally, the quasi-recurrent neural network (QRNN) classifier is exploited for the classification procedure. A wide-ranging experimental validation on the DanceTrack and MOT17 datasets examined the object detection outcomes of the RSOADL–MODT algorithm. The simulation values confirmed the improvements of the RSOADL–MODT method over other DL approaches.

18 pages, 2743 KiB  
Article
Analysis of Movement and Activities of Handball Players Using Deep Neural Networks
by Kristina Host, Miran Pobar and Marina Ivasic-Kos
J. Imaging 2023, 9(4), 80; https://doi.org/10.3390/jimaging9040080 - 13 Apr 2023
Cited by 20 | Viewed by 6845
Abstract
This paper focuses on image and video content analysis of handball scenes, applying deep learning methods to detect and track players and recognize their activities. Handball is a team sport played indoors between two teams, with well-defined goals and rules. The game is dynamic, with fourteen players moving quickly throughout the field in different directions, changing positions and roles from defensive to offensive, and performing different techniques and actions. Such dynamic team sports present challenging scenarios for object detectors, tracking algorithms, and other computer vision tasks such as action recognition and localization, with much room for improvement of existing algorithms. The aim of the paper is to explore computer vision-based solutions for recognizing player actions that can be applied in unconstrained handball scenes with no additional sensors and with modest requirements, allowing broader adoption of computer vision applications in both professional and amateur settings. The paper presents the semi-manual creation of a custom handball action dataset based on automatic player detection and tracking, together with models for handball action recognition and localization using Inflated 3D Networks (I3D). For the task of player and ball detection, different configurations of You Only Look Once (YOLO) and Mask Region-Based Convolutional Neural Network (Mask R-CNN) models fine-tuned on custom handball datasets are compared to the original YOLOv7 model to select the best detector for the tracking-by-detection algorithms. For player tracking, the DeepSORT and Bag of Tricks for SORT (BoT SORT) algorithms with Mask R-CNN and YOLO detectors were tested and compared. For action recognition, an I3D multi-class model and an ensemble of binary I3D models were trained with different input frame lengths and frame selection strategies, and the best solution is proposed for handball action recognition. The obtained models perform well on the test set of nine handball action classes, with average F1 measures of 0.69 and 0.75 for the ensemble and multi-class classifiers, respectively, and can be used to automatically index handball videos to facilitate retrieval. Finally, open issues, challenges in applying deep learning methods in such a dynamic sports environment, and directions for future development are discussed.
(This article belongs to the Section Computer Vision and Pattern Recognition)

19 pages, 3844 KiB  
Article
Human Activity Recognition Based on Two-Channel Residual–GRU–ECA Module with Two Types of Sensors
by Xun Wang and Jie Shang
Electronics 2023, 12(7), 1622; https://doi.org/10.3390/electronics12071622 - 30 Mar 2023
Cited by 11 | Viewed by 3735
Abstract
With the thriving development of sensor technology and pervasive computing, sensor-based human activity recognition (HAR) has become widely used in healthcare, sports, health monitoring, and human interaction with smart devices. Inertial sensors are among the most commonly used sensors in HAR. In recent years, the demand for comfort and flexibility in wearable devices has gradually increased, and with the continuous advancement of flexible electronics, attempts have begun to incorporate stretch sensors into HAR. In this paper, we propose a two-channel network model based on residual blocks, an efficient channel attention (ECA) module, and a gated recurrent unit (GRU) that is capable of long-term sequence modeling, efficiently extracting spatial–temporal features, and performing activity classification. A dataset named IS-Data was designed and collected from six subjects wearing stretch sensors and inertial sensors while performing six daily activities. We conducted experiments using IS-Data and a public dataset called w-HAR to validate the feasibility of using stretch sensors in human action recognition and to investigate the effectiveness of combining flexible and inertial data; our proposed method showed superior performance and good generalization compared with state-of-the-art methods.
