Search Results (71)

Search Parameters:
Keywords = two-dimensional video analysis

30 pages, 3451 KiB  
Article
Integrating Google Maps and Smooth Street View Videos for Route Planning
by Federica Massimi, Antonio Tedeschi, Kalapraveen Bagadi and Francesco Benedetto
J. Imaging 2025, 11(8), 251; https://doi.org/10.3390/jimaging11080251 - 25 Jul 2025
Viewed by 202
Abstract
This research addresses the long-standing dependence on printed maps for navigation and highlights the limitations of existing digital services like Google Street View and Google Street View Player in providing comprehensive solutions for route analysis and understanding. The absence of a systematic approach to route analysis, issues related to insufficient street view images, and the lack of proper image mapping for desired roads remain unaddressed by current applications, which are predominantly client-based. In response, we propose an innovative automatic system designed to generate videos depicting road routes between two geographic locations. The system calculates and presents the route both conventionally, emphasizing the path on a two-dimensional representation, and in a multimedia format. A prototype is developed based on a cloud-based client–server architecture, featuring three core modules: frames acquisition, frames analysis and elaboration, and the persistence of metadata information and computed videos. The tests, encompassing both real-world and synthetic scenarios, have produced promising results, showcasing the efficiency of our system. By providing users with a real and immersive understanding of requested routes, our approach fills a crucial gap in existing navigation solutions. This research contributes to the advancement of route planning technologies, offering a comprehensive and user-friendly system that leverages cloud computing and multimedia visualization for an enhanced navigation experience. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

18 pages, 338 KiB  
Article
The Temporal–Spatial Parameters of Gait After Total Knee Arthroplasty
by Karina Szczypiór-Piasecka, Paulina Adamczewska, Łukasz Kołodziej and Paweł Ziętek
J. Clin. Med. 2025, 14(13), 4548; https://doi.org/10.3390/jcm14134548 - 26 Jun 2025
Viewed by 378
Abstract
Background/Objectives: Gait abnormalities in advanced knee osteoarthritis (KOA) are characterized by decreased stride length, walking speed, and cadence. Total knee arthroplasty (TKA) is intended to improve temporal–spatial gait parameters; however, the extent and timing of functional recovery remain under investigation. The aim was to assess changes in stride length, walking speed, and cadence following TKA from short- and long-term perspectives and to compare outcomes with a non-operated KOA cohort. Methods: A prospective observational study was conducted involving 46 patients with unilateral KOA (grades III–IV, Kellgren–Lawrence scale) who underwent cemented TKA via a medial parapatellar approach. Group I (n = 34) was assessed one day prior to surgery and six weeks postoperatively. Group II (n = 12), a follow-up subset, was reassessed 1.5 years postoperatively. Group III (n = 34) served as a non-operated control group, assessed only preoperatively. Temporal–spatial gait parameters were evaluated under standardized conditions using two-dimensional video analysis (Kinovea® software, version 0.8.27). Stride length (m) and walking speed (m/s) were assessed during continuous walking along a 15 m corridor, with at least three valid gait cycles averaged per trial. Cadence (steps/min) was determined during a one-minute walk and verified frame-by-frame. No structured outpatient physiotherapy was provided; all patients followed a standardized in-hospital rehabilitation protocol. Results: In Group I, the mean stride length increased from 0.40 ± 0.10 m to 0.42 ± 0.10 m (p = 0.247), walking speed improved from 0.41 ± 0.027 m/s to 0.47 ± 0.022 m/s (p = 0.063), and cadence increased significantly from 72.9 ± 7.8 to 77.1 ± 8.6 steps/min (p = 0.044). In Group II, the mean stride length rose from 0.39 ± 0.10 m to 0.52 ± 0.09 m (p < 0.001), walking speed improved from 0.44 ± 0.02 m/s to 0.69 ± 0.01 m/s (p < 0.001), and cadence increased from 73.7 ± 8.8 to 103.6 ± 7.4 steps/min (p < 0.001).
Compared to the control group (Group III: stride length 0.42 ± 0.09 m; walking speed 0.41 ± 0.02 m/s; cadence 73.9 ± 7.9 steps/min), Group II demonstrated superior values across all parameters (p < 0.001 for each comparison). No significant correlations were observed between BMI and gait outcomes. Conclusions: Total knee arthroplasty resulted in progressive improvement in temporal–spatial gait parameters. While early postoperative gains were limited, substantial functional restoration was observed at long-term follow-up, emphasizing the importance of extended recovery monitoring in post-TKA evaluation. Full article
(This article belongs to the Special Issue Advanced Approaches in Hip and Knee Arthroplasty)
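The temporal–spatial parameters reported above (stride length, walking speed, cadence) can be derived from frame-accurate video annotations of the kind produced by 2D tools such as Kinovea. A minimal sketch, assuming a 30 fps camera; the heel-strike frames and positions below are illustrative, not study data:

```python
FPS = 30  # assumed camera frame rate

# Frame indices of consecutive heel strikes of the same foot, and the
# corresponding heel positions (metres) along the walking direction.
heel_strike_frames = [12, 45, 78, 111]
heel_positions_m = [0.00, 0.42, 0.85, 1.27]

# Stride length: distance covered between successive ipsilateral heel strikes.
stride_lengths = [b - a for a, b in zip(heel_positions_m, heel_positions_m[1:])]
mean_stride = sum(stride_lengths) / len(stride_lengths)

# Stride time (seconds) and walking speed (m/s).
stride_times = [(b - a) / FPS for a, b in zip(heel_strike_frames, heel_strike_frames[1:])]
mean_stride_time = sum(stride_times) / len(stride_times)
speed = mean_stride / mean_stride_time

# Cadence: steps per minute (two steps per stride).
cadence = 2 * 60 / mean_stride_time

print(round(mean_stride, 3), round(speed, 3), round(cadence, 1))
```

The same event annotations also yield cycle-by-cycle variability, which is often worth reporting alongside the means.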

21 pages, 1158 KiB  
Article
Evaluation of the Impact of External Conditions on Arm Positioning During Punches in MMA Fighters: A Comparative Analysis of 2D and 3D Methods
by Dariusz Skalski, Magdalena Prończuk, Petr Stastny, Kinga Łosińska, Miłosz Drozd, Michal Toborek, Piotr Aschenbrenner and Adam Maszczyk
Sensors 2025, 25(11), 3270; https://doi.org/10.3390/s25113270 - 22 May 2025
Viewed by 536
Abstract
Mixed Martial Arts (MMA) is a highly dynamic combat sport that requires precise motor coordination and technical execution. Video-based motion analysis, including two-dimensional (2D) and three-dimensional (3D) motion capture systems, plays a critical role in optimizing movement patterns, enhancing training efficiency, and reducing injury risk. However, the comparative validity of 2D and 3D systems for evaluating punching mechanics under external stressors remains unclear. This study aimed to first validate the measurement agreement between 2D and 3D motion analyses during sagittal-plane punches, and second, to examine the impact of fatigue and balance disruption on arm kinematics and punch dynamics in elite MMA athletes. Twenty-one male MMA fighters (mean age: 24.85 ± 7.24 years) performed standardized straight right punches (SRPs) and swing punches (SPs) under three experimental conditions: normal, balance-disrupted, and fatigued. Participants were instructed to deliver maximal-effort punches targeting a designated striking pad placed at a consistent height and distance. Each punch type was executed three times per condition. Kinematic data were collected using the Dartfish Express app (version 7.2.0; 2D system) and the MaxPRO infrared motion capture system (3D system). Statistical analyses included Pearson’s correlation coefficients, one-way analysis of variance (ANOVA), and linear mixed models (LMMs). Strong correlations (r = 0.964–0.999) and high intraclass correlation coefficient (ICC) values (0.81–0.99) confirmed the high reliability of 2D analysis for sagittal-plane techniques. Fatigue significantly decreased punch velocity and impact force (p < 0.01), while increasing joint angle variability (p < 0.01). These findings highlight the complementary use of 2D and 3D motion capture methods, supporting individualized monitoring, adaptive technique evaluation, and performance optimization in combat sports. Full article
(This article belongs to the Section Physical Sensors)
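The agreement statistic reported above (Pearson's r between paired 2D and 3D measurements) is straightforward to compute. A minimal sketch; the paired angle values are made up for illustration, not study data:

```python
import math

angles_2d = [142.0, 150.5, 137.2, 155.8, 148.3]   # sagittal-plane 2D app (deg)
angles_3d = [141.2, 151.0, 136.5, 156.4, 147.9]   # infrared mocap (deg)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(angles_2d, angles_3d)
print(round(r, 3))  # close to 1 for strongly agreeing systems
```

Note that r measures linear association only; a constant offset between systems leaves r near 1, which is why the study also reports ICC values.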

12 pages, 2074 KiB  
Article
Markerless Upper Body Movement Tracking During Gait in Children with HIV Encephalopathy: A Pilot Study
by Maaike M. Eken, Pieter Meyns, Robert P. Lamberts and Nelleke G. Langerak
Appl. Sci. 2025, 15(8), 4546; https://doi.org/10.3390/app15084546 - 20 Apr 2025
Viewed by 404
Abstract
The aim of this pilot study was to investigate the feasibility of markerless tracking to assess upper body movements of children with and without human immunodeficiency virus encephalopathy (HIV-E). Sagittal and frontal video recordings were used to track anatomical landmarks with the DeepLabCut pre-trained human model in five children with HIV-E and five typically developing (TD) children to calculate shoulder flexion/extension, shoulder abduction/adduction, elbow flexion/extension and trunk lateral sway. Differences in joint angle trajectories of the two cohorts were investigated using a one-dimensional statistical parametric mapping method. Children with HIV-E showed a larger range of motion in shoulder abduction and trunk sway than TD children. In addition, they showed more shoulder extension and more lateral trunk sway compared to TD children. Markerless tracking was feasible for 2D movement analysis and sensitive to observe expected differences in upper limb and trunk sway movements between children with and without HIV-E. Therefore, it could serve as a useful alternative in settings where expensive gait laboratory instruments are unavailable, for example, in clinical centers in low- to middle-income countries. Future research is needed to explore 3D markerless movement analysis systems and investigate the reliability and validity of these systems against the gold standard 3D marker-based systems that are currently used in clinical practice. Full article
(This article belongs to the Special Issue Human Biomechanics and EMG Signal Processing)
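Joint angles like those above are computed from the tracked 2D landmark coordinates. A minimal sketch of the geometry (angle at the middle landmark between two segments); the pixel coordinates are illustrative, not DeepLabCut output:

```python
import math

def joint_angle(a, b, c):
    """Angle at b (degrees) formed by points a-b-c in image coordinates."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Illustrative landmark positions (pixels) for one frame.
shoulder, elbow, wrist = (320, 180), (355, 260), (300, 330)
print(round(joint_angle(shoulder, elbow, wrist), 1))
```

Elbow flexion is then typically reported as 180° minus this inner angle, and the per-frame values form the trajectories compared with statistical parametric mapping.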

29 pages, 8325 KiB  
Article
Insights into Mosquito Behavior: Employing Visual Technology to Analyze Flight Trajectories and Patterns
by Ning Zhao, Lifeng Wang and Ke Wang
Electronics 2025, 14(7), 1333; https://doi.org/10.3390/electronics14071333 - 27 Mar 2025
Cited by 1 | Viewed by 524
Abstract
Mosquitoes, as vectors of numerous serious infectious diseases, require rigorous behavior monitoring for effective disease prevention and control. Simultaneously, precise surveillance of flying insect behavior is also crucial in agricultural pest management. This study proposes a three-dimensional trajectory reconstruction method for mosquito behavior analysis based on video data. By employing multiple synchronized cameras to capture mosquito flight images, using background subtraction to extract moving targets, applying Kalman filtering to predict target states, and integrating the Hungarian algorithm for multi-target data association, the system can automatically reconstruct three-dimensional mosquito flight trajectories. Experimental results demonstrate that this approach achieves high-precision flight path reconstruction, with a detection accuracy exceeding 95%, an F1-score of 0.93, and fast processing speeds that enable real-time tracking. The mean error of three-dimensional trajectory reconstruction is only 10 ± 4 mm, offering significant improvements in detection accuracy, tracking robustness, and real-time performance over traditional two-dimensional methods. These findings provide technological support for optimizing vector control strategies and enhancing precision pest control and can be further extended to ecological monitoring and agricultural pest management, thus bearing substantial significance for both public health and agriculture. Full article
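The data-association step described above can be sketched as a minimum-cost assignment between Kalman-predicted target positions and new detections, solved with the Hungarian algorithm. The 3D coordinates below are illustrative, not experimental data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Positions predicted by per-track Kalman filters (mm).
predicted = np.array([[100.0, 50.0, 30.0],   # track 0
                      [200.0, 80.0, 60.0],   # track 1
                      [150.0, 20.0, 90.0]])  # track 2
# Detections extracted from the current frames.
detections = np.array([[198.0, 82.0, 59.0],
                       [101.0, 49.0, 31.0],
                       [149.0, 21.0, 88.0]])

# Cost matrix: Euclidean distance between every prediction and detection.
cost = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)

# Hungarian algorithm: globally optimal one-to-one assignment.
track_idx, det_idx = linear_sum_assignment(cost)
print(list(zip(track_idx.tolist(), det_idx.tolist())))  # → [(0, 1), (1, 0), (2, 2)]
```

In a full tracker, assignments whose cost exceeds a gating threshold would be rejected, spawning new tracks or marking misses before the Kalman update.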

35 pages, 3471 KiB  
Review
An In-Depth Analysis of 2D and 3D Pose Estimation Techniques in Deep Learning: Methodologies and Advances
by Ruiyang Sun, Zixiang Lin, Song Leng, Aili Wang and Lanfei Zhao
Electronics 2025, 14(7), 1307; https://doi.org/10.3390/electronics14071307 - 26 Mar 2025
Cited by 1 | Viewed by 4226
Abstract
Pose estimation (PE) is a cutting-edge technology in computer vision, essential for AI-driven sport analysis, advancing technological applications, enhancing security, and improving the quality of life. Deep learning has markedly advanced accuracy and efficiency in the field while propelling algorithmic frameworks and model architectures to greater complexity, yet rendering their underlying interrelations increasingly opaque. This review examines deep learning-based PE techniques, classifying them from two perspectives: two-dimensional (2D) and three-dimensional (3D), based on methodological principles and output formats. Within each category, advanced techniques for single-person, multi-person, and video-based PE are explored according to their applicable conditions, highlighting key differences and intrinsic connections while comparing performance metrics. We also analyze datasets across 2D, 3D, and video domains, with comparisons presented in tables. The practical applications of PE in daily life are also summarized alongside an exploration of the challenges facing the field and the proposal of innovative, forward-looking research directions. This review aims to be a valuable resource for researchers advancing deep learning-driven PE. Full article
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)

12 pages, 1969 KiB  
Article
Comparison of Margin Quality for Intersegmental Plane Identification in Pulmonary Segmentectomy
by Selcuk Gurz, Yurdanur Sullu, Leman Tomak, Necmiye Gul Temel and Aysen Sengul
Medicina 2025, 61(3), 535; https://doi.org/10.3390/medicina61030535 - 19 Mar 2025
Viewed by 598
Abstract
Background and Objectives: Insufficient margin in lung cancer is associated with an increased locoregional recurrence rate. In pulmonary segmentectomy, two commonly used methods for identifying the intersegmental plane are inflation–deflation and indocyanine green dyeing. The aim of this study was to compare these two methods in terms of margin quality and to evaluate their superiority. Materials and Methods: A total of 63 patients who underwent segmentectomy via video-assisted thoracoscopic surgery (VATS) for pulmonary nodules and underwent preoperative planning with 3D modeling between October 2020 and February 2024 were included in this study. The location of the nodule and the distance to the intersegmental margins were virtually measured preoperatively using an open-source 3D modeling system. Patients were grouped according to the method of identifying the intersegmental margins. Group 1 included segmentectomies performed by the inflation–deflation method (n = 42), and Group 2 included segmentectomies performed by systemic indocyanine green (ICG) injection (n = 21). The area where the histopathological nodule was measured closest to the intersegmental margin was recorded. Values within ±10 mm of the value measured in the three-dimensional model were considered successful. The obtained data were statistically compared between the groups. Results: There was no difference between the groups in terms of virtual and pathological margins. However, in terms of margin quality, the rate of deviation detected in the pathological margin compared to the measured virtual margin was significantly different between the groups (p = 0.04). Accordingly, the success rate was 64.3% in Group 1 and 90.5% in Group 2 (p = 0.05). In Group 1, failures were predominantly at the expense of the adjacent parenchyma. There was no significant difference between the groups in the analysis of simple and complex segmentectomies.
Conclusions: Intersegmental plane identification with indocyanine green increases the margin quality by defining resection margins closer to the virtual margins. In the inflation–deflation method, unnecessary parenchymal loss occurs due to disadvantages in identifying intersegmental margins. Full article
(This article belongs to the Section Surgery)
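The study's success criterion (pathological margin within ±10 mm of the virtual margin measured on the 3D model) reduces to a simple tolerance check per patient. A minimal sketch with illustrative values, not study data:

```python
def margin_success(virtual_mm, pathological_mm, tolerance_mm=10.0):
    """Success if the pathological margin lies within the tolerance
    band around the virtually measured margin."""
    return abs(pathological_mm - virtual_mm) <= tolerance_mm

# (virtual margin, pathological margin) in mm, one pair per patient.
measurements = [(25.0, 31.0), (18.0, 30.5), (22.0, 14.0)]
successes = [margin_success(v, p) for v, p in measurements]
rate = 100 * sum(successes) / len(successes)
print(successes, round(rate, 1))
```

Group-level success rates computed this way are what the chi-square-style comparison between the two identification methods operates on.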

21 pages, 5384 KiB  
Article
A Video SAR Multi-Target Tracking Algorithm Based on Re-Identification Features and Multi-Stage Data Association
by Anxi Yu, Boxu Wei, Wenhao Tong, Zhihua He and Zhen Dong
Remote Sens. 2025, 17(6), 959; https://doi.org/10.3390/rs17060959 - 8 Mar 2025
Viewed by 1093
Abstract
Video Synthetic Aperture Radar (ViSAR) operates by continuously monitoring regions of interest to produce sequences of SAR imagery. The detection and tracking of ground-moving targets, through the analysis of their radiation properties and temporal variations relative to the background environment, represents a significant area of focus and innovation within the SAR research community. In this study, some key challenges in ViSAR systems are addressed, including the abundance of low-confidence shadow detections, high error rates in multi-target data association, and the frequent fragmentation of tracking trajectories. A multi-target tracking algorithm for ViSAR that utilizes re-identification (ReID) features and a multi-stage data association process is proposed. The algorithm extracts high-dimensional ReID features using the DenseNet-121 network for enhanced shadow detection and calculates a cost matrix by integrating ReID feature cosine similarity with Intersection over Union similarity. A confidence-based multi-stage data association strategy is implemented to minimize missed detections and trajectory fragmentation. Kalman filtering is then employed to update trajectory states based on shadow detection. Both simulation experiments and actual data processing experiments have demonstrated that, in comparison to two traditional video multi-target tracking algorithms, DeepSORT and ByteTrack, the newly proposed algorithm exhibits superior performance in the realm of ViSAR multi-target tracking, yielding the highest MOTA and HOTA scores of 94.85% and 92.88%, respectively, on the simulated spaceborne ViSAR data, and the highest MOTA and HOTA scores of 82.94% and 69.74%, respectively, on airborne field data. Full article
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)
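The fused cost described above blends appearance (ReID-feature cosine similarity) with geometry (box IoU). A minimal sketch; the blend weight `alpha`, the toy feature vectors, and the boxes are assumptions for illustration, not the paper's values:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cosine_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

alpha = 0.5  # assumed blend between appearance and overlap terms

track_feat = np.array([0.6, 0.8, 0.0]);   track_box = (10, 10, 50, 50)
det_feat   = np.array([0.55, 0.83, 0.1]); det_box   = (12, 11, 52, 49)

# Lower cost = better match; feed a matrix of these into the assignment stage.
cost = 1.0 - (alpha * cosine_sim(track_feat, det_feat)
              + (1 - alpha) * iou(track_box, det_box))
print(round(cost, 3))
```

In the multi-stage strategy, high-confidence detections would be associated first with a cost matrix like this, and remaining detections matched in later, looser passes.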

12 pages, 2117 KiB  
Article
Do Different Two-Dimensional Camera Speeds Detect Different Lower-Limb Kinematics Measures? A Laboratory-Based Cross-Sectional Study
by Abdulaziz Rsheed Alenzi, Msaad Alzhrani, Ahmad Alanazi and Hosam Alzahrani
J. Clin. Med. 2025, 14(5), 1687; https://doi.org/10.3390/jcm14051687 - 2 Mar 2025
Viewed by 815
Abstract
Background/Objectives: Football poses a high risk of sustaining lower-limb injuries, particularly anterior cruciate ligament (ACL) injuries, owing to the frequent jumping and landing movements. Identifying risk factors for these injuries is crucial to successful prevention. Two-dimensional (2D) video analysis is a commonly employed tool for assessing movement patterns and determining injury risk in clinical settings. This study aims to investigate whether variations in the camera frame rate impact the accuracy of key angle measurements (knee valgus, hip adduction (HADD), and lateral trunk flexion (LTF)) in male football players during high-risk functional tasks such as single-leg landing and 45° side-cutting. Methods: This laboratory-based cross-sectional study included 29 football players (mean (SD) age: 24.37 (3.14) years). The frontal plane projection angle (FPPA), HADD, and LTF during single-leg landing and side-cutting tasks were measured using two different camera frame rates: 30 frames per second (fps) and 120 fps. The 2D kinematic data were analyzed using Quintic Biomechanics software. Results: Significant differences in FPPA scores during single-leg landing were observed between 30 fps and 120 fps for both the dominant (mean difference = 2.65 [95% confidence interval [CI]: 0.76–4.55], p = 0.008) and non-dominant leg (3.53 [1.53–5.54], p = 0.001). Additionally, the FPPA of the right leg during the side-cutting task showed significant differences (2.18 [0.43–3.93], p = 0.016). The LTF of the right leg during side-cutting displayed a significant variation between frame rates (−2.69 [−5.17 to −0.22], p = 0.034). No significant differences in HADD were observed. Conclusions: Compared with a 30 fps camera, a high-speed (120 fps) camera demonstrated a superior performance in delivering accurate kinematic assessments of lower-limb injury risk factors.
This improved precision supports injury screening, rehabilitation monitoring, and return-to-play decision-making through determining subtle biomechanical deficits crucial for lower-limb injury prevention and management. Full article
(This article belongs to the Section Sports Medicine)
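The statistics reported above (paired mean differences between frame rates with 95% CIs and p-values) can be sketched with SciPy's paired t-test. The angle values below are illustrative, not study data:

```python
import numpy as np
from scipy import stats

# Paired FPPA measurements (degrees) of the same landings at two frame rates.
fppa_30 = np.array([8.1, 10.4, 7.9, 12.2, 9.5, 11.0])
fppa_120 = np.array([5.6, 8.0, 5.1, 9.9, 6.8, 8.7])

diff = fppa_30 - fppa_120
mean_diff = diff.mean()
sem = stats.sem(diff)  # standard error of the paired differences

# 95% CI for the mean difference (t distribution, n-1 df).
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1,
                                   loc=mean_diff, scale=sem)
t_stat, p_value = stats.ttest_rel(fppa_30, fppa_120)
print(round(mean_diff, 2), (round(ci_low, 2), round(ci_high, 2)), p_value)
```

A CI that excludes zero, as here, corresponds to the significant frame-rate differences the study reports for FPPA.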

28 pages, 6569 KiB  
Article
A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection
by Majid Joudaki, Mehdi Imani and Hamid R. Arabnia
Technologies 2025, 13(2), 53; https://doi.org/10.3390/technologies13020053 - 1 Feb 2025
Cited by 1 | Viewed by 2428
Abstract
Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human–computer interaction. While deep learning models such as 3D convolutional neural networks (CNNs) and recurrent neural networks (RNNs) deliver promising results, they often struggle with computational inefficiencies and inadequate spatial–temporal feature extraction, hindering scalability to larger datasets or high-resolution videos. To address these limitations, we propose a novel model combining a two-dimensional convolutional restricted Boltzmann machine (2D Conv-RBM) with a long short-term memory (LSTM) network. The 2D Conv-RBM efficiently extracts spatial features such as edges, textures, and motion patterns while preserving spatial relationships and reducing parameters via weight sharing. These features are subsequently processed by the LSTM to capture temporal dependencies across frames, enabling effective recognition of both short- and long-term action patterns. Additionally, a smart frame selection mechanism minimizes frame redundancy, significantly lowering computational costs without compromising accuracy. Evaluation on the KTH, UCF Sports, and HMDB51 datasets demonstrated superior performance, achieving accuracies of 97.3%, 94.8%, and 81.5%, respectively. Compared to traditional approaches like 2D RBM and 3D CNN, our method offers notable improvements in both accuracy and computational efficiency, presenting a scalable solution for real-time applications in surveillance, video security, and sports analytics. Full article
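The "smart frame selection" mechanism above prunes redundant frames before feature extraction. A minimal sketch, assuming selection by mean absolute inter-frame difference against a threshold (the paper's actual criterion may differ); the frames are toy grayscale arrays:

```python
import numpy as np

def select_frames(frames, threshold):
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i] - frames[kept[-1]]).mean()
        if diff >= threshold:
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
static = rng.random((8, 8))
# Frames 1 and 3 are near-duplicates of their predecessors; 2 and 4 change.
frames = [static, static + 0.001, static + 0.5, static + 0.5005, static]
print(select_frames(frames, threshold=0.05))  # → [0, 2, 4]
```

Dropping near-duplicate frames this way cuts the number of 2D Conv-RBM forward passes without discarding the motion content the LSTM needs.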

17 pages, 4918 KiB  
Article
CDKD-w+: A Keyframe Recognition Method for Coronary Digital Subtraction Angiography Video Sequence Based on w+ Space Encoding
by Yong Zhu, Haoyu Li, Shuai Xiao, Wei Yu, Hongyu Shang, Lin Wang, Yang Liu, Yin Wang and Jiachen Yang
Sensors 2025, 25(3), 710; https://doi.org/10.3390/s25030710 - 24 Jan 2025
Viewed by 996
Abstract
Currently, various deep learning methods can assist in medical diagnosis. Coronary Digital Subtraction Angiography (DSA) is a medical imaging technology used in cardiac interventional procedures. By employing X-ray sensors to visualize the coronary arteries, it generates two-dimensional images from any angle. However, due to the complexity of the coronary structures, the 2D images may sometimes lack sufficient information, necessitating the construction of a 3D model. Camera-level 3D modeling can be realized based on deep learning. Nevertheless, the beating of the heart results in varying degrees of arterial vasoconstriction and vasodilation, leading to substantial discrepancies between DSA sequences, which introduce errors in 3D modeling of the coronary arteries, resulting in the inability of the 3D model to faithfully reflect the coronary arteries. We propose a coronary DSA video sequence keyframe recognition method, CDKD-w+, based on w+ space encoding. The method utilizes a pSp encoder to encode the coronary DSA images, converting them into latent codes in the w+ space. Differential analysis of inter-frame latent codes is employed for heartbeat keyframe localization, aiding in coronary 3D modeling. Experimental results on a self-constructed coronary DSA heartbeat keyframe recognition dataset demonstrate an accuracy of 97%, outperforming traditional metrics such as L1, SSIM, and PSNR. Full article
(This article belongs to the Special Issue Image Processing in Sensors and Communication Systems)
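The differential analysis of inter-frame latent codes described above can be sketched as follows: encode each frame into a latent vector (the paper uses a pSp encoder into w+ space), take distances between consecutive codes, and treat extrema of that difference signal as keyframe candidates. Toy vectors stand in for the encoder here:

```python
import numpy as np

# Toy "latent codes" for 6 consecutive frames (real codes come from pSp).
latents = np.array([
    [0.00, 0.00], [0.20, 0.10], [0.90, 0.80],  # large jump between frames 1 and 2
    [1.00, 0.90], [1.05, 0.92], [1.10, 0.95],
])

# Inter-frame differential: L2 distance between consecutive codes.
diffs = np.linalg.norm(np.diff(latents, axis=0), axis=1)

# The frame pair with the largest latent change marks a candidate transition.
key = int(np.argmax(diffs))
print(key, key + 1)  # → 1 2
```

With real sequences the difference signal is periodic with the heartbeat, so peak (or valley) picking over the whole signal, rather than a single argmax, localizes one keyframe per cardiac cycle.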

21 pages, 4884 KiB  
Article
Evaluation of Machine Learning Algorithms for Classification of Visual Stimulation-Induced EEG Signals in 2D and 3D VR Videos
by Mingliang Zuo, Xiaoyu Chen and Li Sui
Brain Sci. 2025, 15(1), 75; https://doi.org/10.3390/brainsci15010075 - 16 Jan 2025
Cited by 3 | Viewed by 1517
Abstract
Backgrounds: Virtual reality (VR) has become a transformative technology with applications in gaming, education, healthcare, and psychotherapy. The subjective experiences in VR vary based on the virtual environment’s characteristics, and electroencephalography (EEG) is instrumental in assessing these differences. By analyzing EEG signals, researchers can explore the neural mechanisms underlying cognitive and emotional responses to VR stimuli. However, distinguishing EEG signals recorded by two-dimensional (2D) versus three-dimensional (3D) VR environments remains underexplored. Current research primarily utilizes power spectral density (PSD) features to differentiate between 2D and 3D VR conditions, but the potential of other feature parameters for enhanced discrimination is unclear. Additionally, the use of machine learning techniques to classify EEG signals from 2D and 3D VR using alternative features has not been thoroughly investigated, highlighting the need for further research to identify robust EEG features and effective classification methods. Methods: This study recorded EEG signals from participants exposed to 2D and 3D VR video stimuli to investigate the neural differences between these conditions. Key features extracted from the EEG data included PSD and common spatial patterns (CSPs), which capture frequency-domain and spatial-domain information, respectively. To evaluate classification performance, several classical machine learning algorithms were employed: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), naive Bayes, decision tree, AdaBoost, and a voting classifier. The study systematically compared the classification performance of PSD and CSP features across these algorithms, providing a comprehensive analysis of their effectiveness in distinguishing EEG signals in response to 2D and 3D VR stimuli.
Results: The study demonstrated that machine learning algorithms can effectively classify EEG signals recorded while watching 2D and 3D VR videos. CSP features outperformed PSD in classification accuracy, indicating their superior ability to capture EEG signal differences between the VR conditions. Among the machine learning algorithms, the random forest (RF) classifier achieved the highest accuracy at 95.02%, followed by KNN with 93.16% and SVM with 91.39%. The combination of CSP features with RF, KNN, and SVM consistently showed superior performance compared to other feature-algorithm combinations, underscoring the effectiveness of CSP and these algorithms in distinguishing EEG responses to different VR experiences. Conclusions: This study demonstrates that EEG signals recorded while watching 2D and 3D VR videos can be effectively classified using machine learning algorithms with extracted feature parameters. The findings highlight the superiority of CSP features over PSD in distinguishing EEG signals under different VR conditions, emphasizing CSP’s value in VR-induced EEG analysis. These results expand the application of feature-based machine learning methods in EEG studies and provide a foundation for future research into the cortical activity underlying VR experiences, supporting the broader use of machine learning in EEG-based analyses. Full article
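Common spatial patterns, the winning feature above, finds spatial filters that maximize variance for one class while minimizing it for the other, via a generalized eigendecomposition of the class covariance matrices. A minimal two-channel sketch with synthetic signals standing in for real EEG:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)

# Class A: variance concentrated on channel 0; class B: on channel 1.
# Each trial is (channels, samples).
trials_a = [rng.normal(0, [2.0, 0.5], size=(200, 2)).T for _ in range(10)]
trials_b = [rng.normal(0, [0.5, 2.0], size=(200, 2)).T for _ in range(10)]

def mean_cov(trials):
    return np.mean([np.cov(t) for t in trials], axis=0)

Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)

# Generalized eigenproblem Ca w = lambda (Ca + Cb) w; the eigenvectors are
# the CSP filters. Reverse so the first filter maximizes class-A variance.
vals, vecs = eigh(Ca, Ca + Cb)
filters = vecs[:, ::-1].T

# Classic CSP feature: log-variance of each spatially filtered trial.
feat_a = np.log(np.var(filters @ trials_a[0], axis=1))
feat_b = np.log(np.var(filters @ trials_b[0], axis=1))
print(feat_a.round(2), feat_b.round(2))
```

The resulting log-variance features are what get fed to the downstream classifiers (SVM, KNN, RF, etc.); real pipelines band-pass filter the EEG first and keep only the few most discriminative filters.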
18 pages, 2211 KiB  
Article
Accuracy Evaluation of 3D Pose Reconstruction Algorithms Through Stereo Camera Information Fusion for Physical Exercises with MediaPipe Pose
by Sebastian Dill, Arjang Ahmadi, Martin Grimmer, Dennis Haufe, Maurice Rohr, Yanhua Zhao, Maziar Sharbafi and Christoph Hoog Antink
Sensors 2024, 24(23), 7772; https://doi.org/10.3390/s24237772 - 4 Dec 2024
Cited by 4 | Viewed by 4876
Abstract
In recent years, significant research has been conducted on video-based human pose estimation (HPE). While monocular two-dimensional (2D) HPE has been shown to achieve high performance, monocular three-dimensional (3D) HPE poses a more challenging problem. However, since human motion happens in a 3D space, 3D HPE offers a more accurate representation of the human, granting increased usability for complex tasks like analysis of physical exercise. We propose a method based on MediaPipe Pose, 2D HPE on stereo cameras and a fusion algorithm without prior stereo calibration to reconstruct 3D poses, combining the advantages of high accuracy in 2D HPE with the increased usability of 3D coordinates. We evaluate this method on a self-recorded database focused on physical exercise to research what accuracy can be achieved and whether this accuracy is sufficient to recognize errors in exercise performance. We find that our method achieves significantly improved performance compared to monocular 3D HPE (median RMSE of 30.1 compared to 56.3, p-value below 10⁻⁶) and can show that the performance is sufficient for error recognition. Full article
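The paper's fusion algorithm notably works without prior stereo calibration and is not spelled out in the abstract. For orientation, the classical calibrated route from two 2D keypoints to one 3D joint is linear (DLT) triangulation; the NumPy sketch below assumes known 3x4 projection matrices `P1` and `P2`, which is precisely the assumption the paper's method avoids:

```python
import numpy as np

def triangulate_point(P1, P2, xy1, xy2):
    """Linear (DLT) triangulation of one keypoint observed by two
    cameras with known 3x4 projection matrices P1 and P2.
    xy1/xy2 are normalized image coordinates in each view."""
    x1, y1 = xy1
    x2, y2 = xy2
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, etc.
    A = np.stack([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Running one triangulation per MediaPipe Pose landmark pair yields a full 3D skeleton per frame; the paper's contribution is obtaining comparable geometry without calibrating the stereo rig first.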
18 pages, 1139 KiB  
Article
Facial Movements Extracted from Video for the Kinematic Classification of Speech
by Richard Palmer, Roslyn Ward, Petra Helmholz, Geoffrey R. Strauss, Paul Davey, Neville Hennessey, Linda Orton and Aravind Namasivayam
Sensors 2024, 24(22), 7235; https://doi.org/10.3390/s24227235 - 12 Nov 2024
Viewed by 1910
Abstract
Speech Sound Disorders (SSDs) are prevalent communication problems in children that pose significant barriers to academic success and social participation. Accurate diagnosis is key to mitigating life-long impacts. We are developing a novel software solution, the Speech Movement and Acoustic Analysis Tracking (SMAAT) system, to facilitate rapid and objective assessment of motor speech control issues underlying SSD. This study evaluates the feasibility of using automatically extracted three-dimensional (3D) facial measurements from a single two-dimensional (2D) front-facing video camera for classifying speech movements. Videos were recorded of 51 adults and 77 children between 3 and 4 years of age (all typically developing for age) saying 20 words from the mandibular and labial-facial levels of the Motor-Speech Hierarchy Probe Wordlist (MSH-PW). Measurements around the jaw and lips were automatically extracted from the 2D video frames using a state-of-the-art facial mesh detection and tracking algorithm, and each individual measurement was tested in a Leave-One-Out Cross-Validation (LOOCV) framework for its word classification performance. Statistics were evaluated at the α=0.05 significance level and several measurements were found to exhibit significant classification performance in both the adult and child cohorts. Importantly, measurements of depth indirectly inferred from the 2D video frames were among those found to be significant. The significant measurements were shown to match expectations of facial movements across the 20 words, demonstrating their potential applicability in supporting clinical evaluations of speech production. Full article
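The per-measurement evaluation described above, where each facial measurement is tested on its own in a leave-one-out cross-validation, can be sketched with a deliberately simple stand-in classifier. The nearest-class-mean rule below is purely illustrative, since the abstract does not name the classifier actually used:

```python
import numpy as np

def loocv_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule applied to a
    single scalar measurement (illustrative stand-in classifier)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    correct = 0
    for i in range(len(values)):
        mask = np.arange(len(values)) != i          # hold out sample i
        means = {c: values[mask & (labels == c)].mean() for c in classes}
        # Predict the class whose training-fold mean is closest.
        pred = min(means, key=lambda c: abs(values[i] - means[c]))
        correct += pred == labels[i]
    return correct / len(values)
```

A measurement whose LOOCV accuracy significantly exceeds chance (assessed at α=0.05, as in the study) is a candidate discriminator between words; repeating this per measurement mirrors the paper's per-feature screening.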
(This article belongs to the Special Issue Deep Learning Based Face Recognition and Feature Extraction)
29 pages, 13487 KiB  
Article
Real-Time Tracking Target System Based on Kernelized Correlation Filter in Complicated Areas
by Abdel Hamid Mbouombouo Mboungam, Yongfeng Zhi and Cedric Karel Fonzeu Monguen
Sensors 2024, 24(20), 6600; https://doi.org/10.3390/s24206600 - 13 Oct 2024
Cited by 3 | Viewed by 1711
Abstract
The achievement of rapid and reliable image object tracking has long been crucial and challenging for the advancement of image-guided technology. This study investigates real-time object tracking by offering an image tracker based on kernel correlation tracking and detection methods to address the challenge of real-time target tracking in complicated environments. In the tracking process, the kernel correlation tracking algorithm can effectively balance tracking performance and running speed. However, the target tracking process also faces challenges such as model drift, the inability to handle target scale transformation, and loss of the target. In order to propose a solution, this work is organized around the following main points: the first part is dedicated to research on kernelized correlation filters (KCFs), encompassing model training, object identification, and a dense sampling strategy based on a circulant matrix. This work developed a scale pyramid searching approach to address the shortcoming that a KCF cannot forecast the target scale. The tracker was expanded in two stages: the first stage output the target’s two-dimensional coordinate location, and the second stage created the scale pyramid to identify the optimal target scale. Experiments show that this approach is capable of resolving the target size variation problem. The second part improved the KCF in two ways to meet the demands of a long-term object tracking task. First, the initial object model is retained, which effectively suppresses model drift. Second, an object detection module is implemented, and if the tracking module fails, the algorithm is redirected to the object detection module. The target detection module utilizes two detectors, a variance classifier and a KCF.
Initially, this research provides a tracking algorithm assessment system, including an assessment methodology and a collection of test videos, which helped us to determine that the suggested technique outperforms the baseline KCF tracking method. Additionally, the implementation of an evaluation system allows for an objective comparison of the proposed algorithm with other prominent tracking methods. We found that the suggested method outperforms others in terms of accuracy and resilience. Full article
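The core of a KCF-style tracker is ridge regression over all cyclic shifts of a patch, made cheap by the circulant-matrix/FFT trick the first part of the paper builds on. The single-channel NumPy sketch below is a minimal illustration of that trick (it uses a delta-peaked label and illustrative parameters; a real KCF uses a Gaussian label, HOG features, windowing, and online model updates):

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    """Gaussian kernel correlation of two patches over all cyclic
    shifts, computed in the Fourier domain (the circulant trick)."""
    c = np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(z))).real
    d = (x ** 2).sum() + (z ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d, 0) / (sigma ** 2 * x.size))

def kcf_train(x, y, lam=1e-4):
    """Dual ridge regression: alpha_hat = y_hat / (k_hat + lambda)."""
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def kcf_detect(alpha_hat, x, z):
    """Response map over all cyclic shifts of the search patch z;
    the argmax gives the target's 2D translation."""
    k = gaussian_correlation(x, z)
    return np.fft.ifft2(alpha_hat * np.fft.fft2(k)).real
```

The paper's second-stage scale pyramid would then simply repeat `kcf_detect` on resampled candidate patches and keep the scale with the highest peak response, while the long-term extension falls back to a detection module when that peak collapses.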
(This article belongs to the Section Sensing and Imaging)