Article

Resilient AI in Therapeutic Rehabilitation: The Integration of Computer Vision and Deep Learning for Dynamic Therapy Adaptation

1 Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio 21, 80125 Naples, Italy
2 Department of Industrial Engineering (DII), University of Naples Federico II, Via Claudio 21, 80125 Naples, Italy
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6800; https://doi.org/10.3390/app15126800
Submission received: 25 April 2025 / Revised: 11 June 2025 / Accepted: 12 June 2025 / Published: 17 June 2025
(This article belongs to the Special Issue eHealth Innovative Approaches and Applications: 2nd Edition)

Abstract

Resilient artificial intelligence (Resilient AI) is relevant in many areas where technology must adapt quickly to changing and unexpected conditions, such as the medical, environmental, security, and agrifood sectors. In the case study of therapeutic rehabilitation for patients with motor impairments, Resilient AI is crucial to ensure that the system can respond effectively to changes, maintain high performance, cope with uncertainties and complex variables, and enable the dynamic monitoring and adaptation of therapy in real time. The proposed system integrates advanced technologies, such as computer vision and deep learning models, focusing on non-invasive solutions for monitoring and adapting rehabilitation therapies. The system combines the Microsoft Kinect v3 sensor with MoveNet Thunder–SinglePose, a state-of-the-art deep learning model for human pose estimation. Kinect's 3D skeletal tracking and MoveNet's high-precision 2D keypoint detection together improve the accuracy and reliability of postural analysis. The main objective is to develop an intelligent system that captures and analyzes a patient's movements in real time using Motion Capture techniques and artificial intelligence (AI) models to improve the effectiveness of therapies. Computer vision tracks human movement, identifying crucial biomechanical parameters and improving the quality of rehabilitation.

1. Introduction

Artificial intelligence (AI) has revolutionized many industries, finding applications in fields such as smart homes, to improve safety and optimize energy use; Industry 5.0, to automate production and manage resources through synergetic collaboration between humans and machines; and agriculture, where AI automates tasks such as planting and harvesting, improving crop yields and reducing pesticide use.
In the medical sector, AI has improved diagnosis [1], disease prediction [2,3], and personalized therapies. Its use in this field requires great care to avoid errors or distortions in the data, and continuous supervision by healthcare professionals is necessary to ensure patient safety. More broadly, AI systems are increasingly used in industrial applications, military and security contexts, finance, healthcare systems, environmental sectors, and the agrifood industry [4].
Given their widespread use, disruptions to these systems can harm health, increase mortality, reduce asset value, and undermine environmental sustainability. The protection of such systems, and their ability to remain resilient to various types of disruptive events, is therefore a highly relevant area of research.
Resilient AI techniques are characterized by their ability to adapt and respond to unforeseen or changing situations, ensuring optimal performance even under variable conditions and uncertain scenarios. The use of Resilient AI is extending to multiple sectors, such as agrifood, where AI optimizes agricultural production by responding to unforeseen environmental variables; the environment, where intelligent models can adapt to rapid climatic changes and address new ecological problems in real-time; and the medical field, where AI models allow the system to manage and adapt to changes or inaccuracies in data, such as the progression of health conditions or the results of patients’ diagnostic tests, without compromising the quality of diagnoses or treatments.
Artificial intelligence techniques, deep learning, and computer vision models help to improve rehabilitation therapies and monitor patients’ progress by dynamically adapting therapies in real time, responding to changes in movement, and optimizing rehabilitation with a non-invasive approach.
Thanks to Resilient AI, the system can handle variations in data, such as postural defects or changes in the surrounding environment, without compromising the effectiveness of treatment. Resilient AI offers flexible, sustainable, and dynamic solutions through continuous improvement and effective resource management.

Healthcare 5.0

The 2030 United Nations Agenda [5] and all recent global socio-economic developments aim to enhance well-being and increase life expectancy by improving physical and mental health. These goals are achieved through technological advances and new digital initiatives, including Healthcare 5.0, which uses IoT technologies [6], remote monitoring [7], and advanced artificial intelligence applications to offer personalized, high-performance healthcare services [8,9].
Healthcare 5.0 is based on three fundamental pillars:
  • Resilience: It is essential that the service continues to operate even in adverse conditions [10], effectively recovering from errors [11], failures, or cyber threats, to ensure the continuity of monitoring (Figure 1).
  • Reliability: The system must guarantee constant and safe operation, performing expected operations without errors; any malfunctions could have critical consequences on the health of patients [12].
  • Personalization: Services must be able to adapt to the specific needs of each patient, optimizing treatment based on factors such as genetics, behavior, environment, and medical care, especially in the presence of multiple conditions.

2. Motion Capture System

The term Motion Capture refers to a set of technologies and processes used to record and digitize human movement. Currently available technologies are based on two main approaches: optical systems and non-optical systems, as shown in Table 1.
Optical systems are among the most popular technologies for Motion Capture. They rely on cameras placed around the subject’s movement area to capture marker movements from multiple angles.
Non-optical systems do not rely on computer vision or cameras but instead use alternative sensing technologies.
Motion Capture technologies are broadly classified into marker-based and markerless systems, as illustrated in Figure 2.
Marker-based systems are characterized by the use of physical markers, which are placed on the subject’s body and tracked by specialized cameras [13]. This system is known for its high accuracy but requires the application of markers and the use of an expensive camera infrastructure.
The markerless system relies on the use of cameras and sensors to capture and analyze human movement without the need for physical markers [14], reducing operational complexity, improving freedom of movement [15], and making the technology accessible for various practical applications [16].

2.1. Markerless Systems

In markerless systems, Motion Capture is performed using a technique called background subtraction. Initially, an image of the empty environment is recorded without the subject. When the subject enters the frame, the system compares the pixels in the new image with those in the reference scene, detecting the moving pixels.
To refine the detection, additional filters are applied to eliminate shadows and reflections by comparing the RGB vectors of the pixels and discarding those that could distort the analysis. Once the subject is clearly identified, the image is processed using morphological operators such as erosion and dilation, which improve the definition of the moving body’s silhouette.
This process results in an accurate two-dimensional representation of the subject, as shown in Figure 3, which is useful for subsequent three-dimensional processing.
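As a rough illustration of this pipeline, the sketch below uses OpenCV (an assumed implementation; the paper does not specify its tooling) to compare each frame against a reference image of the empty scene and to clean the resulting mask with erosion and dilation:

```python
import cv2
import numpy as np

def extract_silhouette(background_bgr, frame_bgr, diff_threshold=30):
    """Background subtraction sketch: detect pixels that differ from the
    empty-scene reference, then clean the mask with morphological operators."""
    # Per-pixel, per-channel difference between current frame and reference
    diff = cv2.absdiff(frame_bgr, background_bgr)
    # Collapse the RGB difference vectors into a single magnitude per pixel
    magnitude = np.linalg.norm(diff.astype(np.float32), axis=2)
    # Moving pixels are those whose change exceeds the threshold
    # (shadow/reflection filtering via RGB-vector comparison is omitted here)
    mask = (magnitude > diff_threshold).astype(np.uint8) * 255
    # Erosion removes isolated noise; dilation restores the silhouette body
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.erode(mask, kernel)
    mask = cv2.dilate(mask, kernel)
    return mask  # binary 2D silhouette of the subject
```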

2.1.1. Subject Volume Generation: The Visual Hull

To derive the subject’s volume, a technique called the “visual hull” is used. This involves backprojecting cones into space to obtain a three-dimensional volume, defined as the intersection of multiple cones, representing the volume occupied by the subject.
The working volume is divided into voxels, which are the three-dimensional equivalents of pixels. Each voxel is compared with the silhouettes, and if its projection falls within a silhouette, the voxel is considered part of the visual hull. The process can be accelerated using an “octree” representation, where only voxels located in significant regions (e.g., edges) are compared. This approach is supported by filters that extract 3D contours. The generation of a visual hull, however, can result in ghost volumes, i.e., artifacts that appear when an area of the working volume is not visible to any camera (e.g., a hidden subject limb). Therefore, it is essential to choose the number and position of cameras carefully in order to optimize the process and achieve a more accurate result.
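The voxel test described above can be sketched in a few lines of Python; the projection matrices and silhouette masks are assumed inputs, and the octree acceleration is omitted for brevity:

```python
import numpy as np

def carve_visual_hull(voxels_xyz, cameras):
    """Visual hull sketch: a voxel is kept only if its projection falls inside
    the subject's silhouette in every camera view (intersection of cones).
    `cameras` is a list of (P, silhouette) pairs, where P is a 3x4 projection
    matrix and silhouette is a binary mask."""
    keep = np.ones(len(voxels_xyz), dtype=bool)
    homogeneous = np.hstack([voxels_xyz, np.ones((len(voxels_xyz), 1))])
    for P, silhouette in cameras:
        uvw = homogeneous @ P.T                      # project all voxels at once
        u = (uvw[:, 0] / uvw[:, 2]).astype(int)      # pixel column
        v = (uvw[:, 1] / uvw[:, 2]).astype(int)      # pixel row
        h, w = silhouette.shape
        in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(len(voxels_xyz), dtype=bool)
        inside[in_image] = silhouette[v[in_image], u[in_image]] > 0
        keep &= inside                               # must be inside every view
    return voxels_xyz[keep]
```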

2.1.2. Model Definition, Generation, and Matching

The points generated by the visual hull do not provide information about the specific body parts they represent; for example, it is impossible to determine whether they belong to the arm, leg, or torso. Moreover, these points are not linked to those in subsequent frames, which makes tracking movement over time unfeasible. To overcome this problem, an anatomical–kinematic model is used as a reference to interpret and organize the acquired data. This model is based on two main components:
  • Anatomical model:
    The anatomical model describes the three-dimensional shape of the body as a triangular mesh—a structure composed of an ordered list of 3D points (called vertices) and triangles connecting these points to define surfaces.
    Each point is identified by three coordinates (x, y, z) in three-dimensional space, and each triangle is described by the three vertices that form it, listed in counterclockwise order.
  • Kinematic model:
    The body is represented as an articulated system, composed of main segments (such as arms, legs, trunk, etc.) connected to each other by joints that allow complex movements, both rotational and translational. Each segment has a “parent segment” (for example, the arm is connected to the trunk) and may have one or more “child segments” (for example, the forearm is connected to the arm). It also has a local coordinate system that defines its position relative to the segment to which it is connected.
Recently, an advanced algorithm was described in the literature by Corazza et al. [17] that allows the automatic generation of a specific model for each subject and includes kinematic information directly in the mesh of the various body segments, thus allowing for an integrated representation of shape and movement.
The algorithm relies on a database of laser scans from various human subjects. To simplify processing, principal component analysis (PCA) is applied to reduce the complexity of body shapes. The human body is thus described as a combination of principal components, with approximately ten components being sufficient for most applications to achieve the required level of detail for movement analysis.
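A minimal sketch of this idea, using scikit-learn's PCA on a hypothetical database of flattened scan vertices (the actual algorithm of Corazza et al. also handles mesh correspondence, which is omitted here):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical database: 200 laser scans, each flattened to one row of
# vertex coordinates (5000 vertices x 3). Real data would be registered
# meshes with point-to-point correspondence across subjects.
scans = np.random.rand(200, 5000 * 3)

pca = PCA(n_components=10)                  # ~10 components suffice per the text
shape_coeffs = pca.fit_transform(scans)     # each body as 10 component weights

# A subject's shape is then approximated from its component weights alone
approx = pca.inverse_transform(shape_coeffs[:1]).reshape(5000, 3)
```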
The mesh registration process is divided into four main phases:
  1. A specific algorithm is applied to obtain the transformations needed to align the segments of the reference mesh with the mesh of the subject to be analyzed.
  2. The mesh of the subject is divided into body segments using a proximity criterion: each point of the mesh is associated with the closest body part of the reference mesh.
  3. For each body segment, the inverse of the transformation computed in the first phase is applied to realign the subject's mesh to the reference pose.
  4. Finally, the reference mesh is deformed to fit the shape of the subject, minimizing the difference between the vertices of the analyzed mesh and those of the reference mesh.
The positions of joint centers (such as elbows, shoulders, and knees) are computed as linear combinations of seven vertices within the reference mesh. These joint locations are then linked to reliable mesh features, enabling a fully automated process.
This approach allows the generation of an accurate anatomical and kinematic model, which is useful for biomechanical analysis and virtual movement representation.
To analyze subject movement over time and reconstruct joint kinematics, it is necessary to determine the instantaneous pose of the subject in each frame. This is carried out using an algorithm that identifies the configuration of the three-dimensional model that best represents the collected data, namely the surface of the visual hull.
The identified configuration must adapt to the available data and respect the anatomical constraints imposed by the structure of the kinematic chain of the human body.
The process compares the points of the visual hull with those of the model through the Iterative Closest Point (ICP) algorithm, which iteratively aligns the two point sets, progressively reducing the distance between them.
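A compact sketch of one ICP iteration is shown below, using nearest-neighbor matching and SVD-based (Kabsch) rigid alignment; this is a generic formulation, not the authors' implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target):
    """One ICP iteration: match each visual-hull point to its nearest model
    point, then rigidly align the pairs with the SVD (Kabsch) method."""
    _, idx = cKDTree(target).query(source)        # nearest-neighbor matching
    matched = target[idx]
    src_c, tgt_c = source.mean(axis=0), matched.mean(axis=0)
    H = (source - src_c).T @ (matched - tgt_c)    # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    if np.linalg.det(Vt.T @ U.T) < 0:             # guard against reflections
        Vt[-1] *= -1
    R = Vt.T @ U.T
    t = tgt_c - R @ src_c
    return source @ R.T + t                       # points moved toward the model

def icp(source, target, iterations=20):
    """Repeat until the two point sets are progressively brought together."""
    for _ in range(iterations):
        source = icp_step(source, target)
    return source
```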

2.2. State of the Art in Markerless Motion Capture Systems and Computer Vision for Rehabilitation

This state-of-the-art review explores the main approaches to markerless Motion Capture, focusing on advances in computer vision and AI technologies for human motion analysis. Motion Capture is an important technique for biomechanics, used to diagnose problems related to musculoskeletal diseases [18,19] and to develop rehabilitation therapies [20]. However, its use in clinical settings is still limited due to high costs and long processing times [21]. Markerless systems represent a significant technological advancement as they allow motion analysis without the need to apply sensors to the body, reducing preparation time and increasing efficiency.
Markerless systems are divided into active and passive systems.
Active markerless systems emit visible or infrared light, such as lasers or light pulses, to detect movement, providing very precise 3D measurements, but require static measurements and controlled environments, such as laser scanning of the human body [22,23].
Greater interest is focused on passive systems, which do not emit signals and allow motion to be captured in more natural environments. This technology is based on computer vision and AI and is used in various fields, such as surveillance, virtual reality, animation, and the study of human movement [24]. The techniques employed vary depending on the number of cameras and on the algorithms, which can be model-based or model-free.
Model-based approaches rely on a predefined anatomical model to track the human body, using representations such as stick figures [25], cylinders [26], superquadrics [27], or CAD models [28]. Model-free approaches estimate motion without a fixed model, for example, by exploiting bounding boxes [29] or analyzing body transformations through the medial axis [30].
Several studies have attempted to classify Motion Capture methods [31]. Moeslund et al. [23] analyzed more than 130 research studies published between 1980 and 2000, while Wang et al. [22] proposed a classification system based on image processing stages. However, many of these studies focus on surveillance and use simplified models of the human body, often with only one camera, making motion analysis less accurate.
For applications in biomechanics and medicine, it is crucial to accurately measure the 3D movement of the joints. This is challenging because each person has a different body shape, movements are complex, and body parts sometimes occlude others in camera views. Using multiple cameras helps solve this problem and makes the analysis more accurate.
Many studies have developed computer vision methods to analyze human movement, introducing techniques such as simulated annealing to improve posture prediction from markers [32], systems to estimate joint centers [33], methods to track lower limb movement [34], analysis of motor impairments, and evaluation of working postures [35]. Persson [34] developed a markerless method to track the lower extremities, while Legrand et al. [36] used a camera to analyze the contours of the body and match them to a superquadric model. Marzani et al. [37] improved the system using three cameras, articulated 3D models, and fuzzy clustering techniques [38].
The use of markerless motion analysis systems based on computer vision allows assessments to be conducted directly in the patient’s home environment. Often, the need to travel to clinical facilities represents a significant barrier to accessing therapy [39]. Real-time monitoring systems based on computer vision have already been developed and tested to support remote rehabilitation.
Balance control plays a crucial role, especially in elderly individuals, as falls can cause serious injuries. Although most fall-related injuries are mild, between 5% and 10% of cases in people over 65 are severe [40]. To identify people at risk of falling, Nalci et al. [41] used markerless computer vision systems to evaluate the ability to maintain balance while standing on one foot, both with eyes open and with eyes closed. The collected data were then compared with those obtained from a stabilometric platform, the gold standard for analyzing body sway. The experiment utilized a Dynamic Vision Sensor camera capable of detecting changes in pixel-level illumination related to movement. The results showed a high correlation, suggesting that computer vision is a valid tool for balance monitoring in rehabilitation.
Home exercises supervised by a therapist are a key component in recovering from conditions such as osteoarthritis or stroke. These exercises aim to reduce joint pain, improve mobility, and reduce therapy costs [42,43,44]. Systems that use computer vision to record exercises and provide immediate feedback can significantly improve the effectiveness of rehabilitation. Without feedback, it is difficult to personalize treatment and maintain patient motivation [45].
Performing exercises incorrectly, especially after surgery, can slow or compromise recovery [46].
One example is ArthriKin, a simple and user-friendly system developed by Dorado et al. [47], which allows remote supervision by the therapist during home exercise programs. Baptista et al. [44] developed a system dedicated to stroke patients, consisting of two connected applications: one for the therapist and one for the patient, who receives real-time visual feedback and detailed reports on exercise performance. Computer vision models have also been developed to track objects to assist patients with motor impairments in reaching and grasping objects, often with the aid of robotic devices. Rehabilitation for patients with arm or wrist injuries has also employed computer vision systems in which a webcam records a cuboid object and software tracks its position in real time within a virtual 3D space [48].
Salisbury et al. [49] demonstrated how a simple smartphone camera can be used for real-time assessment during vestibular rehabilitation, useful for patients with dizziness symptoms and for elderly fall prevention. The movements of people with motor disabilities undergoing Vojta therapy [50] have been monitored using real-time computer vision-based methods [51]. A system for automatic action recognition has also been proposed for upper limb rehabilitation at home; it analyzes sequences of color and depth images to distinguish between correct and incorrect rehabilitation exercises, capturing up to 125 frames per second [52].
Rammer et al. [53] developed a markerless system for analyzing manual wheelchair propulsion using two Microsoft Kinect sensors. This system focuses on upper extremity movements during propulsion and uses the open-source platform OpenSim [54] for biomechanical modeling and simulation.
Some recent contributions in the field of artificial intelligence propose advanced models based on neural architectures, such as Quantum Transfer Learning [55], pre-trained networks such as ELECTRA [56], and multi-level methodologies for automatic data annotation [57], which also show a notable potential for transferability in the rehabilitation field. Such approaches, in fact, can be adapted to improve the representation, interpretation, and personalization of clinical data, contributing to the development of intelligent systems able to dynamically support therapeutic assistance and monitoring of patient progress.
In medicine and physiotherapy, the study and analysis of human movement and rehabilitation exercises using depth cameras is very relevant and promising.
In the clinical setting, these new technological devices provide healthcare professionals with precise movement data and objective assessment tools, allowing the creation of personalized and effective treatment plans. Furthermore, this technology allows patients to receive rehabilitation support remotely in real time, overcoming geographical barriers and ensuring simplicity and continuity in therapy. In local contexts, such as the home or community rehabilitation centers, the technology supports autonomous training and daily monitoring, helping patients to better manage their rehabilitation. The integration of these rehabilitation resources into the medical scenario creates a comprehensive, multilevel assistance system that significantly improves both therapeutic results and the patient experience.
Technological progress, combined with its practical application in the medical field, is revolutionizing the traditional rehabilitation system, directing it toward smarter, more personalized, and more accessible solutions for all. In the case of remote rehabilitation, the main goal is to offer simple and convenient programs to follow at home, thanks to telemedicine and depth cameras that assess and guide patients’ movements even from a distance. For example, Maskeliunas and colleagues [58] developed BiomacVR, a system based on virtual reality that combines an interactive physical training environment with technologies for upper limb rehabilitation, increasing patient motivation and enabling real-time remote therapy. Lim and collaborators [59] created an imitation-based adaptive learning system, which allows patients to perform upper limb rehabilitation exercises directly from home.
Çubukçu [60] and their team designed a system using Kinect 2 sensors to observe and evaluate exercises for patients with shoulder problems. The system includes a web platform for communication between the patient and the therapist and an application that helps the patient perform the exercises correctly. Saratean [61] and colleagues proposed an approach that considers the perceived effort to better monitor home rehabilitation exercises, ensuring they are performed correctly and consistently.
Numerous studies show that, thanks to advanced techniques and intelligent algorithms, it is possible to effectively monitor and assess rehabilitation directly at home or in local settings, thus improving patient outcomes and progress. In physical therapy, studying movement helps to understand how patients move, allowing doctors to make more accurate diagnoses and develop better, more personalized treatment plans.
Wagner et al. [62] carefully analyzed people’s gait using specific biometric parameters, enabling doctors to identify gait-related issues in patients through this detailed analysis. Thanks to the use of artificial intelligence, Maskeliunas et al. [58] employed neural networks to observe skeletal movements through images, successfully studying the posture and gestures of patients with precision. Bijalwan et al. [63] applied deep learning techniques to recognize upper limb rehabilitation exercises, allowing doctors to monitor patient progress during rehabilitation. Keller et al. [64] used machine learning methods to understand motor problems in patients with back pain, offering a scientific foundation for treatments. Trinidad-Fernández et al. [65] captured, analyzed, and displayed movement data of the torso in patients with motor disorders to better understand which movements were limited. Hustinawaty et al. [66] combined calibration, skeletonization, and feature extraction techniques to monitor key movements during rehabilitation. Girase et al. [67] used semi-supervised algorithms with Kinect to estimate joint positions, identifying key factors in pathological movement to improve treatment.
Recent studies focus on the use of advanced algorithms to automatically evaluate rehabilitation exercises and recognize postures, with the goal of enhancing therapy outcomes and helping patients better follow their program by adapting exercises in a resilient way. Lim et al. [68] developed a visual feedback system for patients undergoing rehabilitation, capturing joint movement data in real time to help them perform balance exercises comfortably at home. Raza et al. [69] combined artificial intelligence algorithms with pose recognition and biophysical parameters to accurately estimate human posture. Khan et al. [70] automated the evaluation of exercises using advanced neural networks, making rehabilitation training more efficient. Wei et al. [71] estimated the patient’s center of mass using data augmentation techniques, improving training and balance. Uccheddu et al. [72] used RGB video to accurately estimate joint positions during rehabilitation exercises. Finally, Trinidad-Fernández et al. [73] created virtual skeletal representations to assess patients’ ability to perform functional movements, making home rehabilitation easier.
Computer vision employs various algorithms to process images and videos of patients performing physiotherapy and rehabilitative exercises. These algorithms, through machine learning, enable the analysis of movements, the evaluation of exercise correctness, and the personalization of the recovery process. This leads to more precise and effective treatments, with the added benefit of providing real-time feedback and adapting to each patient’s specific needs.
Traditional machine learning algorithms have been widely used in many early studies. Among them are Random Forest (RF) [74], Logistic Regression (LR) [75], Support Vector Machine (SVM) [76], and principal component analysis (PCA) [77], which are valued for their interpretability and computational efficiency.
Raza et al. [69] applied RF, LR, and LSTM to estimate human posture in structured datasets concerning lower limb movement, achieving excellent results, with RF reaching 99.8% accuracy. Keller et al. [64] combined PCA, NLPCA, and LR to analyze movement strategies in patients suffering from lower back pain [31]. In a comparative study, Girase et al. [67] evaluated several machine learning techniques using a feature set that included posture, posture derivatives, and dynamic characteristics. In this analysis, the multilayer perceptron (MLP) achieved the highest classification accuracy at 52.3%, followed by RF at 51.8% and SVM at 47.2%. The results highlighted that the expansion and diverse combination of extracted features significantly improved the performance of all analyzed algorithms, emphasizing the importance of a wide variety of features to achieve more accurate classifications.
With the rapid progress of deep learning techniques, numerous studies have adopted deep neural network models for the analysis of patients’ rehabilitation data, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks, which have proven effective in managing complex and repetitive data in the form of time series, typical of habitual bodily movements. Maskeliunas et al. [58] highlighted the potential of CNNs in posture and human movement analysis tasks, achieving accuracies ranging from 60.7% to 93.8%. In a study conducted by Bijalwan et al. [63], a hybrid deep learning (HDL) model was developed, combining CNNs, RNNs, and CNN-GRU architectures for recognizing upper limb rehabilitation movements. The results show that the CNN model alone achieved 98% accuracy, while the hybrid CNN-LSTM and CNN-GRU architectures achieved 99% and 100% accuracy, respectively. Wei et al. [71] proposed an innovative approach integrating CNNs and RF classifiers to estimate the center of mass (CoM) of the human body. This hybrid system showed excellent performance, with sensitivity and specificity values above 80% for four-level classification and over 90% for binary classification. These deep learning models have the advantage of automatically extracting complex features directly from raw data, significantly reducing manual work in feature selection and improving generalization capabilities, making them particularly suitable for processing high-dimensional and large-scale visual data.
Recent studies have adopted algorithm fusion strategies to combine the strengths of different methods and enhance the accuracy of pathological movement analysis. For instance, the integration of Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron (MLP), and Cascade Convolutional Neural Network (CCNN) architectures, along with semi-supervised learning techniques and the Unscented Kalman Filter (UKF), has proven effective in identifying key factors involved in abnormal movement patterns. Girase et al. [67], through a five-fold cross-validation experiment based on clinically collected data, evaluated the performance of these algorithms. For non-temporal data, the Random Forest algorithm achieved the highest accuracy at 51.8% using a feature fusion approach that combined postural, postural-derivative, and dynamic characteristics. For temporal data, the combination of an unsupervised Convolutional Neural Network (CNN) with a Multilayer Perceptron (MLP) yielded the best performance, achieving 73.4% accuracy, surpassing both Dynamic Time Warping (DTW) at 63.0% and Residual Neural Network (ResNet) at 71.6%. In another study, Wagner et al. [62] integrated various algorithms, including Kinematic Descriptors (KDs), Convex Hull (CH), and Feature Vectors (FVs) with the Zebris FDM pressure platform to accurately estimate gait parameters. Among these, the KD method demonstrated the best performance, with an accuracy of 81.8%. These fusion-based approaches not only improve the predictive power of the models but also improve their robustness, offering promising directions to address the complex analytical challenges in physiotherapy and rehabilitation research.

3. Vision System for Therapeutic Adaptation: Analysis of Functional Requirements

Traditional approaches to neurological rehabilitation and motor recovery rely on standardized protocols, with adjustments made periodically by the therapist and without continuous monitoring of the patient’s progress. Recent technological advances enable the development of sophisticated, personalized motor recovery systems that overcome these limitations. These systems introduce an innovative model based on artificial intelligence, machine learning, and computer vision to dynamically adapt therapy by adjusting assistance levels and exercise difficulty in real time. The proposed system is divided into two main modules, each with specific functionalities:
  • Vision Module:
    Acquires and analyzes the patient’s movements through RGB-D cameras and computer vision algorithms.
    Extracts fundamental biomechanical parameters such as amplitude, precision, speed, fluidity, and symmetry of movements.
    Converts the data into a structured format for the suggestion module.
  • Suggestion Module:
    Analyzes data received from the vision module and develops personalized therapy.
    If the patient shows improvement, the module reduces the level of assistance and increases the difficulty of the exercises.
    If progress is limited, it suggests increasing motor support or sends an alert to the therapist for a review of the therapy.
    All suggestions are recorded to allow for the monitoring of the system’s decisions.
The interaction between the modules follows a well-defined sequence, as shown in Figure 4.

3.1. Functional Requirements of the Vision Module

The vision module is responsible for acquiring, analyzing, and interpreting the patient’s movements during rehabilitation sessions. This module is not limited to a simple monitoring system that captures images but uses computer vision and artificial intelligence techniques to extract relevant biomechanical parameters and provide a detailed analysis of the patient’s progress.
The implementation of this module was developed as a software prototype based on ROS2 and tested in simulation to verify its correct functioning and its precision in detecting movements. The system uses RGB-D cameras with a minimum resolution of 1080p at 30 fps, providing smooth, detailed acquisition that can detect even the smallest variations in the amplitude and precision of movements, supplying fundamental data for therapeutic evaluation.
The vision module calculates the biomechanical parameters described above (amplitude, precision, speed, fluidity, and symmetry of movement) and transmits them to the suggestion module.
To ensure efficient information transmission, the vision module uses a publisher–subscriber architecture in ROS2, where the vision module publishes the biomechanical data analyzed (Figure 5) and the suggestion module subscribes to the data and generates personalized therapeutic recommendations.
ROS2 is an open-source software framework widely used in robotics and automation systems. It provides a set of tools, libraries, and conventions for writing robot software in a modular and scalable way. ROS2 helps manage communication between sensors, actuators, processing modules, and control algorithms, making it easier to develop complex systems that integrate hardware and software. In the context of rehabilitation, or in systems with multiple sensors (such as cameras or Motion Capture devices), ROS2 can be used to synchronize and coordinate data coming from different sources in real time.
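A minimal sketch of the publisher side of this architecture is shown below, using rclpy; the topic name, message type, and parameter values are illustrative assumptions, not taken from the paper:

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float64MultiArray

class VisionPublisher(Node):
    """Vision-module node that publishes biomechanical parameters on a topic
    the suggestion module subscribes to (publisher-subscriber pattern)."""

    def __init__(self):
        super().__init__('vision_module')
        # Hypothetical topic name; queue depth 10
        self.pub = self.create_publisher(Float64MultiArray, 'biomechanics', 10)
        self.timer = self.create_timer(1 / 30, self.publish_parameters)  # ~30 fps

    def publish_parameters(self):
        msg = Float64MultiArray()
        # Placeholder values: [amplitude, precision, speed, fluidity, symmetry]
        msg.data = [0.82, 0.91, 0.35, 0.77, 0.88]
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(VisionPublisher())

if __name__ == '__main__':
    main()
```

The suggestion module would mirror this with create_subscription on the same topic, receiving each message as it is published.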
If the system detects positive progress, the suggestion module “suggests” a reduction in motor assistance and an increase in exercise difficulty. When the data show difficulty in movement, the system can suggest an increase in support or alert the therapist.
The analysis of human movement through computer vision presents some critical issues such as the occlusion of body parts; to overcome this limitation, the system uses prediction algorithms to reconstruct the movement even when some joints are temporarily not visible.
Another critical factor concerns postural variations; to improve the system’s ability to adapt to different postures, deep learning models are trained on diversified datasets.
Furthermore, advanced filters are applied to remove irrelevant information from the analysis caused by the presence of disturbing objects.

3.2. Functional Requirements of the Suggestion Module

The suggestion module is an essential component of the system, designed to analyze the patient's progress, interpret the collected data, and generate personalized recommendations to dynamically adapt the therapy parameters. Unlike traditional methods, which require manual interaction with the therapist to change the treatment settings, this module uses machine learning algorithms and statistical analysis to provide suggestions in real time, improving the effectiveness and personalization of rehabilitation.
The suggestion module receives the data processed by the vision module through ROS2 topics, which transmit information on the patient's biomechanical parameters. If the module detects an improvement in the patient's movements, it suggests progressively reducing motor assistance and increasing the difficulty level of the exercises to keep the therapy stimulating. If the patient shows difficulty or regression, the module can suggest increasing motor support, reducing exercise difficulty, or sending an alert to the therapist to evaluate possible changes to the therapy.
To make the system more adaptive, the suggestion module will use machine learning algorithms, including the following (a minimal sketch of the threshold logic follows this list):
  • Reinforcement Learning: The system progressively learns from the data collected during rehabilitation sessions, optimizing decision thresholds to improve the precision of recommendations.
  • Dynamic threshold analysis: The system automatically adjusts the levels of difficulty and motor assistance, avoiding sudden changes and ensuring gradual adaptation to therapy.
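The sketch below illustrates the dynamic-threshold idea with a deliberately simple update rule; the thresholds, step size, and target values are hypothetical, as the paper does not specify its learning procedure at this level of detail:

```python
def adapt_therapy(difficulty, assistance, performance, target=0.8, step=0.05):
    """Illustrative threshold rule: small, gradual adjustments in the direction
    of the patient's performance, never abrupt jumps. Values are in [0, 1]."""
    if performance > target:
        # Positive progress: harder exercises, less motor assistance
        difficulty = min(1.0, difficulty + step)
        assistance = max(0.0, assistance - step)
    elif performance < target - 0.2:
        # Regression: easier exercises, more support (therapist alert upstream)
        difficulty = max(0.0, difficulty - step)
        assistance = min(1.0, assistance + step)
    return difficulty, assistance
```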
However, the suggestion module does not autonomously set the therapeutic parameters; it simply provides suggestions that the therapist can evaluate. This keeps each decision under human control, balancing automation and medical supervision.
The suggestion module has been tested through simulations in the ROS2 environment, using synthetic data to verify the consistency of the recommendations generated. The results have shown that the system accurately analyzes patient biomechanical data, generates consistent suggestions, interfaces in real time with the vision module, receives data without significant latency, and provides reliable recommendations, with an error margin of less than 5% relative to predefined parameters.

3.3. Software Architecture and Compatibility

The software architecture is a crucial element in ensuring robustness, scalability, and interoperability within the system. Because the system is designed for assisted rehabilitation, the software infrastructure must be optimized to ensure accurate synchronization between the different modules.
The system's software implementation was developed in the ROS2 environment (Figure 6). ROS2 offers modularity and scalability, allowing modules to be distributed across multiple devices, reducing the computational load and allowing new features to be added without modifying the system's structure. Furthermore, thanks to its support for DDS middleware, it ensures fast and reliable data transmission between modules, reducing latency, optimizing motor feedback through real-time communication, and improving security and resource management.
The system architecture is organized through an ROS2 node structure (Figure 7), which allows modular and extensible management:
  • ROS2 Nodes: The vision module acquires biomechanical data, while the suggestion module processes the information received to generate personalized recommendations.
  • ROS2 Topics: Biomechanical data is published on a ROS2 topic, to which the suggestion module subscribes to receive the parameters needed for analysis in real time.
  • ROS2 Services and Actions: Services manage specific requests between modules, while actions coordinate long-term operations, such as the progressive adaptation of motor assistance.
To ensure reliable and scalable data transmission, the system uses ROS2 as its main infrastructure, supported by advanced protocols:
  • MQTT (Message Queuing Telemetry Transport): optimizes data transmission by reducing latency and bandwidth consumption.
  • ZeroMQ: ensures fast and reliable communication between modules, ideal for distributed architectures.
  • REST APIs: allow integration with therapeutic dashboards, offering therapists an intuitive interface to monitor patient progress.

4. Methodology: Computer Vision and Deep Learning

The system has been developed to monitor the execution of physical exercise programs, providing real-time corrections when movement anomalies or incorrect body postures are detected. The system architecture is composed of key components that involve computer vision and deep learning techniques.

Computer Vision

The vision module monitors and records body movements during the execution of physical exercises, using the tracking technology provided by Microsoft Kinect v3, which captures a three-dimensional mapping of the user’s skeletal structure frame by frame. Each posture is translated into a series of spatial coordinates representing the main joint points. The system relies on skeletal data acquired through Kinect, but these raw data must be interpreted and structured properly. For this purpose, all information regarding the position and orientation of the 20 main joints is stored in XML files, along with additional metadata such as frame number, a unique user ID, and the capture timestamp. Each exercise session generates a collection of these XML files, which are securely stored using a NoSQL database. Before exercises can be performed effectively, the system must be adapted to the user’s specific physical features and the surrounding environment, which can vary greatly. The calibration process is divided into two phases:
  • Initial Motion Analysis: During the initial calibration phase, the system captures a continuous sequence of frames in the first few seconds. Each image is converted to a digital skeleton and saved in an XML file. Based on this dataset, the following joint reference points are identified for each limb involved (Figure 8):
    Starting node: the limb joint closest to the trunk of the body (such as the shoulder or hip).
    Intermediate node: the central point of the limb (such as the elbow or knee).
    Terminal node: the extremity of the limb (such as the hand or foot).
    Each set of detections produces a series of spatial vectors, from which an average value is computed to represent a reference position for each joint. This strategy improves the reliability of the data by minimizing the impact of temporary tracking errors or minor involuntary movements.
  • Measurement of Limb Lengths: In this stage, the actual length of the user's limbs is calculated using the Euclidean distance formula in 3D space; a minimal sketch of this computation follows the list. The length is determined by summing the distances between three key joint points:
    Distance between the starting node (S) and the intermediate node (I);
    Distance between the intermediate node (I) and the terminal node (T).
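The limb-length computation can be sketched as follows (NumPy assumed; the joint positions are hypothetical values in metres):

```python
import numpy as np

def limb_length(start, intermediate, terminal):
    """Limb length as the sum of the two 3D Euclidean distances S->I and I->T."""
    s, i, t = map(np.asarray, (start, intermediate, terminal))
    return np.linalg.norm(i - s) + np.linalg.norm(t - i)

# Hypothetical shoulder, elbow, and hand positions (metres)
arm_length = limb_length((0.00, 1.40, 2.0), (0.05, 1.12, 2.0), (0.10, 0.86, 2.0))
```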
The exercises considered for the system implementation are those related to the range of motion. Each exercise focuses on a specific part of the body, either upper or lower, which is why the concept of a functional area is introduced: a space within which a particular joint is expected to move during the activity. To identify these areas, during the preparatory phase, a correctly performed exercise by a physical therapist is used as a reference model. The data obtained are then analyzed to understand which joints are involved and what spatial limits define a safe and appropriate range of motion.
The identification of relevant joints is carried out through a method inspired by space-time interest points (STIP), an approach often used in computer vision to recognize movements and scenes. The system observes the three-dimensional coordinates of the skeletal joints and selects those that show significant variations over time along one or more axes (x, y, z). These joints are labeled as significant and subjected to a more detailed analysis to define the area within which they should remain during execution. For example, during the shoulder abduction exercise, the hand trajectory exhibits wide oscillations on the horizontal (x) and vertical (y) planes, allowing the limit values for extension and flexion to be easily determined. For the depth (z) axis, although a constant position is desirable, a tolerance margin (the depth factor) is introduced, defining a range within which the hand can move without compromising the accuracy of the execution. Although movements are analyzed in three dimensions, the patient’s feedback interface is two-dimensional: an arc is shown that delineates the trajectory to be followed, bounded by the extreme points as shown in Figure 9.
This approach was chosen to make the visual feedback clearer and more accessible. Depending on the exercise, the patient will need to position themselves either facing or sideways to the camera. In the case of the shoulder abduction exercise, for example, the user must face the Kinect sensor, since the movement mainly involves the frontal plane (x, y). However, it remains important to monitor depth variations to avoid incorrect or potentially harmful postures. The correct supervision of exercises therefore depends on the precise definition of the limits within which a movement is considered physiological.
The correct performance of exercises depends not only on identifying the area in which the movement should be carried out but also on the user’s performance. The expected movement area defines the limits within which the limb must move safely without exceeding thresholds that are too high or too low. At the same time, performance evaluation serves to determine how closely the user approaches the intended goal.
To measure performance along the horizontal ($P_x$) and vertical ($P_y$) axes, the following formulas are used:

$P_x(\%) = \frac{T_x - S_x}{L} \times 100, \qquad P_y(\%) = \frac{T_y - S_y + L}{L} \times 100,$

where
  • $S = (S_x, S_y)$ is the starting node position (e.g., shoulder or hip);
  • $T = (T_x, T_y)$ is the terminal node position (e.g., hand or foot);
  • $L = \|T - S\| = \sqrt{(T_x - S_x)^2 + (T_y - S_y)^2}$ is the limb length, i.e., the Euclidean distance between the two nodes;
  • $P_x$ and $P_y$ represent the percentage of movement extension along the horizontal and vertical axes, respectively, normalized by the limb length.
The formula for $P_x$ calculates the percentage of horizontal displacement of the terminal node relative to the total limb length. The formula for $P_y$ measures the vertical displacement, adjusted by the initial vertical position of the starting node and likewise normalized by the limb length.
These percentage values allow assessing how closely the limb’s movement approaches the correct execution, enabling accurate feedback for real-time monitoring.
In some types of exercise, such as those that involve backward movement of the limb (such as shoulder or hip extension), the motion is evaluated based on the angle $\alpha$ (expressed in radians) between the limb and the torso, as shown in Figure 10.
In these cases, performance $P_\alpha$ is calculated using the following formula:

$P_\alpha(\%) = \frac{\alpha}{\pi} \times 100.$
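The three performance measures can be sketched directly from these definitions; the computation of α from the dot product of limb and torso vectors is an assumption the paper does not spell out:

```python
import numpy as np

def performance_xy(S, T, L):
    """P_x and P_y as defined above: terminal-node displacement relative to
    the starting node, normalized by the limb length L."""
    Px = (T[0] - S[0]) / L * 100
    Py = (T[1] - S[1] + L) / L * 100
    return Px, Py

def performance_alpha(limb_vec, torso_vec):
    """P_alpha for extension exercises: the limb-torso angle (radians) as a
    percentage of pi."""
    cos_a = np.dot(limb_vec, torso_vec) / (
        np.linalg.norm(limb_vec) * np.linalg.norm(torso_vec))
    alpha = np.arccos(np.clip(cos_a, -1.0, 1.0))
    return alpha / np.pi * 100
```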
Another important aspect of the system is the repetition counter, which must be carefully managed to avoid mistakes, especially when movements are minimal or the patient does not maintain the correct posture. For each repetition, it is essential to ensure that the limb is fully extended and the performance values along x and y increase compared to the initial position.

5. Deep Learning

The methodology adopted for the detection and correction of rehabilitative movements is divided into several phases: data collection, joint estimation, and movement correction.
Data collection: The images used in the system were extracted as frames from videos belonging to the open-source dataset REHAB24-6 [78]. This dataset contains annotated rehabilitation exercise videos, providing a diverse set of movements focused on upper and lower limb motions. The dataset was built by collecting approximately 2500 images and videos of patients performing common rehabilitative exercises, mainly focused on arm and leg movements such as upper limb rotations, flexions and extensions, lifts, and abductions. The exercises were performed both correctly and incorrectly to enable the model to learn the differences between proper and improper executions. The footage was captured using a high-resolution camera in various environments to ensure good variability in lighting conditions, angles, and clothing types. All images were resized to a fixed resolution of 300 × 300 pixels. The dataset was randomly split into a training set (70%) and a test set (30%).
Joint estimation: The system was developed using the Python (version 3.13.4) programming language within the Jupyter Notebook (version 7.4) environment. To estimate the positions of the body’s joints, a deep learning model was used—specifically, a modified version of MoveNet Thunder–SinglePose, a pre-trained Convolutional Neural Network designed for human posture analysis. The model, provided by TensorFlow and developed in collaboration with the company Include Health, outputs the coordinates of 17 key body points (such as head, shoulders, elbows, wrists, hips, knees, and ankles) based on RGB images or real-time video streams.
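For reference, the standard TensorFlow Hub usage of MoveNet Thunder (SinglePose) is sketched below; the paper uses a modified version of the model, so this shows only the baseline inference path:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Baseline MoveNet Thunder (SinglePose) from TensorFlow Hub
model = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")
movenet = model.signatures["serving_default"]

def estimate_keypoints(rgb_frame):
    """Return 17 keypoints as (y, x, confidence) triples normalized to [0, 1]."""
    img = tf.expand_dims(rgb_frame, axis=0)        # add batch dimension
    img = tf.image.resize_with_pad(img, 256, 256)  # Thunder expects 256x256 input
    outputs = movenet(tf.cast(img, tf.int32))
    return outputs["output_0"].numpy().reshape(17, 3)
```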
Movement correction: The data obtained from the MoveNet model represent the estimated spatial coordinates of the body keypoints (e.g., head, shoulders, elbows, and wrists) during the execution of an exercise.
To assess the correctness of the movement, these estimated coordinates are compared with a set of reference values, which represent the ideal or expected positions of the keypoints based on correctly performed movements.
The comparison is based on two fundamental aspects: the distance and the orientation between the estimated keypoints ($Y_{\text{actual}}$) and the correct or reference keypoints ($Y_{\text{correct}}$).
The Euclidean distance between the two coordinate vectors is used to measure the overall spatial difference between the estimated and ideal positions. This distance is calculated as

$C = \|Y_{\text{correct}} - Y_{\text{actual}}\| = \sqrt{\sum_{i=1}^{n} (y_{\text{correct},i} - y_{\text{actual},i})^2},$
where
  • $Y_{\text{correct}} = (y_{\text{correct},1}, y_{\text{correct},2}, \ldots, y_{\text{correct},n})$ is the vector of the correct keypoint coordinates (e.g., 3D coordinates of head, shoulders, etc.);
  • $Y_{\text{actual}} = (y_{\text{actual},1}, y_{\text{actual},2}, \ldots, y_{\text{actual},n})$ is the vector of coordinates estimated by the model;
  • $n$ is the total number of coordinates being compared (e.g., 3 for each point in 3D, multiplied by the number of keypoints);
  • $C$ represents the overall deviation measure of the current movement compared to the correct one.
A low value of C indicates that the estimated movement is very close to the reference one and therefore likely performed correctly. Conversely, a high value indicates a significant difference, suggesting an error or deviation in execution.
Moreover, the system also considers the relative orientation of the points to ensure not only that small spatial deviations are detected but also that the posture and movement direction are consistent with the expected model.
This measure C is then used as a correction vector that guides real-time feedback to the user, indicating if and how the movement should be adjusted to match the ideal model.
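A direct transcription of this deviation measure into NumPy is shown below; the per-coordinate difference vector used for directional feedback is an illustrative addition:

```python
import numpy as np

def correction_measure(Y_correct, Y_actual):
    """Overall deviation C: the Euclidean distance between the reference and
    estimated keypoint coordinate vectors (low C = near-correct execution)."""
    return np.linalg.norm(np.asarray(Y_correct) - np.asarray(Y_actual))

def correction_vector(Y_correct, Y_actual):
    """Per-coordinate differences, usable to tell the user how to adjust
    (an illustrative refinement beyond the scalar C)."""
    return np.asarray(Y_correct) - np.asarray(Y_actual)
```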
To quantify the reliability of each estimated keypoint, the system computes a confidence score based on the difference between the predicted and correct keypoint coordinates. This score helps to assess how certain the model is about its estimation.
The confidence score is derived using the Mean Squared Error (MSE), which measures the average squared difference between the predicted coordinates $Y_{\text{predicted}}$ and the reference coordinates $Y_{\text{correct}}$:

$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (Y_{\text{correct},i} - Y_{\text{predicted},i})^2,$
where
  • $N$ is the total number of coordinate points considered (e.g., all keypoint coordinates);
  • $Y_{\text{correct},i}$ is the $i$-th coordinate of the correct/reference keypoint;
  • $Y_{\text{predicted},i}$ is the $i$-th coordinate of the keypoint predicted by the model.
The Root Mean Squared Error (RMSE) is then calculated as the square root of the MSE:

$\text{RMSE} = \sqrt{\text{MSE}}.$
The RMSE provides a metric in units similar to the original data, making it easier to interpret the average magnitude of the estimation error.
In this system, the confidence score is derived from the RMSE value and normalized to lie between 0 and 1, where a higher score indicates higher confidence in the keypoint estimation.
A threshold of 0.8 is established: if the confidence score exceeds this value, the estimated movement is considered reliable and can be evaluated further. If the confidence score falls below this threshold, the system displays a warning to the user, indicating that the estimation is uncertain and that the feedback provided may not be accurate.
This mechanism ensures that the system only processes and provides feedback on movements when the pose estimation is sufficiently precise, enhancing the robustness and trustworthiness of the rehabilitative monitoring.
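A sketch of the confidence computation and the 0.8 acceptance threshold follows; the paper does not specify how the RMSE is normalized to [0, 1], so the max_error scale used here is an assumption:

```python
import numpy as np

def confidence_score(Y_correct, Y_predicted, max_error=1.0):
    """RMSE between predicted and reference coordinates, mapped to [0, 1]
    (higher = more reliable). The max_error normalizer is an assumed scale."""
    mse = np.mean((np.asarray(Y_correct) - np.asarray(Y_predicted)) ** 2)
    rmse = np.sqrt(mse)
    return max(0.0, 1.0 - rmse / max_error)

# Example: a small uniform estimation error yields a score near 1
ref = np.zeros((17, 2))
est = np.full((17, 2), 0.05)
reliable = confidence_score(ref, est) > 0.8   # the paper's acceptance threshold
```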

5.1. System Architecture

The system architecture is composed of multiple modules. The patient’s live video feed is processed, converted into images, normalized, and passed to the MoveNet model. The results of joint estimation are then analyzed by a custom classification layer built on top of MoveNet, which classifies each movement execution as correct or incorrect based on the estimated key points (see Figure 11).
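A hypothetical sketch of such a classification head in Keras, taking the flattened MoveNet keypoints as input; the layer sizes and dropout rate are assumptions, while the binary output matches the correct/incorrect labeling described above:

```python
import tensorflow as tf

# The 17 (y, x, score) keypoints are flattened into a 51-value feature vector
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(17 * 3,)),
    tf.keras.layers.Dense(64, activation="relu"),   # assumed hidden size
    tf.keras.layers.Dropout(0.3),                   # regularization (see Sec. 5.2)
    tf.keras.layers.Dense(1, activation="sigmoid")  # P(execution is correct)
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
# Training would then call classifier.fit(...) on labeled keypoint vectors,
# e.g., with the reported batch size of 16 for 200 epochs.
```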

5.2. Model Training

The model was trained on a large number of correct and incorrect examples of the exercises, using data augmentation and regularization techniques to avoid overfitting. The main hyperparameters, such as learning rate, batch size, and number of epochs, were optimized to achieve the best performance.
The model uses a kinematic representation based on 17 key points chosen to efficiently represent human movement (Figure 12). These points correspond to the body’s main joints and are essential for understanding posture and motion. The kinematic representation not only detects position but also predicts the evolution of movement over time, making the system suitable for tele-rehabilitation contexts.
To make the system easy to use, a web interface was developed using ReactJS, a JavaScript library to build dynamic user interfaces. The model was integrated into the web app via TensorFlow.js, allowing for real-time inference directly in the browser without requiring server-side computation. The platform guides the user step by step: first, the user selects the exercise to perform; then, the system requests access to the webcam. Once the exercise begins, a reference image is displayed, accompanied by detailed instructions and an explanation of the exercise’s benefits to help the patient understand its importance. The system provides immediate visual feedback, highlighting the correct and incorrect areas.

5.3. Results

As a result of the proposed system, when the user accesses the rehabilitation platform via the website, they are redirected to the homepage, where clear instructions for using the application are displayed. The user can click on the “Start” button to access the exercise selection page. From there, they can choose one of the predefined rehabilitation exercises, targeting the upper or lower limbs, such as arm rotations, shoulder flexion/extension, hip abduction, or knee lifts.
After selecting a movement, the platform presents a detailed guide with step-by-step instructions and an anatomical image illustrating the correct execution of the movement. By clicking the “Start Exercise” button, the webcam is activated and the system begins real-time body tracking. The MoveNet model, integrated into the application, detects and tracks 17 key points on the user’s body.
When the patient correctly performs the prescribed movement (for example, a shoulder rotation or leg lift), the system overlays a green skeleton on the body, indicating correct execution. If the movement deviates from the correct form (for example, due to an incorrect joint angle or limited range of motion), the skeleton turns blue. The system records the duration for which the correct posture is maintained. This feedback mechanism provides immediate assistance to optimize rehabilitation outcomes.
The proposed model achieved a training accuracy of 99.29% and a test accuracy of 99.88%, confirming high reliability. The training and testing loss values are 0.017 and 0.0066, respectively, obtained using 200 epochs and a batch size of 16. These results confirm the model’s strong generalization capabilities across diverse users and varying conditions.
TensorFlow MoveNet proves to be a lightweight, high-performance model for the real-time tracking of rehabilitation activities, and it is particularly well suited to mobile devices. Its high accuracy in key point recognition allows it to detect even small deviations in execution while remaining robust to differences in body shape, joint mobility, and clothing. Its ability to operate in real time provides users with immediate feedback, which is crucial for injury prevention and for promoting functional recovery.

6. Vision System for Therapeutic Adaptation: Definition of Integration Strategies

The integration of a complex system involving both hardware and software modules is critical to ensuring stability and adaptability. In assisted rehabilitation, all components must operate in sync to minimize transmission delays and avoid data–response mismatches.
The integration of the system was developed as a software prototype, tested through simulations in the ROS2 environment, to verify the correct functioning and compatibility between the modules.
This strategy allowed for the establishment of a coherent and reliable information flow, improving the adaptability of therapeutic treatment to the specific needs of the patient.
The entire system was developed on ROS2 (Robot Operating System 2), a distributed middleware that organizes the interaction between components in a modular and dynamic way. This brings several benefits:
  • Flexibility and expandability over time;
  • Ease of development and testing, as each module can be updated without affecting the functioning of the other components;
  • Simpler maintenance, with updates that can be applied without modifying the entire system;
  • Optimization of the computational load, with each module performing a specific task, avoiding overloads on a single node.
A minimal sketch of such a module, implemented as a ROS2 node, is shown below.
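In this illustrative sketch, written with the rclpy client library, the vision module publishes biomechanical parameters on a topic that the recommendation module can subscribe to; the topic name, message type, publishing rate, and the extract_parameters() stub are assumptions, not the authors’ actual interfaces:

import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray

def extract_parameters():
    """Stub for the vision pipeline; returns placeholder biomechanical values."""
    return [0.0] * 6

class VisionNode(Node):
    """One module = one node: publishes biomechanical parameters that the
    recommendation module consumes on the same topic."""
    def __init__(self):
        super().__init__("vision_node")
        self.pub = self.create_publisher(Float32MultiArray, "biomech_params", 10)
        self.timer = self.create_timer(1.0 / 30.0, self.tick)  # ~30 Hz

    def tick(self):
        msg = Float32MultiArray()
        msg.data = extract_parameters()
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(VisionNode())
    rclpy.shutdown()

if __name__ == "__main__":
    main()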

6.1. Integration Testing and Verification

The integration testing and verification phase aimed to ensure the correct functioning of the overall system, including software and hardware compatibility and efficiency. The main objectives were to verify computational performance, ensuring that the system can process large volumes of data without delays; to check synchronization between modules, avoiding latencies in the data flow; and to assess resistance to failures by testing the system’s ability to recover autonomously from simulated errors.
Before testing the complete system, each module was individually validated through unit tests that evaluated its functionality, robustness, and interoperability with the rest of the system. The tests on the individual modules covered the accuracy of motion detection, adaptability to lighting variations, the accuracy of biomechanical parameter extraction, and the reliability of the generated recommendations. The system was also tested for its ability to adapt to the simulated progress of the patient, the safety of the recommendations (avoiding dangerous conditions), the precision of motor assistance, and the system response in the case of failures.
After the individual modules were validated, integration tests were performed to verify that the modules interacted correctly with each other. These tests included the validation of communication between modules, data synchronization, motor response verification, and latency testing to ensure that the biomechanical parameters processed by the vision module were received without delay by the recommendation module. To optimize data transmission, tools such as ZeroMQ and MQTT were considered, as well as multithreading and parallel computing techniques to improve efficiency.
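As an example of how such latency tests can be instrumented, the following sketch uses the MQTT transport mentioned above (paho-mqtt, 1.x-style API; the broker address, topic, and probe rate are placeholders) to time message delivery between two modules running on the same host:

import time
import paho.mqtt.client as mqtt

BROKER, TOPIC = "localhost", "rehab/biomech"   # placeholder broker and topic

def on_message(client, userdata, msg):
    sent = float(msg.payload.decode())
    print(f"one-way latency: {(time.time() - sent) * 1000:.2f} ms")

client = mqtt.Client()          # paho-mqtt 1.x-style constructor
client.on_message = on_message
client.connect(BROKER)
client.subscribe(TOPIC)
client.loop_start()

for _ in range(100):            # publish timestamped probes at 20 Hz
    client.publish(TOPIC, str(time.time()))
    time.sleep(0.05)

client.loop_stop()
client.disconnect()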
The system was also subjected to stress tests to simulate adverse operating conditions, such as high request loads, and scalability tests to assess the ability to add new modules without compromising the stability of the system. Hardware and software failures were also simulated to verify the system’s resilience and its ability to autonomously recover from failures.
Finally, continuous monitoring tools, such as advanced logging and predictive maintenance based on machine learning algorithms, were implemented to detect anomalies and prevent future failures.

6.2. Test Results and System Validation

After the integration testing and verification phase, the results were analyzed to confirm the effectiveness and reliability of the system, ensuring that each module operates as expected and that communication between components is stable. The validation confirmed that the system can operate in a simulated context with high performance and precision. Each module was subjected to targeted tests, evaluating parameters such as precision in motion detection, adaptability to environmental variations, error reduction (below 5%), and transparency for human supervision. Furthermore, it was verified that the system responds with a maximum latency of 50 ms and that the motor assistance adapts in real time to the patient’s progress.
The integration tests confirmed that the modules interact correctly, with a communication latency between modules of less than 10 ms and error management capable of restoring correct operation within 3 s. The system also demonstrated its ability to handle high computational loads, with an overall response time of less than 120 ms, optimized hardware resource utilization (CPU load ≤ 65% and memory ≤ 4 GB), and fast recovery mechanisms in the event of hardware or software failures.
In general, the system met all initial requirements, demonstrating high accuracy in motion detection, optimal response times, and resilience to failures.

6.3. Security, Regulatory Compliance, and Data Protection Measures

Since it handles sensitive patient data, the system was designed in accordance with European regulations on privacy and artificial intelligence, including the GDPR [79] and the AI Act [80]. To ensure data security, the following measures have been implemented:
  • TLS encryption: protects communication between modules, preventing unauthorized access to transmitted data (a minimal configuration sketch follows this list);
  • Multi-level authentication: controls system access, ensuring that only authorized users can modify therapy parameters;
  • Advanced monitoring and logging: implements fail-safe mechanisms to detect anomalies and activate automatic recovery procedures.
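As referenced above, a minimal, hypothetical sketch of enabling TLS on the MQTT channel considered in Section 6.1 might look as follows (paho-mqtt, 1.x-style API; the certificate paths, credentials, and broker address are placeholders):

import paho.mqtt.client as mqtt

# Minimal sketch: mutual-TLS connection between two modules (placeholders).
client = mqtt.Client()                                # paho-mqtt 1.x-style API
client.tls_set(ca_certs="ca.crt",                     # broker CA certificate
               certfile="module.crt",                 # this module's certificate
               keyfile="module.key")
client.username_pw_set("module_user", "change-me")    # placeholder credentials
client.connect("broker.local", 8883)                  # 8883: MQTT over TLS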

7. Case Study: Rehabilitation of the Rotator Cuff in a Volleyball Player

In the context of sports rehabilitation, one of the most common problems volleyball players face is rotator cuff injury, which can result from repetitive stress and sudden movements during game actions, such as spikes and serves. In this case study, the volleyball player suffered a partial rupture of a rotator cuff tendon, compromising the mobility and functionality of the right shoulder.
To monitor and support the rehabilitation process, an advanced artificial vision system was used to analyze shoulder movement in real time. The system detected and quantified the range of motion, the symmetry between the two sides of the body, and the speed and fluidity of the movements, identifying any stiffness or motor difficulty, as shown in Table 2. Based on these results, the system suggested personalized therapeutic exercises.

7.1. Main Functions of the Rehabilitation System

The rehabilitation system monitors and analyzes the patient’s movements through advanced computer vision techniques, detecting any postural anomalies and difficulties in movements, such as stiffness or compensations, to identify signs of injury or motor problems. Based on this analysis, the system provides personalized therapeutic suggestions, adapting therapies to the specific needs of the patient to optimize the rehabilitation process and improve recovery.

7.1.1. Movement Detection and Analysis

The system acquires video and analyzes the patient’s movements using computer vision tools such as MediaPipe and PoseNet (Figure 13) to track the main joints and evaluate posture and range of motion.
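A minimal sketch of this acquisition step with MediaPipe Pose, one of the two libraries named above, is shown below (the camera index and confidence threshold are illustrative assumptions):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture(0)                 # assumed default webcam
with mp_pose.Pose(min_detection_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            lm = results.pose_landmarks.landmark
            shoulder = lm[mp_pose.PoseLandmark.RIGHT_SHOULDER]
            print(f"right shoulder: ({shoulder.x:.2f}, {shoulder.y:.2f})")
cap.release()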

7.1.2. Identification of Problems

The system analyzes the symmetry of the movement, the speed, and the acceleration to detect any anomalies, such as postural asymmetries or motor difficulties, which could indicate injuries or physical problems (e.g., stiffness or compensations in movements).
In this case, as shown in Figure 14, the injury caused difficulty in lifting the arm and a marked postural asymmetry, with a significant reduction in the range of motion of the injured shoulder. The system also detected an inclination of the trunk, a compensation adopted to raise the arm, highlighting stiffness and difficulty in recovering natural movement.
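The kind of metric behind these observations can be sketched as follows: a hypothetical per-arm elevation angle computed from 2D keypoints, whose left/right difference flags asymmetry (the keypoint values are placeholders):

import numpy as np

def elevation_angle(shoulder_xy, elbow_xy):
    """Angle (degrees) between the upper arm and the downward vertical,
    in image coordinates where y grows downward: 0 = arm at rest,
    ~180 = arm fully raised."""
    v = np.asarray(elbow_xy, dtype=float) - np.asarray(shoulder_xy, dtype=float)
    cos = (v @ np.array([0.0, 1.0])) / (np.linalg.norm(v) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Placeholder keypoints (x, y) in normalized image coordinates:
left_shoulder, left_elbow = (0.42, 0.35), (0.45, 0.20)
right_shoulder, right_elbow = (0.58, 0.35), (0.60, 0.48)

left = elevation_angle(left_shoulder, left_elbow)
right = elevation_angle(right_shoulder, right_elbow)
asymmetry = abs(left - right)   # degrees; a large gap flags compensation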

7.1.3. Suggestion of Personalized Therapies

Based on the results of the analysis, the system provides personalized therapeutic suggestions, such as rehabilitation, mobilization, or strengthening exercises, to correct the detected problems and optimize the patient’s recovery. In this case, it suggested passive and isometric mobilization exercises, followed by activities aimed at strengthening the rotator cuff and restoring scapular stability, as shown in Figure 15.

8. Future Developments

The currently proposed assisted rehabilitation system represents an experimental prototype, designed to simulate the interaction between the vision and suggestion modules. Several development directions have been outlined aimed at enhancing performance, adaptability, and clinical efficacy. Among the main future evolutions are
  • Continuous real-time machine learning and adaptive personalization;
  • Integration with virtual reality (VR) and augmented reality (AR) technologies;
  • Multi-user support and application in clinical and healthcare settings;
  • Expansion of hardware compatibility and integration with biometric devices;
  • Improvement of interoperability with existing healthcare systems.
Currently, the suggestion module is based on machine learning models trained on predefined datasets. A key evolution will be the implementation of continuous learning techniques, which allow the dynamic updating of predictive models in real time, favoring greater personalization of therapy. By automatically identifying individual recovery patterns, the system will be able to adapt exercises and levels of assistance to the patient’s progress. In this context, advanced reinforcement learning models will be crucial to automatically calibrate the difficulty of the exercises. Furthermore, the adoption of an architecture based on federated learning will make it possible to refine the predictive models using anonymized data from multiple users while ensuring compliance with current privacy regulations (e.g., the GDPR).
The integration of immersive technologies, such as virtual and augmented reality, represents a further step forward in the evolution of the system. VR will allow patients to perform exercises within highly motivating and interactive virtual rehabilitation environments. At the same time, AR will provide real-time visual feedback, superimposing graphic indicators on the patient’s movements to facilitate postural correction. The addition of haptic interfaces, capable of returning tactile stimuli, will further enhance the perception and execution of correct movements, increasing the effectiveness of the therapy and patient engagement.
From an architectural point of view, the system is currently designed to manage only one patient at a time. Future iterations will need to include optimization for multi-user support, with the possibility of monitoring parallel rehabilitation sessions in complex clinical environments. The introduction of a multi-device interface (accessible from a tablet or computer), together with the integration with telemedicine systems, will allow therapists to remotely manage multiple patients in real time, significantly expanding the application potential of the system.
A strategic development concerns the adoption of advanced biometric sensors, which can provide a more complete assessment of the patient’s physiological state. These include electromyographic (EMG) sensors to monitor muscle activation, devices for the detection of cardio-respiratory parameters, and intelligent wearables (such as exoskeletons or smart bands) for high-precision postural and biomechanical analysis. The integration of these tools will further improve the customization and precision of therapy.
Another evolutionary perspective is the integration of the system with digital clinical platforms, such as electronic health records (EHRs), patient management software, and decision support systems based on artificial intelligence. The development of standardized APIs will facilitate interoperability with the IT tools in use at healthcare facilities. Furthermore, the adoption of predictive algorithms based on big data will allow the optimization of treatment planning and the prediction of recovery times, while the automated generation of clinical reports will help streamline the work of therapists and increase the quality of care.
The modular architecture of the system facilitates the progressive integration of new features without compromising its stability. Emerging technologies, such as continuous learning, virtual reality, biometric sensors, and telemedicine, can transform this prototype into an advanced and scalable solution for assisted rehabilitation. The tests carried out on the current version have already highlighted the solidity of the adopted approach, providing a promising basis for the future use of the system in real clinical contexts and for an even greater level of adaptability in rehabilitation therapies.

9. Conclusions

The work presented in this study is still in the development and implementation phase. However, the preliminary results and the proposed architecture show great potential to bring significant innovations to the field of motor rehabilitation. Thanks to the integration of advanced technologies such as resilient artificial intelligence, computer vision, and deep learning models, the system will be able to support personalized therapies, real-time monitoring, and the dynamic adaptation of rehabilitation protocols. Further development and experimentation will be essential to validate the system’s effectiveness and expand its applications, opening new perspectives for digital physiotherapy and improving patients’ quality of life. The assisted rehabilitation system presented here represents an important step forward in the field of health technologies, combining computer vision, Resilient AI, and motor assistance modules to optimize patient rehabilitation.
The tests carried out on the software prototype have confirmed the validity of the approach, suggesting that the system can significantly improve the effectiveness and personalization of rehabilitation therapies.
The future potential is vast, including the integration of continuous machine learning to enable the adaptation of real-time therapy to the patient’s needs, as well as the use of virtual and augmented reality to enrich the rehabilitation experience.
The introduction of advanced biometric technologies, such as EMG sensors and devices for cardiac and respiratory monitoring, would represent a further step toward a more complete assessment of the patient’s physiological conditions, improving the accuracy of diagnoses and the effectiveness of therapies.
The possibility of monitoring multiple patients simultaneously and integrating the system with digital clinical platforms such as electronic medical records and patient management software would open up application scenarios in the clinical field, transforming the system into a scalable and easily accessible tool for healthcare professionals.
Finally, the adoption of modular architectures and the use of technologies such as federated learning could allow for the constant updating of the system, without compromising data privacy and without requiring manual interventions [81].
Future developments of the system will therefore contribute to the continuous evolution of rehabilitation therapies, bringing long-term benefits in terms of patients’ quality of life and the efficient management of healthcare resources.

Author Contributions

Conceptualization, E.C.; Methodology, E.C.; Software, E.C.; Validation, E.C. and A.M.; Formal analysis, E.C.; Data curation, E.C., C.C. and M.F.; Writing—original draft, E.C.; Writing—review & editing, E.C.; Visualization, E.C.; Supervision, E.C.; Project administration, E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in the Zenodo repository at https://doi.org/10.5281/zenodo.13305826.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bejnordi, B.E.; Veta, M.; van Diest, P.J.; van Ginneken, B.; Karssemeijer, N.; Litjens, G.; van der Laak, J.A.W.M.; The CAMELYON16 Consortium; Hermsen, M.; Manson, Q.F.; et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women with Breast Cancer. JAMA 2017, 318, 2199–2210.
  2. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks. Nature 2017, 542, 115–118.
  3. Silvestri, S.; Gargiulo, F.; Ciampi, M. Iterative annotation of biomedical NER corpora with deep neural networks and knowledge bases. Appl. Sci. 2022, 12, 5775.
  4. Silvestri, S.; Tricomi, G.; Bassolillo, S.R.; De Benedictis, R.; Ciampi, M. An Urban Intelligence Architecture for Heterogeneous Data and Application Integration, Deployment and Orchestration. Sensors 2024, 24, 2376.
  5. TSUA. Transforming Our World: The 2030 Agenda for Sustainable Development. September 2015. Available online: https://sdgs.un.org/2030agenda (accessed on 28 February 2025).
  6. Ramson, S.J.; Vishnu, S.; Shanmugam, M. Applications of Internet of Things (IoT)—An overview. In Proceedings of the 2020 5th International Conference on Devices, Circuits and Systems (ICDCS), Coimbatore, India, 5–6 March 2020; pp. 92–95.
  7. Francesco, G.; Stefano, S.; Mario, C. A big data architecture for knowledge discovery. In Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece, 3–6 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 82–87.
  8. AlShorman, O.; AlShorman, B.; Alkhassaweneh, M.; Alkahtani, F. Review of internet of medical things (IoMT)–based remote health monitoring through wearable sensors: A case study for diabetic patients. Indonesian J. Elect. Eng. Comput. Sci. 2020, 20, 414–422.
  9. Silvestri, S.; Islam, S.; Amelin, D.; Weiler, G.; Papastergiou, S.; Ciampi, M. Cyber threat assessment and management for securing healthcare ecosystems using natural language processing. Int. J. Inf. Secur. 2024, 23, 31–50.
  10. Firesmith, D. System Resilience: What Exactly Is It? Available online: https://insights.sei.cmu.edu/blog/system-resilience-what-exactly-is-it/ (accessed on 18 December 2024).
  11. Avizienis, A.; Laprie, J.-C.; Randell, B.; Landwehr, C. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Depend. Sec. Comput. 2004, 1, 11–33.
  12. Taimoor, N.; Rehman, S. Reliable and Resilient AI and IoT-Based Personalised Healthcare Services: A Survey. IEEE Access 2021, 10, 535–563.
  13. Mathis, A.; Mamidanna, P.; Cury, K.M.; Abe, T.; Murthy, V.N.; Mathis, M.W.; Bethge, M. DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 2018, 21, 1281.
  14. Kanko, R.M.; Laende, E.K.; Strutzenberger, G.; Brown, M.; Selbie, W.S.; DePaul, V.; Scott, S.H.; Deluzio, K.J. Assessment of spatiotemporal gait parameters using a deep learning algorithm-based markerless Motion Capture system. J. Biomech. 2021, 122, 110414.
  15. Drazan, J.F.; Phillips, W.T.; Seethapathi, N.; Hullfish, T.J.; Baxter, J.R. Moving outside the lab: Markerless Motion Capture accurately quantifies sagittal plane kinematics during the vertical jump. J. Biomech. 2021, 125, 110547.
  16. Mündermann, L.; Corazza, S.; Andriacchi, T.P. The evolution of methods for the capture of human movement leading to markerless Motion Capture for biomechanical applications. J. NeuroEng. Rehabil. 2006, 3, 6.
  17. Corazza, S.; Mündermann, L.; Gambaretto, E.; Andriacchi, T.P. Markerless Motion Capture through Visual Hull, Articulated ICP and Subject Specific Model Generation. Int. J. Comput. Vis. 2010, 87, 156–169.
  18. Andriacchi, T.P.; Alexander, E.J. Studies of human locomotion: Past, present and future. J. Biomech. 2000, 33, 1217–1224.
  19. Harris, G.F.; Smith, P.A. Human Motion Analysis: Current Applications and Future Directions; IEEE Press: New York, NY, USA, 1996.
  20. Mündermann, A.; Dyrby, C.O.; Hurwitz, D.E.; Sharma, L.; Andriacchi, T.P. Potential strategies to reduce medial compartment loading in patients with knee OA of varying severity: Reduced walking speed. Arthritis Rheum. 2004, 50, 1172–1178.
  21. Simon, R.S. Quantification of human motion: Gait analysis benefits and limitations to its application to clinical problems. J. Biomech. 2004, 37, 1869–1880.
  22. Wang, L.; Hu, W.; Tan, T. Recent Developments in Human Motion Analysis. Pattern Recognit. 2003, 36, 585–601.
  23. Moeslund, G.; Granum, E. A survey of computer vision-based human Motion Capture. Comput. Vis. Image Underst. 2001, 81, 231–268.
  24. Kanade, T.; Collins, R.; Lipton, A.; Burt, P.; Wixson, L. Advances in cooperative multi-sensor video surveillance. Darpa Image Underst. Workshop 1998, 1, 2.
  25. Lee, H.J.; Chen, Z. Determination of 3D human body posture from a single view. Comput. Vis. Graph. Image Process. 1985, 30, 148–168.
  26. Hogg, D. Model-based vision: A program to see a walking person. Image Vis. Comput. 1983, 1, 5–20.
  27. Gavrila, D.; Davis, L. 3-D model-based tracking of humans in action: A multi-view approach. In Proceedings of the Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 18–20 June 1996.
  28. Yamamoto, M.; Koshikawa, K. Human motion analysis based on a robot arm model. In Proceedings of the Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991.
  29. Darrel, T.; Maes, P.; Blumberg, B.; Pentland, A.P. A novel environment for situated vision and behavior. In Proceedings of the Workshop for Visual Behaviors at CVPR, Seattle, WA, USA, 21–23 June 1994.
  30. Bharatkumar, A.G.; Daigle, K.E.; Pandy, M.G.; Cai, Q.; Aggarwal, J.K. Lower limb kinematics of human walking with transformation. In Proceedings of the Workshop on Motion of Non-Rigid and Articulated Objects, Austin, TX, USA, 11–12 November 1994.
  31. Cedras, C.; Shah, M. Motion-based recognition: A survey. Image Vis. Comput. 1995, 13, 129–155.
  32. Zakotnik, J.; Matheson, T.; Dürr, V. A posture optimization algorithm for model-based Motion Capture of movement sequences. J. Neurosci. Methods 2004, 135, 43–54.
  33. Lanshammar, H.; Persson, T.; Medved, V. Comparison between a marker-based and a marker-free method to estimate centre of rotation using video image analysis. In Proceedings of the Second World Congress of Biomechanics, Amsterdam, The Netherlands, 10–15 July 1994.
  34. Persson, T. A marker-free method for tracking human lower limb segments based on model matching. Int. J. Biomed. Comput. 1996, 41, 87–97.
  35. Pinzke, S.; Kopp, L. Marker-less systems for tracking working postures—Results from two experiments. Appl. Ergon. 2001, 32, 461–471.
  36. Legrand, L.; Marzani, F.; Dusserre, L. A marker-free system for the analysis of movement disabilities. Medinfo 1998, 9, 1066–1070.
  37. Marzani, F.; Calais, E.; Legrand, L. A 3-D marker-free system for the analysis of movement disabilities—An application to the legs. IEEE Trans. Inf. Technol. Biomed. 2001, 5, 18–26.
  38. Gargiulo, F.; Silvestri, S.; Ciampi, M. A clustering based methodology to support the translation of medical specifications to software models. Appl. Soft Comput. 2018, 71, 199–212.
  39. Ar, I.; Akgul, Y.S. A computerized recognition system for the home-based physiotherapy exercises using an RGBD camera. IEEE Trans. Neural Syst. Rehabil. Eng. 2014, 22, 1160–1171.
  40. Tinetti, M.E.; Speechley, M.; Ginter, S.F. Risk factors for falls among elderly persons living in the community. N. Engl. J. Med. 1988, 319, 1701–1707.
  41. Nalci, A.; Khodamoradi, A.; Balkan, O.; Nahab, F.; Garudadri, H. A computer vision based candidate for functional balance test. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 3504–3508.
  42. van Baar, M.E.; Dekker, J.; Oostendorp, R.A.B.; Bijl, D.; Voorn, T.B.; Bijlsma, J.W.J. Effectiveness of exercise in patients with osteoarthritis of hip or knee: Nine months’ follow up. Ann. Rheum. Dis. 2001, 60, 1123–1130.
  43. Stroppa, F.; Stroppa, M.S.; Marcheschi, S.; Loconsole, C.; Sotgiu, E.; Solazzi, M.; Buongiorno, D.; Frisoli, A. Real-time 3D tracker in robot-based neurorehabilitation. In Computer Vision and Pattern Recognition, Computer Vision for Assistive Healthcare; Leo, M., Farinella, G.M., Eds.; Academic Press: Cambridge, MA, USA, 2018; pp. 75–104.
  44. Baptista, R.; Ghorbel, E.; Shabayek, A.E.R.; Moissenet, F.; Aouada, D.; Douchet, A.; André, M.; Pager, J.; Bouill, S. Home self-training: Visual feedback for assisting physical activity for stroke survivors. Comput. Methods Programs Biomed. 2019, 176, 111–120.
  45. Wijekoon, A. Reasoning with multi-modal sensor streams for m-health applications. In Proceedings of the Workshop Proceedings for the 26th International Conference on Case-Based Reasoning (ICCBR 2018), Stockholm, Sweden, 9–12 July 2018; pp. 234–238.
  46. Rybarczyk, Y.; Medina, J.L.P.; Leconte, L.; Jimenes, K.; González, M.; Esparza, D. Implementation and assessment of an intelligent motor tele-rehabilitation platform. Electronics 2019, 8, 58.
  47. Dorado, J.; del Toro, X.; Santofimia, M.J.; Parreno, A.; Cantarero, R.; Rubio, A.; Lopez, J.C. A computer-vision-based system for at-home rheumatoid arthritis rehabilitation. Int. J. Distrib. Sens. Netw. 2019, 15, 15501477198.
  48. Peer, P.; Jaklic, A.; Sajn, L. A computer vision-based system for a rehabilitation of a human hand. Period Biol. 2013, 115, 535–544.
  49. Salisbury, J.P.; Liu, R.; Minahan, L.M.; Shin, H.Y.; Karnati, S.V.P.; Duffy, S.E. Patient engagement platform for remote monitoring of vestibular rehabilitation with applications in concussion management and elderly fall prevention. In Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA, 4–7 June 2018; pp. 422–423.
  50. Internationale Vojta Gesellschaft e.V. Vojta Therapy. 2020. Available online: https://www.vojta.com/en/the-vojta-principle/vojta-therapy (accessed on 23 November 2020).
  51. Khan, M.H.; Helsper, J.; Farid, M.S.; Grzegorzek, M. A computer vision-based system for monitoring Vojta therapy. Int. J. Med. Inform. 2018, 113, 85–95.
  52. Chen, Y.L.; Liu, C.H.; Yu, C.W.; Lee, P.; Kuo, Y.W. An upper extremity rehabilitation system using efficient vision-based action identification techniques. Appl. Sci. 2018, 8, 1161.
  53. Rammer, J.; Slavens, B.; Krzak, J.; Winters, J.; Riedel, S.; Harris, G. Assessment of a marker-less motion analysis system for manual wheelchair application. J. Neuroeng. Rehabil. 2018, 15, 96.
  54. OpenSim. Available online: https://simtk.org/projects/opensim (accessed on 24 November 2020).
  55. Buonaiuto, G.; Guarasci, R.; Minutolo, R.; De Pietro, G.; Esposito, M. Quantum Transfer Learning for Acceptability Judgements. Quantum Mach. Intell. 2024, 6, 13.
  56. Gargiulo, F.; Minutolo, A.; Guarasci, R.; Damiano, E.; De Pietro, G.; Fujita, H.; Esposito, M. An ELECTRA-Based Model for Neural Coreference Resolution. IEEE Access 2022, 10, 75144–75157.
  57. Minutolo, A.; Guarasci, R.; Damiano, E.; De Pietro, G.; Fujita, H.; Esposito, M. A multi-level methodology for the automated translation of a coreference resolution dataset: An application to the Italian language. Neural Comput. Appl. 2022, 34, 22493–22518.
  58. Maskeliūnas, R.; Damaševičius, R.; Blažauskas, T.; Canbulut, C.; Adomavičienė, A.; Griškevičius, J. BiomacVR: A Virtual Reality-Based System for Precise Human Posture and Motion Analysis in Rehabilitation Exercises Using Depth Sensors. Electronics 2023, 12, 339.
  59. Lim, J.H.; He, K.; Yi, Z.; Hou, C.; Zhang, C.; Sui, Y.; Li, L. Adaptive Learning Based Upper-Limb Rehabilitation Training System with Collaborative Robot. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–5.
  60. Çubukçu, B.; Yüzgeç, U.; Zileli, A.; Zileli, R. Kinect-Based Integrated Physiotherapy Mentor Application for Shoulder Damage. Future Gener. Comput. Syst. 2021, 122, 105–116.
  61. Saratean, T.; Antal, M.; Pop, C.; Cioara, T.; Anghel, I.; Salomie, I. A Physiotheraphy Coaching System Based on Kinect Sensor. In Proceedings of the 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 3–5 September 2020; pp. 535–540.
  62. Wagner, J.; Szymański, M.; Błażkiewicz, M.; Kaczmarczyk, K. Methods for Spatiotemporal Analysis of Human Gait Based on Data from Depth Sensors. Sensors 2023, 23, 1218.
  63. Bijalwan, V.; Semwal, V.B.; Singh, G.; Mandal, T.K. HDL-PSR: Modelling Spatio-Temporal Features Using Hybrid Deep Learning Approach for Post-Stroke Rehabilitation. Neural Process. Lett. 2023, 55, 279–298.
  64. Keller, A.V.; Torres-Espin, A.; Peterson, T.A.; Booker, J.; O’Neill, C.; Lotz, J.C.; Bailey, J.F.; Ferguson, A.R.; Matthew, R.P. Unsupervised Machine Learning on Motion Capture Data Uncovers Movement Strategies in Low Back Pain. Front. Bioeng. Biotechnol. 2022, 10, 868684.
  65. Trinidad-Fernández, M.; Cuesta-Vargas, A.; Vaes, P.; Beckwée, D.; Moreno, F.Á.; González-Jiménez, J.; Fernández-Nebro, A.; Manrique-Arija, S.; Ureña-Garnica, I.; González-Sánchez, M. Human Motion Capture for Movement Limitation Analysis Using an RGB-D Camera in Spondyloarthritis: A Validation Study. Med. Biol. Eng. Comput. 2021, 59, 2127–2137.
  66. Hustinawaty, H.; Rumambi, T.; Hermita, M. Skeletonization of the Straight Leg Raise Movement Using the Kinect SDK. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 0120683.
  67. Girase, H.; Nyayapati, P.; Booker, J.; Lotz, J.C.; Bailey, J.F.; Matthew, R.P. Automated Assessment and Classification of Spine, Hip, and Knee Pathologies from Sit-to-Stand Movements Collected in Clinical Practice. J. Biomech. 2021, 128, 110786.
  68. Lim, D.; Pei, W.; Lee, J.W.; Musselman, K.E.; Masani, K. Feasibility of Using a Depth Camera or Pressure Mat for Visual Feedback Balance Training with Functional Electrical Stimulation. Biomed. Eng. Online 2024, 23, 19.
  69. Raza, A.; Qadri, A.M.; Akhtar, I.; Samee, N.A.; Alabdulhafith, M. LogRF: An Approach to Human Pose Estimation Using Skeleton Landmarks for Physiotherapy Fitness Exercise Correction. IEEE Access 2023, 11, 107930–107939.
  70. Khan, M.A.A.H.; Murikipudi, M.; Azmee, A.A. Post-Stroke Exercise Assessment Using Hybrid Quantum Neural Network. In Proceedings of the 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), Torino, Italy, 26–30 June 2023; pp. 539–548.
  71. Wei, W.; Mcelroy, C.; Dey, S. Using Sensors and Deep Learning to Enable On-Demand Balance Evaluation for Effective Physical Therapy. IEEE Access 2020, 8, 99889–99899.
  72. Uccheddu, F.; Governi, L.; Furferi, R.; Carfagni, M. Home Physiotherapy Rehabilitation Based on RGB-D Sensors: A Hybrid Approach to the Joints Angular Range of Motion Estimation. Int. J. Interact. Des. Manuf. (IJIDeM) 2021, 15, 99–102.
  73. Trinidad-Fernández, M.; Beckwée, D.; Cuesta-Vargas, A.; González-Sánchez, M.; Moreno, F.A.; González-Jiménez, J.; Joos, E.; Vaes, P. Validation, Reliability, and Responsiveness Outcomes of Kinematic Assessment with an RGB-D Camera to Analyze Movement in Subacute and Chronic Low Back Pain. Sensors 2020, 20, 689.
  74. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  75. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol. 1958, 20, 215–232.
  76. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  77. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
  78. Černek, A.; Sedmidubsky, J.; Budikova, P. REHAB24-, Physical Therapy Dataset for Analyzing Pose Estimation Methods. In Proceedings of the 17th International Conference on Similarity Search and Applications (SISAP), Providence, RI, USA, 4–6 November 2024; Springer: Berlin/Heidelberg, Germany, 2024; p. 14.
  79. Voigt, P.; von dem Bussche, A. The EU General Data Protection Regulation (GDPR): A Practical Guide; Springer: Berlin/Heidelberg, Germany, 2017.
  80. European Commission. Proposal for a Regulation Laying Down Harmonized Rules on Artificial Intelligence (Artificial Intelligence Act). Available online: https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence (accessed on 24 November 2020).
  81. Ciampi, M.; Sicuranza, M.; Silvestri, S. A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data. Information 2022, 13, 87.
Figure 1. Resilience.
Figure 2. Classification of Motion Capture systems.
Figure 3. Example of silhouette extraction.
Figure 4. Flowchart of the biomechanical data processing and therapy adaptation system.
Figure 5. Publisher/subscriber communication flow.
Figure 6. ROS2 architecture.
Figure 7. ROS2 node structure.
Figure 8. Stylized human skeleton with joint points labeled as S (starting), I (intermediate), and T (terminal) on each limb.
Figure 9. Trajectory to be followed by the patient, represented as an arc.
Figure 10. Performance calculation Pα.
Figure 11. System workflow.
Figure 12. Kinematic representation of the MoveNet architecture using 17 keypoints.
Figure 13. Movement detection. (a) Side view of the frontal arm lift to evaluate lifting difficulty and posture. (b) Frontal view of lateral arm elevation to analyze symmetry.
Figure 14. Feedback.
Figure 15. Therapeutic suggestions.
Table 1. Motion Capture systems.
Optical Systems: Marker-based; Marker-less.
Non-Optical Systems: Electro-mechanical; Electro-magnetic; Inertial.
Table 2. Analysis components and descriptions of the rehabilitation system.
Symmetry of Movement: identifies asymmetry, such as the right shoulder being lower than the left.
Range of Motion: assesses the ability of the right arm to be raised compared to the left, indicating limitations.
Speed and Acceleration: detects compensatory movements, like leaning on the back, and stiffness, which may indicate difficulty in movement.
