1. Introduction
Telehealth has emerged as a promising solution to the barriers to accessing rehabilitation services for brachial plexus injuries (BPIs) in lower-middle-income countries. By leveraging technology, telehealth can bridge the gap between rural patients and quality healthcare services. The need for telehealth is further underscored by current global trends: according to the World Health Organization, disability-adjusted life years (DALYs) attributable to upper limb injuries, including BPIs, have increased by an estimated 3% over the last decade. In countries like Pakistan, where 64% of the population resides in rural areas [
1], implementing telehealth services can significantly reduce the burden of disability and improve the overall quality of life for individuals with BPIs.
Telehealth is a fast-developing area of healthcare in which technologies are used to remotely provide clinical information and health-related services. The field has many applications, one of which is rehabilitation, where robotics plays a major role. Through interactive and customized rehabilitation experiences, robotic technologies in telehealth go beyond conventional telemedicine: remote-controlled robotic systems allow for more accurate and consistent therapy sessions than traditional in-person therapy offers. This integration of robotics with telehealth is transforming patient care, particularly for conditions like BPIs, where consistent, targeted exercises are essential for recovery.
A network of nerves called the brachial plexus emerges from the spinal cord, passes through the armpit and neck, and then splits off to become the nerves responsible for upper limb sensation and muscle control. It includes the muscles and skin of the chest, shoulder, arm, and hand as well as the roots, trunks, divisions, cords, and branches that supply that innervation. Many impairments can arise from BPIs, which are typically caused by trauma, tumors, or inflammation. These can include, but are not limited to, paralysis in extreme cases, loss of feeling, and muscle weakness, as shown in
Figure 1. Such injuries can have a significant negative effect on a patient’s functionality and quality of life, requiring specialized care and rehabilitation techniques.
For instance, motor vehicle accidents, especially motorcycle crashes, account for a substantial percentage of traumatic BPIs, as shown in
Figure 2, which provides an essential visual summary of the varied mechanisms leading to BPIs, with each panel (A–F) depicting a distinct accident scenario that highlights the complexity and diversity of BPI causes, aiding in quick and clear comprehension. These injuries can range from minor, which might involve stretching the nerves, to severe cases, such as avulsion, where the nerve roots are torn from the spinal cord. The severity and location of the damage significantly influence the functional outcome. According to a study conducted in [
2,
3], over 60% of BPI patients suffer from impairments in activities of daily living, with an estimated 27% facing significant chronic pain.
Rehabilitation plays a critical role in the management of BPIs. Early intervention and a well-structured rehabilitation program can significantly improve affected individuals’ functional outcomes and quality of life [
4]. The rehabilitation process usually involves physical therapy, occupational therapy, and sometimes surgical interventions. Physical therapy focuses on maintaining the range of motion, reducing pain, and strengthening the muscles around the shoulder and arm. Occupational therapy is vital for enabling the patient to regain the ability to perform daily activities [
5].
While rehabilitation is crucial, access to quality healthcare and rehabilitation services remains a significant challenge, especially in lower-middle-income countries. Rural areas in these countries often lack the necessary infrastructure and qualified healthcare professionals to provide specialized care for patients with BPIs. Additionally, travel to urban areas with better healthcare facilities is often not feasible due to financial constraints and the debilitating nature of the injury. Consequently, many patients with BPIs in these areas do not receive the much-needed rehabilitation services, leading to poor functional outcomes and a reduced quality of life.
Our study distinguishes itself by integrating deep reinforcement learning (DRL), specifically the deep deterministic policy gradient (DDPG) algorithm, with telepresence robots for the in-home elbow rehabilitation of patients with brachial plexus injuries (BPIs). This integration not only marks a significant advancement over conventional rehabilitation methods but also over existing automated or semiautomated systems. Unlike prior studies, our approach leverages the robustness and adaptability of DRL to tailor rehabilitation exercises to individual patient needs, thereby enhancing both the force exertion and range of motion outcomes. Furthermore, the unique application of low-cost, off-the-shelf components in our telepresence robots positions this study at the forefront of accessible and efficient in-home rehabilitation solutions. By demonstrating the practicality and effectiveness of our DRL-assisted system, this research paves the way for future innovations in telerehabilitation, particularly in addressing the challenges of physical rehabilitation with advanced, yet cost-effective, technology solutions.
Rehabilitation techniques that involve elbow flexion exercises are essential for treating BPIs. By boosting neuronal connections and strengthening the biceps and brachialis muscles, these workouts can help people regain movement in their upper limbs [
6]. Regular practice, supported by the robotic arm of telepresence robots and the DDPG algorithm, can produce noticeable improvements, speeding up recovery and elevating the patient’s quality of life.
This research paper is structured into five main sections after the introduction: Literature Review (contextualizing existing studies), Methodology (detailing the development and deployment of DRL-supported telepresence robots), Results (presenting data and findings), Discussion (interpreting results and exploring implications), and Conclusion (summarizing findings and suggesting future research directions).
2. Literature Review
Telehealth rehabilitation had its roots in the 1960s, when some hospitals and university medical centers started experimenting with telemedicine to reach patients in remote areas. However, it was not until the advent of the internet and advancements in telecommunications technology in the 1990s that telehealth began to take form. Telehealth has been gaining momentum over the past two decades, and its application in rehabilitation is diverse. One notable example is the development of the InMotion ARM, a telehealth robotic system that allows for remote physical therapy for stroke patients [
7].
Furthermore, robotic assistance in telerehabilitation has proven to be a groundbreaking advancement; [
8] presented an extensive study on how robots have been instrumental in rehabilitating patients with neuromuscular disorders. This work especially delves into upper and lower limb exercises, stressing the role of robots in aiding patients in performing high-intensity repetitive tasks, which is crucial for neuroplasticity. The authors showed how robots can objectively measure patients’ movements and improvements.
Jin et al. [
9] have demonstrated the efficacy of DRL in improving post-stroke limb function, whereas Majhi and Kashyap [
10] have explored adaptive algorithms for patient-specific therapy adjustments. Furthermore, the work by Wang et al. [
11] has been instrumental in showcasing how DRL can optimize engagement levels during robotic-assisted therapy. These studies underscore the potential of DRL to enhance the adaptability and personalization of rehabilitation protocols according to motor rehabilitation.
Another instance is Teleswallowing Rehabilitation, which assists in the remote assessment and management of dysphagia in elderly patients [
12,
13]. These examples scratch the surface of what has been accomplished in telehealth rehabilitation, where systems have been developed for various types of physical impairments, speech therapy, and more. Robotics has been a key component in advancing rehabilitation methods. For example, the Lokomat, developed by Hocoma, is a robotic gait therapy device widely used in rehabilitating individuals with spinal cord injuries and stroke [
14]. Another example is the ArmeoPower, a robotic exoskeleton for arm and hand rehabilitation, also for patients with stroke or spinal cord injuries [
15,
16,
17].
Telehealth rehabilitation has also been influential in managing chronic pain, as discussed by [
18,
19,
20]. Their work focuses on internet-delivered treatment for chronic pain management. It also delves into how integrating psychological approaches into telehealth platforms has helped in better pain management. Moreover, telerehabilitation is increasingly seen as a viable method for managing cardiovascular diseases. In a study by [
21,
22,
23,
24,
25], a home-based telerehabilitation program was studied for patients with heart failure. The study elucidated how telerehabilitation could effectively enhance the exercise capacity and quality of life of patients with chronic heart diseases. An interesting approach is also observed in a research work by [
26,
27], where the authors developed a tele-treatment program for patients with chronic obstructive pulmonary disease (COPD). They designed a service platform that includes exercise, education, and counselling, supported by a triage function. Godine et al. in [
28] addressed the novel approaches in telehealth for behavioral management in individuals with neurological conditions. They discussed various telerehabilitation interventions, such as cognitive behavioral therapy, motivational interviewing, and mindfulness-based stress reduction, that can be leveraged to manage symptoms and enhance the quality of life. Moreover, integration with electronic health records (EHRs) has emerged as an essential feature in telehealth rehabilitation. According to [
29], EHR integration enables efficient information sharing, leading to better coordination in care processes.
Interactive online platforms have enabled the remote delivery of physical therapy. Johnson et al. in [
30] explored an online platform, PhysiTrack, which allowed physical therapists to design personalized exercise programs that patients could access from their homes. The study demonstrated that patients using PhysiTrack showed better exercise adherence and reported higher satisfaction levels than with traditional physical therapy. Robotics is another domain that has greatly advanced telehealth rehabilitation. In a study [
31] by Radder et al., telerehabilitation robotics were shown to be effective in providing intensive task-specific training, especially for stroke patients. The study highlighted that robotic devices could deliver repetitive training tasks, which are often required for neuromuscular rehabilitation. A study [
32] by Patel et al. showcased the use of kinematic sensors and smart textiles to remotely monitor patients’ movements during physical therapy. These real-time data were critical for providing feedback to both the patient and the therapist, allowing for more targeted and effective therapy. Virtual reality (VR) has been another significant advancement. Laver et al.’s study [
33] showed that VR could be effectively employed in telerehabilitation settings, particularly for stroke rehabilitation. Patients using VR systems showed improved physical activity compared to those who underwent conventional therapy.
DRL has been employed in various healthcare applications. For example, Peng et al. [
34] utilized DRL for dose optimization in radiation therapy. Another application of DRL is in optimizing treatment plans for patients with chronic conditions such as diabetes, where Prasad et al. [
35] used DRL to create personalized insulin plans. One recent development is using artificial intelligence (AI) in telerehabilitation. Wade et al. [
36] highlighted AI’s role in enhancing telerehabilitation outcomes. By incorporating AI algorithms, it is possible to analyze patients’ data to create personalized rehabilitation plans that can dynamically change as per their progress.
Table 1 summarizes relevant published research, along with the technologies used and their respective advantages and disadvantages.
The primary objective of this research is to develop a telepresence robot integrated with a robotic arm that is enabled with DRL for the in-home elbow rehabilitation of patients with BPIs, along with evaluating the effectiveness of DRL-supported telepresence robots in improving the range of motion and strength of the affected elbow in a rural setting and then assessing the cost-effectiveness of this approach compared to conventional rehabilitation techniques.
This research contributes to the growing knowledge in telehealth and robotics for rehabilitation in several ways. Firstly, it employs DRL in telepresence robots, a novel application in physical rehabilitation. Specifically, using the DDPG algorithm enables the robots to learn and adapt to patient-specific needs, thus providing a more personalized rehabilitation experience through the robotic arm.
3. Theoretical Background
3.1. Overview of Deep Reinforcement Learning (DRL)
DRL is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. It essentially trains an agent to make a series of decisions to maximize a cumulative reward through interactions with an environment.
Let us begin by understanding reinforcement learning. In RL, an agent takes action in an environment to achieve a certain goal. Formally, this is modelled as a Markov decision process (MDP) [
An MDP is defined by a tuple $(S, A, P, R, \gamma)$, where $S$ is the set of states, $A$ is the set of actions, $P$ is the state-transition probability function, $R$ is the reward function, and $\gamma$ is the discount factor.
At each time step $t$, the agent observes a state, $s_t$, takes an action, $a_t$, receives a reward, $r_t$, and transitions to a new state, $s_{t+1}$. The agent’s goal is to learn a policy, $\pi$, that maximizes the expected cumulative reward.
The expected cumulative reward, also known as the return, $G_t$, is the sum of the discounted rewards obtained after taking action $a_t$ in state $s_t$, and it can be formally defined in Equation (1), as follows:

$$G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1} \quad (1)$$

where $\gamma$ is the discount factor between 0 and 1.
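As a minimal numeric illustration of Equation (1), the discounted return of a finite reward sequence can be computed by folding in the rewards from the end of the episode backwards (the reward values here are hypothetical):

```python
def discounted_return(rewards, gamma):
    """Return G = sum_k gamma^k * r_k for a finite reward sequence."""
    g = 0.0
    # Iterate backwards so each step folds in the discounted future return.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three-step episode with hypothetical rewards and gamma = 0.9.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1 + 0.9*0 + 0.81*2 = 2.62
```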
In DRL, deep learning techniques are used to approximate the functions in reinforcement learning. Specifically, deep neural networks are used to approximate either the policy $\pi(a \mid s)$ (called policy networks) or the value functions $V(s)$ or $Q(s, a)$ (called value networks).
One of the most common algorithms in DRL is Deep Q-Networks (DQN) [
38], which are based on Q-learning. Q-learning learns the Q-function, $Q^{\pi}(s, a)$, which is the expected return when taking action $a$ in state $s$ and following policy $\pi$ thereafter. The Q-function is defined in Equation (2), as follows:

$$Q^{\pi}(s, a) = \mathbb{E}\left[ G_t \mid s_t = s,\ a_t = a \right] \quad (2)$$
In DQN, a deep neural network with parameters $\theta$ is used to approximate the Q-function. The network is trained by minimizing the difference between the predicted Q-value and the target Q-value, $y$, which is calculated using the Bellman equation as defined here in Equation (3):

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \quad (3)$$

where $\theta^{-}$ are the parameters of a periodically updated target network.
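The Bellman target in Equation (3) can be sketched in a few lines, with the next-state Q-values standing in for the output of a trained network (all values here are hypothetical):

```python
import numpy as np

def dqn_target(reward, next_q_values, gamma, done=False):
    """Bellman target y = r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * np.max(next_q_values)

# Hypothetical Q-values for three actions in the next state.
y = dqn_target(reward=1.0, next_q_values=np.array([0.2, 0.8, 0.5]), gamma=0.99)
print(round(y, 4))  # 1.0 + 0.99 * 0.8 = 1.792
```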
Another important class of algorithms in DRL is policy gradients. Instead of learning a value function, policy gradient methods directly learn a policy. The objective is to find the policy parameters that maximize the expected return, as stated here in Equation (4):

$$J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[ G_t \right] \quad (4)$$

where $\theta$ are the parameters of the policy, and $\pi_{\theta}(a \mid s)$ is the probability of taking action $a$ in state $s$ under policy $\pi_{\theta}$.
Actor–critic methods combine value-based and policy-based methods. The actor is the policy model, and the critic evaluates the action taken by the actor. The actor uses policy gradients, and the critic is updated using methods like Q-learning. Trust region policy optimization (TRPO) and proximal policy optimization (PPO) are advanced policy gradient methods: TRPO constrains each update so the policy does not change too much, ensuring stable learning, and PPO is a simplified, more computationally efficient variant of TRPO.
3.2. Deep Deterministic Policy Gradient (DDPG) Algorithm
DDPG is an algorithm that falls under the category of actor–critic methods in DRL [
39]. DDPG is designed to handle environments with continuous action spaces, making it suitable for various real-world applications such as robotics and autonomous systems.
Traditional policy gradient methods work well with discrete action spaces but struggle with continuous action spaces due to the need to compute probabilities over an infinite number of actions. DDPG overcomes this by adapting ideas from the DQN algorithm to continuous action spaces: instead of outputting Q-values for each possible action, the actor network outputs the optimal action directly. DDPG is essentially an off-policy algorithm and a deep approximation of the deterministic policy gradient (DPG), hence the name deep deterministic policy gradient.
DDPG has two primary components, an actor and a critic.
Actor: The actor is a neural network that takes the current state as input and outputs a continuous action or set of actions. The actor’s role is to learn the optimal policy function.
Critic: The critic evaluates the action output by the actor by computing the Q-value. The critic’s role is to learn the optimal value function.
Both the actor and the critic have their own neural networks. Moreover, DDPG employs target networks for both the actor and the critic, which are copies of their respective networks. The target networks are used to calculate target values during learning and are updated slowly to maintain stability.
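The slow target-network update described above can be sketched as a Polyak ("soft") update, in which the target parameters move a small step toward the online parameters at every training step (the parameter vectors and the value of tau here are hypothetical):

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Move each target-network weight a small step toward the online weight:
    theta_target <- (1 - tau) * theta_target + tau * theta_online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# One parameter vector, initialized at zero, chasing an online network at one.
target = [np.zeros(3)]
online = [np.ones(3)]
target = soft_update(target, online, tau=0.1)
print(target[0])  # [0.1 0.1 0.1]
```

With a small tau, the target network lags the online network, which is what keeps the bootstrapped targets in Equations (5) and (3) stable.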
The critic network is updated using the Bellman equation, as in Q-learning. Given a transition tuple $(s, a, r, s')$, where $s$ is the current state, $a$ is the action taken, $r$ is the reward, and $s'$ is the next state, the target Q-value, $y$, is computed as demonstrated here in Equation (5):

$$y = r + \gamma\, Q'\!\left(s', \mu'(s'; \theta^{\mu'}); \theta^{Q'}\right) \quad (5)$$

where $\gamma$ is the discount factor, $Q'$ is the critic’s target network, and $\mu'$ is the actor’s target network. The temporal-difference (TD) error is the difference between the target Q-value and the estimated Q-value, and the critic is trained by minimizing the corresponding loss, as stated here in Equation (6):

$$L = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i; \theta^{Q}) \right)^2 \quad (6)$$
The actor’s objective is to maximize the expected Q-value. Gradient ascent is performed using the deterministic policy gradient theorem. The gradient of the objective function, $J$, with respect to the actor parameters, $\theta^{\mu}$, is described in Equation (7), as follows:

$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}\left[ \nabla_{a} Q(s, a; \theta^{Q})\big|_{a = \mu(s)}\, \nabla_{\theta^{\mu}} \mu(s; \theta^{\mu}) \right] \quad (7)$$
That is, the actor is updated using the gradient of the Q-value with respect to the action, multiplied by the gradient of the action with respect to the actor’s parameters.
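The chain rule in Equation (7) can be made concrete with a toy one-dimensional example: a linear actor $a = \theta s$ and a known critic $Q(s, a) = -(a - 2)^2$, whose maximum is at $a = 2$. This is only an illustration of the deterministic policy gradient, not the full DDPG algorithm, and the functions and constants are hypothetical:

```python
def actor(theta, s):
    """Deterministic policy: a = theta * s."""
    return theta * s

def dq_da(a):
    """Gradient of the toy critic Q(s, a) = -(a - 2)^2 with respect to the action."""
    return -2.0 * (a - 2.0)

def policy_gradient(theta, s):
    """Chain rule from Equation (7): dQ/da * da/dtheta, with da/dtheta = s."""
    a = actor(theta, s)
    return dq_da(a) * s

theta, s, lr = 0.0, 1.0, 0.1
for _ in range(200):
    theta += lr * policy_gradient(theta, s)  # gradient ascent on Q
print(round(actor(theta, s), 3))  # converges toward the Q-maximizing action, 2.0
```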
4. Telepresence Robots
Telepresence robots represent a remarkable intersection of robotics, communication technology, and human–computer interaction, as shown in
Figure 3. They facilitate a sense of presence or being there for people geographically distant from each other. Telepresence robots are often equipped with a display, camera, speakers, and microphones, which enable video conferencing and motors for mobility.
The design parameters of our telepresence robot are derived from the wheeled motion of a vehicle, such as turning angle, velocity, and angular momentum. The telepresence robot’s parameters were chosen after an analysis of the robot’s kinematic motion, as shown in
Table 2.
The concept of telepresence robots revolves around extending a person’s ability to participate in distant environments virtually [
40]. These robots are employed in various sectors, including healthcare [
41], education [
42], business [
43], and social interactions [
44]. In healthcare, for instance, they enable doctors to interact with patients in remote locations. The major components of a telepresence robot are the following:
Mobility and Navigation: Most telepresence robots have wheels and can move around. They use various sensors, such as LIDAR or ultrasonic sensors, for navigation. The control of robot mobility can be through a remote user or automated using algorithms.
Communication: This is central to the concept of telepresence. Robots usually have a camera, microphone, and speakers that facilitate video conferencing. The transmission of audio–visual data should be in real time or with minimal latency.
Robotic Arm: The telepresence robot is equipped with a robotic arm that assists the BPI patient in elbow flexion.
User Interface: Telepresence robots usually have an interface allowing remote users to control them. This could be through a web application, desktop software, or even a mobile app.
Autonomy and Battery Life: Since these robots are mobile, they need to be battery-powered. Battery life and the ability to autonomously return to a charging station when the battery is low are important considerations.
Continuous monitoring is an essential component of rehabilitation. Telepresence robots equipped with robotic arms can remotely assist with, and monitor, the physical activity of the affected arm along with the patient’s vital signs. These robots can be programmed to conduct regular check-ins with patients, ensure that they adhere to their rehabilitation program, and relay this information to healthcare professionals.
Figure 4 illustrates the core general components of our telepresence robot’s hardware architecture. Central to the system is the computer system, encompassing a robust microprocessor and microcontroller, which orchestrates the device’s operations. Peripheral modules include input/output devices, such as cameras and microphones for sensory data acquisition, and an LCD for display. Actuation mechanisms are driven by motor drivers and motors, powered by an integrated battery system. The charging dock ensures continuous operation, whereas the indicator light provides real-time status feedback. This schematic is pivotal for replicating the robot’s hardware setup in further studies.
Our telepresence robot was constructed using lightweight fiber materials, chosen for their balance of cost-effectiveness and durability. These materials are capable of supporting human hand weight of up to 30 kg, making them ideal for our rehabilitation application. The robot’s sensing capabilities are a cornerstone of its functionality. We employed force-sensing resistors (FSRs) interfaced with an Arduino to accurately compute the force exerted by a patient’s hand. This setup is crucial for monitoring the rehabilitation progress and adjusting the exercises accordingly. Additionally, for joint movement detection and angle calculation, our system utilized Adafruit flex sensors. These sensors provide precise feedback on the angles and movements of the robotic arm’s joints, enabling detailed tracking and adaptation of the rehabilitation process.
To maintain the focus on accessibility and affordability, especially in lower-middle-income countries, we opted for off-the-shelf electronic components. This decision not only demonstrates the feasibility of our approach but also ensures that our system can be replicated and utilized in various settings with minimal cost barriers.
Regarding the conventional rehabilitation methods used for comparison in our study, these sessions employed a standardized approach to measure force exertion, using comparable equipment and methodologies to those of the robotic system. This comparative analysis is vital to demonstrate the effectiveness of our telepresence robot system against traditional methods and thereby highlights the potential impact of our research in the field of rehabilitation technology.
4.1. Operation of the Telepresence Robot for Elbow Flexion Exercises
Telepresence robots equipped with robotic arms integrated with sensors are a cutting-edge technology for rehabilitation exercises, especially for elbow flexion in patients with upper limb impairments such as BPIs. A flowchart of the principles governing the operation of a telepresence robot arm for assisting in elbow flexion exercises is shown in
Figure 5.
4.1.1. Sensing Phase
Firstly, let us focus on the sensing aspect of the robotic arm. Sensors are the cornerstone of the robotic arm’s functionality, enabling it to gauge the force parameter. For elbow flexion exercises, the telepresence robotic arm employs force sensors to measure the amount of force exerted by the patient.
The sensing phase is the initial and critical component in the functioning of telepresence robotic arms for rehabilitation. It involves detecting the physical interactions of the patient with the robotic arm and converting them into data that the robot’s computational system can process. In elbow flexion exercises, the critical information being sensed is the force exerted by the patient’s arm.
Types of Sensors
1. Force Sensors: Force sensors, such as force-sensing resistors (FSRs), load cells, and piezoelectric sensors, measure the amount of force exerted on the robotic arm. A load cell typically uses a strain gauge that changes its electrical resistance when deformed by force. A piezoelectric sensor, by contrast, generates an electric charge in response to applied mechanical stress; their specifications are discussed in
Table 3.
2. Position and Angle Sensors: Since the robot needs to know the arm’s position and the elbow joint’s angle, position sensors and potentiometer-based angle sensors are used. These sensors give information about the spatial configuration of the patient’s arm, which is vital for adjusting the assistance provided; their specifications are discussed in
Table 4.
Mathematical Equations and Relations
1. Force Sensors: For strain gauge-based force sensors, the fractional change in resistance, $\Delta R / R$, is proportional to the strain, $\varepsilon$, which is in turn proportional to the applied force, $F$. This relationship can be expressed in Equation (8), as follows:

$$\frac{\Delta R}{R} = GF \cdot \varepsilon = GF \cdot \frac{F}{A E} \quad (8)$$

where
$GF$ = gauge factor (dimensionless constant),
$\varepsilon$ = strain,
$F$ = force applied,
$A$ = cross-sectional area through which the force is applied, and
$E$ = Young’s modulus of the material.
2. Position and Angle Sensors: Resistance varies linearly with the rotation angle for potentiometer-based angle sensors. If $R_0$ is the resistance at 0 degrees and $R_{max}$ is the resistance at the maximum rotation angle, the relationship can be expressed in Equation (9), as follows:

$$R(\theta) = R_0 + \left(R_{max} - R_0\right) \frac{\theta}{\theta_{max}} \quad (9)$$

where
$R(\theta)$ = resistance at angle $\theta$,
$\theta$ = current angle of rotation, and
$\theta_{max}$ = maximum angle of rotation.
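The two sensor models above, Equations (8) and (9), can be sketched as simple conversion helpers. This is a minimal illustration; the gauge factor, material constants, and resistance range used below are hypothetical:

```python
def strain_gauge_dR_over_R(force_n, area_m2, youngs_modulus_pa, gauge_factor=2.0):
    """Equation (8): fractional resistance change dR/R = GF * strain,
    with strain = F / (A * E)."""
    strain = force_n / (area_m2 * youngs_modulus_pa)
    return gauge_factor * strain

def pot_resistance(theta_deg, r0_ohm, rmax_ohm, theta_max_deg=270.0):
    """Equation (9): R(theta) = R0 + (Rmax - R0) * theta / theta_max."""
    return r0_ohm + (rmax_ohm - r0_ohm) * theta_deg / theta_max_deg

def pot_angle(r_ohm, r0_ohm, rmax_ohm, theta_max_deg=270.0):
    """Invert Equation (9) to recover the elbow angle from a resistance reading."""
    return (r_ohm - r0_ohm) / (rmax_ohm - r0_ohm) * theta_max_deg

# Hypothetical: 50 N on a 1e-5 m^2 element with E = 200 GPa (steel-like).
print(strain_gauge_dR_over_R(50.0, 1e-5, 200e9))  # 5e-05

# Round-trip a 90-degree elbow angle through the potentiometer model.
r = pot_resistance(90.0, r0_ohm=100.0, rmax_ohm=10_100.0)
print(round(pot_angle(r, 100.0, 10_100.0), 1))  # 90.0
```

In practice such conversions would run on the Arduino or host computer, turning raw sensor readings into the force and angle components of the DDPG state.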
4.1.2. Deep Deterministic Policy Gradient (DDPG) Phase
As previously discussed, DDPG is an algorithm that can handle continuous action spaces and is, hence, suitable for the complex movements involved in physical rehabilitation. The DDPG algorithm in the telepresence robot consists of two main neural network components: the actor and the critic.
The state formed in the previous phase is fed into the actor network, which then suggests an action—in this case, the appropriate amount of assistive force to apply. On the other hand, the critic network evaluates the predicted Q-value of taking that action in the given state.
These networks are trained to maximize the expected cumulative reward, where the reward could be based on how effectively the robotic arm assisted the patient in achieving elbow flexion without straining the muscles.
To calculate the force that the telepresence robot needs to apply to the patient’s arm to assist with the elbow flexion exercise, we need to consider factors like the force exerted by the patient, the desired trajectory for the movement, and the dynamics of the patient’s arm. The telepresence robot can use the DDPG algorithm to determine the optimal force to apply.
Let us denote the following:
$F_p$: Force exerted by the patient (measured using sensors, as described previously);
$m$: Mass of the patient’s forearm and arm;
$a_d$: Desired acceleration of the patient’s arm during the exercise;
$g$: Gravitational acceleration (9.81 m/s²);
$\theta$: Angle between the forearm and the vertical;
$\mu$: Coefficient of friction between the patient’s arm and the robot’s arm;
$F_r$: Force exerted by the telepresence robot on the patient’s arm.
Desired Acceleration ($a_d$): The desired acceleration can be determined from the trajectory planned for the elbow flexion movement. The DDPG algorithm considers various factors, including the current state of the patient’s arm, the desired state, and other constraints, to compute the desired acceleration.

Frictional Force ($F_f$): The friction between the robot’s arm and the patient’s arm needs to be considered, as mentioned here in Equation (10):

$$F_f = \mu\, m g \cos\theta \quad (10)$$

Force Required for Desired Acceleration ($F_a$): From Newton’s second law, the force required to achieve the desired acceleration is given by Equation (11), as follows:

$$F_a = m\, a_d \quad (11)$$

Force to Counteract Gravity ($F_g$): The component of the gravitational force in the direction of the movement is described in Equation (12), as follows:

$$F_g = m g \sin\theta \quad (12)$$

Total Force by Telepresence Robot ($F_r$): The total force that the robot needs to apply is the sum of the force required for the desired acceleration, the force to counteract gravity, and the frictional force, minus the force exerted by the patient ($F_p$), as mentioned here in Equation (13):

$$F_r = F_a + F_g + F_f - F_p \quad (13)$$
This calculated force is what the telepresence robot needs to exert on the patient’s arm to assist in the elbow flexion exercise. The DDPG algorithm computes and adjusts the desired acceleration in real time based on sensory feedback, ensuring smooth and effective movement.
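A small numeric helper for this force balance, assuming the form $F_r = m a_d + m g \sin\theta + \mu m g \cos\theta - F_p$; the masses, angles, and coefficients in the example are hypothetical:

```python
import math

def robot_assist_force(f_patient, mass, a_des, theta_rad, mu, g=9.81):
    """Total assistive force: acceleration term (m * a_des) plus the gravity
    component (m * g * sin(theta)) plus friction (mu * m * g * cos(theta)),
    minus the patient's own contribution (f_patient)."""
    f_accel = mass * a_des
    f_gravity = mass * g * math.sin(theta_rad)
    f_friction = mu * mass * g * math.cos(theta_rad)
    return f_accel + f_gravity + f_friction - f_patient

# Hypothetical values: 1.5 kg forearm, 0.5 m/s^2 target acceleration,
# forearm 30 degrees from vertical, mu = 0.2, patient contributes 5 N.
print(round(robot_assist_force(5.0, 1.5, 0.5, math.radians(30.0), 0.2), 3))
```

The DDPG agent would recompute this value at every timestep as the sensed patient force and arm angle change.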
4.1.3. Action Execution Phase
Once the DDPG algorithm decides on the assistive force, the robotic arm applies it to facilitate the patient’s upward movement. This is done through actuators in the robotic arm, which can exert force. The actuators could be based on electrical motors or hydraulic systems.
4.1.4. Feedback and Learning Phase
After the action is executed, the new state is observed along with the reward. This information is fed back into the DDPG algorithm to update the actor and critic networks. This learning phase is vital for adapting the robotic arm’s assistance over time to match the patient’s progress.
In the scenario considered here, a telepresence robot equipped with a robotic arm assists a patient’s arm in performing an elbow flexion exercise. The robotic arm has sensors to measure the force exerted by the patient, and it uses the DDPG algorithm to estimate the force the robot must apply to assist the movement. The DDPG procedure for the telepresence robot is given in Algorithm 1.
Algorithm 1: DDPG for telepresence robot-assisted elbow flexion

Initialize:
    Actor network $\mu(s; \theta^{\mu})$ with random weights $\theta^{\mu}$
    Critic network $Q(s, a; \theta^{Q})$ with random weights $\theta^{Q}$
    Target actor network with weights $\theta^{\mu'} \leftarrow \theta^{\mu}$
    Target critic network with weights $\theta^{Q'} \leftarrow \theta^{Q}$
    Replay buffer $R$, soft update factor $\tau$, noise process $N$, discount factor $\gamma$
for episode = 1 to M do
    Initialize the state $s_1$ (sensor readings from the robot’s arm)
    Reset the noise process $N$
    for t = 1 to T do
        Choose action from the actor network with added noise: $a_t = \mu(s_t; \theta^{\mu}) + N_t$
        Execute action $a_t$ and observe reward $r_t$ and new state $s_{t+1}$
        Store $(s_t, a_t, r_t, s_{t+1})$ in replay buffer $R$
        Sample a random minibatch of $N_b$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$
        Calculate target Q-values using the target networks:
            $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1}; \theta^{\mu'}); \theta^{Q'})$
        Update the critic network by minimizing the loss:
            $L = \frac{1}{N_b} \sum_i \left( y_i - Q(s_i, a_i; \theta^{Q}) \right)^2$
        Update the actor policy using the sampled policy gradient:
            $\nabla_{\theta^{\mu}} J \approx \frac{1}{N_b} \sum_i \nabla_a Q(s_i, a; \theta^{Q}) \big|_{a = \mu(s_i)}\, \nabla_{\theta^{\mu}} \mu(s_i; \theta^{\mu})$
        Soft update the target networks:
            $\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}$
            $\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$
    end for
end for
The pseudocode above starts by initializing the actor and critic networks, their target networks, and the replay buffer. A loop is then run over episodes (iterations of learning), each representing an elbow flexion exercise session. At each timestep within an episode, the robot chooses an action based on the current policy of the actor network, with noise added for exploration; the action is the force applied by the robotic arm. The robot executes the action and observes the reward and the new state. The reward can be designed based on the success of the movement, its smoothness, and patient feedback. The experience is stored in the replay buffer, a minibatch of experiences is sampled from it, and the target Q-value is computed using the target networks. The critic network is then updated by minimizing the difference between the target and estimated Q-values, and the actor network is updated by policy gradient ascent to maximize the expected reward. Finally, the target networks are softly updated toward the online networks. The process is repeated over many episodes until learning converges and the robot can efficiently assist in the elbow flexion exercises.
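The replay-and-target machinery in Algorithm 1 can be sketched as follows. The linear lambdas standing in for the target networks, and all hyperparameters, are hypothetical simplifications of the neural networks used in the actual system:

```python
import random
import numpy as np

class ReplayBuffer:
    """Fixed-capacity experience store that samples uniform minibatches."""
    def __init__(self, capacity=10_000):
        self.buf, self.capacity = [], capacity

    def push(self, s, a, r, s_next):
        if len(self.buf) >= self.capacity:
            self.buf.pop(0)  # drop the oldest experience
        self.buf.append((s, a, r, s_next))

    def sample(self, n):
        batch = random.sample(self.buf, n)
        s, a, r, s2 = map(np.array, zip(*batch))
        return s, a, r, s2

def ddpg_targets(r, s_next, target_actor, target_critic, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) from Algorithm 1."""
    a_next = target_actor(s_next)
    return r + gamma * target_critic(s_next, a_next)

# Toy linear stand-ins for the target actor and target critic.
target_actor = lambda s: 0.5 * s
target_critic = lambda s, a: -(a - 1.0) ** 2

buf = ReplayBuffer()
random.seed(0)
for _ in range(32):
    s = random.uniform(-1.0, 1.0)
    buf.push(s, 0.5 * s, 1.0, 0.9 * s)  # hypothetical transitions

s, a, r, s2 = buf.sample(8)
y = ddpg_targets(r, s2, target_actor, target_critic)
print(y.shape)  # (8,)
```

In the full system, the critic loss and actor gradient from Algorithm 1 would then be computed on this minibatch and followed by the soft target-network update.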
6. Results and Discussion
The results section evaluates the effectiveness of the DRL-assisted telepresence robot in improving patients’ BPI condition through elbow flexion exercises. The primary parameters used to assess improvement are the patient’s force, the robotic arm’s assistance force, and the range of motion (ROM).
Comparing our telepresence robot’s effectiveness with a control group undergoing conventional rehabilitation revealed noteworthy outcomes. Our experimental group, mirroring the control group’s demographics and injury types, exhibited a 4.7% increase in force exertion and a 5.2% improvement in range of motion (ROM). These statistics validate the robot’s efficacy in enhancing rehabilitation outcomes compared to traditional methods. A detailed analysis is presented in the following subsections, underlining the scientific merit of these findings.
6.1. DDPG Algorithm Analysis
The DDPG agent demonstrated remarkable accuracy in following various input references, with a minimal error margin of only 0.1%. This was achieved without chattering or instability in the controller input, indicating a stable control process, as illustrated in Figure 8. This stability and accuracy indicate that the DDPG algorithm is well suited to practical control. In the figure, the red dotted line represents the desired input reference, the blue dotted line shows the actual input from the DDPG agent, and the solid green line indicates the error margin. The close alignment of the actual input with the reference, with minimal error, demonstrates the precision and stability of the DDPG algorithm in controlling the system.
While the reward history exhibited high variance, primarily due to the stochastic policy used for exploration in each episode, the average reward trendline showed an increase within just 50 episodes, as seen in
Figure 9. After this rise, the reward levels fluctuated between −4 and −2, indicating oscillation in the reward process.
6.2. Improvement in Force Exerted by Patient
One of the primary objectives of the elbow flexion exercises assisted by the telepresence robot was to observe the progression of the patient’s force. This force signifies the ability of the patients to engage their muscles during the exercise.
Based on the data collected over six months (Figure 10), we can analyze the trends in the force exerted by the patient. Let us denote the force exerted by the patient at time ‘t’ as F(t), where t is measured in months. The linear progression in patient strength suggests a consistent improvement attributable to the structured rehabilitation protocol. Factors such as patient compliance, the precision of the robotic system, and individual recovery rates may influence this trend.
We can model the force exerted as a linear function of time using simple linear regression, as shown here in Equation (14):
F(t) = mt + c,   (14)
where ‘m’ represents the rate of increase in force with respect to time (slope), and ‘c’ is the y-intercept, which represents the initial force exerted by the patient.
By applying the method of least squares to the collected data, we can estimate the values of ‘m’ and ‘c’; the estimated slope is m ≈ 1.43 N per month, and ‘c’ corresponds to the patient’s baseline force.
Therefore, our model for the force exerted by the patient with respect to time is approximately indicated as follows, in Equation (15):
F(t) ≈ 1.43t + c.   (15)
This model indicates that, on average, the force exerted by the patient increased by approximately 1.43 newtons per month.
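The slope and intercept of such a linear model can be recovered with an ordinary least squares fit. The sketch below uses synthetic monthly force readings generated from the reported trend of roughly 1.43 N per month plus an assumed 10 N baseline; the data are illustrative, not the study’s measurements.

```python
import numpy as np

# Synthetic monthly force measurements over six months (months 0..6).
# The 10 N baseline is an assumption for illustration only.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
force = 10.0 + 1.43 * t  # F(t) = m*t + c with m = 1.43, c = 10

# Ordinary least squares fit of a degree-1 polynomial: returns (slope, intercept).
m, c = np.polyfit(t, force, deg=1)

print(f"slope m = {m:.2f} N/month, intercept c = {c:.2f} N")
```

On real, noisy measurements the fitted slope would only approximate the underlying trend, which is why the statistical significance of the slope is evaluated next.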
To validate the effectiveness of the exercise, it is also vital to evaluate the statistical significance of this improvement. One way to do this is by computing the p-value for the regression slope (‘m’). A p-value less than 0.05 would suggest that the improvement is statistically significant.
To calculate the p-value, we used the data collected from our experiments. We conducted an experiment comparing the effect of conventional rehabilitation and telepresence robot-assisted rehabilitation on the force exerted by patients, using a two-sample t-test to compare the mean force exerted between the two groups.
The sample means, standard deviations, and sample sizes (30 patients per group) were recorded for the two groups.
Let us calculate the t-statistic and the corresponding p-value using these sample means, standard deviations, and sample sizes.
The formula for the t-statistic in a two-sample t-test is given here in Equation (16):
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂),   (16)
where
x̄₁ and x̄₂ are the sample means of the two groups;
s₁ and s₂ are the sample standard deviations of the two groups;
n₁ and n₂ are the sample sizes of the two groups.
After calculating the t-statistic, we use the degrees of freedom (which, for equal sample sizes and variances, equal n₁ + n₂ − 2) to find the p-value from the t-distribution.
Let us perform the calculation now.
Based on the sample data, the t-statistic calculated for the difference in force exertion between the conventional rehabilitation group and the telepresence robot-assisted group is approximately −3.873. With 58 degrees of freedom, the two-tailed p-value is approximately 0.000276. This p-value is well below the conventional alpha level of 0.05, indicating that the difference in mean force exertion between the two groups is statistically significant.
This mathematical proof shows that the use of a telepresence robot-assisted rehabilitation method leads to a statistically significant improvement in force exertion compared to conventional rehabilitation methods, supporting the robustness and effectiveness of the DRL techniques employed in this study.
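A two-sample t-test of this kind can be computed directly from summary statistics. The group means, standard deviations, and sizes below are illustrative placeholders, not the study’s recorded values; they only demonstrate the mechanics of Equation (16).

```python
from scipy import stats

# Illustrative summary statistics for two groups of 30 patients each.
# These numbers are assumptions for demonstration, not the study's data.
mean_conv, sd_conv, n_conv = 10.0, 2.0, 30     # conventional rehabilitation
mean_robot, sd_robot, n_robot = 12.0, 2.0, 30  # robot-assisted rehabilitation

# Two-sample t-test computed from summary statistics (pooled-variance form).
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=mean_conv, std1=sd_conv, nobs1=n_conv,
    mean2=mean_robot, std2=sd_robot, nobs2=n_robot,
)

print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.6f}")
```

With equal group sizes and variances, the degrees of freedom are n₁ + n₂ − 2 = 58, matching the test reported above.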
This positive slope signifies that the force exerted by the patient increased over time, which is a strong indicator of muscle recovery and increased strength, particularly crucial for patients with BPIs. Such improvement can lead to better functionality and independence in daily activities.
6.3. Decrease in Assistance Force by Robotic Arm
Another critical parameter to assess the effectiveness of the rehabilitation exercise is the assistance force provided by the robotic arm. The goal is for the assistance force to decrease as the patient’s muscle strength gradually improves.
Let us denote the assistance force provided by the robotic arm at time ‘t’ as A(t), where t is measured in months, as shown in Figure 11.
We can model the assistance force as a linear function of time, like the model we used for the force exerted by the patient, as indicated here in Equation (17):
A(t) = m_a t + c_a,   (17)
where ‘m_a’ represents the rate of decrease in assistance force with respect to time (slope), and ‘c_a’ is the y-intercept, which represents the initial assistance force provided by the robotic arm.
Using the least squares method on the collected data, we estimate ‘m_a’ and ‘c_a’; the fitted slope is negative.
Therefore, our model for the assistance force provided by the robotic arm with respect to time takes the fitted form described here in Equation (18):
A(t) ≈ m̂_a t + ĉ_a, with m̂_a < 0,   (18)
where m̂_a and ĉ_a denote the least squares estimates. The negative slope in this model indicates that the robotic arm reduced its assistance force over time, which is consistent with the patient regaining muscle strength and requiring less support.
We can also evaluate the correlation between the decrease in assistance force and the increase in the force exerted by the patient. The correlation coefficient (r) measures the strength and direction of a linear relationship between two variables, with −1 ≤ r ≤ 1.
The correlation coefficient in Equation (19) is as follows:
r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² ),   (19)
where xᵢ and yᵢ are paired observations of the two variables and x̄ and ȳ are their means. A strong negative correlation would suggest that, as the assistance force decreases, the force exerted by the patient increases.
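This is the standard Pearson correlation coefficient. The sketch below computes it for synthetic monthly series in which assistance force falls while patient force rises; the numbers are illustrative assumptions, not the study’s measurements.

```python
import numpy as np

# Synthetic monthly series (illustrative only): the robot's assistance force
# decreases over six months while the patient's exerted force increases.
assist_force = np.array([20.0, 18.5, 17.0, 15.0, 13.5, 12.0])
patient_force = np.array([10.0, 11.4, 12.9, 14.3, 15.7, 17.2])

# Pearson correlation coefficient r; np.corrcoef returns a 2x2 matrix,
# and the off-diagonal entry is the correlation between the two series.
r = np.corrcoef(assist_force, patient_force)[0, 1]

print(f"r = {r:.3f}")
```

For these near-linear opposing trends, r is close to −1, the strong negative correlation one would expect if decreasing assistance accompanies increasing patient strength.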
Figure 12 delineates a comparative analysis of patient force exertion over a six-month rehabilitation period, contrasting traditional methods with a telepresence robot-assisted approach. The visualization demonstrates a 4.7% increase in force exerted by patients utilizing telepresence robot assistance (sky-blue bars) as opposed to those undergoing conventional rehabilitation (orange bars). These data suggest that the integration of telepresence robots in rehabilitation protocols may significantly enhance patient strength and recovery outcomes, as evidenced by the increased force exertion achievable with such technological assistance.
6.4. Increase in Range of Motion (ROM)
The ROM is an important parameter for evaluating the functionality and flexibility of a patient’s elbow joint. It is measured in degrees and indicates the maximum angle through which the joint can move; a normal elbow ROM extends from 0 degrees of extension to 150 degrees of flexion.
Figure 13 depicts a visual comparison of the two rehabilitation methods over a six-month period. The blue area represents the general approximate range of motion, plotted across six distinct time points. The red dashed line indicates the progress typically seen with conventional rehabilitation techniques, providing a baseline for improvement. Most notably, the green dotted line illustrates the results of the DRL-based telepresence robot rehabilitation program, which demonstrates a consistent 5.2% improvement in range of motion compared to conventional methods. These data suggest that integrating advanced DRL-based telepresence technology into rehabilitation practice could lead to enhanced recovery outcomes for patients.
Let us denote the ROM at time ‘t’ as R(t), where t is measured in months.
To model the ROM, we can use a linear function of time, as indicated here in Equation (20):
R(t) = m_r t + c_r,   (20)
where ‘m_r’ represents the rate of increase in ROM with respect to time (slope), and ‘c_r’ is the y-intercept, representing the initial ROM.
Using the least squares method on the collected data, we can estimate ‘m_r’ and ‘c_r’; the fitted slope is positive.
Therefore, our model for the ROM with respect to time takes the fitted form stated here in Equation (21):
R(t) ≈ m̂_r t + ĉ_r, with m̂_r > 0,   (21)
where m̂_r and ĉ_r denote the least squares estimates.
We calculated the percentage increase in force exertion and range of motion (ROM) by first establishing baseline measurements for each patient at the outset of this study. Subsequent changes were then measured weekly, and the percentage change was determined by comparing the final measurements to these baselines. For instance, if a patient’s baseline force exertion was X and the final measurement was Y, the percentage increase was calculated using this formula (22):
Percentage increase = ((Y − X) / X) × 100.   (22)
A similar approach was used for the ROM. Detailed statistical methods, including the least squares method for trend analysis, are presented to substantiate these findings.
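As a worked instance of Equation (22), the snippet below uses a hypothetical baseline of X = 10.0 N and a final measurement of Y = 10.47 N (illustrative values chosen only to mirror the reported 4.7% gain, not actual patient data).

```python
def percentage_increase(baseline: float, final: float) -> float:
    """Equation (22): percentage change from a baseline to a final measurement."""
    return (final - baseline) / baseline * 100.0

# Hypothetical baseline and final force measurements (illustrative only).
X, Y = 10.0, 10.47
gain = percentage_increase(X, Y)
print(f"percentage increase = {gain:.1f}%")
```

The same function applies unchanged to ROM values measured in degrees.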
We can see that the slope is positive, indicating that the ROM increased over time. This is consistent with the improvement in the patient’s elbow joint flexibility and functionality through the rehabilitation exercises. To evaluate the strength of the relationship between time and the increase in ROM, we can calculate the correlation coefficient (r) described in the previous section.