HRpI System Based on Wavenet Controller with Human Cooperative-in-the-Loop for Neurorehabilitation Purposes

Several methods for human-robot physical interaction (HRpI) aim to provide physical therapy to patients. Haptics has become an option to display forces along a given path so as to guide the patient according to the physiotherapist's protocol. Critical in this regard is the motion control for haptic guidance, which must convey the specifications of the clinical protocol. Given the inherent patient variability, a key requirement of these HRpI methods is to modify their response online, neither rejecting nor neglecting interaction forces but processing them as patient interaction. In this paper, considering the nonlinear dynamics of the robot interacting bilaterally with a patient, we propose a novel adaptive controller that guarantees stable haptic guidance by processing the causality of patient interaction forces, despite unknown robot dynamics and uncertainties. The controller implements a radial basis neural network with daughter RASP1 wavelet activation functions to identify the coupled interaction dynamics. For an efficient online implementation, an output infinite impulse response (IIR) filter prunes negligible signals and nodes to deal with overparametrization. This identification is used to adapt online the feedback gains of a globally stable discrete PID regulator to yield stiffness control, so that the user is guided within a perceptual force field. The effectiveness of the proposed method is verified in real-time bimanual human-in-the-loop experiments.


Background and Motivation
Research on motor rehabilitation therapy has provided advanced strategies for the upper limb of people with cerebrovascular accident (CVA) [1,2], ranging from induced movement therapy [3] and electromechanical assisted training [4] to robot-based haptics [5,6]. The emerging technologies of virtual reality [7,8] support rehabilitation by promoting repetition and task-oriented training in a ludic, motivating and playful environment [9], facilitating a functional, useful and improved experience. This experience benefits not only the user but also the therapists, who perform, evaluate and document the tasks online, as shown by studies on upper limb motor recovery of CVA patients [10,11]. However, virtual body representation remains an involved issue [12], including its critical variable, the interaction force of the patient [13], for viable and feasible interaction given the motor variability of each patient [14]. Platforms that integrate visual rendering with haptic stimuli convey to the user a multimodal perception for improved interaction [15]. Such platforms have been studied for motor training and/or rehabilitation protocols that generate a novel environment in which the temporal and spatial patterns of movements are executed accurately through practice [16][17][18]. However, it remains unclear how to deal with volitional patient interaction force [19].
Neuropsychological rehabilitation also offers promising results based on programming specific tasks from the qualitative analysis of symptoms, some based on motor learning, i.e., changes in patient movements that reflect changes in the structure and function of the nervous system. In [20], an experimental study is presented to improve the prediction time and reduce the time the robot takes to reach a desired position, based on hand and eye-gaze motion determination, to foresee the point of user interaction in a virtual environment. However, the essential patient force interaction is not assessed. Haptic technology is still incipient for rehabilitation purposes, though mature for computer and gaming interaction [21]; how to deal with the patient interaction force, given the deviated motor patterns of patients, so as to promote their improvement under kinesthetic stimulus remains a subject of research in several fields, for which clinical tests have been implemented [22].
The so-called Porteus Maze Test (PMT), which consists of solving mazes ordered in a pattern of increasing difficulty, has been studied for upper limb rehabilitation of stroke patients, although it was originally proposed as a psychometric nonverbal test to measure psychological planning capacity and foresight (performance intelligence) [17]. The PMT is also currently used as a test associated with the activation of the frontal region of the brain, which is involved in planning, and is capable of detecting perseverative errors [17]. Thus, it can be extended to evaluate executive functioning, since it qualifies observable behaviors in neuropsychological rehabilitation [23] by inferring dysfunction of the central nervous system [24]. It is therefore of interest to evaluate the PMT under haptic guidance.

Contribution
A self-tuning scheme based on neural network identification of nonlinear dynamics is presented to adapt the feedback gains of a discrete PID guidance controller. Since a PID structure can be abstracted as the sum of viscoelastic restitution and memory generalized force vectors computed to converge to the nominal Cartesian task, the robot end-effector guides the user's hand with such a force vector, which increases (diminishes) as the position error increases (diminishes). Real-time experiments with nine healthy volunteers are presented under a bilateral PMT using two Omni haptic interfaces. The user solves the virtual maze by navigating under the haptic guidance of the Omni at each hand in two modalities: (i) passive haptic guidance (PHG), where the user perceives a contact force each time he/she touches the maze boundaries (the fewer touches the better); and (ii) active haptic guidance (AHG), where the user is guided continuously with a haptic force corresponding to how much he/she deviates from the nominal maze trajectory (the smaller the position error the better). Results show that AHG renders improved trajectory precision owing to the self-tuning adaptation of force feedback.

The Dynamics of the HRpI System
Consider a human-robot physical interaction (HRpI) system equipped with one haptic robotic device per hand, left and right, each exhibiting high-end electromechanical performance, such as low friction, backdrivability and low inertia, with a high bandwidth to display force, whose nonlinear dynamic model is [25][26][27]
$$H_\sigma(q_\sigma)\,\ddot{q}_\sigma + C_\sigma(q_\sigma,\dot{q}_\sigma)\,\dot{q}_\sigma + G_\sigma(q_\sigma) + \tau_{f\sigma} = \tau_\sigma \qquad (1)$$
where $\sigma$ indicates the left, $l$, or right, $r$, haptic device; $q_\sigma \in \mathbb{R}^n$ and $\dot{q}_\sigma \in \mathbb{R}^n$ are the generalized joint position and velocity coordinates, respectively; $H_\sigma(q_\sigma) \in \mathbb{R}^{n\times n}$ denotes the symmetric positive definite inertia matrix; $C_\sigma(q_\sigma,\dot{q}_\sigma) \in \mathbb{R}^{n\times n}$ represents the Coriolis and centripetal forces; $G_\sigma(q_\sigma) \in \mathbb{R}^n$ models the gravity loads; and $\tau_\sigma \in \mathbb{R}^n$ stands for the torque input. The term $\tau_{f\sigma} = f_{b\sigma}\dot{q}_\sigma + f_{c\sigma}\tanh(\gamma_\sigma \dot{q}_\sigma)$ stands for joint friction, where $f_{b\sigma}$ and $f_{c\sigma}$ are positive definite $n \times n$ matrices modelling viscous damping and dry friction, respectively, with coefficient $\gamma_\sigma > 0$. When the human operator grasps the haptic device by placing a fingertip into its thimble, the dynamics change remarkably because the human exerts a torque $\tau_{h\sigma}$ on the haptic robotic device:
$$H_\sigma(q_\sigma)\,\ddot{q}_\sigma + C_\sigma(q_\sigma,\dot{q}_\sigma)\,\dot{q}_\sigma + G_\sigma(q_\sigma) + \tau_{f\sigma} = \tau_\sigma + \tau_{h\sigma} \qquad (2)$$
where $\tau_{h\sigma}$ is assumed Lipschitz, giving rise to a human-in-the-loop configuration [26]. System (2) can be written in the continuous-time state-space representation (3), (4), where $g_\sigma(x_\sigma(t))\tau_{h\sigma}(t)$ is the map of the exogenous, time-varying, unmeasurable human torque, $f_\sigma(x_\sigma(t))$ and $g_\sigma(x_\sigma(t))$ are unknown smooth functions, and $u_\sigma = \tau_\sigma$ is the control input.
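As a small illustration of the friction term defined above, the following Python sketch evaluates the viscous plus smooth dry friction torque for a three-joint device. The function name `friction_torque` and all numerical values are hypothetical and chosen only for illustration; they are not identified parameters of the haptic device.

```python
import numpy as np

def friction_torque(q_dot, f_b, f_c, gamma):
    """Joint friction: tau_f = f_b @ q_dot + f_c @ tanh(gamma * q_dot)."""
    q_dot = np.asarray(q_dot, dtype=float)
    # Viscous damping grows linearly with velocity; tanh gives a smooth
    # approximation of dry (Coulomb) friction that avoids a sign() jump.
    return f_b @ q_dot + f_c @ np.tanh(gamma * q_dot)

# Hypothetical positive definite friction matrices (assumed values).
f_b = np.diag([0.02, 0.02, 0.01])   # viscous damping
f_c = np.diag([0.05, 0.05, 0.03])   # dry friction
tau_f = friction_torque([0.5, -0.5, 0.0], f_b, f_c, gamma=50.0)
```

The friction torque is an odd function of velocity and vanishes at rest, which is the property the smooth `tanh` approximation is meant to preserve.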

Problem Statement
When the human operator has a motor disability, nonlinear robot controllers $u_\sigma$ have been proposed to assist motion [28][29][30], including a high-performance decentralized continuous nonlinear PID controller [31]; however, these use constant feedback gains that do not adapt to changing conditions, such as the time-varying, persistent human interaction term $g_\sigma(x_\sigma(t))\tau_{h\sigma}(t)$. The problem can then be stated as follows: assuming unknown $f_\sigma(x_\sigma(t))$ and an unknown human interaction generalized force $g_\sigma(x_\sigma(t))\tau_{h\sigma}(t)$, design a model-free control $u_\sigma$ whose feedback gains adapt online to each patient, so that his/her performance improves when interacting with the robotic device under haptic guidance. Figure 1 shows a maze solution application.

Adaptive Interaction System
In this section, the adaptive interaction system is presented, as shown in Figure 2. Each haptic device has its own programmed wavenet PID controller, and the two controllers communicate through the computer on which the algorithms run. The wavenet PID controller scheme is based on the identification of the inverse dynamics of each haptic device and an IIR filter to tune the PID feedback gains and guarantee global regulation. For this purpose, the control scheme shown in Figure 3 is used, where four main blocks can be observed: HRpI, wavenet identification, discrete PID controller and auto-tuning gains. The following subsections describe each of them.
Figure 3. PID wavenet controller scheme, where $\sigma$ can be left, $l$, or right, $r$, i.e., $\sigma = \{l, r\}$; $y_{\mathrm{ref},\sigma}(k)$ is the reference signal, $\varepsilon_\sigma(k)$ is the error signal, $u_\sigma(k)$ is the control input, $r_\sigma(k)$ models the noise signal, $y_\sigma(k)$ is the HRpI (human patient in the haptic loop) output with $\hat{y}_\sigma(k)$ its estimate, $e_\sigma(k)$ is the estimation error, and $v_\sigma(k)$ is the persistence signal.

Input-Output Dynamics of the HRpI
It is well known that any sufficiently smooth continuous-time nonlinear system admits a discrete-time representation [32,33]. Then, (2), or equivalently (3), (4), can be represented in discrete-time state space by assuming access to the full state at each time, a small enough sampling period $T > 0$, and an input that remains constant between samples, which gives (6). Solving (6) for $x_\sigma(t+T)$ leads to (7). Evaluating (7) and (4) at $t = kT$ yields the nonlinear discrete-time system (8), (9). Substituting (8) into (9), the discrete output at instant $(k+1)T$ is given by (10), where $\Phi_\sigma(*)$ stands for the flow of the discrete dynamic system and $\Gamma_\sigma(*)$ for the input matrices of the robot. Notice that (10) describes the input-output dynamics of the haptic device $\sigma = \{l, r\}$ at instant $k+1$, and that the input $u_\sigma(k)$ and the system output $y_\sigma(k)$ are the only data available. In this paper, we exploit the properties of wavenets to approximate the input-output dynamics (10) of each haptic device; additionally, we include an IIR filter in the output layer to prune irrelevant signals, which yields an efficient identification scheme useful to tune the PID feedback gains [34].
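As a rough illustration of the sampling argument above, the sketch below advances a control-affine state-space model by one sampling period, holding the input constant over the interval. The forward-Euler rule, the toy functions `f` and `g`, and all numerical values are assumptions made for illustration only; they are not the equations (6) to (10) of this paper.

```python
import numpy as np

def euler_step(x, u, tau_h, f, g, T):
    """One forward-Euler step of x_dot = f(x) + g(x) (u + tau_h).

    u and tau_h are held constant over the sampling period T
    (zero-order hold), as assumed for a small enough T.
    """
    return x + T * (f(x) + g(x) @ (u + tau_h))

# Toy one-joint model (hypothetical): state x = [position, velocity].
def f(x):
    return np.array([x[1], -x[1]])      # unit mass, unit viscous damping

def g(x):
    return np.array([[0.0], [1.0]])     # torque acts on the acceleration

x_k = np.array([0.0, 1.0])
x_next = euler_step(x_k, u=np.array([2.0]), tau_h=np.array([0.0]),
                    f=f, g=g, T=0.001)
```

In the setting of the paper, `f` and `g` are unknown and the human torque is unmeasurable, which is why only input-output data are used for identification.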

Wavenet Identification
A scheme is proposed to approximately identify the inverse plant (the HRpI system) using a radial basis neural network. The activation functions $\psi(\tau_\sigma)$ are daughter wavelet functions $\psi_j(\tau_\sigma)$ of RASP1 type. To filter out neurons that contribute little to the identification process, three IIR filters are incorporated in cascade, so that the least possible number of neurons is used and the computational load of the learning process is reduced [35]. Figure 4 presents the signal propagation and the general interconnection, where the infinite impulse response (IIR) recurrent structure (Figure 5), in cascade, yields a twofold improvement in learning speed by pruning those nodes that carry insignificant information in the cross-contribution summation of daughter wavelets. Notice in the scheme the forward delayed structure modulated by the input and the feedback loop modulated by the persistent signal, which allows sweeping a range of frequencies [36].
Figure 5. IIR filter structure.
The mother wavelet function $\psi(k)$ generates daughter wavelets $\psi_L(\tau_{L\sigma})$ through expansion or contraction and translation [37], with $a \neq 0$; $a, b \in \mathbb{R}$ and $j$ the scale variable; $a_{L\sigma}$ allows expansion and contraction, and $b_{L\sigma}$ is the translation variable at $k$, in the classical role of an RBF, with the advantage of finer refinement through the daughter wavelets $\psi_L(\tau_{L\sigma})$. This last feature is essential in the present algorithm, together with the pruning capability of the IIR filter. As suggested in [37], the RASP1 wavelet is $\psi(\tau) = \tau/(\tau^2+1)^2$, evaluated at a singularity-free normalization of the wavelet argument, and its partial derivative with respect to $b_{L\sigma}$ is used in the learning rule. In this way, for the left or right haptic device, the $i$-th wavenet approximation signal with IIR filter is calculated as in (17), where $L$ stands for the number of daughter wavelets, $w_{i,l}$ are the weights of each neuron in the wavenet, $c_{i,l}$ and $d_{i,j}$ are the coefficients of the forward and backward IIR filter, respectively, and $M$ and $N$ are the numbers of coefficients of the forward and backward IIR$_\sigma$ filter, respectively. As can be seen, (17) has a matrix structure. System (19) is estimated by two wavenet functions, $\hat{\Phi}_\sigma$ and $\hat{\Gamma}_\sigma$, with adjustable parameters $\Theta_{\Phi\sigma}$ and $\Theta_{\Gamma\sigma}$. Therefore, the nonlinear wavenet functions can be used as a Lebesgue measure useful to tune the feedback gains.
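To make the wavenet structure described above more concrete, the following Python sketch evaluates RASP1 daughter wavelets, their partial derivative with respect to the translation, and a single IIR output stage. It assumes the simple argument tau = (x - b) / a for a scalar input x, a forward branch scaled by the input u(k) and a feedback branch scaled by the persistent signal v(k); these choices, the function names and the numerical values are illustrative assumptions, not the exact formulation (17) of this paper.

```python
import numpy as np

def rasp1(tau):
    """RASP1 mother wavelet: psi(tau) = tau / (tau^2 + 1)^2."""
    tau = np.asarray(tau, dtype=float)
    return tau / (tau**2 + 1.0) ** 2

def rasp1_db(tau, a):
    """Partial derivative of psi((x - b) / a) with respect to b."""
    tau = np.asarray(tau, dtype=float)
    return (3.0 * tau**2 - 1.0) / (a * (tau**2 + 1.0) ** 3)

def wavenet_output(x, a, b, w):
    """z = sum_l w_l * psi((x - b_l) / a_l) for a scalar input x."""
    tau = (x - b) / a
    return float(w @ rasp1(tau))

def iir_output(z_hist, y_hist, c, d, u, v):
    """y_hat(k) = u * sum_i c_i z(k-i) + v * sum_j d_j y_hat(k-j).

    z_hist = [z(k), z(k-1), ...] and y_hist = [y_hat(k-1), ...].
    """
    return float(c @ z_hist) * u + float(d @ y_hist) * v

# Hypothetical parameters for L = 3 daughter wavelets.
a = np.array([1.0, 2.0, 0.5])     # scales (must be nonzero)
b = np.array([-1.0, 0.0, 1.0])    # translations
w = np.array([0.3, -0.2, 0.5])    # synaptic weights
z = wavenet_output(0.4, a, b, w)
y_hat = iir_output(np.array([z, 0.0]), np.array([0.0]),
                   c=np.array([0.5, 0.25]), d=np.array([0.1]),
                   u=1.0, v=0.01)
```

The denominator of RASP1 never vanishes, so both the wavelet and its derivative are defined for every real argument as long as the scale a is nonzero.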

Wavenet Learning
The parameters of the neural network and the IIR filters are expressed in matrix form: the estimated output vector $\hat{y}_\sigma = [\hat{y}_{1\sigma}, \ldots, \hat{y}_{p\sigma}]^T$, the synaptic weight matrix $W_\sigma \in \mathbb{R}^{p\times L}$, and the filter coefficient matrices $C_\sigma \in \mathbb{R}^{p\times M}$ and $D_\sigma \in \mathbb{R}^{p\times N}$. The output $z_\sigma(k)$ of the wavenet is passed through the IIR filters to obtain the estimated position $\hat{y}_\sigma(k)$, where $\hat{\Gamma}_\sigma$ is the estimated input matrix and $v_\sigma(k)$ is the persistent filter signal. The wavenet parameters are optimized by a least mean square (LMS) algorithm that minimizes a convex, radially unbounded cost function $E_\sigma$ of the estimation error $e_\sigma(k)$ between the system output and the output of the wavenet with IIR$_\sigma$ filter. To minimize $E_\sigma(k)$, the steepest gradient-descent method is considered. To this end, the partial derivatives of $E_\sigma(k)$ with respect to $a_\sigma(k)$, $b_\sigma(k)$, $W_\sigma(k)$, $C_\sigma(k)$ and $D_\sigma(k)$ are required for each haptic device, so that each parameter is updated incrementally along its negative gradient direction; this gives the tuning update laws (34)-(38) for each haptic device.
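The gradient-descent learning described above can be sketched for a reduced case. The Python example below performs one LMS step on the weights, scales and translations of a scalar RASP1 wavenet without the IIR stage, assuming the quadratic cost E = e^2 / 2 with e = y - y_hat and the argument tau = (x - b) / a. The cost, the omission of the filter coefficients C and D, and all names and values are simplifying assumptions for illustration, not the update laws of this paper.

```python
import numpy as np

def rasp1(tau):
    """RASP1 wavelet: psi(tau) = tau / (tau^2 + 1)^2."""
    return tau / (tau**2 + 1.0) ** 2

def rasp1_db(tau, a):
    """d psi((x - b) / a) / d b."""
    return (3.0 * tau**2 - 1.0) / (a * (tau**2 + 1.0) ** 3)

def predict(x, a, b, w):
    """Wavenet estimate y_hat = sum_l w_l psi((x - b_l) / a_l)."""
    return float(w @ rasp1((x - b) / a))

def lms_step(x, y, a, b, w, mu):
    """One steepest-descent step on E = 0.5 * e^2, with e = y - y_hat."""
    tau = (x - b) / a
    psi = rasp1(tau)
    e = y - float(w @ psi)
    dpsi_db = rasp1_db(tau, a)
    dpsi_da = tau * dpsi_db              # chain rule: d psi/d a = tau * d psi/d b
    # theta <- theta - mu * dE/dtheta = theta + mu * e * d y_hat/d theta
    w_new = w + mu * e * psi
    b_new = b + mu * e * w * dpsi_db
    a_new = a + mu * e * w * dpsi_da
    return a_new, b_new, w_new, e

# Hypothetical initial parameters and one training sample (x, y).
a0 = np.array([1.0, 1.0, 1.0])
b0 = np.array([-1.0, 0.0, 1.0])
w0 = np.array([0.1, 0.1, 0.1])
a1, b1, w1, e0 = lms_step(0.5, 1.0, a0, b0, w0, mu=0.1)
e1 = 1.0 - predict(0.5, a1, b1, w1)      # error on the same sample after the step
```

For a small learning rate, the error on the presented sample shrinks after the step, which is the local behaviour that the negative-gradient direction is meant to provide.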

Discrete PID Controller for Each Haptic Device
A discrete PID controller (39) is proposed for each haptic device, where $k_{p\sigma}(k)$, $k_{i\sigma}(k)$ and $k_{d\sigma}(k)$ stand for the strictly positive proportional, integral and derivative feedback gains, respectively; $u_\sigma(k)$ is the control input at instant $k$; and $\varepsilon_\sigma(k)$ is the error between the reference $y_{\mathrm{ref},\sigma}(k)$ and the output $y_\sigma(k)$, for $\sigma = \{l, r\}$. Each feedback gain is tuned according to the corresponding error term it multiplies in (39) and is modulated by $\hat{\Gamma}_\sigma$, the input matrix of (19).
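As an illustration of a discrete PID law with per-sample gains, the sketch below uses the incremental (velocity) form that is common in self-tuning PID schemes. The incremental form, the error sign convention (reference minus output), the scalar signals and the class name `IncrementalPID` are assumptions made for this example and should not be read as the exact controller (39).

```python
from dataclasses import dataclass

@dataclass
class IncrementalPID:
    """u(k) = u(k-1) + kp*(e(k) - e(k-1)) + ki*e(k) + kd*(e(k) - 2e(k-1) + e(k-2))."""
    kp: float
    ki: float
    kd: float
    u_prev: float = 0.0   # u(k-1)
    e1: float = 0.0       # e(k-1)
    e2: float = 0.0       # e(k-2)

    def step(self, y_ref: float, y: float) -> float:
        e = y_ref - y
        du = (self.kp * (e - self.e1)
              + self.ki * e
              + self.kd * (e - 2.0 * self.e1 + self.e2))
        self.u_prev += du
        # Shift the error history for the next sample.
        self.e2, self.e1 = self.e1, e
        return self.u_prev

# Hypothetical gains; two samples with a constant unit error.
pid = IncrementalPID(kp=2.0, ki=0.5, kd=0.1)
u1 = pid.step(y_ref=1.0, y=0.0)
u2 = pid.step(y_ref=1.0, y=0.0)
```

Because the gains are plain attributes, an outer adaptation loop can overwrite `kp`, `ki` and `kd` between samples without resetting the controller state.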

Auto-Tuning PID Gains
Since the gains $k_{p\sigma}$, $k_{i\sigma}$ and $k_{d\sigma}$ are considered within the cost function (27), they can be updated similarly to (34)-(38), where $\hat{\Gamma}_\sigma$ is defined by (21) and $0 < \mu < 1$ is the learning rate of the PID controller gains. Notice that the learning rates $\mu$ are design parameters and are used for both controllers.
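One way to picture the gain adaptation is a gradient step on the identification cost propagated through the estimated input gain. The Python sketch below assumes scalar signals, the cost E = e^2 / 2 on the identification error e, an incremental PID law, and a lower bound `k_min` that keeps the gains strictly positive; the function name `update_pid_gains` and these choices are hypothetical and only illustrate the idea, not the update law of this paper.

```python
def update_pid_gains(gains, e_id, gamma_hat, eps, mu, k_min=1e-6):
    """One gradient-style update of (kp, ki, kd).

    gains     : tuple (kp, ki, kd) at instant k
    e_id      : identification error e(k) = y(k) - y_hat(k)
    gamma_hat : estimated input gain (scalar stand-in for Gamma_hat)
    eps       : tracking errors (eps(k), eps(k-1), eps(k-2))
    mu        : learning rate, 0 < mu < 1
    """
    kp, ki, kd = gains
    e0, e1, e2 = eps
    # Each gain moves along the error term it multiplies in the PID law.
    kp += mu * e_id * gamma_hat * (e0 - e1)
    ki += mu * e_id * gamma_hat * e0
    kd += mu * e_id * gamma_hat * (e0 - 2.0 * e1 + e2)
    # Assumed safeguard: keep every gain strictly positive.
    return tuple(max(k, k_min) for k in (kp, ki, kd))

# Hypothetical values for one update.
new_gains = update_pid_gains((1.0, 0.5, 0.1), e_id=0.5, gamma_hat=2.0,
                             eps=(1.0, 0.8, 0.5), mu=0.1)
```

The clipping step is a design choice of this sketch; it trades a small bias in the update for the guarantee that the regulator never receives a non-positive gain.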

The flowchart for the PID wavenet algorithm is illustrated in Figure 6.

The Experimental System
The goal of this section is to present the experimental platform as well as the design of experiments.

Experimental Platform
Consider a Geomagic Touch [38], as the haptic interface for each hand, modeled as a three degrees of freedom nonlinear robot given in (3)

Design of Experiments
The experiments aim at evaluating the competency of solving a maze with motor commands within a given order and precision, involving executive decision making and motor patterns, using the PMT protocol. It is surmised that such motion patterns lead to coordinated bimanual cooperation of both hands, which improves under haptic guidance. Two experiments are therefore considered: one providing haptic stimuli only when the user touches the boundaries of the maze, and the other providing continuous haptic guidance, not only when the user deviates so much as to touch the limits of the maze.
To this end, the user solves the maze by commanding the 3D haptic interface, which is represented in the virtual world as a pointer within the virtualized maze, as shown in Figure 8. The middle road of the maze is taken as the position reference, so the task of the haptic control is to converge to this reference path, regardless of how the user navigates to solve the maze. In this way, the novel paradigm of invariant motor learning is implemented in our scheme: the user tracks, at his/her own pace and motor capacity, the defined invariant position points $P_i$ shown in Figure 8, i.e., the algorithm does not impose a desired time base, since a desired velocity is neither enforced visually nor imposed through the control scheme. In this way, the user's intentional movement is deployed to solve the maze at will, which is essential for motor rehabilitation.

How the Haptic Control Occurs and How Human is Guided Spatially
Figures 9 and 10 show the nominal trajectory for the left and right hand in the SCM and MCM, where $T_i$ represents the nominal transect segments. The ends of each $T_i$ are constant spatial points. Assuming that the haptic device pointer is, at any given instant, at a spatial Cartesian location $\xi_r$, the closest $T_i$ is chosen and the closest point $\xi \in T_i$ is taken as the reference point at that instant, i.e., $y_{\sigma\text{-ref}} = \xi$. Then, the controller injects a torque $u_\sigma$ (in Nm) into the haptic device to attract $\xi_r \to y_{\sigma\text{-ref}}$. In this way, wherever the pointer is, it is attracted to the closest point of transect $T_i$, independently of time and of how fast or slow the user's pointer moves. Since the human fingertip is inserted into the thimble of the haptic device, the human perceives such torque as a haptic force vector $f_h$.
For the MCM exercises, a maze of medium difficulty is shown in Figure 11, whose virtualization was programmed in Unity3D, with a unique solution for both the right and left haptic devices.
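The time-independent reference selection described above can be sketched as a closest-point query on a piecewise-linear path. In the Python example below, the nominal trajectory is assumed to be a polyline through fixed waypoints, each consecutive pair defining one transect segment; the function names and the waypoint coordinates are hypothetical.

```python
import numpy as np

def closest_point_on_segment(p, a, b):
    """Orthogonal projection of p onto segment [a, b], clamped to its ends."""
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:                 # degenerate segment: both ends coincide
        return a.copy()
    t = np.clip(float((p - a) @ ab) / denom, 0.0, 1.0)
    return a + t * ab

def reference_point(pointer, waypoints):
    """Closest point of the whole polyline to the pointer position."""
    p = np.asarray(pointer, dtype=float)
    pts = np.asarray(waypoints, dtype=float)
    best, best_dist = None, np.inf
    for a, b in zip(pts[:-1], pts[1:]):      # one transect per waypoint pair
        c = closest_point_on_segment(p, a, b)
        dist = np.linalg.norm(p - c)
        if dist < best_dist:
            best, best_dist = c, dist
    return best

# Hypothetical L-shaped path made of two transects.
path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
y_ref = reference_point((0.5, 0.2, 0.0), path)
```

Because the reference depends only on the current pointer position, no time base or desired velocity enters the computation, which matches the self-paced navigation described in the text.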

Experimental Results
A pre-training phase is performed to obtain the initial values of the neural network parameters; see Tables 1 and 2. This phase is conducted in a human-in-the-loop configuration.
Table 2. Initial values of the parameters of wavenet and PID controller.
Figure 14 shows the initial position that the user must adopt when starting each of the experiments. Figure 15 shows the virtual navigation behavior of the user in the workspace when solving the SCM bimanually, with smooth position behaviour, as shown in Figure 16. In this passive navigation configuration, the user exhibits low performance, since haptic guidance is not only scarce but also intermittent (the user perceives a force at the fingertip only when touching the walls of the maze).

Experiment 2: Passive Haptic Guidance with SCM and MCM
The following exercise consists of implementing and applying a control law for trajectory tracking, based on a desired trajectory constructed through motion planning (Figure 17), with a different trajectory for each of the haptic devices integrated in the platform (Figure 18). In the experiment, each device performs tracking-based regulation with the user in the loop, giving the user visual and force feedback on the planned trajectory, where the applied controller guarantees position convergence; the goal is that all of this can be used for rehabilitation purposes. For exercise 1, Figures 19 and 20 show the position errors of each of the haptic devices, and Figures 21 and 22 show the control signal sent to each device to generate trajectory tracking.

Exercise 1: Passive Haptic Guidance without User in the Loop
For the next test, passive haptic guidance is applied to the device without the user in the loop; Figure 23 shows the results in position convergence and energy exchange.

Exercise 1: Passive Haptic Guidance with User in the Loop and Disturbances
The following test was performed to check the effectiveness of the implemented controller in compensating the uncertainty and disturbances generated by the user when coupled with the device. Figure 24 shows the moments at which disturbances occur, which coincide with increases of energy that the controller uses to compensate and redirect the device to the desired trajectory.

Exercise 2: Active Haptic Guidance
This subsection presents the experimental results for the case of active haptic guidance. Figure 25 shows the behavior of the two haptic devices in the workspace, and Figure 26 shows the operational position of the two haptic devices in the active haptic guidance task.

Exercise 3: Passive Haptic Guidance
The platform with two haptic devices (right hand and left hand) was evaluated in two different experiments (mazes with different levels of difficulty), each maze under two conditions (without control and with control), as defined below: (i) The user uses two haptic devices to solve a maze in free motion (active haptic guidance), i.e., the user controls his/her own movements. In this condition (without interactive forces), the execution time reflects the performance of each user (different for each hand), and visual feedback plays an important role. The vector of gravitational forces is compensated. The optical encoders of the haptic device allow performance measurement by mapping the vector of joint variables to the operational space. (ii) The user interacts with the platform actively, that is, a tracking control law is implemented with the objective of teaching the user how to solve the maze. The control law aims at compensating the uncertainties generated by the user when performing the task (disturbances and position errors). This condition (an adaptive force that conditions the guidance in the operational space) describes a kinesthetic learning task. The performance of each user in the task represents involuntary movements of the trajectory and establishes the energy requirement, defined adaptively by the control.
As a result of applying the exercise to the labyrinth of medium difficulty, Figure 27 presents the trajectory of the two haptic devices in the workspace, Figure 28 shows the operational position on each axis (x, y, z) of both devices, and Figures 29 and 30 show the position errors generated when tracking the desired trajectory with the two haptic devices. These graphs show the performance of the controller in passive haptic guidance tasks in terms of position tracking and convergence.

PID Wavenet-IIR Parameters
The performance of the PID wavenet control was evaluated based on the convergence time to the desired trajectory. The following figures describe the behavior of the adaptive control implemented on the mazes of exercises 1 and 2. Figures 31 and 32 show the trajectory tracking in the workspace, the estimated response of the plant (haptic device), and the estimation error for the mazes of exercise 1 and exercise 2, respectively. The adapted parameters include the wavelet parameters a and b, where a is the scaling variable, which allows for dilations and contractions, and b is the translation variable, which allows for displacement at instant k, as well as the parameters C and D, which are the forward and backward coefficients of the IIR filter, respectively.
It is observed that all of them change their value at each instant of the exercise, as they evolve with the dynamics generated by the user and with the region of the workspace in which the haptic device is located.

Conclusions
A novel identification and control scheme for 3D nonlinear haptic robotic devices is implemented efficiently based on a wavenet with IIR filter; it identifies the inverse dynamics with the aim of tuning the PID feedback gains, rather than approximating the dynamics as in usual neural-network-based control. This scheme yields self-tuning of the feedback gains to react to human interaction and commanding forces, notably without any a priori knowledge of the haptic device, while guaranteeing global asymptotic convergence. Real-time human-in-the-loop bimanual experiments show human cooperative decision making, since both hands maneuver in the same workspace. The proposed scheme is viable for practical implementation, where typically not only are the exact nonlinear dynamics unknown, but varying and persistently exciting human interaction forces must also be accounted for. The patterns of a clinical test were implemented with a healthy volunteer to assess the usefulness of the platform in real conditions, showing potential for patients.

Future Work
The platform was tested with a healthy subject exhibiting velocities and range of motion within the expected regimes of patients. The next step is to run a controlled and clinically supervised protocol with patients who have an upper limb motor disability and require motor rehabilitation. Of particular interest are cerebrovascular accident patients, who also require motor and cognitive coordination, for which virtual maze tests are an option.
Institutional Review Board Statement: Ethical review and approval were waived for this study because no medical or psychological measurements were performed.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All the data recorded during the tasks of active haptic guidance and passive haptic guidance on the human-robot interaction platform can be provided by the authors of this paper upon request.