The Wearable Robotic Forearm: Design and Predictive Control of a Collaborative Supernumerary Robot

Abstract: This article presents the design process of a supernumerary wearable robotic forearm (WRF), along with methods for stabilizing the robot’s end-effector using human motion prediction. The device acts as a lightweight “third arm” for the user, extending their reach during handovers and manipulation in close-range collaborative activities. It was developed iteratively, following a user-centered design process that included an online survey, contextual inquiry, and an in-person usability study. Simulations show that the WRF significantly enhances a wearer’s reachable workspace volume, while remaining within biomechanical ergonomic load limits during typical usage scenarios. While operating the device in such scenarios, the user introduces disturbances in its pose due to their body movements. We present two methods to overcome these disturbances: autoregressive (AR) time series and a recurrent neural network (RNN). These models were used for forecasting the wearer’s body movements to compensate for disturbances, with prediction horizons determined through linear system identification. The models were trained offline on a subset of the KIT Human Motion Database, and tested in five usage scenarios to keep the 3D pose of the WRF’s end-effector static. The addition of the predictive models reduced the end-effector position errors by up to 26% compared to direct feedback control.


Introduction
Wearable robotic devices can augment the abilities of the human body beyond its natural limits, allowing users to extend their reach, lift more weight, and reduce physical and cognitive loads in repetitive tasks. The research on wearable robotics is mostly focused on prostheses and exoskeletons, reaching considerable maturity over the past years [1]. Prostheses serve as replacements for lost human functionality, while exoskeletons adhere to existing human limbs, either for support and rehabilitation of unhealthy joints, or for enhancing healthy limbs in tasks such as walking and lifting loads.
Expanding the scope of robotic augmentation, researchers have proposed supernumerary robotic (SR) devices for able-bodied persons, which add to human capabilities instead of replacing or supporting existing ones. The wearable robotic forearm (WRF) described in this article is aimed at assisting users in close-range collaborative tasks, while remaining lightweight and maneuverable. Building upon previous conference publications regarding its design process [2] and biomechanics [3], we provide additional descriptions of the robot's mechanical architecture, kinematics, and control systems. We also extend upon preliminary work for 2D stabilization of the WRF's end-effector previously published in conference proceedings [4], applying more robust predictive models to 3D stabilization scenarios involving a larger range of human motion.
One common configuration of SR devices seen in the literature is of large human-scale arms mounted on the user's upper body, for instance a torso-mounted pair of robotic limbs for bracing and support of a crouching worker in settings such as aircraft manufacturing [5].
A second common configuration is wrist-mounted robotic fingers, designed to perform two-handed tasks with a single hand [6,7]. Wrist-mounted and torso-mounted configurations represent two extremes in terms of application, mounting point, size, weight, and power. The torso-mounted SR arms weigh over 15 kg and are capable of 50-70 Nm of torque, while the additional fingers typically weigh under 0.5 kg, with torques of up to 2 Nm. The WRF lies between these two extremes in weight, power, and scale. Its latest prototype weighs about 2 kg with peak torques of 6 Nm, and is aimed at tasks with lower demands than the torso-mounted robots, leading to a smaller footprint. At the same time, its mounting configuration on the upper arm allows for extended reach and multi-location work capabilities, in contrast to wrist-worn robots.
Aside from this form-factor, we envision the WRF to be an autonomous assistive agent, able to react and mutually adapt to the user through non-verbal communication based on body movements. This is in contrast to the prevailing paradigm in SR devices of direct control by the user, either through biological signals such as EMG [7], or push-button interfaces [8,9].
Owing to the novelty of this system, its design process took into account inputs from potential users at various stages of development. Starting with a rudimentary prototype (Model I), user surveys were conducted to generate design guidelines and requirements for the next version of WRF (Model II), which was evaluated in terms of workspace volume enhancement and biomechanical loads on the wearer due to the robot's motion. Following a pilot interaction study, the final dexterous prototype of the WRF was developed (Model III).
The biomechanical loads on the wearer due to the robot's motion (robot-to-human) constitute an important area of analysis for improving the ergonomics of usage. The other side to this interaction (human-to-robot), arising due to the physical coupling between the wearer's arm and the robot's base link, must also be considered. This interaction manifests in the form of disturbances introduced in the robot's motion plan caused by the wearer's independent body movements. The WRF must compensate for these disturbances in order to successfully achieve its goal during a shared activity. We describe each of these aspects of the WRF, and present novel methods for human-induced disturbance compensation. The contributions of this article can be summarized as follows:
• Description of the user-centered design process for a novel configuration of a wearable robotic supernumerary arm, resulting in three successive prototypes.
• Detailed analysis of the device's hardware implementation and kinematics.
• Summary of the enhancement afforded by the WRF in terms of workspace volume, and a preliminary biomechanics analysis.
• Development of human motion prediction models (autoregressive time series and recurrent neural network), trained offline on the KIT Human Motion Database [10].
• Description of linear system identification of the actuators, and techniques for stabilizing the WRF's end-effector against human-induced disturbances, with up to 26% improvement in performance with the inclusion of the prediction models compared to direct feedback control.

User-Centered Design Process
In this section, we summarize the considerations for design and usability that were involved in iterating through successive prototypes of the WRF.
Robotic arms designed for industrial applications typically have well-defined usage scenarios and task specifications. Since the WRF was a novel configuration, we needed to identify contexts where it would provide the most useful assistance, and the design features required to perform tasks in these contexts. Applying the principle of user-centered design [11], we collected feedback from potential end-users during the development of the robot. Starting with an initial low-fidelity prototype, three studies were conducted: an online survey, a contextual inquiry, and an in-lab usability study. Figure 1 shows the overall user-centered design process, which goes from abstract inquiries to concrete guidelines informing the final design. A full description of the design process can be found in [2].
Figure 1. Steps involved in user-centered design of the wearable robotic forearm (WRF), going from initial concepts to an evaluated functional design.

Initial Design
To keep the weight to a minimum while effectively increasing the wearer's "wingspan", the initial design included horizontal panning at the elbow and length extension as the degrees of freedom (DoFs), to allow for usage across multiple workspaces. The WRF Model I prototype was constructed out of acrylonitrile butadiene styrene (ABS) mounting components and stainless steel sliders, with a two-fingered gripper based on the Yale OpenHand T42 [12] as the end-effector, shown in Figure 2.

Online Survey and Contextual Inquiry
After constructing the prototype, we categorized usage affinity diagrams [13] into a taxonomy of usage contexts and functions (Figure 3). Using this taxonomy, an online survey was conducted on Amazon Mechanical Turk with 105 participants (57 male, 48 female). They were first asked to rate potential usage contexts for both classes of interaction (social/functional) with the prompts "A robotic third arm is useful for <context>", and "I can see myself using a robotic third arm for <context>". This was followed by similar prompts for rating the functions in both classes of interaction, and concluded with soliciting open-ended responses about desired features for the robot, and demographic details.
Participants considered the device to be more useful as a functional tool, particularly in tasks such as carrying objects and handling hazardous materials, than for social uses by a wide margin. They also considered it to be more useful in professional and military settings, and least in recreational contexts.
Informed by these results, a need-finding contextual inquiry [14] was conducted with a building construction crew on a university campus. The protocol for this inquiry was approved by the Cornell University Institutional Review Board (CU IRB) for human participant research (number 1611006802). The inquiry proceeded with guidance from a supervisor who offered expert testimony about the tasks of 10 workers. During and after observing their day-to-day tasks, workers were asked about the cognitive loads and common frustrations, which led to the identification of three usability "need themes": assistance in reaching for objects and self handovers, stabilization of workpieces, and coordination of repetitive actions.

Usability Study
To generate actionable design principles grounded in physical interaction with the device as opposed to the previous conceptual inquiries, an in-lab study was conducted with users wearing the robot. In total, 14 university students (9 female, 5 male) participated in this study, which involved a semi-structured interview protocol (number 1608006549 approved by CU IRB). They began by describing a typical day in their lives, and were asked to imagine having a third arm attached to their body in some activities at home, at work, and performed for recreation. In order to narrow down their thought process towards the robotic arm, the same questions were repeated after showing them pictures of a 3D model of the prototype. Then they wore the robot and performed two scripted tasks: moving a cup on a table while seated, and handing over the cup to the interviewer. The robot was pre-programmed during these tasks, running an open-loop trajectory without feedback, sensing, or adaptation. Finally, participants were debriefed and asked for improvements and suggestions that they would like to see in future prototypes, and desirable features in a commercial product.

Design Guidelines
The verbal responses of the in-lab study participants were analyzed, and recurring opinions were grouped into common themes. For instance, multiple participants commented on the weight and ergonomics of the device (e.g., "It's too heavy, definitely a strain on my arm"), and manipulation capabilities (e.g., "Gripping capacity should be better", "More degrees of freedom [are needed]").
The following guidelines and requirements for improving the WRF emerged from the responses to the contextual inquiry and the in-lab study:
• Weight and Balance: A majority of participants suggested reducing the weight of the robot, as well as selecting a more ergonomic attachment point to the human arm.
• Dexterity: Participants desired more dexterity than was presented to them at the end-effector, such as vertical pivoting for improved handovers and increasing the robot's reach.
• Control and Autonomy: Participants suggested control schemes based on voice commands, and intention recognition from the wearer's movements, with varying levels of robot autonomy based on the task scenario.
• Feedback: Most participants commented that the robot's intentions were not clear throughout the usability study trajectories, suggesting that its intention be shown through lights, sounds, and voice feedback.
• Appearance: Participants suggested modeling the device on existing prosthetic devices, finding the idea of another human-like arm attached to their bodies to be discomforting. They also suggested selecting materials capable of handling hazardous substances.

Design Iterations
To address the concerns in the design guidelines, two additional degrees of freedom (DoFs) were added in the next WRF prototype (Model II): vertical pitching of the arm, and wrist rotation before the gripper (Figure 4). Additionally, the robot's mounting point was shifted closer to the human elbow for improved ergonomic performance. This follows from the weight and balance of the robot being a consistent point of discussion. These considerations were balanced in trade-offs between ergonomics, motor power and robot dexterity, resulting in design improvements that allowed the robot to perform tasks similar to the contextual inquiry (Figure 5).
Figure 5. Usage scenarios for the WRF: (a) one-handed self-handover, (b) stabilizing a workpiece for bi-manual manipulation (bracing), (c) fetching an object from below, (d) assisted two-person handover [3] (©2018 IEEE).
The vertical pitching, along with complete 360° panning, results in a full 3D workspace, and allows the robot to reach objects placed below as well as behind the user. The horizontal panning and vertical pitching DoFs are primarily responsible for bulk positioning of the WRF, while length extension further enhances its reach.
In the design of the wrist and end-effector, the most important trade-off is between dexterity and ergonomics. Articulated spherical wrists, including serial and parallel mechanisms, have been extensively studied and deployed in commercial and research robots [15]. A fully articulated 6-DoF parallel mechanism, similar to a Gough-Stewart platform [16,17], was initially considered for the WRF's wrist. However, based on the indicative usage scenarios discovered in the studies, a wrist with a single rotational DoF similar to human wrist pronation and supination was thought to be sufficient, along with rudimentary grasping capabilities with a two-fingered gripper.
Following a pilot interaction study, described in detail in [3], more dexterity was desired from the WRF's wrist, especially in pick-and-place tasks to orient grasped objects for handovers and drop-offs. As a result, in the final prototype (Model III), another DoF was added before the end-effector: vertical pitching of the wrist (DoF-5 in Figure 6). The other changes were placing the DoF-2 motor right below DoF-1, and adding a planetary gearset with a 4:1 reduction at the output of DoF-2.

Mechanical Design and Architecture
Following the progression in prototypes, in this section we describe the physical structure, actuators, and electronics architecture in the WRF's hardware implementation, as well as the kinematics for Model III. Its features are summarized in Table 1.

Physical Structure
The WRF consists of a medical arm brace made of plastic, foam, and steel, used as the base for attaching the robot's serial kinematic chain. Material selection played a major role in the weight reduction of the WRF from Model I to Model II. The ABS mounting platform for the robot was replaced with a waterjet-machined sheet aluminum structure (Figure 7a). Aluminum sliders were used instead of stainless-steel ones in the length extension mechanism, serving as both actuation and structural elements.
The initial gripper was designed after the Yale OpenHand Model T42 [12], adapted to constrain both fingers to move together using a single motor for weight considerations. In Model II, gripper finger sizes were reduced, and the motor housing and adaptor were removed, resulting in the motor body itself acting as a structural element connecting it to the previous DoF (Figure 7b). A serial mechanism in the form of a connector directly mounted on the motor horn of the wrist actuator was preferred over a parallel mechanism such as a four-bar linkage [18]. This is due to the mechanical simplicity of a serial mechanism and its lack of singularities within the workspace [19], as well as the fact that the motor body can itself act as a structural element in the relatively low-load applications for the WRF.

Actuation
The WRF was actuated with ROBOTIS Dynamixel servo motors. The horizontal panning and vertical pitching DoFs used MX-64T motors weighing 135 g each, with built-in proportional-integral-derivative (PID) feedback control for position and velocity, stall torque of 6.0 Nm at 12 V, and maximum speed of 63 rpm. These two DoFs required the most powerful motors since they were subject to the bulk of lifting and carrying loads. The length extension and gripper used smaller MX-28T motors, weighing 77 g with a stall torque of 2.5 Nm at 12 V, also with PID position and velocity control.
The wrist rotation and wrist pitching motors were subject to the least loads during operation, being at the end of the robot's serial kinematic chain and not needing to generate contact forces for gripping. As a result, lower-end AX-12A motors were used for these DoFs, with 1.5 Nm stall torque at 12 V, weighing 54.6 g each, and with only proportional feedback controllers for position and velocity.
The rack-and-pinion length extension mechanism in Model I was direct-driven, with the pinion gear mounted directly on the motor horn (Figure 7c, left). This design was updated to a belt-driven mechanism with a 7:1 transmission ratio and separated pinion gear, resulting in a faster extension speed and lower chance of slippage (Figure 7c, right).
Combined with aluminum sliders instead of steel, these design choices resulted in improved ergonomics and weight distribution.

Electronics
The motors in the WRF communicate at 1 Mbps over a TTL protocol, attached serially in a daisy-chain fashion (Figure 8). The arm was tethered, receiving control commands from a PC, connected using a Xevelabs USB2AX v3.2a USB to TTL Dynamixel Servo Interface. It was powered by a 12 V, 5 A DC supply through an SMPS2Dynamixel Adapter. The MX-64T and MX-28T motors had onboard Cortex M3 CPUs, while the AX-12A had an Atmega8-16AU CPU.

Forward Kinematics
The WRF consists of a serial kinematic chain attached to the human forearm (Figure 9). Generally, the human arm can be represented as a 7-DoF chain [20]. However, since the robot's motion is unaffected by human wrist movements, we used a reduced 5-DoF model, with three joints at the shoulder and two at the elbow. The forward kinematics of each of these serial chains was described with coordinate frames derived using the Denavit-Hartenberg (D-H) convention [21], resulting in a homogeneous transformation matrix $T_0^n$ between the frame $H_0$ at the origin (human shoulder joint) and the frame $H_5$ at the human's hand:
$$T_0^n = \prod_{i=1}^{n} A_i(\alpha_i, a_i, d_i, \theta_i) \qquad (1)$$
Here $n = 5$ is the number of joints, $A_i$ is the homogeneous transformation contributed by joint $i$, and $(\alpha_i, a_i, d_i, \theta_i)$ are the D-H parameters for human arms.
In Table 2 (top), the anthropometric parameters and ranges of motion have been adapted from [20] and the NASA Man-System Integration Standards [22].
Table 2. D-H parameters for human arm and WRF Models.
Similar to Equation (1), transformation matrices $U_0^m$ can be constructed for the WRF models using the D-H parameters listed in Table 2, with $m = 6$ for Model III, and concatenated with $T_0^n$ to get the transformation $S_0^{n+m}$ for the combined human-robot model:
$$S_0^{n+m} = T_0^n \, U_0^m$$
To account for the attachment point offset between the human and robot, parameters for the fifth DoF in $T_0^n$ were modified to $a_5 = 0.075$ m, $d_5 = 0.016$ m in $S_0^{n+m}$.
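The chaining of D-H transforms for the human arm and the robot can be illustrated with a minimal sketch. It assumes the classical D-H convention (the convention used in [21] may differ), and the parameter rows below are placeholders chosen for illustration, not the values from Table 2; the function names are likewise hypothetical.

```python
import numpy as np

def dh_transform(alpha, a, d, theta):
    """Homogeneous transform for one joint (classical D-H convention, assumed)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def chain_transform(dh_rows):
    """Multiply the per-joint transforms in order, from base to tip."""
    T = np.eye(4)
    for alpha, a, d, theta in dh_rows:
        T = T @ dh_transform(alpha, a, d, theta)
    return T

# Placeholder rows (alpha, a, d, theta); NOT the values from Table 2.
human_rows = [(0.0, 0.30, 0.0, np.pi / 2), (0.0, 0.25, 0.0, 0.0)]
robot_rows = [(0.0, 0.20, 0.0, 0.0)]

T_human = chain_transform(human_rows)   # plays the role of T for the human arm
U_robot = chain_transform(robot_rows)   # plays the role of U for the robot
S_combined = T_human @ U_robot          # combined human-robot transform
end_effector_position = S_combined[:3, 3]
```

Because matrix multiplication is associative, concatenating the two chains gives the same result as building a single chain from all rows, which is what allows the human and robot models to be specified separately.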

Inverse Kinematics
The inverse kinematics (IK) problem involves finding the values of the joint variables for a desired position and orientation (pose) of the end-effector. To find the IK for WRF Model III for a fixed human pose, we assigned coordinate frames according to the D-H convention, starting from the human arm attachment point, $O_0$, to the mid-point of the robot end-effector's fingers, $O_6$ (Figure 10).
The robot is over-constrained, having five articulated DoFs instead of six, resulting in no guaranteed solutions to the general position and orientation IK problem [23]. However, in most situations, the WRF's wrist remains vertical, with $\theta_5 = \pi/2$. In this case, the position-only IK problem has an analytical solution in the first three DoFs. The position vector $P = (x, y, z)^T$ between the base frame $O_0$ and end-effector frame $O_6$ is a part of the transformation matrix $U_0^6$, and can be written in terms of the D-H parameters. The joint variables for the first three DoFs, $\theta_1$, $\theta_2$, and $d_3$, can then be computed for a given $P$. A more detailed analysis of the forward and inverse kinematics of the WRF, including for variable wrist orientations, has been presented in [24].
Another approach for solving the position-only IK problem with variable wrist orientation is by approximating the change in joint variables ($\Delta\theta$) required for a small change in end-effector position ($\Delta P$). This involves determining the Jacobian matrix $J$ for the transformation between $P$ and $\theta$, followed by computing its Moore-Penrose inverse (pseudoinverse) to find the change in joint angles [25]. Each element of the Jacobian matrix $J$ is defined as:
$$J_{ij} = \frac{\partial P_i}{\partial \theta_j}$$
For the WRF's position-only IK, $J$ is a $3 \times 5$ matrix such that:
$$\Delta P = J \, \Delta\theta$$
This leads to the following approximate solution for $\Delta\theta$, involving $J^{+}$, the pseudoinverse of $J$:
$$\Delta\theta \approx J^{+} \, \Delta P$$
This approach resulted in a fast computational method to implement IK for the WRF, further used for end-effector stabilization.
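The pseudoinverse update can be sketched as an iterative position-only IK solver. The kinematic function below is a hypothetical stand-in (a pan, pitch, extension arm with a short two-DoF wrist), not the WRF's actual D-H model, and the Jacobian is approximated by finite differences rather than derived analytically; both choices are assumptions made to keep the example self-contained.

```python
import numpy as np

GRIPPER_LENGTH = 0.1  # metres; illustrative value only

def forward_position(q):
    """Hypothetical position kinematics P(q) for a 5-joint arm (not the WRF model)."""
    pan, pitch, extension, wrist_roll, wrist_pitch = q
    along = np.array([np.cos(pitch) * np.cos(pan),
                      np.cos(pitch) * np.sin(pan),
                      np.sin(pitch)])
    up = np.array([-np.sin(pitch) * np.cos(pan),
                   -np.sin(pitch) * np.sin(pan),
                   np.cos(pitch)])
    side = np.array([-np.sin(pan), np.cos(pan), 0.0])
    offset = np.cos(wrist_pitch) * along + np.sin(wrist_pitch) * (
        np.cos(wrist_roll) * up + np.sin(wrist_roll) * side)
    return extension * along + GRIPPER_LENGTH * offset

def numerical_jacobian(q, eps=1e-6):
    """3 x 5 Jacobian of the position, estimated by central differences."""
    q = np.asarray(q, dtype=float)
    J = np.zeros((3, q.size))
    for j in range(q.size):
        step = np.zeros(q.size)
        step[j] = eps
        J[:, j] = (forward_position(q + step) - forward_position(q - step)) / (2 * eps)
    return J

def solve_position_ik(q_start, target, max_iterations=100, tolerance=1e-8):
    """Repeat the update q <- q + pinv(J) @ (target - P(q)) until the error is small."""
    q = np.array(q_start, dtype=float)
    for _ in range(max_iterations):
        error = target - forward_position(q)
        if np.linalg.norm(error) < tolerance:
            break
        q = q + np.linalg.pinv(numerical_jacobian(q)) @ error
    return q

q_start = np.array([0.1, 0.2, 0.30, 0.0, 0.5])
target = forward_position(np.array([0.3, 0.1, 0.35, 0.2, 0.7]))
q_solution = solve_position_ik(q_start, target)
residual = np.linalg.norm(forward_position(q_solution) - target)
```

Since there are five joint variables and only three position constraints, the pseudoinverse selects the minimum-norm joint change at each step, so the solver generally returns a joint configuration that differs from the one used to generate the target while still reaching the same position.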

Preliminary Analyses
Along the development cycle, preliminary analyses were performed with the WRF prototypes to validate the changes in design. Model II significantly enhanced the user's reachable workspace volume compared to Model I and the normal human range, while remaining within acceptable limits of biomechanical loads. These conclusions remain valid for Model III as well, indicating that at least physically, the WRF can be an effective augmentation without imposing unreasonable loads on the user.

Workspace Volume
With the WRF, a user can reach objects farther than the normal human range. This enhancement was measured in terms of the total reachable workspace volume, which is the 3D region containing all possible end-effector positions when a mechanism undergoes its full range of motion (RoM).
Using a Monte-Carlo sampling procedure, point clouds of the end-effector positions were collected for the kinematic chains of the human arm, and combined human-robot chains with Models I and II, undergoing their full RoMs. These point clouds were decomposed into 2D horizontal slices, and numerically integrated along the vertical direction to compute the volumes, as described in [26].
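The slice-and-integrate idea can be sketched as follows. This is a simplified illustration, not the procedure of [26]: slice areas are estimated by counting occupied cells on a regular grid, and the sampled mechanism is a hypothetical pan, pitch, extension arm with made-up reach limits. A grid-based estimate of this kind tends to overestimate the volume near the workspace boundary.

```python
import numpy as np

def workspace_volume(points, cell=0.05, dz=0.05):
    """Estimate the volume covered by a 3D point cloud using horizontal slices."""
    points = np.asarray(points, dtype=float)
    slice_index = np.floor(points[:, 2] / dz).astype(int)
    volume = 0.0
    for k in np.unique(slice_index):
        xy = points[slice_index == k, :2]
        # Area of the slice: number of occupied grid cells times the cell area.
        occupied = np.unique(np.floor(xy / cell).astype(int), axis=0)
        slice_area = len(occupied) * cell ** 2
        volume += slice_area * dz  # integrate slice areas along the vertical axis
    return volume

def sample_reach(n, rng, r_min=0.2, r_max=0.6):
    """Monte-Carlo samples of a hypothetical arm over its full joint ranges."""
    pan = rng.uniform(-np.pi, np.pi, n)
    pitch = rng.uniform(-np.pi / 2, np.pi / 2, n)
    reach = rng.uniform(r_min, r_max, n)
    return np.column_stack([
        reach * np.cos(pitch) * np.cos(pan),
        reach * np.cos(pitch) * np.sin(pan),
        reach * np.sin(pitch),
    ])

rng = np.random.default_rng(0)
cloud = sample_reach(200_000, rng)
estimated_volume = workspace_volume(cloud)
```

The cell size and slice thickness trade resolution against the number of samples needed: finer grids follow the boundary more closely but require denser point clouds to avoid leaving interior cells empty.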
The total reachable workspace volume for the human arm alone was found to be 1.003 m³. This was enhanced to 2.389 m³ while wearing Model I, an improvement of 138%. Wearing Model II further increased the total reachable workspace volume to 3.467 m³, an improvement of 246% over the human arm alone, as illustrated in Figure 11.

Biomechanical Loads
The biomechanical loads on a wearer are an important consideration during prolonged usage of the WRF. In typical scenarios, such as those shown in Figure 5, the human's arm remained fairly static, while the robot moved to fetch or grasp an object.
Building on the kinematics, as shown in Figure 12a, the dynamics of interaction between the human arm and robot was modeled as a point force $F_R$ and moment $M_R$, considering them as separate bodies. The biomechanical load consisted of the force norms at the human shoulder and elbow, $F_A$ and $F_B$, and the corresponding moment norms, $M_A$ and $M_B$. The statically determinate scenarios, fetching from below (Figure 5c), and assisted two-person handover (Figure 5d), were considered for this analysis. The forces and moments in these scenarios were computed using the iterative Newton-Euler dynamics algorithm [27]. The peak moment loads on the wearer's shoulder ($M_A$) and elbow ($M_B$) during these tasks were approximately 24.8 Nm and 11.6 Nm, respectively (Figure 12b,c). The peak force loads were approximately 55.8 N at the shoulder ($F_A$), and 31.3 N at the elbow ($F_B$). For comparison, the human shoulder can withstand moment loads of approximately 85 to 130 Nm and force loads of approximately 100 to 500 N, while the elbow can withstand moments of approximately 40 to 80 Nm and forces of 50 to 400 N [28,29].
The anthropometric parameters (link lengths, masses, inertias) for the workspace volume computation, as well as biomechanical load analysis, have been adapted from the NASA Man-System Integration Standards [22].
Details on the procedures for computation of workspace volumes and biomechanical loads can be found in [3]. Concurrently with this article, we have also conducted a more detailed inquiry into the biomechanical effects of the WRF's motion on the user's body at the musculoskeletal level, and have developed trajectory optimization techniques to generate motion plans for the WRF that minimize these effects [30].

End-Effector Stabilization
Having established that the WRF enhances a user's reachable workspace volume while remaining within ergonomic biomechanical load limits, we now consider the interaction effects between the user and the robot. During collaborative activities, disturbances are introduced in the robot's motion plan due to the user's independent arm movements. In order for the WRF to be an effective augmentation, it needs to be able to counteract these disturbances. In this section, we describe strategies for stabilizing the WRF's end-effector while it is worn by a user performing close-range tasks.
We had previously achieved promising stabilization results for small, 2D planar movements using time series forecasting of human arm motion [4]. This approach was extended here to include a recurrent neural network (RNN) model for human motion prediction, and applied to 3D stabilization in five common tasks such as wiping a desk and stowing items into drawers.
A direct feedback control strategy is outlined in Figure 13a. The joint angle reference signals for each motor are determined from the poses of the human and WRF detected by an optical motion capture system, as well as the desired pose of the end-effector. The aim is to stabilize the end-effector at a static position in 3D, with the relevant joint angles computed using the inverse kinematics described in Section 3.5.
Figure 13. (a) Overview of direct feedback control; (b) feedback control with the human prediction model incorporated. The predictive models generate motor joint angle references over a finite horizon. This paper compares the end-effector stabilization (a) without human motion prediction, and (b) with human motion prediction included.
It was discovered in [4] that, while the WRF's actuators possess sufficient bandwidth to be effective in a direct feedback control strategy even with stock tuning, in practice, their performance is hindered due to sensing and actuation delays. A predictive approach, where the user's arm motion is determined over a finite horizon, was found to mitigate these effects. Linear system identification techniques were applied to estimate the delays and determine the prediction horizon.
Two approaches were considered for predicting human motion over this horizon for extended tasks in 3D: an autoregressive (AR) time series model as in [4], and a recurrent neural network (RNN) model adapted from [31]. These models take in the poses for the WRF and human, and generate a sequence of joint angle references over the time horizon. Both of these approaches were trained offline using the KIT Whole-Body Human Motion Database [10] and adapted for online predictive control through the framework shown in Figure 13b.

System Identification
As a precursor to the application of human motion prediction for stabilizing the WRF, the dynamic response of its motors was studied in typical usage scenarios, and system identification was performed to recover the in situ motor parameters for Model III. This allowed for the estimation of sensing and actuation delays in the physical system by adding a delay term to the linear models, and fitting to data from the motion capture system.
Each of the Dynamixel motors used in the robot has a built-in PID controller, apart from the AX-12A motors for wrist rotation and pitching that only have proportional control. Each motor receives a reference angle $\theta_R$ as input from the PC, driving a DC motor plant, with output angle $\theta$ measured using built-in encoders (Figure 14a).
The plant transfer function $G(s)$ between voltage $V$ and output angle $\theta$ is based on an L-R circuit DC motor model [32], resulting in a third-order system in terms of parameters $\alpha_0$, $\gamma_0$ and $\gamma_1$:
$$G(s) = \frac{\theta(s)}{V(s)} = \frac{\alpha_0}{s^3 + \gamma_1 s^2 + \gamma_0 s}$$
During system identification, the PID controller's transfer function $C(s)$ used manufacturer-supplied values for the gains $K_p = 4$, $K_i = 0$, and $K_d = 0$. This resulted in the closed-loop transfer function $P(s)$ between the motor output angle $\theta$ and reference signal $\theta_R$ being a third-order system with no zeros:
$$P(s) = \frac{\theta(s)}{\theta_R(s)} = \frac{B_0}{s^3 + A_2 s^2 + A_1 s + A_0} \qquad (10)$$
The closed-loop model parameters $A_{0-2}$ and $B_0$ were fit to the measured output signals using the Simplified Refined Instrumental Variable method for Continuous-time model identification (SRIVC) [33]. No explicit delays were assumed in this transfer function since the encoders are built into the motors. The plant parameters $\alpha_0$, $\gamma_0$ and $\gamma_1$ were then obtained from $A_{0-2}$ and $B_0$. Each DoF was identified individually, keeping all other motors fixed, and the magnitudes of the step reference input signals were determined from the usage scenarios (e.g., steps of 0.7 rad over 2 s for DoF-1 as shown in Figure 14b).
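To make the structure of the closed-loop model concrete, the sketch below simulates the step response of a third-order, zero-free transfer function with a fixed-step Runge-Kutta integrator. It assumes a plant of the form $\alpha_0 / (s^3 + \gamma_1 s^2 + \gamma_0 s)$ under unity feedback with a purely proportional controller, in which case the closed-loop coefficients follow directly from the plant parameters. The numerical plant values are made up for illustration and are not the identified values from Table 3.

```python
import numpy as np

def closed_loop_coefficients(alpha0, gamma0, gamma1, kp):
    """Closed loop of kp * alpha0 / (s^3 + gamma1 s^2 + gamma0 s) with unity feedback.

    Returns B0 and (A2, A1, A0) of P(s) = B0 / (s^3 + A2 s^2 + A1 s + A0).
    Assumes Ki = Kd = 0, so the controller adds no zeros.
    """
    b0 = kp * alpha0
    return b0, (gamma1, gamma0, b0)

def step_response(b0, a_coeffs, step=0.7, duration=2.0, dt=1e-3):
    """Integrate the closed-loop model for a step reference using RK4."""
    a2, a1, a0 = a_coeffs
    # State x = [theta, d(theta)/dt, d2(theta)/dt2] in companion form.
    A = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-a0, -a1, -a2]])
    B = np.array([0.0, 0.0, b0])

    def derivative(x):
        return A @ x + B * step

    n = int(round(duration / dt))
    x = np.zeros(3)
    theta = np.zeros(n + 1)
    for k in range(n):
        k1 = derivative(x)
        k2 = derivative(x + 0.5 * dt * k1)
        k3 = derivative(x + 0.5 * dt * k2)
        k4 = derivative(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        theta[k + 1] = x[0]
    return np.arange(n + 1) * dt, theta

# Made-up plant parameters; Kp = 4 matches the stock gain quoted in the text.
b0, a_coeffs = closed_loop_coefficients(alpha0=2000.0, gamma0=900.0, gamma1=60.0, kp=4.0)
t, theta = step_response(b0, a_coeffs)  # 0.7 rad step over 2 s
```

Under these assumptions $B_0 = A_0$, so the simulated output settles at the reference value; a fitted model with $B_0 \neq A_0$ would instead show a steady-state offset.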
The accuracy of the identified system models was evaluated by computing the Normalized Root Mean Squared Error (NRMSE) goodness of fit between the output signals measured by the encoders and the simulated motor model outputs, for the same reference input. The plant parameters and model fitting metrics for each DoF are listed in Table 3.
Having obtained the open-loop plant transfer function parameters for each of the DoFs in the WRF, we can augment these models to estimate the sensing and actuation delays in the overall system.
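A goodness-of-fit score of this kind can be computed as in the sketch below. It assumes the commonly used definition in which the residual norm is normalized by the deviation of the measurement from its mean, so that 1 indicates a perfect fit and 0 indicates a fit no better than a constant at the mean; the exact normalization used for Table 3 is not stated in the text. The sample values are illustrative only.

```python
import numpy as np

def nrmse_fit(measured, simulated):
    """NRMSE goodness of fit: 1 - ||y - y_hat|| / ||y - mean(y)|| (assumed definition)."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual_norm = np.linalg.norm(measured - simulated)
    spread_norm = np.linalg.norm(measured - measured.mean())
    return 1.0 - residual_norm / spread_norm

# Illustrative encoder readings and model outputs for a 0.7 rad step.
measured_angle = np.array([0.00, 0.18, 0.45, 0.62, 0.69, 0.70])
simulated_angle = np.array([0.00, 0.20, 0.43, 0.63, 0.68, 0.70])
fit = nrmse_fit(measured_angle, simulated_angle)
```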

Delay Estimation
The first step in developing predictive models was to estimate the time horizon for predictions over which the WRF's motors need to be controlled to compensate for sensing and actuation delays. This time horizon $h$ (Figure 15a) was determined by system identification using the linear model described in Equation (10) with a delay term $\tau_d$ included:
$$\frac{\theta(s)}{\theta_R(s)} = \frac{B_0}{s^3 + A_2 s^2 + A_1 s + A_0} \, e^{-\tau_d s} \qquad (11)$$
Here $\theta$ is the motor response to an input step signal $\theta_R$, reconstructed through the inverse kinematics equations in Section 3.5 using data from the motion capture system (Figure 15b). The other terms in the transfer function, $A_{0-2}$ and $B_0$, were obtained from the system identification performed earlier by using the parameters in Table 3 and the stock PID control gains $K_p = 4$, $K_i = 0$, and $K_d = 0$. This allowed for the isolation of system delays in the motion capture and communication channels from the in situ motor dynamics.
The delay $\tau_d$ was estimated to be 86 ms using the same SRIVC method as before, averaged across DoFs 1-3 which showed relatively slower responses due to larger loads. This corresponded to a prediction time horizon $h$ of about 10 time steps for the OptiTrack motion capture system used in this work with a frame rate of 120 Hz [34].
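As a quick check of this figure, assuming the horizon is simply the estimated delay expressed in motion capture frames (with $f_s$ denoting the frame rate, a symbol introduced here for convenience):

```latex
h \approx \tau_d \, f_s = 0.086\,\mathrm{s} \times 120\,\mathrm{Hz} = 10.32 \approx 10 \text{ time steps}
```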

Previous Work on Planar Stabilization
In previous work [4], we had developed an end-effector stabilization strategy for a reduced 2D scenario. The positions of the WRF's base and end-effector were tracked using fiducial markers and a stereo camera (Figure 16a) while the user's arm moved in a periodic manner in the XY plane with small displacements of approximately 15 cm from an initial position at frequencies typically less than 1 Hz. Using the identified linear system models through the procedure described in Section 5.1, the step response characteristics were estimated for the DoF-1 and DoF-3 motors (Table 4). In particular, the bandwidth for both motors was found to be above 1 Hz, which should have been sufficient to stabilize the WRF against small, planar human arm motions through a direct feedback control strategy (Figure 13a). However, this performance was affected by delays in sensing and actuation. After estimating these delays using similar linear models (Section 5.1.1), an autoregressive (AR) time series model for human arm motion was developed to determine the joint angle reference signals for DoF-1 and DoF-3 using the approach shown in Figure 13b to stabilize the end-effector in 2D. Compared to a direct feedback control approach, the AR model helped reduce position errors by 19.4% in X and 20.1% in Y (Figure 16b).
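The idea of forecasting periodic arm motion with an AR model can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from [4]: the coefficients are fit by ordinary least squares, the model order is chosen by hand, and the training signal is a synthetic, noise-free 0.5 Hz oscillation of 15 cm amplitude sampled at 120 Hz.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of x[t] = c + sum_k phi[k] * x[t-1-k]."""
    x = np.asarray(series, dtype=float)
    # Each row holds the `order` previous samples, most recent first.
    rows = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    design = np.column_stack([np.ones(len(rows)), rows])
    coeffs, *_ = np.linalg.lstsq(design, x[order:], rcond=None)
    return coeffs[0], coeffs[1:]

def forecast(history, intercept, phi, horizon):
    """Roll the AR model forward, feeding each prediction back as an input."""
    window = list(np.asarray(history, dtype=float)[-len(phi):])
    predictions = []
    for _ in range(horizon):
        next_value = intercept + float(np.dot(phi, window[::-1]))
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return np.array(predictions)

# Synthetic arm displacement: 0.15 m amplitude at 0.5 Hz, sampled at 120 Hz.
frame_rate = 120.0
t = np.arange(240) / frame_rate
displacement = 0.15 * np.sin(2 * np.pi * 0.5 * t)

intercept, phi = fit_ar(displacement[:230], order=2)
predicted = forecast(displacement[:230], intercept, phi, horizon=10)
max_error = np.max(np.abs(predicted - displacement[230:]))
```

A pure sinusoid satisfies a second-order linear recurrence exactly, so an order of 2 suffices in this idealized case; real arm motion is noisy and only approximately periodic, which would call for a higher order and leave a nonzero forecast error.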
Related work in this domain includes stabilization of SR limbs using a time-series model of the forces and torques due to the wearer's change in posture [35], as well as modeling of hand tremors as Fourier series for tool-tip compensation in a handheld surgical device [36]. This literature informed the choice of AR models for predictive control of the WRF, both in [4], as well as being applied to the full 3D case here.

Human Motion Prediction
The estimated system time delays for the WRF served as prediction horizons for the human motion prediction models for end-effector stabilization. The criteria for these models were real-time (or close to real-time) prediction with optical motion capture data, and good performance over the required controller time horizon in close-range tasks.
Two methods were utilized for this purpose: an autoregressive (AR) time series model, and a single-layered gated recurrent unit (GRU) adapted from [31] and modified for real-time performance. Both of these models were trained offline using the KIT Whole-Body Human Motion Database [10], available at [37]. It consists of a wide selection of task and motion scenarios, with annotated recordings from optical motion capture systems, raw video, as well as auxiliary sensors (e.g., force plates). For this work, we utilized labeled human skeleton marker data (Figure 17) from nine tasks in the database that involved periodic movement of the subject's right arm. They are listed in Table 5 along with the number of trials performed for each task, and the total number of data points with human right arm movements extracted from all trials.
The full-body skeleton marker set consists of 56 points, out of which 10 are relevant for prediction of human right arm motion, with the positions on the body determined by a weighted sum of the individual 3D positions of the markers (Figure 17b): 3 for the clavicle (C), 3 for the shoulder (S), 3 for the elbow (E), and 4 for the wrist (W).
Three relative position vectors were generated from the four body points: $v_1 = CS$, $v_2 = SE$, and $v_3 = EW$. This allowed for prediction of movements of a particular body segment independent of its previous neighbor, and improved the training accuracy of the models.
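As a minimal sketch of this preprocessing step, the code below collapses a group of markers into one body point and stacks the three relative vectors into a 9-dimensional feature. The equal marker weights and the marker coordinates are made-up placeholders, since the actual weights are not given here, and both function names are hypothetical.

```python
import numpy as np


def body_point(markers, weights):
    """Weighted sum of marker positions, shape (n_markers, 3), giving one 3D point."""
    weights = np.asarray(weights, dtype=float)
    return weights @ np.asarray(markers, dtype=float)


def body_vectors(C, S, E, W):
    """Stack v1 = CS, v2 = SE, v3 = EW into one 9-dimensional feature vector."""
    v1 = S - C  # clavicle to shoulder
    v2 = E - S  # shoulder to elbow
    v3 = W - E  # elbow to wrist
    return np.concatenate([v1, v2, v3])


# Placeholder data: three elbow markers with equal weights (an assumption).
elbow_markers = np.array([[0.30, 0.10, 1.20],
                          [0.32, 0.12, 1.18],
                          [0.31, 0.08, 1.22]])
E = body_point(elbow_markers, [1 / 3, 1 / 3, 1 / 3])

# Placeholder body points for the clavicle, shoulder, and wrist (metres).
C = np.array([0.00, 0.00, 1.45])
S = np.array([0.18, 0.02, 1.42])
W = np.array([0.45, 0.30, 1.10])

features = body_vectors(C, S, E, W)
print(features.shape)  # (9,)
```

Working with relative vectors rather than absolute positions means a translation of the whole body leaves the features unchanged.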

Autoregressive Time Series Model
As in [4], the time series model started with the initial assumption of an Autoregressive Moving-Average (ARMA) process:

$$x_t = C + \sum_{k=1}^{p} A_k x_{t-k} + \sum_{j=1}^{q} B_j \epsilon_{t-j} + \epsilon_t$$

Here $x_t$ is a discrete univariate series, composed of a constant term $C$, past terms $x_{t-k}$ weighted by coefficients $A_k$ for lag $k$ (AR term), and past white noise terms $\epsilon_{t-j}$ weighted by the coefficients $B_j$. The numbers of past terms, $p$ and $q$, determine the orders of the AR and MA parts, respectively.
Each component of the relevant body vectors $v_1$, $v_2$, and $v_3$ was considered to be an independent univariate series. The stationarity of these series was verified with augmented Dickey-Fuller hypothesis tests [38].
The autocorrelation ($\rho_k$) and partial autocorrelation ($r_k$) functions at lags $k$ were computed for these series. There were sharp drop-offs in $r_k$ compared to $\rho_k$ over successive lags for each component of the body vectors, illustrated in Figure 18 for the X component of $v_2$. This indicated that the ARMA processes could be simplified into purely autoregressive (AR) models [39]:

$$x_t = C + \sum_{k=1}^{p} A_k x_{t-k} + \epsilon_t$$

The model order $p$ for each of the nine components in the body vectors was determined using the Akaike Information Criterion (AIC), a maximum-likelihood measure of the goodness of fit [40]. The AIC was computed for model orders up to 30 for each of the nine series, and the one with minimum AIC was selected as $p$ for that series. The minimum AIC values were obtained at different model orders for each series, ranging from $p = 18$ to $p = 25$. The model parameters $A_k$, $C$, and $\epsilon_t$ were determined using the Yule-Walker method [41], trained on the task motions listed in Table 5.
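The fitting procedure can be illustrated with a small NumPy sketch: Yule-Walker estimation from sample autocovariances, order selection by minimum AIC, and a multi-step forecast by iterating the AR recursion. This is not the authors' code. The AIC form `n * log(sigma^2) + 2p` is one common Gaussian approximation chosen here as an assumption, and the synthetic AR(2) series stands in for a body-vector component.

```python
import numpy as np


def autocovariance(x, max_lag):
    """Biased sample autocovariances r_0 .. r_max_lag of a 1D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([x[: n - k] @ x[k:] / n for k in range(max_lag + 1)])


def yule_walker(x, p):
    """Return AR coefficients A (length p), constant C, and noise variance."""
    r = autocovariance(x, p)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    A = np.linalg.solve(R, r[1 : p + 1])   # Yule-Walker equations R A = r
    sigma2 = r[0] - A @ r[1 : p + 1]       # innovation variance
    C = np.mean(x) * (1.0 - A.sum())       # constant term implied by the mean
    return A, C, sigma2


def select_order_by_aic(x, max_order=30):
    """Fit orders 1..max_order and keep the one with the smallest AIC."""
    n = len(x)
    best = None
    for p in range(1, max_order + 1):
        A, C, sigma2 = yule_walker(x, p)
        aic = n * np.log(sigma2) + 2 * p   # assumed AIC form
        if best is None or aic < best[0]:
            best = (aic, p, A, C)
    return best[1], best[2], best[3]


def forecast(history, A, C, steps):
    """Iterate x_t = C + sum_k A_k x_{t-k}, feeding predictions back in."""
    p = len(A)
    buf = list(history[-p:])
    out = []
    for _ in range(steps):
        # A[0] multiplies the most recent sample, A[1] the one before, etc.
        nxt = C + sum(A[k] * buf[-1 - k] for k in range(p))
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)


# Synthetic stationary AR(2) series used only for illustration.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(0.0, 0.01)

A2, C2, var2 = yule_walker(x, 2)
p_best, A_best, C_best = select_order_by_aic(x, max_order=30)
prediction = forecast(x, A_best, C_best, steps=10)  # 10-step horizon
```

In the setting described above, one such model would be fitted per component, giving nine independent predictors.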

Recurrent Neural Network Model
While an AR model is able to forecast human motions through local predictions, it does not capture dependencies over a longer time period, or encode structural information about the correlations between body components over time. To account for these factors and improve on the predictions from the AR models, we used a recurrent neural network (RNN) model for human arm motion prediction, and compared the performance between the methods.
Independent of robotics, RNNs have been applied extensively for human motion prediction, including architectures with Long Short-Term Memory (LSTM) cells [42], and structural RNNs that encapsulate semantic knowledge through spatio-temporal graphs [43]. These approaches include multiple recurrent layers as they are aimed at offline prediction of the entire human skeleton, and task classification in general motion scenarios. As the task scenarios for WRF stabilization involve periodic motions and require prediction of only the wearer's arm, we used a simpler model with a sequence-to-sequence architecture [44] and a single Gated Recurrent Unit (GRU), as proposed in [31], which also includes a residual connection for modeling velocities. Compared to an AR model, this resulted in higher prediction accuracy of human arm motion, and improved the end-effector stabilization in most task scenarios.
The schematic of the RNN model is shown in Figure 19a. It consists of an encoder network that takes in a 9-dimensional input of the body vectors, $[v_1, v_2, v_3]$, 50 frames at a time from the KIT database or motion capture system, and a decoder network that converts the output from a single GRU cell with 1024 units into 9-dimensional predictions over $k$ steps. Based on the estimated system delay, we set $k = 10$, and the learning rate to be 0.05 for batch sizes of 16, as specified in [31] for predictions up to 400 ms. This RNN model was trained on the KIT Database motions listed in Table 5, and converged at about 5000 iterations, as shown in Figure 19b with Mean-Squared Error (MSE) losses.
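To make the data flow concrete, the following NumPy sketch runs the forward pass of a single GRU cell with a linear read-out and a residual connection, decoding $k = 10$ steps from a 50-frame window. It is an untrained illustration with random weights, not the trained model. Sharing one cell between the encoder and decoder, the gate convention, and the class and method names are assumptions made for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ResidualGRU:
    """Single GRU cell with a linear read-out and a residual output connection."""

    def __init__(self, n_features=9, n_hidden=1024, seed=0):
        rng = np.random.default_rng(seed)
        n_in = n_features + n_hidden
        scale = 0.01  # small random weights; a trained model would load its own
        self.Wz = rng.normal(0.0, scale, (n_hidden, n_in))  # update gate
        self.Wr = rng.normal(0.0, scale, (n_hidden, n_in))  # reset gate
        self.Wh = rng.normal(0.0, scale, (n_hidden, n_in))  # candidate state
        self.bz = np.zeros(n_hidden)
        self.br = np.zeros(n_hidden)
        self.bh = np.zeros(n_hidden)
        self.Wo = rng.normal(0.0, scale, (n_features, n_hidden))  # read-out
        self.bo = np.zeros(n_features)
        self.n_hidden = n_hidden

    def step(self, x, h):
        """One GRU update of the hidden state h given input frame x."""
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh + self.bz)
        r = sigmoid(self.Wr @ xh + self.br)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]) + self.bh)
        return (1.0 - z) * h + z * h_tilde

    def predict(self, window, k=10):
        """Encode all but the last frame, then decode k future frames."""
        h = np.zeros(self.n_hidden)
        for x in window[:-1]:          # encoder pass over the observed frames
            h = self.step(x, h)
        x = window[-1]
        preds = []
        for _ in range(k):             # decoder feeds its own output back in
            h = self.step(x, h)
            x = x + self.Wo @ h + self.bo  # residual: network outputs a delta
            preds.append(x)
        return np.stack(preds)


model = ResidualGRU(n_features=9, n_hidden=1024)
window = np.random.default_rng(1).normal(0.0, 0.1, size=(50, 9))  # placeholder input
predictions = model.predict(window, k=10)
print(predictions.shape)  # (10, 9)
```

Because of the residual connection, the read-out only has to model frame-to-frame changes, which is why it is described as modeling velocities.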

Model Evaluation
Both models were evaluated on the relevant motions from the KIT Database listed in Table 5. They were trained offline using all but two trials for each task, with one of the remaining trials serving as the validation set, and the other as the test set. The training set was expanded to four times its original size by adding Gaussian white noise with standard deviation 1 cm to each of the nine components of the body vectors, leading to 89,864 data points for training. The test and validation sets had 18,922 and 15,042 data points, respectively.
The Root-Mean-Square (RMS) prediction errors were computed on the test set for both models, and are listed in Table 6. While the RNN model did not improve upon the AR model for every component, it reduced the prediction errors in the components with the worst performance using AR (Figure 20). The RNN model also performed better overall, with an average RMS error of ~0.90 cm, compared to ~1.25 cm for the AR model. Figure 21 shows that while the RNN model tended to overshoot the ground truth, and be offset from it, the tracking of overall motion trends was better than the AR model.
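A noise-based augmentation of this kind might look like the sketch below. It assumes that "four times" means the original data plus three noisy copies and that positions are stored in metres (so 1 cm is 0.01); neither detail is stated explicitly. The array size 22,466 is simply 89,864 / 4 and is used only to check the arithmetic.

```python
import numpy as np


def augment_with_noise(data, factor=4, sigma=0.01, seed=0):
    """Return the original rows followed by (factor - 1) noisy copies."""
    rng = np.random.default_rng(seed)
    copies = [data]
    for _ in range(factor - 1):
        # Independent Gaussian white noise on every component of every frame.
        copies.append(data + rng.normal(0.0, sigma, size=data.shape))
    return np.concatenate(copies, axis=0)


data = np.zeros((22466, 9))          # placeholder for 9-dimensional body vectors
augmented = augment_with_noise(data)
print(augmented.shape)               # (89864, 9)
```

For sequence models, each noisy copy would be treated as a separate trial rather than as a continuation of the previous one.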

Implementation on the WRF
With two predictive models for human arm motion that performed well on the KIT Database, we applied them to stabilize the WRF's end-effector at an initial pose when subjected to disturbances due to movement of the user's right arm. For validation of these models, we considered five task scenarios, shown in Figure 22, that involved periodic arm movements of relatively small magnitude: (a) tracing a line of length 10 cm, (b) tracing a circle of diameter 10 cm, (c) wiping a desk top, (d) painting with small brush strokes on a canvas, and (e) placing ten objects into shelves of a table-top drawer unit. Each task was performed for ~5 min, with each iteration lasting between 5 s (for tracing lines) and 30 s (for placing objects) depending on the complexity of the task. The initial end-effector pose was selected to be on the right of the user and below them, so as to not impede the task.
Optical markers were placed on the user's right hand and elbow, as well as on the WRF's end-effector and near the DoF-1 motor (Figure 23).
These markers were tracked at 120 Hz using an OptiTrack motion capture system [34]. The raw marker position data was smoothed and filtered using an IIR low-pass digital filter with transfer function coefficients for 6 Hz normalized cutoff frequency [45], following the techniques discussed in [46,47].
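The filter order and coefficients are not given above, so the sketch below substitutes the simplest causal IIR low-pass, a first-order recursive filter, with a 6 Hz cutoff at the 120 Hz frame rate. It illustrates the smoothing step only. The function name and the choice of a first-order design are assumptions.

```python
import numpy as np


def lowpass_iir(samples, cutoff_hz=6.0, frame_rate_hz=120.0):
    """First-order IIR low-pass applied along axis 0 (one row per frame)."""
    samples = np.asarray(samples, dtype=float)
    dt = 1.0 / frame_rate_hz
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = dt / (rc + dt)               # smoothing factor in (0, 1)
    out = np.empty_like(samples)
    out[0] = samples[0]
    for n in range(1, len(samples)):
        # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out[n] = out[n - 1] + alpha * (samples[n] - out[n - 1])
    return out


# Placeholder signal: slow 1 Hz arm motion plus a 40 Hz jitter component.
t = np.arange(600) / 120.0
slow = np.sin(2.0 * np.pi * 1.0 * t)
fast = 0.5 * np.sin(2.0 * np.pi * 40.0 * t)
filtered = lowpass_iir(slow + fast)
```

A causal filter of this kind adds some phase lag of its own, which contributes to the overall sensing delay that the prediction horizon has to cover.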
In all the scenarios shown in Figure 22, the body vector $v_1$ was assumed to be constant in each task, as the human shoulder and torso remained almost stationary at their initial positions. The other relevant points to be tracked, $B$ (base position of the WRF) and $R$ (position of the end-effector), are shown in Figure 23. We aimed to keep the end-effector static at the initial point $R = R_0$ at the start of each task. If the user's arm were to move, the end-effector would also move by an amount $\Delta P = R_t - R_0$ at time $t$. To generate appropriate setpoints for the WRF's motors, $\Delta P$ is converted from the global frame $G$ (fixed lab frame) to the robot's base frame $B$. Using the convention $T^B_A$ for the homogeneous transformation of the pose of frame $B$ as seen in frame $A$, we need to convert from $T^R_G$ to $T^R_B$. Using the elbow frame $E$ as an intermediate,

$$T^R_B = \left(T^B_E\right)^{-1} \left(T^E_G\right)^{-1} T^R_G$$

The transformation between the robot base $B$ and elbow $E$ is constant, while the transformation $T^E_G$ consists of two variable parts: the rotation matrix $R^E_G$ between the elbow and ground frames, and the position of the elbow, $P_E$, which is tracked directly by the motion capture system. $R^E_G$ is the rotation matrix that takes the unit vector along the local X-axis, $\hat{i} = [0, 0, 1]^T$, and aligns it with the unit vector along the human forearm, $\hat{v}_3$, in the ground frame.

Figure 22. Scenarios in which the WRF's end-effector was stabilized while the user performed a task.

Using the approximate method for position-only inverse kinematics (Jacobian pseudoinverse) discussed in Section 3, the change in WRF joint variables can be determined from this displacement expressed in the base frame. At time $t$, this gives the desired setpoint reference for each motor used for direct feedback control, $\hat{\theta}_d[t]$.

Following the procedure shown in Figure 13b, the predictive models were used to generate setpoint references over a time horizon of ~86 ms for each motor in the WRF:

$$\theta_d[t + \Delta t_i] = \hat{\theta}_d[t] + \Delta\theta_i$$

For a stereo camera frame received at time $t$, a sequence of $k = 10$ joint angle references $\theta_d$ was sent to each motor, with $\Delta t_i \in [0, 86]$ ms, $i \in [1, k]$. As described above, $\hat{\theta}_d[t]$ is the desired joint angle in direct feedback control, computed using inverse kinematics for the detected human and robot poses at time $t$. The predictions from the AR and RNN models are represented as residuals $\Delta\theta_i$ added to $\hat{\theta}_d[t]$.
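The frame conversion can be sketched as follows. With the stated convention, the chain $T^R_G = T^E_G \, T^B_E \, T^R_B$ holds, so the end-effector pose in the base frame follows by inverting the first two factors. The aligning rotation is built with Rodrigues' formula, which picks the minimal rotation and leaves roll about the forearm unconstrained; that choice, all numerical values, and the function names are assumptions for illustration.

```python
import numpy as np


def rotation_aligning(a, b):
    """Minimal rotation matrix taking the direction of a onto the direction of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(a @ b)
    if np.isclose(c, -1.0):
        # Antiparallel case: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + c)   # Rodrigues' formula


def homogeneous(R, p):
    """Build a 4x4 homogeneous transform from a rotation and a position."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T


def end_effector_in_base(T_R_G, v3, P_E, T_B_E, local_axis):
    """T^R_B = (T^B_E)^-1 (T^E_G)^-1 T^R_G, with T^E_G built from v3 and P_E."""
    T_E_G = homogeneous(rotation_aligning(local_axis, v3), P_E)
    return np.linalg.inv(T_B_E) @ np.linalg.inv(T_E_G) @ T_R_G


# Placeholder values (metres); the local axis is the one quoted in the text.
local_axis = np.array([0.0, 0.0, 1.0])
v3 = np.array([0.2, 0.1, -0.05])                  # forearm direction in G
P_E = np.array([0.4, -0.1, 1.1])                  # tracked elbow position
T_B_E = homogeneous(np.eye(3), [0.05, 0.0, 0.02])  # hypothetical constant mount
T_R_B_true = homogeneous(np.eye(3), [0.3, 0.0, 0.1])

# Synthesize the pose seen by motion capture, then recover it in the base frame.
T_E_G = homogeneous(rotation_aligning(local_axis, v3), P_E)
T_R_G = T_E_G @ T_B_E @ T_R_B_true
T_R_B = end_effector_in_base(T_R_G, v3, P_E, T_B_E, local_axis)
```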
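A position-only pseudoinverse step and the assembly of the setpoint sequence could be sketched as below. The forward kinematics `toy_fk` is a hypothetical 3-DoF arm, not the WRF kinematics from Section 3, and the finite-difference Jacobian, the sign convention for the correction, and the zero residuals are all illustrative assumptions.

```python
import numpy as np


def toy_fk(theta):
    """Hypothetical 3-DoF arm (yaw, pitch, extension); NOT the WRF kinematics."""
    yaw, pitch, ext = theta
    r = 0.3 + ext  # reach in metres
    return np.array([r * np.cos(pitch) * np.cos(yaw),
                     r * np.cos(pitch) * np.sin(yaw),
                     r * np.sin(pitch)])


def numerical_jacobian(fk, theta, eps=1e-6):
    """Finite-difference position Jacobian (3 x n) of a forward-kinematics map."""
    theta = np.asarray(theta, dtype=float)
    p0 = fk(theta)
    J = np.zeros((3, len(theta)))
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        J[:, i] = (fk(theta + d) - p0) / eps
    return J


def ik_step(fk, theta, delta_p):
    """Joint change that moves the end-effector by delta_p, to first order."""
    return np.linalg.pinv(numerical_jacobian(fk, theta)) @ delta_p


def setpoint_sequence(theta_hat_d, residuals):
    """theta_d[t + dt_i] = theta_hat_d[t] + dtheta_i, one row per horizon step."""
    return theta_hat_d[None, :] + residuals


theta = np.array([0.2, 0.3, 0.05])            # current joint variables
delta_P = np.array([0.005, -0.003, 0.002])    # disturbance in the base frame (m)

# Assumed sign convention: the arm must move by -delta_P to cancel the disturbance.
dtheta = ik_step(toy_fk, theta, -delta_P)
theta_hat_d = theta + dtheta                   # direct feedback setpoint

residuals = np.zeros((10, 3))                  # placeholder model predictions, k = 10
sequence = setpoint_sequence(theta_hat_d, residuals)
print(sequence.shape)  # (10, 3)
```

Because the step is a first-order approximation, it is accurate only for small displacements, which matches the small-magnitude tasks considered here.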
During implementation, it was found that the AR model could generate predictions nearly in real time, though requiring a few seconds of sensor data collection to initialize the predictors at the start of each task. In comparison, the RNN model had lags of up to ~50 ms due to computational bottlenecks when predicting over the specified time horizon. To account for these lags, the pre-trained RNN model was executed in parallel with the AR model. Until a prediction was received from the RNN model, the AR prediction was used for computing $\theta_d$. Depending on the amount of lag, determined through time stamps, a corresponding number of RNN predictions were discarded (typically the first 5-6 terms), and the remaining ones were added to the sequence $\Delta\theta_i$ to be sent to the motors.
This implementation of human motion prediction (RNN + AR) reduced the mean error in end-effector position by up to ~26% over direct feedback control, while the AR model alone was able to improve upon direct feedback control by up to ~19%, as listed in Table 7. Figure 24 shows that the performance of all three control methods varied according to the task, with more structured and periodic motions such as tracing a line and circle showing better stabilization performance compared to motions with less structured or periodic behavior such as stowing items into a drawer.
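One way to read this fallback scheme is sketched below: the AR residuals fill the steps that have already elapsed by the time the RNN output arrives, and the RNN residuals fill the rest. The function name, the array layout, and the rule for converting lag to a number of discarded steps are assumptions. At 120 Hz, a 50 ms lag corresponds to 6 frames, in line with the 5-6 discarded terms mentioned above.

```python
import numpy as np


def merge_predictions(ar_residuals, rnn_residuals, rnn_lag_s, frame_rate_hz=120.0):
    """Combine AR and RNN residual sequences of equal length k (rows = steps)."""
    if rnn_residuals is None:
        # No RNN output received yet: fall back entirely on the AR model.
        return ar_residuals.copy()
    k = len(ar_residuals)
    # Steps that elapsed while the RNN was computing (small epsilon guards
    # against floating-point round-up at exact frame boundaries).
    stale = int(np.ceil(rnn_lag_s * frame_rate_hz - 1e-9))
    stale = min(max(stale, 0), k)
    merged = ar_residuals.copy()
    merged[stale:] = rnn_residuals[stale:]  # discard the stale RNN terms
    return merged


ar = np.zeros((10, 3))    # placeholder AR residuals for 3 motors, k = 10
rnn = np.ones((10, 3))    # placeholder RNN residuals
merged = merge_predictions(ar, rnn, rnn_lag_s=0.050)
print(merged[:, 0])  # first 6 steps from AR (0), last 4 from the RNN (1)
```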

Conclusions
This paper summarized a novel configuration for a wearable supernumerary robotic (SR) arm aimed at close-range human-robot collaboration, and studied the performance of human motion prediction for stabilizing its end-effector in illustrative usage scenarios.

Design
The robot's design process was driven by usage contexts determined through taxonomy development and surveys; specific functions were then derived from need-finding through contextual inquiry with construction workers, and further informed by a laboratory study with a physical robot prototype. This process was published earlier in conference proceedings [2]. These led to a robot design that increased the human-reachable workspace by 246%. Furthermore, it supported picking up objects for self-handovers, assisting in human-human handovers, and providing object stabilization. The design added action capabilities while being low in weight and well-balanced enough to stay within human biomechanical load limits. These results were published earlier in conference proceedings as well [3]. Since then, following further interaction studies, another design iteration led to a more dexterous prototype of the WRF, Model III, with five articular DoFs. Additional descriptions of the mechanical architecture, forward and inverse kinematics, and actuation systems of the WRF have been provided here.

Predictive Control
Linear system identification was performed to estimate the delays in sensing and actuation used as time horizons for the human motion prediction models. Previously published work involved 2D planar stabilization of the WRF's end-effector with small human movements [4]. The primary contribution of this article is the extension of this work to 3D scenarios with a wider range of human movements. The human motion prediction models took the form of an autoregressive time series and a recurrent neural network, both trained and validated offline on the KIT Human Motion Database.
These trained models were tested directly on the physical system, resulting in lower mean position errors of the end-effector compared to direct feedback control. However, the absolute improvement in performance was relatively small for the tasks considered here. A potential solution could involve using more powerful and heavier actuators with greater bandwidths, though resulting in larger loads borne by the user during operation. Along with vision-based sensing, systems with wearable sensors such as Inertial Measurement Units (IMUs) mounted on the human and robot might help improve performance.

Future Work
Extensions of this work can include stabilizing over a trajectory in free space, over bulk human motions such as walking, handling heavier grasped objects, and wider human subjects studies to explore variations across tasks and users. These scenarios would be even more challenging due to greater uncertainties in human motion, requiring sensing and actuation with minimal delay. Having the actuation and control systems off-board [48] would, for a limited workspace, sidestep the trade-off between the motor power and weight of an SR device. To allow for a wider range of users for these systems, the length parameterization of the body vectors needs to be non-dimensionalized during training and testing of the predictive models. This can be achieved by separating the positions and orientations of the body vectors, and performing initial calibrations for each user.
While this article addresses the design challenges and stabilization of an SR device in close-range tasks, there are also challenges related to the human-robot collaboration aspects that remain to be studied. The work presented here summarizes the foundations for a research platform to achieve fluent performance in closely entangled human-robot collaborative setups, while accounting for the uncertainty introduced as a consequence of the wearer being an integral part of the system.

Data Availability Statement:
The data for this article may be obtained from the corresponding author upon request.