FPGA Applied to Latency Reduction for the Tactile Internet

Tactile internet applications allow robotic devices to be remotely controlled over a communication medium with an unnoticeable time delay. In bilateral communication, the acceptable round-trip latency usually ranges from 1 ms to 10 ms, depending on the application requirements. The communication network is estimated to generate 70% of the total latency, and the master and slave devices produce the remaining 30%. Thus, this paper proposes a strategy to reduce the 30% share of the total latency produced by these devices. The strategy is to use FPGAs to minimize the execution time of device-associated algorithms. With this in mind, this work presents a new hardware reference model for modules that implement nonlinear positioning and force calculations, and a tactile system formed by two robotic manipulators. In addition to the implementation details, simulations and experimental tests are performed to validate the proposed hardware model. Results associated with the FPGA sampling rate, throughput, latency, and post-synthesis occupancy area are analyzed.


Introduction
Tactile internet is conceptually defined as the new generation of internet connectivity which will combine very low latency with extremely high availability, reliability and security [1,2]. Another feature that has been pointed out is that this new generation will be centered around applications that use human-machine communications (H2M) alongside devices that are compatible with tactile sensations [3][4][5]. Currently, the IEEE 1918.1 standard [6] defines characteristics of the tactile internet, where both the structure and description of application scenarios and definitions are presented. A tactile internet environment is basically composed of a local device (known as a master) and a remote device (known as a slave), where the master device is responsible for controlling the slave device over the internet through a two-way data communication network [7,8]. Bidirectional communication is needed to simulate the physical laws of action and reaction, where action can be represented as sending operational commands and reaction can be represented as the forces resulting from that action. In tactile internet applications, the desired time delay for device communication is characterized by an ultra-low latency. In bilateral communication, the required round trip latency ranges from 1 ms up to 10 ms depending on the application requirements [9][10][11][12].
According to [13], 30% of the total system latency in a tactile internet application is generated by the master and slave devices. These devices demand high processing speeds, as the repeated execution of a variety of computationally expensive algorithms and techniques is required. These algorithms involve arithmetic operations and the evaluation of linear and nonlinear equations that need to be computed at high sampling rates in order to maintain application fidelity. The remaining 70% of the latency is caused by the communication network, which makes current networks unsuitable for such latency constraints [14]. To address this problem, some research groups have been studying communication networks in the context of the tactile internet. The works [15][16][17] show techniques that can be used to reduce network latency.
Other groups have been studying prediction techniques; many algorithms have been evaluated, and proposals using artificial intelligence (AI) have proved to be effective [18]. On the other hand, the implementation of complex AI-based prediction methods can further increase the latency of the computer systems present in master and slave devices. Alternatively, new approaches such as using field-programmable gate arrays (FPGAs) can improve the performance of master and slave devices in a tactile system environment. FPGAs enable the creation of customizable hardware, which allows algorithms to be parallelized and optimized at the logic gate level to speed up their operations. Literature results show that computationally expensive algorithms can achieve speedups of up to 1000× over software implementations when custom-implemented in FPGAs [19][20][21][22][23][24][25].
In this context, this paper is motivated by the goal of reducing the 30% of the total latency attributed to tactile devices, and proposes a hardware implementation to that end. The project uses FPGA devices to minimize the execution time of the algorithms associated with master and slave devices. FPGAs allow algorithms to be parallelized and latency to be reduced compared to software systems embedded in traditional architectures with general-purpose processors and microcontrollers. To validate the proposed strategy, this paper presents a discrete reference model that can be adjusted for different types of master and slave devices in a tactile internet system. Validation results, throughput, and post-synthesis figures obtained for the proposed hardware implementation using FPGA devices are presented. Comparisons with other works in the literature show that using FPGAs can significantly accelerate processing in tactile devices. Thus, this work makes the following contributions:
• A new discrete reference model for a tactile internet system.
• A novel reference architecture for hardware design on FPGA for tactile master and slave devices.
The remainder of this paper is organized as follows: Section 2 presents the related works in the literature; Section 3 introduces a new discrete reference model for a tactile internet system; Section 4 describes the PHANToM Omni robot used as master and slave device; Section 5 presents the simulated tactile internet model; Section 6 gives a detailed description of the reference hardware architectures proposed in this paper; Section 7 presents and analyzes the synthesis results obtained from the described implementation, including a comparison with other works; Section 8 presents the final considerations.

Related Work
The authors of [26] presented a tactile internet environment that used a glove-type device in conjunction with a robotic manipulator. The environment was developed using a general-purpose processor, which made the execution of the algorithms sequential. To send the data, the tactile glove produced a latency of approximately 4.82 ms, and the hardware responsible for performing the inverse kinematics calculations took 0.95 ms. The latency values obtained in this application could be improved by hardware structures that allow algorithm parallelization.
Studies in the literature demonstrate the benefit of using FPGAs to accelerate the sample rate for data acquisition from devices associated with haptic systems. The authors of [27] presented an implementation for controlling a 3-DoF (Degree of Freedom) device. Another hardware implementation of inverse kinematics was presented in [33]. The device used was a 10-DoF biped robot. A CORDIC implementation was used to perform the trigonometric calculations. The execution time needed to compute the kinematics of the 10 joints on the FPGA was 0.44 µs. That work also compared the result with a software implementation, which took 3342 µs to perform the same calculations, i.e., the speedup obtained with the custom FPGA hardware was approximately 7595×. The resulting error between both implementations was acceptable for this specific control.
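To make the CORDIC technique mentioned above concrete, the following is a minimal floating-point sketch of rotation-mode CORDIC in Python. It is not the implementation from [33]; the function name `cordic_sin_cos` and the iteration count are assumptions chosen for illustration. In fixed-point hardware, the multiplications by powers of two become bit shifts, which is what makes the method attractive on FPGAs.

```python
import math


def cordic_sin_cos(angle, iterations=24):
    """Approximate (sin, cos) of `angle` (radians, |angle| <= pi/2).

    Illustrative rotation-mode CORDIC; names and defaults are assumptions.
    """
    # Elementary rotation angles atan(2^-i); hardware would keep these in a small ROM.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]

    # The micro-rotations stretch the vector by a known gain, so start from 1/gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards z = 0
        # Multiplying by 2^-i is a plain right shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return y, x  # y converges to sin(angle), x to cos(angle)
```

Each iteration adds roughly one bit of precision, so the iteration count trades accuracy against latency or, in a pipelined design, against area.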
In [34], an FPGA implementation of the forward and inverse kinematics of a 5-DoF device was presented. The hardware was developed using a fixed-point representation in which 32 bits were used to represent the angles, 15 of them for the fractional part. For the spatial positioning of the device, 16 bits were used, 7 of them for the fractional part. The trigonometric functions were implemented with a combination of lookup tables (LUTs) and Taylor series. To perform the necessary calculations, a finite-state machine (FSM) model was used to reduce the use of hardware resources; however, the FSM made the computation of the robotic manipulation algorithms sequential. In this model, the forward kinematics implementation achieved a runtime of 680 ns and the inverse kinematics 940 ns; that is, for the 50 MHz clock, the forward kinematics took 34 clock cycles and the inverse kinematics took 47 cycles. Using such approaches to reduce the use of hardware resources increases computation runtime. For tactile device applications, it is more important to optimize the runtime than the use of hardware resources.
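The fixed-point formats described above can be illustrated with a short Python sketch. It only mimics the quantization step (scaling, rounding and saturation); the helper names `to_fixed` and `to_float` are hypothetical, and the rounding and saturation behavior is an assumption, since [34] is not quoted on these details.

```python
def to_fixed(value, frac_bits=15, word_bits=32):
    """Quantize a real value to a signed fixed-point integer.

    Defaults mirror the 32-bit word with 15 fractional bits quoted for angles;
    round-to-nearest and saturation are assumptions made for this sketch.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, scaled))  # saturate instead of wrapping


def to_float(raw, frac_bits=15):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)
```

With 15 fractional bits the resolution is 2^-15 (about 3.1e-5), whereas the 16-bit position format with 7 fractional bits resolves only 2^-7 (about 7.8e-3), which shows how the word layout bounds the achievable accuracy.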
Similarly, an FPGA implementation of forward and inverse kinematics for a 7-DoF device was presented in [35]; however, only the 3 DoF required to control the device movement were implemented in hardware. The proposal used a 32-bit fixed-point representation, and CORDIC was used to compute the trigonometric functions. To validate the proposal, the FPGA was set to receive the three reference angles, perform the forward kinematics and then the inverse kinematics. The model was pipelined and operated at 100 MHz. As a result, the model took 2 µs to perform the entire kinematics algorithm, which corresponds to 200 clock cycles.
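The runtime, clock-cycle and speedup figures quoted in this section follow from simple arithmetic, which the sketch below reproduces. The function names are illustrative only.

```python
def clock_cycles(runtime_ns, clock_mhz):
    """Cycles = runtime x frequency; 1 ns x 1 MHz equals 1e-3 cycles."""
    return runtime_ns * clock_mhz / 1000


def speedup(t_software_us, t_hardware_us):
    """Ratio between software and hardware execution times."""
    return t_software_us / t_hardware_us
```

For example, `clock_cycles(680, 50)` and `clock_cycles(940, 50)` give the 34 and 47 cycles reported for [34], `clock_cycles(2000, 100)` gives the 200 cycles reported for [35], and `speedup(3342, 0.44)` gives roughly the 7595× reported for [33].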
In this context, it is clear that FPGA-based computing can accelerate haptic device control algorithms. Unlike traditional hardware that processes information sequentially, FPGAs enable parallel information processing. However, most studies in the literature have developed partially parallel implementations, that is, implementations in which parts of the algorithms are executed sequentially. Unlike the research previously mentioned, this study presents a new approach in which the robotic manipulation algorithms are executed in a fully parallel hardware implementation. The proposed implementation reduces the latency of the tactile devices and enables tactile internet applications.

Discrete Model of the Tactile Internet
A discrete model of the tactile internet system is proposed and presented in Figure 1. The operator (OP) is an entity responsible for generating stimuli that can take the form of position, speed, force, image, sound or any other signal. These stimuli are sent to the devices involved so that a task can be performed in a given environment. The environment, the ENV subsystem, receives the stimuli from the OP and generates feedback signals associated with sensations, such as reactive force information and tactile information, that are sent back to the OP. The interaction between the OP and the ENV is performed through the master and slave devices, MD and SD, respectively.
Specifically in this work, the MD is characterized as a local device and the SD as a remote one, and both are responsible for transforming the stimuli and sensations associated with the OP and the ENV into signals to be processed. Tactile devices (MD and SD) can take the form of robotic manipulators, haptic devices, tactile gloves and others that may be developed in the future. In the coming years, new types of sensors and actuators are expected to be introduced, forming the basis for the development of new tactile devices.
Although there are no tactile internet standards or products yet, it can be expected that future tactile devices will integrate hardware responsible for all operational metrics and calculations. Under this assumption, this work adds two modules to the discrete model (as per Figure 1), called HMD and HSD. The HMD is responsible for performing all transformations and calculations associated with the MD, and the HSD performs the equivalent operations for the SD. Several algorithms associated with transformation, compression, control and prediction will be under the responsibility of these two modules. The authors of [36] present a few approaches focusing on the reduction of kinesthetic data and tactile signal compression, which can be applied to the model. Based on the model presented in Figure 1, the signals generated by the OP can be characterized by the array $a(n)$ expressed as
$$a(n) = [a_1(n), \ldots, a_i(n), \ldots, a_{N_{OP}}(n)], \tag{1}$$
where $a_i(n)$ is the i-th stimulus at the n-th instant and $N_{OP}$ is the total number of stimulus signals generated by the OP. At every n-th instant, the stimulus array $a(n)$ is received by the MD, which transforms the stimuli into a set of $N_{MD}$ signals expressed as
$$b(n) = [b_1(n), \ldots, b_i(n), \ldots, b_{N_{MD}}(n)], \tag{2}$$
where $b_i(n)$ is the i-th signal generated by the MD at the n-th instant. At each n-th instant, a set of stimuli $a(n)$ thus generates a set of signals $b(n)$ that depends on the type of MD and on the sensor set associated with the device. It is especially important that the signals generated by the MD, $b(n)$, are heterogeneous: each i-th signal $b_i(n)$ can represent an angle, a spatial coordinate, a pixel of an image, an audio sample or any other information associated with a stimulus generated by the OP. In practice, the signals grouped in the $b(n)$ array originate from sensors coupled to the MD, and the amount of data, $N_{MD}$, may vary according to the amount of information to be sent.
The set of signals expressed by $b(n)$ is sent to the HMD (Figure 1), which processes this information before sending it to the NW subsystem. Calculations associated with calibration, linear and nonlinear transformations and signal compression are performed by the HMD. Essentially, the majority of the computational effort of the MD is in this subsystem. At each n-th instant $t_s$, the HMD processes the array $b(n)$, generating an information array $c(n)$ expressed by
$$c(n) = [c_1(n), \ldots, c_i(n), \ldots, c_{N^f_{HMD}}(n)], \tag{3}$$
where $c_i(n)$ is the i-th signal generated by the HMD towards the NW subsystem at the n-th instant $t_s$ and $N^f_{HMD}$ is the number of signals. $N^f_{HMD} < N_{MD}$ is expected, in order to minimize latency during transmission in the NW subsystem.
The NW subsystem, as shown in Figure 1, characterizes the communication medium that links the OP to the ENV. In this model, the data propagates through two different channels: the forward channel, which transmits the OP data towards the ENV, and the backwards channel, which transmits the ENV signals towards the OP. The signals transmitted by the forward and backwards channels may be disturbed and delayed. In the case of the forward channel, the received signal, $v(n)$, may be expressed as
$$v(n) = [v_1(n), \ldots, v_i(n), \ldots, v_{N^f_{HMD}}(n)], \tag{4}$$
where
$$v_i(n) = c_i(n - d^f_i(n)) + r^f_i(n), \tag{5}$$
in which $r^f_i(n)$ represents the added noise and $d^f_i(n)$ represents the delay associated with the i-th information sent in $c(n)$. In this model, the noise can be characterized as a Gaussian random variable with zero mean and variance $\sigma^2_{r^f}$, and the delays are characterized as integers, that is, they occur at a granularity of $t_s$. It is important to note that the NW subsystem can take the shape of the Internet, a metropolitan area network (MAN), a local area network (LAN), or even a direct connection between an MD and a workstation or computer.
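The forward-channel behavior described above (an integer delay in units of the sampling period plus zero-mean Gaussian noise on each transmitted signal) can be sketched in Python as follows. The function name `forward_channel`, the history-list interface and the choice to hold the first sample when the delay reaches before the start of the stream are all assumptions made for illustration.

```python
import random


def forward_channel(c_history, delays, sigma, rng=random):
    """Return the received array v(n) for the latest instant n.

    c_history: list of arrays c(0), ..., c(n), one per sampling instant.
    delays:    integer delay, in samples, applied to each i-th signal.
    sigma:     standard deviation of the zero-mean Gaussian noise.
    """
    n = len(c_history) - 1
    received = []
    for i, delay in enumerate(delays):
        # Assumption: before enough samples exist, the first sample is held.
        k = max(n - delay, 0)
        received.append(c_history[k][i] + rng.gauss(0.0, sigma))
    return received
```

With `sigma=0.0` the function reduces to a pure per-signal delay line, which is convenient for checking the delay handling in isolation before adding noise.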
As shown in Figure 1, the HSD receives the $v(n)$ signal through the forward channel and has the role of generating control signals to the SD through the signal
$$u(n) = [u_1(n), \ldots, u_i(n), \ldots, u_{N^f_{HSD}}(n)], \tag{6}$$
where $N^f_{HSD}$ is the number of control signals and $u_i(n)$ is the i-th control signal at the n-th instant $t_s$ associated with the array $u(n)$. It is important to note that there may be various types of SD, from real robotic manipulators to virtual tools in computational environments. Thus, it can be stated without loss of generality that the HSD can perform the inverse of the HMD processing in addition to specific algorithms associated with the type of SD. For example, if the SD is a robotic manipulator, the HSD must additionally implement closed-loop control algorithms, whereas if the SD is a virtual arm, the HSD must implement positioning algorithms for a given virtual reality platform. The SD does not have to correspond directly to the MD, e.g., the MD can be a glove while the SD is a drone. However, it is desirable that the stimulus generated by the SD is a copy of the stimulus generated by the OP; that is, within the model presented in Figure 1, it can be understood that the SD generates a signal expressed as
$$\hat{a}(n) = [\hat{a}_1(n), \ldots, \hat{a}_i(n), \ldots, \hat{a}_{N_{OP}}(n)], \tag{7}$$
where $\hat{a}_i(n)$ is an estimate of the i-th stimulus $a_i(n)$ generated by the OP. Thus, the estimate of the stimulus generated by the OP, $\hat{a}_i(n)$, is applied to the ENV subsystem, which represents the real or virtual environment with which the OP is interacting.
In the backwards direction, the stimulus actions generated by the OP, $a(n)$, and represented by $\hat{a}(n)$, receive a group of reactions from the ENV subsystem that can be characterized in the model by the set of signals expressed by
$$o(n) = [o_1(n), \ldots, o_i(n), \ldots, o_{N_{ENV}}(n)], \tag{8}$$
where $N_{ENV}$ is the number of stimulus signals and $o_i(n)$ is the i-th stimulus signal at the n-th instant $t_s$. Reaction signals grouped into $o(n)$ can be in the form of force, touch, temperature, etc. Reaction signals are captured by the SD, which turns this information into electrical signals from real sensors, or from virtual sensors if the SD is in a virtual reality environment. After capturing this information, the SD transmits these signals to the HSD. In the model presented in Figure 1, the signals generated by the SD are expressed as
$$g(n) = [g_1(n), \ldots, g_i(n), \ldots, g_{N_{SD}}(n)], \tag{9}$$
where $g_i(n)$ is the i-th signal generated by the SD at the n-th instant $t_s$ and $N_{SD}$ is the number of signals. The HSD in turn processes this information and sends it to the NW subsystem through the array $h(n)$, expressed by
$$h(n) = [h_1(n), \ldots, h_i(n), \ldots, h_{N^b_{HSD}}(n)], \tag{10}$$
where $h_i(n)$ is the i-th signal generated by the HSD at the n-th instant $t_s$ and $N^b_{HSD}$ is the number of signals.
The signal received by the HMD through the backwards channel of the NW subsystem can be expressed as
$$q(n) = [q_1(n), \ldots, q_i(n), \ldots, q_{N^b_{HSD}}(n)], \tag{11}$$
where
$$q_i(n) = h_i(n - d^b_i(n)) + r^b_i(n), \tag{12}$$
in which $r^b_i(n)$ represents the added noise and $d^b_i(n)$ represents the delay associated with the i-th information transmitted in $h(n)$ by the backwards channel. Similarly to the forward channel, the noise can be characterized as a Gaussian random variable with zero mean and variance $\sigma^2_{r^b}$, and the delays are characterized as integers with $t_s$ granularity. The HMD processes the $q(n)$ signal and generates a set of control signals that will act on the MD and can be characterized as
$$p(n) = [p_1(n), \ldots, p_i(n), \ldots, p_{N^b_{HMD}}(n)], \tag{13}$$
where $p_i(n)$ is the i-th signal generated by the HMD at the n-th instant $t_s$ and $N^b_{HMD}$ is the number of signals. The MD in turn synthesizes the reaction stimuli generated by the environment, i.e., the ENV subsystem. Based on the model, it is possible to characterize these reaction stimuli as a signal expressed by
$$\hat{o}(n) = [\hat{o}_1(n), \ldots, \hat{o}_i(n), \ldots, \hat{o}_{N_{ENV}}(n)], \tag{14}$$
where $\hat{o}_i(n)$ is an estimate of the i-th stimulus $o_i(n)$ generated in the ENV subsystem. Examples of reaction stimuli generated or synthesized by the MD are touch, force and temperature.
In addition to the latency associated with the NW subsystem, which characterizes the communication medium between the OP and ENV subsystems, the MD, HMD, HSD, and SD subsystems also add latency to the system. Based on the work presented in [13,14], these components represent 30% of the total latency. The latency of the MD and SD subsystems is associated with sensors and actuators, which can be mechanical, electrical, electromechanical or other variations. HMD and HSD latencies are associated with the processing time of the algorithms in these devices and, depending on the type of hardware and implementation architecture, this latency can be considerably reduced.

PHANToM Omni Device Model (MD & SD)
Based on the scheme presented in Figure 1, this section presents details of the MD and SD used as reference for the hardware system proposed in this research. The MD and SD are both three-degree-of-freedom (3-DoF) robotic manipulators called PHANToM Omni [37] (Figure 2). The PHANToM Omni has been widely used in the literature, as presented in [38,39]. In this work, two of these devices are used: one as the MD and the other as the SD. As can be seen in Figure 3, the physical structure of the PHANToM Omni is formed by a base, an arm with two segments, $L_1$ and $L_2$, interconnected by three rotary joints, $\theta_1$, $\theta_2$ and $\theta_3$, and a tool. The variables presented in Figure 3 are: $L_1 = 0.135$ m, $L_2 = L_1$, $L_3 = 0.025$ m and $L_4 = L_1 + A$, where $A = 0.035$ m, as described in [40]. These detailed features of the device are essential for performing the kinematic and dynamic calculations.

Forward Kinematics
The kinematics of manipulators relates operational coordinates to joint coordinates. Forward kinematics (FK) maps the angular variables of the joints to the Cartesian system. That is, given an array of joint coordinates, the spatial position of the tool can be determined through Equations (15)-(17), where $x$, $y$ and $z$ are the variables that determine the spatial position of the tool in Cartesian space.

Inverse Kinematics
In inverse kinematics (IK), the relationship between the joint angles and the Cartesian system is reversed; that is, given the spatial position of the tool, it may be possible to determine the joint coordinates. The solution to this process is not as straightforward as in forward kinematics. In forward kinematics, the position of the tool is determined solely by the displacements of the joints. In inverse kinematics, the equations are composed of nonlinear calculations formed by trigonometric functions. Depending on the manipulator structure, multiple solutions may be possible for the same tool position, or there may be no solution for a particular set of tool positions. Based on the works [40][41][42], the value of $\theta_1$ can be defined through Equation (18), where $x$ and $z$ represent coordinates in Cartesian space and $L_4$ corresponds to the size of the arm segments, as shown in Figure 3.
To calculate the other two joints, $\theta_2$ and $\theta_3$, it is necessary to perform intermediate calculations. Thus, $R$, $r$, $\beta$, $\gamma$ and $\alpha$ are first obtained through Equations (19)-(23), the last of which defines $\alpha$ through an arccosine (acos). After performing the intermediate calculations, $\theta_2$ can be calculated through Equation (24). Finally, the value corresponding to the $\theta_3$ joint can be obtained through Equation (25).

Kinesthetic Feedback Force
The kinesthetic feedback force allows the environment to be "felt", i.e., when the SD comes into physical contact with an object, the MD receives a counter force. This model can be implemented through the equation
$$\tau = J^T F, \tag{26}$$
where $\tau$ defines the torque array that will be applied to each joint ($\theta_1$, $\theta_2$ and $\theta_3$) of the PHANToM Omni associated with the MD, $J^T$ is the transpose of the Jacobian matrix and $F$ is the force array resulting from the interaction of the SD with the ENV. The torque array $\tau$ is expressed by Equation (27) and groups one torque per joint. The Jacobian matrix $J$ incorporates structural information about the manipulator and is identified by Equations (28)-(37). The force array $F$ is expressed by Equation (38) and can be obtained through sensors internal or external to the device. According to Equation (26), the components of the torque array $\tau$, which represent the resulting torque at each joint, are defined by Equations (39)-(41).
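The mapping from a Cartesian force to joint torques through the transposed Jacobian can be sketched in a few lines of Python. The Jacobian entries of the PHANToM Omni are not reproduced here, so the matrix in the usage note is a placeholder, and the function name `joint_torques` is an assumption.

```python
def joint_torques(jacobian, force):
    """Compute tau = J^T F for a Jacobian given as a list of rows.

    tau[i] = sum over k of jacobian[k][i] * force[k], i.e. each joint torque
    is the dot product of one Jacobian column with the force array.
    """
    n_joints = len(jacobian[0])
    return [
        sum(jacobian[k][i] * force[k] for k in range(len(force)))
        for i in range(n_joints)
    ]
```

For instance, with the placeholder matrix `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and the force `[1, 0, 2]`, the function returns `[15, 18, 21]`. The three dot products are independent of each other, which is the property a parallel hardware implementation can exploit.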

Simulated Tactile Internet Model
Figures 1 and 4 detail the structure used for the hardware design in FPGA, in which a given operator, OP, handles a PHANToM Omni on the master side, MD, which is connected to the HMD that, in this case, is dedicated FPGA hardware. Data are transmitted through the network, the NW subsystem, to the HSD, which is also dedicated FPGA hardware. The HSD is also connected to a PHANToM Omni that interacts with the environment, the ENV subsystem. Figure 4 also details the backwards direction from the ENV to the OP. The OP is modeled as an information source responsible for generating a spatial trajectory through discrete signals expressed in the $a(n)$ array. At each n-th instant $t_s$, the OP sends three variables, $x_{OP}(n)$, $y_{OP}(n)$ and $z_{OP}(n)$, representing the position of the MD tool (Figures 2 and 3) in Cartesian space, and this is expressed by
$$a(n) = [x_{OP}(n), y_{OP}(n), z_{OP}(n)]. \tag{42}$$
Both devices, the master and slave PHANToM Omni, and the structures that form the system were modeled and simulated in Matlab/Simulink [43] and Xilinx System Generator. This step simulates the spatial movement of the MD tool by the operator; that is, at each instant of time, $t_s$, a spatial movement is performed and a new signal $a(n)$ is generated by the OP.
The PHANToM Omni has encoders at its three joints that translate spatial positioning into the three angles $\theta_1$, $\theta_2$ and $\theta_3$ (Figures 2 and 3). Thus, based on Figure 4, it can be said that the MD converts the signal $a(n)$ into a signal expressed as
$$b(n) = [\theta^{MD}_1(n), \theta^{MD}_2(n), \theta^{MD}_3(n)] \tag{43}$$
and forwards it to the HMD at every n-th instant of time $t_s$. Then, as can be seen in Figure 4, the $b(n)$ signal propagates to the HMD, which transforms the joint positioning angles, $b(n)$, into a spatial position by calculating the FK according to Equations (15)-(17). All equations are implemented in FPGA through a hardware module called the FK-HMD. The equations are implemented in parallel, which can significantly reduce the processing time. The use of FK is motivated by a reduction in the amount of information transmitted, i.e., an N-DoF robotic manipulator generates N joint angles, which can be converted into only three values associated with the spatial position of the tool, $x$, $y$ and $z$. On the other hand, this strategy increases the amount of calculation to be performed on the master side, which is compensated by the parallel implementation of the algorithm in FPGA. It is essential to note that custom hardware operating in parallel keeps the processing time from being substantially affected by N.
Based on Section 3, after the FK calculation by the FK-HMD hardware module, a new discrete signal is created that can be expressed by
$$c(n) = [x_{HMD}(n), y_{HMD}(n), z_{HMD}(n)], \tag{44}$$
where $x_{HMD}(n)$, $y_{HMD}(n)$ and $z_{HMD}(n)$ are the values of the spatial coordinate array generated by the HMD to be sent to the HSD via the communication medium, NW. The FK-HMD hardware module generates a new $c(n)$ array at every n-th instant of time.
After the transmission through the forward channel, here called FC, the signal received by the HSD can be expressed as
$$v(n) = [x_{HSD}(n), y_{HSD}(n), z_{HSD}(n)]. \tag{45}$$
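Based on Equation (5), the spatial coordinate signals received by the HSD can be expressed as
$$x_{HSD}(n) = x_{HMD}(n - d^f_x(n)) + r^f_x(n), \tag{46}$$
$$y_{HSD}(n) = y_{HMD}(n - d^f_y(n)) + r^f_y(n) \tag{47}$$
and
$$z_{HSD}(n) = z_{HMD}(n - d^f_z(n)) + r^f_z(n), \tag{48}$$
where $d^f_x(n)$, $d^f_y(n)$, $d^f_z(n)$, $r^f_x(n)$, $r^f_y(n)$ and $r^f_z(n)$ are the delays and noises associated with the FC.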
Since, in this case, the slave PHANToM Omni, SD, copies the movement of the master PHANToM Omni, MD, the HSD must implement a feedback control system on the three joints of the slave PHANToM Omni, here expressed as
$$\theta_{SD}(n) = [\theta^{SD}_1(n), \theta^{SD}_2(n), \theta^{SD}_3(n)], \tag{49}$$
that is, $\theta^{SD}_1(n)$, $\theta^{SD}_2(n)$ and $\theta^{SD}_3(n)$ are the control variables associated with the SD. The control system, illustrated in Figure 4 as FCS, shall minimize the error, $e_{FCS}(n)$, between $\theta_{SD}(n)$ and the reference signal $\theta_{HSD}(n)$, which is based on $e(n) = \theta_{HSD}(n) - \theta_{SD}(n)$. The $\theta_{SD}(n)$ signal is obtained from the SD via sensors (encoders) at the SD joints, and the $\theta_{HSD}(n)$ signal is obtained from the IK-HSD hardware module shown in Figure 4. This hardware module implements all inverse kinematics equations presented in Section 4.2, i.e., Equations (18)-(25). Several techniques and approaches can be used in the FCS module, ranging from traditional techniques such as a proportional-integral-derivative controller [44] to more innovative techniques based on artificial intelligence [45,46].
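As one concrete possibility for the FCS module, a discrete proportional-integral-derivative controller acting on the joint error $e(n) = \theta_{HSD}(n) - \theta_{SD}(n)$ can be sketched as below. The text names PID only as one candidate technique, so the class name `JointPID`, the gains and the rectangular-integration and backward-difference choices are all assumptions for illustration, not the controller used in this work.

```python
class JointPID:
    """Discrete PID for a single joint, sampled every `ts` seconds (sketch)."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, theta_ref, theta_meas):
        """Return the control effort for one sampling instant."""
        error = theta_ref - theta_meas                    # e(n)
        self.integral += error * self.ts                  # rectangular integration
        derivative = (error - self.prev_error) / self.ts  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a three-joint setting, one such controller instance per joint would be updated at each sampling instant; because the three updates do not depend on each other, they are natural candidates for parallel execution in hardware.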
The CPD-HSD and JPD-HSD modules, illustrated in Figure 4, represent the prediction and detection algorithms in Cartesian space and joint space, respectively. These modules are responsible for minimizing the latency and noise added by the FC associated with the tactile internet system (Equations (46)-(48)). Depending on the prediction and detection technique used, the HSD may use only one of the modules, namely the CPD-HSD or the JPD-HSD. There is still no consensus on whether Cartesian space or joint space is better for minimizing the latency and noise inserted by the channel. Several works in the literature present proposals using only one of the spaces, and others try to use the information from both simultaneously.
Similarly to the FCS module, approaches ranging from traditional techniques to more innovative techniques based on artificial intelligence have been used in the CPD-HSD and JPD-HSD modules [47][48][49][50][51]. Thus, it can be said that $\theta_{HSD}(n)$ is an estimate of the $b(n)$ signal generated by the MD.
At each n-th instant, the FCS acts on the SD through the $u(n)$ signal, detailed in Figures 1 and 4, which in the case of the PHANToM Omni can be expressed as
$$u(n) = [\tau^{HSD}_1(n), \tau^{HSD}_2(n), \tau^{HSD}_3(n)],$$
where $\tau^{HSD}_i(n)$ is the torque applied to the i-th joint. The FCS acts as a tracking mechanism, making the SD follow the path traveled by the MD. Finalizing the data stream associated with the forward channel, it can be said that the $\hat{a}(n)$ signal is formed by an estimate of the spatial position generated by the OP, i.e., by estimates of $x_{OP}(n)$, $y_{OP}(n)$ and $z_{OP}(n)$.
The interaction of the slave PHANToM Omni, SD, with the ENV can vary from free movement to physical contact. When physical contact occurs, the SD detects the touch and sends this information back to the HSD. As per the model detailed in Figure 4, the ENV sends back to the SD the information associated with the contact force in Cartesian space, expressed here as the array $o(n)$. The value associated with the contact force can be measured directly through SD-coupled force sensors or estimated indirectly through other types of sensors that may be SD-coupled or inserted into the environment [52]. In the model presented in Figure 4, the SD sends to the HSD the spatial positions of the object surfaces, obtained through sensors spread in the ENV. The signal $s_{OBJ}(n)$ represents the spatial position of the object closest to the SD tool. Thus, at every n-th instant $t_s$, the SD sends to the HSD a signal characterized by the array $g(n)$, which groups $\theta_{SD}(n)$ and $s_{OBJ}(n)$. In the HSD, when the signal $g(n)$ is received, the Split module separates the $\theta_{SD}(n)$ signal and sends it to the FCS and to the FK-HSD hardware module. In addition, the signal $s_{OBJ}(n)$ is sent to the FBF-HSD hardware module, as detailed in Figure 4. The FK-HSD hardware module performs the forward kinematics calculation similarly to the FK-HMD, and thus the current spatial position of the SD tool in the environment, ENV, can be obtained.
At every n-th instant $t_s$, the FK-HSD generates a signal formed by $x_{ENV}(n)$, $y_{ENV}(n)$ and $z_{ENV}(n)$, which are the spatial position of the tool in the ENV module obtained from $\theta_{SD}(n)$. The FBF-HSD hardware module implements the calculations associated with generating the feedback force from the contact between the tool and the object. Based on the work presented in [52], the contact force, represented by the $h(n)$ signal, can be expressed through one equation per Cartesian axis. In these equations, the constants $h_x(n)$, $h_y(n)$ and $h_z(n)$ represent the elasticity coefficients associated with the object. It is important to note that in this model the $h(n)$ signal is a synthesized version of the real force value, here characterized by the $o(n)$ array.
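Since the text associates the contact force with elasticity coefficients of the object, one plausible reading is a spring-like (Hooke's law) penalty model per Cartesian axis. The sketch below illustrates that reading only: the exact force law of [52] is not reproduced here, and the function name `feedback_force`, the `stiffness` argument and the contact condition (force arises only while the tool coordinate lies below the object surface coordinate on that axis) are assumptions.

```python
def feedback_force(tool_pos, obj_pos, stiffness):
    """Spring-like contact force per axis (illustrative assumption).

    tool_pos:  (x, y, z) of the SD tool, e.g. the FK-HSD output.
    obj_pos:   (x, y, z) of the closest object surface point.
    stiffness: per-axis elasticity coefficients (hypothetical values).
    """
    force = []
    for tool, obj, k in zip(tool_pos, obj_pos, stiffness):
        penetration = max(0.0, obj - tool)  # zero while there is no contact
        force.append(k * penetration)
    return force
```

Under this assumption the force is zero in free movement and grows linearly with penetration depth, which matches the qualitative behavior described above (free movement versus physical contact).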
After the feedback force calculation, as illustrated in Figure 4, the $h(n)$ signal is transmitted to the HMD via the backwards channel (BC), which, similarly to the FC, adds latency and noise. The signal received by the HMD, $q(n)$, is a delayed and noisy version of $h(n)$, as in Equation (12), where $d^b_x(n)$, $d^b_y(n)$, $d^b_z(n)$, $r^b_x(n)$, $r^b_y(n)$ and $r^b_z(n)$ are the delays and noises associated with the BC.
Similarly to the HSD, the HMD minimizes the effect of latency and noise through operations in Cartesian and joint space. For the HMD, the calculations associated with Cartesian space are performed by the CPD-HMD module and those associated with joint space by the JPD-HMD module. In addition to the prediction and detection calculations, the HMD must transform the force signals received through the signal $q(n)$ into torques to be applied to the MD joints, which is accomplished by the KFF-HMD hardware module. The KFF-HMD implements Equations (39)-(41) presented in Section 4.3 and generates the signal expressed as
$$p(n) = [\tau^{HMD}_1(n), \tau^{HMD}_2(n), \tau^{HMD}_3(n)],$$
where $\tau^{HMD}_i(n)$ is the torque associated with the i-th joint of the MD. Since the PHANToM Omni is a haptic device, it already has a built-in control system, FCS, which uses the torques in the $p(n)$ array as its reference signal.
After applying the torques to the MD joints via the p(n) signal, the OP receives the feedback force signal; in other words, the operator feels the object touched by the SD in the ENV. This sensation is represented by the ô(n) signal.
As illustrated in Figure 4, the MD, HMD, NW, HSD, and SD subsystems have the following runtimes: t MD , t HMD , t NW , t HSD and t SD , respectively. The sum of these times, taking into account the forward direction (between OP and ENV) and the backward direction (between ENV and OP), represents the total system latency, t latency . Some works in the literature agree that the ideal requirement is t latency ≤ 1 ms, whereas other works point out that the latency requirement can be relaxed to t latency ≤ 10 ms, depending on the application [9][10][11][12][53]. Considering that 30% of the total latency t latency is spent by MD, HMD, HSD, and SD, and assuming an equal time division among them, the time associated with the hardware, t hardware , whether of the master (HMD) or of the slave device (HSD), can be expressed as a fraction of t latency . Taking the 1 ms constraint into consideration and substituting this value in Equation (71), the hardware time must satisfy t hardware ≤ 37.5 µs in all cases (1 ms condition), or t hardware ≤ 375 µs in some specific cases (10 ms condition).
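The arithmetic behind the 37.5 µs and 375 µs limits can be sketched as follows. This is a minimal illustration, not the paper's Equation (71): the divisor of eight is an assumption inferred from the quoted figures (four device-side subsystems, MD, HMD, HSD and SD, each traversed once in the forward and once in the backward direction), and the function and parameter names are hypothetical.

```python
def hardware_budget_us(t_latency_ms, device_share=0.30, device_traversals=8):
    """Return the time budget, in microseconds, for one hardware traversal.

    device_share: fraction of the round-trip latency attributed to the devices
        (30% according to the text).
    device_traversals: ASSUMPTION, 4 subsystems x 2 directions = 8, chosen
        because 0.30 * 1 ms / 8 reproduces the quoted 37.5 us limit.
    """
    t_latency_us = t_latency_ms * 1000.0
    return t_latency_us * device_share / device_traversals


if __name__ == "__main__":
    for constraint_ms in (1.0, 10.0):
        budget = hardware_budget_us(constraint_ms)
        print(f"t_latency <= {constraint_ms} ms -> t_hardware <= {budget} us")
```

Under these assumptions the two round-trip constraints map to the two hardware limits used throughout the rest of the paper.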
Recent studies from the literature show that the 1 ms restriction (t hardware ≤ 37.5 µs) is difficult to achieve using hardware devices based on embedded systems such as microprocessors and microcontrollers [26,54]. The 10 ms restriction (t hardware ≤ 375 µs) is achieved in specific cases where SD is a virtual environment and HSD is a high performance processor computer [53]. Thus this work aims to minimize the execution time in HMD, t HMD , and HSD, t HSD , using FPGA. In other words, the target is to achieve a t hardware ≤ 37.5 µs.
This paper presents a hardware reference model for the FK-HMD, KFF-HMD, IK-HSD, FK-HSD, and FBF-HSD modules illustrated in Figure 4. The complete model, presented in detail in the next section, makes use of a parallel implementation methodology in which high throughput is prioritized, i.e., the execution times of the modules, t FK , t KFF , t IK and t FBF (illustrated in Figure 4), are minimized.
This work does not propose dedicated hardware reference models for the CPD-HSD, JPD-HSD, CPD-HMD, JPD-HMD and FCS modules, as there are several techniques and algorithms that can be applied to them. However, considering the hardware time constraint, t hardware , it is also important to use dedicated hardware structures, such as FPGA-based circuits, for these modules. Studies in the literature foresee the use of AI-based techniques for these modules; however, it is essential to note that AI techniques and algorithms implemented on general-purpose processor-based hardware platforms can lead to higher processing times [19][20][21][22][23][24][25].

Implementation Description
The FK-HMD and KFF-HMD hardware modules associated with the master device (HMD) and the IK-HSD, FK-HSD, and FBF-HSD hardware modules associated with the slave device (HSD) (Figure 4) were designed using a parallel implementation in order to prioritize the processing speed. The implementations were designed in FPGA using a hybrid scheme with fixed point and floating point representation in distinct parts of the proposed architecture. In the portions that adopt the fixed point format, the variables follow a notation expressed as [sV.N] indicating that the variable is formed by V bits of which N bits are intended for the fractional part and the s symbol indicates that the variable is signed. In this case, the number of bits intended for the integer part is V − N − 1. For the representation of floating point variables, the notation [F32] is adopted. Most of the implemented circuits were designed using a 32-bit single precision (IEEE754) floating point format representation. The fixed point format was used only on the circuit that implements the trigonometric function block (TFB) module, as illustrated in Figure 5. TFB is the module responsible for performing trigonometric operations through the hardware implementation of CORDIC (COordinate Rotation DIgital Computer) [55]. For that, a Xilinx CORDIC IP Core was used. This implementation uses data representation in a fixed-point format using the [s16.13] representation.  As illustrated in Figure 5, the TFB module receives data from external circuits in the 32-bits floating point standard. A conversion to the fixed point numeric representation type represented by the [s16.13] notation is performed through the Float to Fixed-point (F2FP) module that has been implemented in hardware. After the CORDIC hardware operations are performed, the data in the fixed point format is transformed back to the 32-bit floating point through the Fixed-point to Float (FP2F) module which was also implemented in hardware.
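The [sV.N] notation and the role of the F2FP and FP2F modules can be illustrated with a small software sketch. The rounding mode and the saturation behavior below are assumptions made for illustration; the text does not specify how the hardware converters handle them, and the function names are hypothetical.

```python
import math


def float_to_fixed(value, total_bits=16, frac_bits=13):
    """Convert a float to a signed fixed-point integer code in [sV.N] format.

    With V = 16 and N = 13 there are V - N - 1 = 2 integer bits, so the
    representable range is [-4, 4 - 2**-13] with a resolution of 2**-13.
    ASSUMPTIONS: round-to-nearest (Python's round, ties to even) and
    saturation on overflow.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = int(round(value * scale))
    return max(lowest, min(highest, code))


def fixed_to_float(code, frac_bits=13):
    """Convert a signed fixed-point integer code back to a float."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    angle = math.pi / 2
    code = float_to_fixed(angle)
    restored = fixed_to_float(code)
    print(code, restored, abs(restored - angle))
```

With round-to-nearest, the conversion error for in-range values is at most half a step, i.e., 2^-14, which bounds the precision that the fixed-point trigonometric block can pass back to the floating-point datapath.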
Several of the proposed methods to be presented use the constants L 1 , L 2 , L 3 and L 4 . They represent physical characteristics of the PHANToM Omni device as illustrated in Figure 2. These constants use the 32-bit floating point numeric representation.

Forward Kinematics (FK-HMD and FK-HSD)
As illustrated in Figure 4, both the hardware associated with the master device (HMD) and the hardware associated with the slave device (HSD) implement forward kinematics through the FK-HMD and FK-HSD modules, respectively. These modules have the same FPGA-implemented circuit, differing only in the input and output signals. At every n-th instant, all the computations required to calculate the forward kinematics are executed in parallel.
Based on Equation (15), the algorithm used for calculating x[F32](n) was implemented in FPGA through the generic circuit illustrated in Figure 6. The circuit was designed to work with three input signals, θ 1 [F32](n), θ 2 [F32](n) and θ 3 [F32](n), and one output signal. These signals are forwarded to TFB sub-circuits, where the sine and cosine calculations are performed. For this process the constants L 1 and L 2 , three multipliers, one inverter and one adder are used.
The calculation of y[F32](n), based on Equation (16), was implemented in FPGA through the generic circuit shown in Figure 7. The circuit was designed to work with two input signals, θ 2 [F32](n) and θ 3 [F32](n), and one output signal. These signals are routed to TFB sub-circuits to perform the sine and cosine calculations. In the process flow, two multipliers, two adders, one inverter and the constants L 1 and L 2 are used.
The generic circuit illustrated in Figure 8 was implemented in FPGA to perform the calculation of z[F32](n), and it is based on Equation (17). The circuit is designed to work with three input signals, θ 1 [F32](n), θ 2 [F32](n) and θ 3 [F32](n), and one output signal. These signals are routed to TFB sub-circuits in order to perform the sine and cosine calculations. In the process flow, four multipliers, two adders, one inverter and the constants L 1 , L 2 and L 4 are used.
In the FK-HSD module, after performing all parallel computations, the resulting signals x ENV (n), y ENV (n) and z ENV (n) are output from the module via the l(n) array (Equation (49) in Section 5).
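A floating-point software sketch of the forward kinematics is given below. Equations (15)-(17) are not reproduced in this section, so the closed form used here is an assumption: it is a form commonly used for the PHANToM Omni and is broadly consistent with the inputs and constants listed for each circuit above. The link lengths are likewise hypothetical values, inferred by fitting the trajectory waypoints quoted in the Results section rather than taken from the paper.

```python
import math

# Hypothetical link lengths in metres, inferred from the Results waypoints.
L1, L2, L3, L4 = 0.132, 0.132, 0.025, 0.167


def forward_kinematics(theta1, theta2, theta3):
    """Assumed closed-form forward kinematics (joint angles in radians)."""
    # Horizontal reach of the two-link arm, shared by the x and z expressions.
    reach = L1 * math.cos(theta2) + L2 * math.sin(theta3)
    x = -math.sin(theta1) * reach
    y = L3 - L2 * math.cos(theta3) + L1 * math.sin(theta2)
    z = -L4 + math.cos(theta1) * reach
    return x, y, z


if __name__ == "__main__":
    waypoints = [
        (0.0, 0.0, 0.0),
        (math.pi / 2, 0.0, 0.0),
        (math.pi / 2, math.pi / 4, 0.0),
        (math.pi / 2, math.pi / 4, math.pi / 4),
    ]
    for angles in waypoints:
        x, y, z = forward_kinematics(*angles)
        print(f"{angles} -> x={x:.4f}, y={y:.4f}, z={z:.4f}")
```

In the hardware version the three expressions are evaluated by independent circuits, so the shared `reach` term would be duplicated rather than reused; the software sketch factors it out only for readability.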

Inverse Kinematics (IK-HSD)
The hardware associated with the slave device (HSD) implements the inverse kinematics through the IK-HSD module, as shown in Figure 4. The IK-HSD FPGA-implemented circuit is designed to work with three input signals, x HSD [F32](n), y HSD [F32](n) and z HSD [F32](n), and three output signals, θ HSD 1 [F32](n), θ HSD 2 [F32](n) (Equation (24)) and θ HSD 3 [F32](n).
As already described, and according to the illustrations shown in Figures 10 and 11, the calculations of θ HSD 2 [F32](n) and θ HSD 3 [F32](n) require α[F32](n), β[F32](n) and γ[F32](n) (Equations (21)-(23)). However, these calculations depend on the calculation of R[F32](n) and r[F32](n). Then, when the IK-HSD module receives the input signals at every n-th instant, the circuit shown in Figure 9 performs the calculation of θ HSD 1 [F32](n), R[F32](n) and r[F32](n) in parallel; see Equations (19) and (20).
The circuit shown in Figure 12, used to obtain R[F32](n), is designed to work with two input signals, x HSD [F32](n) and z HSD [F32](n), and one output signal. This design contains two multipliers, two adders, the L 4 constant and a sub-circuit called Sqrt, which was implemented in hardware to calculate the square root. The r[F32](n) calculation is performed through the circuit shown in Figure 13. This circuit is designed to work with three input signals, x HSD [F32](n), y HSD [F32](n) and z HSD [F32](n), and one output signal. The circuit consists of three multipliers, four adders, one inverter, the constants L 3 and L 4 , and, again, the Sqrt sub-circuit.
After the parallel processing of θ HSD 1 [F32](n), R[F32](n) and r[F32](n), the values of α[F32](n), β[F32](n) and γ[F32](n) are calculated. The value of α[F32](n) is obtained from the circuit shown in Figure 14, which is based on Equation (21). The circuit is designed to work with one input signal, r[F32](n), and one output signal. It consists of five multipliers, two adders, one divider, one TFB sub-circuit to calculate the arccosine and the constants L 1 and L 2 . The circuit for obtaining β[F32](n), illustrated in Figure 15, is based on Equation (22) and is designed to work with two input signals, y HSD [F32](n) and R[F32](n), and one output signal. The circuit is composed of one adder, one inverter, a TFB sub-circuit to perform the arctangent calculation and the L 3 constant. The value of γ[F32](n) is obtained from the circuit shown in Figure 16, which is based on Equation (23) and is designed to work with one input signal, r[F32](n), and one output signal. The circuit is composed of five multipliers, two adders, one inverter, one divider, one TFB sub-circuit to perform the arccosine calculation and the constants L 1 and L 2 .

Kinesthetic Feedback Force (KFF-HMD)
The hardware associated with the master device (HMD) implements the kinesthetic feedback force through the KFF-HMD module (Figure 4), which transforms the received force signals into the torques to be applied to the MD joints (Equations (39)-(41)).
Based on Equation (29), the algorithm for calculating J 11 [F32](n) was implemented in FPGA according to the generic circuit illustrated in Figure 18. The circuit was designed to work with three input signals and one output signal. It uses the constants L 1 and L 2 and has three TFB sub-circuits: two for performing the cosine calculation and one for obtaining the sine value. The calculation of J 31 [F32](n), based on Equation (31), was implemented in FPGA according to the generic circuit illustrated in Figure 19. The circuit was designed to work with three input signals and one output signal. The circuit has three TFB modules, two for the sine calculation and one for the cosine value, and uses the L 1 and L 2 constants. The generic circuit illustrated in Figure 20 was implemented in FPGA to perform the calculation of J 12 [F32](n) and is based on Equation (32). The circuit was designed to work with two input signals and one output signal. The circuit has two TFB sub-circuits to perform the sine calculation and uses the L 1 constant.
The calculation of J 32 [F32](n), based on Equation (34), was implemented in FPGA according to the generic circuit illustrated in Figure 22. The circuit was designed to work with two input signals and one output signal. In addition to the constant L 1 , the circuit has two TFB sub-circuits, one for performing the cosine calculation and one for the sine. The generic circuit illustrated in Figure 23 was implemented in FPGA to perform the calculation of J 13 [F32](n) and is based on Equation (35). The circuit was designed to work with two input signals and one output signal. In addition to the constant L 2 , the circuit has two TFB sub-circuits, one for performing the cosine calculation and one for the sine. The circuit illustrated in Figure 25 was designed to work with two input signals and one output signal. In addition to the constant L 2 , the circuit has two TFB sub-circuits to perform the cosine calculation. The KF circuit shown in Figure 17 is designed to work with twelve input signals and three output signals.
Based on Equation (39), the algorithm for calculating τ HMD 1 [F32](n) was implemented in FPGA according to the generic circuit illustrated in Figure 26. The circuit was designed to work with six inputs and one output. The calculation of τ HMD 2 [F32](n), based on Equation (40), was implemented in FPGA according to the generic circuit illustrated in Figure 27. The circuit was designed to work with six inputs and one output. The generic circuit illustrated in Figure 28 has been implemented in FPGA to perform the calculation of τ HMD 3 [F32](n).

Feedback Force (FBF-HSD)
As illustrated in Figure 4, the hardware associated with the slave device (HSD) implements the feedback force via the FBF-HSD module. The FPGA-implemented circuit of the FBF-HSD module is designed to work with six input signals and three output signals. Among the six input variables, x OBJ [F32](n), y OBJ [F32](n) and z OBJ [F32](n) represent the spatial position of the object closest to the SD tool, and the other three, x ENV [F32](n), y ENV [F32](n) and z ENV [F32](n), represent the spatial position of the SD tool in the ENV module. The three outputs are F HSD x [F32](n), F HSD y [F32](n) and F HSD z [F32](n).
The calculation of F HSD x [F32](n) was implemented in FPGA according to the generic circuit illustrated in Figure 29. The circuit was designed to work with two input signals, x OBJ [F32](n) and x ENV [F32](n), and one variable, h x . The F HSD y [F32](n) output is obtained analogously from y OBJ [F32](n), y ENV [F32](n) and h y . The generic circuit shown in Figure 31 was implemented in FPGA to perform the calculation of F HSD z [F32](n), and it is based on Equation (62). The circuit was designed to work with two input signals, z OBJ [F32](n) and z ENV [F32](n), and one variable, h z .

Results
The entire tactile internet model infrastructure presented in Figure 4 was implemented with the purpose of validating the FPGA hardware implementation. A spatial trajectory that represents the data sent by the OP through the a(n) (Equation (42)) signal was created to validate the entire developed environment.
The created trajectory varies all three angles of the MD articulation (Figure 3). For this, it was first considered that the MD is in the initial angular position θ MD 1 (0) = 0, θ MD 2 (0) = 0 and θ MD 3 (0) = 0, which corresponds to the tool spatial position x OP (0) = 0, y OP (0) = −0.107 and z OP (0) = −0.035, as illustrated in Figure 32. Initially, the first joint is moved to θ MD 1 (vn) = π/2, where v is the number of samples corresponding to 4 s, thus resulting in the position x OP (vn) = −0.132, y OP (vn) = −0.107 and z OP (vn) = −0.167. Then, the second joint is moved to θ MD 2 (vn) = π/4, which results in the position x OP (vn) = −0.093, y OP (vn) = −0.013 and z OP (vn) = −0.167. Finally, the third joint is moved to θ MD 3 (vn) = π/4, resulting in the position x OP (vn) = −0.186, y OP (vn) = 0.025 and z OP (vn) = −0.167. The path created is within the limits of the device workspace and takes a total time of t 1 = 12 s, of which 4 s are used to perform the movement of each joint.
To validate the circuits of the modules implemented in FPGA, equivalent software models were used, and the results of both implementations were compared. The software models use a 32-bit floating point format, while the hardware modules run a parallel implementation with a hybrid representation that uses both 32-bit floating point and fixed point in different parts of the proposed architecture, as presented in Section 6. In all scenarios, the signal sampling rate (or throughput) was R s = 1/t s (samples per second), where t s is the time between consecutive samples.
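The software models themselves are not listed in the paper. As an illustration of what such a floating-point reference model could look like, the sketch below pairs an assumed forward kinematics with an assumed closed-form inverse kinematics built from the same intermediate quantities named in the Inverse Kinematics section (R, r, α, β and γ), and checks that the two are mutually consistent on the trajectory waypoints. The expressions and the link lengths are assumptions (the lengths are fitted to the waypoint values above), not the paper's Equations (15)-(25).

```python
import math

# Hypothetical link lengths in metres, fitted to the trajectory waypoints.
L1, L2, L3, L4 = 0.132, 0.132, 0.025, 0.167


def forward_kinematics(theta1, theta2, theta3):
    """Assumed closed-form forward kinematics (joint angles in radians)."""
    reach = L1 * math.cos(theta2) + L2 * math.sin(theta3)
    x = -math.sin(theta1) * reach
    y = L3 - L2 * math.cos(theta3) + L1 * math.sin(theta2)
    z = -L4 + math.cos(theta1) * reach
    return x, y, z


def _acos_clamped(value):
    """arccos with its argument clamped to [-1, 1] to absorb rounding noise."""
    return math.acos(max(-1.0, min(1.0, value)))


def inverse_kinematics(x, y, z):
    """Assumed closed-form inverse kinematics (elbow-up configuration)."""
    theta1 = -math.atan2(x, z + L4)
    big_r = math.hypot(x, z + L4)                                # R
    small_r = math.sqrt(x * x + (y - L3) ** 2 + (z + L4) ** 2)   # r
    beta = math.atan2(y - L3, big_r)
    gamma = _acos_clamped((L1**2 + small_r**2 - L2**2) / (2 * L1 * small_r))
    alpha = _acos_clamped((L1**2 + L2**2 - small_r**2) / (2 * L1 * L2))
    theta2 = gamma + beta
    theta3 = theta2 + alpha - math.pi / 2
    return theta1, theta2, theta3


def round_trip_error(angles):
    """Largest absolute joint error after FK followed by IK."""
    recovered = inverse_kinematics(*forward_kinematics(*angles))
    return max(abs(a - b) for a, b in zip(angles, recovered))


if __name__ == "__main__":
    waypoints = [
        (0.0, 0.0, 0.0),
        (math.pi / 2, 0.0, 0.0),
        (math.pi / 2, math.pi / 4, 0.0),
        (math.pi / 2, math.pi / 4, math.pi / 4),
    ]
    for angles in waypoints:
        print(angles, round_trip_error(angles))
```

A round-trip check of this kind only shows that the assumed forward and inverse models agree with each other; the comparison reported in the paper is between software and hardware outputs of the same module.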
From the experimental results, the mean square error (MSE) between the software model and the hardware implementation proposed in this work was calculated for each output variable. Table 1 shows the mean square error between the software models and the hardware ones proposed in this paper. The MSE results show that the forward kinematics (FK-HMD and FK-HSD), inverse kinematics (IK-HSD), kinesthetic feedback force (KFF-HMD) and feedback force (FBF-HSD) modules had an acceptable response, even when using a hybrid representation, compared to the software model that uses a floating point representation. For the variables of the FK-HMD and FK-HSD modules the error was on the order of 10^−8, for the IK-HSD module it was on the order of 10^−6, for the variables of the KFF-HMD module it was on the order of 10^−7, and for the FBF-HSD module it was on the order of 10^−16. These values demonstrate that the FPGA implementations presented a behavior equivalent to that of the software models.
In a hardware implementation, it is important to analyze post-synthesis requirements such as hardware resource usage and execution time. In the case of FPGAs, the resources are measured through the use of lookup tables (LUTs), registers and digital signal processor (DSP) units, to name a few. After validating the hardware-implemented models, synthesis results were obtained for a Xilinx Virtex 6 XC6VLX240T-1FF1156 FPGA. This Virtex 6 FPGA has 37,680 slices that group 301,440 flip-flops, 150,720 logical cells that can be used to implement logical functions or memories, and 768 DSP cells with multipliers and accumulators. The implementations and results were obtained using Matlab/Simulink and the Xilinx System Generator.
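The MSE used to compare software and hardware outputs can be sketched as below, using the standard definition (the mean of the squared sample-wise differences). The demonstration data are purely illustrative: a sine signal is compared with a copy rounded to 13 fractional bits, which mimics only the output quantization of a [s16.13] block and does not reproduce the values in Table 1. The function names are hypothetical.

```python
import math


def mean_square_error(reference, measured):
    """Mean of squared sample-wise differences between two equal-length signals."""
    if len(reference) != len(measured) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    total = sum((r - m) ** 2 for r, m in zip(reference, measured))
    return total / len(reference)


def quantize(value, frac_bits=13):
    """Round a value to the nearest multiple of 2**-frac_bits (illustrative)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


if __name__ == "__main__":
    angles = [k * math.pi / 200 for k in range(101)]   # 0 to pi/2
    software = [math.sin(a) for a in angles]           # reference model
    hardware_like = [quantize(s) for s in software]    # quantized stand-in
    print(mean_square_error(software, hardware_like))
```

Because rounding to a 2^-13 grid introduces at most 2^-14 of error per sample, the MSE of this stand-in is bounded by 2^-28 (about 3.7 × 10^-9).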
Table 2 presents the post-synthesis results related to FPGA resource utilization, sampling time, and throughput for the modules FK-HMD, KFF-HMD, FK-HSD, IK-HSD, and FBF-HSD. The first column shows the name of the module, and the next three columns, called registers, LUTs and multipliers, represent the amounts of resources used in the FPGA. The registers column gives the number of flip-flops used, followed by the percentage of the total available. The LUTs column gives the number of LUTs used, followed by the percentage of the total available. The multipliers column gives the number of internal DSP48 multipliers used, followed by the percentage of the total available. The t s column gives the sampling time, in nanoseconds, obtained for each hardware module. Finally, the R s column displays the throughput (R s = 1/t s ) in mega-samples per second for the hardware modules.
The synthesis results presented in Table 2 show that the resources used for the FK-HMD and FK-HSD modules were the same. Each of these modules individually used 3041 registers (1.01% of those available), 5.31% of the LUTs, and 1.43% of the embedded DSP48 multipliers. The IK-HSD module consumed 1.04% of the registers, 9.36% of the LUTs and 3.52% of the multipliers. The KFF-HMD module consumed 1.03%, 8.13% and 6.25% of the registers, LUTs and multipliers, respectively. Finally, the FBF-HSD module used 0.11% of the registers, 0.82% of the LUTs and 1.17% of the multipliers.
The hardware resources consumed by the HMD and HSD hardware modules were low. Even if all modules are implemented on a single device, the consumption remains low. The total hardware resources used in the FPGA by all modules (FK-HMD, KFF-HMD, FK-HSD, IK-HSD and FBF-HSD) were 12,667 registers (4.20%), 43,610 LUTs (28.93%) and 106 multipliers (13.80%). This low resource consumption demonstrates that the proposed implementations take up little space in the FPGA, which allows other, separate implementations to be used concomitantly.
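The reported totals can be cross-checked against the per-module percentages and the device capacity with a few lines of arithmetic. The figures below are those quoted in the text; treating the 150,720 "logical cells" as the LUT capacity is an assumption made for this check, and the dictionary and function names are hypothetical.

```python
# Device capacity as quoted in the text (LUT capacity is an ASSUMED mapping
# of the 150,720 "logical cells").
DEVICE_CAPACITY = {"registers": 301_440, "luts": 150_720, "dsp48": 768}

# Per-module utilization percentages quoted from Table 2.
MODULE_USAGE_PERCENT = {
    "FK-HMD": {"registers": 1.01, "luts": 5.31, "dsp48": 1.43},
    "FK-HSD": {"registers": 1.01, "luts": 5.31, "dsp48": 1.43},
    "IK-HSD": {"registers": 1.04, "luts": 9.36, "dsp48": 3.52},
    "KFF-HMD": {"registers": 1.03, "luts": 8.13, "dsp48": 6.25},
    "FBF-HSD": {"registers": 0.11, "luts": 0.82, "dsp48": 1.17},
}

# Total resource counts quoted in the text.
REPORTED_TOTALS = {"registers": 12_667, "luts": 43_610, "dsp48": 106}


def total_percent(resource):
    """Sum of the per-module percentages for one resource type."""
    return sum(usage[resource] for usage in MODULE_USAGE_PERCENT.values())


def percent_of_device(resource):
    """Reported total count expressed as a percentage of device capacity."""
    return 100.0 * REPORTED_TOTALS[resource] / DEVICE_CAPACITY[resource]


if __name__ == "__main__":
    for resource in DEVICE_CAPACITY:
        print(resource, round(total_percent(resource), 2),
              round(percent_of_device(resource), 2))
```

Both routes give the same totals to two decimal places, with LUTs being the most heavily used resource at roughly 29% of the device.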
As per Table 2, the throughput values, R s , obtained were significant. Values of 21.27 MSps for the FK-HMD and FK-HSD modules, 4.58 MSps for the IK-HSD module, 14.28 MSps for the KFF-HMD module and 47.61 MSps for the FBF-HSD module were achieved. These results enable critical applications that demand strict time constraints, as is the case with tactile internet applications. The times presented in Table 2 show the critical path (path in the entire design with the maximum delay) on FPGA.
Table 3 shows the speedup obtained with respect to the latency constraints. The first column shows the latency constraints of 1 ms and 10 ms [9][10][11][12]. The second column shows the maximum hardware latency allowed for the application to function normally (these values are the estimates obtained from Equation (71) for both time constraints). The third column shows the latency of the hardware implementation presented here. The speedup, in the fourth column, is calculated as the ratio between the latency limit for each constraint and the time of the proposed hardware. It is worth mentioning that this is an estimate to guide the calculations. The 1 ms constraint corresponds to a maximum latency limit of 37.5 µs for acceptable hardware performance. For the 10 ms constraint, the maximum limit is 375 µs. The value of t hardware presented in Table 3, according to Equation (71), corresponds to the sum of the latencies of the five implemented modules (Table 2): two modules are associated with the MD device (FK-HMD and KFF-HMD) and three modules are associated with the SD device (FK-HSD, IK-HSD, and FBF-HSD).
Thus, the value of 403 ns presented in Table 3 corresponds to the sum of the two modules related to the master component, which total 117 ns (47 ns from the FK-HMD module and 70 ns from the KFF-HMD module), and the three modules related to the slave component, which total 286 ns (47 ns from the FK-HSD module, 218 ns from the IK-HSD module and 21 ns from the FBF-HSD module). Therefore, for the 1 ms constraint, the implementation presented a 93× speedup relative to the 37.5 µs limit, and for the 10 ms constraint, the speedup was 930× relative to the 375 µs limit.
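The throughput, total latency and speedup figures quoted above follow directly from the per-module sampling times. The sketch below reproduces that arithmetic; the module times are those quoted in the text, and the function names are hypothetical.

```python
# Sampling time of each module in nanoseconds, as quoted in the text.
MODULE_LATENCY_NS = {
    "FK-HMD": 47,
    "KFF-HMD": 70,
    "FK-HSD": 47,
    "IK-HSD": 218,
    "FBF-HSD": 21,
}


def throughput_msps(t_s_ns):
    """Throughput R_s = 1 / t_s, in mega-samples per second."""
    return 1e3 / t_s_ns


def total_latency_ns():
    """Sum of the latencies of the five implemented modules."""
    return sum(MODULE_LATENCY_NS.values())


def speedup(limit_us):
    """Ratio between a hardware latency limit (in us) and the total latency."""
    return limit_us * 1e3 / total_latency_ns()


if __name__ == "__main__":
    for name, t_s in MODULE_LATENCY_NS.items():
        print(f"{name}: {throughput_msps(t_s):.2f} MSps")
    print("total latency (ns):", total_latency_ns())
    print("speedup vs 37.5 us:", speedup(37.5))
    print("speedup vs 375 us:", speedup(375.0))
```

The quoted throughput and speedup values correspond to these ratios truncated rather than rounded (for example, 1/47 ns is approximately 21.277 MSps, reported as 21.27 MSps).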
The sampling times obtained for the five modules implemented in this work were notably short. The values obtained contributed to the hardware meeting the time constraints required in a tactile internet environment. The hardware latency was significantly below the required constraints, as shown in Table 3. These values are well below the 30% share reported in the literature. Since the communication medium accounts for 70% of the application latency, this share can be increased, given that the latency of the hardware devices proved to be significantly low. In other words, the latency budget not spent on the hardware devices can be consumed by the network.
It is important to remember that in a more complex tactile internet environment there are several other algorithms to be implemented in hardware, such as prediction algorithms, dynamic control and AI-based techniques. However, as the proposed implementations present low hardware resource consumption, other necessary modules, such as the ones previously mentioned, could also be implemented in the same shared hardware, since resources would still be available.
Table 4 compares the results obtained by the implementation proposed in this work with equivalent results found in state-of-the-art works. The first column indicates the references of the related works. The next two columns show the FPGA platform used and the number of degrees of freedom of the device used. The fourth column presents the type of numerical representation used in the implementation and, finally, the last four columns present the times reported by each reference for the latency added by the forward kinematics (FK), inverse kinematics (IK), kinesthetic force feedback (KFF) and feedback force (FBF) modules, respectively.
As described in Table 4, a hardware model for calculating the forward kinematics of a 5-DoF device is presented in [30]. For the trigonometric calculations, a Taylor series expansion was implemented in FPGA for computing the sine, cosine, and arctangent functions. The proposed hardware was implemented using a 32-bit floating-point representation, and the calculations are performed in parallel. The total time to perform the calculations was 1240 ns. Compared with this model, the 32-bit floating-point forward kinematics (FK) implementation proposed in this work achieved a speedup of 26.38×.
The work presented in [31] shows the results of an implementation of the inverse kinematics module using a 32-bit floating-point representation. Three types of implementations are presented, but only the one with the best performance was compared. For that, an Altera Cyclone IV FPGA was used, on which a microprocessor system based on the Nios II soft-processor was built. This processor can perform operations such as addition, multiplication, subtraction, division and square root in hardware. The equations allow partial parallelization of individual operations, decreasing the computation time. The kinematic model was designed to work with a 3-DoF device, and the time required for the calculation is 143,000 ns. Compared with this model, the inverse kinematics (IK) implementation presented in this work, which uses a 32-bit floating-point representation, achieved a speedup of 655.96×.
The work in [32], described in Table 4, reports forward and inverse kinematics implementations for controlling a 6-DoF device using a 32-bit fixed-point representation. The modules were implemented using 21 bits for the fractional part and 11 bits for the integer part. For the forward kinematics (FK), 3000 ns are required to perform all calculations, and for the inverse kinematics (IK), 4500 ns are required. Based on the results presented in this section, the floating-point implementation proposed in this work achieved a speedup of 63.82× for forward kinematics and 20.64× for inverse kinematics.
The research presented in [33] proposed a hardware implementation of inverse kinematics to control a 10-DoF device. Although the robotic model has 10 DoF, the equations involved are composed only of subtraction and division operations. Regarding trigonometric calculations, only the arctangent is used in the equations, and the square root is computed through the CORDIC module. The hardware was designed using a 32-bit fixed-point representation; however, the number of bits used in the fractional part was not specified. The architecture proposed to calculate the inverse kinematics requires 440 ns to perform the computation, and all calculations are performed by the hardware in parallel. Compared with this model, the 32-bit floating-point inverse kinematics (IK) implementation proposed in this work achieved a speedup of 2.01×. The processing time reported in [33] is low considering the number of DoFs, but this is due to the fact that the algorithms are less complex.
The authors in [34] present the results of a fixed-point implementation of forward and inverse kinematics to control a 5-DoF device, as described in Table 4. The proposed hardware implementation uses 32-bit (15 bits for the fractional part) and 16-bit (7 bits for the fractional part) numerical representations in different parts of the modules. The equations associated with the calculation of the forward and inverse kinematics make use of the trigonometric functions sine, cosine, arctangent, and arccosine. To compute the arctangent and arccosine, a Taylor series expansion was used. The time required to perform the calculations is 680 ns and 940 ns for forward and inverse kinematics, respectively. Compared with this model, the floating-point implementation proposed in this work achieved a speedup of 14.46× for forward kinematics and 4.31× for inverse kinematics.
Unlike the previous works (Table 4), the authors of [35] present a single hardware design for calculating forward and inverse kinematics together. The hardware performs all calculations in parallel, and the computations of forward and inverse kinematics share the same processing time. An ARM processor was used to handle the communication between the modules, and the FPGA is used to calculate the kinematics model. The CORDIC module was used to perform the trigonometric calculations. In the proposed model, a 32-bit fixed-point representation was used. The total time to perform the calculation is 2000 ns. This time takes into account the entire process duration; separate times for each module were not specified. Given this scenario, adding the time t s of the FK module to that of the IK module gives a total time of 265 ns for the implementations of this work, according to Table 4. Hence, the hardware developed in this work achieved a 7.54× speedup over the model presented in [35].
It can be seen from Table 4 that none of the state-of-the-art works presented a hardware implementation of all four robotics algorithms presented here. It is also noted that only two works used a floating-point numerical representation. The floating-point implementation of robotics algorithms proposed in this work showed significant gains when compared to the works presented in the literature, as shown in Table 4. The different numbers of degrees of freedom (DoF) of the devices can influence the values of sampling time and throughput. Another factor that can also influence these values is the type of FPGA used to perform the synthesis. Because the implementation of this work was designed as a parallel architecture, an increase in the number of DoF does not necessarily result in a significant increase in sampling time.
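The speedups quoted in the comparison with the state of the art are ratios between the time reported by each reference and the time of the corresponding module in this work. The sketch below recomputes them from the times quoted in the text; the observation that the published values match the ratios truncated to two decimals is an inference from the numbers, and the names used are hypothetical.

```python
import math

# Times of the modules proposed in this work, in nanoseconds (from the text).
OURS_NS = {"FK": 47, "IK": 218, "FK+IK": 47 + 218}

# (reference, module, reported time in ns, speedup quoted in the text)
COMPARISONS = [
    ("[30]", "FK", 1240, 26.38),
    ("[31]", "IK", 143000, 655.96),
    ("[32]", "FK", 3000, 63.82),
    ("[32]", "IK", 4500, 20.64),
    ("[33]", "IK", 440, 2.01),
    ("[34]", "FK", 680, 14.46),
    ("[34]", "IK", 940, 4.31),
    ("[35]", "FK+IK", 2000, 7.54),
]


def truncated_speedup(other_ns, ours_ns):
    """Speedup other/ours, truncated (not rounded) to two decimal places."""
    return math.floor(100 * other_ns / ours_ns) / 100


if __name__ == "__main__":
    for reference, module, other_ns, quoted in COMPARISONS:
        computed = truncated_speedup(other_ns, OURS_NS[module])
        print(f"{reference} {module}: computed {computed}, quoted {quoted}")
```

These ratios compare designs synthesized on different FPGA families and for devices with different numbers of DoF, so they are best read as indicative rather than as like-for-like measurements.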

Conclusions
This paper presented an FPGA hardware reference model for modules implementing four robotics-associated algorithms. The FK-HMD and FK-HSD modules implement the forward kinematics, the IK-HSD module implements the inverse kinematics, the KFF-HMD module implements the kinesthetic feedback force, and the FBF-HSD module implements the feedback force. The parallel FPGA implementation of these modules is intended to increase the tactile system's processing speed in order to meet the latency constraints required for tactile internet applications. The modules were designed using a fully parallel implementation based on a hybrid scheme that uses fixed point and floating point representations in distinct parts of the architecture. Compared to the state of the art, this work describes and implements four different robotics algorithms in FPGA. The implementations presented in this work achieve higher module processing speed than equivalent implementations from the state of the art. All the modules presented here were analyzed based on the synthesis results, which included the FPGA resource utilization, sampling time, and throughput. Based on the synthesis results, it was observed that the implementations achieved high processing speed, with a latency far below the limit imposed by the 1 ms constraint. The hardware modules achieved a 93× speedup relative to the 37.5 µs time constraint. This work demonstrates that using embedded systems based on devices such as FPGAs enables parallel algorithm implementation, thus speeding up data processing and minimizing execution time. Such runtime gains can make it feasible to meet the processing times of critical applications that have tight time constraints or that need to process a large amount of data in a short time frame.