Reinforcement Learning-Aided Edge Intelligence Framework for Delay-Sensitive Industrial Applications

With the advancement in next-generation communication technologies, the so-called Tactile Internet is getting more attention due to its smart applications, such as haptic-enabled teleoperation systems. The stringent requirements such as delay, jitter, and packet loss of these delay-sensitive and loss-intolerant applications make it more challenging to ensure the Quality of Service (QoS) and Quality of Experience (QoE). In this regard, different haptic codec and control schemes were proposed for QoS and QoE provisioning in the Tactile Internet. However, they maximize the QoE while degrading the system’s stability under varying delays and high packet rates. In this paper, we present a reinforcement learning-based Intelligent Tactile Edge (ITE) framework to ensure both transparency and stability of teleoperation systems with high packet rates and variable time delay communication networks. The proposed ITE first estimates the network challenges, including communication delay, jitter, and packet loss, and then utilizes a Q-learning algorithm to select the optimal haptic codec scheme to reduce network load. The proposed framework aims to explore the optimal relationship between QoS and QoE parameters and make the tradeoff between stability and transparency during teleoperations. The simulation result indicates that the proposed strategy chooses the optimal scheme under different network impairments corresponding to the congestion level in the communication network while improving the QoS and maximizing the QoE. The end-to-end performance of throughput (1.5 Mbps) and average RTT (70 ms) during haptic communication is achieved with a learning rate and discounted factor value of 0.5 and 0.8, respectively. The results indicate that the communication system can successfully achieve the QoS and QoE requirements by employing the proposed ITE framework.


Introduction
The dramatic growth in communication technologies, specifically Fifth-Generation (5G), has obtained enormous interest due to its evolving generic services like massive machine-type communication, Ultrareliable and Low Latency Communication (URLLC), and enhanced mobile broadband [1][2][3]. Apart from this, it is one of the robust key enablers to realizing the so-called Tactile Internet, which provides to steer and control physical and virtual objects at a distance in real-time. The Tactile Internet is envisioned to enable human-to-machine communication where a human being can communicate with machines in a physical/virtual environment and experience haptic sensations (touch and forces) along with conventional audio-video traffic [4]. However, the strict requirements of Tactile Internet such as ultralow latency, ultrahigh reliability, high availability, and ultrasecurity remain open problems to enabling haptic communication over 5G. As per report Release 15 of the Third-Generation Partnership Project (3GPP), the URLLC service provides the reliability of 99.9% for a data packet of size 32-byte with communication latency of 1 ms [5]. Tactile Internet demands a higher haptic data packet rate (1 kHz) and reliability greater than (1 kHz or higher), the packet loss rate 10 −3 -10 −7 availability (99.99%-99.99999%), and Endto-End (E2E) delay in milliseconds (1 ms) [16,17]. Moreover, these requirements rely upon the application's nature. For example, mission-critical applications, such as telesurgery, required the reliability of around 10 −7 with a latency of 1 ms, while teletherapy for phobia treatment demands E2E latency of around 50 ms and remote industrial management requires between 0.25-10 ms. The main challenge for Tactile Internet realization is the transmission of haptic traffic with a packet rate ≥1 kHz under low latency. The high packet rate induces network congestion that leads to packet delay and large packet loss and makes the system prone to QoS violation. These stringent requirements expose different challenges to guaranteeing the QoS and QoE provisioning for the emerging services of Tactile Internet. In the literature, various solutions have been proposed based on haptic codecs and control approaches to tackle the above-mentioned teleoperation challenges [18]. The recent studies focus on integrating different haptic data compression/reduction approaches with stability-ensured control methods, as presented in [18,19]. However, the aforementioned studies achieve stability of the system during interaction while degrading the quality of interaction. Figure 2 shows the dynamic movement of the human operator's hand by interacting with the moveable cube in the virtual application, and the dataset is adopted from the reference database of Tactile Internet haptic codec working group (IEEE P1918.1.1) [19]. Furthermore, it can be clearly observed from Figure 2 that most of the time, environment (operator/teleoperator) dynamics remain away from their peak values. As a result, the demands of the less dynamic environment for latency, reliability, availability, and security can be eased. The high level of environmental dynamicity involves stringent requirements to guarantee QoS and QoE. If we allocate the required QoS statically as per the demand of the application, then we get the guarantee for the desired QoE. However, the allocated network resources are unexploited most of the time. 7LPHPV To this end, there is a strong need for a strategy to achieve the tradeoff between stability and transparency of the system and meet the QoS and QoE requirements under variable time delay and high packet rate. Therefore, in this paper, an RL-based dynamic switching strategy has been proposed where the haptic codecs govern the allocated environment's (operator/teleoperator) dynamics and network resources. The proposed strategy estimates the network resources in real-time and utilizes one of the famous RL algorithms, Q-learning, to select the optimal haptic codec schemes. The proposed switching strategy aims to gain both stability and transparency while meeting the stringent QoS and QoE requirements of tactile Internet services like teleoperations. The proposed strategy collects the Explicit Congestion Notification (ECN) marking packets during communication and inputs to the Long Short-Term Memory (LSTM) model to estimate the congestion levels in the network and select the best codec scheme to reduce the congestion and provide the desired QoS and QoE. The core network design of the proposed system comprises SDN, and MEC technology to realize reduce network latency and support Tactile Internet applications. As we discussed, the SDN paradigm reduces the network complexity and provides dynamic sharing of network resources. Moreover, it provides flexibility to the network, where the operators can define virtual slices corresponding to use cases with a focus to ensure QoS. The major contributions of this paper are summarized below as: • We discuss the Tactile Internet challenges in the context of high pack rate/packet size and extensively review different haptic codec solutions and summarized their key contributions. • An Intelligent Tactile Edge (ITE) framework is proposed that incorporates the ECN mechanism, Machine Learning (ML), and particularly, RL policies for optimal haptic codecs selection based on network congestion level employing the Markov decision process (MDP). • In the proposed framework, an LSTM model is utilized to predict the congestion level in the network by exploiting ECN information in the TCP header of the data packet.
To select the best haptic codec scheme corresponding to the network congestion, an RL-based Q-learning algorithm is employed.
• We conduct different experiments to demonstrate that the proposed system selects the best haptic data compression and reduction scheme for haptic-enabled bilateral teleoperations to ensure QoS and QoE requirements.
The remainder of this paper is structured as follows. Section 2 provides state-of-the-art research work on haptic codec schemes to provision QoS/QoE for haptic-driven teleoperation services. Section 3 presents the proposed strategy to predict network congestion and select optimal solutions to the corresponding QoS and QoE parameters. Section 4 demonstrates the experimental results in terms of QoS and QoE provisioning. Section 5 presents a discussion and future challenges. Finally, Section 6 concludes this paper and presets possible future directions.

Related Research Work
In the literature, various schemes have been suggested to address the communication challenges, such as the high packet rate of the haptic data and the time-varying delay of the network. The haptic data sampling rate challenge imposes a strong demand to packetize and transmit haptic information with a data rate 1 kHz or higher to provide stability and transparency to the teleoperation systems [20].
In [21], an approach termed as Opportunistic Adaptive Haptic Sampling (OAHS) was proposed to provide a higher data rate and signal transmission during haptic communication over the network. The Webers fraction was employed to guarantee the QoS requirement of Tactile-driven services. A data-driven technique to decrease the packet rate of haptic data in case of limited channel bandwidth was proposed in [22] with a focus to guarantee QoS and QoE of the communication system. The authors claim that their proposed scheme improves the application service quality while minimizing computational complexity and latency. The effectiveness of the data-driven was evaluated based on energy consumption. The recent work in [23] presents a Perception-Lossless Codec (PLC) approach to meet the E2E delay requirement of the haptic-derive applications of the Tactile Internet. It utilizes run-length coding to reduce the coding delay, and coding loss and improves the Rate-Distortion (RD) performance.
The work in [24] offers a Perceptual Vibrotactile Coding (PVC) derived from Sparse Linear Prediction (SLP) combining a cutaneous sensitivity model to assist haptic-driven applications. PVC-SLP scheme calibrates the filter parameter from the vibrotactile signal to produce signals. The performance of the PVC-SLP scheme was measured on two public datasets, with 280 and 1001 vibrotactile signals. It is considered a reference standard for tactile codecs. In [25] a Perceptual Wavelet Quantization (VC-PWQ) was proposed for vibrotactile codecs, where input signals are split into multiple blocks to feed the Wavelet which then inputs these blocks into a different frequency spectrum. Then perceptual thresholds were measured using a psychophysical model to lower the signal correlations. The presented method performed well in terms of SNR and PSNR across standard datasets with 280 vibrotactile samples using various compression ratios The authors in [26] explored the bursty haptic traffic problems of high packet rate during teleoperation that exposes several network issues like transmission delay and packet failure. A peak-suppressing adaptive Perceptual Deadband (PDb) approach was proposed which dynamically manages the packet rate based on earlier packet transmission traces. The proposed algorithm is capable of adjusting the PDb with a focus to minimize transmission delay.
Similarly, the study in [27] presents a perceptual-based adaptive sampling scheme to automatically predict the just noticeable difference and reduce the traffic rate in real-time. The proposed approach adjusts the Weber threshold without prior knowledge to maintain the QoE. However, their proposed solution only outperformed when compared against a predetermined set of QoS and QoE measures, such as a system with no communication delays that is also transparent. To maximize the QoE and improve the RD, a deep neural network (DNN)-based end-to-end vibrotactile autoencoder was presented in [28]. Table 1 provides a summary of recently proposed haptic data codecs to enhance QoS and QoE provisioning for teleoperation systems. The improved QoS/QoE factors column indicates which QoS and QoE metrics were improved by the reviewed study. This includes the different haptic quality measuring metrics such as Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Haptic Structural Similarity Index Measure (HSSIM), Mean Square Error (MSE), and Mean Opinion Score (MOS). The MOS is a subjective QoE evaluation metric while others are objective metrics. The PSNR measures the signal power by dividing the maximum power by the distorted signal power, whereas, the SNR is the ratio of signal power to signal noise power. The SSIM and HSSIM measure the structural similarity after data compression. The Effectiveness column explains the level of significance of the explored study. We categorized the significance into three levels, Significant (Sig.), Not Significant (Not Sig.), and Mixed. The detail of these levels is explained in the caption of Table 1. Table 1. Summary of the haptic codec techniques for emerging Tactile-driven services. The Effectiveness column indicates the level of significance like Significant (Sig.), Not Significant (Not Sig.), and Mixed. The Sig. means schemes use standard database and baseline comparison, Not Sig. means use a standard database but lack baseline comparison and the Mixed means presented scheme shows baseline comparison in the absence standard database.

Proposed Technique Improved QoS/ Effectiveness Key Contribution QoE Factor(s)
OAHS [21] SNR Mixed Propose an adaptive sampling method to tune PDb thresholding in real time with a focus on minimizing network impairments for the telehaptic system.
To meet the stringent requirements of Tactile Internet ultralow delayed, the data-driven technique was proposed.
To minimize the coding delay, and coding loss and to improve the bit error rate the E2E perceptual-lossless codec was presented to ensure the quality of tactile services.
A sparse linear prediction coding-based vibrotactile technique was presented, where a cutaneous sensitivity function was employed to improve the quality of vibrotactile signals.
VC-PWQ [25] SNR, PSNR Sig. Introduce a hybrid scheme by coupling wavelet transformation and vibrotactile perceptual model for haptic-driven applications.

Peak-Suppressing [26] MSE Mixed
Present a peak-suppressing scheme to adjust the PDb and reduce packet rate with a focus to minimize the network load that leads to delay and packet loss.
Adaptive Sampler [27] MOS Not Sig. An adaptive sampling scheme for selected haptic sample transmission was presented to guarantee the QoS of the teleoperation systems.
In this paper, we proposed an ITE framework that dynamically selects the haptic codec scheme corresponding to the level of congestion in the network while attaining the desired QoE performance under variable network resources and required QoS.

Proposed Framework
The proposed strategy aims to predict the level of congestion in the network, compare them with the QoS and QoE requirements of the teleoperation system, and finally provide the optimal QoS and QoE solution to ensure the stringent requirements. To achieve this goal, the proposed RL-based ITE framework deals with the main two challenges: (1) estimate the network congestion and comparison with domain-specific requirements in real-time, and (2) select the optimal haptic codec to decrease the data rate without degradation in QoE performance and improve the system stability and transparency. This section presents a detailed discussion of the proposed ITE to deal with the aforementioned challenges. At first, we will discuss the ECN mechanism and how ECN-marked packets are used to predict the levels of congestion in communication networks. Secondly, we briefly discuss the Q-learning algorithm to select the optimal codecs based on the level of congestion. Figure 3 depicts the structure of the proposed ITE with a haptic-enabled bilateral teleoperation system scenario in line with the IEEE P1918.1 Tactile Internet reference architecture. The human operator at the master side uses the haptic device to control the remote objects at the salve domain over the communication network and received haptic feedback. The core network comprises SDN-based technology that enables the separation of the control and data plane and realizes Tactile applications. The SDN employs the OpenFlow protocol to communicate between the SDN switch and controller. The solid line indicated the transmission link weights in ms and the yellow dotted lines indicate the SDN control link with SDN-enabled switches. As we discussed in the previous section haptic-enabled services demand high QoS and QoE. As the packet size, and sampling rate for haptic traffic increase the network system induces congestion problems. To avoid congestion problems Active Queue Management (AQM) techniques are used that avoid the buffers overflow by dropping or marking the before buffer overflow [29,30]. In the literature, a lot of studies on AQM schemes have been suggested. The process of marking the data packets instead of dropping them when the network encounters incipient congestion is called ECN. The ECN process helps to reduce packet loss and latency. The data including ECN-marked packets help to understand the nature of the application and the level of congestion in the network. In our proposed ITE we utilized the ECN mechanism and take benefit of the Explicit Congestion Experienced (ECE) flag to collect the ECE-marked packets during haptic traffic transmission among the master and the slave. The ECE-marked packets are input into the LSTM model to forecast the level of congestion in the network. LSTM is a special form of recurrent neural network with memory blocks that make it efficient to learn long-term dependencies. The LSTM model comprises a cell, an input, an output, and a forget gate. LSTM mitigate the vanishing-gradient problem of recurrent neural network. In [31], the authors proposed intelligent active queue management using ECN, where they, utilized the neural network to estimate the congestion and improve the AQM algorithms based on predicted congestion. We borrowed the congestion prediction concept from the work present in [31] to predict the congestion on the rest of the path in real-time. The LSTM model was considered with three hidden layers and the following formula is used to calculate the number of neurons in each layer, it is defined as follows:

Network Congestion Prediction
where the N in is the number of inputs to the LSTM layer, N refers to the number of samples and L is the number of hidden layers. Around 30 neurons are selected for each hidden layer in the LSTM model. The model is trained on 6000 samples at 100 ms intervals for ten minutes period. The proposed ITE deploys the trained model at the edge, which is also known as the tactile support engine in the IEEE P1918.1 Tactile Internet reference architecture. At each iteration, the model predicts the congestion level in the network and inputs the Q-learning algorithm to select the appropriate action to provide the required QoS/QoE. The work in [32] motivates us to integrate the advantages of the RL algorithm and LSTM-based prediction method in resource management. The proposed utilizes LSTM to predict user mobility and an actor-critic algorithm to allocate the network resources in serval slices of the network.

Q-Learning Technique
In this paper, we formulate the dynamic section of the QoS, and QoE provision approaches as the Markov decision process (MDP), where the ITE algorithm selects optimal policy as an agent and the communication network behaves as an environment. The intelligent agent communicate with the environment in sequential decision-making process to observe the current state s(t), perform an action a(t) at time t, and move it into the next state s(t + 1). The state of the system, action, and reward function for the proposed system can be summarized as follows: • Agent: An agent is an entity, which takes out learning tasks in the system. In the proposed framework the ITE that recommends the haptic codec at the application layer acts as an agent. • State: A state of the understudy system provides the observation of the environment that can be examined by the agent. In our proposed framework, we define the discrete level of network congestion as the states S = s(1), s(2), s(3), . . . , s(T). • Action: An action reveals how an intelligent agent responds to the environment based on the observed state. In ITE, the action A is the set of the haptic codecs schemes, which reduce and compress the packet size or packet rate of haptic traffic during communication to ensure QoS and QoE provisioning. The agent performs an action from the action space of databases of haptic codecs approaches and is denoted as A = {a 0 , a 1 , a 2 , . . . , a n }, where n refers to the total available haptic codecs as listed in Table 1. In our case, we have a total of seven codecs, so n = 7 and action space is A = {a 0 , a 1 , a 2 , . . . , a 7 }. • Reward: The reward to the system is a scalar value. At each time, when an agent executes an action, the environment returns a reward as a response to the agent. The reward characterizes the behavior (good/bad) of the environment to action and the agent adjusts its policies based on the reward. In this system, our aim is to increase the network performance and provide QoS/QoE by minimizing the congestion that leads to delay and packet loss. The ITE uses the power function of the connection as a reward function and defined it as follows: where the T put is the throughput and Round Trip Time (RTT) total E2E delay that a packet experiences. So, the reward is explained as the ratio of throughput to RTT.
In the MDP, the successor state s t+1 of the system merely relies on the current state s t of the system instead of the entire prior knowledge. Figure 4 depicts the overview of the optimal haptic codec scheme selection process of the proposed ITE based on the level of network congestion. As it can be seen from Figure 4, in this randomized procedure, when agents interact with the system during learning episodes, the records of the agent's interaction with systems are recorded as a sequence of state, action, and rewards {s(1), a(1), r(1), s(2), a(2), r(2), . . . , s(T − 1), a(T − 1), r(T − 1)}. In every episode, the ITE performs the action a(t) to select the haptic codecs relating to the policy π(a t |s t ) = P[A = a t |S = s t ]. The agent selects the corresponding to the received observation from the environment and here we called it personalized learning action (PLA). After that agent receive a reward of r(t) and transfer to the next state. The goal is to find the optimal policy value, V π (s) = E π [Σ T−1 t=1 r(t)] for each state after actions select and maximize it using the optimal value function value V π * (s) = max π V π . To achieve this, we utilize the Q-learning algorithm to solve the MDP problem and maximize the state value to select the optimal action. (2) following the policy, the proposed scheme selects the haptic codecs from the database by action a(i); and (3) transit the system into the next state s j (t + 1) by receiving reward r(i).
In the literature, Q-learning and its variant algorithms have shown robust results to solve MDP problems. It is an off-policy approach that can be employed for any solution in the MDP framework. On the other hand, Q-learning uses, parametric operations to approximate the Q-function and defined it as follows: where Q(s t , a t ) is the Q-value and is calculated in the Q-table. α ∈ (0 < α ≤ 1) is a learning rate coefficient, r t is the immediate reward agent receive after performing action a t on state s t , and γ is the discount factor that helps to determine the importance of the future reward. Q (s t+1 , a) calculates the best Q-value form the next state. Figure 5 presents the high-level systematic and implementation flow of the proposed ITE framework. In the proposed ITE, firstly the system initialized the user interaction interface to interact with the system and parametric setting default values or as per user definition. It set the packet rate, link bandwidth, link latencies, packet size, and Internet protocol (IP) address.
After initialization system predicts the network congestion and measures the QoS and QoE of the application requirement to the predicted state. If the condition meets, then the system continues with the same codecs if not then the ITE agent selects the optimal codecs corresponding to the state of the system. In the next section, we will present the experiment analysis to prove the effectiveness of the proposed system. The list of default parameter values and settings for experiments are summarized in Table 2.

Result and Discussion
In this section, we discuss and highlight the efficacy of the proposed ITE framework with haptic-enabled bilateral teleoperation system scenarios using different simulation parametric settings. Default parameter values and settings for experiments and the list of haptic codecs are presented in Tables 1 and 2. The presented ITE is openly accessible at (https://github.com/zubair1811/IntelligentTactileEdge2022, accessed on 15 September 2022) for the reproducibility of our work and the ease of fellow researchers.

Simulation Setup
Our simulations environment is created in Python on a machine with an Intel Core i7 processor, 16GB memory, NVIDIA GeForce GT 1030 graphics card, and 64bit Linux (Ubuntu 18.04) operating system. The LSTM algorithm is programmed in the TensorFlow framework, and the Q-learning algorithm is programmed in Python. A Mininet emulator is utilized to design the communication network. Users may simulate actual network topologies with this tool. It also has built-in support for SDN architecture. For the experimentation, we utilized the publicly available 3 Degrees of Freedom (3DoF) teleoperation system haptic dataset [33]. A human operator at the master side uses a Phantom Omni haptic device and interacts with the virtual environment, which serves as a controlled domain, in order to capture the haptic traces. The position, velocity control signals of the human operator at the master side, and force feedback from the salve side for the 3DoF teleoperation dataset are depicted in Figure 6.

Latency Characterization
In haptic communication, QoS parameters are characterized by E2E delay, reliability, and synchronization [34]. The latency requirements for Tactile Internet applications depend on the type of application and dynamics of the remote environment. Some applications such as serious games (tele-soccer) demand latency value ≤10 ms and virtual reality-based teletherapy for phobia treatment E2E delay latency requirements (50 ms) with the packet failure rate (10 −3 -10 −7 ). The latency and packet loss rate deeply impact QoE and the quality of task for haptic-driven applications. The work in [35] presents the role of the Tactile Internet in industrial applications. The authors explored the emerging tactile industrial services and compared their QoS requirements with conventional industrial applications. For example, remote control industrial applications including, process automation, monitoring, maintenance, and fault reporting demand cycle time (≤50 ms), data rate (1∼100 Mbps), latency (≤50 ms) with packet loss rate around ≤10 −7 . Characterization of the latency for teleoperation systems is vital because it helps to provision QoS and QoE. Figure 7 depicts the analysis of the teleoperation system in terms of the E2E delay. To investigate the effect of network congestion with no external host and with multiple external hosts during haptic communication among the master and the slave, a simple haptic-enabled teleoperation system is developed and simulated for the number of external hosts = 1 to 20. The latency investigation with direct haptic communication without the external host, with the number of hosts = 10, 15, 20 is illustrated in Figure 7b-d, respectively. The analysis clearly shows that the trend with the increasing number of hosts increases the round trip time that a packet experiences from an average of 3.4 to 1159 ms. In Figure 7a, the direct haptic communication without an external host between master and slave, the packet latency is between 1 to 3.4 ms as compared to Figure 7c,d, where the average latency values are between 23 to 1159 ms. To demonstrate this latency analysis in-depth, Figure 8a-d depicts the histogram of the packet latencies. This analysis reveals the E2E delays of the haptic data traffic due to network load that induced congestion. Similar to Figure 7a-d, the result in Figure 8a-d indicates that the increase in the number of external host network experience congestion and lead the system to unstable and degrade the application performance. Figure 8a, indicates that most of the latency of the packet lies under 2 to 3 ms, and the frequency of the number of the packet under 4 ms is more than 95%. From Figure 8b-d, it can be clearly seen the latencies of the haptic traffic increase. In Figure 8b, most of the latency of the haptic packets centered between 20 to 60 ms, for Figure 8c it is between 600-1000 ms with an average of 849 ms and Figure 8d it is centered around 800 to 1500 ms with an average of 1159 ms. The packet interarrival times of teleoperation traces are also investigated and illustrated in Figure 9. The control signals as defined in the 3DoF teleoperation dataset are transmitted as a packet to the slave and the slave backward the haptic force feedback in response. The packet interarrival times are calculated as follows: where c refers to the control signals, i is the sample that is transmitted, and T indicates the time instantly. In simple, it can be written the interarrival time of the packet I c i is the difference between the current sample i at the time instant T c i and the previously transmitted signals at T c i−1 . Figure 9a, depicts the packet interarrival time for control signals and Figure 9b illustrates for the force feedback signals.

Learning Convergence
This section presents the study of the Temporal Difference (TD) error made by the proposed system in the learning process over time to explore the different parametric settings to get faster convergence of the system. The main goal is to attain a better convergence with a smaller number of training epochs to reduce the computation overhead. The TD is defined as follows: where r t + γ × max a Q (s t+1 , a) show TD for newly computed Q-value, while Q(s t , a t ) refers the previous Q-value for the state s t . The ∆Q refers to the change in the Q-value and is obtained by subtracting the prior value from the calculated target value. To study the effect of the learning rate α, on the TD error the α values from 0.1 to 0.5 is adopted. Practically, the values of the learning rate coefficient are selected in the range of 0.1 to 0.9, where α value 0 implies no learning of the agent and α = 1 mean the agent just focused on current knowledge and ignored the previous knowledge. In the proposed ITE we focus on the previous and current level of congestion in the network, so we investigate the ITE on learning rate values α = 0.1∼0.5. The γ highlights the importance of the future reward, and the value of discounted factor γ = 0.8 is adopted for experimentation. The system explored the hidden pattern to recommend the optimal action within fine time episodes (T = 40). The analysis of the TD error values for the ITE framework is illustrated in Figures 10 and 11 with values of α 0.1 and 0.5, respectively. Figure 10a-d depicts the TD for the number of training epochs = 1, 2, 5 for the number of external host = 5, 10, 15, 20 during haptic communication between master and slave. Figure 10a with 5 external host depicts more TD error than Figure 10c,d, where the value of TD error lies between 0 to 0.05. Similarly, in Figure 10a, the ITE system with a value of learning rate 0.1 and epochs = 2-5 shows less error. However, in Figure 10d, with external host = 20 the shows convergence with number of training epoch = 1. The ITE is also simulated with a higher value of α = 0.5 to explain the better convergence as depicted in Figure 11. Figure 11a-d clearly sgows that the system with α = 0.5 reveals a significant decrease in TD error. As compared to Figure 10a  The E2E performance of RTT during haptic communication with the external host are shown in Figures 12 and 13. Figure 12 illustrates the simulation results of the ITE in terms of RTT, whereas Figure 12d with 20 external host shows that the proposed system focus to endure the QoS/QoE of the tactile application as compared RTT with our system with 20 hosts in Figure 7d. The Flow Queue Controlling Queue Delay (FQ-CoDel) scheme without any intelligence was used as a baseline model to illustrate the effectiveness of the proposed ITE. The RTT simulation results of the FQ-CoDel with the random strategy to select haptic codecs are shown in Figure 13. Figure 14 also shows that the average throughput of the system fluctuates around 1.5 Mbps, which ensures the stability of the network and controls the congestion level while using the proposed ITE. Finally, a comparative analysis of the power function for ITE and FQ-CoDel-Random is depicted in Figure 15

Discussion and Future Work Directions
In [35], we proposed the network infrastructure to simulate and investigate the delaysensitive and loss-prone tactile industrial application. A simulator, known as IoTactileSim aims to provide a tool to examine the strict QoS and QoE requirement for tactile applications. Recent work [36] on QoS provisioning explored the haptic coding, control systems, and intelligent prediction techniques with a focus to deal with mission-critical and delaysensitive stringent QoS and QoE requirements like ultralow latency, ultrahigh reliability, high availability, and ultrasecurity. This paper aims to propose an ITE framework to utilize the recently proposed haptic traffic reduction or compression methods to adjust the packet rate and packet size based on the level of congestion in the network and improve QoS and QoE. The proposed ITE observes network impairments during remote interaction and compares them with QoS/QoE specifications. The threshold-based approach is used to ensure the QoS and QoE requirements, and to perform the optimal action for selecting the best codec to control the threat of QoS violation. There should be an adaptive approach to defining the application-specific threshold corresponding to network resources. Moreover, haptic traffic communication between the master and the slave is performed over the TCP. However, some studies in the literature suggest that the user datagram protocol is preferable due to the low header size and processing cost. Therefore, we intend to implement such real-time application protocols in our future research to overcome the extra computational and communication cost. To demonstrate the fruitfulness of the proposed system, there is a need to integrate a real-world haptic device at the master to transmit the haptic data and perform the industrial remote task at the salve side. In this regard, in future work, a haptic device will be utilized instead of a haptic dataset to controls the real-world tactile enabled remote control industrial applications. It will help to understand the more practical challenges related to haptic-driven applications in real-world scenarios.

Conclusions
In this work, we proposed an Intelligent Tactile Edge framework to predict the congestion level at the network layer by utilizing the ECN mechanism and adjusting the packet size and packet rate of the haptic transmission at the application layer by employing different haptic codec schemes with a focus ensure QoS and QoE. The proposed ITE utilizes the LSTM model to estimate the network congestion and then the Q-learning algorithm applies to take the action and select optimal haptic codecs to solve the communication network issues (transmission delay, jitter, and packet loss). We employ the power function of the connection as a reward function to optimize the Q-learning algorithm and find the pattern. The proposed ITE was investigated on the Tactile Internet standard 3DoF teleoperation traffic dataset. The simulation results suggest that the ITE is able to ensure the QoS and QoE requirements of the haptic-enabled bilateral teleoperation application. The proposed framework was simulated on different parametric settings to tune the algorithm and show the algorithm convergence to select the optimal parametric configuration. In the recent future, we plan to implement the proposed system in real-work application with a haptic device and extended it with advanced machine learning techniques such as federated learning-aware approaches at the ITE to support emerging technologies.

Conflicts of Interest:
The authors declare no conflict of interest.