Dynamic Offloading Optimization in a Distributed Fault Diagnosis System with a Deep Reinforcement Learning Approach

Artificial intelligence and distributed algorithms have been widely used in mechanical fault diagnosis with the explosive growth of diagnostic data. This paper provides a novel intelligent fault diagnosis system framework that allows intelligent terminals to offload computational tasks to mobile edge computing (MEC) servers, which can effectively address the problems of task processing delays and growing computational complexity. As the resources at the MEC servers and intelligent terminals are limited, reasonable resource allocation optimization can improve performance, especially for a multi-terminal offloading system. In this study, to minimize the task computation delay, we jointly optimize the local content splitting ratio, the transmission/computation power allocation, and the MEC server selection under a dynamic environment with stochastic task arrivals. The challenging dynamic joint optimization problem is formulated as a reinforcement learning (RL) problem, in which computational offloading policies are designed to minimize the long-term average delay cost. Two deep RL strategies, deep Q-learning network (DQN) and deep deterministic policy gradient (DDPG), are adopted to learn the computational offloading policies adaptively and efficiently. The proposed DQN strategy takes the MEC server selection as its only action while using a convex optimization approach to obtain the local content splitting ratio and the transmission/computation power allocation. In contrast, the actions of the DDPG strategy comprise all dynamic variables, including the local content splitting ratio, the transmission/computation power allocation, and the MEC server selection. Numerical results demonstrate that both proposed strategies perform better than the traditional non-learning schemes.


Introduction
Large-scale and integrated equipment puts forward higher requirements for condition monitoring as productivity improves [1,2,3]. Intelligent mechanical fault diagnosis algorithms have developed alongside artificial intelligence (AI) and Internet of Things (IoT) technologies, such as the application of deep learning (DL) and reinforcement learning (RL) in fault diagnosis [4,5,6,7,8,9]. A collaborative deep-learning-based fault diagnosis framework is proposed to solve the data transmission problem in distributed complex systems, which is a security strategy that does not require the transmission of raw data [10]. An improved classification and regression tree algorithm is proposed, which ensures the accuracy of fault classification by reducing the iteration time in the computation [11]. A fault diagnosis method based on adaptive privacy-preserving federated learning is used for the Internet of Ships, which guarantees no risk of data leakage by sharing only model parameters [12]. A deep-learning-based approach to automated fault detection and isolation is used for fault detection in automotive dashboard systems, which is tested against data generated from a local computer-based manufacturing system [13]. An intelligent fault detection method based on the multi-scale inner product is adopted for shipboard antenna fault detection, which uses the inner product to capture fault information in vibration signals and combines it with locally connected feature extraction [14].
Current intelligent fault diagnosis algorithms pay more attention to the reliability of the diagnosis and less attention to its timeliness [15]. The server's computation resources and the timeliness of data processing have become urgent problems to be solved given the exponential growth of diagnostic data throughput. Traditional fault diagnosis systems offload the diagnostic data collected by terminals to a server with powerful computing power for processing, as shown in Fig. 1. The server is usually far away from the acquisition terminal, which causes a waste of resources during transmission and increases the data transmission delay [16]. The emergence of mobile edge computing (MEC) provides a solution to these problems and is considered a promising architecture for data access [17,18,19]. Compared to traditional condition monitoring systems, MEC deploys several lightweight servers, called mobile edge servers, closer to the collection terminals. MEC servers can significantly reduce the burden of performing computation for large content tasks and the task processing delays by allowing terminals to offload computation tasks to a nearby MEC server [20,21].
Fig. 1: The framework of the conventional mechanical fault diagnosis system, in which the terminal uploads the monitoring data to a central server through the network cable for processing. The central server has powerful computing power but is generally far away from the terminal. The terminal is only responsible for collecting monitoring data and typically has no computing power.
The architecture of MEC usually consists of the user layer and the mobile edge layer [22,23,24,25], as shown in Fig. 2. In the MEC paradigm, the user layer consists of mobile device terminals, which contain various applications and functions and also have certain computing capabilities. When processing a computing task, a device terminal can choose to process it on its own device or offload the task to the mobile edge layer or cloud layer through data transfer. The mobile edge layer consists of edge servers near the device terminals, whose computing resources are more abundant than those of the device terminals. Through computation offloading, information can be exchanged in real time to meet the computing needs of different types of application scenarios. The MEC architecture has a wide range of application scenarios in the IoT, such as 5G communication, virtual reality, the Internet of Vehicles, smart cities, and smart factories. The MEC architecture has the advantages of low time delay, energy efficiency, security, and location and content awareness, which make it easier to integrate with AI methods and blockchain methods. Computation offloading, as one of the core techniques of MEC, has received great attention recently. For simple, indivisible, or highly integrated tasks, binary offloading strategies are generally adopted, in which tasks can only be computed locally or offloaded entirely to the servers [26]. The authors in [27] formulated the binary computation offloading decision problem as a convex problem, which minimizes the transmission energy consumption under a time delay constraint. The computation offloading model studied in [28] assumed that the application has to complete the computing task with a given probability within a specified time interval, for which the optimization goal is the sum of local and offloading energy consumption.
This work concluded that offloading computing tasks to the MEC servers can be more efficient in some cases. In practice, offloading decisions can be more flexible. The computation tasks can be divided into two parts performed in parallel: one part is processed locally, and the other is offloaded to the MEC servers for processing [29]. A task-call graph model is proposed to illustrate the dependency between the terminal and MEC servers, in which decisions and latencies are investigated by the joint offloading scheduling and formulated as a linear programming problem [30].
RL has been employed as a new solution to the problem of MEC offloading; it is a model-free machine learning approach that can perform self-iterative training based on the data it generates [31,32,33,34]. Task processing delay is a vital optimization parameter for time-sensitive systems. The authors studied the problem of computation offloading in an IoT network in [35], in which a Q-learning-based RL approach was proposed for an IoT device to select a proper device and determine the proportion of the computation task to offload. The authors in [36] investigated joint communication, caching, and computing for vehicular mobility networks; a deep Q-learning-based RL with a multi-timescale framework was developed to solve the joint online optimization problem. In [37], the authors studied offloading for the energy harvesting (EH) MEC network. An after-state RL algorithm was proposed to address the large time-complexity problem, and polynomial value function approximation was introduced to accelerate the learning process. In [38], the authors also studied the MEC network with an EH device and proposed hybrid-based actor-critic learning for optimizing the offloading ratio, local computation capacity, and server selection. From the above references, efficient computational offloading decisions based on RL methods can help the system to reduce computational complexity and computational time cost.
Fig. 3: The framework of the intelligent mechanical fault diagnosis system in this paper, which contains three parts: intelligent terminal, agent server, and MEC servers. The intelligent terminal is responsible for collecting fault diagnosis data and has a weak data processing capability. The MEC servers are small servers with certain data processing capabilities. The agent server is responsible for policy formulation and controls the ratio at which the intelligent terminals and MEC servers process the fault diagnosis data.
In the framework of the intelligent fault diagnosis system proposed in this paper, the user layer consists of intelligent terminals with certain computing power, and the mobile edge layer consists of MEC servers with strong computing power, as shown in Fig. 3. The intelligent terminal offloads the fault diagnosis data to any MEC server proportionally through the agent server's policy. The optimization problem becomes an offloading decision problem in a dynamic MEC environment, and the current channel state information (CSI) cannot be observed while making the offloading decision. The offloading policy should follow the predicted CSI and task arrival rates under the intelligent terminal and MEC server energy constraints, aiming to minimize the long-term average delay cost. We first establish a low-complexity deep Q-learning network (DQN) based offloading framework where the action includes only the discrete MEC server selection, while the local content splitting ratio and the transmission/computation power allocation are optimized by a convex optimization method. Then we develop a deep deterministic policy gradient (DDPG) based framework whose actions include both the discrete MEC server selection variable and the continuous local content splitting ratio and transmission/computation power allocation variables. The numerical results demonstrate that both proposed strategies perform better than the traditional non-learning schemes. The DDPG strategy is superior to the DQN strategy as it can learn all variables online. Compared with the traditional fault diagnosis system, the intelligent fault diagnosis system migrates the computing tasks originally handled by the central server to the edge computing system, which reduces the computing load of the central server, relieves the network bandwidth pressure, and improves real-time data interaction.
On the other hand, the new intelligent fault diagnosis system overcomes the single-function limitation of traditional instrumentation systems, which increases the intelligence of the instrumentation and makes it easier to integrate other intelligent methods.
The contributions of this paper can be summarized as follows. 1) A new framework for the intelligent fault diagnosis system based on the MEC architecture is proposed, in which both MEC servers and intelligent terminals can process monitoring data, with the ratio determined by the offloading policy of the agent server. Compared with the traditional fault diagnosis system, the intelligent fault diagnosis system solves the problems of limited computing resources and network delay and increases the intelligence of the equipment.
2) Two offloading scenarios of the intelligent fault diagnosis system are modeled: one-to-one and one-to-multiple. One-to-one means that one MEC server can only be connected to by one intelligent terminal at a time, and one-to-multiple means that multiple intelligent terminals can be connected to the same MEC server simultaneously. The optimization objective is the maximum time delay for the system to complete the computation tasks in each time slot. Every intelligent terminal and MEC server has its own energy constraints, and the agent determines the power allocation during the offloading process.
3) An offloading decision optimization algorithm based on the combination of convex optimization and deep reinforcement learning is designed. Convex optimization methods are used to solve the resource allocation subproblem once the MEC server connection is fixed, while the offloading decisions of the intelligent fault diagnosis system are learned by the DQN and DDPG algorithms.
The remainder of this paper is structured as follows. The intelligent fault diagnosis system models are provided in Section 2. The DQN-based and DDPG-based offloading designs are described in Sections 3 and 4, respectively. The numerical results and relevant analysis are presented in Section 5. The conclusion is given in Section 6.

The Intelligent Fault Diagnosis System Model
A new framework for the intelligent fault diagnosis system is proposed in this paper, which consists of MEC servers and intelligent terminals, as shown in Fig. 4. Both MEC servers and intelligent terminals can process monitoring data, and the intelligent terminal can offload data to any MEC server through the agent. The interaction between the intelligent terminal and the MEC server operates in an orthogonal frequency division multiple access framework. The offloading policy includes the local content splitting ratio, the transmission/computation power allocation, and the MEC server selection. According to the offloading policy, the monitoring data is split into two parts: one is offloaded to the MEC server for processing, and the remaining part is kept locally for processing by the intelligent terminal. The intelligent fault diagnosis system based on the MEC framework can be divided into three models: the network model, the communication model, and the computing model, which are introduced separately in the following.

Network model of intelligent fault diagnosis system
The network of the intelligent fault diagnosis system supporting offloading contains M MEC servers and N intelligent terminals, as shown in Fig. 4. Let ℳ = {1, ⋯, M} and 𝒩 = {1, ⋯, N} be the index sets of the MEC servers and the intelligent terminals, respectively. Part of the diagnostic data will be offloaded to the MEC servers, assuming that the MEC servers have more computing power than the intelligent terminals. The system time is divided into consecutive time frames with equal period τ_0, and the time is indexed by t ∈ 𝒯 = {0, 1, ⋯}. The channel state information between the m-th MEC server and the n-th intelligent terminal is denoted as h_{m,n}(t), and the task size at intelligent terminal n is marked as A_n(t). The channel state information of the MEC network {h_{m,n}(t)} and the task arrival A_n(t) at each intelligent terminal change at each time interval t ∈ 𝒯. In order to save the energy consumption of the intelligent terminals and the MEC servers and to reduce the task processing latency, the central agent node needs to determine the split between the locally executed content size and the offloaded content size, as well as the power allocation between local task processing and data transmission. The power splitting of the MEC server among multiple intelligent terminals should also be determined if one MEC server is selected to help handle tasks from multiple intelligent terminals. The communication model and the computational model are described in detail below.
Fig. 4: The working principle of the intelligent fault diagnosis system in this paper. The intelligent terminal collects the fault diagnosis data and then requests a policy from the agent server. The offloading policy of the agent server includes the local content splitting ratio, the transmission/computation power allocation, and the MEC server selection. Finally, the agent server offloads the fault diagnosis data to the MEC server according to the ratio determined by the offloading policy.

Communication model of MEC servers and intelligent terminals
In the considered network of intelligent fault diagnosis systems, the communications operate in an orthogonal frequency division multiple access framework, and a dedicated subchannel with bandwidth B is allocated to each intelligent terminal for partial task offloading. Supposing that intelligent terminal n communicates with MEC server m, the received signal at the MEC receiver can be represented as
y_{m,n}(t) = √(p_n^tx(t)) h_{m,n}(t) s_n(t) + z_{m,n}(t),
where s_n(t) denotes the symbols transmitted from intelligent terminal n, p_n^tx(t) is the transmission power utilized at intelligent terminal n, and z_{m,n}(t) denotes the received additive Gaussian noise with power N_0. Here the channel gain h_{m,n}(t) follows a finite-state Markov chain (FSMC), and thus the communication rate between MEC server m and intelligent terminal n is given by
r_{m,n}(t) = B log_2(1 + p_n^tx(t) |h_{m,n}(t)|^2 / N_0).   (1)
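The rate expression (1) and the FSMC channel evolution can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function names `shannon_rate` and `fsmc_step` and the two-state transition matrix in the usage note are illustrative assumptions.

```python
import math
import random

def shannon_rate(bandwidth_hz: float, tx_power: float, channel_gain: float,
                 noise_power: float) -> float:
    """Offloading rate r_{m,n}(t) = B * log2(1 + p * |h|^2 / N0) in bit/s."""
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain ** 2 / noise_power)

def fsmc_step(state: int, transition: list, rng: random.Random) -> int:
    """Advance a finite-state Markov chain (FSMC) channel by one time slot,
    sampling the next state from the row of the transition matrix."""
    u, acc = rng.random(), 0.0
    for nxt, prob in enumerate(transition[state]):
        acc += prob
        if u < acc:
            return nxt
    return len(transition[state]) - 1
```

For example, with B = 1 MHz, unit transmit power, unit channel gain, and unit noise power, `shannon_rate` returns 1 Mbit/s, since log2(1 + 1) = 1.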

Computing model of intelligent fault diagnosis system
The task A_n(t) received at intelligent terminal n at time t needs to be processed during the time interval. Denote the task splitting ratio as ρ_n(t) ∈ [0, 1], which indicates that at time interval t, ρ_n(t)A_n(t) bits are executed at the intelligent terminal device and the remaining (1 − ρ_n(t))A_n(t) bits are offloaded to and processed by the MEC server.
1) Local computing: In local computation, the CPU of the intelligent terminal device is the primary engine, which adopts the dynamic voltage and frequency scaling (DVFS) technique, so the performance of the CPU is controlled by the CPU-cycle frequency f_n(t). Let p_n^loc(t) denote the local processing power at intelligent terminal n; under the conventional DVFS model p_n^loc(t) = κ f_n(t)^3, the intelligent terminal's computing speed (in cycles per second) at the t-th slot is given by
f_n(t) = (p_n^loc(t)/κ)^{1/3},
where κ is the effective capacitance coefficient. Let c_n denote the number of CPU cycles required for intelligent terminal n to accomplish one task bit. Then the local computation rate for intelligent terminal n at time slot t is given by
r_n^loc(t) = f_n(t)/c_n.   (2)
2) Mobile edge computation offloading: The task model for mobile edge computation offloading is the data-partition model, where the task-input bits are bit-wise independent and can be arbitrarily divided into different groups. At the beginning of the time slot, the intelligent terminal chooses which MEC server to connect to according to the channel state. Assume that the processing power allocated to intelligent terminal n by the MEC server is p_{m,n}(t); then the computation rate r_{m,n}^mec(t) at MEC server m for intelligent terminal n is
r_{m,n}^mec(t) = f_{m,n}(t)/c_m,   (3)
where c_m is the number of CPU cycles required for the MEC server to accomplish one task bit and f_{m,n}(t) denotes the CPU-cycle frequency the MEC server devotes to intelligent terminal n. It is noted that the MEC server can simultaneously process tasks from multiple intelligent terminals. We assume multiple applications can be executed in parallel with negligible processing latency. The feedback time from the MEC server to the intelligent terminal is ignored due to the small size of the computational output.
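The DVFS relation between processing power and computation rate can be sketched as follows. This is a minimal Python sketch under the cubic power model p = κ f³ assumed above; the function names and the unit value of κ in the example are illustrative.

```python
def cpu_frequency(power_w: float, kappa: float) -> float:
    """DVFS model: dynamic power p = kappa * f^3, hence f = (p / kappa)^(1/3)."""
    return (power_w / kappa) ** (1.0 / 3.0)

def computation_rate(power_w: float, cycles_per_bit: float, kappa: float) -> float:
    """Bits processed per second: r = f / c, where c is CPU cycles per bit."""
    return cpu_frequency(power_w, kappa) / cycles_per_bit
```

With κ = 1, 8 W of processing power yields f = 2 cycles/s, so at 2 cycles/bit the computation rate is 1 bit/s; with the paper's parameters (300 cycles/bit locally, 120 cycles/bit at the MEC server), the same frequency processes 2.5× more bits per second at the MEC server.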

DQN-Based Offloading Design
In this section, we develop a DQN-based offloading framework for minimizing the long-term processing delay cost. Developed from the traditional Q-learning algorithm, DQN is particularly suitable for high-dimensional state spaces and exhibits fast convergence behavior. In the considered DQN offloading design framework, the MEC system constitutes the DQN environment. A central agent node is set up to observe states, perform actions, and receive feedback rewards. The agent can be the cloud server or a MEC server.
The DQN-based offloading framework is introduced in the following, in which the corresponding state space, action space, and reward are defined. In the overall DQN paradigm, it is assumed that the instantaneous CSI is estimated at the MEC servers using training sequences and then delivered to the agent. The CSI observed at the agent is a delayed version due to the channel estimation operations and feedback delay. Each MEC server acquires only the local CSI of the intelligent terminals connected to it.

System state and action spaces
System State Space: In the considered DQN paradigm, the state space observed by the agent includes the CSI of the overall network and the received task sizes A_n(t) at time t. As the agent needs to consume extra communication overhead to collect the CSI from all MEC servers, the agent at time t observes a delayed version of the CSI at time t − 1, i.e., {h_{m,n}(t − 1)}. Denote h(t) = [h_{1,1}(t), h_{1,2}(t), ⋯, h_{M,N}(t)]; the state space observed at time t can be represented as
s(t) = {h(t − 1), A_1(t), ⋯, A_N(t)}.   (4)
System Action Space: The agent takes actions to interact with the environment based on the observed state s(t). As DQN can only handle discrete actions, the actions defined in the proposed DQN paradigm constitute only the MEC server selection. The MEC server selection action is denoted as a(t) = {x_{m,n}(t)}, where x_{m,n}(t) = 0 means that intelligent terminal n does not select MEC server m at the t-th time slot, while x_{m,n}(t) = 1 indicates that intelligent terminal n selects MEC server m at the t-th time slot.
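Since each terminal selects exactly one server, the discrete action space can be enumerated as every assignment of terminals to servers. The sketch below is illustrative (the names `enumerate_actions` and `to_indicator` are not from the paper); it shows why the DQN action space grows as M^N.

```python
import itertools

def enumerate_actions(num_servers: int, num_terminals: int):
    """Each terminal picks exactly one MEC server, so a joint action is a
    tuple (m_1, ..., m_N) of server indices; there are M^N such actions."""
    return list(itertools.product(range(num_servers), repeat=num_terminals))

def to_indicator(action, num_servers: int):
    """Convert (m_1, ..., m_N) into the indicator matrix x_{m,n} with
    exactly one 1 per column (per terminal)."""
    x = [[0] * len(action) for _ in range(num_servers)]
    for n, m in enumerate(action):
        x[m][n] = 1
    return x
```

For M = 3 servers and N = 2 terminals there are 3² = 9 joint actions; the indicator matrix of the action (2, 0) has x_{3,1} = 1 and x_{1,2} = 1 (1-indexed as in the paper's notation).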

Reward Function
In the DQN paradigm, the reward is defined by the maximum time delay required to complete all the tasks received at all intelligent terminals. After the actions are taken, a dedicated MEC server can calculate the time delays required for the intelligent terminals choosing this MEC server to offload, as each MEC server can observe its local CSI. Without loss of generality, we assume that intelligent terminal n with n ∈ 𝒩_m offloads its tasks to MEC server m, where the set 𝒩_m contains the indexes of the intelligent terminals selecting MEC server m. To minimize the required time delays, the MEC server needs to formulate an optimization problem to find the optimal ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t). It is worth noting that, as the MEC server knows the instantaneous CSI at time t, this solution can be obtained based on h(t), which differs from the MEC server selection taken based on h(t − 1). For the intelligent terminals that do not offload tasks to the MEC servers, the required time delays for local task processing are known by those intelligent terminals themselves. The agent collects all the time delay consumptions from the intelligent terminals and the MEC servers to obtain the final reward.
We detail how to compute the time delay for intelligent terminal n, assuming that it selects MEC server m to offload. The total time consumption for completing the task processing at intelligent terminal n is denoted as T_n, which equals
T_n = max{T_n^loc, T_{m,n}^tx + T_{m,n}^mec},
where T_n^loc, T_{m,n}^tx, and T_{m,n}^mec denote the times required for local task processing at intelligent terminal n, task offloading transmission from intelligent terminal n to MEC server m, and task processing at the MEC server, respectively.
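The max-structure of T_n reflects that local processing runs in parallel with transmission followed by remote processing. A minimal sketch (illustrative function name, rates in bit/s):

```python
def total_delay(local_bits: float, local_rate: float,
                offload_bits: float, tx_rate: float, mec_rate: float) -> float:
    """T_n = max(T_loc, T_tx + T_mec): the local branch and the
    (transmit, then remotely process) branch run in parallel."""
    t_loc = local_bits / local_rate
    t_tx = offload_bits / tx_rate
    t_mec = offload_bits / mec_rate
    return max(t_loc, t_tx + t_mec)
```

For instance, 4 bits processed locally at 2 bit/s take 2 s, while 3 offloaded bits sent at 3 bit/s and processed at 1.5 bit/s take 1 + 2 = 3 s, so T_n = 3 s; the offload branch dominates even though each of its stages alone is shorter than the local branch.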
With the communication and computation rates defined above, the delay components can be represented as
T_n^loc = ρ_n(t)A_n(t)/r_n^loc(t),  T_{m,n}^tx = (1 − ρ_n(t))A_n(t)/r_{m,n}(t),  T_{m,n}^mec = (1 − ρ_n(t))A_n(t)/r_{m,n}^mec(t).
To maximize the reward, we need to minimize the time delay for each intelligent terminal under the total energy constraints at the intelligent terminals and MEC servers. To illustrate the way to find the optimal ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) for different types of MEC server selection, we next present two typical offloading scenarios, that is, one MEC server serving one intelligent terminal and one MEC server serving two intelligent terminals. It is noted that the proposed way of solving ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) can be extended to the case where a MEC server serves an arbitrary number of intelligent terminals.

1) Scenario 1: one MEC server serves one intelligent terminal
The energy consumption at intelligent terminal n, denoted by E_n, includes two parts, i.e., one part for local partial task processing and another for partial task transmission. Therefore, E_n can be written as
E_n = p_n^loc(t) T_n^loc + p_n^tx(t) T_{m,n}^tx.
The energy consumption at the MEC server for processing the partial task offloaded from intelligent terminal n is denoted by E_{m,n} and can be represented as
E_{m,n} = p_{m,n}(t) T_{m,n}^mec.
The optimization problem formulated to find the optimal Φ(t) = {ρ_n(t), p_n^loc(t), p_n^tx(t), p_{m,n}(t)} is
min_{Φ(t)} max{T_n^loc, T_{m,n}^tx + T_{m,n}^mec}  s.t.  E_n ≤ E_max,n,  E_{m,n} ≤ E_max,m,  0 ≤ ρ_n(t) ≤ 1,   (5)
where E_max,n and E_max,m denote the maximum available energy at intelligent terminal n and MEC server m, respectively. Problem (5) can be rewritten in epigraph form as problem (6). To solve problem (6), we first observe that at the optimal solution constraint (6c), the MEC energy constraint, must be active, which minimizes the objective value (6a); we thus obtain the expression (7) for p_{m,n}(t), and substituting (7) into problem (6) yields problem (8). It is noted that problem (8) is a non-convex optimization problem. We propose an alternating algorithm that solves ρ_n(t), p_n^loc(t), and p_n^tx(t) in separate subproblems to find an efficient solution. In the first subproblem, we solve p_n^loc(t) for given ρ_n(t) and p_n^tx(t); to minimize the objective function, the optimal p_n^loc(t) should activate the terminal energy constraint (8b), i.e., local processing uses all the energy left over from transmission, which yields the solution in (9). In the second subproblem, we solve ρ_n(t) with given p_n^loc(t) and p_n^tx(t); the corresponding optimization problem is given by (10). Since the objective is the maximum of a term T_1(ρ_n(t)) that increases with ρ_n(t) and a term T_2(ρ_n(t)) that decreases with it, the optimal ρ_n(t), denoted by ρ_n^*(t), occurs in one of three cases: ρ_n(t) = 0, ρ_n(t) = 1, or T_1(ρ_n(t)) = T_2(ρ_n(t)); the solution of the third case can be obtained by solving a cubic equation, giving the final solution ρ_n^*(t) in (12). By alternating the three subproblems with the solutions given in (9), (10), and (12) until convergence, we obtain the final solution.
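The equal-delay case T_1(ρ) = T_2(ρ) has a closed form when the rates are held fixed, which illustrates the second subproblem. This is a simplified sketch, not the paper's cubic-equation solution: here the powers (and hence rates) are treated as constants while ρ is optimized, and the name `best_split` is illustrative.

```python
def best_split(task_bits: float, r_loc: float, r_tx: float, r_mec: float):
    """Splitting ratio rho that makes both branches finish simultaneously:
        rho * A / r_loc = (1 - rho) * A * (1/r_tx + 1/r_mec).
    Returns (rho, resulting delay). Assumes fixed rates in bit/s."""
    inv_off = 1.0 / r_tx + 1.0 / r_mec          # per-bit time of the offload branch
    rho = inv_off / (1.0 / r_loc + inv_off)     # solve the linear balance equation
    return rho, rho * task_bits / r_loc
```

For A = 10 bits, r_loc = 1, r_tx = r_mec = 2 bit/s, the balanced split is ρ = 0.5 and both branches take 5 s; any other ρ makes the slower branch, and hence the max in (5), larger. In the paper's full problem the local rate itself depends on ρ through the energy constraint, which is why the interior case there requires solving a cubic equation instead.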
2) Scenario 2: one MEC server serves two intelligent terminals
Assume that MEC server m serves two intelligent terminals, e.g., intelligent terminal n and intelligent terminal n′. Then the optimization problem can be formulated analogously to problem (5), with the objective being the larger of the two terminals' delays and the MEC energy constraint coupling p_{m,n}(t) and p_{m,n′}(t). The previously proposed iterative algorithm can still be applied here to solve ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) with 𝒩_m = {n, n′}. The only difference lies in solving p_{m,n}(t) and p_{m,n′}(t). It is worth noting that the optimal solution must activate the MEC energy constraint and make the two delay terms within the objective function equal to each other. Therefore, the optimal p_{m,n}(t) and p_{m,n′}(t) can be obtained by solving the resulting pair of equations. Hence, under an action a(t), the system reward can be obtained as in (15) from the collected delays. The structure of the DQN-based offloading algorithm is illustrated in Fig. 5, and the pseudocode is presented in Algorithm 1.
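The equal-delay condition for the two terminals can be found numerically, since each terminal's remote delay decreases monotonically in its share of the MEC power. The sketch below uses bisection under the cubic DVFS model from Section 2; the function name `split_mec_power` and all parameter values are illustrative assumptions, not the paper's closed-form solution.

```python
def split_mec_power(total_power: float, bits_a: float, bits_b: float,
                    cycles_per_bit: float, kappa: float = 1.0,
                    iters: int = 60) -> float:
    """Bisection on terminal a's share of the MEC power so that both remote
    processing delays are equal (Scenario 2's equal-delay condition)."""
    def delay(bits: float, power: float) -> float:
        f = (power / kappa) ** (1.0 / 3.0)    # DVFS: p = kappa * f^3
        return bits * cycles_per_bit / f      # seconds to process `bits`
    lo, hi = 1e-9, 1.0 - 1e-9
    mid = 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        d_a = delay(bits_a, mid * total_power)
        d_b = delay(bits_b, (1.0 - mid) * total_power)
        if d_a > d_b:
            lo = mid    # terminal a is the bottleneck: give it more power
        else:
            hi = mid
    return mid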

DDPG-Based Offloading Design
Note that only discrete actions can be handled by the DQN-based offloading design, where the reward acquisition mainly depends on solving the formulated optimization problems at the MEC servers, which may impose an extra computing burden on the MEC servers. In this section, we rely on the DDPG to design the offloading policy, considering that DDPG can deal with both discrete and continuous actions. Different from DQN, DDPG uses an actor-critic architecture to improve the accuracy of the model. In this section, we directly regard ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) as the output action instead of decomposing the problem into two parts. System State Space: In the DDPG offloading paradigm, the system state space is the same as in the DQN-based offloading paradigm, i.e., s(t) = {h(t − 1), A_1(t), ⋯, A_N(t)}, with h(t − 1) and A_n(t) defined in (4). As in the DQN offloading paradigm, the agent can only observe the delayed version of the CSI due to channel estimation operations and feedback delay.
System Action Space: In the DDPG offloading paradigm, the value of p_{m,n}(t) is utilized to indicate the MEC server selection, where p_{m,n}(t) = 0 represents that no partial task of intelligent terminal n is offloaded to MEC server m; in other words, MEC server m is not chosen by intelligent terminal n. If p_{m,n}(t) is not equal to 0, the intelligent terminal decides to offload a partial task to MEC server m. Since the intelligent terminal can only connect to one MEC server in one time slot, only one p_{m,n}(t) in any time slot is nonzero, and the remaining ones are 0. The action space of the DDPG offloading paradigm can be expressed as
a(t) = {ρ_n(t), p_n^loc(t), p_n^tx(t), p_{m,n}(t)}.
It is noted that here the continuous actions ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) are obtained based on state s(t) with the delayed CSI h(t − 1). System Reward Function: In the DDPG offloading algorithm, ρ_n(t), p_n^loc(t), p_n^tx(t), and p_{m,n}(t) are obtained from a continuous action space. With these decisions, the agent tells each intelligent terminal the selected MEC server and delivers ρ_n(t) and p_n^tx(t) to it to perform the offloading. Moreover, the agent needs to send p_{m,n}(t) to each server to allocate computing resources. After that, the reward is obtained as in (15) by collecting the delays observed at the MEC servers or intelligent terminals.
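Enforcing "only one p_{m,n}(t) is nonzero" on a continuous actor output can be done by keeping the largest entry and zeroing the rest. This is one plausible decoding, sketched under our own assumptions (the paper does not specify the projection); the name `decode_server_selection` is illustrative.

```python
def decode_server_selection(raw_powers: list) -> list:
    """Project the actor's continuous outputs p_{m,n} for one terminal onto
    the feasible set: the terminal connects to exactly one MEC server per
    slot, so only the largest entry survives and the rest are set to 0."""
    m_star = max(range(len(raw_powers)), key=lambda m: raw_powers[m])
    return [p if m == m_star else 0.0 for m, p in enumerate(raw_powers)]
```

For example, raw outputs [0.1, 0.7, 0.2] decode to [0.0, 0.7, 0.0]: the terminal offloads to server 2 with MEC processing power 0.7.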
Compared to the DQN-based offloading paradigm, the DDPG-based offloading paradigm does not need the MEC servers to solve the optimization problems, which can release the computation burden at the MEC servers. However, as the DDPG algorithm is generally more complex than the DQN algorithm, the computation complexity unavoidably increases at the agent. The structure of the DDPG-based offloading algorithm is illustrated in Fig. 6. We provide the pseudocode in Algorithm 2.

Algorithm 2
The DDPG-based Offloading Algorithm
1: Randomly initialize the actor network μ(s|θ^μ) and the critic network Q(s, a|θ^Q) with weights θ^μ and θ^Q;
2: Initialize the target networks μ′ and Q′ with weights θ^{μ′} ← θ^μ, θ^{Q′} ← θ^Q;
3: Initialize the experience replay buffer B;
4: for each episode e = 1, 2, ⋯ do
5:   Reset the simulation parameters of the environment;
6:   Randomly generate an initial state s_1;
7:   for each time slot t = 1, 2, ⋯ do
8:     Select the action a_t = μ(s_t|θ^μ) + N_t, where N_t is exploration noise, to determine the powers for transmission and computation;
9:     Execute action a_t, receive the reward r_t, and observe the next state s_{t+1};
10:    Store the tuple (s_t, a_t, r_t, s_{t+1}) in B;
11:    Sample a random mini-batch of transitions (s_i, a_i, r_i, s_{i+1}) from B;
12:    Update the critic network by minimizing the loss;
13:    Update the actor network by using the sampled policy gradient;
14:    Update the target networks by soft update;
15:   end for
16: end for
Fig. 9: The delay comparison under different task arrival rates, where the red curve represents the delay of the DDPG-based computational offloading paradigm, the blue curve represents the delay of the DQN-based computational offloading paradigm, the yellow curve represents the delay of the "Random" policy, the purple curve represents the delay of the "Local computing" policy, and the green curve represents the delay of the "MEC server computing" policy.
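Two of Algorithm 2's building blocks, noisy action selection (line 8) and the soft target-network update (line 14), can be sketched in NumPy. This is a minimal illustration under our own assumptions: a linear map stands in for the actor network, the function names are not from the paper, and τ = 0.005 is a typical (assumed) soft-update rate.

```python
import numpy as np

def soft_update(target: np.ndarray, online: np.ndarray,
                tau: float = 0.005) -> np.ndarray:
    """Target-network update: theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * online + (1.0 - tau) * target

def noisy_action(actor_w: np.ndarray, state: np.ndarray, sigma: float,
                 rng: np.random.Generator) -> np.ndarray:
    """a_t = mu(s_t | theta) + N_t with Gaussian exploration noise, clipped
    to [0, 1], the valid range of the splitting-ratio/power variables."""
    a = actor_w @ state   # a linear stand-in for the actor network mu
    return np.clip(a + sigma * rng.standard_normal(a.shape), 0.0, 1.0)
```

With σ = 0 the action is the deterministic (clipped) actor output; during training σ is kept positive so the agent keeps exploring the continuous action space, which matches the large delay fluctuations observed early in Fig. 7.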

Numerical Results
In this section, we present numerical simulation results to illustrate the performance of the two proposed offloading paradigms. Assume that the time interval of the system is 1 ms and the bandwidth of the intelligent fault diagnosis system is 1 MHz. Additionally, the required CPU cycles per bit are 300 cycles/bit at the intelligent terminals and 120 cycles/bit at the MEC servers. In the training process, the learning rate of the DQN-based offloading algorithm is 0.01. In the DDPG-based offloading algorithm, the learning rates of the actor network and the critic network are both 0.001. In Fig. 7, we plot the training processes of the DQN-based and DDPG-based algorithms, where the blue curve represents the delay dynamics of the DQN-based algorithm and the red curve represents the delay dynamics of the DDPG-based algorithm. At the beginning, the delay of the system is in an unstable state with large fluctuations, indicating that the agent is constantly exploring the environment randomly. After a period of learning, the delay decreases slowly and the fluctuation range gradually gets smaller. After about 1200 iterations, the DDPG-based algorithm converges to a stable value; after about 1500 iterations, the DQN-based algorithm converges. At this point, the average reward of each episode no longer changes, and the training process is completed. The DDPG-based algorithm converges faster and obtains a lower latency than the DQN-based algorithm, which indicates that the DDPG-based algorithm performs better for our offloading problem.
The delay comparison under different energy constraints at the intelligent terminals, where the red curve represents the delay of the DDPG-based computational offloading paradigm, the blue curve represents the delay of the DQN-based computational offloading paradigm, the yellow curve represents the delay of the "Random" policy, the purple curve represents the delay of the "Local computing" policy, and the green curve represents the delay of the "MEC server computing" policy.
In Fig. 8, the DDPG-based computational offloading paradigm, the DQN-based computational offloading paradigm, and the "Random" policy are compared in terms of delay for different task sizes, where "Random" means that the computing resources are allocated randomly. For task sizes below 2.5 Bit, the delay difference between the three policies is slight, with the DDPG-based paradigm having the smallest delay and the "Random" policy the largest. As the task size increases, the delay of the "Random" policy grows the fastest, while the delays of the DDPG-based and DQN-based paradigms grow more slowly and consistently remain lower than that of the "Random" policy.
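As a concrete reading of the "Random" benchmark, the following sketch draws each decision variable uniformly at random. The variable names and bounds are our assumptions, since the text only states that the computing resources are allocated randomly; the four components correspond to the paper's action space (splitting ratio, transmission power, computation power, MEC server selection).

```python
import random

def random_policy(p_max_tx, p_max_cpu, n_servers, rng=random):
    """The "Random" benchmark: pick the local content splitting ratio,
    transmission power, computation power, and MEC server uniformly at
    random within their feasible ranges (bounds are assumed)."""
    return {
        "split_ratio": rng.random(),               # fraction processed locally
        "tx_power": rng.random() * p_max_tx,       # uplink transmission power
        "cpu_power": rng.random() * p_max_cpu,     # local computation power
        "server": rng.randrange(n_servers),        # MEC server index
    }
```

Because none of these choices react to the channel state or the task backlog, the resulting delay grows quickly with the task size, as the yellow curve in Fig. 8 shows.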
In Fig. 9, we illustrate the offloading delay as a function of the amount of tasks at the intelligent terminals. Three benchmarks, namely "Random", "Local computing", and "MEC server computing", are chosen for comparison with the two proposed offloading paradigms. Here, "Random" means that the computing resources are allocated randomly, while "Local computing" and "MEC server computing" mean that the tasks are processed only at the intelligent terminals and only at the MEC servers, respectively. The curves in Fig. 9 show that the required time delay increases as the amount of tasks grows. The computation delay of "Local computing" is the largest, as the intelligent terminals have little local computing capacity. "MEC server computing" performs better than "Random" when the task arrival rate is greater than 2.7 Mbps, which indicates that as the task arrival rate increases, offloading tasks to the MEC servers yields a lower time delay. When the task arrival rate exceeds 4 Mbps, the offloading delay of "MEC server computing" approaches that of the DQN-based computation offloading algorithm, indicating that most tasks are offloaded to the MEC servers when task sizes are large. Both proposed DQN and DDPG offloading paradigms outperform the other benchmarks, proving the effectiveness of the proposed methods. Moreover, the DDPG-based paradigm achieves a lower computation delay than the DQN-based paradigm, which further verifies the superiority of the DDPG algorithm in dealing with high-dimensional continuous action-state space problems. Fig. 10 shows the impact of the computing capabilities of the intelligent terminals and MEC servers on the processing delay.
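The ordering of the curves in Fig. 9 follows from a back-of-the-envelope delay model: local processing costs 300 cycles/bit at the terminal's CPU frequency, while offloading costs transmission time plus 120 cycles/bit at the MEC server's frequency. The CPU frequencies and transmission rate in the example below are illustrative assumptions, not the simulation's values; only the cycles-per-bit constants come from the text.

```python
CYC_LOCAL = 300  # required CPU cycles per bit at the terminal (from the text)
CYC_MEC = 120    # required CPU cycles per bit at the MEC server (from the text)

def local_delay(bits, f_local):
    """Delay (s) when all bits are processed at the intelligent terminal."""
    return bits * CYC_LOCAL / f_local

def mec_delay(bits, rate, f_mec):
    """Transmission plus computation delay (s) when all bits are offloaded."""
    return bits / rate + bits * CYC_MEC / f_mec

def split_delay(bits, ratio, f_local, rate, f_mec):
    """Delay when a fraction `ratio` of the content is processed locally and
    the rest is offloaded; the two branches run in parallel, so the overall
    delay is the maximum of the two."""
    return max(local_delay(ratio * bits, f_local),
               mec_delay((1 - ratio) * bits, rate, f_mec))
```

With, say, a 100 MHz terminal, a 1 GHz MEC server, and a 1 Mbps link, offloading a 1 Mbit task takes 1.12 s against 3 s locally, and splitting the content reduces the delay further, consistent with "Local computing" being the worst curve and the learned split-and-offload policies the best.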
We fix the local computing capability at a constant value and continuously increase the computing capacity of the MEC server, so the computation delay of "Local computing" is not affected by the ratio of the MEC server's computing capacity to that of the intelligent terminal. Under different computing capabilities, the proposed DQN and DDPG offloading paradigms achieve better performance than the other three benchmarks, and the DDPG-based paradigm performs slightly better than the DQN-based one. When the ratio of MEC server computing capacity to intelligent terminal computing capacity lies between 2 and 3, the MEC server processes tasks faster than the intelligent terminal, so as the ratio increases, the intelligent terminal chooses to offload more tasks to the MEC server, and the computation delay of "MEC server computing" is smaller than that of "Random". When the ratio exceeds 3, the processing speed of the MEC server is significantly higher than that of the intelligent terminals; the intelligent terminal prioritizes task offloading, and the task processing delay keeps decreasing, although the downward trend slows. The computation delay of "MEC server computing" is then lower than that of "Random" and close to the DQN-based offloading paradigm, which indicates that most or all tasks are offloaded to the MEC servers; the decrease in delay is mainly due to the increase in the computing capacity of the MEC servers.

Fig. 11 illustrates the computation delay under different energy constraints at the intelligent terminals. The curves show that "MEC server computing" is not affected by changes in the energy of the intelligent terminal, whereas "Local computing" depends heavily on the intelligent terminal's energy constraint, with its computation delay decreasing significantly as the energy increases.
An increase in intelligent terminal energy allows a faster local processing speed and a higher available transmission power at the intelligent terminals, which reduces the computation delay to a certain extent. The computation delays of the DQN-based and DDPG-based offloading paradigms decrease significantly as the intelligent terminal's energy increases at the beginning, and then decrease only gradually once the energy reaches a certain level. This shows that the intelligent terminal's energy constraint significantly impacts the computation delay within a specific range, while its impact weakens once the energy exceeds that range. The DQN-based and DDPG-based offloading paradigms achieve better performance than the other offloading methods under different intelligent terminal energy constraints, which indicates the effectiveness of the proposed computational offloading algorithms. Moreover, the performance of the DDPG-based offloading paradigm is slightly better than that of the DQN-based one.
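The diminishing benefit of extra terminal energy is consistent with the logarithmic Shannon capacity of the uplink: each additional unit of transmission power buys progressively less rate. The sketch below illustrates this; the bandwidth matches the 1 MHz system, while the channel gain and noise power are assumed values chosen only for the example.

```python
import math

def uplink_rate(power, bandwidth=1e6, gain=1e-6, noise=1e-9):
    """Shannon rate (bit/s) of the terminal-to-MEC link for a given
    transmission power (W); channel gain and noise power are assumptions."""
    return bandwidth * math.log2(1 + power * gain / noise)
```

With these assumed parameters, going from 1 mW to 2 mW adds about 0.59 Mbps of rate, while going from 3 mW to 4 mW adds only about 0.32 Mbps, which mirrors the flattening of the delay curves in Fig. 11 as the energy budget grows.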

Conclusion
In this paper, we propose a novel framework for an intelligent mechanical fault diagnosis system: a resource allocation scheme based on deep reinforcement learning for offloading the diagnostic data of multiple intelligent terminals. The optimization variables and objectives are determined by modeling the data offloading scenario of the intelligent fault diagnosis system. Two deep reinforcement learning algorithms, i.e., the DQN-based offloading strategy and the DDPG-based offloading strategy, are investigated to solve the formulated offloading optimization problem and obtain the lowest latency. Comparing the different offloading schemes shows that the proposed deep reinforcement learning-based approach reduces task processing latency under different system parameters. The intelligent fault diagnosis framework proposed in this paper also allows easy integration of other intelligent technologies, such as deep learning techniques for data calibration, and federated learning and blockchain technologies for protecting user data privacy.