Article

# A Cluster-Based Optimal Computation Offloading Decision Mechanism Using RL in the IIoT Field

by Seolwon Koo and Yujin Lim \*
Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Korea
\* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(1), 384; https://doi.org/10.3390/app12010384
Submission received: 30 November 2021 / Revised: 28 December 2021 / Accepted: 29 December 2021 / Published: 31 December 2021

## 2. System Model

We use a task execution time model and an energy consumption model, since minimizing the task execution time and the energy consumption is the objective of this paper. $M$ represents the set of MECSs, with the number of MECSs equal to $|M|$. $D$ represents the set of devices, with the number of devices equal to $|D|$. $C$ represents the set of tasks, with the number of tasks equal to $|C|$. In each time slot, tasks arrive at each device according to a Poisson distribution, and each task carries a delay constraint. We assume that a task cannot be divided: it is either processed locally or offloaded as a whole. The set of tasks created in time slot $t$ is $C(t)$. Each task $c_d(t)$ created by device $d$ ($d \in D$) in time slot $t$ is a tuple of three properties, $c_d(t) = (v_d(t), w_d(t), L_{req})$, where $v_d(t)$ (in Kbits) is the data size of the task, $w_d(t)$ (in cycles/bit) is the number of CPU cycles needed per bit to process the task, and $L_{req}$ (in seconds) is the delay constraint of the task. The task execution delay is defined differently depending on the offloading strategy of the task. When the task is transmitted to an MECS or to another device, the task transmission rate is needed to calculate the delay. The transmission rate depends on the offload location because different links are used: communication between a device and an MECS uses a cellular link, while communication between devices uses a D2D link. The task transmission rate is computed with the Shannon–Hartley formula, a general communication model. The task transmission rate between device $d$ ($d \in D$) and MECS $m$ ($m \in M$) in time slot $t$ is:
$R_{d,m}^{cell}(t) = B^{cell}(t) \log_2 \left( 1 + \frac{p_{d,m}^{cell}(t) \cdot h_{d,m}^{cell}(t)}{\sigma^2} \right)$
where $B^{cell}(t)$ denotes the channel bandwidth of cellular communication at time slot $t$ and $p_{d,m}^{cell}(t)$ is the power that device $d$ needs to transmit data to MECS $m$, the MECS with the least load. $h_{d,m}^{cell}(t)$ is the channel gain between device $d$ and MECS $m$, and $\sigma^2$ is the noise power. The task transmission rate between devices $i$ and $j$ ($i, j \in D$) in time slot $t$ is:
$R_{i,j}^{d2d}(t) = B^{d2d}(t) \log_2 \left( 1 + \frac{p_{i,j}^{d2d}(t) \cdot h_{i,j}^{d2d}(t)}{\sigma^2} \right)$
where $B^{d2d}(t)$ denotes the channel bandwidth of D2D communication at time slot $t$ and $p_{i,j}^{d2d}(t)$ is the power that device $i$ needs to transmit data to device $j$, the device with the least load. $h_{i,j}^{d2d}(t)$ is the channel gain between devices $i$ and $j$. The downlink rate is not considered because the output data of a task is relatively small. The task execution delay $L_d(t)$ of a task created by device $d$ in time slot $t$ is given as:
where $o_d(t)$ denotes the offloading strategy for $c_d(t)$, which is determined by the CH of the cluster where the task occurs. $o_d(t) = 0$ indicates that $c_d(t)$ is executed locally on the device where it was created, $o_d(t) = 1$ means that $c_d(t)$ is offloaded to the MECS with the least load, and $o_d(t) = 2$ means that $c_d(t)$ is offloaded to the device with the least load. $s_d(t)$ denotes $w_d(t) \cdot v_d(t)$, the number of CPU cycles required to process the task. Let $f_d$, $f_j$, and $f_m$ be the computation capabilities of device $d$, device $j$, and MECS $m$, respectively. $W_d(t)$, $W_j(t)$, and $W_m(t)$ are the queuing delays of device $d$, device $j$, and MECS $m$, respectively, in time slot $t$. As with the task execution delay (Equation (3)), the energy consumption of the device is defined differently depending on the offloading strategy. When the task generated by device $d$ in time slot $t$ is executed according to the offloading strategy $o_d(t)$, the energy consumption of the device is given as follows:
where $\varepsilon \cdot (f_d)^2$ denotes the energy consumption per CPU cycle and $\varepsilon$ is a constant that depends on the hardware architecture [5].
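The system model above can be sketched in a few lines of Python. The rate function follows the Shannon–Hartley expression given in the text. The delay and energy functions are only one plausible reading of the definitions in this section, under these assumptions: delay is queuing time plus computing time, plus upload time when the task is offloaded; local energy is $\varepsilon \cdot (f_d)^2$ per cycle times $s_d(t)$; offloading energy is transmit power times upload time. The function names, the channel gain value, and the conversion 1 Kbit = 1000 bits are hypothetical choices for illustration.

```python
import math


def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Rate in bit/s from B * log2(1 + p * h / sigma^2)."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


def execution_delay(strategy, v_bits, w_cycles_per_bit, f_hz,
                    queue_delay_s, rate_bps=None):
    """Assumed delay model: queuing + computing (+ upload when offloaded).

    strategy: 0 = local, 1 = MECS, 2 = D2D device.
    f_hz: CPU frequency of the node that executes the task (f_d, f_m or f_j).
    """
    cycles = w_cycles_per_bit * v_bits      # s_d(t) = w_d(t) * v_d(t)
    delay = queue_delay_s + cycles / f_hz
    if strategy in (1, 2):
        delay += v_bits / rate_bps          # assumed uplink transmission time
    return delay


def device_energy(strategy, v_bits, w_cycles_per_bit, f_d_hz, eps,
                  tx_power_w=None, rate_bps=None):
    """Assumed energy model, seen from the device that created the task."""
    if strategy == 0:
        # eps * f_d^2 joules per cycle, times the number of cycles.
        return eps * f_d_hz ** 2 * w_cycles_per_bit * v_bits
    # Offloading: the device only pays for transmitting the input data.
    return tx_power_w * v_bits / rate_bps


# Illustrative call; the channel gain 1e-7 is a placeholder, not a source value.
rate = shannon_rate(10e6, 0.5, 1e-7, 1e-10)
print(rate, execution_delay(1, 800e3, 1000, 5e9, 0.0, rate_bps=rate))
```

Keeping the three functions separate mirrors the structure of the text: the rate feeds both the delay and the energy of an offloaded task, while a local task needs neither a rate nor a transmit power.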

## 3. Problem Definition and Proposed Algorithm

In this section, we formulate the problem and describe its solution. We propose an algorithm to improve the system throughput and satisfaction degree associated with the quality of service (QoS) by reducing the total task execution delay and total energy consumption of devices in an IIoT environment, where devices and MECSs have limited computing capability and queue length for processing tasks. The objectives and constraints of this study can be summarized as follows:
where $\alpha$ is a factor that balances the energy consumption and the task execution delay, and $O$ denotes the set of offloading strategies of all tasks. Any $o_d$ in the set $O$ is the offloading strategy of a task generated by device $d$ in a time slot, so it takes one of the values 0, 1, or 2, as stated in C1. In objective O1 of Equation (5), $\{ (1 - \alpha) E_d(t) + \alpha \cdot L_d(t) \}$ is the offloading cost, which combines the energy consumption of a device with the task execution delay. Since the goal of the proposed method is to minimize the offloading cost, this cost is the basis of the reward function in the Markov decision process used for learning. C1 shows that the problem is an integer optimization problem. $E_d(t)$ and $L_d(t)$ represent the energy consumption and the task execution delay for processing task $c_d(t)$, respectively. C2 states that the task execution delay must satisfy the delay constraint. Let $q_d$ and $q_m$ be the loads of device $d$ and MECS $m$, respectively, while $q_d^{max}$ and $q_m^{max}$ denote the maximum loads of device $d$ and MECS $m$, respectively.
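To make the offloading cost and the constraints concrete, the following minimal sketch evaluates the weighted cost $(1 - \alpha) E_d(t) + \alpha \cdot L_d(t)$ for each candidate strategy and keeps the cheapest feasible one. This exhaustive per-task comparison is an illustration only, not the learning-based method of the paper. The load constraint is assumed to have the form "load does not exceed the maximum load", and the names `offloading_cost`, `is_feasible`, and `best_strategy` are hypothetical.

```python
def offloading_cost(energy_j, delay_s, alpha):
    """Weighted cost (1 - alpha) * E + alpha * L from objective O1."""
    return (1.0 - alpha) * energy_j + alpha * delay_s


def is_feasible(strategy, delay_s, delay_limit_s, load, max_load):
    """C1: strategy in {0, 1, 2}; C2: delay within L_req;
    load bound: assumed to be load <= max_load."""
    return (strategy in (0, 1, 2)
            and delay_s <= delay_limit_s
            and load <= max_load)


def best_strategy(candidates, delay_limit_s, alpha):
    """candidates maps strategy -> (energy_j, delay_s, load, max_load).

    Returns the feasible strategy with the lowest cost, or None if no
    candidate satisfies the constraints.
    """
    best, best_cost = None, float("inf")
    for strategy, (energy, delay, load, max_load) in candidates.items():
        if not is_feasible(strategy, delay, delay_limit_s, load, max_load):
            continue
        cost = offloading_cost(energy, delay, alpha)
        if cost < best_cost:
            best, best_cost = strategy, cost
    return best
```

With $\alpha = 0.7$, as in Table 1, the delay term carries more weight than the energy term, so a slightly more energy-hungry option can still win if it finishes sooner.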
We assume an IIoT scenario characterized by a division of work and a job shop. Given these characteristics, cooperating devices that are close to each other are clustered. After clustering, a CH is selected in each cluster based on the working characteristics and computational capabilities of the devices in the cluster. The selected CH is assumed not to change. However, it is difficult to find the optimal offloading strategy in an environment where the states of MECSs and devices change dynamically over time. In such a dynamic environment, conventional heuristic methods are not appropriate owing to their high computational complexity. Thus, we use the Q-learning algorithm, a model-free reinforcement learning (RL) scheme that can be executed on a CH without prior knowledge of the environment or high computational resources. Before executing Q-learning, each CH determines the serving MECS and the serving D2D device that will execute new tasks from the cluster; both are selected according to their respective workloads. The CH then decides whether a task is offloaded; if so, the task is offloaded to the serving D2D device or the serving MECS. This approach improves the system throughput and helps satisfy the delay constraints by reducing task blocking and queuing delay. We define the Markov decision process (MDP) as follows:
• Agent: the CH ($ch_i$),
• State: $s_i(t)$, being the state of the created task at time slot $t$ in cluster $i$.
- $c_d(t)$: the task created by device $d$ at time slot $t$,
- the load of device $d$, serving MECS $m$, and serving D2D device $j$ at time slot $t$, $\forall d, j \in D$, $\forall m \in M$,
- the location of device $d$, serving MECS $m$, and serving D2D device $j$ at time slot $t$, $\forall d, j \in D$, $\forall m \in M$,
• Action: $a_i(t) \in \{0, 1, 2\}$, being the offloading strategy of a task at time slot $t$ in cluster $i$.
• Reward (Penalty): $R(s_i(t), a_i(t)) = 1 / \{ (1 - \alpha) \cdot E_{nor}(t) + \alpha \cdot L_{nor}(t) \}$, where $\alpha$ is the weighting factor between 0 and 1. $E_{nor}(t)$ is the normalized total of the computing and transmission energy consumed by the device when executing the task in time slot $t$. $L_{nor}(t)$ is the normalized execution delay of the task, measured from the time slot $t$ in which it occurs until its execution is completed.
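The serving-node selection and the reward in the MDP above can be sketched as follows. Picking the node with the smallest load follows the text; the normalization is not specified in this section, so dividing by a reference maximum and clipping at 1 is purely an assumption, as are the function names and the node labels in the example.

```python
def pick_least_loaded(loads):
    """Return the key (node id) with the smallest load.

    loads: dict mapping node id -> current queue load.
    Ties go to the node that appears first in the dict.
    """
    return min(loads, key=loads.get)


def normalize(value, reference_max):
    """Assumed normalization: scale into [0, 1] by a reference maximum."""
    return min(value / reference_max, 1.0)


def reward(e_nor, l_nor, alpha):
    """R = 1 / ((1 - alpha) * E_nor + alpha * L_nor)."""
    cost = (1.0 - alpha) * e_nor + alpha * l_nor
    if cost <= 0.0:
        raise ValueError("normalized cost must be positive")
    return 1.0 / cost


# Hypothetical cluster view held by a CH.
serving_mecs = pick_least_loaded({"mecs0": 3, "mecs1": 1, "mecs2": 4})
serving_d2d = pick_least_loaded({"dev7": 2, "dev9": 0})
print(serving_mecs, serving_d2d, reward(0.5, 0.5, 0.7))
```

Because the reward is the reciprocal of the cost, a lower normalized cost yields a higher reward, which is what lets a reward-maximizing agent minimize the offloading cost.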
Based on the MDP above, we update the Q-value as follows:
$Q(s_i(t), a_i(t)) \leftarrow (1 - \delta) \cdot Q(s_i(t-1), a_i(t-1)) + \delta \cdot R(s_i(t), a_i(t))$
where $\delta$ is the learning rate. Our proposed algorithm does not consider multi-hop transmissions.
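A tabular version of this update could look like the sketch below. It takes one reading of the update rule, in which the old value on the right-hand side is the previously stored estimate for the same state-action pair, so the new estimate is a blend of the old one and the latest reward, weighted by $\delta$. The class name, the epsilon-greedy exploration, its `epsilon` parameter, and the requirement that states be hashable (for example, discretized tuples) are assumptions that the text does not specify.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1, 2)  # 0 = local, 1 = serving MECS, 2 = serving D2D device


class ClusterHeadAgent:
    """Hypothetical Q-table agent running on a CH."""

    def __init__(self, delta=0.5, epsilon=0.1, seed=0):
        self.delta = delta            # learning rate (0.5 in Table 1)
        self.epsilon = epsilon        # exploration rate, an assumption
        self.q = defaultdict(float)   # (state, action) -> Q-value, default 0
        self.rng = random.Random(seed)

    def choose_action(self, state):
        """Epsilon-greedy choice over the three offloading strategies."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward):
        """Q <- (1 - delta) * Q_old + delta * R for the visited pair."""
        key = (state, action)
        self.q[key] = (1.0 - self.delta) * self.q[key] + self.delta * reward
        return self.q[key]


agent = ClusterHeadAgent(delta=0.5, epsilon=0.0)
state = ("task_800kbit", "load_low")   # placeholder discretized state
agent.update(state, 1, 2.0)
print(agent.choose_action(state))
```

Since the update has no bootstrapped next-state term, each Q-value behaves like an exponentially weighted average of the rewards observed for that pair, which keeps the per-step work on the CH to a single table lookup and write.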

## 4. Numerical Results

To evaluate the performance of our proposed algorithm, the effects of various indicators, such as the task arrival rate per device and cluster type, were tested. For experimental evaluation, we deployed four MECSs and 52 devices in an MEC system, with the locations of these devices randomly distributed in a 250 m × 250 m square area. The values of the parameters required for the experiments are reported in Table 1.
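The experimental setup can be approximated with a small generator such as the one below, which places nodes uniformly at random in the square area and draws the number of new tasks per device from a Poisson distribution. The task sizes, cycles per bit, and delay constraint are taken from Table 1; drawing the size uniformly from the three listed values, the Knuth-style sampler, and all function and field names are assumptions made for illustration.

```python
import math
import random


def deploy(num_nodes, side_m, rng):
    """Place nodes uniformly at random in a side_m x side_m square."""
    return [(rng.uniform(0.0, side_m), rng.uniform(0.0, side_m))
            for _ in range(num_nodes)]


def poisson_sample(rate, rng):
    """Draw a Poisson-distributed count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_tasks(num_devices, arrival_rate, rng,
                   sizes_kbits=(600, 800, 1000)):
    """Create the tasks of one time slot for every device."""
    tasks = []
    for device in range(num_devices):
        for _ in range(poisson_sample(arrival_rate, rng)):
            tasks.append({
                "device": device,
                "v_kbits": rng.choice(sizes_kbits),  # assumed uniform choice
                "w_cycles_per_bit": 1000,
                "l_req_s": 0.08,
            })
    return tasks


rng = random.Random(42)
devices = deploy(52, 250.0, rng)
mecs = deploy(4, 250.0, rng)
print(len(generate_tasks(len(devices), 0.8, rng)))
```

Passing an explicit `random.Random` instance keeps a run reproducible, which makes it easier to compare selection and offloading methods on identical task sequences.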

## Author Contributions

Conceptualization, S.K. and Y.L.; Methodology, S.K.; Software, S.K.; Writing (Review and Editing), S.K. and Y.L.; Supervision, Y.L. All authors have read and agreed to the published version of the manuscript.

## Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2021R1F1A1047113).

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial Internet of Things: Challenges, Opportunities, and Directions. IEEE Trans. Ind. Inform. 2018, 14, 4724–4734.
2. Sun, W.; Liu, J.; Yue, Y. AI-Enhanced Offloading in Edge Computing: When Machine Learning Meets Industrial IoT. IEEE Netw. 2019, 33, 68–74.
3. Li, X.; Wan, J.; Dai, H.N.; Imran, M.; Xia, M.; Celesti, A. A Hybrid Computing Solution and Resource Scheduling Strategy for Edge Computing in Smart Manufacturing. IEEE Trans. Ind. Inform. 2019, 15, 4225–4234.
4. Lin, C.; Deng, D.; Chih, Y.; Chiu, H. Smart Manufacturing Scheduling with Edge Computing using Multiclass Deep Q Network. IEEE Trans. Ind. Inform. 2019, 15, 4276–4284.
5. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358.
6. Hong, Z.; Chen, W.; Huang, H.; Guo, S.; Zheng, Z. Multi-Hop Cooperative Computation Offloading for Industrial IoT–Edge–Cloud Computing Environments. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 2759–2774.
7. Xie, J.; Jia, Y.; Chen, Z.; Nan, Z.; Liang, L. D2D Computation Offloading Optimization for Precedence-Constrained Tasks in Information-Centric IoT. IEEE Access 2019, 7, 94888–94898.
8. Mehrabi, M.; You, D.; Latzko, V.; Salah, H.; Reisslein, M.; Fitzek, F.H.P. Device-Enhanced MEC: Multi-Access Edge Computing (MEC) Aided by End Device Computation and Caching: A Survey. IEEE Access 2019, 7, 166079–166108.
9. Zhi, L.; Zhu, Q. Genetic Algorithm-Based Optimization of Offloading and Resource Allocation in Mobile-Edge Computing. Information 2020, 11, 83.
10. Yang, G.; Hou, L.; He, X.; He, D.; Chan, S.; Guizani, M. Offloading Time Optimization via Markov Decision Process in Mobile-Edge Computing. IEEE Internet Things J. 2021, 8, 2483–2493.
11. Yang, Y.; Long, C.; Wu, J.; Peng, S.; Li, B. D2D-Enabled Mobile-Edge Computation Offloading for Multiuser IoT Network. IEEE Internet Things J. 2021, 8, 12490–12504.
12. Hossain, M.S.; Nwakanma, C.I.; Lee, J.M.; Kim, D.S. Edge Computational Task Offloading Scheme using Reinforcement Learning for IIoT Scenario. ICT Express 2020, 6, 291–299.
13. Liu, H.; Cao, L.; Pei, T.; Deng, Q.; Zhu, J. A Fast Algorithm for Energy-saving Offloading with Reliability and Latency Requirements in Multi-Access Edge Computing. IEEE Access 2020, 8, 151–161.
14. Wang, D.; Tian, X.; Cui, H.; Liu, Z. Reinforcement Learning-based Joint Task Offloading and Migration Schemes Optimization in Mobility-aware MEC Network. China Commun. 2020, 17, 31–44.
15. Yu, B.; Zhang, X.; You, I.; Khan, U.S. Efficient Computation Offloading in Edge Computing Enabled Smart Home. IEEE Access 2021, 9, 48631–48639.
17. Qian, Y.; Wu, J.; Wang, R.; Zhu, F.; Zhang, W. Survey on Reinforcement Learning Applications in Communication Networks. J. Commun. Inf. Netw. 2019, 4, 30–39.
18. Liao, Z.; Peng, J.; Xiong, B.; Huang, J. Adaptive Offloading in Mobile-Edge Computing for Ultra-dense Cellular Networks based on Genetic Algorithm. J. Cloud Comput. 2021, 10, 15.
19. Hu, G.; Jia, Y.; Chen, Z. Multi-User Computation Offloading with D2D for Mobile Edge Computing. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018.
Figure 1. The proposed system architecture of an IIoT environment.
Figure 2. Performance comparison according to task arrival rate per device and the system architecture: (a) task blocking rate; (b) task completion rate within delay constraints; (c) total energy consumption (J); (d) throughput.
Figure 3. Performance comparison according to the task arrival rate per device and the method used to select the MECS and D2D device and to decide the offloading strategy: (a) task blocking rate; (b) task completion rate within delay constraints; (c) total energy consumption (J); (d) throughput.
Figure 4. Performance comparison according to cluster types when using different target MECS and D2D device selection methods with the same optimal offloading strategy (the number of devices is 52 and the task arrival rate is 0.8): (a) task blocking rate; (b) task completion rate within delay constraints; (c) total energy consumption (J); (d) throughput.
Figure 5. Performance comparison according to cluster types when using the same target MECS and D2D device selection method with different optimal offloading strategies (the number of devices is 52 and the task arrival rate is 0.8): (a) task blocking rate; (b) task completion rate within delay constraints; (c) total energy consumption (J); (d) throughput.
Table 1. Simulation parameters.

| Parameter | Value |
| --- | --- |
| Coverage of BS | 150 m [18] |
| $B^{cell}(t)$, $B^{d2d}(t)$ | 10 MHz |
| $\sigma^2$ | $10^{-10}$ |
| | 0.5 W |
| $v_d(t)$ | {600, 800, 1000} Kbits |
| $w_d(t)$ | 1000 cycles/bit |
| $f_d$ | 2 GHz |
| $f_m$ | 5 GHz |
| $\varepsilon$ | $10^{-27}$ |
| $\alpha$ | 0.7 |
| Time slot duration | 100 ms |
| $L_{req}$ | 80 ms |
| $\delta$ | 0.5 |
| $q_d^{max}$ | 3 |
| $q_m^{max}$ | 5 |
| $h_{d,m}^{cell}(t)$ | [19] |
| $h_{i,j}^{d2d}(t)$ | [19] |

 Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
