# A Cluster-Based Optimal Computation Offloading Decision Mechanism Using RL in the IIoT Field


*Applied Sciences*: Invited Papers in Computing and Artificial Intelligence Section

## Abstract


## 1. Introduction

## 2. System Model

## 3. Problem Definition and Proposed Algorithm

**Agent**: The CH ($c{h}_{i}$), $\forall c{h}_{i}\in \left\{c{h}_{1},\dots ,c{h}_{n}\right\}$.

**State**: ${s}^{i}\left(t\right)=\left({c}_{d}\left(t\right),{q}_{d}\left(t\right),{q}_{m}\left(t\right),{q}_{j}\left(t\right),{l}_{d}\left(t\right),{l}_{m}\left(t\right),{l}_{j}\left(t\right)\right)$, being the state of the created task at time slot t in cluster i.

- ${c}_{d}\left(t\right)$: the task created by device d at time slot t, $\forall d\in D$
- ${q}_{d}\left(t\right),{q}_{m}\left(t\right),{q}_{j}\left(t\right)$: the load of device d, serving MECS m, and serving D2D device j at time slot t, $\forall d,j\in D$, $\forall m\in M$
- ${l}_{d}\left(t\right),{l}_{m}\left(t\right),{l}_{j}\left(t\right)$: the location of device d, serving MECS m, and serving D2D device j at time slot t, $\forall d,j\in D$, $\forall m\in M$

**Action**: ${a}^{i}\left(t\right)\in \left\{0,1,2\right\}$, being the offloading strategy of a task at time slot t in cluster i.

**Reward (Penalty)**: $R\left({s}^{i}\left(t\right),{a}^{i}\left(t\right)\right)=1/\left\{\left(1-\alpha \right)\cdot {E}_{nor}\left(t\right)+\alpha \cdot {L}_{nor}\left(t\right)\right\}$, where $\alpha $ is the weighting factor between 0 and 1. ${E}_{nor}\left(t\right)$ is the normalized total of the computing and transmission energy consumed by the device when executing the task in time slot t. ${L}_{nor}\left(t\right)$ is the normalized execution delay of the task, measured from the time slot t in which the task occurs until its execution is completed.
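The reward definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `reward` and its argument names are hypothetical, the inputs are assumed to be already normalized (the normalization procedure is not given in this section), and the default weight of 0.7 is taken from the $\alpha$ entry of the parameter table.

```python
def reward(e_nor: float, l_nor: float, alpha: float = 0.7) -> float:
    """Reciprocal of the weighted cost (1 - alpha) * E_nor + alpha * L_nor.

    e_nor and l_nor are assumed to be pre-normalized, non-negative values;
    how they are normalized is an assumption left outside this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if e_nor < 0.0 or l_nor < 0.0:
        raise ValueError("normalized energy and delay must be non-negative")

    # Weighted sum of normalized energy and normalized delay.
    cost = (1.0 - alpha) * e_nor + alpha * l_nor
    if cost <= 0.0:
        # The formula is undefined when the weighted cost is zero.
        raise ValueError("weighted cost must be positive")
    return 1.0 / cost


if __name__ == "__main__":
    # A lower weighted cost yields a higher reward.
    print(reward(0.2, 0.4))
    print(reward(0.8, 0.9))
```

Because the reward is the reciprocal of the weighted cost, an action that lowers either the normalized energy or the normalized delay increases the reward; with $\alpha > 0.5$ the delay term dominates the trade-off.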

## 4. Numerical Results

## 5. Conclusions

**Figure 2.** Performance comparison according to task arrival rate per device and the system architecture: (**a**) task blocking rate; (**b**) task completion rate within delay constraints; (**c**) total energy consumption (J); (**d**) throughput.

**Figure 3.** Performance comparison according to task arrival rate per device and the method of selecting the MECS and D2D device and determining the offloading strategy: (**a**) task blocking rate; (**b**) task completion rate within delay constraints; (**c**) total energy consumption (J); (**d**) throughput.

**Figure 4.** Performance comparison according to cluster types when using different target MECS and D2D device selection methods with the same optimal offloading strategy (the number of devices is 52 and the task arrival rate is 0.8): (**a**) task blocking rate; (**b**) task completion rate within delay constraints; (**c**) total energy consumption (J); (**d**) throughput.

**Figure 5.** Performance comparison according to cluster types when using the same target MECS and D2D device selection method with different optimal offloading strategies (the number of devices is 52 and the task arrival rate is 0.8): (**a**) task blocking rate; (**b**) task completion rate within delay constraints; (**c**) total energy consumption (J); (**d**) throughput.

Parameter | Value |
---|---|
coverage of BS | 150 m [18] |
${B}^{cell}\left(t\right)$, ${B}^{d2d}\left(t\right)$ | 10 MHz |
${\sigma}^{2}$ | ${10}^{-10}$ |
${p}_{d,m}^{cell}\left(t\right)$, ${p}_{i,j}^{d2d}\left(t\right)$ | 0.5 W |
${v}_{d}\left(t\right)$ | {600, 800, 1000} Kbits |
${w}_{d}\left(t\right)$ | 1000 cycles/bit |
${f}_{d}$ | 2 GHz |
${f}_{m}$ | 5 GHz |
$\epsilon $ | ${10}^{-27}$ |
$\alpha $ | 0.7 |
time slot duration | 100 ms |
${L}_{req}$ | 80 ms |
$\delta $ | 0.5 |
${q}_{d}^{max}$ | 3 |
${q}_{m}^{max}$ | 5 |
${h}_{d,m}^{cell}\left(t\right)$ | $148.1+40\ast {\mathrm{log}}_{10}\mathrm{distance}\left(\mathrm{km}\right)$ [19] |
${h}_{i,j}^{d2d}\left(t\right)$ | $128.1+37.6\ast {\mathrm{log}}_{10}\mathrm{distance}\left(\mathrm{km}\right)$ [19] |
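As a reading aid for the last two rows of the parameter table, the two distance-dependent expressions can be evaluated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the expressions are copied as listed for the cellular and D2D links, and the result is presumed to be a path loss in dB (the table does not state the unit).

```python
import math


def cell_path_loss(distance_km: float) -> float:
    """Cellular-link expression from the table: 148.1 + 40 * log10(distance in km)."""
    if distance_km <= 0.0:
        raise ValueError("distance must be positive")
    return 148.1 + 40.0 * math.log10(distance_km)


def d2d_path_loss(distance_km: float) -> float:
    """D2D-link expression from the table: 128.1 + 37.6 * log10(distance in km)."""
    if distance_km <= 0.0:
        raise ValueError("distance must be positive")
    return 128.1 + 37.6 * math.log10(distance_km)


if __name__ == "__main__":
    # Evaluate both expressions at the BS coverage edge listed in the table (150 m).
    edge_km = 0.150
    print(cell_path_loss(edge_km))
    print(d2d_path_loss(edge_km))
```

Both expressions grow with the logarithm of the distance, so at 1 km they reduce to their constant terms, and for distances below 1 km the logarithm is negative and the values fall below those constants.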

