#### 4.1. Problem Formulation

Using the stochastic learning automata reinforcement learning technique described above, each end-user has selected a MEC server to which it offloads its data. The goal of each MEC server is then to maximize its profit from processing the end-users’ data, while the goal of each end-user is to maximize its perceived satisfaction, as expressed by its utility function, by offloading the optimal amount of data to the selected MEC server. Thus, a two-layer optimization problem is formulated, as follows.

As Equations (10a) and (10b) show, the MEC servers’ optimal prices ${\mathbf{p}}^{(\mathbf{t})*}$ and the end-users’ optimal data offloading ${\mathbf{b}}^{(\mathbf{t})*}$ are interdependent; thus, the joint optimization problem is formulated as a two-layer optimization framework. At the first layer, the end-users determine their optimal data offloading ${\mathbf{b}}^{(\mathbf{t})*}$ by maximizing their personal utility functions in a non-cooperative game among them. Then, at the second layer, the MEC servers determine their optimal announced prices ${\mathbf{p}}^{(\mathbf{t})*}$, given the data offloading of the end-users, by solving an optimization problem. The optimization problem is formulated and solved at the SDN controller, whose advanced computing capabilities enable fast decision-making. In the following two subsections, we analyze each layer of the optimization problem in detail.

#### 4.2. Optimal Data Offloading

First, the optimal data offloading ${b}_{u,s}^{(t)*}$ of each end-user u that has selected to offload its data to the MEC server s at the time slot t is determined. A non-cooperative game $G=[U,\{{A}_{u}^{(t)}\},\{{U}_{u}^{(t)}\}]$ is formulated among the end-users, who compete with each other to determine their optimal data offloading. The game G consists of three components: (a) the set of end-users (i.e., players) $U=[1,\dots ,u,\dots ,|U|]$; (b) the strategy space ${A}_{u}^{(t)}=[0,{I}_{u}^{(t)}]$, where ${b}_{u,s}^{(t)}\in {A}_{u}^{(t)}$; and (c) the end-user’s utility function ${U}_{u}^{(t)}$. Each end-user aims to maximize its personal utility function subject to the physical limitations on its data offloading, as follows.

The concept of Nash Equilibrium is adopted to determine a stable operation point for the system. At the Nash Equilibrium point, no end-user has an incentive to change its amount of offloaded data, since no end-user can improve its utility by unilaterally changing its data offloading strategy.

**Definition** **1.** A data offloading vector ${\mathbf{b}}_{\mathbf{u}}^{(\mathbf{t})*}=[{b}_{1,s}^{(t)*},\dots ,{b}_{u,s}^{(t)*},\dots ,{b}_{|U|,s}^{(t)*}],s\in S$ is the Nash Equilibrium point of the game $G=[U,\{{A}_{u}^{(t)}\},\{{U}_{u}^{(t)}\}]$, if for every end-user u it holds true that ${U}_{u}^{(t)}({b}_{u,s}^{(t)*},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})*})\ge {U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})*}),\forall {b}_{u,s}^{(t)}\in {A}_{u}^{(t)}$.
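The condition in Definition 1 can be spot-checked numerically by trying unilateral deviations for each end-user. The following is a minimal sketch under stated assumptions: the names `is_nash_equilibrium` and `toy_utility` are hypothetical, deviations are sampled on a finite grid over $[0,{I}_{u}^{(t)}]$, and the toy utility is a simple Cournot-style stand-in rather than the utility function of Equation (3). A grid check can only reveal a profitable deviation; it does not prove that a point is an equilibrium.

```python
def is_nash_equilibrium(utility, b_star, upper_bounds, grid_points=201, tol=1e-9):
    """Grid-based check of the Nash Equilibrium condition.

    utility(u, b) returns the utility of player u under the offloading vector b.
    upper_bounds[u] plays the role of I_u, so player u may choose any value in [0, I_u].
    """
    for u, I_u in enumerate(upper_bounds):
        utility_at_star = utility(u, b_star)
        for k in range(grid_points):
            # Unilateral deviation: only player u changes its strategy.
            trial = list(b_star)
            trial[u] = I_u * k / (grid_points - 1)
            if utility(u, trial) > utility_at_star + tol:
                return False  # a profitable deviation exists
    return True


def toy_utility(u, b):
    """Hypothetical utility for illustration only (NOT the paper's Equation (3))."""
    return b[u] * (1.0 - sum(b))


# For this toy game with two players, (1/3, 1/3) is the equilibrium.
print(is_nash_equilibrium(toy_utility, [1 / 3, 1 / 3], [1.0, 1.0]))
print(is_nash_equilibrium(toy_utility, [0.5, 0.5], [1.0, 1.0]))
```

With the paper's utility function in place of `toy_utility`, the same loop would test a candidate offloading vector against the inequality of Definition 1.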

In the following analysis, our goal is to show the existence and uniqueness of a Nash Equilibrium for the data offloading game. Sufficient conditions for existence are: (i) the strategy space ${A}_{u}^{(t)},\forall u\in U$ is a non-empty, convex, and compact subset of a Euclidean space ${\mathbb{R}}^{U}$; and (ii) the utility function ${U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})},{\mathbf{p}}^{(\mathbf{t})})$ is continuous in ${\mathbf{b}}_{\mathbf{u}}^{(\mathbf{t})}$ and quasi-concave in ${b}_{u,s}^{(t)}$.

**Theorem** **1.** The Nash Equilibrium point of the game $G=[U,\{{A}_{u}^{(t)}\},\{{U}_{u}^{(t)}\}]$ exists and the end-user’s best response data offloading strategy is given as follows:

$${b}_{u,s}^{(t)*}=\frac{{B}_{-u}^{(t)}}{{\beta}_{u}}\left(\frac{{\alpha}_{u}{\beta}_{u}}{{d}_{u}^{(t)}{p}_{s}^{(t)}}-1\right) \tag{12}$$

where $0\le {b}_{u,s}^{(t)*}\le {I}_{u}^{(t)}$.

**Proof** **of** **Theorem** **1.** The strategy space ${A}_{u}^{(t)}=[0,{I}_{u}^{(t)}]$ represents the amount of data that the end-user u can offload to a MEC server s; thus, by definition, it is a non-empty, convex, and compact subset of the Euclidean space ${\mathbb{R}}^{U}$. In addition, based on Equation (3), the utility function ${U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})},{\mathbf{p}}^{(\mathbf{t})})$ is continuous in ${\mathbf{b}}_{\mathbf{u}}^{(\mathbf{t})}$. Furthermore, we determine the second-order derivative of the utility function ${U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})})$ with respect to ${b}_{u,s}^{(t)}$, as follows.

Since $\frac{{\partial}^{2}{U}_{u}^{(t)}({b}_{u,s}^{(t)})}{\partial {b}_{u,s}^{(t)2}}<0$, the utility function ${U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})},{\mathbf{p}}^{(\mathbf{t})})$ is concave, and hence also quasi-concave, in ${b}_{u,s}^{(t)}$. Therefore, the Nash Equilibrium point of the game $G=[U,\{{A}_{u}^{(t)}\},\{{U}_{u}^{(t)}\}]$ exists.

To determine the best response strategy of each end-user, we calculate the critical points of ${U}_{u}^{(t)}({b}_{u,s}^{(t)},{\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})},{\mathbf{p}}^{(\mathbf{t})})$ with respect to ${b}_{u,s}^{(t)}$, as follows.

The data offloading of each end-user u must satisfy the physical limitations, i.e., $0\le {b}_{u,s}^{(t)}\le {I}_{u}^{(t)}$; thus, we distinguish the following cases.

Case A: If ${d}_{u}^{(t)}{p}_{s}^{(t)}>{\alpha}_{u}{\beta}_{u}$, then the critical point is negative, i.e., ${b}_{u,s}^{(t)*}<0$. However, since the physical limitation requires $0\le {b}_{u,s}^{(t)}$ and the utility function is concave, the best response is ${b}_{u,s}^{(t)*}=0$.

Case B: If ${d}_{u}^{(t)}{p}_{s}^{(t)}<\frac{{\alpha}_{u}{\beta}_{u}}{{I}_{u}^{(t)}\frac{{\beta}_{u}}{{B}_{-u}^{(t)}}+1}$, then the critical point exceeds the available data, i.e., ${b}_{u,s}^{(t)*}>{I}_{u}^{(t)}$. However, since the physical limitation requires ${b}_{u,s}^{(t)}\le {I}_{u}^{(t)}$ and the utility function is concave, the best response is ${b}_{u,s}^{(t)*}={I}_{u}^{(t)}$.

Case C: If $\frac{{\alpha}_{u}{\beta}_{u}}{{I}_{u}^{(t)}\frac{{\beta}_{u}}{{B}_{-u}^{(t)}}+1}\le {d}_{u}^{(t)}{p}_{s}^{(t)}\le {\alpha}_{u}{\beta}_{u}$, then the critical point lies within the feasible range, i.e., $0\le {b}_{u,s}^{(t)*}\le {I}_{u}^{(t)}$, and satisfies the physical limitation. In this case, the best response is given by ${b}_{u,s}^{(t)*}=\frac{{B}_{-u}^{(t)}}{{\beta}_{u}}(\frac{{\alpha}_{u}{\beta}_{u}}{{d}_{u}^{(t)}{p}_{s}^{(t)}}-1)$. □
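The three cases above can be sketched as a single function. This is a minimal illustration, not the authors' implementation: the function name `best_response` and its argument names are hypothetical, all inputs are assumed to be positive scalars, and ${B}_{-u}^{(t)}$ is passed in as a given value because this section does not restate its definition.

```python
def best_response(alpha_u, beta_u, d_u, p_s, B_minus_u, I_u):
    """Best response of end-user u, following Cases A-C.

    ASSUMPTION: all arguments are positive scalars; B_minus_u is supplied
    by the caller exactly as it appears in the closed-form expression.
    """
    cost = d_u * p_s
    upper_threshold = alpha_u * beta_u
    lower_threshold = alpha_u * beta_u / (I_u * beta_u / B_minus_u + 1.0)

    if cost > upper_threshold:
        # Case A: the critical point is negative, so offload nothing.
        return 0.0
    if cost < lower_threshold:
        # Case B: the critical point exceeds I_u, so offload everything.
        return I_u
    # Case C: the critical point is feasible and is the best response.
    return (B_minus_u / beta_u) * (alpha_u * beta_u / cost - 1.0)


# Illustrative values only (not taken from the paper).
print(best_response(2.0, 1.0, 1.0, 1.0, 4.0, 10.0))  # Case C
print(best_response(2.0, 1.0, 1.0, 3.0, 4.0, 10.0))  # Case A
print(best_response(2.0, 1.0, 1.0, 0.5, 4.0, 10.0))  # Case B
```

Branching on the price thresholds rather than clipping the closed-form value mirrors the structure of the proof; for positive inputs both approaches give the same result.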

Theorem 1 proves the existence of the Nash Equilibrium point of the game G and determines the best response strategy for each end-user $u\in U$. In the following theorem, the uniqueness of the Nash Equilibrium point of the game G is examined.

**Theorem** **2.** The Nash Equilibrium point ${b}_{u,s}^{(t)*},\forall u\in U,s\in S$ of the game G is unique.

**Proof** **of** **Theorem** **2.** For Cases A and B, the Nash Equilibrium point ${b}_{u,s}^{(t)*}=B{R}_{u}({\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})*})$ is trivially unique. For Case C, to prove uniqueness we should show that the best response strategy $B{R}_{u}({\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})*})$ is a standard function [23]. The properties of a standard function are the following:

Positivity: $\mathbf{f}(\mathbf{x})>\mathbf{0}$;

Monotonicity: if $\mathbf{x}\ge {\mathbf{x}}^{\prime}$, then $\mathbf{f}(\mathbf{x})\ge \mathbf{f}({\mathbf{x}}^{\prime})$; and

Scalability: for all $a>1$, $a\cdot \mathbf{f}(\mathbf{x})\ge \mathbf{f}(a\cdot \mathbf{x})$.

If a standard function has a fixed point, then that fixed point is unique [23]. Using Equation (12), the above properties of the standard function can be verified for the end-user’s best response function $B{R}_{u}({\mathbf{b}}_{-\mathbf{u}}^{(\mathbf{t})*})$. Thus, the Nash Equilibrium point of the game G is unique. □
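The three properties can also be spot-checked numerically for the Case C best response. The sketch below is illustrative only and does not replace the proof. It assumes that ${B}_{-u}^{(t)}$ is the sum of the other end-users' offloaded data, which this section does not state explicitly; the function names and parameter values are hypothetical. Under that assumption the best response is linear in ${B}_{-u}^{(t)}$, so scalability holds with equality and a small tolerance absorbs floating-point rounding.

```python
import random


def interior_best_response(b_others, alpha_u, beta_u, d_u, p_s):
    """Case C best response as a function of the other users' offloading.

    ASSUMPTION: B_{-u} is the sum of the other users' offloaded data.
    """
    B_minus_u = sum(b_others)
    return (B_minus_u / beta_u) * (alpha_u * beta_u / (d_u * p_s) - 1.0)


def check_standard_properties(f, dim, trials=1000, seed=0, tol=1e-9):
    """Randomized check of positivity, monotonicity, and scalability of f."""
    rng = random.Random(seed)
    for _ in range(trials):
        x_low = [rng.uniform(0.1, 5.0) for _ in range(dim)]
        x_high = [v + rng.uniform(0.0, 5.0) for v in x_low]  # x_high >= x_low
        a = rng.uniform(1.01, 4.0)                            # scaling factor a > 1
        if f(x_low) <= 0.0:
            return False  # positivity violated
        if f(x_high) < f(x_low) - tol:
            return False  # monotonicity violated
        if a * f(x_low) < f([a * v for v in x_low]) - tol:
            return False  # scalability violated
    return True


# Illustrative parameters with d_u * p_s < alpha_u * beta_u.
print(check_standard_properties(
    lambda x: interior_best_response(x, 2.0, 1.0, 1.0, 1.0), dim=3))
# When d_u * p_s > alpha_u * beta_u (Case A), positivity fails.
print(check_standard_properties(
    lambda x: interior_best_response(x, 2.0, 1.0, 1.0, 3.0), dim=3))
```

The second call shows why the argument is restricted to Case C: outside the interior price range the closed-form expression is negative and the clipped best response applies instead.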

In conclusion, the optimal data offloading of each end-user is given by Equation (12).