UbiPriSEQ—Deep Reinforcement Learning to Manage Privacy, Security, Energy, and QoS in 5G IoT HetNets



Introduction
We have come a long way towards realizing Mark Weiser's 1991 vision of ubiquitous computing, where anytime, anywhere technologies will penetrate into our daily lives to such an extent that they become indistinguishable from the environments we live in. Cyber-physical systems (CPSs) such as smart cities and societies are examples of such environments [1,2]. The vision is to make systems and applications coexist in these environments, interacting with each other, producing and consuming historical and real-time information emanating from humans, sensors, and machines, and improving quality of life in many areas, for example, transportation [3][4][5][6], healthcare [7], and others [8].
The Internet of Things (IoT) is a vital technology to enable sensing, actions, and other interactions among various "entities" in these ubiquitous environments [9][10][11][12]. The number of "things" connected to the IoT is expected to reach 50 billion by 2020 due to the massive influx of diverse "things" emerging rapidly [13]. This will enable us to monitor and control our environments at micro-levels, in time and space, advancing the path to providing interactive, personalized, and smarter services. However, it will also require solving challenges related to the transport, management, and analysis of data generated from IoT, particularly the challenges related to the 4Vs of big data analytics (Volume, Velocity, Variety, and Veracity) [9,14]. Any movement of data requires energy and could increase latency. The data, analytics, and other computations would have to be optimally migrated and placed in these environments (on the device, or in the cloud, fog, or edge layers) such that the various energy efficiency and QoS demands of these applications and environments are best satisfied. Ubiquitous environments will hence need reliable, high-speed, and low-latency wireless networks.
5G networks, with their significantly higher speeds (from 50 Mbps up to Gbps rates), lower latency, reliability, capacity, and other benefits, offer a powerful communication platform for ubiquitous environments [15]. However, privacy and security of users and data will pose paramount challenges in the adoption of these ubiquitous environments by governments, industry, and the general public. A particular challenge in this context would be to preserve privacy and security while delivering quality of service (QoS) and energy efficiency.
Several works have tried to address these challenges (QoS, energy efficiency, security, and privacy) in distributed environments [16][17][18][19][20][21]. Despite these efforts, a number of gaps exist (see Section 2 for a detailed literature review). Firstly, the existing approaches focus on improving or optimizing one or two of the design parameters (QoS, energy efficiency, security, or privacy). Optimizing one or more parameters without measuring and counteracting the impact on the other parameters is no longer acceptable in the emerging ubiquitous environments. Secondly, existing approaches assume a fixed network configuration and fixed attack models, and use fixed strategies in devising solutions. However, real environments are unpredictable, and hence a priori identification of all design and operational parameters is not feasible. Solutions are needed that dynamically and adaptively devise strategies and make decisions in rapidly changing ubiquitous environments, and that holistically optimize performance subject to varying sets of policy constraints.
This paper proposes UbiPriSEQ, a framework that adaptively, dynamically, and holistically optimizes QoS, energy efficiency, security, and privacy of environments based on IoT devices and 5G heterogeneous networks (HetNets). The framework is built on a three-layered model of ubiquitous environments (IoT Devices Layer, FRAN Layer, and Core Layer). It comprises two modules (Device Module and Network Module). Device Module executes within the IoT Devices Layer, and Network Module runs in both the FRAN Layer and Core Layer (see Figure 1). For background on Core Layer, FRAN Layer, and IoT Devices Layer, see References [17,22-28]. UbiPriSEQ allows various IoT devices and other entities to fulfil their computing and data requirements while also managing their privacy and security needs.
For example, a smartphone may need to offload a deep learning (DL) computation and associated data to FRAN Layer due to its low-latency constraints and the lack of computing resources to execute the computation on board. Larger and latency-tolerant jobs that require bigger resources are offloaded to Core Layer, which comprises clouds, high performance computing (HPC) clusters, big data analytics facilities, and other infrastructures. The data and computations may also be offloaded or migrated due to security and privacy reasons, or perhaps because the data required by a computational job at a device resides in FRAN Layer. Many other possibilities also exist.

Device Module decides whether to offload computations to the fog or cloud, or run locally. This is done while managing privacy, priority, energy-efficiency, and QoS requirements of the computations. Network Module devises policies for data migration between fog nodes to preserve privacy. It also broadcasts dummy signals to confuse eavesdroppers to address privacy and security concerns. The two modules continuously learn from each other and adaptively optimize the ubiquitous environment. All in all, UbiPriSEQ devises policies and makes decisions related to a number of important parameters including local processing and offloading rates for data and computations, radio channel states, transmit power, task priority and selection of a fog node for offloading, data migration, and others. These decisions are made dynamically, adaptively, and holistically, as needed, and based on the environment to optimize QoS, energy efficiency, security and privacy. The operational intelligence in UbiPriSEQ is provided by Deep Reinforcement Learning (DRL).
The UbiPriSEQ framework proposed in this paper is novel for several reasons. Firstly, as mentioned above, the earlier works have focused on one or two of the network design parameters, while we attempt to holistically optimize QoS, energy efficiency, security, and privacy (data privacy, user location privacy, and user pattern privacy). Secondly, UbiPriSEQ adaptively and dynamically deals with the environment by continuous learning, and does not need an a priori complete specification of the environment, security attack models, and other design parameters. Thirdly, we use DRL to provide adaptive intelligence, which has not been investigated before in 5G networks. Fourthly, most existing works use simulated data for performance evaluation, while we have used SpMV, a real-life application, to study UbiPriSEQ performance, and this has not been reported before. The SpMV operation is central to deep learning algorithms, which makes this evaluation relevant to a wide range of practical workloads.
The rest of the paper is organized as follows. Section 2 reviews the relevant works. Section 3 discusses system design requirements. Section 4 describes the UbiPriSEQ framework and Section 5 evaluates the system. Section 6 concludes and gives future directions.

Related Work
Many works have focused on offloading computations to fog or edge devices in order to address latency and energy consumption. Zhang et al. [29] and Liu et al. [30] address latency-optimized and energy-optimized offloading, respectively. Sun et al. [31] optimize the computational energy efficiency and latency of offloading to the mobile edge. Jiao et al. [20] and Dong et al. [19] discuss multi-user task offloading, in which multiple users offload tasks to an edge server. Partial offloading, which involves both local processing and edge processing to reduce latency and energy, has been discussed by Ren et al. [32] and Kuang et al. [33], respectively. Cooperative offloading, i.e., offloading computation tasks with the help of relay nodes to mobile edge computing (MEC) servers far from the users to meet the computational demands, has been discussed in Reference [34]. Task offloading with energy harvesting has been discussed by Mahmood et al. [35] and Xu et al. [36]. Extensive literature on this topic can be found in References [21,37]. However, these works do not address the privacy or security issues faced by the users and the user devices.
Privacy (especially location-based privacy) has been explored in various networking environments such as wireless sensor networks [38], wireless networks [39], and location-based services [40]. Data privacy has been studied extensively, leading to the development of several new encryption techniques and distributed multi-party computational solutions; see, for example, References [41][42][43]. A large number of works have discussed the privacy and security issues that arise while offloading computations in an MEC network (see the surveys in References [17,18]). However, most of these works address privacy and security issues derived from older cloud-based systems. Only a few works address privacy and security issues specific to edge and fog networks. He et al. [44] have proposed a Constrained Markov Decision Process (CMDP) based technique for offloading computations to the mobile edge that reduces energy consumption and delay while maintaining a pre-specified level of privacy (location and usage pattern). Preserving location privacy while optimizing the energy consumption in the presence of a compromised edge service provider has been discussed in Reference [45], where a deep post-decision state (PDS) learning algorithm that exploits information about the energy harvesting process has been proposed to offload to various edge devices so that the energy consumption is minimal while the specified privacy is maintained. He et al. [46] propose a privacy-preserving and cost-efficient (PEACE) task offloading scheme that preserves user location privacy under a user presence inference attack, which invades user privacy when tasks are offloaded to the edge nodes. PEACE is developed based on the Lyapunov optimization framework. These works have mainly focused on offloading with privacy without considering security and other design challenges.
Even though physical layer security techniques such as noisy signals and receiver jamming [47] have been utilized to promote the security and privacy of conventional wireless communication systems [48], very few recent works utilize such techniques. A novel secure offloading scheme based on the physical layer has been proposed in Reference [49]. In this work, the edge server broadcasts jamming signals and utilizes full-duplex communication to impede the eavesdroppers and reduce the self-interference. The optimal jamming signal and offloading ratio are obtained by solving a bilevel optimization problem, and an efficient offloading algorithm is developed based on this technique. Similarly, Zhao et al. [50] utilize a physical-layer-based technique and massive multiple-input multiple-output (mIMO) to provide security while the user's total energy consumption is minimized by jointly optimizing the user's offloading data bits, transmission power, and offloading rate. A user-side secure sub-carrier allocation has been studied in Reference [50]. Unmanned aerial vehicle (UAV) secure offloading for single-antenna mobile edge systems has been studied in Reference [51]. A blockchain-based cooperative computation offloading and resource allocation scheme considering security and privacy has been discussed in Reference [52]. These works have mainly focused on offloading with security without considering privacy and other design challenges (except the last work [52], which considers both security and privacy).
The literature review shows that the existing approaches focus on improving or optimizing one or two of the design parameters, while we propose to holistically optimize QoS, energy efficiency, security, and privacy. Also, the way our proposed framework UbiPriSEQ adaptively and dynamically deals with the environment by continuous learning has not been seen before. Moreover, the use of DRL to provide adaptive intelligence in 5G networks is also novel. Finally, no work similar to UbiPriSEQ exists that has used a real-life application (SpMV) to implement and study the system performance.

UbiPriSEQ: System Requirements
We discuss here the design challenges for 5G IoT HetNets [53][54][55][56], which we have used as the software design requirements for the UbiPriSEQ system.

QoS
Satisfactory QoS is the primary goal for network design. QoS metrics include throughput, packet loss, bit error rate, latency, jitter, and others. While all QoS metrics are important, latency has become a bottleneck due to emerging interactive, real-time, and safety-critical applications, for example, online gaming, augmented reality, 4K video streaming, drone control, and remote surgery.

Energy-Efficiency
Energy-efficiency (using less energy to perform the same task) has become a key goal in designing systems of any kind, including vehicles, buildings, and software. The drivers include reducing greenhouse gas emissions for planet sustainability, costs, and power requirements. For instance, offloading computations to fog instead of cloud could reduce energy usage significantly.

Privacy
We have addressed the following types of privacy in UbiPriSEQ design.

Location Privacy
Service providers, eavesdroppers, and other attackers can easily estimate the location of the users by analyzing the offloaded data. Specifically, the user location can be inferred in several ways. Analysis of the offloaded data pattern, including the size and channel state, enables the service providers (fog nodes) as well as eavesdropping attackers to deduce the distance to the mobile user. Moreover, service providers with multiple fog nodes, or attackers, can use techniques like triangulation to estimate the exact location of the user when the data is offloaded to multiple fog nodes. Hence, the offloading strategy should guarantee location privacy.

Usage Pattern Privacy
This is a critical issue missing in the majority of existing research [44]. The fog nodes or attackers can estimate the usage pattern of the user by analyzing the number of tasks generated and the size of each task. Because mobile devices offload all tasks (including queued tasks and newly generated tasks) to a fog node to reduce computational latency when the radio channel state is good, personal information about the mobile device's usage pattern can be obtained by observation. Task offloading history and other related network statistics of the mobile user can be analyzed by the fog nodes to infer and identify the user in the network. This enables the fog node as well as an eavesdropper to deduce the specific applications and usage times of a given user.

Data Privacy
Ensuring the confidentiality and privacy of the stored data at the fog nodes constitutes data privacy. With the advent of 5G networks, a huge volume of data from safety-critical applications such as banks and health care is generated and stored in the fog nodes as well as the clouds. Protecting this data during transmission, computation, and storage requires privacy and security policies.

Security
We have addressed the following critical security challenges in UbiPriSEQ design.

Jamming
Jammers utilize noisy signals to interrupt radio communication between the fog nodes and devices and prevent offloading of computations and data, affecting the network resources including bandwidth, computational resources, and energy resources.

Rogue Fog Nodes and Users
Rogue nodes, devices, or users can masquerade as genuine entities and deceive others into connecting to them, gaining access to confidential and personal information. They can perform a man-in-the-middle attack, which results in data loss and even loss of device control.

UbiPriSEQ: The Proposed Framework
The UbiPriSEQ framework aims to adaptively, dynamically, and holistically optimize QoS, energy-efficiency, security, and privacy of 5G IoT HetNet-based environments. We describe here UbiPriSEQ, its architecture (Section 4.1), its brain (Section 4.2), and the two functional modules (Sections 4.3 and 4.4).

UbiPriSEQ: Architectural Overview
UbiPriSEQ is built on a three-layered model of ubiquitous environments comprising IoT Devices Layer (dLayer), FRAN Layer (fLayer), and Core Layer (cLayer). Figure 1 provides its overview, and Figure 2 accentuates its two functional components, Device Module (DevM) and Network Module (NetM).
dLayer [17,25-28] comprises devices (smartphones and other smart "things" in IoT) that produce or consume data or take certain actions. fLayer consists of a macro base station and several micro base stations along with several fog nodes. FRAN (Fog Radio Access Networks) is a paradigm for 5G networks to provide high spectral and energy efficiency [23]. The idea behind FRAN is to take full advantage of local radio signal processing, cooperative radio resource management, and distributed storing capabilities in edge devices, which can decrease the heavy burden on fronthaul and avoid large-scale radio signal processing in the centralized Baseband Unit (BBU) pool [24]. The fog nodes near the micro base stations can locally store and process the data. Both the fog nodes and the micro base stations have caches; however, the data stored in them are not the same. Fog caches have more locally relevant data than the micro base station caches. cLayer [22] consists of the BBU pool connected through the wider Internet with clouds and HPC clusters, dealing with larger, delay-tolerant applications that require large compute and storage resources. DevM executes in the dLayer on all devices, and NetM executes in both fLayer (not in all fog nodes, rather centralized) and cLayer (see Figures 1 and 2). The two modules continuously learn from each other and adaptively optimize the ubiquitous environment.

UbiPriSEQ: Adaptive Learning Architecture
Deep Reinforcement Learning (DRL) is UbiPriSEQ's brain. Reinforcement Learning (RL) enables agents to take optimal decisions through observations and feedback received from the system in terms of positive or negative rewards [57]. Q-Learning is an RL technique in which the agent tries to estimate the Q value-function iteratively, based on the reward obtained from an action, which indicates the optimality of the action in a given state of the system. Many techniques exist to find the Q-value.
We use here Deep Q-Network (DQN), a DRL technique that uses a Deep Learning Network for estimating the Q-values. DQNs leverage Deep Neural Networks (DNNs) to train the learning process by the involved agents thereby improving the learning speed and the performance of Q-Learning algorithms. We discuss the DQN in detail, later in Section 4.3.5.
In UbiPriSEQ, a learning agent could be an IoT device, or an edge or fog device situated near the remote radio head at the micro base stations. An action corresponds to various privacy policies, security policies, defending mechanisms against attacks, and offloading policies. A state is the state of the agent, other devices, fog nodes, various attacks, and privacy characteristics observed by an agent. A search for the best Q-function terminates when an optimal strategy is found. The fog and other devices need to explore and exploit various actions to discover the best action providing an optimal reward. Sufficient interaction with various privacy-leaking states and various attack states enables an agent to reach the optimum action for each state.

Device Module (DevM)
The main functions of Device Module (DevM) are to offload computations to the fog or cloud, or run locally, while managing privacy, priority, energy-efficiency, and QoS. The algorithm for Device Module is given in Algorithm 1. The explanation of Device Module and its algorithm follows in Section 4.3.1 to Section 4.3.5.

Algorithm 1 Device Module (DevM)
Input: system state $s_t$
Output: action $a_t = [a^l_t, a^t_t, a^b_t, f]$
1: Set $Q(s_t, a_t)$ with random weights $\theta_0$
2: while ¬converged do
3:     observe $d_t$, $b_t$, and $l_t$
7:     if $t \le K$ then
8:         Select $a_t \in A$ at random
9:     else
10:        Form experience sequence
11:        $\Phi_t = ((s_{t-K}, a_{t-K}), \dots, (s_{t-1}, a_{t-1}), s_t)$
12:        Input $\Phi_t$ to the DNN
13:        $[Q(\Phi_t, a_t)]$ ← get Q-value for A using DNN
14:        $\epsilon$ ← random(0,1)
15:        take action $a_t$ with probability $\epsilon_t$
16:        $a_t(a^l_t, a^t_t, a^b_t, f)$ ← $a_t$
17:    end if
18:    locally compute $a^l_t$; keep $a^b_t$ in buffer; and migrate $a^t_t$ tasks to fog node f
19:    obtain the latency $t_t$
21:    for x ∈ DNN mini-batch do
24:        Select $\Phi_t \in P$ at random
25:        Calculate DNN loss as per Equation (6)
26:    end for
27:    Update DNN weights $\theta_t$ using gradient descent
28:    t ← t + 1
29: end while

DevM: Workflow
DevM observes the current channel state and task buffer, and takes a set of actions (i.e., which tasks to process locally, which ones to offload, to which fog/cloud node, etc.) based on the current Q-value (obtained from the training phase) or at random with a small probability (see Section 4.3.5). A reward (also known as utility) is associated with each set of actions taken. The utility could be positive or negative, depending on whether DevM is able to improve the functional parameters, that is, privacy, QoS, and energy-efficiency. Subsequently, DevM computes the new Q-value using DQN, and based on it decides the set of actions. This process is repeated iteratively until an optimal Q-value is found. The left block of Figure 2 depicts DevM's process flow.

DevM: PrivacyMetric
The PrivacyMetric $Pri_d(s_t, a_t)$ for the state $s_t$ and action $a_t$ at time t is defined in Equation (1). The variable $g_t$ is the number of tasks generated, $o_t$ is the number of tasks to be transmitted, $w_l$ (PrivacyWeight) is the weight that determines the relative importance of the location privacy compared to the usage-pattern privacy, and $h_t$ indicates the channel state. The channel state, $h_t$, is modeled as a Markov process with two states {0, 1}. The variable $w_l$ takes the value w, $\forall w \in \mathbb{R}^+$ (the required level of usage pattern privacy), when the channel state is "bad"; otherwise, it is zero for the "good" state (see Reference [44]). Simply put, under the "good" channel state, PrivacyMetric is the absolute difference between the number of computational tasks produced by a device and the number of tasks offloaded to fog or cloud.
DevM maintains usage pattern privacy and breaks any patterns in the PrivacyMetric value over time by adding PrivacyWeight and intentionally delaying offloading, so that attackers cannot detect a usage pattern. Because DevM hides the usage pattern, attackers are also unable to detect the device's location, which could otherwise be compromised through the device's connection with a known or rogue node or base station (using distance or trilateration). DevM's PrivacyMetric thus estimates both the usage pattern privacy and the location privacy.
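The verbal description of PrivacyMetric can be read as a small function. The following is a minimal Python sketch under stated assumptions: combining the usage-pattern term and PrivacyWeight by addition, encoding the "good" channel state as 1, and the names `privacy_metric`, `generated`, `offloaded`, and `weight` are illustrative choices, not the paper's definition in Equation (1).

```python
GOOD, BAD = 1, 0  # assumed encoding of the two-state channel h_t


def privacy_metric(generated: int, offloaded: int, channel: int, weight: float) -> float:
    """Hypothetical PrivacyMetric: usage-pattern term plus a location term."""
    # Usage-pattern privacy: gap between tasks generated (g_t) and offloaded (o_t).
    usage_term = abs(generated - offloaded)
    # PrivacyWeight w_l equals w in the "bad" channel state and zero otherwise.
    w_l = weight if channel == BAD else 0.0
    # Assumption: the two terms are combined additively.
    return usage_term + w_l
```

Under this reading, a device that generates five tasks and offloads two scores 3 in the "good" state, and 3 plus w in the "bad" state.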

DevM: QoS and Energy-Efficiency
DevM provides higher QoS by selecting the fog or cloud nodes that have SINR higher than a threshold. Moreover, it improves QoS further by utilizing latency constraints of tasks in setting task priorities, and offloading lower-priority tasks to fog or cloud only if needed. Energy-efficiency is improved by DevM by attempting to execute tasks locally if possible, otherwise on fog, and then on cloud.
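The placement preference described above (local first, then fog, then cloud, restricted to nodes above an SINR threshold) can be sketched as follows. This is an illustrative Python sketch, not the learned DQN policy: the `Node` type, the `choose_placement` function, and the fallback of buffering a task when no node qualifies are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    tier: str       # "fog" or "cloud"
    sinr_db: float  # measured SINR towards this node


def choose_placement(fits_locally: bool, nodes: List[Node], sinr_threshold_db: float) -> str:
    """Return "local", "buffer", or the name of the node to offload to."""
    if fits_locally:
        return "local"  # local execution is tried first for energy efficiency
    # Only nodes whose SINR exceeds the threshold are candidates.
    eligible = [n for n in nodes if n.sinr_db > sinr_threshold_db]
    if not eligible:
        return "buffer"  # assumption: keep the task queued until a node qualifies
    # Prefer fog over cloud, then the highest SINR within the tier.
    tier_rank = {"fog": 0, "cloud": 1}
    best = min(eligible, key=lambda n: (tier_rank[n.tier], -n.sinr_db))
    return best.name
```

In UbiPriSEQ this decision is learned rather than hard-coded; the sketch only makes the stated preference order concrete.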

DevM: Reward Function (Utility)
In each time slot t, DevM observes various parameters from the device and network (newly generated tasks by the device, SINR, etc.) and stores them as the state at time t. This state is used to compute the utility, or reward, $u_d$, which is a weighted function given by Equation (2). The reward/utility function provides a higher reward when the task offloading meets the objectives (privacy, latency, QoS) and a lower reward when an offloading is far from our objectives. It is computed by adding PrivacyMetric (given by Equation (1)) and SINR ($sinr_t$), and subtracting the tasks' computational costs ($c_t$), the tasks' communication and queuing delays ($b_t$), and the loss to the device due to the tasks' execution delays ($\rho$). All the parameters are weighted ($\mu$, $\omega$, $\eta$) based on their relative importance. DevM tries to achieve a higher utility, so parameters such as SINR are added while others such as costs are subtracted. DevM adds each utility for time t, and its associated action (offload policy), to an Experience Sequence (i.e., the accumulated knowledge).
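A weighted reward of the kind described above might look like the following Python sketch. The pairing of weights with terms is an assumption: $\mu$ with the computational cost and $\eta$ with SINR follow the way these weights are described for NetM, while pairing $\omega$ with the delay term and leaving the loss $\rho$ unweighted are guesses. The function name `devm_utility` is hypothetical.

```python
def devm_utility(privacy: float, sinr: float, cost: float, delay: float,
                 loss: float, mu: float, omega: float, eta: float) -> float:
    """Hypothetical weighted reward: benefits are added, costs and delays subtracted."""
    benefit = privacy + eta * sinr               # PrivacyMetric and weighted SINR
    penalty = mu * cost + omega * delay + loss   # weighted cost, weighted delay, loss rho
    return benefit - penalty
```

For example, `devm_utility(3.0, 10.0, 2.0, 4.0, 1.0, mu=0.5, omega=0.25, eta=0.1)` evaluates to 1.0: the privacy and SINR terms contribute 4.0 and the three penalties remove 1.0 each.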

DevM: Q-Value
DevM comprises a DQN that consists of an input layer, two hidden convolutional layers, a rectified linear unit (ReLU) layer, and two fully connected output layers. The weights in the CNN are updated using gradient descent optimization [58] based on the previous attack experiences.
The state transits from $s_t$ to $s_{t+1}$ when the agents offload data to the selected fog node at state $s_t$. Based on different offloading experiences denoted by $(s_t, a_t, u_d^{(t)}, s_{t+1})$, the Q-function of the state-action pair is updated using a general RL technique with learning rate $\alpha \in (0, 1]$, which denotes the weight of the current experience, and discount factor $\gamma \in [0, 1]$, which denotes the agents' view regarding future rewards. However, the DQN computes the Q-value for each time t by learning from the replay memory (Experience Sequence). The experience sequence (replay memory) $\Phi_t$ consists of the current system state $s_t$ and the previous K state-action pairs, given by $\Phi_t = ((s_{t-K}, a_{t-K}), \dots, (s_{t-1}, a_{t-1}), s_t)$.
The experience sequence $\Phi_t$ is fed as input to the first convolutional layer in the DQN. The output of the first convolutional layer is then passed through a rectified linear unit (ReLU) as an activation function. The output of ReLU is passed through the second convolutional layer. The outputs from the second convolutional layer are then passed to the fully connected layers to obtain the Q-values for all the |A| possible actions (offloading strategies) at the current sequence $\Phi_t$. Subsequently, in time t, DevM takes an action for offloading based on the Q-value with a probability $1 - \epsilon$, or takes a random action with a small probability $\epsilon$ (to explore various rewards and reach the global optimum reward).
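For reference, the textbook Q-learning update that matches the roles of the learning rate and discount factor described above, and the corresponding exploration rule, can be written as follows. This is the standard form, offered as an assumption about the intended notation rather than a reproduction of the paper's own equations; $a'$ ranges over the action set $A$.

```latex
Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t)
    + \alpha \Big( u_d^{(t)} + \gamma \max_{a' \in A} Q(s_{t+1}, a') \Big)

a_t =
\begin{cases}
  \arg\max_{a \in A} Q(\Phi_t, a) & \text{with probability } 1 - \epsilon, \\
  \text{a uniformly random } a \in A & \text{with probability } \epsilon .
\end{cases}
```

With $\alpha = 1$ the old estimate is discarded entirely, and with $\gamma = 0$ the agent values only the immediate utility.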
Once the results are obtained from the offloaded computation, the agents update their reward using Equation (2), and the offloading experience $(\Phi_t, a_t, u_d^{(t)}, \Phi_{t+1})$ is stored in the memory pool denoted by P. As per the experience replay technique, the agents randomly select an experience from the pool to update the CNN weights $\theta_t$. The mean squared error is used to calculate the loss between the network's output and the optimal Q-value. The Q-function for each state-action pair in a DQN is defined over these sequences, where $\Phi_{t+1}$ is the next state sequence resulting from the selection of the offloading policy $a_t$ at state $\Phi_t$, and $\theta_t$ is the weight of the DNN at time t. Furthermore, every T time steps, the Q-network is duplicated to obtain a target network, which supplies the target Q-values from which the updated DNN weights are obtained. The loss is then computed between the target optimal Q-function $Q_{opt}(\Phi_t, a_t, \theta_t)$ and the network's output [59]. This explains the loss calculation and weight updating in DQN as well as the Q-value computations using DQN.
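The experience replay and loss computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the Q-networks are stood in for by arbitrary callables, the target is the usual "utility plus discounted best next value" form, and the names `td_target` and `minibatch_loss` are hypothetical. It is not the TensorFlow implementation used in the experiments.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One stored experience: (phi_t, a_t, utility, phi_next); sequences are opaque here.
Experience = Tuple[object, int, float, object]
# A Q-function maps an experience sequence to one Q-value per action.
QFunc = Callable[[object], Sequence[float]]


def td_target(utility: float, q_next: Sequence[float], gamma: float) -> float:
    """Target Q-value: reward plus the discounted best next-sequence value."""
    return utility + gamma * max(q_next)


def minibatch_loss(pool: List[Experience], q_online: QFunc, q_target: QFunc,
                   gamma: float, batch_size: int, rng: random.Random) -> float:
    """Mean squared error between online Q-values and target-network targets."""
    # Uniform sampling (with replacement) from the memory pool P.
    batch = [rng.choice(pool) for _ in range(batch_size)]
    errors = []
    for phi, action, utility, phi_next in batch:
        target = td_target(utility, q_target(phi_next), gamma)
        errors.append((target - q_online(phi)[action]) ** 2)
    return sum(errors) / len(errors)
```

Keeping `q_target` separate from `q_online` mirrors the periodic duplication of the Q-network: the targets stay fixed between copies, which is the usual motivation for a target network.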

Network Module (NetM)
The data sent to the fog nodes for task processing and other cached data in fog need to be protected against privacy leakages and eavesdroppers. Network Module (NetM) devises policies for data migration between fog nodes to preserve privacy, and broadcasts dummy signals to confuse eavesdroppers to address privacy and security concerns. The algorithm for Network Module is given in Algorithm 2 and is explained in the following.

Algorithm 2 Network Module (NetM)
5:     $s_t$ ← $\{c_t, h_t, \zeta_t, sinr_t, \delta\}$
6:     if $t \le K$ then
7:         Select $a_t \in A$ at random
8:     else
9:         Form experience sequence
10:        $\Phi_t = ((s_{t-K}, a_{t-K}), \dots, (s_{t-1}, a_{t-1}), s_t)$
11:        Input $\Phi_t$ to the DNN
12:        $[Q(s_t, a_t)]$ ← get Q-value for A using DNN
13:        $\epsilon$ ← random(0,1)
16:    end if
17:    locally store $a^{cl}_t$; transmit $a^{cf}_t$ to fog node f; migrate $a^{cc}_t$ from fog to cloud node c; send dummy signal with frequency f
18:    compute $Pri_n(s_t, a_t)$ as per Equation (8)
19:    $u_n$ ← $Pri_n(s_t, a_t) - \mu \cdot \zeta_t \cdot t_t - \rho + \eta \cdot sinr_t$
21:    for x ∈ DNN mini-batch do
23:        Select $\Phi_t \in P$ at random
24:        Calculate DNN loss using Equation (6)
25:    end for
26:    Update DNN weights $\theta_t$ using gradient descent
27:    t ← t + 1
28: end while

The NetM workflow is the same as that of DevM (see Section 4.3.1), i.e., it observes its environment and takes a set of actions (see Figure 2). The difference lies in the specific observations and actions, the way it calculates the PrivacyMetric and Utility, and the parameters it tries to improve. The observations include cache content, channel state, task priority, and SINR. The actions include which data to keep, which data to offload and to which fog/cloud node, and which base stations the dummy signal is broadcast to. The PrivacyMetric used by NetM, $Pri_n(s_t, a_t)$, defined in Equation (8), is based on differential privacy, i.e., the difference in Hamming distance between two data blocks (from two arbitrary devices) present at a fog node should not exceed a certain threshold (i.e., the required privacy), because a higher difference could allow attackers to identify the types and content of the data, and the device. Unlike the PrivacyMetric in DevM, the NetM PrivacyMetric estimates data privacy.
K is a privacy operation (Laplace mechanism) that operates on two data sets $D_i$ and $D_j$ received at a fog node from two arbitrary devices, i and j. M is the local memory of the fog node and $\epsilon_{diff}$ is the privacy parameter in differential privacy. $Pri_n(s_t, a_t)$ can be considered as the lower bound differential probability parameter $\exp(\epsilon_{diff})$ [42].
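To make the Laplace mechanism concrete, the following Python sketch adds Laplace noise to a single numeric value. It illustrates the standard mechanism only; how UbiPriSEQ applies the operation to the data blocks $D_i$ and $D_j$ is not specified here, and the function names, the `sensitivity` parameter, and the scalar setting are assumptions.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release value with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be positive and sensitivity non-negative")
    return value + laplace_noise(sensitivity / epsilon, rng)


def privacy_ratio_bound(epsilon: float) -> float:
    """Bound exp(epsilon) on the output-probability ratio for neighbouring inputs."""
    return math.exp(epsilon)
```

A smaller privacy parameter gives a larger noise scale and a ratio bound closer to 1, i.e., outputs for two neighbouring data sets become harder to tell apart.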
The utility function (reward) $u_n$ is computed by Equation (9). $Pri_n(s_t, a_t)$ represents the data privacy metric for NetM. $c_t$ is the tasks' aggregated computational costs and $sinr_t$ is the SINR at time t; $\mu$ and $\eta$ are the weights giving their relative importance, respectively. $\zeta_t$ is the priority of the task and $\rho$ is the loss to the device when the computations are not performed within the given time threshold.

UbiPriSEQ: System Evaluation
We now evaluate the performance of UbiPriSEQ using SpMV computations (see Section 1). We have implemented the UbiPriSEQ framework in Python over the TensorFlow platform. The UbiPriSEQ system is evaluated using a real-life application of prime significance, Sparse Matrix-Vector product (SpMV). SpMV computations are fundamental to many scientific, engineering, business, and social applications that provide timely intelligence for the design, operations, and management of ubiquitous environments [60][61][62][63][64][65][66][67]. SpMV is also utilized in newer compressed Deep Neural Networks with pruned layers for computations [68]. The sparse matrices used for our experiments are from the University of Florida Sparse Matrix Collection consisting of a varied number of applications such as thermal, economics, optimization, and structural design (https://sparse.tamu.edu/). We have studied the performance of the UbiPriSEQ system for a range of SpMV computations and metrics including SINR, privacy metric, latency, and utility function and have compared it with constrained Markov Decision Process (CMDP) [44].
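For readers unfamiliar with the workload, the SpMV kernel itself is small. The following Python sketch computes y = A·x for a matrix stored in compressed sparse row (CSR) form; the choice of CSR and the function name `spmv_csr` are assumptions for illustration, since the storage format used in the experiments is not stated.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # The nonzeros of this row are values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


# A = [[4, 0, 0],
#      [0, 0, 2],
#      [1, 0, 3]]
row_ptr = [0, 1, 2, 4]
col_idx = [0, 2, 0, 2]
values = [4.0, 2.0, 1.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])  # [4.0, 6.0, 10.0]
```

The work is proportional to the number of nonzeros, which is why the size and sparsity pattern of the offloaded matrix drive both latency and energy use.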
The experiments were run on a machine with 12 core Intel Xeon CPU with clock frequency of 3GHz, 32 GB RAM, and an Nvidia Quadro GPU. The edge nodes based on Jetson Nano has a Quad-core ARM A57 CPU with 1.43 GHz frequency, 4 GB RAM, and Ethernet connectivity. The storage of fog nodes is using microSD. The experiments were performed based on one-second time slots. The number of devices in IoT Devices Layer to the number of fog nodes is 1:3. The for the greedy selection in DRL is x t , where x is a real number in the range (0, 1), and t is the time [57]. The default network parameters used in the experiments have been depicted in Table 1. The jamming attacks are modeled as interfering signals/noise in the experiment. Some devices and fog nodes are modeled as "rogue" and UbiPriSEQ detects and defend itself from these. x t , ∀x ∈ (0, 1), t ∈ R  Figure 3a compares provided privacy levels and shows that UbiPriSEQ outperforms CMDP by 25% on average (computed by the average difference between the two privacy levels). UbiPriSEQ achieves a higher-level of privacy at a faster rate due to the data migration policies not seen in Reference [44]. We achieve a higher QoS due to better defenses against the attacks. Figure 3b plots SINR and shows an average improvement in the network QoS by 22% in the presence of jamming signals and eavesdroppers. Figure 3c shows that UbiPriSEQ minimizes the average latency at a faster rate then CMDP with the maximum difference against CMDP of 74% 2.19 × 10 5 seconds (lower difference in latency is seen later after the systems stabilize). Figure 3d shows better average utility for UbiPriSEQ compared to CMDP, 33% on average.     These results illustrate that UbiPriSEQ was able to defend and recover from security issues in the network. UbiPriSEQ is successful in selecting the best offload and migration policies that ensure improved privacy, security, and QoS including latency, energy-efficiency, and reliability. 
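The exploration setting used in the experiments, an exploration probability of x raised to the power t with x in (0, 1), decays as training proceeds. A minimal Python sketch of such a schedule and the corresponding action choice follows; the function names are hypothetical and the value of x is left to the caller, since it is not fixed here.

```python
import random
from typing import Sequence


def epsilon_at(x: float, t: float) -> float:
    """Exploration probability epsilon = x**t for x in (0, 1)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return x ** t


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because x is below 1, the probability starts at 1 for t = 0 (pure exploration) and shrinks towards 0, so later time slots mostly exploit the learned Q-values.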
The dynamic learning process in both modules enables the fog devices as well as the edge devices to optimize the offloading strategy while preserving user privacy and security. This multi-agent learning enables the devices to learn and make offloading decisions based on the strategies taken by the network module to provide security. At the same time, the network module learns from the effects of the strategies taken by the device module to provide a better security strategy (against jamming, noise, etc.) while maintaining the required privacy. This process of the modules learning from each other gives the UbiPriSEQ framework better performance than the CMDP-based approach [44]. Moreover, as mentioned earlier, the CMDP-based approach [44] focuses only on privacy, while we propose a holistic framework to optimize multiple design goals, which differentiates our work from all the relevant state-of-the-art approaches.
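The mutual learning between the two modules can be illustrated with a deliberately simplified toy. The sketch below replaces the deep networks with two stateless tabular learners, which is a substitution made purely for illustration; the action names, the `UTILITY` table, and all parameter values are hypothetical and do not come from the UbiPriSEQ implementation.

```python
import random

# Hypothetical joint utility for (device_action, network_action) pairs.
UTILITY = {
    ("local", "relax"): 1.0, ("local", "harden"): 1.2,
    ("offload", "relax"): 2.0, ("offload", "harden"): 3.0,
}


def pick(q, eps, rng):
    """Epsilon-greedy choice over the actions of one Q-table."""
    if rng.random() < eps:
        return rng.choice(sorted(q))
    return max(q, key=q.get)


def train(steps=2000, x=0.99, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q_device = {"local": 0.0, "offload": 0.0}
    q_network = {"relax": 0.0, "harden": 0.0}
    for t in range(steps):
        eps = x ** t  # decaying exploration rate
        d = pick(q_device, eps, rng)
        n = pick(q_network, eps, rng)
        # Each learner observes the joint outcome, so its estimate
        # reflects the strategy currently followed by the other module.
        reward = UTILITY[(d, n)]
        q_device[d] += alpha * (reward - q_device[d])
        q_network[n] += alpha * (reward - q_network[n])
    return q_device, q_network


q_device, q_network = train()
```

Neither learner sees the other's action directly; the coupling happens only through the shared utility, which is the sense in which each module adapts to the effects of the other's strategy in this toy setting.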

Conclusions and Future Work
5G networks, with their significantly higher speeds, lower latency, reliability, capacity, and other benefits, offer a powerful communication platform for ubiquitous environments. Today's ubiquitous environments have become increasingly complex due to the emergence of collaborative edge, fog, and cloud paradigms. However, the privacy and security of users and data will pose paramount challenges in the adoption of these ubiquitous environments by governments, industry, and common people. A particular challenge in this context is to preserve privacy and security while delivering quality of service (QoS) and energy-efficiency. Several works have tried to address one or more of these challenges (QoS, energy efficiency, security, and privacy) in distributed environments.
Despite these efforts, a number of challenges remain. Firstly, the focus of the existing approaches is on improving or optimizing one or two of the design parameters: QoS, energy efficiency, security, or privacy. Optimizing one or more parameters without measuring and counteracting their impact on the other parameters is no longer acceptable in the emerging ubiquitous environments. Secondly, existing approaches assume a fixed network configuration and fixed attack models, and use fixed strategies in devising solutions. The ultimate aim is to optimally migrate and place data, analytics, and other computations in these environments such that the security, privacy, energy efficiency, and QoS demands of the applications and the environments are best satisfied [8][70][71][72][73].
In this article, we proposed UbiPriSEQ, which uses DRL to adaptively, dynamically, and holistically optimize QoS, energy-efficiency, security, and privacy. The framework is built on a three-layered model and comprises two modules. UbiPriSEQ devises policies and makes decisions related to a number of important parameters, including local processing and offloading rates for data and computations, radio channel states, transmit power, task priority, selection of fog nodes for offloading, and data migration. We evaluated UbiPriSEQ and compared it with the CMDP-based approach [44]. The results show that our modular approach provides a clear performance gain over the CMDP-based approach and also addresses challenges that the CMDP-based approach does not (e.g., security).
Our UbiPriSEQ framework is novel for several reasons. In contrast to the existing state-of-the-art, we attempt to holistically optimize QoS, energy efficiency, security, and privacy. Also, UbiPriSEQ is designed using a modular approach. It adaptively and dynamically deals with the environment by continuous learning, and does not need an a priori complete specification of the environment, security attack models, and other design parameters. Moreover, DRL has not been used before in 5G networks to provide adaptive intelligence. Finally, most existing works use simulated data for performance evaluation, while we have used SpMV, a real-life application, to study UbiPriSEQ performance, and this has not been reported before. The SpMV operation is central to deep learning algorithms, and this per se would create a significant impact.
We would like to note here that while the experimental results reported in this paper show that UbiPriSEQ outperforms CMDP [44], the excellence of one scheme over the other cannot be proved by a set of simulated experiments. Although we have given some valid justifications for UbiPriSEQ outperforming CMDP (see the last paragraph in Section 5), this is not a mathematical and absolute proof that UbiPriSEQ is better than CMDP. Many deeper investigations and actual deployments of the proposed work are needed to understand and improve the proposed approach. Nevertheless, we believe that UbiPriSEQ does provide overall a better approach for network design and operations due to its holistic and modular approach.
Going forward, we aim to explore integrating blockchains with UbiPriSEQ, which would enable a highly secure and private network for data and task offloading. We also aim to refine the privacy metrics to better integrate various types of privacy. While the UbiPriSEQ framework proposed in this paper provides a first step towards holistic optimization of 5G-based ubiquitous environments, this is just the beginning of an exciting quest; much more needs to be done in perfecting the various dimensions of the UbiPriSEQ system.