1. Introduction
IoT-based big data applications produce so much data [1,2] that it is impractical to transfer all of it to the data center for real-time processing. To address this challenge, fog computing and edge computing have been proposed in recent years as distributed cloud computing solutions for IoT applications. Because IoT end devices have limited computation and communication capability, additional computing models are needed to process large amounts of IoT data. Fog computing is one of the promising technologies that provide computation and communication services to IoT applications [3,4]. The concept of fog computing is similar to that of edge computing [5,6,7,8,9]. Both aim to provide computation and communication resources to IoT users in the proximity of IoT devices. According to the European Telecommunications Standards Institute (ETSI), Multi-access Edge Computing (MEC) is one of the key technologies towards 5G and is characterized by merits such as ultra-low latency, high bandwidth, and location awareness [5]. Edge computing nodes (MEC nodes) are deployed at locations with access points, such as macro LTE base stations and multi-Radio Access Technology cell aggregation sites [5]. Fog computing nodes, on the other hand, are not necessarily deployed around these access points. To distinguish the two concepts, in this paper, nodes with sufficient communication and computing resources scattered among IoT devices are referred to as fog nodes. The nano-server clusters deployed at the access points are referred to as MEC nodes, which usually have more resources than fog nodes. The collaborative computing of the IoT end nodes, the fog nodes, the MEC nodes and the data center is referred to as IoT-Fog-Edge computing in this context.
Studies on the collaboration of fog, MEC, and cloud computing usually focus on finding an optimal allocation of IoT tasks to virtual machines hosted on fog, MEC or cloud nodes. In [10], the authors propose to allocate the workload among local MEC servers, neighboring MEC servers and cloud servers to minimize the energy consumption of MEC nodes subject to delay constraints. Lyapunov drift-plus-penalty-based dynamic queue evaluation is used for the online allocation algorithm. In [11], an optimal algorithm is put forward to determine whether tasks should be allocated to clouds near the end devices or to a cloud far from them for energy-efficient big data processing. The delay constraints of tasks in the near clouds and the far cloud are taken into account. In [12], an optimal algorithm for joint task allocation among mobile devices, the computing access point and the remote cloud is proposed, where the computing access point can be treated as an MEC node. These studies do not address the continuous data-flow (CDF) problem, which is the main concern of our work. A CDF is a data flow that is continuously generated by IoT end nodes and travels through the fog nodes and MEC nodes before it reaches the cloud data center. Optimizing a CDF is an optimization over a multi-stage graph, whereas the problems in the above studies are one-stage optimization problems. The typical CDF application in IoT-Fog-Edge computing is the E-Health Monitoring System [13,14,15,16], which is elaborated as the motivation scenario in Section 2. Some efforts have been devoted to the performance measurement and optimization of E-Health systems [13,14,15,16] from the perspective of architecture and deployment optimization. To the best of our knowledge, little work has optimized the performance of E-Health monitoring systems from the perspective of mathematical modelling and programming.
To optimize the energy consumption of IoT devices subject to latency constraints, anomalies should be detected, because anomalous nodes cause abnormal latencies, job loss or task failures on transmission paths [17,18]. The resulting retries and retransmissions increase energy consumption. Anomaly detection is a conventional method in wireless sensor networks (WSN) [19,20,21,22], which are organized in an ad hoc way. The ad hoc mechanism makes the whole WSN vulnerable to intrusion by malicious nodes, so anomaly detection is carried out to find them. A deficiency of anomaly detection in WSN is that many messages may be generated and exchanged in the network. In contrast, the network topology of fog and MEC computing is relatively stable and the nodes have more computation resources. Some systematic security and safety mechanisms can be adopted in IoT fog/edge computing, such as blockchain [4], intrusion detection systems (IDS) [23], trust management schemes [24], and the action-oriented programming model [9]. Although these mechanisms can handle most malicious attacks, some anomalies remain, such as those caused by hardware/software errors of fog/MEC nodes, those caused by network congestion and jitter, and attacks that are hard to defend against, such as the Denial of Service (DoS) attack [23]. To achieve latency awareness for the CDF problem, we put forward a lightweight anomaly detection strategy. This strategy uses only the cumulative historical latency data of fog/MEC nodes to discover the anomalous nodes. The results of the anomaly detection are fed into the proposed optimization algorithm for latency evaluation.
Many researchers have modeled the energy consumption optimization problem in edge and fog computing as a mixed-integer nonlinear programming (MINLP) problem [25], for which it is difficult to find the optimal solution. The block coordinate descent (BCD) method is often used to solve this kind of problem [26,27]. In these works, the BCD method performed well and quickly found solutions to complex problems. In this paper, we also use the BCD framework to decompose the original problem into a minimum-cost maximum-flow sub-problem and a power control sub-problem. The proposed method is called the BCDM algorithm in this paper.
In this work, we put forward a latency-aware energy-efficient IoT-Fog-Edge Computing (IFEC) strategy for Continuous Data-Flow (CDF) services. The main contributions of this work are:
We developed a formal model for energy-efficient CDF optimization. The model is composed of entities at four levels of IFEC. It is used to formulate an optimization problem that minimizes energy consumption subject to latency constraints in the presence of anomalous fog or MEC nodes.
We proposed a novel lightweight anomalous node detection strategy for latency-aware CDF optimization.
We designed a block coordinate descent-based max-flow algorithm to solve the latency-aware energy-efficient CDF problem iteratively.
We evaluated the performance of the proposed model and algorithm by simulations based on real-life datasets.
The remainder of this paper is organized as follows. In Section 2, the motivation scenario in the E-Health system is elaborated. In Section 3, we present the system model and problem formulation. In Section 4, we put forward the proposed solutions for the CDF problem. The numerical simulations based on real-life datasets are presented in Section 5, and we draw conclusions in Section 6. The acronyms used in this paper are listed in Table 1.
2. Motivation Scenario
Thanks to the rapid progress of wireless communications and wearable devices, the E-Health Monitoring (EHM) System has become a paradigm of IoT applications [13,14,15,16]. A typical EHM system is composed of an IoT sensing subsystem, a networking subsystem, and a cloud data processing and storage subsystem. In [13], the EHM system is divided into four parts: wearable devices, the Machine-to-Machine (M2M) gateway, the Network Service Capability Layer (NSCL), and the data processor and openEHR services. Although the authors did not present it as IoT-Fog-Edge computing, it can be treated as a CDF scenario because the proposed model has the same characteristics as CDF. The M2M gateways continuously collect data from sensors (such as heart rate, blood pressure, and blood oxygen saturation) and periodically send the data to data processors in virtual machines hosted at a cloud provider. The data flow passes through the M2M gateways and the NSCL and finally reaches the cloud. In [14,15,16], the integrated IoT, fog, edge and cloud architecture for EHM systems was clearly illustrated. The latency, the availability, and the potential challenges of EHM systems were addressed in [13,14,15], respectively. A common characteristic of EHM systems is that the data flow starting from the IoT devices travels across the fog layer and the edge layer before it reaches the cloud data center. The network latency and availability are mainly influenced by the devices in the networking subsystems, such as the fog and MEC nodes. We generalize the system model of the EHM system to the continuous data-flow IoT system. In our model, each IoT device has its specific domain tasks, such as the EHM tasks. The data flows generated by different IoT devices can go through the same or different network paths. Our aim is to minimize the global energy consumption of multiple IoT devices subject to the latency constraints at each IFEC level.
3. System Model
In CDF application scenarios, the nodes in the network can be categorized into the following four levels according to their computation capability:
IoT end level: The IoT end devices, such as sensors and RFID devices, belong to this level. They are used to collect the raw data in IoT.
Fog level: The fog nodes with very limited communication and computation capability, such as IoT gateways, belong to this level.
MEC level: The server clusters with limited computation capability belong to this level. They are deployed in proximity to the access points of mobile networks, such as macro LTE base stations and multi-Radio Access Technology cell aggregation sites [5].
Cloud level: There are only data centers at this level.
As previously mentioned, in continuous data flow service scenarios, the IoT end devices send data to fog nodes or MEC servers for preprocessing before the data are sent to data centers. Client software is installed on the end devices to communicate with the fog nodes and MEC servers and to receive the application services. These applications have domain-specific tasks that are offloaded to fog nodes or MEC servers to execute complicated computations, e.g., intelligent video acceleration and augmented reality (AR). The computation resources are virtual machines deployed specifically for users’ applications. These VMs are referred to as Proxy VMs [28].
Without loss of generality, we assume that the data traffic starting at the end devices goes through the fog nodes to the MEC servers and finally reaches the data center. If an end device does not need to offload data to the MEC server but sends data directly to the data center via the fog node, the MEC node can be treated merely as an access point to the core network. Our aim is to minimize the energy consumption of the end devices subject to the latency constraint at each level.
The system framework can be formulated as a tuple
$(UE,FN,EC,DC)$.
$UE$ is a collection with
I IoT end devices.
$UE=\{u{e}_{i},i=1,2,\dots ,I\}$.
$FN$ is a collection with
J fog nodes,
$FN=\{f{n}_{j},j=1,2,\dots ,J\}$.
$EC$ is a collection with
M MEC nodes,
$EC=\{e{c}_{m},m=1,2,\dots ,M\}$.
$DC$ is the data center. Each
$u{e}_{i}$ can be expressed as a triple:
$u{e}_{i}=\{{\lambda}_{i},{P}_{i},{\tau}_{i}\}$, where
${\lambda}_{i}$ is the arrival rate of tasks of
$u{e}_{i}$ in a time unit. We assume tasks arrive at
$u{e}_{i}$ according to a Poisson distribution. The arrival rate of the Poisson distribution can be estimated by a fitting method [29].
${P}_{i}$ is the wireless transmission power of
$u{e}_{i}$ and
${\tau}_{i}$ is the maximum acceptable latency for finishing a task to ensure the quality of service (QoS).
$f{n}_{j}$ and
$e{c}_{m}$ can be expressed as tuples
$f{n}_{j}=\{{\mu}_{j},p{r}_{j}\}$ and
$e{c}_{m}=\{{\mu}_{m},p{r}_{m}\}$ respectively, where
$\mu $ is the service rate or processing capability, i.e., the expected number of tasks that can be processed in a time unit. We assume that the service time of a fog node or an MEC node follows an exponential distribution with mean
$\frac{1}{\mu}$.
$pr$ is the reliability that a task can be executed successfully.
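The tuple $(UE,FN,EC,DC)$ can be represented by a small data structure. The following Python sketch shows one possible encoding. The class names, field names and numerical values are hypothetical and serve only as an illustration of the notation above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndDevice:
    """ue_i = {lambda_i, P_i, tau_i}."""
    arrival_rate: float  # lambda_i: Poisson task arrival rate per time unit
    tx_power: float      # P_i: wireless transmission power
    max_latency: float   # tau_i: maximum acceptable task latency (QoS bound)


@dataclass
class ServiceNode:
    """fn_j or ec_m = {mu, pr}."""
    service_rate: float  # mu: expected number of tasks processed per time unit
    reliability: float   # pr: probability that a task executes successfully

    def mean_service_time(self) -> float:
        # Exponential service time with mean 1/mu.
        return 1.0 / self.service_rate


@dataclass
class System:
    """(UE, FN, EC, DC); the single data center is kept as a label."""
    ue: List[EndDevice]
    fn: List[ServiceNode]
    ec: List[ServiceNode]
    dc: str = "data-center"


# Illustrative values only (not taken from the paper).
system = System(
    ue=[EndDevice(arrival_rate=2.0, tx_power=0.1, max_latency=0.5)],
    fn=[ServiceNode(service_rate=10.0, reliability=0.95)],
    ec=[ServiceNode(service_rate=50.0, reliability=0.99)],
)
print(system.fn[0].mean_service_time())
```

Fog nodes and MEC nodes share one class here because both are described by the same pair $\{\mu ,pr\}$.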
3.1. Anomalous Nodes Discovery and Confidence Evaluation
Latency variation in IoT may be caused by network congestion or jitter. Such slight variation does not change the statistical regularity of network latency. In contrast, software or hardware errors, or attacks from malicious users, may cause anomalies in fog nodes and MEC nodes, making the network performance of these nodes unstable and unpredictable. To guarantee the QoS of data flow services, an optimal solution to the CDF problem should bypass these anomalous nodes. We put forward a lightweight anomalous node discovery strategy, based on the observation that anomalous nodes exhibit anomalous behaviors that deviate from the statistical regularity of normal nodes.
Following Das et al. [30], we define anomalies as follows.
Definition 1. In our paper, anomalies are defined as any observations of latencies that are different from the normal behavior of the latency data.
3.1.1. Chi-Square Test and Similarity Measurement
The ${\chi}^{2}$ test is a goodness-of-fit test [31]. It is used to test the null hypothesis that the observed data come from a specific distribution. Given that the sample size is large enough, we have the following null hypothesis:
Hypothesis 1 (H1). The latency value of the observed fog or MEC node comes from the normal distribution.
This hypothesis derives from research on the latency characteristics of mobile IoT [13]. If a fog node or MEC node is disrupted, the statistical features of its latency should differ from the normal distribution. Let ${p}_{i}(t)$ be the p-value obtained from the tth round of the ${\chi}^{2}$ test against the latency sample ${\ell}_{i}(t)$ of the ith node. ${\ell}_{i}(t)=({s}_{i}(0),{s}_{i}(1),\dots ,{s}_{i}(t))$ is a sample vector and ${s}_{i}(t)$ is the tth latency sample of node i. ${\phi}_{i}(t)=({p}_{i}(0),{p}_{i}(1),\dots ,{p}_{i}(t))$ is the vector of p-values. The cosine similarity of the latency distributions of two nodes is defined as follows:
Let node j be a normal node that is selected as a guard by the provider of the data flow services. If node i is also a normal node, it should behave similarly to node j from the perspective of latencies. The similarity can be measured by the cosine function $sim(i,j)$. Nodes whose behaviors deviate from the baseline behavior can be treated as anomalous nodes. The concept of a guard node comes from anomaly detection and monitoring in wireless sensor networks (WSN) [18,32]. There, a guard node lies between the sending node and the receiving node and can be used for both normal communication and monitoring. The selection of guard nodes is based on the location and trustworthiness of the nodes [18,33,34]. Because the continuous data flow network in this paper is not a WSN-like mobile ad hoc network, we do not need to consider the location factor when selecting guard nodes and select them only according to their trustworthiness. The architecture, the security level and other trust-related attributes of a node can be used to measure its trustworthiness. Guard nodes in WSN are responsible for monitoring and detecting anomalies in addition to serving as the baseline. In our work, however, guard nodes serve only as the baseline. Monitoring and detecting malicious nodes is out of the scope of this work.
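As an illustration of the similarity measurement, the following minimal sketch computes $sim(i,j)$ under the assumption that it is the standard cosine similarity of the p-value vectors, $sim(i,j)=\frac{{\phi}_{i}(t)\cdot {\phi}_{j}(t)}{\Vert {\phi}_{i}(t)\Vert \,\Vert {\phi}_{j}(t)\Vert}$. The function name and the p-value vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(phi_i: Sequence[float], phi_j: Sequence[float]) -> float:
    """Cosine similarity of two p-value vectors of equal length."""
    if len(phi_i) != len(phi_j):
        raise ValueError("p-value vectors must have the same length")
    dot = sum(a * b for a, b in zip(phi_i, phi_j))
    norm_i = math.sqrt(sum(a * a for a in phi_i))
    norm_j = math.sqrt(sum(b * b for b in phi_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to any other vector.
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical p-values from four rounds of the chi-square test.
guard = [0.62, 0.55, 0.71, 0.48]    # guard node j (baseline)
normal = [0.58, 0.60, 0.66, 0.52]   # node that stays close to the baseline
suspect = [0.60, 0.02, 0.01, 0.00]  # node whose p-values collapse over time

print(cosine_similarity(normal, guard))
print(cosine_similarity(suspect, guard))
```

In this toy data, the node whose p-values collapse after the first round obtains a clearly lower similarity to the guard node than the node that keeps passing the normality test.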
3.1.2. F-Test
In some cases, the latencies of a node may conform to a normal distribution whose parameters differ from those of the guard node. An F-test is used to test whether two samples come from normal distributions with the same variance. In this context, the variance of the latency vector of a node is compared with that of the guard node by the F-test. If the p-value of the test is less than 0.05, we reject the null Hypothesis 2.
Hypothesis 2 (H2). The latencies of the observed fog or MEC node and of the guard node come from the normal distributions with the same variance.
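The variance comparison behind Hypothesis 2 can be sketched as follows. The sketch computes only the F statistic, i.e., the ratio of the two sample variances. Converting the statistic into a p-value requires the cumulative distribution function of the F distribution, which the Python standard library does not provide, so a statistics package would be needed for that step. The function name and the latency values are hypothetical.

```python
import statistics
from typing import Sequence


def variance_ratio(node_latencies: Sequence[float],
                   guard_latencies: Sequence[float]) -> float:
    """F statistic: sample variance of the node over that of the guard node."""
    return (statistics.variance(node_latencies)
            / statistics.variance(guard_latencies))


# Hypothetical latency samples in milliseconds.
guard = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2]
similar = [10.0, 10.2, 9.7, 10.1, 9.9, 10.3]
erratic = [10.0, 14.5, 6.2, 12.8, 7.1, 10.9]

print(variance_ratio(similar, guard))  # close to 1
print(variance_ratio(erratic, guard))  # far above 1
```

A ratio close to 1 is compatible with equal variances, whereas a ratio far from 1 indicates that the node behaves differently from the guard node even if its latencies are normally distributed.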
3.1.3. Put It All Together
According to the results from Section 3.1.1 and Section 3.1.2, we can calculate the reliability value. The actual value that corresponds to a given reliability level depends on the application. For example, 0.9 can be treated as a high reliability value for most business applications, whereas applications in the finance and banking industry require a reliability above 0.99. A simple way to estimate the value is to calculate it from the value of the guard node j as follows:
where $\lfloor x\rfloor $ is the largest integer less than or equal to x, $IND(x)=1$ when $x\ge 1$ and $IND(x)=x$ otherwise, and ${F}_{t}(i,j)$ is the F-test statistic for nodes i and j. When the number of latency samples n approaches $+\infty $, the degrees of freedom also approach $+\infty $, and ${F}_{t}(i,j)$ should approach 1 given that node i is as confident as guard node j. Note that our aim is not to obtain an accurate reliability value for a node, but to find the anomalous nodes and bypass them in the optimal solution of the CDF problem whenever alternative normal nodes exist.
Figure 1 illustrates the execution process of the proposed anomaly detection algorithm and its position in the whole optimization model.
3.2. Tandem Queue Model
We put forward the tandem queue model depicted in
Figure 2 for investigating the execution latency of tasks.
3.2.1. Latency in IoT End Level
At the IoT end level, tasks arrive at end equipment $u{e}_{i}$ according to a Poisson distribution with arrival rate ${\lambda}_{i}$. Without loss of generality, we assume that the data size of each task is equal to d. The transmission latency at this level can be expressed by the following equations.
where B is the wireless channel bandwidth, ${P}_{i}$ is the transmission power, ${\sigma}^{2}$ is the variance of the complex white Gaussian channel noise, ${H}_{i}$ is the average channel gain, ${r}_{i}$ is the average transmission data rate, and $\frac{{r}_{i}}{d}$ is the number of tasks that can be transmitted in a time unit through the wireless channel. The transmission queue of each end equipment is an M/M/1 queue. To guarantee queue stability, the following constraint must be satisfied:
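For an M/M/1 queue with arrival rate ${\lambda}_{i}$ and service rate $\frac{{r}_{i}}{d}$, stability requires ${\lambda}_{i}<\frac{{r}_{i}}{d}$. The following minimal sketch illustrates the quantities at this level. It assumes that ${r}_{i}$ takes the Shannon capacity form $B{\log}_{2}(1+\frac{{P}_{i}{H}_{i}}{{\sigma}^{2}})$, which is consistent with the symbols defined above but is an assumption of this sketch, and it uses the standard M/M/1 mean sojourn time $\frac{1}{{r}_{i}/d-{\lambda}_{i}}$. All function names and numerical values are hypothetical.

```python
import math


def shannon_rate(bandwidth: float, tx_power: float,
                 channel_gain: float, noise_power: float) -> float:
    """Assumed rate model: r = B * log2(1 + P * H / sigma^2)."""
    return bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)


def mm1_sojourn_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system (waiting plus service): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical parameters, for illustration only.
B = 1e6          # channel bandwidth in Hz
P_i = 0.1        # transmission power in W
H_i = 1e-6       # average channel gain
SIGMA2 = 1e-9    # noise variance in W
d = 1e5          # task size in bits
lambda_i = 5.0   # task arrival rate in tasks per second

r_i = shannon_rate(B, P_i, H_i, SIGMA2)  # bits per second
service_rate = r_i / d                   # tasks per second
latency = mm1_sojourn_time(lambda_i, service_rate)
print(f"r_i={r_i:.0f} bit/s, service rate={service_rate:.2f}, latency={latency:.4f} s")
```

The explicit check in `mm1_sojourn_time` mirrors the stability constraint: if the arrival rate reaches the service rate, the queue grows without bound and no finite latency exists.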
3.2.2. Latency in Fog and MEC Level
As at the IoT end level, the latency at the fog level and the MEC level can be expressed by a queue model as follows:
where ${x}_{ij}$ is the number of tasks sent from $u{e}_{i}$ to $f{n}_{j}$, and $\frac{{x}_{ij}d}{{r}_{i}}$ is the time that $u{e}_{i}$ takes to transmit ${x}_{ij}d$ units of data to $f{n}_{j}$.
$p{r}_{j}$ and $p{r}_{m}$ are the evaluated reliabilities, i.e., the probabilities that a task is executed successfully by a node. As previously mentioned, we measure the confidence of the nodes in our model and map the confidence to the reliability. If each execution attempt succeeds independently with probability $pr$, the number of attempts needed for one successful execution follows a geometric distribution, whose expectation is $\frac{1}{pr}$.
3.3. Problem Formulation
The latency-aware energy-efficient CDF problem can be formulated as follows:
where ${E}_{i}=\frac{{P}_{i}}{{r}_{i}}d$. We assume that an IoT node can be connected to only one access point at a time. Thus, the transmission power is the same for all tasks of the same IoT node, and so is the transmission rate. The problem MP is a mixed-integer nonlinear programming (MINLP) problem, which is NP-hard [35]. There is one multi-dimensional combinatorial decision variable x and one multi-dimensional continuous decision variable P. There is no known generic algorithm that solves this kind of problem optimally in polynomial time. In the following section, we put forward two approximate algorithms, the block coordinate descent based multi-flow algorithm (BCDM) and the Best-effort algorithm.
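To make the role of the power variable P concrete, the sketch below evaluates the per-task energy ${E}_{i}=\frac{{P}_{i}}{{r}_{i}}d$ for several power levels. It assumes a Shannon-type rate ${r}_{i}=B{\log}_{2}(1+\frac{{P}_{i}{H}_{i}}{{\sigma}^{2}})$, which is an assumption of this sketch rather than a formula quoted from the text. The function names and parameter values are hypothetical.

```python
import math


def shannon_rate(bandwidth: float, tx_power: float,
                 channel_gain: float, noise_power: float) -> float:
    """Assumed rate model: r = B * log2(1 + P * H / sigma^2)."""
    return bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)


def energy_per_task(tx_power: float, rate: float, task_size: float) -> float:
    """E_i = (P_i / r_i) * d, as defined in the text."""
    return tx_power / rate * task_size


# Hypothetical parameters, for illustration only.
B, H, SIGMA2, D = 1e6, 1e-6, 1e-9, 1e5

powers = [0.05, 0.1, 0.2]
rates = [shannon_rate(B, p, H, SIGMA2) for p in powers]
energies = [energy_per_task(p, r, D) for p, r in zip(powers, rates)]
tx_times = [D / r for r in rates]

for p, e, t in zip(powers, energies, tx_times):
    print(f"P={p:.2f} W  energy/task={e:.6f} J  time/task={t:.6f} s")
```

Under this assumed rate model, a lower transmission power reduces the energy per task but also lowers the rate and therefore lengthens the transmission time. This tension between energy and latency is what the power control part of the problem has to balance against the latency constraints.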
6. Conclusions
In this work, we put forward a latency-aware energy-efficient continuous data-flow optimization strategy designed for continuous data flow applications in IoT-Fog-Edge computing scenarios. The most typical application of continuous data flow is the E-Health Monitoring System. We used a novel lightweight anomaly detection strategy to obtain the confidence of the fog and MEC nodes. We used the confidence as the metric to evaluate the reliability of each node and to estimate the latencies in the energy-efficient continuous data-flow problem with latency constraints. We established a formal model and solved the problem using the block coordinate descent max-flow (BCDM) algorithm. Real-life datasets were used in the numerical study to verify the performance of the proposed strategies. Numerical results showed that the proposed strategies have good performance in all simulations.
In this paper, we consider only the latency property of the data flow service. However, other network attributes, such as message frequency, message rate and message size, also have a great impact on the overall performance of the system. We will combine these attributes with latencies to measure the system performance in our future work. Although the main motivation scenario of the continuous data flow problem in this paper is the E-Health monitoring system, the problem can also be extended to other application scenarios, such as mobile social networks [40] and intelligent industrial monitoring systems [41]. We will further consider the location of fog nodes and MEC nodes in the continuous data flow problem [42]. Furthermore, we will conduct more simulations in an event-driven simulator [43], such as YAFS [44], in our future work.