Application of Wireless Sensor Network Based on Hierarchical Edge Computing Structure in Rapid Response System

Abstract: This paper presents a rapid response system architecture for the distributed management of warehouses in logistics by applying the concept of tiered edge computing. A tiered edge node architecture is proposed for the system to process computing tasks of different complexity, and a corresponding rapid response algorithm is introduced. The paper emphasizes the classification of abstracted outlier sensing data, which can better match different sensing types and be transplanted to various application fields. A software-defined simulation is used to evaluate the system performance in terms of response time and response accuracy, from which it can be concluded that common predefined emergency cases can be detected and responded to rapidly.


Introduction
Wireless Sensor Networks (WSNs) [1] have been widely applied in domains such as transport [2], agriculture [3], smart cities [4] and smart homes [5] as the critical environmental sensing infrastructure in Internet of Things (IoT) systems. Tens of thousands of ubiquitous sensors enable WSNs to continuously capture large amounts of sensed data, and these volumes will keep growing in the coming years. Even though WSNs sense the environment with increasingly accurate data capture ability, how to give meaning to these huge amounts of data, and how to use them intelligently and quickly in mainstream applications, remain open research challenges [6]. The value of WSNs as the tentacles of IoT systems cannot be realized under inaccurate data analytics and delayed system response. Therefore, the real-time screening and efficient processing of sensor data have become an active and challenging research direction.
From the perspective of intelligent logistics, the warehouse is one of the data centers that generate massive interlaced and correlative data. For general warehouse management, there are two application uses of sensed data: one is for cargo management, which includes goods identification (using RFID) and goods tracking (location and movement); the other is for safety management, which refers to

Hierarchical Edge Computing Structure
Compared with classical WSN system architectures, the proposed edge computing-based graded system architecture consists of three core layers as illustrated in Figure 1. As a widely accepted environmental sensing infrastructure, sensor nodes in the WSN collect sensing data and track changes to the environment continuously. For better identification and management, sensor nodes in the WSN are logically separated into different areas. By moving the computation to the edge of the networks, the edge computing layer reduces the response time during communications as well as the required upload bandwidth between the sensor networks and the cloud. Compared to just using the cloud, the edge layer is physically close to the environment. The functionality of the edge layer in the proposed structure is refined into three grades of edge nodes. Grade one and two edge nodes are focused on general functions, including data formatting, preliminary data processing for WSN data collection, and the execution of tasks and control commands allocated by the upper layer (higher grade edge nodes or the cloud). Grade three edge nodes contribute to more complex data analysis, which involves data that is potentially useful for prediction and control, as well as generating or relaying control commands from the upper layer down to the lower layer. The cloud layer hosts cloud computing, which contributes to the centralized analysis of global data and the management of the entire network. In addition, the connection between users and the system via the cloud realizes the remote operation and control of all areas covered by the terminal devices. For application developers, the system can be accessed via the cloud or an edge node for application deployment, depending on the deployment requirements and the network conditions.


Distributed Micro-Database
As stated, grade one and two edge nodes in the edge layer are focused on general functions such as data formatting and preliminary data processing for WSN data collection. A lightweight database (SQLite [17]) component is inserted between the grade-1 and grade-2 edge nodes in the system. The purposes of including a lightweight database are (a) capturing the data effectively, (b) supporting selective data retrieval, (c) filtering data without additional programming, and (d) enhancing data readability and translatability. The interaction between grade one and two edge nodes is illustrated in Figure 2. Compared to popular databases such as MySQL [18] and PostgreSQL [19], SQLite is a good choice, as its low resource usage suits embedded products in IoT applications. Due to communication dependencies on the database, the main communication protocols are called via APIs within the program directly. SQLite offers solid performance, low resource consumption, low latency, and overall simplicity of configuration and management.
As a common agent component in the WSN communication environment, message brokers that implement the communication protocol deal with messages from endpoints that generate massive data. The communication between EN-1 and EN-2 shown in Figure 2 is achieved by implementing message brokers on all the edge nodes. Since most IoT communication brokers, such as Message Queuing Telemetry Transport (MQTT) brokers, do not provide any mechanism for logging historical data, a script that logs sensed data to SQLite was written in Python, benefitting from SQLite's support for multiple programming languages. The script logs data on a collection of topics, recording message time, message topic, and message payload. Considering the repeatability of the sensed data, the script only logs changed data from a status sensor: if a status sensor sends its status as "ON" once per second, this would otherwise result in 3600 "ON" messages logged every hour, whereas the script logs only one message. The script uses the main thread to receive the data (the on-message callback) and a worker thread to log the data. A queue is used to move messages between threads: once data is placed in the queue, the worker takes it from the queue and writes it to disk. The worker is started at the beginning of the script.
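A minimal sketch of such a logging script is given below. The callback signature is simplified relative to a real MQTT client library (which passes a message object), and the table name, topic names, and deduplication scope are assumptions for illustration.

```python
# Main thread receives messages and enqueues them; a worker thread writes
# them to SQLite; unchanged status values are suppressed so a repeated "ON"
# is logged only once.
import queue
import sqlite3
import threading
import time

LOG_QUEUE: queue.Queue = queue.Queue()
_last_status: dict = {}  # topic -> last payload, used to suppress repeats

def should_log(topic: str, payload: str) -> bool:
    """Log a status message only when its value differs from the previous one."""
    if _last_status.get(topic) == payload:
        return False
    _last_status[topic] = payload
    return True

def on_message(topic: str, payload: str) -> None:
    """Main-thread callback: enqueue changed messages with a timestamp."""
    if should_log(topic, payload):
        LOG_QUEUE.put((time.time(), topic, payload))

def log_worker(db_path: str, stop: threading.Event) -> None:
    """Worker thread: drain the queue and persist messages to SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, topic TEXT, payload TEXT)")
    # Keep draining until a stop is requested AND the queue is empty.
    while not (stop.is_set() and LOG_QUEUE.empty()):
        try:
            row = LOG_QUEUE.get(timeout=0.1)
        except queue.Empty:
            continue
        con.execute("INSERT INTO log VALUES (?, ?, ?)", row)
        con.commit()
    con.close()
```

In the deployed script, the callback would be registered with the MQTT client and the worker thread started before connecting to the broker.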


Abnormal Data Type Abstract
In previous work [20], four abnormal sensor data types were defined for the rapid response system testing in the warehouse scenario, which were (i) rapid-growth, (ii) slow-growth-diffusion, (iii) slow-growth-nondiffusion, and (iv) error data on a single node. The classification was based on the data change rate and considered the diffusivity of the sensed object. However, to loosely couple the outlier detection approach with the specific sensing data type, the abnormal data types are summarized into three abstracts which could improve the portability of the system for adapting to various application scenarios. To support this, an enhanced outlier detection mechanism is also proposed. The following section presents the outlier type abstracts and related detection mechanisms.
Sensing information collected by sensors is classified into two forms: (i) numerical data and (ii) status data. In the detection and rapid response system, our monitoring method for numerical data anomalies mainly records comparison results, while the detection method for status data anomalies records state changes. The detection of numerical data is commonly based on a preset threshold, which specifies whether a value is an outlier. Two methods are considered for setting a threshold: a constant value and a prediction formula. The constant value is applied for data with high stability, such as temperature data from an incubator. A prediction formula, which requires a preset formula or complex prediction calculations, is often used for sensor data with uncertain change trends and large overall fluctuations; road vehicle flow monitoring data is a typical example. Besides exceeding the threshold, data jitter within the boundary threshold is considered a type of outlier as well. The core signature of data jitter is a sharp increase/decrease in the dispersion of values, i.e., the degree of deviation of the sensing data from the average value increases under the premise of a constant sampling interval. In the WSN domain, sensor data is strongly tied to time series; hence, the processing of sensor data needs to consider both the timing factor and the data volume. Considering that the sensor nodes have limited computing capacity and limited storage space, we simplified the calculation of the real-time standard deviation for data jitter into the calculation of data increments, and set a threshold on the increments, thereby transforming data jitter anomalies into threshold anomalies.
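The simplification above, replacing a rolling standard deviation with thresholded increments, can be sketched as follows. The window length and the number of violating increments required to flag jitter are assumptions, not values from the paper.

```python
def is_jitter(samples, inc_threshold, min_violations=3):
    """Flag jitter when successive increments repeatedly exceed a threshold.

    This replaces a rolling standard deviation, which would be costly on a
    resource-limited node, with absolute increments between consecutive
    samples taken at a constant interval.
    """
    increments = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(inc > inc_threshold for inc in increments) >= min_violations
```

A stable temperature stream produces small increments and is not flagged, while an oscillating stream that never crosses the boundary threshold still trips the increment threshold.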
As for the status data, an expect-states pattern, which holds all effective statuses of the sensor data, is used for the outlier detection. One of the typical status sensors is the bistate switch. In an application scenario where the bistate switch is required to be normally 'ON', the expect-states pattern will hold the 'ON' state only, which indicates the 'OFF' state is an outlier.
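A possible reading of the expect-states check is sketched below; the sensor identifiers and state sets are illustrative placeholders.

```python
# Expect-states pattern: each status sensor has a set of effective states,
# and anything outside that set is an outlier. Unknown sensors are treated
# conservatively (every status is flagged).
EXPECT_STATES = {
    "switch-01": {"ON"},            # bistate switch required to be normally ON
    "door-02": {"OPEN", "CLOSED"},  # both states are effective here
}

def is_status_outlier(sensor_id: str, status: str) -> bool:
    """A status is an outlier when it is not in the sensor's expected set."""
    return status not in EXPECT_STATES.get(sensor_id, set())
```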
Based on the three outlier data abstracts presented above, a rapid response strategy by which the edge nodes in the system identify the abnormal types is proposed in the following section.

Rapid Response Strategy
Within a target monitoring area, there are two primary cases in which sensor nodes may generate abnormal sensing data: one is a sudden environmental change; the other is error data caused by a broken sensor or irruption. A rapid response is only expected to be triggered by the first case, which saves time for emergency interventions and reduces potential business losses; a rapid response triggered by the second case wastes resources. The aims of this strategy are to process the data near the edge, at the data generation end, quickly identify the type of abnormal data, and apply the corresponding response. This avoids the oversensitivity caused by responding to raw data; meanwhile, it reduces both the waiting time for uploading redundant data to the cloud and the waiting time for instructions from the central server. Table 1 lists the proposed outlier type abstracts and specifications. With the hierarchical edge computing architecture illustrated in this paper, we distribute the three abstracts of outlier type detection to different tiers of EN. Based on the principle that the lower the EN's grade, the lower the computing complexity, EN-1 simply detects status changes and numerical data that exceed a constant-value threshold. EN-2 processes data exceeding a prediction-formula threshold, in coherence with historical data retrieved from the local micro-database. Computation involving the real-time standard deviation of massive data and related prediction is allocated to higher tiers of EN, or even to the Cloud. The detailed process is given as pseudo-code in Algorithm 1. * r indicates the comparison result in both cases: since a threshold may be either an upper or a lower boundary, the Boolean variable r denotes whether the data value exceeds the boundary.
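The tier assignment described above can be sketched as follows; Algorithm 1 itself is not reproduced, and the function names and the mean-based predictor in the usage note are assumptions.

```python
def en1_check(value, lower, upper):
    """Grade-1: constant-value threshold. The Boolean r records whether the
    value exceeds either boundary, since thresholds may be upper or lower."""
    r = value > upper or value < lower
    return r

def en2_check(value, history, predict, tolerance):
    """Grade-2: prediction-formula threshold, using historical data that
    would be retrieved from the local micro-database."""
    return abs(value - predict(history)) > tolerance

def classify(value, history, lower, upper, predict, tolerance):
    """Route detection to the lowest tier able to decide; increment/standard-
    deviation jitter analysis is escalated to higher tiers or the Cloud."""
    if en1_check(value, lower, upper):
        return ("EN-1", "exceed constant threshold")
    if en2_check(value, history, predict, tolerance):
        return ("EN-2", "exceed dynamic threshold")
    return ("EN-3", "no simple outlier; jitter analysis at higher tiers")
```

For example, with a mean predictor, a reading of 35 against bounds [20, 30] is decided at EN-1, while 29 with history [26, 26, 26] and tolerance 2 is only caught by the EN-2 prediction check.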

Implementation
In this section, the proposed method is evaluated using a software-defined network. The platform, hardware settings, and response accuracy are reported.

Implementation Platform and Hardware
Considering that the massive sensor nodes of IoT applications cannot be completely simulated in a laboratory scenario, we selected an open-source flow-based tool and platform called Node-RED [21] for implementing the architecture proposed in this paper. Node-RED is used as a platform to integrate components at multiple layers in the network. It provides a convenient connection with the web interface to visualize the data flow and configure the network. In our simulation, ENs communicate by subscribing to a selected topic on a broker, and the messages/data arriving on the topic can be observed on the web interface. Node-RED is run on both the Raspberry Pi (RPi) machines and laptops as the initial setup of the network. With Node-RED running on the equipment, we can manage and configure the network and component connections using a browser. The platform and hardware were set up as listed in Table 2. Besides the physical sensors, Python scripts were used to simulate sensing data for scenarios that currently cannot be implemented in the laboratory.

Table 2. Simulation platform and hardware.

Networking

As shown in Table 2, Raspberry Pi and PC machines were used as edge nodes, and the MQTT broker as well as the CoAP MA were deployed on the edge. Users can access the web-based client to retrieve raw data and preprocessed data. The network for the whole system was simulated with the Node-RED simulator. The cloud layer is planned to use the IBM open-source cloud, as some of its IoT application components are integrated and free to use. The block diagram in Figure 3 shows the architecture of the simulation environment, which contains the components listed in Table 2. Figure 4 illustrates an example Node-RED flow that implements the functions of EN-1.

Case Study Model
The experiment scenario is a cognitive warehouse that can recognize and rapidly respond to unexpected cases as listed in Table 1. Temperature was selected as the sensing data type for the case study. The sampling rate of the sensor data was configured as once per ten seconds, once per thirty seconds, and once per minute for three groups of repeated tests. The network scale was configured as 1000 nodes, 5000 nodes and 10,000 nodes, respectively.
When the network scale was set to 1000 nodes, the nodes were grouped into 50 LANs which kept generating sensed data. Every 20 sensor nodes in the same group were considered neighbors and connected to an EN in a star topology. Besides the normal data (initialized as 26 ± 1 centigrade), three types of outlier data were inserted into the data set randomly: over the boundary (constant and predicted), jitter, and error. Figure 5 illustrates four example cases from our experiments with the sampling rate fixed at one sample per ten seconds and the network scale fixed at 1000 nodes. Figure 5a-c judge outliers based on the constant threshold. The difference between (a) and (b) was the duration for which outliers were detected. Both of these cases would be reported as "exceeding threshold", while the case in (b) needed to be further processed by EN-2. The case shown in Figure 5c indicates that jitter occurred at the sensor node. The single node's value did not exceed the preset boundaries, but it broke the stability of the sensed data, which is considered one of the general outlier types. The case shown in Figure 5d denotes the threshold based on the prediction formula; data deviating from the expected scope is considered an outlier.
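A data stream of the kind injected above can be sketched as below. The injection positions, deviation magnitudes, and the error sentinel value are assumptions for illustration, not the exact values used in the experiments.

```python
import random

def generate_stream(n, outlier=None, seed=0):
    """Normal readings around 26 ± 1 centigrade, optionally with one of the
    outlier types injected into the second half of the stream."""
    rng = random.Random(seed)
    data = [26.0 + rng.uniform(-1.0, 1.0) for _ in range(n)]
    if outlier == "exceed":          # over the boundary
        for i in range(n // 2, n):
            data[i] += 10.0
    elif outlier == "jitter":        # sharp alternating deviations
        for i in range(n // 2, n):
            data[i] += rng.choice([-4.0, 4.0])
    elif outlier == "error":         # implausible reading from a broken node
        data[n // 2] = -999.0
    return data
```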
Such data flows were tested under three different network scales with two different sampling rates by the system. The outcomes and related analysis are presented in the next section.

Outcomes and Analysis
To normalize the analysis, we only tested and compared the numerical sensor data during the experiment. Instead of practical environmental data, our dataset was designed for testing the proposed architecture and algorithm performance in the simulation. During the test with a 1000-node network scale and a sampling rate of one sample per ten seconds, 50 groups of data were generated and tested, corresponding to the WSNs in 50 LANs. The pack of sensor nodes for each corresponding edge node was configured as 20, a fixed parameter throughout the whole experiment. In other words, every 20 sensor nodes in the same group were considered neighbors that shared data at their corresponding edge node. Four types of numerical outliers were inserted into a different dataset for each group: 'error', 'exceed threshold', 'jitter', and 'dynamic threshold'. The system average response time and accuracy were used to reflect the system performance. The system response time was calculated as (T_response − T_detect)/f_sampling, where T_response denotes the timestamp at which the system responded to the outlier, T_detect denotes the timestamp at which the outlier was detected, and f_sampling denotes the sampling rate of the sensed data. The accuracy was calculated by dividing the number of outliers that were detected correctly by the number of original datasets on each end node.
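The two metrics can be made concrete with a small worked example. Here f_sampling is read as the length of one sampling interval in seconds, so that the response time comes out in sampling units; this reading is an assumption, since the text names it a rate.

```python
def response_time_units(t_response, t_detect, sampling_interval):
    """Response time in sampling units: (T_response - T_detect) / f_sampling."""
    return (t_response - t_detect) / sampling_interval

def accuracy(correct_outliers, total_datasets):
    """Correctly detected outliers divided by the number of original datasets."""
    return correct_outliers / total_datasets
```

For instance, with detection at t = 20 s, a response at t = 120 s, and a ten-second interval, the response time is 10 sampling units; 45 correct detections over 50 datasets gives 90% accuracy.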

By analyzing the statistical outcomes shown in Figures 6 and 7, it can be observed that cases related to errors and exceeding the threshold were detected by the system 100% of the time, although the system average response times differed noticeably. The results were not affected by the network scale or the sampling rate. As introduced in Sections 2.3 and 2.4, any data processing that required the retrieval of historical data from the local database had to be handled on EN-2; the extra data retrieval time and increment calculation therefore led to a relatively long average response time for the exceed-threshold cases.
The cases relating to jitter had above eighty percent accuracy with an average response time of 9.69 sampling units, which indicates that the edge system could detect jitter-type data exceptions within 10 sampling units. Eighty percent of the tested outliers defined by the dynamic threshold could be detected by the edge system within 17 sampling units. The sampling interval did not impact the system average response time or the response accuracy, owing to the unity between the sampling rate preset in the algorithm and the sampling interval configured in the simulation environment.

Discussion
This paper has proposed a rapid response system architecture designed for data outlier detection, which applies the concept of hierarchical edge computing and brings the key advantage of edge computing, namely low latency, to the edge of the network. For the case study of distributed warehouse management in logistics, an algorithm for distinguishing and rapidly responding to emergency cases was proposed. Three types of outlier abstracts were emphasized in this paper in order to promote the portability of the system. The performance of the system was evaluated by a software-defined simulation, with a focus on the accuracy and rapidity of the Grade-1 and Grade-2 edge nodes in the system. Applying the rapid response algorithm showed that above eighty percent of the abnormal cases could be detected and responded to in less than twenty sampling units.
A handful of the jitter cases that were not detected by the system were mainly caused by a 'break' during the jitter period: the system judged that the case did not satisfy the predefined jitter condition. One potential improvement is to import artificial intelligence models onto Grade-3 edge nodes, which could improve response accuracy by enabling decisions for the lower layers to be made at the edge of the cloud. The same situation applies to the cases related to the dynamic threshold: when the deviation of the outlier data is not significant, the system may skip the abnormal case. Thus, to implement the entire system architecture as proposed in this paper, a clear direction for future research is the implementation of Grade-3 edge nodes, potentially focused on short-time prediction. Besides the edge computing layer, the interaction and interoperation between the edge and the Cloud are also valuable directions for extending our research.
Furthermore, practical issues in the WSN communication environment, such as message collisions, dead nodes, and hotspots, are valuable but have not been considered at the current research stage. We regard them as potential research directions and will conduct further investigation and experiments to enhance our research outcomes in the future.