Optimization-Based Approaches for Minimizing Deployment Costs for Wireless Sensor Networks with Bounded Estimation Errors

Hsiao, Chiu-Han; Lin, Frank Yeong-Sung; Yang, Hao-Jyun; Huang, Yennun; Chen, Yu-Fang; Tu, Ching-Wen; Zhang, Si-Yao

doi:10.3390/s21217121


by Chiu-Han Hsiao ¹,*,†,‡, Frank Yeong-Sung Lin ²,‡, Hao-Jyun Yang ¹, Yennun Huang ¹, Yu-Fang Chen ², Ching-Wen Tu ¹ and Si-Yao Zhang ²

¹ Research Center for Information Technology Innovation, Academia Sinica, Taipei 115, Taiwan

² Department of Information Management, National Taiwan University, Taipei 10617, Taiwan

* Author to whom correspondence should be addressed.

† Current address: 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan.

‡ These authors contributed equally to this work.

Sensors 2021, 21(21), 7121; https://doi.org/10.3390/s21217121

Submission received: 15 September 2021 / Revised: 21 October 2021 / Accepted: 21 October 2021 / Published: 27 October 2021

(This article belongs to the Special Issue Performance, Reliability and Scalability of IoT Systems)


Abstract:

As wireless sensor networks (WSNs) have become more prevalent, data from sensors in daily life are constantly being recorded. Because of cost and energy consumption considerations, optimization-based approaches are proposed to reduce the number of deployed sensors while keeping results within the error tolerance. A correlation-aware method is also formulated as a mathematical model that combines theoretical and practical perspectives. Sensor deployment strategies based on XGBoost, Pearson correlation, and Lagrangian Relaxation (LR) are designed to minimize deployment costs while keeping estimation errors below a given threshold. The results show that these strategies ensure the accuracy of the gathered information while minimizing the deployment cost and maximizing the lifetime of the WSN. Furthermore, the proposed solution can be readily applied to sensor distribution problems in various fields.

Keywords:

Lagrangian Relaxation; network deployment; Pearson correlation; wireless sensor networks (WSNs); XGBoost

1. Introduction

Wireless sensor networks (WSNs) are used worldwide; approximately 500 billion devices in various industries are connected to the Internet. Applications include manufacturing, e-commerce, energy, surveillance, and environmental detection [1,2]. Sensors access information through the network to provide numerous innovative services. These services meet diverse networking requirements between people, between people and machines, and between machines, for purposes ranging from residential to social communication [3]. As a result, people and things are beginning to use the Internet to monitor air quality, temperature, landslides, and other data [4]. Placing a sensor at every point that requires data collection may cause sensor redundancy and a substantial waste of resources [5,6]. However, reducing the number of deployed sensors may lead to the acquisition of insufficient or incorrect information [7]. This trade-off is central to network planning and operation. For large-scale WSNs, system deployment designers must analyze the relevant trade-offs to develop protocols that extend the lifetime of WSNs by improving energy efficiency while collecting data accurately. In this paper, the proposed sensor deployment strategies ensure the accuracy of the gathered information while minimizing the cost of deployment and maximizing the lifetime of the WSN. Building on the previous studies reviewed in [8], a preliminary analysis was performed for several methods of balancing WSN energy consumption. The variable-range transmission power control method optimizes traffic distribution by deploying sensors, which are inexpensive compared with transmission costs. The mobile-data-sink and multiple-data-sink deployment methods both adjust the position of the data sink in the network. Nonuniform initial energy assignment and intelligent sensor or relay deployment are further methods for reducing power consumption in WSNs [3].

This paper proposes deploying WSNs that provide services with sufficient lifetime at reduced cost and overall energy consumption, with a particular focus on large-scale applications. By analyzing thermal sensor data from related experiments, we found that the temperature measurements of many sensors are correlated. The Australian Climate Observation Reference Network-Surface Air Temperature (ACORN-SAT) dataset includes consistent and uniform daily temperature records from 112 observation sites beginning in 1910. Figure 1a illustrates the contour of Australia, and Figure 1b presents all the sites; larger circles indicate sites with higher costs. Using additional sensors increases the total amount of data and the likelihood that different sensors collect similar data. Such redundant data reduce processing efficiency. Therefore, a mathematical model was formulated to estimate the temperature values at locations without installed sensor nodes. In this paper, thermal sensors (for temperature data) were selected because they are representative of many other applications, such as heating products, temperature measurements of pipes carrying flowing liquids (e.g., oil or chemical products), smart homes, refrigerators, and large-scale environmental monitoring. Compared with previous works, the proposed method, called error-bound satisfaction, ensures the quality of the results: performance metrics are used to verify that the average estimation error stays within the system's tolerance. By systematically guaranteeing the estimation accuracy, the deployed WSN achieves a lifetime sufficient to meet the requirements for controlling the sensors during operation. Sensing and installation costs are fixed at installation time. The primary contribution of this paper is to minimize deployment costs within given error bounds.
The resulting cost minimization methods for WSNs satisfy several performance metrics: the proposed deployment strategies minimize cost while keeping the average estimation error within the system's threshold. This work combines theoretical and practical considerations to minimize the deployment cost of temperature sensors, and the proposed strategies can be readily applied to sensor distribution problems in various fields.

Research Scope:

Minimizing the deployment cost of temperature sensors.
The data used in this research are from the Australian Climate Observation Reference Network-Surface Air Temperature (ACORN-SAT) dataset.
The data range from 2008 to 2019.
The approach is expected to significantly reduce the deployment cost while keeping the error below the threshold.

2. Related Work

In the literature, WSN quality of service is measured by considering coverage, connectivity, network lifetime, and network deployment costs [9]. Several mathematical models, algorithms, and heuristics have also been proposed for other problems, such as region-of-interest monitoring, intruder detection, and energy efficiency management in various applications. This section briefly identifies the shortcomings of these methods and reviews sensor node deployment strategies that depend on numerous system design factors in network planning and operation. Such strategies place sensors so as to minimize the number of sensors, reduce cost, and ensure the accuracy of the collected data.

2.1. Correlation-Aware Deployment Methods

The sensitivity of the estimation error depends on either static sensor deployment or dynamic adjustment strategies. Methods of identifying the most informative sensors were proposed in [10]. Roy et al. assumed that a set of data snapshots could characterize the monitored phenomenon and formulated a method for identifying the optimal sensor locations needed to reconstruct the data with the required accuracy. To handle both stationary and nonstationary fields, two optimization models were proposed, and an iterative solution algorithm was developed to obtain a sensor deployment strategy for both deployment problems. However, the input was assumed to be perfect, so errors may exist in data simulated under this assumption [11].

Additionally, when monitoring spatial phenomena such as temperature in indoor or preconfigured environments, we can assume that data can be collected in the predeployment phase. Krause et al. defined the quality of a given topology using mutual information and chose the best sensor locations by modeling the phenomenon as a Gaussian process [12]. After formalizing the problem, they derived a polynomial-time algorithm that exploits the submodularity of mutual information. This work was extended in [13] by considering not only the coverage area but also the connection cost, with the link qualities assumed to be Gaussian. Based on the results of temperature measurements, however, the Gaussian approach was found to be unsuitable [13]. A new temperature measurement and prediction method was designed using mathematical programming techniques.

A classic measurement problem is estimating a phenomenon from the data collected by a small set of deployed sensors in order to reduce costs. In [14], Ranieri et al. considered a general form of the related sensor placement problem, studied its mathematical characteristics, and proposed a greedy heuristic algorithm to solve it. Their simulations showed that the algorithm solves the problem in a short time and provides a near-optimal solution. In [15], a sensing topology was defined to select active sensors and deactivate the others. Liaskovitis et al. considered an already-deployed sensor network and proposed an algorithm to define this topology; to decide whether a sensor remains active, they estimated changes in the sensed phenomenon online. By contrast, in this paper, the sensing points are selected offline during the network planning stage.

Furthermore, the analysis of sensing data is essential. Machine learning and multiple linear regression models were compared for remote sensing data in [16]: Forkuor et al. applied multiple linear regression, random forest regression, support vector machine, and stochastic gradient boosting and compared their performance metrics. The results revealed that multiple linear regression had predictive ability; however, this method is limited by the relationship between the dependent and independent variables. As artificial intelligence and machine learning have become more prevalent, the amount of input sensor data has increased, yet some input data may not be useful for the model. In [17], Yan et al. used multiple linear regression to predict results with a small number of useful variables. Correlation analysis can identify the most correlated data to predict missing sensor data with low error [18].

In [19], Ma et al. derived a sensor deployment scheme that eliminates vacancy points that cannot be estimated, achieving low sensor density and guaranteeing bounded estimation error. Ma et al. compared various spatial patterns, including the equilateral triangle, square, and regular hexagon. They concluded that an equilateral triangle pattern was the best deployment strategy. In [20], Kim et al. proposed an efficient deployment scheme for a surveillance sensor network incorporating the event occurrence rate. The scheme aimed to minimize the number of sensors deployed in a large-scale WSN and satisfy the target probability of detection. Their proposed scheme reduced the total number of sensors by 10% to 40%. In [21], Han et al. established a deployment strategy for underwater acoustic sensors. Their strategy considered network distribution in a three-dimensional environment. They simulated numerous deployment schemes and demonstrated that a tetrahedral scheme had better overall performance in reducing error.

2.2. Sensor Deployment Applications

The applications of WSNs are versatile. Ramesh et al. proposed a system of pore pressure transducers, dielectric moisture sensors, and movement sensors to detect landslides in real time [22,23]. The system had numerous sensor columns distributed over an area of interest, and their work included a complete architecture of the physical sensor columns and the backend software service. Huang et al. proposed a fiber-optic sensing system capable of monitoring debris flows [24]. The system included a light source, a data logger, a four-port coupler, and four fiber Bragg grating accelerometers. The results revealed that the proposed fiber-optic system outperformed conventional sensing systems and had high reliability, making it promising for monitoring natural disasters. Marin-Perez et al. proposed a building automation system based on the PLUG-N-HARVEST architecture that uses the Internet of Things (IoT) to achieve reliable security and intelligent management [25]; the automated building had low energy consumption. Moreover, in [26], Wright proposed a system for decision-making and operations on a fully autonomous ship using machine learning and artificial intelligence with multiple sensor modalities.

2.3. Summary

Recent studies have considered constraints such as network connectivity and energy consumption. Existing coverage formulations either assume that each sensor has a given detection range (in event-detection methods) or assume a distribution of the sensor measurement values (in estimation methods); examples include [27] for wind monitoring and [28] for detecting server overheating in data centers. To design deployment methods that consider the characteristics of application instances, a new application-aware deployment method, shown in Table 1, is proposed; the thermal sensor deployment in this paper is similar in this respect. Previous studies have proposed methods of identifying the most informative sensors and implemented them in several fields, and several predictive models and their distributions have been discussed. In this research, the data contain only the temperature of each site on each day. Thus, one site can serve as the dependent variable and another site as the independent variable, and the correlations between them can be analyzed to discover correlated pairs. Because of cost and energy consumption considerations, the proposed solution reduces the number of deployed sensors while yielding results within the error tolerance. The correlation-aware method is also formulated as a mathematical model that combines theoretical and practical perspectives to determine sensor deployment strategies.
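The correlation analysis described above can be illustrated with a short sketch. For a target site, it computes the Pearson correlation coefficient with every other site, keeps the most correlated site, and fits a least-squares line that estimates the target from it. The function names and the toy series are hypothetical. The sketch also uses a single predictor site, whereas the PCC and linear regression model in this paper may select and combine sites differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant series


def most_correlated_site(series, target):
    """Return (site, r) for the site whose series has the largest |r| with the target."""
    best = None
    for site, values in series.items():
        if site == target:
            continue
        r = pearson(series[target], values)
        if best is None or abs(r) > abs(best[1]):
            best = (site, r)
    return best


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical daily temperatures (deg C) at three sites.
series = {
    "A": [20.0, 22.0, 25.0, 23.0, 21.0],
    "B": [19.0, 21.0, 24.0, 22.0, 20.0],
    "C": [30.0, 28.0, 27.0, 29.0, 31.0],
}
site, r = most_correlated_site(series, "A")
slope, intercept = fit_line(series[site], series["A"])  # estimate A from site B
```

In this toy data, site B tracks site A one degree lower, so B is selected and the fitted line is approximately A = B + 1.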

Summary of the Research Gap:

Focuses on temperature data, although the approach can be widely applied in other fields, such as humidity monitoring, air quality sensing, GPS surveillance, or landslide detection.
Uses optimization-based methods to achieve reliable results.
The methods used in this work are suitable for data from other fields.

3. System Architecture and Problem Formulation

3.1. System Structure

Characterization of the Deployment Region: IoT applications have recently expanded to almost every industry. To increase flexibility, we propose an optimal deployment strategy that minimizes cost while ensuring the accuracy of the collected data. Temperature data were used in the mathematical model, but the research can also be applied to other data such as humidity, air quality, or pressure [1,2,4]. We therefore attempt to generalize the mathematical model for sensor deployment so that it is suitable for different time zones and climates.

Node Deployment: The ACORN-SAT dataset includes consistent and uniform daily temperature records from 112 observation sites beginning in 1910, and Figure 1b presents all the sites. The website (http://www.bom.gov.au/climate/data/acorn-sat/, accessed on 20 October 2021) provides ACORN-SAT v.2 (1911–2019), the original record of each daily maximum and minimum temperature without any modifications for extreme or unusual weather records. Training data were taken from 2010–2018 and testing data from 2019. The dataset also contains the longitude and latitude of each site, so the data were separated into sections by longitude. These datasets were used to verify the suitability of the mathematical model in different time zones. We conduct three categories of experiments: (1) all 112 sites are used in one topology, (2) the sites are separated into two equal areas by longitude, and (3) the sites are separated into four equal areas by longitude.

Methodology Design: Figure 2 shows the proposed solution approach. Data are first collected from deployed sensors. However, some data may be lost during network processing, which causes missing values in the obtained data. For example, the 2019 temperature data contain six missing days for Tennant Creek, NT, and 18 missing days for Bridgetown, WA; overall, 1% of the data are missing. These missing values were filled by interpolation. The completed data were then input into the four proposed solution models: Pearson correlation coefficients (PCC) and linear regression, XGBoost, referencing capability ranking (RCR), and x coefficient ranking (xCR). Finally, the Lagrangian Relaxation (LR) model was used to determine the gap between each model's objective cost and the theoretical minimum.
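The paper does not state which interpolation scheme was used for the missing daily values. The sketch below assumes linear interpolation in time within each site's series, with missing days marked as `None`. The function name is hypothetical, and gaps at either end of a series are filled with the nearest observed value, which is also an assumption.

```python
import bisect


def fill_missing_linear(values):
    """Fill None entries by linear interpolation between the nearest observed days.

    Gaps at the start or end of the series take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of the first observed day after i
        if pos == 0:
            filled[i] = values[known[0]]
        elif pos == len(known):
            filled[i] = values[known[-1]]
        else:
            left, right = known[pos - 1], known[pos]
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


daily_max = [20.0, None, 24.0, None, None, 30.0]
print(fill_missing_linear(daily_max))  # [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
```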

3.2. Problem Formulation

In the proposed mathematical formulation, D is the set of evaluation data obtained before applying the model. The data in D serve as the ground truth regardless of whether a sensor is installed at each location (i.e., they contain all measured temperature values of all locations for all time zones). The main notations used in this study are presented in Table 2 and Table 3. The decision variables express the experimental outcome and indicate the sensor distribution and corresponding parameters.

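The objective function of cost minimization is expressed by Equation (1): the total deployment cost is the sum of the costs $C_{i}$ of the installed sensors. Indices $i$ and $j$ denote sensor node locations, and $x_{i}$ is a binary decision variable indicating whether a sensor is installed at location $i$. In Equation (6), the value 0 is represented by a small positive constant $ε$ so that the logarithms introduced in Section 4.1.1 remain defined.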

The mathematical model is based on heat transfer between points by convection, conduction, and radiation. Consequently, the temperature values at different locations are related to one another. By determining how the temperatures of the locations relate at each time, we can estimate the temperatures at locations without sensors from accurate measurements at surrounding locations. We denote the estimated temperature at location $i$ for data record $k$ as $m_{ki}$. Each solution model determines $m_{ki}$ differently, as discussed in Section 4, but all models use the same error measure, Equation (3). The overall error threshold of the system is expressed by Equation (4).

Solution models were primarily evaluated by minimizing $e_{ki}$ in Equation (3). The LR method was used to iteratively reach a primal feasible outcome with the lowest deployment cost. The primal problem, denoted (IP), minimizes the objective function (1) subject to constraints (2)–(7).

\begin{matrix} \min \sum_{i \in V} C_{i} x_{i} \end{matrix}

(1)

s.t.

\begin{matrix} m_{k i} = \sum_{j \in V, j \neq i} p_{j i} H_{k j} x_{j} & \forall k \in D, \forall i \in V \end{matrix}

(2)

\begin{matrix} e_{k i} = {(m_{k i} - H_{k i})}^{2} & \forall k \in D, \forall i \in V \end{matrix}

(3)

\begin{matrix} T = \sum_{k \in D} \sum_{i \in V} \frac{W_{i} e_{k i}}{| D | | V |} \leq Ψ \end{matrix}

(4)

\begin{matrix} ε \leq p_{j i} \leq Π_{j i} & \forall i, j \in V \end{matrix}

(5)

\begin{matrix} x_{i} \in \{ε, 1\} & \forall i \in V \end{matrix}

(6)

\begin{matrix} 0 \leq e_{k i} \leq Z_{k i} & \forall k \in D, \forall i \in V . \end{matrix}

(7)
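To make Equations (1)–(4) concrete, the sketch below evaluates a candidate deployment. It computes the cost (1), the estimates (2), the squared errors (3), and the weighted average error $T$ (4), then checks $T \leq Ψ$. This is a hypothetical helper, not the authors' code. It uses 0 rather than $ε$ for an uninstalled sensor and, following Equation (2) literally, also estimates locations that have sensors. The numbers are a toy example.

```python
def evaluate_deployment(C, W, H, p, x, psi):
    """Cost (Eq. 1) and weighted mean squared estimation error (Eqs. 2-4).

    C[i]: sensor cost at location i; W[i]: error weight of location i;
    H[k][i]: ground-truth value of record k at location i;
    p[j][i]: coefficient of location j when estimating location i;
    x[i]: 1 if a sensor is installed at i, else 0.
    """
    n_rec, n_loc = len(H), len(C)
    cost = sum(C[i] * x[i] for i in range(n_loc))                  # Eq. (1)
    total = 0.0
    for k in range(n_rec):
        for i in range(n_loc):
            m_ki = sum(p[j][i] * H[k][j] * x[j]
                       for j in range(n_loc) if j != i)              # Eq. (2)
            e_ki = (m_ki - H[k][i]) ** 2                             # Eq. (3)
            total += W[i] * e_ki
    T = total / (n_rec * n_loc)                                      # Eq. (4)
    return cost, T, T <= psi


C = [5.0, 3.0, 4.0]
W = [1.0, 1.0, 1.0]
H = [[20.0, 21.0, 22.0],
     [24.0, 25.0, 26.0]]
p = [[0.0, 1.0, 0.5],   # row j, column i
     [1.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
x = [1, 1, 0]           # sensors at locations 0 and 1 only
cost, T, feasible = evaluate_deployment(C, W, H, p, x, psi=1.5)
print(cost, round(T, 4), feasible)  # 8.0 1.4167 True
```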

4. Solution Approach

4.1. Lagrangian Relaxation-Based Method

The Lagrangian relaxation-based solution approach was widely studied in the 1970s. An LR problem is established by removing the complicating constraints from the primal problem and appending them to the objective function with weights. The weights are Lagrangian multipliers and act as penalties when the constraints are violated [29]. In this paper, the objective is to obtain a solution to the primal problem, and the algorithm follows the procedure of the Lagrangian relaxation-based approach shown in Figure 3. Based on the mathematical formulation, the LR problem can be decomposed into five independent subproblems, each of which is solved optimally, so the LR problem is solved by divide and conquer. For a minimization problem, the optimal value of the LR problem is a lower bound on the optimal primal value [30]. The lower bound is improved by adjusting the multipliers through the dual problem. After a solution of the LR problem is obtained in each iteration, its feasibility must be checked or adjusted by the proposed heuristics, such as RCR, xCR, or XGBoost (parallel model selections), to obtain a primal feasible solution. A solution is feasible if it satisfies all constraints of the primal problem, and every feasible solution found by this check is recorded. Finally, the gap between the lower bound and the best feasible solution is calculated. These calculations are repeated iteratively until the termination conditions are satisfied.

4.1.1. Step 1: Reformulation for Relaxation

Complicating constraints are relaxed to enlarge the feasible solution region and simplify the primal problem, which is then transformed into an LR problem associated with Lagrangian multipliers. To decompose the product $p_{ji} x_{j}$ of two decision variables, we introduce an auxiliary variable $s_{ji}$ and reformulate Equation (2) by replacing $p_{ji} x_{j}$ with $s_{ji}$, which yields constraints (8)–(10). Taking logarithms in (8) turns the product into a sum. This requires $s_{ji}$, $p_{ji}$, and $x_{j}$ to be strictly positive, which is why their lower bounds are $ε$ rather than 0.

\begin{matrix} log s_{j i} = log p_{j i} + log x_{j} & \forall i, j \in V, j \neq i \end{matrix}

(8)

\begin{matrix} m_{k i} = \sum_{j \in V, j \neq i} H_{k j} s_{j i} & \forall k \in D, \forall i \in V \end{matrix}

(9)

\begin{matrix} ε \leq s_{j i} & \forall i, j \in V, j \neq i \end{matrix}

(10)

4.1.2. Steps 2 and 3: Decomposition and Solution of Subproblems

We then relax the reformulated mathematical model. Constraints (3), (4), (8), and (9) are relaxed by introducing the Lagrangian multipliers $μ_{ki}^{1}$, $μ^{2}$, $μ_{ki}^{3}$, and $μ_{ki}^{4}$, respectively. Consequently, the original problem is transformed into the LR problem with objective function (11) subject to constraints (5), (6), (7), and (10).

\begin{matrix} \min Z_{L R} = \sum_{i \in V} C_{i} x_{i} \\ + \sum_{k \in D} \sum_{i \in V} μ_{k i}^{1} [e_{k i} - {(m_{k i} - H_{k i})}^{2}] \\ + μ^{2} [\sum_{k \in D} \sum_{i \in V} (\frac{W_{i} e_{k i}}{| V |}) - Ψ] \\ + \sum_{k \in D} \sum_{i, j \in V, j \neq i} μ_{k i}^{3} [log p_{j i} + log x_{j} - log s_{j i}] \\ + \sum_{k \in D} \sum_{i \in V} μ_{k i}^{4} [\sum_{j \in V, j \neq i} H_{k j} s_{j i} - m_{k i}] \\ s . t . (5), (6), (7), and (10) . \end{matrix}

(11)

The LR problem is then decomposed into five subproblems, each of which is solved individually to reach the minimum of $Z_{LR}$. The algorithms for the five subproblems, which determine $s_{ji}$, $x_{i}$, $p_{ji}$, $e_{ki}$, and $m_{ki}$, respectively, are presented as follows.

Subproblem 1 (related to $s_{ji}$)

\begin{matrix} \min \sum_{k \in D} \sum_{i, j \in V, i \neq j} (- μ_{k i}^{3} log s_{j i} + μ_{k i}^{4} H_{k j} s_{j i}) \\ s . t . ε \leq s_{j i} \leq Π_{j i} \forall i, j \in V, i \neq j . \end{matrix}

(12)

Subproblem 1 is a minimization problem over a continuous variable with a logarithmic term, so its minimum lies either where the derivative with respect to $s_{ji}$ equals zero or at a bound of $s_{ji}$. Algorithm 1 shows the pseudo-code of Subproblem 1. First, we take the partial derivative of the objective with respect to $s_{ji}$, $-μ_{ki}^{3}/s_{ji} + μ_{ki}^{4} H_{kj}$, and set it to zero, which gives the stationary point $s_{ji} = \frac{μ_{ki}^{3}}{μ_{ki}^{4} H_{kj}}$. Second, the validity of this extremum is checked: $s_{ji}$ is set to $ε$, $\frac{μ_{ki}^{3}}{μ_{ki}^{4} H_{kj}}$, or $Π_{ji}$, whichever gives the minimum objective value of (Sub 1).

Algorithm 1 Subproblem 1

Input: Given parameters $H$ and Lagrangian multipliers $μ^{3}$ , $μ^{4}$ .
Output: Decision variable $s$ .
Initialize: $s_{j i} \leftarrow 0$ , $\forall i, j \in V, i \neq j$
for $k = 0$ to $(| D | - 1)$ do
for $i = 0$ to $(| V | - 1)$ do
for $j = 0$ to $(| V | - 1)$ do
if $i = j$ then
continue
end if
if $ε \leq \frac{μ_{k i}^{3}}{μ_{k i}^{4} H_{k j}} \leq Π_{j i}$ then
$s_{j i} \leftarrow ε$ , $\frac{μ_{k i}^{3}}{μ_{k i}^{4} H_{k j}}$ , or $Π_{j i}$ such that $- μ_{k i}^{3} log s_{j i} + μ_{k i}^{4} H_{k j} s_{j i}$ is minimized.
else
$s_{j i} \leftarrow ε$ or $Π_{j i}$ such that $- μ_{k i}^{3} log s_{j i} + μ_{k i}^{4} H_{k j} s_{j i}$ is minimized.
end if
end for
end for
end for
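A runnable Python sketch of Subproblem 1 follows. Because $s_{ji}$ does not depend on $k$, the sketch sums the $k$-terms of (12) into one coefficient pair per $(j, i)$ before minimizing, rather than updating $s_{ji}$ once per $k$; this aggregation follows directly from (12). The function name, the list layout `mu3[k][i]`, and the default `eps` are illustrative assumptions.

```python
import math


def solve_subproblem1(H, mu3, mu4, Pi, eps=1e-6):
    """Minimize Sub 1 (Eq. 12) separately for every pair (j, i), j != i.

    Since s[j][i] does not depend on k, the k-terms collapse into
    f(s) = -a*log(s) + b*s with a = sum_k mu3[k][i] and b = sum_k mu4[k][i]*H[k][j].
    The minimum over [eps, Pi[j][i]] lies at a bound or at the stationary point a/b.
    """
    n_rec, n_loc = len(H), len(H[0])
    s = [[0.0] * n_loc for _ in range(n_loc)]
    for i in range(n_loc):
        a = sum(mu3[k][i] for k in range(n_rec))
        for j in range(n_loc):
            if j == i:
                continue
            b = sum(mu4[k][i] * H[k][j] for k in range(n_rec))
            candidates = [eps, Pi[j][i]]
            if b != 0 and eps <= a / b <= Pi[j][i]:
                candidates.append(a / b)  # zero of f'(s) = -a/s + b
            s[j][i] = min(candidates, key=lambda v: -a * math.log(v) + b * v)
    return s
```

Comparing all candidates, instead of assuming convexity, keeps the sketch correct for multipliers of either sign.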

Subproblem 2 (related to $x_{i}$)

\begin{matrix} \min \sum_{i \in V} C_{i} x_{i} + \sum_{k \in D} \sum_{i, j \in V, j \neq i} μ_{k i}^{3} log x_{j} \\ s . t . x_{i} \in \{ε, 1\} \forall i \in V . \end{matrix}

(13)

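Subproblem 2 can be further decomposed into $|V|$ independent minimization problems. For each location $i$, the decision variable $x_{i}$ is evaluated at its two possible values, $ε$ and 1, and the value that minimizes $C_{i} x_{i} + μ_{sum} log x_{i}$ is chosen. Here, $μ_{sum} = \sum_{k \in D} \sum_{j \in V, j \neq i} μ_{kj}^{3}$ collects the multipliers of all terms in (13) that contain $log x_{i}$. The pseudo-code is shown in Algorithm 2.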

Algorithm 2 Subproblem 2

Input: Given parameters $C$ and Lagrangian multiplier $μ^{3}$ .
Output: Decision variable $x$ .
Initialize: $x_{i} \leftarrow 0$ , $\forall i \in V$
for $i = 0$ to $(| V | - 1)$ do
$μ_{sum} \leftarrow 0$
for $k = 0$ to $(| D | - 1)$ do
for $j = 0$ to $(| V | - 1)$ do
if $j = i$ then
continue
else
$μ_{sum} \leftarrow μ_{sum}$ + $μ_{k j}^{3}$
end if
end for
end for
$x_{i} \leftarrow ε$ or 1 such that $C_{i} x_{i} + μ_{sum} log x_{i}$ has minimum.
end for
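Algorithm 2 translates directly into Python. The sketch below follows the same list layout `mu3[k][j]`; the function name and the default `eps` are hypothetical.

```python
import math


def solve_subproblem2(C, mu3, eps=1e-6):
    """Minimize Sub 2 (Eq. 13): each x[i] is eps or 1, whichever minimizes
    C[i]*x + mu_sum*log(x), with mu_sum = sum_k sum_{j != i} mu3[k][j]."""
    n_rec, n_loc = len(mu3), len(C)
    x = [0.0] * n_loc
    for i in range(n_loc):
        mu_sum = sum(mu3[k][j] for k in range(n_rec)
                     for j in range(n_loc) if j != i)
        x[i] = min((eps, 1.0), key=lambda v: C[i] * v + mu_sum * math.log(v))
    return x


print(solve_subproblem2([5.0, 1.0], [[-1.0, 2.0]]))  # [1e-06, 1.0]
```

A large positive $μ_{sum}$ makes $log x_{i}$ costly to keep near 0, so it pushes $x_{i}$ to $ε$. A negative $μ_{sum}$ favors installing the sensor ($x_{i} = 1$).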

Subproblem 3 (related to $p_{ji}$)

\begin{matrix} \min \sum_{k \in D} \sum_{i, j \in V, i \neq j} μ_{k i}^{3} log p_{j i} \\ s . t . ε \leq p_{j i} \leq Π_{j i} \forall i, j \in V, i \neq j . \end{matrix}

(14)

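Subproblem 3 is also a minimization problem over a continuous variable with a logarithmic term. Because the logarithm is monotonically increasing, the objective $\left(\sum_{k \in D} μ_{ki}^{3}\right) log p_{ji}$ for each pair $(j, i)$ is monotonic in $p_{ji}$. Its minimum therefore lies at one of the bounds of $p_{ji}$: $ε$ if the coefficient is positive and $Π_{ji}$ if it is negative. Algorithm 3 simply checks whether $ε$ or $Π_{ji}$ gives the smaller value. Algorithm 3 shows the pseudo-code of Subproblem 3.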

Algorithm 3 Subproblem 3

Input: Given Lagrangian multiplier $μ^{3}$ .
Output: Decision variable $p$ .
Initialize: $p_{j i} \leftarrow 0$ , $\forall i, j \in V, i \neq j$
for $k = 0$ to $(| D | - 1)$ do
for $i = 0$ to $(| V | - 1)$ do
for $j = 0$ to $(| V | - 1)$ do
if $j = i$ then
continue
else
$p_{j i} \leftarrow ε$ or $Π_{j i}$ such that $μ_{k i}^{3} log p_{j i}$ has minimum.
end if
end for
end for
end for
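Because (14) is monotonic in each $p_{ji}$, a Python version of Algorithm 3 needs no logarithm evaluations, only the sign of the summed multiplier. The function name and argument layout are hypothetical. When the coefficient is exactly zero, every value is optimal; the sketch then picks $Π_{ji}$ by assumption.

```python
def solve_subproblem3(mu3, Pi, eps=1e-6):
    """Minimize Sub 3 (Eq. 14): for each pair (j, i), (sum_k mu3[k][i]) * log(p[j][i])
    over [eps, Pi[j][i]]. Since log is increasing, the optimum is at a bound."""
    n_rec, n_loc = len(mu3), len(mu3[0])
    p = [[0.0] * n_loc for _ in range(n_loc)]
    for i in range(n_loc):
        coef = sum(mu3[k][i] for k in range(n_rec))
        for j in range(n_loc):
            if j == i:
                continue
            # positive coefficient -> smallest p; negative (or zero) -> largest p
            p[j][i] = eps if coef > 0 else Pi[j][i]
    return p
```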

Subproblem 4 (related to $e_{ki}$)

\begin{matrix} \min \sum_{k \in D} \sum_{i \in V} (μ_{k i}^{1} + μ^{2} \frac{W_{i}}{| V |}) e_{k i} \\ s . t . 0 \leq e_{k i} \leq {(\max (H_{k i}) - \min (H_{k i}))}^{2} \forall k \in D, \forall i \in V . \end{matrix}

(15)

Subproblem 4 determines the $e_{ki}$ that minimizes $(μ_{ki}^{1} + μ^{2} \frac{W_{i}}{|V|}) e_{ki}$. Because this objective is linear in $e_{ki}$ with a given coefficient, $e_{ki}$ is set to its lower bound 0 if the coefficient $μ_{ki}^{1} + μ^{2} \frac{W_{i}}{|V|}$ is nonnegative and to its upper bound if the coefficient is negative. Algorithm 4 shows the pseudo-code of Subproblem 4.

Algorithm 4 Subproblem 4

Input: Given parameters $W$ and Lagrangian multipliers $μ^{1}$ , $μ^{2}$ .
Output: Decision variable $e$ .
Initialize: $e_{k i} \leftarrow 0$ , $\forall k \in D, \forall i \in V$
for $k = 0$ to $(| D | - 1)$ do
for $i = 0$ to $(| V | - 1)$ do
if $μ_{k i}^{1} + \frac{μ^{2} W_{i}}{| V |} \geq 0$ then
$e_{k i} \leftarrow 0$
else
$e_{k i} \leftarrow {(\max (H_{k i}) - \min (H_{k i}))}^{2}$
end if
end for
end for
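A Python sketch of Algorithm 4 follows. The upper bound is passed in as `upper[k][i]`; following Algorithm 4, it would be $(\max(H_{ki}) - \min(H_{ki}))^{2}$, and the sketch leaves the range over which the max and min are taken to the caller. The function name is hypothetical.

```python
def solve_subproblem4(W, mu1, mu2, upper):
    """Minimize Sub 4 (Eq. 15): each e[k][i] multiplies a fixed coefficient, so it is
    0 when the coefficient is >= 0 and upper[k][i] otherwise."""
    n_rec, n_loc = len(mu1), len(W)
    e = [[0.0] * n_loc for _ in range(n_rec)]
    for k in range(n_rec):
        for i in range(n_loc):
            coef = mu1[k][i] + mu2 * W[i] / n_loc
            e[k][i] = 0.0 if coef >= 0 else upper[k][i]
    return e


print(solve_subproblem4([1.0, 1.0], [[0.5, -2.0]], 1.0, [[9.0, 9.0]]))  # [[0.0, 9.0]]
```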

Subproblem 5 (related to $m_{ki}$)

\begin{matrix} \min \sum_{k \in D} \sum_{i \in V} [- μ_{k i}^{1} m_{k i}^{2} + (2 μ_{k i}^{1} H_{k i} - μ_{k i}^{4}) m_{k i}] \\ s . t . \min (H_{k i}) \leq m_{k i} \leq \max (H_{k i}) \forall k \in D, \forall i \in V . \end{matrix}

(16)

Subproblem 5 is quadratic in $m_{ki}$, so its stationary point is found by setting the derivative $-2 μ_{ki}^{1} m_{ki} + 2 μ_{ki}^{1} H_{ki} - μ_{ki}^{4}$ to zero, which gives $m_{ki} = \frac{2 H_{ki} μ_{ki}^{1} - μ_{ki}^{4}}{2 μ_{ki}^{1}}$. The variable $m_{ki}$ is bounded by $\min(H_{ki})$ and $\max(H_{ki})$. Algorithm 5 first checks whether the stationary point lies within these bounds. If so, $m_{ki}$ is set to whichever of $\min(H_{ki})$, $\max(H_{ki})$, and the stationary point gives the smallest objective value. Otherwise, the minimum lies at one of the bounds, so $m_{ki}$ is set to $\min(H_{ki})$ or $\max(H_{ki})$. Algorithm 5 shows the pseudo-code of Subproblem 5.

Algorithm 5 Subproblem 5

Input: Given parameters $H$ and Lagrangian multipliers $μ^{1}$ , $μ^{4}$ .
Output: Decision variable $m$ .
Initialize: $m_{k i} \leftarrow 0$ , $\forall k \in D, \forall i \in V$
for $k = 0$ to $(| D | - 1)$ do
for $i = 0$ to $(| V | - 1)$ do
if min $(H_{k i}) \leq \frac{2 H_{k i} μ_{k i}^{1} - μ_{k i}^{4}}{2 μ_{k i}^{1}} \leq$ max $(H_{k i})$ then
$m_{k i} \leftarrow \min (H_{k i})$ , $\frac{2 H_{k i} μ_{k i}^{1} - μ_{k i}^{4}}{2 μ_{k i}^{1}}$ , or $\max (H_{k i})$ such that $μ_{k i}^{1} (- m_{k i}^{2} + 2 m_{k i} H_{k i}) - μ_{k i}^{4} m_{k i}$ has minimum.
else
$m_{k i} \leftarrow \min (H_{k i})$ or $\max (H_{k i})$ such that $μ_{k i}^{1} (- m_{k i}^{2} + 2 m_{k i} H_{k i}) - μ_{k i}^{4} m_{k i}$ has minimum.
end if
end for
end for
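The sketch below is a Python version of Algorithm 5. The bounds $\min(H_{ki})$ and $\max(H_{ki})$ are passed in as `lo[k][i]` and `hi[k][i]`, because the section does not specify the range over which they are taken. It also guards the case $μ_{ki}^{1} = 0$, where the stationary-point formula divides by zero and the objective is linear. The function name is hypothetical.

```python
def solve_subproblem5(H, mu1, mu4, lo, hi):
    """Minimize Sub 5 (Eq. 16) for each (k, i) over [lo[k][i], hi[k][i]]:
    f(m) = mu1*(-m**2 + 2*m*H) - mu4*m. The minimum is at a bound or at the
    stationary point (2*H*mu1 - mu4) / (2*mu1) when that point lies inside."""
    n_rec, n_loc = len(H), len(H[0])
    m = [[0.0] * n_loc for _ in range(n_rec)]
    for k in range(n_rec):
        for i in range(n_loc):
            a, b, h = mu1[k][i], mu4[k][i], H[k][i]
            candidates = [lo[k][i], hi[k][i]]
            if a != 0:  # with a == 0, f is linear and a bound is optimal
                stationary = (2 * h * a - b) / (2 * a)
                if lo[k][i] <= stationary <= hi[k][i]:
                    candidates.append(stationary)
            m[k][i] = min(candidates, key=lambda v: a * (-v * v + 2 * v * h) - b * v)
    return m
```

For $μ_{ki}^{1} < 0$, the objective is convex and the interior stationary point wins. For $μ_{ki}^{1} > 0$, it is concave and one of the bounds wins, which is why every candidate is compared.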

4.1.3. Steps 4 and 5: Dual Problem and Subgradient Method

The LR problem is solved optimally when all subproblems are solved optimally by the divide-and-conquer approach. The optimal value of the LR problem, $Z_{LR}$, is a lower bound (LB) on $Z_{IP}$. Hence, to obtain the tightest LB, we adjust the Lagrangian multipliers to maximize $Z_{LR}$ by solving the dual problem shown in (17).

\begin{matrix} \max Z_{D} = Z_{L R} (μ_{k i}^{1}, μ^{2}, μ_{k i}^{3}, μ_{k i}^{4}) \\ s . t . μ_{k i}^{1}, μ_{k i}^{3}, μ_{k i}^{4} \in R, μ^{2} \geq 0 \forall k \in D, \forall i \in V . \end{matrix}

(17)

The subgradient method proposed by Held and Karp [31,32] is commonly used to solve the dual problem because it is simple to implement. Let the vector $g_{n}$ be a subgradient of the dual problem at iteration $n$. Each component of $g_{n}$ is the value of the corresponding relaxed constraint at the current LR solution, e.g., $e_{ki} - (m_{ki} - H_{ki})^{2}$ for $μ_{ki}^{1}$. The multiplier vector is updated at each iteration by Equation (18).

μ_{n + 1} = μ_{n} + t_{n} g_{n}

(18)

t_{n} = λ_{n} \frac{Z_{I P} - Z_{D} (μ_{n})}{\| g_{n} \|^{2}}

(19)

The step size $t_{n}$ is defined in Equation (19). Because the optimal primal value is unknown, $Z_{IP}$ in (19) is typically taken as the objective value of the best primal feasible solution found so far. Following Held et al. [33], $λ_{n}$ is a scalar that is usually initialized to 2 and halved whenever $Z_{D}(μ_{n})$ fails to increase within a given number of iterations. The procedures of the LR method and the subgradient method are presented in Figure 3; repeating them iteratively tightens the lower bound obtained from the dual problem.
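The multiplier update in Equations (18) and (19) and the halving rule for $λ_{n}$ can be sketched as follows, with the subgradient passed as a flat list `g`. The function and class names and the `patience` parameter are hypothetical. Projecting $μ^{2}$ back onto $μ^{2} \geq 0$ after each step is a standard safeguard for multipliers of inequality constraints and is assumed here, since the text does not describe it.

```python
def subgradient_step(mu, g, z_ip, z_d, lam, nonneg=()):
    """One update mu_{n+1} = mu_n + t_n * g_n with t_n from Eq. (19).

    mu, g: flat lists of multipliers and subgradient components;
    nonneg: indices of multipliers that must stay >= 0 (e.g., mu^2).
    """
    norm2 = sum(v * v for v in g)
    if norm2 == 0.0:
        return list(mu)  # a zero subgradient means mu already maximizes Z_D
    t = lam * (z_ip - z_d) / norm2                       # Eq. (19)
    new_mu = [m + t * v for m, v in zip(mu, g)]          # Eq. (18)
    for idx in nonneg:
        new_mu[idx] = max(0.0, new_mu[idx])              # project onto mu >= 0
    return new_mu


class StepScale:
    """Tracks lambda_n: start at 2 and halve it after `patience`
    consecutive iterations without improvement in Z_D."""

    def __init__(self, lam=2.0, patience=5):
        self.lam = lam
        self.patience = patience
        self.best = float("-inf")
        self.stalled = 0

    def update(self, z_d):
        if z_d > self.best:
            self.best, self.stalled = z_d, 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                self.lam /= 2.0
                self.stalled = 0
        return self.lam
```

In a full LR loop, each iteration would solve the five subproblems, compute `z_d` and `g` from their solution, call `StepScale.update(z_d)`, and pass the returned $λ_{n}$ to `subgradient_step`.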

4.1.4. Step 6: Obtaining the Primal Feasible Solutions

A set of decision variables is obtained after the five subproblems are solved. However, because multiple complicating constraints are relaxed, this solution may not be feasible (as mentioned in Section 4.1). We therefore designed two heuristic methods, RCR and xCR, that tune the decision variables to achieve feasibility.

Referencing Capability Ranking (RCR)
The primary goal of this study was to accurately estimate temperature measurements at locations without sensors. Equation (2) describes the estimation model, in which $m_{k i}$ is calculated as a linear combination of the data series $H_{k j}$ with coefficients $p_{j i}$. The problem thus reduces to deriving an optimal set of $p_{j i}$ such that $e_{k i}$ in Equation (3) is minimized. To derive $p_{j i}$, we apply the steepest gradient descent method.

The steepest gradient descent method, also known simply as the gradient method, was first described by Cauchy in 1847. Many other analytic methods were inspired by it or derived from modifications of it, so the gradient method is fundamental to optimization. It requires little computation and storage per iteration and places few requirements on the initial point. However, it converges slowly, can be inefficient, and sometimes fails to reach an optimal solution. Nonlinear programming, the numerical optimization of nonlinear functions, is used in military, economic, management, production process automation, engineering design, and product design optimization applications. Here, nonlinear programming is used to calculate the optimal set of coefficients $p_{j i}$ in Equation (2), with Equation (20) as the objective function for gradient descent.

In each LR iteration, the $x_{i}$ obtained from the subproblems is used to optimize the corresponding $p_{j i}$. After several rounds of gradient descent, if the overall error of the measurements estimated with the optimized $p_{j i}$ satisfies the average error threshold $Ψ$, the set $x_{i}$ is considered feasible and its total cost is recorded. If $Ψ$ is not satisfied, sensors are added at further locations to reduce the overall error. Locations are added according to the ranking of $f_{i}$, the “referencing ability” of location i for other locations, defined in Equation (21).

$\min \sum_{k \in D} \sum_{i \in V} {(m_{k i} - H_{k i})}^{2}$

(20)

$f_{i} = \frac{\sum_{j \in V} p_{j i}}{C_{i}}, \forall i \in V .$

(21)
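A minimal NumPy sketch of the two computations above: steepest descent on Equation (20) and the referencing ability of Equation (21). Equation (2) is not reproduced in this section, so the code assumes the linear form $m_{ki} = \sum_{j} p_{ji} x_{j} H_{kj}$; the function names are hypothetical. The fixed step 1/L, where L is the Lipschitz constant of the gradient, is one standard choice that guarantees the error never increases; it is not necessarily the authors' step-size rule. Equation (21) is implemented exactly as written, with `P[j, i]` holding $p_{ji}$.

```python
import numpy as np


def fit_weights(H, x, steps=1000):
    """Steepest descent on Eq. (20), assuming m_ki = sum_j p_ji * x_j * H_kj.
    H has shape (days, locations); x is the 0/1 installation vector."""
    n_days, n_loc = H.shape
    Hx = H * x                                # zero the columns of locations without sensors
    P = np.zeros((n_loc, n_loc))              # P[j, i] holds p_ji
    lip = 2 * np.linalg.norm(Hx, 2) ** 2 / n_days  # Lipschitz constant of the gradient
    if lip == 0:                              # no sensors installed: nothing to fit
        return P
    for _ in range(steps):
        residual = Hx @ P - H                 # m_ki - H_ki for every day and location
        grad = 2 * Hx.T @ residual / n_days   # gradient of the mean squared error
        P -= grad / lip                       # fixed step 1/L never increases the error
    return P


def referencing_ability(P, C):
    """Eq. (21): f_i = sum_j p_ji / C_i, with P[j, i] = p_ji and C the cost vector."""
    return P.sum(axis=0) / C
```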
x Coefficient Ranking (xCR)
xCR is similar to RCR. In this strategy, Equation (2) is again applied to estimate $m_{k i}$ given the $x_{i}$ produced by the LR procedure in each iteration, and steepest gradient descent is used to determine the $p_{j i}$ that minimize the objective error (Equation (20)). If, after several rounds of gradient descent, the overall estimation error does not satisfy the average error threshold $Ψ$, locations are added to reduce the error, ranked by the coefficients of $x_{i}$ in the LR objective function, until a feasible set of $x_{i}$ is produced.

In summary, to obtain a feasible solution set $x_{i}$ and its objective value (Equation (1)), the $x_{i}$ from Algorithm 1 is first used to calculate the optimized set of $p_{j i}$. Then, the average estimation error T is compared with the threshold $Ψ$. If the threshold is satisfied, the set $x_{i}$ is considered feasible. Otherwise, some $x_{i}$ are changed from zero to one (i.e., sensors are installed at those locations) based on either the RCR or the xCR method to reduce the error. This process is illustrated in Figure 4.
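The repair step can be sketched as follows. It assumes that sensors are added one at a time, highest score first, and that the weights are refitted after each addition; the caller supplies the score, which would be $f_{i}$ from Equation (21) for RCR or the ranking of the $x_{i}$ coefficients for xCR. To keep the example short, a closed-form least-squares solve replaces the steepest descent described above (both minimize Equation (20)), and an unweighted mean absolute error stands in for T. All names are hypothetical.

```python
import numpy as np


def estimation_error(H, x):
    """Mean |m_ki - H_ki| when every location is estimated from installed sensors.
    Least squares stands in for steepest descent; the unweighted mean
    simplifies the weighted average error T."""
    S = np.flatnonzero(x)
    if S.size == 0:
        return np.inf
    P, *_ = np.linalg.lstsq(H[:, S], H, rcond=None)
    M = H[:, S] @ P
    M[:, S] = H[:, S]                  # installed sensors report their own readings
    return float(np.abs(M - H).mean())


def repair(H, x, score, psi):
    """Install sensors in descending score order until the error is at most psi."""
    x = x.copy()
    err = estimation_error(H, x)
    for i in np.argsort(-score, kind="stable"):
        if err <= psi:
            break
        if x[i] == 0:
            x[i] = 1                   # change x_i from zero to one
            err = estimation_error(H, x)
    return x, err
```

Because installing every sensor drives the error to zero, the loop always ends with a feasible set for any non-negative $Ψ$.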

4.2. Pearson Correlation Coefficients and Linear Regression Methods

The PCC model begins with determining its coefficients:

c_{j i} = \frac{\operatorname{cov}(H_{k i}, H_{k j})}{σ_{H_{k i}} σ_{H_{k j}}}

(22)

-1 \leq c_{j i} \leq 1, \quad \forall i, j \in V.

(23)

Here, $c_{j i}$ is the Pearson correlation coefficient between locations j and i. In Equation (22), $\operatorname{cov}$ denotes the covariance between $H_{k i}$ and $H_{k j}$, and $σ_{H_{k i}}$ and $σ_{H_{k j}}$ are their standard deviations; all three are computed over every dataset $k \in D$ for the locations $i, j \in V$. The coefficient is symmetric ($c_{j i} = c_{i j}$), so the order of i and j does not matter. When i = j, $c_{j i} = 1$ because the two series are identical. A value of $|c_{j i}|$ between 0.5 and 1.0 indicates a strong correlation, a value between 0.3 and 0.49 a moderate correlation, and a value of 0.29 or lower a weak correlation between i and j. A negative $c_{j i}$ means that the data are negatively correlated. Thus, pairs with greater $|c_{j i}|$ are more strongly correlated.
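Equation (22) and the strength bands above can be computed for all pairs at once with NumPy's `np.corrcoef`, treating each column of a days-by-locations array as one location. The band edges (0.5 and 0.3 on $|c_{j i}|$) follow the text; the array layout and function name are assumptions for illustration.

```python
import numpy as np


def correlation_strength(H):
    """Eq. (22) for every pair of locations (columns of H) plus strength labels."""
    c = np.corrcoef(H, rowvar=False)           # c[j, i] for locations j and i
    magnitude = np.abs(c)                      # the sign only gives the direction
    labels = np.where(magnitude >= 0.5, "strong",
                      np.where(magnitude >= 0.3, "moderate", "weak"))
    return c, labels
```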

Because our goal is to minimize the cost of sensor deployment, the temperature at locations without installed sensors must be estimated. In PCC, the value at any measuring location without a sensor is estimated from the measurements of nearby sensors through linear regression. The correlation coefficient obtained through a convex combination is treated as a decision variable, where zero indicates no association. Each location without a sensor is associated with all measured values at locations with installed sensors.

The estimated measurement $m_{k j}$ at location $j \in V$ for dataset $k \in D$ is obtained from Equation (24), where $H_{k i}$ is the measurement physically collected by the sensor at location i. Once the correlation coefficients are calculated, the relationships between all pairs i and j are known, so the ranking of the correlation coefficients can indicate where sensors should be deployed. In Equation (24), the coefficients $a_{j i}$ and $b_{j i}$ are fitted on the training data sets $H_{k i}$ and $H_{k j}$, with $H_{k i}$ as the independent variable and $H_{k j}$ as the dependent variable. After $a_{j i}$ and $b_{j i}$ are obtained, the testing data $H_{k i}$ are used to compute the predicted value $m_{k j}$. In short, the measurement at location j is inferred from an actual measurement at location i. If the correlation coefficient of a selected pair is positive, the corresponding slope $a_{j i}$ is also positive. If the independent and dependent variables are swapped, both $a_{j i}$ and $b_{j i}$ change. Among candidate pairs, the best one leaves the smallest residual, that is, it has the highest coefficient of determination ($R^{2}$). Because Equation (24) is a simple linear regression with a single independent variable, $R^{2}$ for the selected pair equals the square of its correlation coefficient (and is therefore the same in either direction). Consequently, the correlation coefficient alone determines $R^{2}$ for a given pair.

m_{k j} = a_{j i} H_{k i} + b_{j i}

(24)

\forall k \in D, \forall i, j \in V, i \neq j
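A quick numerical check of the claim that, for the single-variable regression in Equation (24), $R^{2}$ equals $c_{j i}^{2}$ and does not depend on which location is the independent variable. The synthetic series in the test stand in for two stations' temperatures and are purely illustrative; `np.polyfit` with degree 1 returns the slope $a_{j i}$ and the intercept $b_{j i}$.

```python
import numpy as np


def fit_pair(h_i, h_j):
    """Eq. (24): regress h_j (dependent) on h_i (independent); return a, b, R^2."""
    a, b = np.polyfit(h_i, h_j, 1)        # slope a_ji and intercept b_ji
    m_j = a * h_i + b                     # estimated m_kj
    ss_res = np.sum((h_j - m_j) ** 2)
    ss_tot = np.sum((h_j - h_j.mean()) ** 2)
    return a, b, 1 - ss_res / ss_tot      # coefficient of determination
```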

Because the Pearson correlation coefficient measures the strength of the association between two variables, a node with high coefficients has two properties:

The node is strongly associated with other nodes; the temperature can thus be accurately estimated by other nodes.
The node is strongly associated with other nodes; it can be used to estimate temperatures at other nodes.

To decide where to deploy sensors, the sum of the correlation coefficients at each site is used to rank the nodes by connectivity, and strongly connected nodes are installed first.

The node with the highest sum of coefficients is first deployed. Then, the temperature at all other nodes connected to the deployed node can be predicted using linear regression; deployment of these nodes can be avoided. Next, the node with the second-highest sum of coefficients is deployed, and nodes strongly connected with this deployed node are not deployed. This process continues until all strongly connected nodes are deployed or have been removed. Finally, leftover nodes are deployed because they are not predicted by any deployed node. The algorithm is presented in Figure 5.
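The greedy procedure of Figure 5 can be sketched as follows. Two details are assumptions, since the text does not pin them down: nodes are ranked by the signed sum of their off-diagonal coefficients, and a pair counts as strongly connected when $|c_{j i}| \geq 0.5$, the strong-correlation band defined earlier. Because an uncovered node is deployed when its turn comes, nodes with no strong connection are deployed automatically, which matches the final step of the procedure.

```python
import numpy as np


def pcc_deploy(c, strong=0.5):
    """Greedy deployment from a |V| x |V| correlation matrix c."""
    n = len(c)
    sums = c.sum(axis=1) - np.diag(c)        # drop each node's self-correlation
    deployed, predicted_by = [], {}
    for node in np.argsort(-sums, kind="stable"):
        node = int(node)
        if node in predicted_by:
            continue                         # already estimated from a deployed node
        deployed.append(node)                # uncovered (including isolated) nodes get a sensor
        for other in range(n):
            if other == node or other in predicted_by or other in deployed:
                continue
            if abs(c[node, other]) >= strong:
                predicted_by[other] = node   # estimate `other` from `node` via Eq. (24)
    return deployed, predicted_by


c = np.array([[1.0, 0.9, 0.4, 0.1],
              [0.9, 1.0, 0.2, 0.1],
              [0.4, 0.2, 1.0, 0.0],
              [0.1, 0.1, 0.0, 1.0]])
deployed, predicted_by = pcc_deploy(c)   # deployed == [0, 2, 3], predicted_by == {1: 0}
```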

4.3. Extreme Gradient Boosting Method

Extreme gradient boosting (XGBoost) is a well-known, widely used decision-tree-based boosting system in machine learning [34,35]. It supports both the classification and the regression tasks required by our method. Assuming the model contains K trees, let F denote the space of functions containing all regression trees, and let $f_{k}(x_{i})$ be the weight that the $k^{th}$ tree assigns to the $i^{th}$ sample. The model is formulated in Equation (25):

m_{k i} = \sum_{k = 1}^{K} f_{k} (x_{i}), \forall f_{k} \in F

(25)
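Equation (25) states that the estimate is a sum of tree outputs. The following is a toy, dependency-free illustration of that additive structure, not the `xgboost` library: it grows depth-1 trees (stumps) for squared error, using XGBoost's second-order leaf weight $w = -G/(H + λ)$ and its split-gain formula with gradient $g = m - y$ and Hessian $h = 1$. The learning rate `eta`, the regularization `lam`, and the tree depth are illustrative choices; `lam` here is unrelated to $λ_{n}$ in Equation (19).

```python
import numpy as np


def best_stump(X, g, h, lam):
    """Split maximizing the XGBoost gain (constant 1/2 and gamma dropped)."""
    G, Hs = g.sum(), h.sum()
    best_gain, best = -np.inf, None
    for f in range(X.shape[1]):
        order = np.argsort(X[:, f])
        xs, gl, hl = X[order, f], np.cumsum(g[order]), np.cumsum(h[order])
        for s in range(len(xs) - 1):
            if xs[s] == xs[s + 1]:
                continue                     # no threshold separates equal values
            gr, hr = G - gl[s], Hs - hl[s]
            gain = gl[s] ** 2 / (hl[s] + lam) + gr ** 2 / (hr + lam) - G ** 2 / (Hs + lam)
            if gain > best_gain:
                best_gain = gain             # leaf weights w = -G / (H + lambda)
                best = (f, (xs[s] + xs[s + 1]) / 2, -gl[s] / (hl[s] + lam), -gr / (hr + lam))
    return best


def boost(X, y, K=100, eta=0.3, lam=1.0):
    """Fit K stumps f_k to the squared error (gradient m - y, Hessian 1)."""
    trees, m = [], np.zeros(len(y))
    for _ in range(K):
        stump = best_stump(X, m - y, np.ones(len(y)), lam)
        if stump is None:
            break
        f, thr, wl, wr = stump
        trees.append((f, thr, eta * wl, eta * wr))   # shrink each tree by eta
        m += np.where(X[:, f] <= thr, eta * wl, eta * wr)
    return trees


def predict(trees, X):
    """Eq. (25): the estimate is the sum of the outputs of all K trees."""
    m = np.zeros(len(X))
    for f, thr, wl, wr in trees:
        m += np.where(X[:, f] <= thr, wl, wr)
    return m
```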

In the XGBoost model, the data are split 8:2 into a training set and a testing set. The main objective of XGBoost is to train a model that estimates the temperature at locations without sensors. In each LR iteration, the subproblems produce a new set of $x_{i}$ indicating whether a sensor is installed at location i, and XGBoost is trained using this set of $x_{i}$. After training, the model checks whether the overall error of the estimated measurements is below the average error threshold $Ψ$. If so, the set $x_{i}$ is considered feasible and its total cost is recorded; otherwise, the set $x_{i}$ is discarded and the procedure proceeds to the next iteration.
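The per-iteration feasibility check can be sketched as below. The sketch assumes a chronological 80/20 split and a mean absolute test error, and it substitutes a linear least-squares model for XGBoost only to stay dependency-free; with the `xgboost` package, one would train an `XGBRegressor` instead. The function names are hypothetical.

```python
import numpy as np


def holdout_error(H, x, train_frac=0.8):
    """Train on the first 80% of days, return the mean absolute test error.
    A linear least-squares model stands in for XGBoost here."""
    S = np.flatnonzero(x)
    split = int(train_frac * len(H))
    P, *_ = np.linalg.lstsq(H[:split, S], H[:split], rcond=None)
    M = H[split:, S] @ P
    M[:, S] = H[split:, S]            # installed sensors are measured, not estimated
    return float(np.abs(M - H[split:]).mean())


def cheapest_feasible(H, candidates, cost, psi):
    """Keep the lowest-cost candidate x whose test error is at most psi."""
    best_x, best_cost = None, np.inf
    for x in candidates:              # one candidate x per LR iteration
        if not np.any(x):
            continue                  # nothing installed: nothing to train on
        if cost @ x < best_cost and holdout_error(H, x) <= psi:
            best_x, best_cost = x, float(cost @ x)
    return best_x, best_cost
```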

5. Computational Experiments

We compared the methods on different topologies under the same tolerance on the average estimation error: $Ψ$ is set to 1.5 °C. The sensors were divided by longitude into one, two, and four clusters of equal area. Because the sensors are not uniformly distributed and the eastern section contains more of them, Topology 2_2 has three times as many sensors as Topology 2_1 (84 versus 28; Table 5). Figure 6 depicts the distribution and number of sensors for each topology.

The experiments were conducted on a computer with an AMD Ryzen 5 5600X 6-core processor at 3.7 GHz and 32 GB of RAM, running Windows 10 Professional 19041.1165 and Python 3.7.6.

The experimental parameters are listed in Table 4. Table 5 displays the experimental outcomes for each optimization method and topology. The table reveals that the XGBoost, RCR, and xCR methods were more effective in finding minimum deployment costs than the PCC method was. PCC typically deployed more sensors than other methods. Furthermore, the performance of XGBoost, RCR, and xCR did not differ substantially.
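Table 5 lists sensor counts but not the cost of deploying every sensor, so the cost reductions quoted in this section cannot be recomputed from it. The snippet below computes only the reduction in the number of sensors for Topology 1 (112 locations), using the counts in Table 5.

```python
# Number of sensors selected for Topology 1 (Table 5), out of 112 locations.
sensors_topology1 = {"XGBoost": 23, "RCR": 30, "xCR": 39, "PCC": 75}


def sensor_reduction(counts, total=112):
    """Percentage of the `total` candidate sensors that each method leaves out."""
    return {method: round(100 * (1 - n / total), 1) for method, n in counts.items()}


print(sensor_reduction(sensors_topology1))
# {'XGBoost': 79.5, 'RCR': 73.2, 'xCR': 65.2, 'PCC': 33.0}
```

The ordering matches the discussion above: XGBoost, RCR, and xCR remove far more sensors than PCC, although sensor counts and costs are different quantities.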

5.1. XGBoost

Figure 7 displays the sensor deployment obtained using XGBoost. Compared with Figure 6, where sensors are densely situated in the southeast, the deployed sensor nodes are sparser and more evenly distributed across the region. This is particularly true for Topology 4, whose sensors were more evenly distributed than those in Topology 1 or Topology 2: the easternmost section (Topology 4_4) contained the same number of sensors (nine) as the westernmost section (Topology 4_1).

Figure 8 displays the cost reduction for each topology compared with the original cost. XGBoost performed strongly: costs were reduced by 80% on average compared with deploying all sensors, and even in the worst case (Topology 4_2), the cost was reduced by over 40%. Figure 8 also reveals that the cost reduction comes primarily from removing sensors in the eastern region; that is, savings were greater for Topology 2_2 and Topology 4_4 than for their western counterparts.

5.2. RCR

Figure 9 displays the sensor distributions obtained using LR with the RCR heuristic for identifying feasible solutions. Table 5 reveals that RCR performed similarly to XGBoost in both overall cost and number of sensors used, and its cost reduction was also approximately 80% on average. However, the two methods produced different sensor distributions. Figure 9 reveals that the sensors are still unevenly distributed over the territory: Topology 2_2 has more than twice as many sensors as Topology 2_1 (25 versus 10), and Topology 4_4 has more than twice as many sensors as Topology 4_1 and Topology 4_2 and nearly twice as many as Topology 4_3 (16 versus 7, 3, and 9).

5.3. xCR

Figure 10 displays the sensor distributions obtained using LR with the xCR heuristic for identifying feasible solutions. xCR also performed similarly to XGBoost and RCR. However, as Figure 10 reveals, its sensor distribution is substantially more concentrated in the southeast than those of XGBoost and RCR. The two nearly overlapping sensors at the southeastern extreme of Topology 2_2 and Topology 4_4 are both in Tasmania, at 42.89°S, 147.33°E and 42.99°S, 147.07°E; the distance between them is only 23.95 km.
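The quoted separation can be checked with the haversine great-circle formula. The sketch assumes a spherical Earth with a mean radius of 6371 km; for the two rounded coordinate pairs above it gives roughly 23.9 km, consistent with the reported 23.95 km given that the coordinates are rounded to two decimals.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a sphere (degrees in, km out)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Southern latitudes are negative.
d = haversine_km(-42.89, 147.33, -42.99, 147.07)   # d is about 23.9 km
```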

5.4. Pearson Correlation and Linear Regression

Table 5 presents the total cost and number of sensors deployed using PCC; PCC deploys substantially more sensors than the other methods. Figure 11 reveals that the sensors are again densely situated in the southeast, with few sensors eliminated compared with the distribution in Figure 6. Figure 12 shows that the cost reduction from PCC averaged only 55% across the topologies, substantially worse than the 80% reduction achieved by XGBoost. For example, in Topology 4_2, PCC removes only one sensor, and the cost reduction in that region is only 17%.

5.5. Extended Application

5.5.1. Lifetime Enhancement

The preceding experiments were conducted under various topologies. However, some types of sensors must be replaced regularly because they have fixed lifetimes or nonrechargeable batteries. In this situation, it may be desirable to deploy all sensors at once, activate only a subset of them, and predict the measurements at inactive sensors within the error threshold. When the active sensors reach the end of their lifetimes, inactive sensors can be strategically activated to take over predicting the data at the previously active locations. Maximizing the number of such cycles could extend the network lifetime for a fixed deployment cost.

The optimization model is applicable in this scenario. Because we aim to maximize the number of cycles, we must minimize the number of sensors active in each cycle. We can therefore start from Topology 1, which contains all 112 sensor nodes, and find an optimal sensor distribution. Then, we remove the sensors whose batteries are depleted and use the remaining sensors to find a new optimal distribution. This process continues until all sensors have been used or the average error exceeds the error threshold.

In this lifetime enhancement experiment, XGBoost was used. The costs of all sensors were assumed to be identical so that the minimum number of sensors is found in each cycle. Table 6 shows the results: a maximum of five cycles can be achieved, with approximately 20 sensors (18 to 21) active in each cycle, without exceeding the error constraint. Thus, a one-time deployment of 99 sensors (20 + 18 + 19 + 21 + 21; the remaining 13 sensors were unnecessary) can operate for five times the lifetime of an individual sensor. The sensor distribution in each cycle is color-coded in Figure 13.
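A sketch of the cycle-by-cycle procedure under simplifying assumptions: a least-squares estimator replaces XGBoost, the mean absolute error replaces the weighted average error T, and sensors are activated greedily in index order until the error is within $Ψ$. Each cycle's active sensors are then treated as depleted. The names and the activation order are hypothetical, and this greedy rule is not the authors' optimization, so it will not reproduce Table 6.

```python
import numpy as np


def mean_abs_error(H, active):
    """Estimate every location from the active sensors by least squares
    (a stand-in for XGBoost) and return the mean absolute error."""
    P, *_ = np.linalg.lstsq(H[:, active], H, rcond=None)
    M = H[:, active] @ P
    M[:, active] = H[:, active]          # active sensors report their own readings
    return float(np.abs(M - H).mean())


def plan_cycles(H, psi):
    """Greedily build cycles of active sensors from the unused ones."""
    remaining = list(range(H.shape[1]))
    cycles = []
    while remaining:
        active = []
        for s in remaining:              # hypothetical activation order: by index
            active.append(s)
            if mean_abs_error(H, active) <= psi:
                break
        else:
            break                        # even all remaining sensors exceed psi
        cycles.append(active)
        remaining = [s for s in remaining if s not in active]  # depleted after this cycle
    return cycles, remaining
```

The returned `remaining` list plays the role of the "Remaining" column in Table 6: sensors that could not form a further feasible cycle.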

5.5.2. Other Applications

The proposed model can be applied to not only temperature sensors but also humidity monitoring, air quality sensing, GPS surveillance, landslide detection [22,23,36,37,38,39], and other sensors. The ultimate goal of the model is to reduce deployment costs; the contribution of this study would be substantial for deployments with high sensing expenditures for other extreme or critical applications. Additionally, the solution could be used by deployment practitioners to enhance the lifetimes of sensors within error tolerance.

6. Conclusions

This work aimed to strategically minimize the deployment cost of networks with numerous sensors. We tested the model on the ACORN-SAT dataset, which includes 112 sensor locations across Australia over ten years. The problem was formulated mathematically and solved systematically with the proposed procedures. To determine primal feasible solutions, we proposed XGBoost, PCC, and the LR method with the RCR and xCR heuristics as error-bound satisfaction strategies. We demonstrated that XGBoost and LR with RCR could reduce costs by approximately 80% on average, achieving the goal. Furthermore, we introduced a way to use the model to maximize the lifetime of a sensor network, which can meet the requirements for controlling the sensors during operation. In conclusion, this work combines theoretical and practical considerations to minimize the deployment cost of temperature sensors. The proposed solution can be readily applied to sensor distribution problems in various fields.

Author Contributions

Conceptualization, C.-H.H., C.-W.T. and S.-Y.Z.; methodology, C.-H.H. and H.-J.Y.; software, C.-H.H., H.-J.Y. and C.-W.T.; validation, C.-H.H.; formal analysis, C.-H.H.; resources, C.-H.H., F.Y.-S.L. and Y.H.; writing—original draft preparation, C.-H.H., H.-J.Y. and Y.-F.C.; writing—review and editing, C.-H.H., H.-J.Y. and Y.-F.C.; visualization, C.-H.H. and H.-J.Y.; supervision, F.Y.-S.L. and Y.H.; project administration, F.Y.-S.L. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported in part by Academia Sinica, Taiwan, under Grant Number 3012-C3448.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Albreem, M.A.; Sheikh, A.M.; Alsharif, M.H.; Jusoh, M.; Mohd Yasin, M.N. Green Internet of Things (GIoT): Applications, Practices, Awareness, and Challenges. IEEE Access 2021, 9, 38833–38858.
2. Janbi, N.; Katib, I.; Albeshri, A.; Mehmood, R. Distributed Artificial Intelligence-as-a-Service (DAIaaS) for Smarter IoE and 6G Environments. Sensors 2020, 20, 5796.
3. Nguyen, D.C.; Cheng, P.; Ding, M.; Lopez-Perez, D.; Pathirana, P.N.; Li, J.; Seneviratne, A.; Li, Y.; Poor, H.V. Enabling AI in Future Wireless Networks: A Data Life Cycle Perspective. IEEE Commun. Surv. Tutor. 2021, 23, 553–595.
4. Su, X.; Liu, X.; Motlagh, N.H.; Cao, J.; Su, P.; Pellikka, P.; Liu, Y.; Petäjä, T.; Kulmala, M.; Hui, P.; et al. Intelligent and Scalable Air Quality Monitoring With 5G Edge. IEEE Internet Comput. 2021, 25, 35–44.
5. Sutjarittham, T.; Habibi Gharakheili, H.; Kanhere, S.S.; Sivaraman, V. Experiences With IoT and AI in a Smart Campus for Optimizing Classroom Usage. IEEE Internet Things J. 2019, 6, 7595–7607.
6. Khalifeh, A.; Abid, H.; Darabkh, K.A. Optimal Cluster Head Positioning Algorithm for Wireless Sensor Networks. Sensors 2020, 20, 3719.
7. Sugiura, K. SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones. IEEE Robot. Autom. Lett. 2018, 3, 2963–2970.
8. Cheng, Z.; Perillo, M.; Heinzelman, W.B. General Network Lifetime and Cost Models for Evaluating Sensor Network Deployment Strategies. IEEE Trans. Mob. Comput. 2008, 7, 484–497.
9. Liu, B.; Dousse, O.; Nain, P.; Towsley, D. Dynamic Coverage of Mobile Sensor Networks. IEEE Trans. Parallel Distrib. Syst. 2013, 24, 301–311.
10. Roy, V.; Simonetto, A.; Leus, G. Spatio-Temporal Sensor Management for Environmental Field Estimation. Signal Process. 2016, 128, 369–381.
11. Veiga, T.; Munch-Ellingsen, A.; Papastergiopoulos, C.; Tzovaras, D.; Kalamaras, I.; Bach, K.; Votis, K.; Akselsen, S. From a Low-Cost Air Quality Sensor Network to Decision Support Services: Steps towards Data Calibration and Service Development. Sensors 2021, 21, 3190.
12. Krause, A.; Singh, A.; Guestrin, C. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. J. Mach. Learn. Res. 2008, 9, 235–284.
13. Krause, A.; Guestrin, C.; Gupta, A.; Kleinberg, J. Robust Sensor Placements at Informative and Communication-Efficient Locations. ACM Trans. Sens. Netw. 2011, 7, 1–33.
14. Ranieri, J.; Chebira, A.; Vetterli, M. Near-Optimal Sensor Placement for Linear Inverse Problems. IEEE Trans. Signal Process. 2014, 62, 1135–1146.
15. Liaskovitis, P.G.; Schurgers, C. Leveraging Redundancy in Sampling-Interpolation Applications for Sensor Networks: A Spectral Approach. ACM Trans. Sens. Netw. 2010, 7, 12:1–12:28.
16. Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
17. Yan, X.; Xie, H.; Tong, W. A Multiple Linear Regression Data Predicting Method Using Correlation Analysis for Wireless Sensor Networks. In Proceedings of the 2011 Cross Strait Quad-Regional Radio Science and Wireless Technology Conference, Harbin, China, 26–30 July 2011; Volume 2, pp. 960–963.
18. He, D.; Liu, X.; Zheng, J.; Chan, S.; Zhu, S.; Min, W.; Guizani, N. A Lightweight and Intelligent Intrusion Detection System for Integrated Electronic Systems. IEEE Netw. 2020, 34, 173–179.
19. Ma, J.; Komuro, N.; Sakata, S. Sensors deployment for location estimation in wireless sensor networks. In Proceedings of the 2010 Second International Conference on Ubiquitous and Future Networks (ICUFN), Jeju, Korea, 16–18 June 2010; pp. 60–65.
20. Kim, H.; Han, S.W. An Efficient Sensor Deployment Scheme for Large-Scale Wireless Sensor Networks. IEEE Commun. Lett. 2015, 19, 98–101.
21. Han, G.; Zhang, C.; Shu, L.; Rodrigues, J.J.P.C. Impacts of Deployment Strategies on Localization Performance in Underwater Acoustic Sensor Networks. IEEE Trans. Ind. Electron. 2015, 62, 1725–1733.
22. Ramesh, M.V. Real-Time Wireless Sensor Network for Landslide Detection. In Proceedings of the 2009 Third International Conference on Sensor Technologies and Applications, Washington, DC, USA, 18–23 June 2009; pp. 405–409.
23. Solari, L.; Del Soldato, M.; Raspini, F.; Barra, A.; Bianchini, S.; Confuorto, P.; Casagli, N.; Crosetto, M. Review of Satellite Interferometry for Landslide Detection in Italy. Remote Sens. 2020, 12, 1351.
24. Huang, C.J.; Chu, C.R.; Yin, H.Y.; Chen, P.S. Calibration and Deployment of a Fiber-Optic Sensing System for Monitoring Debris Flows. Sensors 2012, 12, 5835.
25. Marin-Perez, R.; Michailidis, I.T.; Garcia-Carrillo, D.; Korkas, C.D.; Kosmatopoulos, E.B.; Skarmeta, A. PLUG-N-HARVEST Architecture for Secure and Intelligent Management of Near-Zero Energy Buildings. Sensors 2019, 19, 843.
26. Wright, R. Intelligent Autonomous Ship Navigation using Multi-Sensor Modalities. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2019, 13, 503–510.
27. Du, W.; Xing, Z.; Li, M.; He, B.; Chua, L.H.C.; Miao, H. Sensor Placement and Measurement of Wind for Water Quality Studies in Urban Reservoirs. ACM Trans. Sens. Netw. 2015, 11, 1–27.
28. Wang, X.; Wang, X.; Xing, G.; Chen, J.; Lin, C.X.; Chen, Y. Intelligent Sensor Placement for Hot Server Detection in Data Centers. IEEE Trans. Parallel Distrib. Syst. 2013, 24, 1577–1588.
29. Fisher, M.L. The Lagrangian Relaxation Method for Solving Integer Programming Problems. Manag. Sci. 2004, 50, 1861–1871.
30. Geoffrion, A.M. Lagrangean Relaxation for Integer Programming. In Approaches to Integer Programming; Balinski, M.L., Ed.; Springer: Berlin/Heidelberg, Germany, 1974; pp. 82–114.
31. Held, M.; Karp, R.M. The Traveling-Salesman Problem and Minimum Spanning Trees. Oper. Res. 1970, 18, 1138–1162.
32. Held, M.; Karp, R.M. The Traveling-Salesman Problem and Minimum Spanning Trees: Part II. Math. Program. 1971, 1, 6–25.
33. Held, M.; Wolfe, P.; Crowder, H.P. Validation of Subgradient Optimization. Math. Program. 1974, 6, 62–88.
34. Jiang, Y.; Tong, G.; Yin, H.; Xiong, N. A Pedestrian Detection Method Based on Genetic Algorithm for Optimize XGBoost Training Parameters. IEEE Access 2019, 7, 118310–118321.
35. Punmiya, R.; Choe, S. Energy Theft Detection Using Gradient Boosting Theft Detector With Feature Engineering-Based Preprocessing. IEEE Trans. Smart Grid 2019, 10, 2326–2329.
36. Lin, P.; Tang, L.; Ni, P. Field Evaluation of Subgrade Soils Under Dynamic Loads Using Orthogonal Earth Pressure Transducers. Soil Dyn. Earthq. Eng. 2019, 121, 12–24.
37. Li, X.; Zhuang, Z.; Qi, D.; Zhao, C. High Sensitive and Fast Response Humidity Sensor Based on Polymer Composite Nanofibers for Breath Monitoring and Non-Contact Sensing. Sens. Actuat. Chem. 2021, 330, 129239.
38. Liu, Y.; Nie, J.; Li, X.; Ahmed, S.H.; Lim, W.Y.B.; Miao, C. Federated Learning in the Sky: Aerial-Ground Air Quality Sensing Framework With UAV Swarms. IEEE Internet Things J. 2021, 8, 9827–9837.
39. Wang, S.; Ding, S.; Xiong, L. A New System for Surveillance and Digital Contact Tracing for COVID-19: Spatiotemporal Reporting Over Network and GPS. JMIR Mhealth Uhealth 2020, 8, e19457.

Figure 1. Australia map and the locations of the observation sites. (a) Map of Australia. (b) Locations of the observation sites.

Figure 2. Flowchart of proposed solution approach.

Figure 3. Procedures of the Lagrangian relaxation-based approach.

Figure 4. Procedures of the Lagrangian relaxation-based method for obtaining feasible solutions.

Figure 5. Procedures of Pearson correlation coefficient method for sensor deployment.

Figure 6. Sensor distribution in the three distinct topologies.

Figure 7. Sensor deployment distribution from XGBoost.

Figure 8. Cost reduction for each topology using XGBoost.

Figure 9. Sensor deployment distribution from referencing capability ranking.

Figure 10. Sensor deployment distribution from x Coefficient Ranking.

Figure 11. Sensor deployment distribution from Pearson correlation coefficients.

Figure 12. Cost reduction for each topology using Pearson correlation coefficients.

Figure 13. Active sensor distribution in each cycle.

Table 1. Proposed Model Comparisons With Literature.

Model	Related Work	Sensor Allocation	Cost/Energy Consumption	Data Collection	Correlation Aware	Deployment Strategy
Dynamic Coverage Measures	[9]		✓	✓	✓
Sparsity-Enforcing Sensor Management Methods	[10]	✓				✓
Gaussian and Non-Gaussian Process	[12,27]	✓				✓
FrameSense	[14]	✓	✓	✓	✓
Lightweight and Intelligent Intrusion Detection Method	[18]				✓
Proposed model		✓	✓	✓	✓	✓

Table 2. Notations of Given Parameters.

Notation	Description
D	Index set of evaluation data, where $D = \{1, 2, \dots, k, \dots, |D|\}$
V	Index set of locations, where $V = \{1, 2, \dots, i, \dots, j, \dots, |V|\}$
$C_{i}$	Installation cost of sensor i, where $i \in V$
$H_{k i}$	Measurement taken at location i in dataset k, where $i \in V$, $k \in D$
$Ψ$	Tolerance on the average estimation error
$W_{i}$	Weight associated with location i, where $i \in V$ ($0 \leq W_{i} \leq 1$ and $\sum_{i \in V} W_{i} = 1$)

Table 3. Notations of Decision Variables.

Notation	Description
$x_{i}$	Binary variable, 1 if sensor i is installed, and 0 otherwise
$p_{j i}$	Weighting factor of sensor j for estimating the measurement at location i, where $i, j \in V$
$Π_{j i}$	Maximum weighting factor of sensor j for estimating the measurement at location i, where $i, j \in V$
$m_{k i}$	Estimated measurement at location i in dataset k, where $i \in V$, $k \in D$
$e_{k i}$	Estimation error at location i in dataset k, where $i \in V$, $k \in D$
$Z_{k i}$	Maximum estimation error at location i in dataset k, where $i \in V$, $k \in D$
T	Average estimation error ($T = \sum_{k \in D} \sum_{i \in V} \frac{W_{i} e_{k i}}{|V| |D|}$)

Table 4. Parameters for Computational Experiments.

Given Parameter	Value
Number of evaluation data ($|D|$)	3650
Number of locations ($|V|$)	112
$C_{i}$	5∼1000
$H_{k i}$	12.1 °C∼47.9 °C
$Ψ$	1.5 °C
$W_{i}$	0.000182∼0.018169, $\sum W_{i} =$ 1

Table 5. Experimental Outcomes for Each Method and Topology: Cost (Number of Sensors Deployed).

Method	Topol. 1	Topol. 2_1	Topol. 2_2	Topol. 4_1	Topol. 4_2	Topol. 4_3	Topol. 4_4
Sensors available	(112)	(28)	(84)	(20)	(8)	(33)	(51)
XGBoost	6108 (23)	3186 (11)	2779 (15)	2210 (9)	3002 (5)	3096 (11)	1051 (9)
RCR	8264 (30)	3732 (10)	6272 (25)	2157 (7)	2035 (3)	2288 (9)	2820 (16)
xCR	7434 (39)	3858 (10)	5383 (30)	2073 (6)	2035 (3)	1904 (6)	3602 (17)
PCC	26,382 (75)	6578 (18)	17,673 (54)	3149 (11)	4349 (7)	4994 (16)	10,620 (35)

Table 6. Results of Sensor Network Lifetime Enhancement.

Cycle	1	2	3	4	5	Remaining
Number of sensors	20	18	19	21	21	13

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hsiao, C.-H.; Lin, F.Y.-S.; Yang, H.-J.; Huang, Y.; Chen, Y.-F.; Tu, C.-W.; Zhang, S.-Y. Optimization-Based Approaches for Minimizing Deployment Costs for Wireless Sensor Networks with Bounded Estimation Errors. Sensors 2021, 21, 7121. https://doi.org/10.3390/s21217121
