1. Introduction
With the application of condition-based maintenance (CBM) in critical infrastructures, wireless sensor networks have gained popularity and are being deployed for sewage flood monitoring in sewer pipeline networks, leakage monitoring in gas pipeline networks, and strength monitoring of megastructures [1,2,3,4,5]. By delivering monitoring data, wireless sensor networks provide a key basis for assessing the condition of assets or equipment, which helps guide the allocation of maintenance resources in time and space. In the time dimension, the guidance determines the maintenance intervals, which initiate maintenance actions such as repair or replacement. In the space dimension, the guidance indicates which parts of the critical infrastructure are likely to need maintenance, so that limited maintenance resources can be targeted at the critical parts or regions. However, wireless sensor network deployment is a great challenge in pipeline networks. Despite the declining price of sensors, the cost remains high for this application, which requires hundreds or thousands of sensor nodes [6,7,8,9]. In the face of a limited budget and increased spatial distribution, a large-scale deployment of a wireless sensor network is impossible to implement for a whole pipeline network [10]. Budget constraints allow only a limited part of a pipeline network to be covered. Therefore, the key problem is to identify the regions where the deployment of a wireless sensor network is necessary rather than merely available.
To the best of our knowledge, there is no effective approach to address this issue. First, the key problem differs from coverage problems, which are the most similar among current research problems. In general, coverage problems are classified into three categories: point coverage, area coverage, and barrier coverage. Whatever its type, a coverage problem is essentially an optimization. The objective is to determine which deployment strategy for a specified sensor field achieves the maximum utility of a wireless sensor network under constraints such as the number of sensor nodes, the area covered by the wireless sensor network, or the lifetime of the sensors [11,12,13,14,15]. Our problem, however, is to determine which parts of the specified sensor field should be prioritized for deployment. Essentially, this is a sorting problem rather than an optimization. Second, our problem arises because a common assumption of many existing approaches does not hold. Whichever deployment method is applied, deterministic or random, a predetermined sensor field underpins it. Given a predetermined sensor field, the coverage ratio, which is often regarded as one of the coverage requirements for wireless sensor networks, can be calculated [16]. However, there is no predetermined sensor field in our problem. The location, boundary, and area of the sensor field are unknown, and these attributes can be determined only once we know which parts are prioritized for deployment. Finally, the current primary deployment metrics are not suitable. In a pipeline network, wireless sensor network deployment is an application in industrial diagnostics, and its coverage requirements fall into the target coverage category [17,18,19,20]. Those requirements measure the quality of service of the sensors’ sensing function, which depends on how the sensor network is deployed. In our problem, we are concerned with measuring the value realized by deciding whether or not to deploy a sensor network. In conclusion, a methodology needs to be developed to address this issue.
In this study, we developed an approach to address this challenge. Our approach is based on risk-based prioritization, through which stakeholders can target resources at the parts of a pipeline network that have a high risk. Resources can therefore be utilized more effectively and efficiently [21]. Risk-based prioritization originates from a risk-based inspection project started by the American Petroleum Institute (API) [22,23,24]. Subsequently, the idea has been applied widely in different industrial contexts, such as in the nuclear industry to prioritize maintenance [25,26,27,28], in gas pipelines to guide the allocation of maintenance resources to the riskiest stretches of pipeline [29], in the optimization of the maintenance of water supply networks [30], and in the risk ranking procedure for bridges [31]. However, limitations can be observed when these approaches are applied to wireless sensor network deployment in pipeline networks.
First of all, current risk-based prioritization offers little practical support for executing the second step of the overall wireless sensor network deployment plan for a pipeline network. As mentioned above, wireless sensor network deployment in pipeline networks can be seen as a two-step process:
Sorting—identifying the critical regions by ranking different deployment regions in terms of risk;
Optimizing—determining the deployment strategy in order to achieve the maximum utility of a wireless sensor network.
In the second step, each optimization must take into account the geographical environment of the installation, the signal interference around the sensor nodes, the area of coverage, and other spatial constraints. In the deployment of pipeline networks, the optimization needs to be conducted over a given area or length, because the coverage ratio must be calculated to measure the quality of service of the wireless sensor network. Risk-based prioritization is therefore required to consider these factors quantitatively. However, the current approach to risk-based prioritization cannot achieve this aim [32].
Furthermore, many stakeholders are often not satisfied with the result, because the analysis process does not provide a strong, credible basis for the estimation of risk uncertainty. In a pipeline network, a wireless sensor network is used to detect failure events at particular locations, and risk uncertainty describes where a failure event is more likely to occur. With an estimate of risk uncertainty, the sensor nodes can be deployed efficiently, avoiding the situation in which considerably more regions are monitored than necessary. Unfortunately, the assumptions and hypotheses used in many approaches cannot be proven right or wrong in the estimation [33]. This leaves the assessment of risk uncertainty to rest on the assessors’ degree of belief. Therefore, decision makers gain little confidence from an analysis that relies on the subjective judgments of the assessors.
Against this backdrop, our proposed approach improves on risk-based prioritization. Its advantages are as follows:
Our approach combines risk-based prioritization with spatial statistics, which quantitatively estimates the risk of any geographic region where the pipeline network is located while taking the area of the region into account. This is very useful for executing the second step, when the deployment/placement scheme must be assessed based on coverage ratio;
Statistical tests are applied before modelling, which provide a strong credible basis for the estimation of risk uncertainty. It is valuable for engineers to determine the deployed region with consideration of the effect of condition monitoring, in particular, detecting the failure events.
The total cost caused by failures in each deployment region is calculated to measure the severity of risk. Then the risks of different deployment regions are calculated based on the severity and uncertainty. By sorting the risks, high risk regions are identified. This allows the deployment of the wireless sensor network to be guided.
The rest of this paper is structured as follows:
Section 2 presents our method, including all the relevant statistical tests and steps for risk-based prioritization;
Section 3 presents a real case study, which examines the applicability of our method and details the process and outcomes; and finally,
Section 4 concludes the paper and outlines future work.
2. Method
To suggest better practice for wireless sensor network deployment, our approach is presented step by step, as shown in Figure 1. Three statistical tests were first conducted in two stages to ensure that the inhomogeneous Poisson point process is a reasonable model for the location dataset of pipeline failure events. Based on that, the properties of the inhomogeneous Poisson point process were applied to estimate the probability of failure occurrence in the different deployment regions. Afterwards, the remaining procedures of risk-based prioritization were executed, and the risks in the different deployment regions were obtained. Finally, the deployment regions were ranked in order of risk.
2.1. Inhomogeneous Poisson Point Process
There is a growing body of contemporary data about where individual pipeline failure events occur. It provides new opportunities to study the spatial pattern of failure occurrence in a pipeline network. The observed spatial locations of the failure events in a pipeline network can be viewed as data in the form of a set of points, irregularly distributed within a region of space (as shown in Figure 2). As one of the techniques of spatial statistics, the spatial point process is widely used to analyze spatial point data [34,35,36]. The inhomogeneous Poisson point process, one of the spatial point process models [37,38,39], was applied in our approach to assess the uncertainty of risk.
An inhomogeneous Poisson point process, $X$, is a random mechanism whose outcomes are a point pattern, $\mathbf{x}$. In our approach, the element, $x_i$, of the point pattern, $\mathbf{x}$, represents the location of a failure event, which is generally referenced using geographical coordinates. For any region, $B$, such as the region of Kansas in Figure 2, the number of failure events occurring in it, $N(B)$, is a well-defined random variable. Based on this quantity, the inhomogeneous Poisson point process is often characterized by two fundamental properties:
Poisson counts: the number of failure events, $N(B)$, has a Poisson distribution;
Independence: if $B_1$, $B_2$, …, $B_m$ are parts of Region $B$ that do not overlap, the counts $N(B_1)$, …, $N(B_m)$ are independent random variables.
According to these properties, the probability of observing $k$ failure events occurring in any region, $B$, can be described by the Poisson distribution, which is generally represented as Equation (1). The quantity $\mu(B)$ is the expected number of failure events occurring in Region $B$, which is calculated by Equation (2) with the intensity $\lambda(u)$. The intensity is interpreted as the average number of failure events occurring per unit area. If the intensity is spatially varying, the process is called an inhomogeneous Poisson point process, to distinguish it from the homogeneous Poisson point process, in which the intensity is constant. The difference between the homogeneous and inhomogeneous Poisson point processes can be observed in Figure 3.
$$P\{N(B) = k\} = e^{-\mu(B)} \frac{\mu(B)^{k}}{k!}, \quad k = 0, 1, 2, \ldots \tag{1}$$
$$\mu(B) = \int_{B} \lambda(u)\, \mathrm{d}u \tag{2}$$
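As a minimal sketch of how the expected count and the Poisson probability interact, the following Python fragment integrates an intensity function over a rectangular region with a midpoint rule and evaluates the Poisson probability of a given count. The function names, the rectangular regions, and both intensity functions are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
import math

def expected_count(intensity, x_range, y_range, steps=200):
    """Approximate the integral of intensity(x, y) over a rectangle (midpoint rule)."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx = (x1 - x0) / steps
    dy = (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * dx
        for j in range(steps):
            y = y0 + (j + 0.5) * dy
            total += intensity(x, y)
    return total * dx * dy

def poisson_pmf(k, mu):
    """Probability of exactly k events for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical intensities (failures per unit area) on the unit square.
def homogeneous(x, y):
    return 3.0          # constant everywhere

def inhomogeneous(x, y):
    return 6.0 * x      # grows toward the eastern edge

mu_whole = expected_count(homogeneous, (0.0, 1.0), (0.0, 1.0))
mu_west = expected_count(inhomogeneous, (0.0, 0.5), (0.0, 1.0))
mu_east = expected_count(inhomogeneous, (0.5, 1.0), (0.0, 1.0))

# Probability of observing no failure in each half under the varying intensity.
p_none_west = poisson_pmf(0, mu_west)
p_none_east = poisson_pmf(0, mu_east)
```

Under the varying intensity, the two halves have equal area but different expected counts, so the probability of seeing no failure differs between them. Under a constant intensity, regions of equal area would have identical count distributions.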
2.2. Statistical Tests for Inhomogeneous Poisson Point Process
In order to describe a real dataset of pipeline network failures well with the inhomogeneous Poisson point process, our approach verifies the relevant properties of the process using three hypothesis tests in two stages. In stage one, the Poisson counts and independence properties were verified by the chi-square goodness of fit test [40,41,42] and the significance test based on Moran’s I [43,44,45], respectively. Passing both tests indicates that the Poisson point process is a suitable model for the dataset. This conclusion is then subject to further review of whether the intensity of failure events is spatially varying (i.e., whether the Poisson point process is homogeneous or inhomogeneous). Therefore, a dispersion test for spatial point pattern was conducted in stage two [46] to determine whether the suitable model for pipeline failure events was the inhomogeneous rather than the homogeneous Poisson point process.
2.2.1. First Test in Stage One: Chi-Square Goodness of Fit Test
To determine whether the Poisson counts property holds, a chi-square goodness of fit test was used, which determines whether data follow a specific probability distribution. In this hypothesis test, the null and alternative hypotheses were as follows:
H0: the number of pipeline failure events in Region $B$ and a given period, $N(B)$, follows a Poisson distribution;
H1: the number of pipeline failure events in Region $B$ and a given period, $N(B)$, does not follow a Poisson distribution.
Using Equation (3), the mean $\hat{\mu}$ of the Poisson distribution was estimated from sample data like those shown in Table 1. The sample data describe the actual number of failures per given period. Based on that, a Poisson distribution with the estimated mean $\hat{\mu}$ was obtained, and the theoretical frequency, $e_j$, was calculated by multiplying the sample size, $n$, by its probability, $p_j$. Then the $\chi^2$ test statistic was obtained by Equation (4). By comparing the $\chi^2$ test statistic with the critical value at the chosen significance level, we could determine whether to reject H0. If the null hypothesis is rejected, the proposed approach is not recommended, because the Poisson point process is not a suitable model for the dataset. If the null hypothesis is not rejected, the next steps can be carried out.
$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^{c} v_j f_j \tag{3}$$
where $c$ is the number of classes of the number of failure events, $v_j$ is the number of failure events of the $j$th class, $f_j$ is the observed frequency, and $n$ is the sample size.
$$\chi^2 = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j} \tag{4}$$
where $e_j$ is the theoretical frequency, the number of degrees of freedom generally equals $c - s - 1$, and $s$ is the number of parameters estimated from the sample.
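The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the failure counts per period are made up, the function name `poisson_gof` is hypothetical, and pooling all counts at or above `max_class` into one tail class is a common convention rather than something the text specifies.

```python
import math
from collections import Counter

def poisson_gof(counts, max_class):
    """Chi-square goodness-of-fit statistic for a Poisson model.

    counts    -- observed number of failures in each period
    max_class -- counts >= max_class are pooled into one final class (assumption)
    """
    n = len(counts)
    mean = sum(counts) / n                       # sample mean of the counts
    observed = Counter(min(c, max_class) for c in counts)
    # Poisson probabilities for classes 0..max_class-1; the tail gets the remainder.
    probs = [math.exp(-mean) * mean ** v / math.factorial(v) for v in range(max_class)]
    probs.append(1.0 - sum(probs))
    expected = [n * p for p in probs]            # theoretical frequencies
    chi2 = sum((observed.get(j, 0) - e) ** 2 / e for j, e in enumerate(expected))
    dof = len(expected) - 1 - 1                  # classes - one estimated parameter - 1
    return mean, chi2, dof

# Hypothetical sample: number of failures observed in each of 100 periods.
sample = [0] * 14 + [1] * 27 + [2] * 27 + [3] * 18 + [4] * 9 + [5] * 5
mean, chi2, dof = poisson_gof(sample, max_class=4)

CRITICAL_5PCT_DF3 = 7.815                        # chi-square table value, 3 d.f., 5% level
reject_h0 = chi2 > CRITICAL_5PCT_DF3
```

With five classes and one estimated parameter there are three degrees of freedom, so the statistic is compared with the tabulated 5% critical value for three degrees of freedom. For other class counts the critical value has to be looked up accordingly.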
2.2.2. Second Test in Stage One: The Significance Test Based on Moran’s I
The second test in stage one inspected whether or not the numbers of failure events in different parts of Region $B$ appear to be correlated. The first step in this procedure was to calculate the observed value of Moran’s I from the pipeline network failure dataset; a significance test was then performed to determine whether the observed value of Moran’s I differed enough from the value expected when the independence property holds. Figure 4 illustrates the process based on a simple example.
As a commonly used spatial statistic to describe spatial autocorrelation, Moran’s I measures the degree to which observations (the number of pipeline failure events in this study) at different spatial locations (the different regions or parts of a pipeline network in this study) are similar to each other. Its calculation is based on two categories of information: the observation and the location. Here, the observation information comprises the numbers of pipeline failure events in the different regions, denoted by $y_i$ for Region $i$. The location information is represented by a spatial weights matrix of dimensions $n \times n$ ($n$ being the number of regions). The element of the spatial weights matrix, $w_{ij}$, reflects the level of spatial proximity between two regions of the pipeline network and is generally set to 1 if Regions $i$ and $j$ are neighbors and 0 otherwise. With both observation and location information, the observed value of Moran’s I can be calculated by Equation (5). In general, the observed value of Moran’s I is compared with its expected value, which can be obtained by $E(I) = -1/(n-1)$. If the observed value is significantly larger than the expected value, it indicates positive spatial autocorrelation; if it is significantly smaller, it indicates negative spatial autocorrelation. If there is no significant difference between the observed and expected values, it indicates spatial independence.
$$I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{5}$$
where $\bar{y}$ is the mean of $y_i$.
In the second step, a significance test is performed. The null hypothesis states that the numbers of failure events in the different regions of the pipeline network are randomly distributed among those regions; the alternative hypothesis is that they are spatially autocorrelated. Under the null hypothesis, the distribution of the test statistic Moran’s I is obtained by calculating all possible values of Moran’s I under rearrangements of the numbers of pipeline failure events over all the regions. Imagine that all the failure numbers in the different regions are picked up and thrown down onto the regions again, with each number falling randomly. The p-value is the proportion of permuted values of Moran’s I that are larger than the observed value. Finally, a decision is made to reject or not reject the null hypothesis according to the level of significance. If the null hypothesis is rejected, the proposed approach is not recommended, because the Poisson point process is not a suitable model for the dataset. If the null hypothesis is not rejected, the next steps can be carried out.
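The two steps can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the region layout (a row of regions in which each region neighbors the next), the failure counts, and all function names are hypothetical. Instead of enumerating all rearrangements, the sketch draws a random sample of permutations, and it counts the observed arrangement as one of them so that the p-value is never zero.

```python
import random

def morans_i(values, weights):
    """Moran's I for region counts `values` and a 0/1 neighbor matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)   # zero if all counts are equal (not handled here)
    return (n / w_sum) * (num / den)

def permutation_test(values, weights, n_perm=999, seed=0):
    """One-sided p-value: share of shuffled layouts with I >= the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    # The +1 terms count the observed layout itself as one permutation.
    return observed, (at_least + 1) / (n_perm + 1)

def chain_weights(n):
    """Hypothetical layout: n regions in a row, each neighboring the next."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

counts = [1, 2, 2, 8, 9, 10]            # made-up failure counts per region
observed_i, p_value = permutation_test(counts, chain_weights(6))
expected_i = -1 / (len(counts) - 1)     # value expected under independence
```

A small p-value suggests positive spatial autocorrelation, in which case the independence property is in doubt. Testing for negative autocorrelation would require counting the permuted values below the observed one instead.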
2.2.3. Test in Stage Two: The Dispersion Test for Spatial Point Pattern Based on Quadrat Counts
Although it can be concluded that the Poisson point process is a suitable model for the dataset of pipeline failure events after stage one, the evidence for inhomogeneity needed to be assessed. Therefore, a dispersion test for spatial point pattern based on quadrat counts was conducted in stage two, which essentially is a chi-squared test to test goodness of fit. It guaranteed that the inhomogeneous Poisson point process is a suitable model, rather than homogeneous Poisson point process.
In the dispersion test for spatial point pattern based on quadrat counts, the quadrats represent the regions nominated for deployment in a pipeline network, and they are required to have an equal area, $a$. Generally, the null and alternative hypotheses in the test are defined as:
H0: the intensity is homogeneous in the Poisson point process based on the dataset of pipeline failure events;
H1: the intensity is inhomogeneous in the Poisson point process based on the dataset of pipeline failure events.
According to the null hypothesis and Equation (2), the numbers of failure events in the different quadrats are realizations of a Poisson distribution with the constant mean, $\lambda a$. Therefore, it is rational to apply the chi-squared goodness of fit test for the Poisson distribution to determine whether or not to reject the null hypothesis.
The test statistic can be calculated by Equation (6). In Equation (6), the quantity, $n_i$, is the number of failure events in Region $B_i$ (or the quadrat), and the distribution of the test statistic is approximately a $\chi^2$ distribution with $m - 1$ degrees of freedom ($m$ is the total number of regions nominated for deployment). By performing a chi-squared test, it can be determined whether or not to reject the null hypothesis.
$$\chi^2 = \sum_{i=1}^{m} \frac{(n_i - \hat{\lambda} a)^2}{\hat{\lambda} a} \tag{6}$$
where the intensity $\lambda$ is estimated by $\hat{\lambda} = n / (m a)$ and the total number of points is $n = \sum_{i=1}^{m} n_i$.
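The quadrat-count statistic is simple enough to compute directly. The sketch below assumes equal-area quadrats and uses made-up counts; the function name `quadrat_dispersion` and the numbers are hypothetical and only illustrate the arithmetic.

```python
def quadrat_dispersion(counts, area):
    """Dispersion test statistic for equal-area quadrats.

    counts -- number of failure events in each quadrat
    area   -- common area of every quadrat
    """
    m = len(counts)                      # number of quadrats
    n = sum(counts)                      # total number of failure events
    intensity = n / (m * area)           # estimated homogeneous intensity
    expected = intensity * area          # expected count per quadrat under H0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return intensity, chi2, m - 1        # statistic and its degrees of freedom

# Hypothetical counts in four quadrats of 10 area units each.
intensity, chi2, dof = quadrat_dispersion([2, 3, 15, 4], area=10.0)

CRITICAL_5PCT_DF3 = 7.815                # chi-square table value, 3 d.f., 5% level
reject_homogeneity = chi2 > CRITICAL_5PCT_DF3
```

In this made-up example one quadrat holds most of the events, so the statistic exceeds the critical value and homogeneity is rejected, which is the outcome that supports an inhomogeneous model.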
2.3. Risk-Based Prioritization
Given the verification above, there is no reason to doubt that the inhomogeneous Poisson point process is a good model for the failure events of a pipeline network. On that basis, the probability of failure occurrence in the different regions can be estimated. According to Equation (1), the probability of a failure event occurring in Region $B_i$ can be represented as Equation (8). To calculate the mean value of this Poisson distribution, $\mu(B_i)$, the integral in Equation (2) is replaced by $\lambda_i a$, which assumes that the intensity within Region $B_i$ is homogeneous while the intensity over the whole pipeline network is inhomogeneous. Based on this assumption, the intensity $\lambda_i$ can be estimated by Equation (7). Although the number of failure events in a pipeline network follows a Poisson distribution, it is worth noting that it was observed over a specified period. Therefore, the probability should be revised to account for that period.
After the probability of failure occurrence has been estimated for the different regions, the consequence of failure occurrence is evaluated for each region. In our approach, the consequence is measured by the total cost for Region $B_i$, which is denoted by $C_i$. The total cost $C_i$ is calculated by summing the costs of the failure events that happen in the region. The cost of each failure event consists of six kinds of costs, which are described in Table 2. Based on these six kinds of costs, the assessment of the failure consequence can consider the capacity of the failure and subsequent events to cause death, injury, or damage to employees and/or the public and the environment. It can also consider the consequences of failure for the business, such as the costs of lost production, the repair and replacement of pipelines, and damage to the company’s reputation.
With the estimated probability and the evaluated consequence, the risk can be determined. In the proposed approach, the risk of each region is defined as the product of the probability and the consequences of the failure occurrence. Based on the obtained risk, all the regions can be ranked in order of risk, then the deployment priority can be given to each region.
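The ranking step can be sketched as below. Several choices here are assumptions made only for illustration and are not confirmed by the text: the probability is taken as the chance of at least one failure in a period, the expected count per period is taken as the observed count divided by the number of observation periods, and the region names, counts, and costs are invented. Because the intensity within a region is estimated as the count divided by the area, the area cancels when the expected count is formed, so it does not appear in the code.

```python
import math

def rank_regions(regions, periods):
    """Rank regions by risk = probability of failure x total failure cost.

    regions -- list of (name, observed_failures, total_cost) tuples
    periods -- number of observation periods covered by the counts (assumption)
    """
    ranked = []
    for name, n_failures, total_cost in regions:
        mu = n_failures / periods             # expected failures per period (assumed)
        probability = 1.0 - math.exp(-mu)     # P(at least one failure) (assumed form)
        ranked.append((name, probability, total_cost, probability * total_cost))
    ranked.sort(key=lambda row: row[3], reverse=True)   # highest risk first
    return ranked

# Hypothetical regions: (name, failures observed over 10 periods, total cost).
regions = [("A", 2, 50_000.0), ("B", 15, 20_000.0), ("C", 4, 120_000.0)]
ranking = rank_regions(regions, periods=10)
priority_order = [row[0] for row in ranking]
```

In this invented example, the region with the most failures is not ranked first, because its total cost is low. This shows why probability and consequence are combined before the regions are sorted.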
4. Conclusions
The results of the case study show that the proposed approach is feasible for directing wireless sensor network deployment toward the critical sensor fields of a pipeline network in the face of a limited budget and increased spatial distribution. By combining spatial statistics with risk-based prioritization, our approach is effective in identifying the sensor fields with the highest priority in a pipeline network, which is useful for targeting the initial deployment where a sensor field is necessary rather than merely available. Additionally, the application of statistical tests provides a strong, credible basis for the analytical conclusion, unlike existing methods whose conclusions are based on the subjective judgments of assessors. Moreover, the analytical conclusion is helpful for the coverage problem and for developing the deployment strategy further. More importantly, the case study gave us the insight that the likelihood of failure occurrence in a pipeline network is spatially heterogeneous. These spatial and geographical variations are useful a priori knowledge for guiding the deployment of wireless sensors, rather than adopting the simple assumption that each sensor field has an equal likelihood of being deployed.
At the same time, the proposed approach has limitations. First, it cannot consider pipeline characteristics such as diameter, flow, and failure mode. This limitation arises because the modeled object changes from a stretch or section of pipeline to the whole district in which the pipeline network is located. In this transformation, the spatial coordinates become a surrogate for other variables that describe failure modes and system characteristics. Thus, many characteristics at the component or system level cannot be modeled, although the approach overcomes the difficulty of insufficient data by aggregating rare failure events that belong to different pipeline sections dispersed across the infrastructure. Second, the application is subject to strict constraints, namely the validation of the properties of the Poisson point process, which is a double-edged sword. Although the validations increase the falsifiability of the proposed approach and reduce its vulnerability to human biases and errors, they significantly reduce its applicability when those properties are violated. Therefore, users should apply the proposed approach only in appropriate situations.
In the future, we plan to incorporate other statistical techniques to estimate the risks of different sensor fields conditional on different failure modes. On that basis, the approach will be useful not only for homogeneous deployment but also for guiding the heterogeneous deployment of wireless sensor networks.