1. Introduction
Water is essential to human survival: it is needed for drinking, washing, sanitation, and other domestic and industrial processes. That is why one of the United Nations’ sustainable development goals (goal 6) is to ensure the availability and sustainable management of water and sanitation for all. Despite the importance of water, 2.2 billion people globally lack safely managed drinking water, including 785 million without basic drinking water [1]. The situation is more alarming in sub-Saharan Africa, where 42% of people are without a basic water supply [2]. Despite this scarcity, the rate of water loss and non-revenue water (NRW) is high in developing countries, especially in sub-Saharan Africa [3]. High NRW is mostly caused by leakages in the water distribution network (WDN), which sometimes account for more than 70% of the total NRW [4,5]. Leakages are frequent in the WDNs in these regions because the pipes transporting water to the user’s premises are usually installed by the users themselves. The installations rarely meet the standard requirements, especially the recommended burial depth for the pipes. As a result, not all the pipes are buried: some sections lie underground while others are exposed at the surface (especially the final connections to the user’s meters). The exposed pipes are more vulnerable to punctures caused by human activities and to cracks, both of which lead to leaks or bursts in the WDN.
Figure 1 displays an example of a WDN transporting water to a user’s premises in Buea, a town in the southwest of Cameroon. Due to the scarcity of water sources and the increasing demand for water in these regions, driven by population growth and urbanization, water losses must be minimized as far as possible by implementing smart technologies that detect leaks in real time, prompt rapid repair, and require little human intervention. One such technology, which has been used for more than a decade, is the wireless sensor network-based water pipeline monitoring (WWPM) system, which is the most suitable technique to detect and localize leakages in pipelines [6,7].
WWPM systems can be categorized as either invasive or non-invasive, depending on the type of sensors they use. Invasive WWPM systems use intrusive sensors, such as flow and pressure sensors that monitor internal pipeline parameters, such as flow rate and internal pressure. Unlike invasive WWPM systems, non-invasive WWPM systems use non-intrusive sensors, such as piezoelectric patches [
8], force-sensitive resistors [
9,
10], accelerometers [
11,
12,
13], and acoustic sensors [
14], to externally detect changes in the pipeline caused by leaks. These non-invasive methods have become very popular for leak detection because of their low cost, low power consumption, and ease of installation and maintenance. Non-intrusive sensors such as accelerometers have gained popularity in recent years and are widely deployed in WWPM systems, especially those installed on metallic pipes. However, using accelerometers to monitor plastic pipelines, which make up most of the WDNs in developing countries, is challenging because attenuation is higher in plastic pipes and leak signals (vibrations) do not propagate far [
15,
16]. Reliable leak detection requires that the accelerometers be placed very close to each other in order to have a higher spatial resolution [
17]. High-accuracy accelerometers can reliably detect most, if not all, of the leaks that occur in the pipeline. However, the need for shorter inter-sensor distances and the high price of high-accuracy accelerometers increase the overall cost of the WWPM system, making it unsuitable for deployment in developing countries, especially in sub-Saharan Africa. Thus, the active research areas of vibration-based WWPM that have attracted, and continue to attract, much research in recent years include the improvement of leak detection performance, the reduction of energy consumption, the extension of the WWPM system lifespan, and cost reduction [
13,
18].
Recently, in order to reduce the cost of WWPM systems, the use of low-cost MEMS accelerometers has gained a lot of popularity [
19]. Many recent studies have used low-cost MEMS accelerometers in their works to achieve both lower cost and lower power consumption [
11,
12,
13,
20]. However, most of these studies still suffer from low leak detection reliability. Earlier studies, which have used low-cost MEMS accelerometers, proposed future work in the use of data filtering [
11] or multi-sensor data fusion [
12] to improve leak detection reliability. In a recent study, Nkemeni et al. [
21] proposed a fully distributed leak detection solution based on the distributed Kalman filter (DKF), that combines both Kalman filtering and redundant multi-sensor data fusion. The authors applied a DKF algorithm for leak detection in WWPM systems using low-cost MEMS accelerometers and analyzed the leak detection performance and energy consumption of the DKF-based solution and compared it with a local Kalman filter (LKF) solution and a centralized Kalman filter (CKF) solution. Their results revealed that the DKF solution works, and it is a better compromise between LKF and CKF in terms of leak detection reliability and energy consumption.
According to He et al. [
22], different variants of DKF algorithms for low-cost sensor networks exist, which can be classified as either diffusion-, gossip-, or consensus-based, depending on their underlying distributed data fusion strategy. The DKF algorithm used in [
21] was a diffusion-based DKF proposed by Battistelli et al. [
23], which was chosen for its lower communication requirement and fully distributed property, both of which make it a good candidate for real-time leak detection in WWPM systems using battery-powered nodes. However, the question of which category of DKF is optimal in terms of responsiveness to real-time monitoring, leak detection reliability, and energy consumption remains unanswered. This paper evaluates and compares the leak detection performance of three DKF algorithms, including a consensus-based [
24], a gossip-based [
25], and a diffusion-based [
23] algorithm, for leak detection in WWPM systems using low-cost MEMS accelerometers for monitoring plastic water pipes, and demonstrates why diffusion-based DKFs are optimal. This study is novel, and it is the first, to the best of our knowledge, to evaluate the performance of DKF algorithms in the context of WWPM systems.
The first objective of this paper is to select three DKF algorithms from the study of [
22], one from each DKF category, and implement them. The second is to compare their leak detection performance and determine which of the three DKF algorithms is optimal for leak detection in WWPM systems composed of a network of low-cost MEMS accelerometer sensors. The main contribution of this paper is the use of a combined approach that involves both simulations and laboratory experiments to compare the leak detection performance of the three selected DKF algorithms in the context of WWPM. WSN has been used in previous studies to monitor both above-the-ground (surface) pipes [
9] and underground (buried) pipes [
10]. The goal of using WSN in monitoring both above-the-ground and underground pipes is to ensure that leaks are detected in real-time as they occur and also to reduce human intervention [
13]. Although leaks emanating from above-the-ground pipes can be observed visually, WSN-based methods are preferred for monitoring these pipes because manual inspection is laborious, requires regular visits, and responds slowly: in most cases, a leak is detected only after a considerable amount of water has been lost. This study focuses on above-the-ground pipes since they are the most likely to leak.
4. Performance Evaluation
The goal of this study was to compare the leak detection performance of the three DKF algorithms and determine which is optimal for leak detection in WWPM systems composed of a network of low-cost MEMS accelerometer sensors. We start this section by presenting the performance results obtained from the simulations and then end the section by presenting and discussing the results obtained from physical experiments for validation purposes. This will partially answer the question of which DKF algorithm is better and well suited for application in a fully distributed leak detection solution in WWPM systems using low-cost MEMS accelerometers.
4.1. Presentation and Discussion of Simulation Results
Figure 9 depicts the RMSE of the three selected DKF algorithms. For each DKF algorithm, the RMSE of both sensor nodes S1 and S2 are presented on the same plot.
From Figure 9, we see that there is no significant difference between the RMSE of sensor nodes S1 and S2 for any of the DKF algorithms. The results in Figure 9 show that a difference in the RMSE values of S1 and S2 occurred only for ICF and SGG-ICF at the beginning of the simulation. However, this difference becomes insignificant with time, as the RMSE values of both sensor nodes converge to the same value. This implies that all three DKF algorithms compute consistent estimates and thus maintain local consistency in the estimates of neighboring sensor nodes. This agrees with the results published in [
22], which showed that all three DKF algorithms achieved local consistency when applied in a low-cost sensor network target tracking application. The property of local consistency is especially very important for ensuring high reliability and reducing the FAR of a WWPM system implementing DKF, given that it will prevent contradictory outputs from neighboring sensor nodes as revealed in [
21].
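The RMSE curves discussed above can be illustrated in outline as follows. This is a minimal sketch, assuming the RMSE at each time step is computed across repeated simulation runs against a known reference signal; the function name `rmse_per_step` and the array layout are hypothetical, and the text does not specify how its RMSE values were computed.

```python
import numpy as np

def rmse_per_step(estimates, truth):
    """RMSE at each time step across repeated runs.

    estimates: array of shape (runs, steps), one row per simulation run
               (hypothetical layout).
    truth:     array of shape (steps,), the reference signal.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    errors = estimates - truth            # broadcast truth over the runs
    return np.sqrt(np.mean(errors ** 2, axis=0))

# Two hypothetical runs over two time steps, for illustration only.
curve = rmse_per_step([[1.0, 2.0], [3.0, 2.0]], [2.0, 2.0])
print(curve)
```

Computing one such curve per sensor node and overlaying them is one way to check local consistency: the curves of neighboring nodes should converge to the same value.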
To compare the estimation accuracy of the three DKFs,
Figure 10 depicts the RMSE of sensor node S1 for all three DKF algorithms. From
Figure 10, we see that the RMSE of EDKF converges to 0.0667 while the RMSE value of SGG-ICF is slightly greater than that of ICF at the beginning, and they both converge to the value of 0.0397 with time. Moreover, it can be seen that ICF and SGG-ICF have lower RMSE values and can provide better estimation accuracy compared to EDKF. These results are also consistent with the results of He et al. [
22]. Thus, we expect the leak detection performance of ICF and SGG-ICF to be higher than that of EDKF. To further evaluate the performance of the selected DKF algorithms, we carried out simulations on the two-node linear WSN presented in
Section 2.3 using acceleration data collected from the field. The results of the performance of the selected DKF algorithms from simulations are depicted in
Figure 11.
From
Figure 11, it can be seen that ICF and SGG-ICF have leak sensitivities that are significantly higher compared to that of the EDKF. ICF has the highest sensitivity (100%), followed by SGG-ICF with a sensitivity of 95% and finally EDKF with a sensitivity of 65%. As shown in
Figure 10, ICF had the lowest RMSE value, and this explains why it has the highest sensitivity in
Figure 11. However, the overall accuracy of the DKF algorithms revealed that SGG-ICF has the highest accuracy (94%) followed by EDKF (93%), and lastly ICF (88%). This goes further to support the conclusions of [
22], which stated that SGG-ICF is well suited for distributed state estimation in low-cost sensor networks as it provides more flexibility and strikes a balance between estimation accuracy and communication burden. However, based on the claim by Chan et al. [
31], that accuracy may not be an optimal metric for evaluating the performance of a leak detection system, as it is dependent on class proportions, we cannot, at this point, state that SGG-ICF is better. We will need further experiments to conclude which DKF algorithm provides more reliable leak detection.
The simulation results imply that SGG-ICF and ICF are more sensitive to leakages than EDKF. This is because, in ICF and SGG-ICF, the sensor nodes exchange their local information multiple times between measurement updates, whereas in EDKF the sensor nodes communicate with their neighbors at most once between measurement updates. In addition, the event-triggered communication attribute of EDKF (which allows nodes to approximate the local information pairs of their neighbors, and not to communicate when the difference between the predicted state and the last transmitted state is below a defined threshold) reduces its estimation accuracy. However, this attribute gives EDKF a lower communication requirement, unlike SGG-ICF and ICF, whose higher communication requirements will eventually lead to higher power consumption. To validate these simulation results, we present in the next subsection the performance of the three algorithms obtained from experiments conducted on the laboratory testbed.
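The event-triggered rule described above can be sketched as follows. This is a simplified illustration, not the EDKF of [23]: it assumes the trigger is a Euclidean distance between the predicted state and the last transmitted state, and the names `should_transmit`, `count_transmissions`, and the sample values are hypothetical.

```python
import numpy as np

def should_transmit(predicted_state, last_sent_state, threshold):
    """Return True when the prediction has drifted far enough from the
    last transmitted state to justify a radio transmission."""
    drift = np.linalg.norm(np.asarray(predicted_state, dtype=float)
                           - np.asarray(last_sent_state, dtype=float))
    return bool(drift > threshold)

def count_transmissions(states, threshold):
    """Replay a sequence of predicted states and count the transmissions."""
    last_sent = np.asarray(states[0], dtype=float)
    sent = 1  # assumption: the first state is always transmitted
    for state in states[1:]:
        if should_transmit(state, last_sent, threshold):
            last_sent = np.asarray(state, dtype=float)
            sent += 1
    return sent

# Hypothetical pipe-surface acceleration estimates (arbitrary units):
# steady no-leak values followed by a jump when a leak starts.
estimates = [[0.10], [0.11], [0.10], [0.45], [0.46], [0.45]]
print(count_transmissions(estimates, threshold=0.05))  # prints 2
```

With this rule the node stays silent while the estimate is steady and transmits only when the leak changes it, which is the source of the lower communication requirement; the cost is that neighbors work with a slightly stale approximation in between.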
4.2. Presentation and Discussion of Experimental Results
In this section, we present and discuss the results obtained from the laboratory experiment scenarios described in
Section 2.4 in order to validate the simulation results presented in
Section 4.1 above. The results of the performance of the selected DKF algorithms obtained from laboratory experiments are depicted in
Figure 11.
From
Figure 11, the leak sensitivities are 61%, 77%, and 75% for EDKF, ICF, and SGG-ICF, respectively. ICF therefore detected the largest share of the leak events and missed fewer of them than SGG-ICF and EDKF. EDKF, with the lowest sensitivity of 61%, failed to detect 39% of the leak events that occurred in the pipeline, giving it the highest miss detection rate (MDR). This means that a higher proportion of actual leaks will go unnoticed with EDKF than with the other algorithms. In terms of specificity, SGG-ICF is highest with 95%, followed by EDKF with 93% and, lastly, ICF with 80%. SGG-ICF therefore correctly recognized most of the no-leak events and generated fewer false alarms than ICF and EDKF. ICF, with the lowest specificity of 80%, declared 20% of no-leak events as leak events, giving it the highest FAR. For a leak detection system, a false alarm is less costly than a missed detection: a false alarm can be dismissed without affecting the NRW, whereas a missed leak is a true leak on the WDN that goes undetected, keeps losing water, and raises the NRW. Thus, the sensitivity of a leak detection system has a strong effect on the NRW, and we focus more on this metric as a measure of the reliability of a leak detection system.
Sensitivity measures how well the leak detection system detects true leak events, while specificity measures how well it recognizes no-leak events on the pipeline; the DKF algorithm that scores highly on both is therefore the more reliable. This combined effect of sensitivity and specificity is captured by the accuracy metric. From
Figure 11, SGG-ICF has the highest accuracy (92%), followed by EDKF with 90%, and lastly, ICF with 80%. This means that SGG-ICF is more reliable for leak detection compared to ICF and EDKF. This result is consistent with that obtained from simulations.
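The metrics used in this subsection follow from the four entries of a confusion matrix. The sketch below uses the standard definitions, with MDR taken as one minus sensitivity and FAR as one minus specificity, as the percentages above imply; the function name `detection_metrics` and the event counts are hypothetical and are not the counts from the experiments.

```python
def detection_metrics(tp, fn, tn, fp):
    """Compute leak detection metrics from confusion-matrix counts.

    tp: leak events detected        fn: leak events missed
    tn: no-leak events recognized   fp: no-leak events flagged as leaks
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mdr": 1.0 - sensitivity,   # miss detection rate
        "far": 1.0 - specificity,   # false alarm rate
        "accuracy": accuracy,
    }

# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=15, fn=5, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented example the sensitivity is 0.75 and the specificity 0.90, yet the accuracy is 0.875: because no-leak events outnumber leak events, the accuracy sits much closer to the specificity.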
4.3. Comparison of Simulation and Experimental Results
In this subsection, we compare the results obtained from laboratory experiments with the simulation results and from there conclude which algorithm is optimal, based on the values of the performance metrics presented for both simulations and laboratory experiments.
Figure 12 presents the error between the simulation and laboratory results categorized by the DKF algorithm, while
Figure 13 presents the error between the simulation and laboratory results categorized by performance metric. The laboratory results are taken as the reference to compute the error.
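As a rough illustration of taking the laboratory results as the reference, the sketch below computes a relative error from the sensitivity and accuracy percentages quoted in this section. The error definition used here, |simulation - laboratory| / laboratory, is an assumption, since the text does not give the formula, and the names `relative_error`, `simulation`, and `laboratory` are hypothetical. Specificity is omitted because the simulation values are not quoted in the text.

```python
# Percentages quoted in the text for simulations and laboratory experiments.
simulation = {
    "EDKF":    {"sensitivity": 65.0,  "accuracy": 93.0},
    "ICF":     {"sensitivity": 100.0, "accuracy": 88.0},
    "SGG-ICF": {"sensitivity": 95.0,  "accuracy": 94.0},
}
laboratory = {
    "EDKF":    {"sensitivity": 61.0, "accuracy": 90.0},
    "ICF":     {"sensitivity": 77.0, "accuracy": 80.0},
    "SGG-ICF": {"sensitivity": 75.0, "accuracy": 92.0},
}

def relative_error(sim, lab):
    """Relative error in percent, with the laboratory value as reference
    (assumed definition)."""
    return abs(sim - lab) / lab * 100.0

errors = {
    algo: {metric: relative_error(simulation[algo][metric], lab_value)
           for metric, lab_value in lab_metrics.items()}
    for algo, lab_metrics in laboratory.items()
}

for algo, per_metric in errors.items():
    print(algo, {m: round(e, 1) for m, e in per_metric.items()})
```

Under this assumed definition, EDKF shows the smallest sensitivity error and ICF the largest, which matches the ordering discussed in the following subsections.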
4.3.1. Comparison of the Sensitivity of Simulations and Laboratory Experiments
Comparing the results of the sensitivities obtained from simulations with those obtained from the laboratory experiments (
Figure 11), we see a general decrease in the sensitivity obtained from the laboratory experiments relative to the simulations. This decrease can be explained by packet loss during communication between neighboring sensor nodes in the physical experiments, which is absent in the simulations. There is no significant difference between the sensitivity of EDKF obtained from the laboratory experiments (61%) and that obtained from simulations (65%). However, there are significant differences in the sensitivities of ICF and SGG-ICF: 77% and 75%, respectively, from laboratory experiments, compared with 100% and 95%, respectively, from simulations. This result implies that ICF and SGG-ICF are affected far more by packet loss than EDKF. This can be attributed to the high communication requirement of ICF and SGG-ICF (which involves a large number of exchanges between neighboring sensor nodes) and the highly unreliable wireless links in low-cost WSNs. For EDKF, the diffusion property together with its event-triggered nature drastically reduces the number of exchanges between neighboring sensor nodes and thus the packet loss rate. In the physical experiments with EDKF, we observed a very low packet loss rate (<5%). This makes EDKF very appropriate for real-time applications in which the dynamics of the system change quickly. For ICF and SGG-ICF, which require multiple communication rounds between successive measurement updates to achieve excellent estimation accuracy, the overall estimation accuracy clearly depends on the packet loss rate.
However, given that we are dealing with low-cost sensor networks where the communication links are unreliable, this increase in the number of data exchanges between neighboring sensor nodes will increase the likelihood of packets being lost. We observed in the physical experiments that out of five data exchanges that occurred between neighboring sensor nodes during the information fusion stage, only 60% of the transmitted packets were received successfully, meaning that only three out of five messages transmitted were successfully received. This explains the significant difference in the sensitivities of ICF and SGG-ICF obtained from simulations and laboratory experiments. The results in
Figure 12 confirm that EDKF is the DKF algorithm least affected by packet loss due to its lower average error value when considering all three performance metrics. These results agree with the proposition of He et al. [
22], which suggested the use of diffusion-based DKF algorithms in situations where communication resources are limited. Furthermore, it can be seen from
Figure 13 that sensitivity is the performance metric most affected by packet loss. This can be explained by the fact that a leak causes a sudden increase in the measured pipe surface acceleration, so the estimated acceleration differs significantly from the estimate obtained when there was no leakage in the pipeline. The DKF algorithm must react quickly to capture this sudden change. As such, any delay resulting from packet loss and retransmission reduces the chances of detecting the sudden increase in pipe surface acceleration. The response time of ICF and SGG-ICF is slow because they require numerous communication rounds between measurements, and the loss of packets over the unreliable wireless links of low-cost WSNs further degrades their estimation accuracy. Thus, to achieve high sensitivity, measurements must be processed promptly as they are obtained. This means that EDKF is more responsive to real-time leak detection and more attractive for detecting fast leaks in WWPM systems than ICF and SGG-ICF.
4.3.2. Comparison of the Specificity of Simulations and Laboratory Experiments
The results in
Figure 11 reveal that there is no significant difference between the specificity obtained from simulations and that obtained from physical experiments. From the results, we see that SGG-ICF has the highest specificity in the physical experiments as opposed to EDKF, which has the highest specificity from the simulation results. Generally, from the simulation and physical experiments results, we observed that the specificities of the DKF algorithms were high. This means that there is a low likelihood of an alarm being triggered when there is no real occurrence of a leak in the pipeline and this increases the reliability of the leak detection system.
4.3.3. Comparison of the Accuracy of Simulations and Laboratory Experiments
In terms of accuracy, SGG-ICF still has the highest value in the laboratory experiments (92%), slightly lower than in simulations (94%), followed by EDKF (90%, against 93% in simulations). Likewise, the accuracy of ICF obtained from laboratory experiments (80%) is significantly lower than that obtained from simulations (88%). Because the ranking of the accuracy values agrees between simulations and laboratory experiments, we can deduce that SGG-ICF has the highest leak detection performance. However, these results also confirm that accuracy is not a perfect metric for evaluating leak detection techniques, because it depends on class proportions and is therefore biased, as stated earlier in [
31]. For example, in our case (which is also similar to what happens in real life), the number of no-leak events is greater than the number of leak events. Thus, the accuracy is affected more by its ability to correctly recognize the no-leak events than its ability to detect the leak events. This can be seen in
Figure 11 by comparing the accuracy of ICF and EDKF. ICF has a higher sensitivity (which measures the ability to detect leak events) than EDKF. However, its specificity (which measures the ability to correctly recognize no-leak events) is lower than that of EDKF. The fact that EDKF nevertheless has a higher accuracy than ICF means that the accuracy, in this case, is driven more by the ability to correctly recognize no-leak events than by the ability to detect leak events. The class imbalance does not affect the sensitivity and specificity metrics that we have also presented. Thus, by combining both sensitivity and specificity, we still see that SGG-ICF is the best-performing algorithm, which agrees with the accuracy. Although the accuracy of SGG-ICF is slightly higher than that of EDKF, SGG-ICF has a higher communication burden, which will cause it to consume more battery power. Thus, if we consider both leak detection accuracy and power consumption, EDKF is the optimal algorithm among the three when dealing with battery-powered sensor nodes in WWPM applications.
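The dependence of accuracy on class proportions can be made explicit with a short derivation. Here $\pi$ denotes the fraction of events that are leak events, $\mathrm{Se}$ the sensitivity, and $\mathrm{Sp}$ the specificity; these symbols, and the value $\pi = 0.10$ used in the worked lines, are illustrative assumptions and are not reported in the text.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + FN + TN + FP}
                 = \pi\,\mathrm{Se} + (1 - \pi)\,\mathrm{Sp},
\qquad \pi = \frac{TP + FN}{TP + FN + TN + FP} \\[4pt]
\text{EDKF (laboratory)}:&\quad 0.10 \times 0.61 + 0.90 \times 0.93 = 0.898 \\
\text{ICF (laboratory)}:&\quad 0.10 \times 0.77 + 0.90 \times 0.80 = 0.797
\end{align}
```

With leak events forming only a small share of all events, the specificity term dominates, so EDKF obtains the higher accuracy despite its lower sensitivity. This is consistent with the laboratory values quoted above, although the actual class proportions may differ from the assumed 10%.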
4.4. Summary
In this section, we performed simulations and physical experiments to evaluate the leak detection performance of the selected DKF algorithms. The results from simulations and laboratory experiments revealed that ICF had the highest leak sensitivity, while SGG-ICF had the highest accuracy in both and the highest specificity in the laboratory experiments. The leak detection performance of EDKF derived from simulations was close to that obtained from the laboratory experiments. However, there was a significant difference between the leak detection performance of ICF obtained from simulations and that derived from the laboratory experiments. This difference is explained by the loss of packets during communication between sensor nodes in the physical experiments, which was not considered in the simulations.
5. Conclusions
This paper presents the evaluation and comparison of the leak detection performance of three selected DKF algorithms implementing distributed data fusion strategies based on diffusion, gossip and consensus. For novelty, the study used a combined approach that involves simulations and laboratory experiments to compare the leak detection performance of the three selected DKF algorithms. A summary of the laboratory results is depicted in
Table 5.
From the combination of both sensitivity and specificity in
Table 5, it can be concluded that SGG-ICF is the most performant algorithm, which agrees with the accuracy. Though the accuracy of SGG-ICF is slightly higher than that of EDKF, it has a high communication burden compared to EDKF. This high communication requirement will cause it to consume more battery power compared to EDKF. Thus, if we consider both the leak detection accuracy and power consumption, we realize that EDKF is the optimal algorithm among the three algorithms when dealing with battery-powered sensor nodes in WWPM applications, as revealed by both the simulation and laboratory experiment performance results. The laboratory results reveal that the event-triggered diffusion-based DKF is optimal because it has a lower communication burden and is less affected by packet loss, which makes it more responsive to real-time leak detection.
Future work will involve the study of the power consumption of the three DKF implementations so that the combined effect of leak detection performance and energy efficiency can be used to determine which category of DKF is optimal for practical implementation in battery-powered sensor nodes. In addition, although the results obtained from the laboratory testbed were satisfactory, extending the experiments to a field study that deploys a large-scale linear WSN on a real WDN under real-life conditions is also suggested for future work. This is important because the simplified laboratory WDN does not capture all the complications of a real WDN. Moreover, most WWPM studies are limited to simulations and experiments on laboratory testbeds, so extending experiments to real WDNs will contribute to the WWPM literature. We also suggest, for future work, the implementation of machine learning techniques at the decision step of the leak detection algorithm. That is, once the DKF has been used at the feature extraction phase to estimate the pipe surface vibration, the value can be passed to a trained classifier at the decision phase to determine whether a leak is present on the pipeline. Finally, the implementation of leak localization techniques, such as acoustic correlation analysis, will be investigated in future experiments.