Advances in Assessing the Reliability of Water Distribution Networks: A Bibliometric Analysis and Scoping Review

Abstract: The reliability of Water Distribution Networks (WDNs) is a critical topic that has been explored by many researchers over the last few decades. Nevertheless, this research domain has gained momentum in the last decade. WDN reliability was addressed in the literature using different approaches and techniques. This study presents a bibliometric analysis and scoping review of the progress and gaps in research on WDN reliability. The analysis was performed on a total of 347 articles from 2000 to 2022, which were retrieved from the SCOPUS database. The bibliometric analysis identified trends and gaps by focusing on article output, the citation network of articles, co-authorship and collaboration analysis, co-citation, and clustering analysis. In addition, coupling keyword analysis with thematic map analysis helped identify basic, niche, emerging, and declining research themes in the field of WDN reliability. Three major research themes were found: WDN optimization, reliability assessment, and consideration of GHG emissions and energy cost for WDN expansion. Reliability surrogate measures (RSMs) were found to be one of the most widely researched topics in this field. Performance assessment of various RSMs, as well as the consideration of energy and cost for WDN design and expansion, stood out as the emerging trends for future research in the field of WDN reliability.


Introduction
Water Distribution Networks (WDNs) are subject to hydraulic and mechanical failures. The performance assessment of WDNs, considering both failure and operating conditions, falls under the general term of WDN reliability [1]. Various techniques exist in the literature to conduct a representative reliability assessment of WDNs, including the minimum cut set method [2], Monte Carlo simulation (MCS) [3], node availability [4], and the node-reliability, volume-reliability, and network reliability factors [5]. Most of these techniques require prohibitively long computational time and power when applied to complex real-world networks. For example, computational times of 19.72, 43.95, and 90.5 h were reported for a hydraulic reliability-based design using the MCS method for the standard WDNs Two-loop, GoYang, and Fossolo, respectively [6]. Similarly, for a mechanical reliability-based design using the minimum cut set method, computational times of 1.211, 2.825, and 9.083 h were reported for the Two-loop, GoYang, and Fossolo WDNs, respectively [6]. To overcome this issue, several studies presented reliability surrogate measures (RSMs), such as entropy [7], resiliency [8], network resilience [9], and the modified resilience index [10], to reduce the computational time and reach a near-optimal solution, even when considering multiple objectives.
Some bibliometric reviews concerned with Water Distribution Systems (WDSs) [52] and the resilience of WDNs [53] recently appeared in the literature. A bibliometric analysis quantitatively assesses trends in a specific research field by conducting statistical and metadata analysis of the typical attributes of articles published in a particular domain, including the title, authors, keywords, date of publication, and number of citations [52,53]. It also helps discover the subtle differences in the scientific evolution of a specific domain. More specifically, it can be effectively used for quantifying the research output from different authors; the collaboration between authors, countries, and universities; and the impact of different authors and articles on the advancement of a specific field [54]. The first of these reviews focused on WDSs in general, considering all problems such as optimal design and risk assessment [52], whereas the second focused on the resilience assessment of WDNs [53]. Resilience is, however, a subset of reliability assessment for WDNs. Thus, a broader bibliometric analysis and scoping review is needed to provide a better direction on the use of different optimization tools for reliability-based design, expansion, and maintenance planning of WDNs. Therefore, this study, for the first time, presents a bibliometric analysis and scoping review of reliability assessment tools for managing and designing WDNs. The primary research questions answered by the analysis pertain to the identification of the dominant research themes in the domain, their extent of exploration, the nature of collaboration, key concepts, tools, and applications, as well as research gaps and recommendations for future research on WDN reliability assessment. Thus, the study can be useful to researchers in the field for scoping future research and identifying collaboration opportunities.

Background
Consideration of reliability dates to Goulter and Morgan (1985) [55], who considered redundancy in the pipe network as a measure to ensure reliability in the network. Some studies focused on the consumer's perspective of reliability by assigning a damage function to each consumer that defines the damages caused when the demands are not met [56]. Other studies presented a methodology for reducing the impacts of failures in WDNs by minimizing the negative effects on consumers and operational costs [57]. A recent study focused on the consumers' lack of water supply by identifying the longest suspension time [58]. In addition, inherent deficiencies in WDSs, such as inadequate disinfection, low pressure, intermittent operations, leakages, and corrosion, can decrease the quantity as well as the quality of water reaching the consumers [59]. Fluctuating pressures in WDNs have also led to the development of advanced automated tools for regulating the amount of inflow to the system, including pumps and valves [60]. However, most articles have focused on quantifying the hydraulic and mechanical reliability of WDNs, as these can be accurately modeled. Bao and Mays [1] clearly distinguished between the hydraulic and mechanical reliabilities of WDNs. They defined hydraulic reliability as water demand satisfaction considering uncertainty in parameters, such as demand and pipe roughness coefficient, while mechanical reliability was defined as demand satisfaction considering the failure of one or more pipes. Goulter [61] presented a multi-objective optimization approach for the reliability-based design of WDNs based on the constraint method. Su et al. [2] presented the minimum cut set approach for mechanical reliability estimation of WDNs, which is one of the most widely used approaches. Jacobs and Goulter [62] presented a graph theory-based approach for the reliability analysis of WDNs. Bhave and Gupta [63] represented uncertainty in decision variables by fuzzifying the variables. Shibu and Reddy [64] presented a fuzzy probabilistic approach to transform uncertainty in variables into fuzzy random variables.
Water 2023, 15, x FOR PEER REVIEW
Reliability assessment tools can be classified into two major classes: hydraulic and mechanical. Hydraulic reliability can be estimated using techniques such as MCS, First Order Second Moment (FOSM), and the node-reliability, volume-reliability, and network reliability factors. The most used techniques for mechanical reliability estimation include the minimum cut set approach and the simple probabilistic approach. The classification of the reliability assessment approaches is presented in Figure 1. The following is a summary of the typical application of these tools in the reliability assessment of WDNs:

• The MCS method involves generating random values for uncertain variables, such as nodal demands and/or pipe roughness coefficients. The random variables are usually modeled by assuming that they follow a certain probability distribution with a deterministic mean and a certain percentage of the mean as the standard deviation. The network is then analyzed for each of these random samples of uncertain variables. The reliability is then estimated as the ratio of the number of times the system performed satisfactorily to the total number of samples. The system is in failure condition if at least one of the nodes has any deficit in supply. The advantage of this approach is that it is easy to implement, while the major disadvantage is that it requires a substantial amount of random sample generation, making it a very time-consuming process.

• The FOSM method estimates the covariance matrix of the output parameter(s) as a function of the variance of the model input parameters. To do so, the gradient of the model output with respect to the model input parameters is calculated. The WDN needs to be evaluated for K + 1 scenarios (K being the number of input parameters, plus one base scenario in which the parameters are assumed to be certain). The main advantage of the FOSM method is that it does not require the substantial number of simulations needed in the MCS method. The drawback of the method is that it becomes inaccurate in cases of large variations in the random variables, since the effect of non-linearity in the model becomes dominant.

• The node reliability factor is defined as the ratio of the available outflow volume to the desired outflow volume at a node. The volume reliability factor is defined as the ratio of the total outflow volume to the desired outflow volume for the entire network, considering all the states during the analysis period. The network reliability factor is calculated by multiplying the volume reliability factor by the time factor and the node factor. The advantage of considering these factors is that they provide a more accurate picture of the reliability values when compared to the normal approach followed in the MCS method. The drawback of this method is that it requires significant computational time, as it employs the MCS approach.

• A minimum cut set is defined as a minimal set of components whose collective failure causes a system failure. If any one of the components in the set is in a working state, the system failure would not occur. Thus, for a branched network, the failure of a single component can lead to the failure of the entire network; in a looped network, simultaneous failure of two or more components might be needed to cause system failure. This implies that the size of the cut sets differs from network to network. The reliability of the WDN is estimated as the demand satisfaction considering the failure of each of these minimum cut sets.

• The probabilistic approach assumes that the failure probability of each pipe is the same and thus estimates reliability as the demand satisfaction under the simulated failure of one pipe at a time, aggregated over all failure situations. Different tools exist to determine the failure scenarios, such as reliability block diagrams [65], failure tree analysis [66], etc.
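
The MCS procedure summarized in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited studies: the function name `mcs_reliability`, the normal distribution, the coefficient of variation `cv`, and the fixed per-node `capacities` (a stand-in for the output of a hydraulic simulation) are all hypothetical.

```python
import random


def mcs_reliability(mean_demands, capacities, cv=0.1, n_samples=10_000, seed=0):
    """Estimate reliability as the share of samples with no supply deficit.

    mean_demands: deterministic mean demand at each node
    capacities:   flow deliverable to each node (hypothetical stand-in for
                  the result of a hydraulic simulation)
    cv:           standard deviation expressed as a fraction of the mean
    """
    rng = random.Random(seed)
    satisfactory = 0
    for _ in range(n_samples):
        # Draw one random demand per node (normal, std = cv * mean),
        # truncated at zero so that demands stay physical.
        demands = [max(0.0, rng.gauss(mu, cv * mu)) for mu in mean_demands]
        # The system fails if at least one node has any deficit in supply.
        if all(d <= c for d, c in zip(demands, capacities)):
            satisfactory += 1
    return satisfactory / n_samples


# Hypothetical three-node example: capacities sit 20% above the mean demands.
means = [10.0, 25.0, 15.0]
caps = [12.0, 30.0, 18.0]
reliability = mcs_reliability(means, caps, cv=0.1)
print(f"Estimated reliability: {reliability:.3f}")
```

In a real study, the capacity check would be replaced by a full hydraulic analysis of the network for every sample, which is where the long computational times reported for MCS originate.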

Methodology
The methodology of the analysis conducted in this study involved three steps: (1) data extraction; (2) bibliometric analysis; and (3) content analysis. The framework of the methodology is presented in Figure 2. The research questions addressed in the study are presented first, followed by the details of each of these steps.

Research Questions
This bibliometric analysis will address the following research questions (RQs):

• What trends and patterns can be detected by analyzing articles on WDN reliability in terms of:
a. The number of relevant publications and citations.
b. The countries that contributed the most to the knowledge base.
c. The top journals that have published the most cited articles on the topic.
d. The top articles in the literature that have the greatest impact in terms of citations as well as page rank analysis.
e. The top authors in the literature that have the greatest impact in terms of number of publications and citations.
• What is the nature of collaboration in the field of WDN reliability as evidenced by co-authored publications?
• What are the key concepts, tools, and applications that have been explored in the field of WDN reliability, and how are they related?
In addition, the article will discuss research gaps, and make recommendations for future research on WDN reliability assessment and related applications.

Data Extraction
Data extraction included the retrieval of documents related to WDN reliability from different literature databases. In the present study, "SCOPUS" was used as the search database due to its wider range of journal coverage compared to databases such as "Web of Science", "ERIC", or "IEEE Xplore". The search focused only on peer-reviewed journal articles. The search found 808 documents when the query (TITLE (water AND distribution AND network*) OR TITLE (water AND supply AND network*) OR TITLE (water AND distribution AND system*) AND TITLE-ABS-KEY (reliability)) was used. Upon limiting the search to document type "article", language "English", and publication years "2000-2022", 347 articles were selected for detailed analysis.

Bibliometric Analysis
Bibliometric analysis involves quantitative and statistical analysis of the literature [59]. In the present study, the analysis was carried out by determining the following aspects: (1) the number of documents and citations in a defined period; (2) the top authors publishing articles in the field of WDN reliability by number of publications and citations, and their collaborations; (3) the top articles by the number of citations; and (4) the top countries producing articles in the field of WDN reliability, and their collaborations.
The analysis was performed using two open-source software packages, VOSviewer V.1.6.18 and Bibliometrix V.4.0.1, as suggested by past studies [52,53]. VOSviewer was employed to generate the collaboration maps for different authors and countries, as well as the citation network of authors. Bibliometrix was used to present statistical data regarding the number of publications and citations for different authors, articles, countries, and journals [67]. To determine the ranking of the articles in the co-citation network, page rank was used as a measure. Page rank is a measure of how often and where the article has been cited [53]; it implies that, if the article has been authored by highly cited authors and cited by highly cited articles, the page rank for that article would be high. Page rank for the journal citation network was estimated following the formulation in [68,69]. In that formulation, $\pi^{(k)T}$ is the page rank vector at the $k$th iteration; $p_{n-1}$ and $p_{n-2}$ are the numbers of papers published in journal $J$ in the years $n-1$ and $n-2$, $n$ being the current year; and $\pi^{(k)T}$ is updated iteratively using $H$, the very sparse, raw substochastic hyperlink matrix; $\alpha$, a scaling parameter between 0 and 1; $a$, the binary dangling node vector; and $e^{T}$, a row vector of all 1s.
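
As an illustration of how such a page rank vector can be computed, the sketch below implements one standard form of the power iteration that is consistent with the symbols $H$, $\alpha$, $a$, and $e^{T}$ defined above: $\pi^{(k+1)T} = \alpha\,\pi^{(k)T} H + (\alpha\,\pi^{(k)T} a + 1 - \alpha)\,e^{T}/N$. Here $N$ (the number of nodes) is a symbol introduced for this sketch to avoid a clash with the year index $n$. Whether the study used exactly this variant is an assumption, the journal-level normalization by $p_{n-1}$ and $p_{n-2}$ is not included, and the function name `pagerank` and the example matrix are hypothetical.

```python
def pagerank(H, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the page rank vector of a row-substochastic matrix H.

    H[i][j] is the share of node i's outgoing citations that point to node j;
    a row of zeros marks a dangling node (one that cites nothing).
    """
    n_nodes = len(H)
    # Binary dangling-node vector: 1 where the row of H is all zeros.
    a = [1.0 if sum(row) == 0 else 0.0 for row in H]
    pi = [1.0 / n_nodes] * n_nodes  # uniform starting vector
    for _ in range(max_iter):
        dangling_mass = sum(p * ai for p, ai in zip(pi, a))
        # Term shared by every node: (alpha * pi^T a + 1 - alpha) / N
        shared = (alpha * dangling_mass + 1.0 - alpha) / n_nodes
        new_pi = [
            alpha * sum(pi[i] * H[i][j] for i in range(n_nodes)) + shared
            for j in range(n_nodes)
        ]
        converged = sum(abs(x - y) for x, y in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Hypothetical network: node 0 cites nodes 1 and 2, node 1 cites node 2,
# and node 2 cites nothing (dangling).
H_example = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
ranks = pagerank(H_example)
print([round(r, 4) for r in ranks])
```

The node that is cited by both others receives the highest rank, and the ranks sum to one, which is the behavior the description of page rank in the text relies on.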

Content Analysis
The content analysis consisted of two parts:
Cluster analysis: The co-citation network of the authors in the field of WDN reliability was generated to formulate clusters for different authors based on their common citations. The top articles by global citations from each cluster were then selected to identify the common themes of the various journals. Thereafter, the clusters were generated for author keywords to determine the focus areas of different articles based on the commonly used keywords in them.
Thematic map generation: Thematic maps were generated based on the keywords of the various articles to determine the basic themes explored, the most widely explored themes, niche themes, and emerging themes. A theme is decided based on the common keywords used by a group of articles. Density and centrality were used to measure the development and influence of a particular theme. Density measures the strength of internal ties among all the keywords within a theme and so is a measure of the development of that theme. Centrality represents the extent of connection to other themes and can be used to measure the influence of the theme in the research field. The following equations were used to calculate the centrality and density of a theme [68]:

$$c = 10 \times \sum e_{hk}$$

$$d = 100 \times \frac{\sum e_{ij}}{w}$$

where $c$ is the centrality and $d$ is the density of the theme; $e_{hk}$ is the count for collective occurrence of keywords $h$ and $k$ belonging to different themes; $e_{ij}$ is the count for collective occurrence of keywords $i$ and $j$ belonging to the same theme; and $w$ is the total keyword count for the theme.
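
The two measures can be illustrated with a short Python sketch. It assumes the commonly used Callon form of the measures, centrality $= 10 \times \sum e_{hk}$ and density $= 100 \times \sum e_{ij} / w$, with $e_{hk}$, $e_{ij}$, and $w$ as defined above; the function name `callon_measures`, the data layout, and the keyword counts are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def callon_measures(cooccurrence, themes, theme):
    """Return (centrality, density) for one theme.

    cooccurrence: dict mapping frozenset({keyword_1, keyword_2}) to the number
                  of articles in which both keywords occur together
    themes:       dict mapping a theme name to its set of keywords
    """
    members = themes[theme]
    # Internal ties: co-occurrences of two keywords of the same theme (e_ij).
    internal = sum(
        cooccurrence.get(frozenset(pair), 0)
        for pair in combinations(sorted(members), 2)
    )
    # External ties: exactly one keyword of the pair lies in this theme (e_hk).
    external = sum(
        count for pair, count in cooccurrence.items()
        if len(pair & members) == 1
    )
    w = len(members)  # total keyword count for the theme
    return 10 * external, 100 * internal / w


# Hypothetical keyword themes and co-occurrence counts.
themes = {
    "optimization": {"optimization", "genetic algorithm"},
    "reliability": {"reliability", "resilience"},
}
cooccurrence = {
    frozenset({"optimization", "genetic algorithm"}): 4,
    frozenset({"reliability", "resilience"}): 6,
    frozenset({"optimization", "reliability"}): 3,
    frozenset({"genetic algorithm", "resilience"}): 1,
}
print(callon_measures(cooccurrence, themes, "optimization"))  # (40, 200.0)
print(callon_measures(cooccurrence, themes, "reliability"))   # (40, 300.0)
```

In this toy case both themes are equally connected to the rest of the field (same centrality), while the "reliability" theme is more internally developed (higher density).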
Conclusions were drawn in terms of the niche themes that require more exploration, emerging themes, and future research directions.

Trend in Articles Output
The trend in the number of publications and citations with respect to time is presented in Figure 3. The number of publications has increased gradually over the years, peaking in 2020 (39 articles), with a slight decline in 2021. The number of citations, however, was remarkably high in 2000 and then declined gradually over subsequent years, apart from a sudden spike in 2005. The highest number of citations was in 2014 (939 citations), after which the number declined again. The figure shows that the number of citations has fluctuated over this period, despite an increasing trend in the number of publications in this field. The growth in publications is consistent with findings from a recent literature review [70] that showed an increasing trend in the number of publications in the field of WDN optimization. It is also in agreement with findings from a similar analysis considering publications on the resilience assessment of WDNs [53]. This reflects the growing interest in the field of WDN reliability.

Citation Analysis of Articles
The contribution and impact of articles is depicted by citation analysis. Figure 4 shows the citation network of different articles with a minimum of twenty citations. This limit was selected to obtain a clear citation network map with readable fonts. The size of the node depicts the number of citations, while the links between the various nodes depict the collaboration between them. The distance between the nodes depicts the association between any two nodes in terms of co-citation. Three clusters are identified in Figure 4. Cluster 1 groups articles focusing on reliability/resilience assessment of WDNs. Cluster 2 groups articles focusing on reliability-based design of WDNs, while cluster 3 focuses on the optimal design of WDNs considering aspects of entropy, resiliency, and reliability.
The ten articles with the highest number of citations are presented in Table 1. The article [8], published in Urban Water Journal, is the most cited document. The probable reason for this is that the article introduced the resilience index as a surrogate for WDN reliability; this index is one of the most widely used RSMs. The ratio of local to global citations is 17.60%, implying that a significant fraction of the total citations fall in the field of WDN reliability. The Normalized Local Citations (NLC) and Normalized Global Citations (NGC) are calculated as the ratio of the actual (local/global) citations of the article to the average number of citations of documents with the same year of publication [70]. The NLC and NGC for [8] are 5.39 and 5.60, respectively, implying that it has around five times more citations than similar articles published in the same year. The second most cited article is [9], published in the Journal of Water Resources Planning and Management (JWRPM). The high number of citations may be credited to the fact that the paper introduced network resilience, another widely used RSM. The NLC and NGC values are 3.37 and 3.42, respectively, which shows a higher citation score (around three times) when compared to similar articles. The third highest number of citations is obtained for [71], published in Engineering Optimization, as it employs resilience for the multi-objective design of WDNs. Its NLC and NGC values are also greater than 1, implying that it has been cited more often on average than similar articles published in that year. This shows that the use of RSMs, such as resilience and network resilience, has gained wide popularity in the last 22 years.
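
The normalization behind NLC and NGC can be sketched as follows. The sketch assumes that a normalized citation score is the citation count of an article divided by the mean citation count of all articles in the data set published in the same year; the function names, the record layout, and the numbers are hypothetical and are not taken from Table 1.

```python
from collections import defaultdict


def normalized_citations(records):
    """Divide each article's citations by the mean for its publication year.

    records: list of dicts with the keys 'id', 'year', and 'citations'
    """
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["year"]].append(rec["citations"])
    year_mean = {year: sum(c) / len(c) for year, c in by_year.items()}
    scores = {}
    for rec in records:
        mean = year_mean[rec["year"]]
        # Guard against a year in which no article was cited at all.
        scores[rec["id"]] = rec["citations"] / mean if mean > 0 else 0.0
    return scores


def local_to_global_ratio(local_citations, global_citations):
    """Share (in %) of an article's citations that come from the data set."""
    return 100.0 * local_citations / global_citations


# Hypothetical records: two articles from 2000 and one from 2001.
records = [
    {"id": "A", "year": 2000, "citations": 30},
    {"id": "B", "year": 2000, "citations": 10},
    {"id": "C", "year": 2001, "citations": 5},
]
print(normalized_citations(records))   # {'A': 1.5, 'B': 0.5, 'C': 1.0}
print(local_to_global_ratio(22, 125))  # 17.6
```

A score above 1 means that an article is cited more often than the average article of its publication year, which is how the values reported for the top articles are interpreted in the text.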


Co-Author Analysis
A total of 740 authors were extracted from 34 countries. The ten most influential authors by number of citations and publications are presented in Table 2. The statistics in Table 2 are based on the 347 articles shortlisted for the analysis. Farmani R. has the highest number of citations (779), followed by Tanyimboh T.T. and Walters G.A. (646 and 609, respectively). Farmani R. has highly cited articles primarily focused on evolutionary algorithms, reliability assessment, and the use of artificial neural networks (ANN) for pipe burst rate prediction. Tanyimboh T.T., who started publishing significantly in 2000 (and even before), has the highest number of publications in the field, with a total of 20 publications; Farmani R., who started publishing in 2005, has the second highest number of publications, with 11. The major works by Tanyimboh T.T. involve optimal design of WDNs, reliability assessment, and use of RSMs, such as entropy and resiliency. Walters G.A. has primarily published works on WDN design, considering aspects such as multi-objective design, calibration of WDN models, consideration of uncertainty, tank simulation, maintenance, and rehabilitation, as well as risk and vulnerability assessment of WDNs. Jeffrey P. and Yazdani A. started publishing much later, in 2011, and have published only 4 articles in this field. However, they have a considerable number of citations (458 each). Dissemination of professional and technical information to the civil engineering professional was a major goal of the research field. This was accomplished through a variety of publications and information products, including 35 professional and technical journals.
To better understand and visualize the collaboration among the authors, a collaboration map is presented in Figure 5. The map was generated using VOSviewer by setting the minimum number of documents to three and the minimum number of citations to 20. Of the shortlisted authors, the top 30 are shown in order to present a clear and readable picture of collaboration. The bigger the node size, the larger the number of publications by the author. The closer two nodes are, the greater the association between them in terms of co-citation. The thicker the link between two nodes, the stronger the collaboration between the two. Three clusters can be seen; each cluster represents a sub-community with strong collaboration between its authors. Cluster 1 comprises articles on the optimal design of WDNs using evolutionary algorithms and the application of other soft computing tools. Cluster 2 comprises articles on optimal WDN design, pipe failure prediction models, and optimal valve placement models. Cluster 3 comprises articles on reliability-based design of WDNs and the use of surrogate measures for reliability assessment. In cluster 1, Farmani R. has the highest number of publications, while Savic D.A. and Tanyimboh T.T. have the highest number of publications in cluster 2 and cluster 3, respectively. The links between the various authors represent the collaboration in terms of co-authorship of various documents. The topology appears quite dense within each community, depicting strong collaboration among the authors. However, the sparse topology between the green and blue clusters indicates that more collaborative work should be carried out between the authors of these two communities.
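
The thresholds used for the collaboration map (at least three documents and at least 20 citations per author) can be illustrated with a small Python sketch that counts co-authorship links among the authors who pass both thresholds. This is not VOSviewer's algorithm, only a simplified illustration of the filtering and link-counting idea; the function name `coauthor_network`, the record layout, and the author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers, min_docs=3, min_citations=20):
    """Return (kept_authors, link_weights) for a co-authorship map.

    papers: list of dicts with the keys 'authors' (list of names) and
            'citations' (citation count of the paper)
    """
    docs = Counter()
    cites = Counter()
    for paper in papers:
        for author in set(paper["authors"]):
            docs[author] += 1
            cites[author] += paper["citations"]
    # Keep only authors that satisfy both thresholds.
    kept = {
        author for author in docs
        if docs[author] >= min_docs and cites[author] >= min_citations
    }
    # Link weight = number of papers co-authored by a pair of kept authors.
    links = Counter()
    for paper in papers:
        for pair in combinations(sorted(set(paper["authors"]) & kept), 2):
            links[pair] += 1
    return kept, links


# Hypothetical data: Authors A and B co-author three papers; Author C has
# a single paper and is therefore filtered out.
papers = [
    {"authors": ["Author A", "Author B"], "citations": 10},
    {"authors": ["Author A", "Author B"], "citations": 10},
    {"authors": ["Author A", "Author B"], "citations": 10},
    {"authors": ["Author A", "Author C"], "citations": 5},
]
kept, links = coauthor_network(papers)
print(sorted(kept))  # ['Author A', 'Author B']
print(dict(links))   # {('Author A', 'Author B'): 3}
```

In a map of this kind, the link weight corresponds to the thickness of the line between two authors, and the document count corresponds to the node size.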

Countries Network Map
In order to explore the global distribution of publications in this research field, a network map of collaboration among countries was generated. A network of 19 countries is shown in Figure 6. The size of a node represents the production volume by number of documents, while the links represent collaboration between countries. The distance between two nodes depicts their association in terms of co-citation. Five clusters can be identified and are presented in Table 3. Cluster 1 comprises the US, Iran, Canada, Korea, and Qatar; this cluster has the highest number of publications (273). Cluster 2 includes the UK, Italy, South Africa, Portugal, China, France, and Norway; this cluster has the highest number of citations (4738). The United States (US), the United Kingdom (UK), and China are the countries with the highest number of publications (93, 84, and 76 articles, respectively). The US belongs to cluster 1, while the UK and China belong to cluster 2. This shows that the UK and China strongly collaborate with each other, while the US has minimal collaboration with the UK and China. However, the country producing documents with the highest number of citations is the UK, followed by Italy and the US. The substantial number of publications in the UK can be attributed to the fact that the University of Exeter has a strong partnership with the private sector in an attempt to regulate water laws so as to reduce leakage levels and improve the water quality and level of service of the water infrastructure [72]. In the US, the American Society of Civil Engineers (ASCE), founded in 1852, is one of the oldest and largest bodies working on the development, maintenance, and enhancement of civil engineering infrastructure. The emerging countries in this research domain are Qatar, Portugal, Norway, and the Czech Republic, with 3, 7, 8, and 8 publications, respectively. Portugal and Norway belong to cluster 2, while Qatar and the Czech Republic belong to clusters 1 and 4, respectively. While cluster 2 has the highest number of citations, cluster 3 has the lowest. It can be inferred that the countries belonging to clusters 4 and 5 (Israel, Poland, Czech Republic, Germany, and Spain) need to collaborate with the countries of other clusters to improve their scientific contributions.

Journal Impact
To study the distribution of publications among journals in the field of WDN reliability, Table 4 presents the publications, citations, and indexing of various journals. JWRPM is one of the oldest journals in the field. Its first publication in this field dates to 2001, and it has the highest number of publications and citations in this field. This may be attributed to the fact that JWRPM belongs to ASCE, based in the US, one of the countries with the highest number of publications. The journal focuses on advancing knowledge of civil engineering infrastructure and therefore has many publications on the topic of WDNs.

Co-Citation and Clustering Analysis

Co-Citation Analysis
Co-citation analysis was conducted using the bibliometric information of the articles. A co-citation network comprises articles that have been cited by the articles considered for the analysis. Articles belonging to a common theme (commonly cited by a group of authors) are aggregated into one cluster. An author co-citation map was generated using Bibliometrix, as shown in Figure 7. The minimum number of citations was fixed to three and the minimum number of articles to 20. As shown in Figure 7, three clusters can be identified. The top ten articles by number of citations from each cluster are presented in Table 5. Betweenness is a measure of how often a node falls on the shortest path between two other nodes. Thus, a higher value of betweenness implies that the node has been cited a greater number of times by the authors in the co-citation network.
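The two quantities used here, co-citation counts and betweenness, can be illustrated with a small Python sketch. It is not the Bibliometrix implementation: the reference lists are hypothetical, the function names are chosen for illustration, and betweenness is computed with Brandes' algorithm on an unweighted, undirected graph, which is an assumption about how the network is treated.

```python
from collections import Counter, deque
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited by the same article."""
    counts = Counter()
    for refs in reference_lists:
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was visited twice.
    return {v: b / 2.0 for v, b in score.items()}

# Hypothetical reference lists of three citing articles.
reference_lists = [["R1", "R2"], ["R2", "R3"], ["R1", "R2"]]
counts = cocitation_counts(reference_lists)

# Link two references whenever they are co-cited at least once.
adjacency = {r: set() for refs in reference_lists for r in refs}
for a, b in counts:
    adjacency[a].add(b)
    adjacency[b].add(a)

scores = betweenness(adjacency)
```

In this toy network R2 is co-cited with both R1 and R3, so it lies on the only shortest path between them and is the only node with a non-zero betweenness.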
Cluster 1 groups articles on reliability-based design of WDNs and the application of optimization techniques. These articles primarily focus on the development and application of optimization tools for single and multi-objective reliability-based design of WDNs. One study had the highest page rank in this cluster [8], as it introduced resilience as a surrogate for reliability; this has been employed by numerous studies as an RSM. Another study [20] had the second highest page rank. That article employs the LPG method for the optimal design of WDNs, which has subsequently been employed by many studies published in high impact journals. Cluster 2 groups articles focusing on reliability assessment techniques and simulation methods. The highest citation count in this cluster was for the EPANET manual [77], which has been used by a substantial number of articles as a simulation tool. The second highest number of citations belongs to a study that introduced a simulation methodology for analyzing the reliability of WDNs [78]. The third highest was for a study that introduced the MCS method for hydraulic reliability estimation; this is now one of the most widely used techniques [1]. Cluster 3 groups articles focusing on the use and analysis of various RSMs as substitutes for WDN reliability. In this cluster, the study that introduced a modified resilience index as a surrogate for WDN reliability was the most cited article [10]. A comparative study on the usage of various RSMs, such as entropy, resilience, and network resilience, was presented in one article [79], which is therefore one of the most widely cited articles. Some studies [71,80,81] presented entropy as a surrogate for WDN reliability, while other studies [74,75,79] presented a comparative analysis of the various RSMs.
Clustering of articles was conducted based on the most frequently used keywords, as shown in Figure 8. Keywords were aggregated into one cluster if they commonly appeared in a group of articles. The links between nodes indicate that the keywords appear in articles that have collaborations among themselves. The thickness of the links depicts the extent of collaboration, while the distance represents the co-citation. Similar keywords are grouped into one, as they imply the same meaning. The most frequent keywords, their grouping, and the number of occurrences are presented in Table 6. The most widely used keyword was "water distribution network(s)" and related keywords, with 228 occurrences, followed by keywords such as "reliability" and "optimization", with 116 and 48 occurrences, respectively.
Five clusters were identified, as shown in Figure 8. Cluster 1 (red) consists of keywords such as water distribution system(s), hydraulic and mechanical reliability, resilience, uncertainty, entropy, demand driven analysis, hydraulic analysis, etc. Thus, the focus of this cluster is on articles concerned with WDN design that consider the aspects of reliability, resilience, entropy, uncertainty, etc. The largest node in cluster 2 (green) is optimization, linked with other keywords such as multi-objective optimization, water quality, rehabilitation, and operation. Thus, the articles in this cluster primarily focus on aspects of multi-objective design, operation, and rehabilitation of WDNs, incorporating aspects of optimal design and water quality. Cluster 3 (blue) involves keywords such as reliability, vulnerability, redundancy, and graph theory. Thus, the focus of this cluster is on reliability assessment tools, considering other aspects such as vulnerability and redundancy. Cluster 4 (yellow) comprises the keywords "genetic algorithm" (GA), "EPANET", and "MATLAB". This shows that the application of GA, linked with EPANET and MATLAB, forms a sizable portion of articles in this field. The fifth and final cluster (purple) comprises two keywords, "algorithms" and "design". Thus, the articles that incorporate various algorithms for optimizing WDNs are presented in this cluster.
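The keyword grouping and counting described above can be sketched as follows. This is a minimal illustration, not the procedure used to build Table 6 or Figure 8: the synonym map, the sample keyword lists, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym map: variant forms collapse to one canonical keyword.
SYNONYMS = {
    "water distribution networks": "water distribution network",
    "optimisation": "optimization",
}

def normalize(keyword):
    """Lower-case a keyword and map known variants to a canonical form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)

def keyword_statistics(articles):
    """Return (occurrence counts, co-occurrence counts) over all articles."""
    occurrences = Counter()
    cooccurrences = Counter()
    for keywords in articles:
        # Count each keyword once per article, after merging synonyms.
        unique = sorted({normalize(k) for k in keywords})
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences

# Hypothetical author keywords of three articles.
articles = [
    ["Water distribution networks", "Reliability", "Optimization"],
    ["Water Distribution Networks", "reliability"],
    ["water distribution network", "Optimisation", "genetic algorithm"],
]
occ, cooc = keyword_statistics(articles)
```

The occurrence counts correspond to node sizes in a keyword map, and the co-occurrence counts to the links between nodes; a clustering algorithm would then be applied to the co-occurrence network.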

Thematic Clustering and Research Trends
A thematic map was generated using the author keywords to determine the basic, niche, and emerging themes, as well as the widely explored and unexplored themes. The thematic map is shown in Figure 10.

The size of a bubble represents the number of articles that have explored the keywords within a theme. Thus, bigger bubbles represent a greater number of articles that have explored a particular theme. There are four quadrants in the map. The bottom right quadrant represents the themes with high centrality but low density, constituting the basic themes. These are the themes that have not yet fully developed but play a vital role in the research field. Topics such as optimization, hydraulic reliability, redundancy, and hydraulic analysis constitute the basic themes in this field. The grouping of the keywords indicates that the consideration of redundancy in hydraulic and demand driven analysis constitutes one research field, while multi-objective genetic algorithms and hydraulic reliability have been employed together to a significant extent. The top right quadrant represents the motor themes; these are the themes with high density and centrality. These themes are fully developed and are vital in the research domain. The consideration of EPANET and MATLAB for mechanical reliability is revealed as a motor theme. This, however, does not necessarily mean that there is no scope for future development in this research domain. The top left quadrant consists of the niche themes, which are fully developed but have less relevance in the research domain. Thus, these themes are of limited importance for the research field. Topics such as mathematical modeling, Bayesian networks, and cascading failures are niche themes. The bottom left quadrant represents the themes with low density and centrality. These are the topics that have been less developed and thus currently have low relevance. These themes are the emerging themes. Topics such as the consideration of CO2 emissions, energy, and cost were underexplored but are emerging themes in the research domain. Network analysis, simulation, and hydraulic models are emerging themes that have been significantly explored. Thus, future works should focus on the consideration of CO2 emissions, energy, and cost for the multi-objective design of WDNs, as well as the development of hydraulic models for network analysis and simulation.
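The quadrant logic of the thematic map can be expressed as a small classification rule. The sketch below is only an illustration: the centrality and density values are invented and are not read from Figure 10, and splitting the quadrants at the median of each axis is an assumption about how the map is divided.

```python
from statistics import median

def classify_theme(centrality, density, centrality_cut, density_cut):
    """Map a (centrality, density) pair to a thematic-map quadrant."""
    high_centrality = centrality >= centrality_cut
    high_density = density >= density_cut
    if high_centrality and high_density:
        return "motor"      # top right: developed and central
    if high_centrality:
        return "basic"      # bottom right: central but not yet developed
    if high_density:
        return "niche"      # top left: developed but peripheral
    return "emerging"       # bottom left: low centrality and low density

# Invented (centrality, density) values, for illustration only.
themes = {
    "optimization": (0.8, 0.2),
    "EPANET": (0.9, 0.9),
    "Bayesian networks": (0.1, 0.7),
    "CO2 emissions": (0.2, 0.1),
}

# Assumption: quadrants are split at the median of each axis.
centrality_cut = median(c for c, _ in themes.values())
density_cut = median(d for _, d in themes.values())

quadrants = {
    name: classify_theme(c, d, centrality_cut, density_cut)
    for name, (c, d) in themes.items()
}
```

With these invented values, the four themes fall into the same quadrants that the text reports for them, which makes the reading of Figure 10 easy to check against the rule.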

Discussion
The results of the bibliometric analysis provided several insights into the development of research on the reliability of WDNs. One important finding is that the number of publications in the field has increased over the last two decades. This was also discussed in a recent article [53], which reported an increasing trend in the number of articles produced in the field of WDN design. Article [8] was found to be the most cited article; this finding coincides with the study in [52], which showed an increasing trend in articles considering resilience in WDN design. Farmani R. and Tanyimboh T.T. are the most cited authors; they have published articles on the use of EAs for WDN design, reliability assessment, and the use of RSMs, such as entropy and resiliency. This aligns with the conclusions of [52,53] and the increase in the use of EAs and RSMs for WDN design. Article [77] was found to have the highest page rank in the co-citation network; this is because it is the EPANET manual, which has been employed in numerous studies as the simulation tool. The articles were explored further to determine the research areas and the associated research gaps, which are presented in the following subsections.

Major Research Areas
The major research areas are classified into three main categories:

1. Reliability-based single and multi-objective design of WDNs: The works in this category consist of the application and development of improved optimization tools for reliability-based design of WDNs. Various traditional methods, such as LP, NLP, and MINLP, were employed by past studies. Subsequently, advanced techniques, such as evolutionary algorithms, were employed and were found to possess several advantages over the traditional approaches. Article [11], published in JWRPM, is one of the most cited publications in this field. It formulated the problem as a multi-objective optimization problem for cost minimization and reliability maximization (estimated using resilience as a surrogate), using NSGA-II as the optimization tool. Article [95], published in JWRPM, is also a highly cited study; it employed GA to solve a reliability-based single objective problem: cost minimization with reliability as a constraint, considering uncertainty in nodal demands using MCS. Some of the recent publications in this field are [12], which incorporates Self-Adaptive Differential Evolution (SADE) for reliability-based design of WDNs; [96], which presented a Dynamic Adaptive Approach for WDN design; and [97], which presented a Self-Adaptive Solution-Space Reduction Algorithm for WDN design.

2. Reliability Assessment Models: The development of reliability assessment techniques is the focus of this field. Some of the oldest techniques include the minimum cut set method and the simple probabilistic approach for mechanical reliability estimation, and MCS, FORM, and the network reliability factor for hydraulic reliability estimation. Later, several RSMs evolved, including resilience, network resilience, and entropy, which have the advantage of reduced computational time compared to the traditional approaches. Article [8], published in Urban Water Journal, is one of the most cited articles in the field; it introduced resilience as a surrogate for reliability. Article [9], published in JWRPM, is another highly cited article; it introduced network resilience as an RSM. Several studies focused on the performance assessment of one or more of these RSMs [6,13,79,98,99]. Studies reported different findings in terms of the performance of these RSMs under different conditions.

3. Consideration of energy, life cycle cost (LCC), and GHG emissions for expansion and rehabilitation of WDNs: The research in this field focuses on the optimal design and expansion of WDNs, considering aspects such as LCC, energy consumption, and GHG emissions. Article [11] is a highly cited article that considered LCC in the formulation of WDNs. Article [100] is another highly cited article; it presented a model for the minimization of LCC and CO2 emissions and found that minimizing CO2 emissions can be achieved at a higher LCC. Article [101] presented a dynamic design for the expansion and rehabilitation of WDNs. It found that a dynamic design led to more reliable and lower cost networks. Articles [102,103] presented enhanced evolutionary algorithm frameworks for the expansion of WDNs that consider LCC, and found that these frameworks led to better solutions when compared to traditional approaches.

Research Gaps
The major research gaps in each of the research areas are as follows:

1. Reliability-based single and multi-objective design of WDNs: Many optimization tools are provided in the literature. Different studies have shown different tools to be efficient for solving the WDN design problem. There is, however, a need for a comprehensive review of the various optimization tools that tests their suitability, advantages, and drawbacks. The suitability of these algorithms has mostly been tested for objectives such as cost and reliability. A detailed analysis of how the algorithms perform when other objectives, such as GHG emissions and vulnerability, are considered should be performed.

2. Reliability Assessment Models: The last two decades have shown an emerging trend in the usage of RSMs. Studies presented and tested these RSMs under various conditions. However, different studies reported different conclusions regarding the suitability of these RSMs. Few studies compared the performance of the various RSMs, and the outcomes varied between studies. Surrogate measures such as entropy, resilience, and network resilience are based on the pressure and flow conditions that would prevail in case of probable failure scenarios. Some studies focused on the consumer's perspective by incorporating a damage function that accounts for the damages caused when the requirements are not met. Thus, there is a lack of a comprehensive review of the usage of these RSMs, the conditions under which they have been tested and found to be suitable, and their advantages and disadvantages. Findings are needed on which RSMs require further exploration, which RSMs are suitable under what conditions, and which RSMs are unsuitable for specific conditions.

3. Consideration of energy, LCC, and GHG emissions for the expansion and rehabilitation of WDNs: Some studies considered the aspects of energy, LCC, and GHG emissions for the optimal expansion and rehabilitation of WDNs. The benefit of considering these aspects in WDN optimization should be tested using real emissions or cost data acquired under different scenarios implemented at similar locations. Improvements of the WDN expansion techniques are needed, as new optimization approaches continue to be developed.

Conclusions and Recommendations
WDN reliability is one of the most important and relevant areas of research in the field of safe drinking water supply. This study applied bibliometric analysis and review techniques to explore research themes, trends, and gaps in the topic of WDN reliability. The major insights and conclusions from the bibliometric analysis and content review are as follows:

1. There is an increasing trend in the number of publications in the field of WDN reliability, revealing its growing importance over the past two decades. The number of citations, however, alternates between increasing and decreasing cycles.

2. Some of the most cited documents are articles focused on the introduction of RSMs, such as resilience and network resilience. This shows that the use of RSMs has gained considerable momentum over the last two decades.

3. Bibliographic coupling led to the identification of three major areas of publications: reliability, water distribution networks, and optimization. Cluster 1 comprises articles on reliability-based design of WDNs, including aspects such as hydraulic and mechanical failures, uncertainty, vulnerability, and redundancy. Cluster 2 comprises articles focused on WDN design and modeling, including aspects such as single and multi-objective optimization, rehabilitation, leakage, and calibration. Cluster 3 includes articles on the development and application of various optimization tools for WDN design.

4. Thematic maps revealed that the consideration of cost and energy constitutes one of the emerging trends in this field. Detailed analysis of the articles sheds light on the need to assess the suitability and performance of various RSMs in WDN analysis.

The insights and findings from this study can be useful in shaping the scope of upcoming editions of journals and conferences. These can focus on emerging themes such as the consideration of CO2 emissions and energy cost for WDN reliability. The identified research needs include employing novel multi-objective optimization tools, extensively exploring and comparing RSMs, and incorporating environmental factors such as GHG emissions in WDN design and expansion. These needs can help formulate the objectives of future research.

Figure 1. Classification of reliability assessment approaches.

Figure 3. Number of articles and number of citations for articles published on the topic of reliability of WDNs.

Figure 4. Citation network of articles: the node size represents the number of citations; the links between nodes represent the collaboration between them; the distance between nodes represents the association between them in terms of co-citation.

Figure 5. A network map of co-authorship for different authors. Node size represents the number of publications for each author and link thickness represents the extent of collaboration.

Figure 6. Collaboration network for countries. Node size represents the number of publications and links represent the collaboration among the countries.

Figure 8. Clusters based on author keywords. Node size represents the number of occurrences of the keyword and the link between two nodes represents their simultaneous occurrence in articles.

Figure 9 shows the evolution of author keywords over time. The color of a node represents the period when the keyword was most employed, the size of the node represents the number of times the keyword was used, and the links represent co-occurrence in different articles. The use of keywords like entropy, rehabilitation, and algorithms dates to 2005. The use of keywords such as water distribution network(s), optimization, reliability, redundancy, and hydraulic analysis seems to have gained momentum in 2010. Keywords such as resilience, genetic algorithm, mechanical reliability, and water quality gained importance between 2010 and 2015, whereas keywords such as hydraulic reliability, uncertainty, and vulnerability grew considerably in 2015. The keywords significantly employed in recent years include EPANET, MATLAB, and calibration.

Figure 9. Author keyword evolution over time. Node size represents the number of occurrences of the keyword and the link between two nodes represents their simultaneous occurrence in articles.

Figure 10. Thematic map based on author keywords. Node size represents the number of articles on a particular theme.


Table 1. Top ten articles with highest number of citations.
Notes: LC = Local Citations, GC = Global Citations, NLC = Normalized Local Citations, NGC = Normalized Global Citations.

authors were extracted from 34 countries. The ten most influential authors by number of citations and publications are presented in Table 2. The statistics presented in Table

Table 2. Top ten most influential authors ranked by total number of citations.
Notes: TC: Total Citations, NP: Number of Publications, PY_start: Publication start year.

Table 3. Countries collaboration, production, and citation data.
Notes: NP: Number of Publications, TC: Total Citations, TNP: Total Number of Publications for the cluster, TTC: Total Number of Citations for the cluster.
Urban Water Journal started publishing in 2000 and has the second highest number of citations. It has 20 publications in the research domain and therefore a high number of citations per publication. Water Resources Management (WARM) started publishing in this field later than the other journals, beginning in 2008, but still has the third highest number of citations and the second highest number of publications. It should be noted that JWRPM was established in 1993, WARM in 1987, and Urban Water Journal in 2000. This may be the reason for the lower number of publications in Urban Water Journal.

Table 4. Top ten most cited journals.

Table 5. Clustering of co-cited articles.

Table 6. Frequently used author keywords.