Fecal Indicator Bacteria Data to Characterize Drinking Water Quality in Low-Resource Settings: Summary of Current Practices and Recommendations for Improving Validity

Fecal indicator bacteria (FIB) values are widely used to assess microbial contamination in drinking water and to advance the modeling of infectious disease risks. The membrane filtration (MF) testing technique for FIB is widely adapted for use in low- and middle-income countries (LMICs). We conducted a systematic literature review on the use of MF-based FIB data in LMICs and summarized statistical methods from 172 articles. We then applied the commonly used statistical methods from the review on publicly available datasets to illustrate how data analysis methods affect FIB results and interpretation. Our findings indicate that standard methods for processing samples are not widely reported, the selection of statistical tests is rarely justified, and, depending on the application, statistical methods can change risk perception and present misleading results. These results raise concerns about the validity of FIB data collection, analysis, and presentation in LMICs. To improve evidence quality, we propose a FIB data reporting checklist to use as a reminder for researchers and practitioners.


Introduction
Assessing microbial contamination in drinking water is crucial to verify water safety, understand baseline conditions, validate preventive interventions, and investigate disease outbreaks [1]. Fecal indicator bacteria (FIB) values are widely used to characterize microbial contamination [2,3], and there are multiple ways to assess FIB presence and concentration. These include presence/absence, most probable number (MPN), and colony count methods (e.g., membrane filtration, plating, or gel) [2,4,5]. The membrane filtration method is considered a gold standard in quantitative FIB testing and recommended by the American Public Health Association (APHA), American Water Works Association (AWWA), and Water Environment Federation (WEF) in Standard Methods for the Examination of Water and Wastewater [6].
To ensure the validity and reproducibility (i.e., replicable sampling and testing protocol) of the membrane filtration test results, step-by-step instructions are available in Standard Methods and other guidelines [4,6,7]. The instructions focus primarily on sample collection steps and precautions, preservation and storage, laboratory quality control (e.g., personnel, facility, equipment, supply), media preparation, analytical quality control (e.g., plate counts comparison, control culture, duplicate analysis, sterility checks), data handling, and documentation and record-keeping.

Materials and Methods
The study consisted of two investigations: (1) a systematic review of FIB data reporting in the published literature, and (2) analysis of selected example FIB datasets to demonstrate how FIB data presentation and analysis impact results.

Systematic Review of FIB Data Reporting
We completed a systematic review to identify how FIB data are currently collected, analyzed, and reported in the published literature. The review comprised four steps: (1) search strategy, (2) inclusion criteria, (3) selection and processing strategy, and (4) result synthesis. Each step is summarized below.

Search Strategy
The databases Ovid Medline (PubMed), Scopus, and Web of Science were searched using a set of search terms related to three themes: low- and middle-income countries (LMICs), FIB, and drinking water, excluding pharmaceutical and agricultural terms (Figure 1). Individualized search strings were developed for each database using appropriate field tags and Boolean operators. We finalized the search in July 2020 to include papers published up to this date.

[Figure 1: search terms]
"Drinking water" OR "Water purif*" OR "Water qualit*" OR "Water treatment*" OR "Potable water"
AND
"Escherichia coli" OR "E. coli" OR "E-coli" OR "E coli" OR "E-Coli" OR "E Coli" OR "Total colifor*" OR "Fecal colifor*" OR "Feacal colifor*" OR "Thermotolerant colifor"
AND
"LMIC" OR "low and middle income countr*" OR "low-and-middle-income" OR "low income country" OR "low-income-country" OR "middle income country" OR "middle-income-country" OR afghanistan OR libya OR albania OR macedonia OR algeria OR madagascar OR "American Samoa" OR malawi OR angola OR malaysia OR armenia OR maldives OR azerbaijan OR mali OR bangladesh OR "Marshall Islands" OR belarus OR mauritania OR belize OR mauritius OR benin OR mexico OR bhutan OR micronesia OR bolivia OR moldova OR bosnia OR herzegovina OR mongolia OR botswana OR montenegro OR brazil OR morocco OR bulgaria OR mozambique OR "Burkina Faso" OR myanmar OR burundi OR namibia OR "Cabo Verde" OR nepal OR cambodia OR nicaragua OR cameroon OR niger OR "Central African Republic" OR "CAR" OR nigeria OR chad OR pakistan OR china OR palau OR colombia OR panama OR comoros OR "Papua New Guinea" OR congo OR paraguay OR congo OR peru OR "Costa Rica" OR philippines OR "Ivory Coast" OR "Cote d'Ivoire" OR romania OR cuba OR rwanda OR djibouti OR samoa OR dominica OR "Sao Tome" OR principe OR "Dominican Republic" OR senegal OR ecuador OR serbia OR egypt OR "Sierra Leone" OR "El Salvador" OR "Solomon Islands" OR eritrea OR somalia OR ethiopia OR "South Africa" OR fiji OR "South Sudan" OR gabon OR "Sri Lanka" OR gambia OR "St. Lucia" OR "Saint Lucia" OR georgia OR "St. Vincent" OR "Saint Vincent" OR grenadines OR ghana OR sudan OR grenada OR suriname OR guatemala OR swaziland OR guinea OR syrian OR syria OR guinea-bissau OR tajikistan OR guyana OR tanzania OR haiti OR thailand OR honduras OR timor-leste OR "Timor Leste" OR india OR togo OR indonesia OR tonga OR iran OR tunisia OR iraq OR turkey OR jamaica OR turkmenistan OR jordan OR tuvalu OR kazakhstan OR uganda OR kenya OR ukraine OR kiribati OR uzbekistan OR korea OR vanuatu OR kosovo OR vietnam OR "Kyrgyz Republic" OR kyrgyzstan OR "West Bank" OR gaza OR lao OR laos OR yemen OR lebanon OR zambia OR lesotho OR zimbabwe OR liberia OR "middle-east" OR "middle east" OR "Africa" OR "Sub-Saharan Africa" OR "Central America" OR "Latin America" OR "Caribbean" OR "South America" OR "Central Asia" OR "East Asia" OR pacific OR "South Asia" OR "Asia" OR "South-east Asia" OR "southeast Asia" OR "South east Asia"
AND NOT (in keywords, titles, abstracts; only title for Web of Science)
"Multidrug resistance" OR "Antibacterial" OR "Anti-bacterial" OR "Endocrine disruptor" OR "Antibiotic*" OR "Polymerase Chain Reaction*" OR "genetic*" OR "Drug" OR "antigen"
AND
"removal" OR "reduc*" OR "decrease*" OR "purify*" OR "control*"
AND
"Diarrhea*" OR "Diarrhoea*" OR "cholera" OR "health" OR "waterborne disease*"

Inclusion Criteria
Inclusion criteria were developed following the populations, interventions, comparisons, outcomes, and study types (PICOS), adapted for laboratory datasets [37]. The population for this review consisted of FIB test results collected from source or household drinking water samples in LMICs, as defined by the World Bank Income groups in 2018 [38]. To be included, FIB had to be analyzed with the membrane filtration method for total coliform, thermotolerant (fecal) coliform, or Escherichia coli; we limited FIB search to the three coliform types because membrane filtration is generally recommended for those three groupings of bacteria [6]. No interventions or comparisons were required for inclusion. Manuscripts were included if the outcome of quantitative analysis of FIB was reported. All study types (i.e., observational and experimental) were eligible for inclusion. Review documents were not included, but individual references in review documents were screened for inclusion. Manuscripts published in English between 1 January 2000 and 25 July 2020 were included in the review. The literature review is reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [39].

Selection and Processing
Search results were merged, and the duplicates were removed using EndNote X8.1 (Philadelphia, PA, USA). Unique articles were then screened by title, abstract, and full text using exclusion criteria in Microsoft Excel for Office 365 (Redmond, WA, USA). At title screening, manuscripts not in LMICs or not with source and household water samples were excluded. At abstract screening, in addition to those from title screening, non-membrane filtration FIB testing methods and non-drinking water samples were excluded. In full-text screening, only manuscripts that reported quantitative results from membrane filtration FIB testing of source or household drinking water samples in LMICs were included.
Studies were independently double-screened by a team of four research assistants and the primary study author. Data were extracted from included studies in a detailed coding sheet that included title, journal, year of publication, abstract, digital object identifier, study type, main objective, description of the sample collection, membrane filtration and bacterial enumeration steps, data preparation, presentation, and statistical techniques applied to the data. Results from independent readers were matched, and differences were resolved by consulting with authors.

Result Synthesis
Results were synthesized initially by two broad categories: (1) sample collection and processing, and (2) data preparation and analysis. Within each broad category, results were summarized using percentage of manuscripts that included a particular step by subcategory, including, for sample collection and processing, whether manuscripts referenced a standard method, sample collection procedures, and membrane filtration procedures. Please note that steps were identified using Standard Methods. For data preparation and analysis, subcategories included how data were prepared, characterized using descriptive statistics and visualization, and analyzed using statistical methods for correlations and associations. Please note that we did not complete meta-analysis on the data, as that was not necessary for this research and has been completed elsewhere [15]. Additionally, a risk of bias assessment was not completed for each included manuscript, as part of the research question was to determine biases present in the data.

Analysis of Selected Example FIB Data Sets
To elucidate the impact of various methods for sample collection and processing and data analysis of FIB data identified in the systematic review, we used two publicly available FIB datasets: the 2012-2013 Bangladesh and 2014-2015 Congo Multiple Indicator Cluster Surveys (MICS) [20,40]. In the Bangladesh survey, data were collected between December 2012 and April 2013 from urban and rural areas of seven administrative divisions, covering 20,903 households. In the Congo survey, data were collected between November 2014 and February 2015 from urban and rural areas of 12 administrative departments, covering 12,811 households. MICS datasets were selected for this analysis because of their large sample size representing the full country and to avoid any potential bias by using other secondary data which were collected for a different purpose (i.e., intervention effectiveness or performance evaluation). Congo and Bangladesh surveys were selected because these were the first two national MICS surveys with FIB data completed in South Asia and Africa.
During both surveys, a microbial water quality test was completed for some, but not all, surveyed households and associated water sources. In Bangladesh, 2582 (5%) households and 2532 sources were tested; in Congo, 1486 (12%) households and 1277 sources were tested. Water sample collection and testing procedures were the same in both countries and are fully described in respective reports [40,41]. In summary, household samples of 100 mL were collected by asking for "a glass of water that members of the household would drink" and source samples were collected directly from the source by asking "is it possible to visit the water source from where the drinking water was collected?", and then enumerators walked to that source after the survey. Enumerators filtered the 100 mL sample through a 0.45 micron filter paper and placed that filter on Compact Dry EC growth medium plates (Nissui, Japan). Separately, 1 mL from the sample was pipetted onto a different Compact Dry EC plate. Plates were incubated at ambient temperature or using incubation belts for 24 h, after which the numbers of red/purple and blue colonies were recorded. Plates with no colonies were reported as 0, and plates with 100 or more colonies were reported as 100. Please note that the exact location where tests were processed was not specified for each household. However, MICS guidelines recommend processing samples on site if possible or processing in a convenient location by collecting water in WhirlPak® bags (Nasco, Fort Atkinson, WI, USA). If the transportation time was >30 min, samples were placed in a cooler with ice [42].
Raw data were downloaded in comma-separated values (CSV) file format from the MICS United Nations International Children's Emergency Fund (UNICEF) website (https://mics.unicef.org/, accessed on 25 February 2021). Both datasets contained E. coli (blue colonies) and non-E. coli coliform (red colonies) results in household and corresponding source samples in CFU for 1 mL and 100 mL dilutions. We prepared data for analysis by aggregating results as follows: (1) removing data where the 100 mL sample was 0 and 1 mL was >0 (considered an unreliable result; Bangladesh 3%, Congo 15%); (2) when both values were >0, calculating the geometric mean of 100 mL count value and 100 × 1 mL count value; (3) including the 100 mL count value directly when the 1 mL value was 0.
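The three aggregation rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study (the analysis was completed in R); the function name `aggregate_cfu_per_100ml` and the use of `None` to flag removed results are assumptions made for this example.

```python
import math
from typing import Optional


def aggregate_cfu_per_100ml(count_100ml: int, count_1ml: int) -> Optional[float]:
    """Combine the 100 mL and 1 mL plate counts into one CFU/100 mL value.

    Returns None when the result is treated as unreliable and removed.
    The function name and the None convention are hypothetical.
    """
    if count_100ml == 0 and count_1ml > 0:
        # Rule (1): growth on the 1 mL plate but none on the 100 mL plate
        return None
    if count_100ml > 0 and count_1ml > 0:
        # Rule (2): geometric mean of the 100 mL count and 100 x the 1 mL count
        return math.sqrt(count_100ml * (100 * count_1ml))
    # Rule (3): the 1 mL count is 0, so the 100 mL count is used directly
    return float(count_100ml)
```

For example, a 100 mL count of 100 with a 1 mL count of 1 gives the geometric mean of 100 and 100, i.e., 100 CFU/100 mL, whereas a 100 mL count of 0 with a 1 mL count of 3 is removed.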
Based on the systematic review results, we then completed five analyses on both the Bangladesh and the Congo datasets, to assess the impact of different data replacement, descriptive statistic calculations, visualizations, hypothesis test, and correlation test methods.
We applied three different data replacement methods on Bangladesh and Congo household E. coli data to show the impact of data replacement method on data distribution. The three scenarios were as follows: (a) removing below detection limit (BDL) and above detection limit (ADL) samples; (b) replacing BDL and ADL samples with the detection limit (i.e., BDL = 0 CFU/100 mL and ADL = 100 CFU/100 mL or 1000 CFU/100 mL); (c) replacing BDL values of 0 counts with 0.5 CFU/100 mL, and ADL values (count values of 100 or 1000) with the detection limit plus 1 (i.e., 101 or 1001 CFU/100 mL). We compared the distributions of the log-transformed data using Wilcoxon signed rank tests and Student's t-test. We used scenario (c) for further analysis.
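The three replacement scenarios can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the study's R code: the function name `replace_censored` is hypothetical, a single upper detection limit (default 100 CFU/100 mL) is assumed, and a value of 0 is taken to mean BDL.

```python
from typing import List


def replace_censored(values: List[float], scenario: str,
                     upper_limit: float = 100.0) -> List[float]:
    """Apply one of the three BDL/ADL handling scenarios to CFU/100 mL values.

    Assumptions for this sketch: a value <= 0 is BDL and a value at or
    above `upper_limit` is ADL.
    """
    def is_bdl(v: float) -> bool:
        return v <= 0

    def is_adl(v: float) -> bool:
        return v >= upper_limit

    if scenario == "a":
        # (a) remove censored observations entirely
        return [v for v in values if not (is_bdl(v) or is_adl(v))]
    if scenario == "b":
        # (b) replace censored observations with the detection limit
        return [0.0 if is_bdl(v) else upper_limit if is_adl(v) else v
                for v in values]
    if scenario == "c":
        # (c) BDL -> 0.5 CFU/100 mL; ADL -> detection limit + 1
        return [0.5 if is_bdl(v) else upper_limit + 1 if is_adl(v) else v
                for v in values]
    raise ValueError("scenario must be 'a', 'b', or 'c'")
```

Scenario (c) leaves every value strictly positive, so the data can be log-transformed without discarding observations; under scenario (b), BDL values remain 0 and have no logarithm.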
To demonstrate the importance of reporting adequate descriptive statistics, we described the household E. coli datasets for Bangladesh and Congo grouped by urban and rural areas using 11 parameters (arithmetic mean, geometric mean, standard deviation, geometric standard deviation, median, 25th and 75th percentiles, interquartile range, 10th and 90th percentiles, minimum and maximum, skewness, and kurtosis).
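As a hedged illustration of how these parameters relate to one another, the following Python sketch computes most of them with the standard library only. The function name `describe` is hypothetical; the sketch assumes all values are strictly positive (e.g., after replacement scenario (c)), uses moment-based skewness and non-excess kurtosis (3 for a normal distribution), and uses one of several possible quantile definitions.

```python
import math
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Descriptive statistics for strictly positive CFU/100 mL values."""
    n = len(values)
    mean = statistics.fmean(values)
    logs = [math.log(v) for v in values]  # requires every value > 0
    # Central moments for moment-based skewness and kurtosis
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "arithmetic_mean": mean,
        "geometric_mean": math.exp(statistics.fmean(logs)),
        "sd": statistics.stdev(values),
        "geometric_sd": math.exp(statistics.stdev(logs)),
        "median": statistics.median(values),
        "p25": q1,
        "p75": q3,
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: 3 for a normal distribution
    }
```

For the right-skewed values 1, 10, and 100 CFU/100 mL, the arithmetic mean is 37 while the geometric mean is 10, which shows how the choice of central tendency alone can change the reported level.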
Additionally, to display the usefulness of visualization techniques, we presented the household E. coli Bangladesh and Congo data using a bar plot, boxplot, scatter plot (together with source E. coli), and map (the data were grouped by the 12 departments of Congo and seven divisions of Bangladesh, except for the scatter plot). We selected the four visualization techniques because they were commonly used in the reviewed articles.
To illustrate the importance of hypothesis test selection, we compared household E. coli concentrations between two divisions in Bangladesh and between two departments in Congo using the Wilcoxon rank sum test and the chi-squared test. We selected these two methods because they are appropriate for continuous data and binary variables, respectively. The independent observation and random sampling assumptions of the Wilcoxon and chi-squared tests were met because each observation was from an independent household and the MICS surveys were designed to randomly select households from the population.
The effect of correlation method selection was demonstrated using Bangladesh and Congo household E. coli and coliform (non-E. coli) data. We applied Pearson and Spearman correlation methods on the dataset to demonstrate the change in correlation coefficients. The same methods were applied to the log-transformed data to demonstrate the effect of data transformation on the skewed dataset. All data processing and analysis steps were completed in R (Vienna, Austria) [43].
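The difference between the two coefficients, and the effect of log transformation, can be illustrated with a short standard-library Python sketch. The function names are hypothetical and the sketch is not the study's R code; Spearman's coefficient is computed here as the Pearson coefficient of the average ranks.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear pair such as x = 1, 10, 100, 1000 and y = 1, 2, 3, 4, the Spearman coefficient is 1, the Pearson coefficient on the raw values is noticeably lower, and the Pearson coefficient after log10-transforming x returns to 1. Rank-based coefficients are unchanged by any strictly increasing transformation such as the logarithm, so only the Pearson coefficient is sensitive to the transformation.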
Lastly, on the basis of our results, we developed a checklist to consider when membrane filtration-based FIB data are used and reported. The "sample collection" and "membrane filtration" sections of the checklist included critical steps following the recommended methods to understand the field procedure and any adaptation from the guidelines. The "enumeration" and "statistical analysis" sections of the checklist included critical data preparation and statistical steps that will support researchers to communicate the results and readers to understand the findings.

Systematic Review Results
To complete our study objectives, we conducted a systematic literature review. Overall, 2251 manuscripts with FIB data in LMICs were identified in the initial and follow-up searches, 1850 unique articles remained after removing duplicates, 1107 were included after title screening, 301 were included after abstract screening, and 171 were included for data extraction after full-text review (Figure 2). The final set of manuscripts represented studies from 48 LMICs. The five most represented countries were Bangladesh in 22, India in 14, Kenya in 12, Cambodia in 11, and South Africa in nine manuscripts.

Sample Collection and Processing
A total of 95 (56%) manuscripts included a reference to a standard method that was followed for data collection and analysis; the most commonly referenced method was APHA/AWWA/WEF Standard Methods (58, 34%) (Table 1). Concurrently, 76 (44%) did not reference a standard method (e.g., recommended methods, manufacturer guidance, or published literature). Please note that some manuscript authors referred to following Standard Methods and did not include any other sampling details in the manuscript.
Standard Methods suggest five key steps for sample collection and transport: (1) collect in sterile glass or plastic; (2) use sodium thiosulfate (to inactivate any chlorine or bromine present and prevent ongoing disinfection); (3) collect a representative sample from the source; (4) if not analyzed within 1 h, place on ice and maintain temperature of <10 °C; (5) for drinking water samples, begin analysis within 6 h of collection and, for non-drinking water samples, begin analysis within 24 h. A total of 66 (39%) studies reported any sample collection sterility information (e.g., sterile vial/bag, hand sanitization before collection), 34 (20%) studies reported using sodium thiosulfate, 102 (60%) reported storing the sample at "low" temperature (from 2-8 °C), and 94 (55%) reported the time between sample collection and membrane filtration (2-48 h). Of the 94 reporting the time, 50 (53%) met the criteria of analysis begun within 6 h of collection and completed within 8 h.

Table 1. Literature review results on reporting the sample collection, an […] BDL, below detection limit; ADL, above detection limit. Reported topics (collection): included a reference to standard method; included any sample collection sterility information; used sodium thiosulfate; reported storing the sample in "low" temperature (range: 2-8 °C); reported time between sample collection and membrane filtration; reported starting the test in 6 h and/or completed in 8 h.
Standard Methods suggest that, in membrane filtration, sterile apparatus should be used, positive and negative controls at the beginning and end of sampling should be completed, 10% of the plates should be duplicated, 5% should be blank, appropriate dilutions based on water quality should be selected, samples should be filtered and placed on a selective medium-soaked pad, and then those samples should be incubated at an appropriate temperature for the appropriate time. In systematic review results (Table 1), 31 (18%) reported using blank samples (negative controls) to check sterile procedures, 44 (26%) reported using duplicate samples to check analysis precision, 41 (24%) reported using multiple appropriate dilutions based on water source, 80 (47%) reported the volume of filtered sample water, 126 (74%) reported the name/type of the growth medium, 112 (65%) reported incubation temperature, and 110 (64%) reported incubation time.

Data Preparation and Analysis
Standard Methods do not provide specific analysis techniques, but recommend discarding data if controls are contaminated, only counting plates where a certain number of colonies have grown (dependent on media; e.g., 20-80 colonies, and no more than 200-250), only including "countable" plates in analysis, reporting BDL and ADL results, and log-transforming data, which are likely to be skewed. Please note that ADL samples are referred to as too numerous to count (TNTC) in FIB reporting. Regarding data preparation, 82 (48%) manuscripts reported using >1 dilution, 35 (20%) manuscripts reported how values from multiple dilutions were aggregated, 84 (49%) reported the percent of BDL and ADL samples, 46 (27%) reported how BDL results were handled, 49 (29%) reported how ADL results were handled, and 73 (43%) studies reported log-transforming data (Table 1).
To compare to standards, 50 (29%) studies converted data into categorical data and categorized data according to the World Health Organization (WHO)'s risk categories [1], and 45 (26%) studies converted the data into binary data to compare to WHO's guideline value of <1 FIB/100 mL [1] or secondary guideline of "intermediate risk" at <10 CFU/100 mL [1]. In the data, 40 (23%) studies used the 1 CFU/100 mL cutoff value, and five (3%) studies used 10 CFU/100 mL.
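Both conversions can be sketched in Python as below. The function names are hypothetical, and the band boundaries and labels are assumptions for illustration: only the 1 and 10 CFU/100 mL cutoffs are stated above, whereas the upper boundary of 100 CFU/100 mL and the handling of values exactly on a boundary are assumed here and should be checked against the WHO guidance [1] before use.

```python
def who_risk_category(cfu_per_100ml: float,
                      cutoffs: tuple = (1.0, 10.0, 100.0)) -> str:
    """Assign a risk label to a CFU/100 mL value.

    The default cutoffs and the strict '<' comparisons are assumptions
    made for this sketch, not values taken from the reviewed manuscripts.
    """
    low, medium, high = cutoffs
    if cfu_per_100ml < low:
        return "low"
    if cfu_per_100ml < medium:
        return "medium"
    if cfu_per_100ml < high:
        return "high"
    return "very high"


def exceeds_cutoff(cfu_per_100ml: float, cutoff: float = 1.0) -> bool:
    """Binary conversion: True when the value is at or above the cutoff."""
    return cfu_per_100ml >= cutoff
```

A sample at 5 CFU/100 mL is therefore "medium" in the categorical scheme, exceeds the 1 CFU/100 mL cutoff, and does not exceed the 10 CFU/100 mL cutoff, so the same observation can count as a failure or a pass depending on which binary cutoff a study adopts.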
… 5%), and McNemar's test (4, 2%). Of those (n = 63) who reported using parametric tests, only five (8%) studies reported completing any data normality assumption check (e.g., Shapiro-Wilk test, quantile-quantile (QQ) plot, histogram).
In assessing association methods, we found that 18 (11%) manuscripts reported the use of Pearson correlation coefficients, and 12 (7%) studies reported the use of Spearman correlation coefficients. Odds ratios were reported by 24 (14%) studies and risk ratio was reported by 10 (6%) studies. Please note that the use of advanced statistical techniques (e.g., multivariate regression models) was outside the scope of this review.

Analyses Using Example Dataset
According to the systematic review data, we demonstrate the use of five common FIB data analysis methods for the publicly available Congo and Bangladesh FIB datasets. We assessed the utilization of different data replacement methods, descriptive statistic calculations, visualization tools, hypothesis/comparison test methods, and correlation test methods.

Data Replacement Methods
To document any impact on results of different BDL and ADL replacement methods, household Escherichia coli (E. coli) CFU/100 mL datasets from Bangladesh and Congo were analyzed. According to recommendations from Standard Methods [44] and what was reported in the systematic review, we prepared data using three methods: (a) removed censored data; (b) replaced BDL and ADL with the detection limit; (c) replaced BDL with 0.5 and ADL with the detection limit plus 1. Histograms of log-transformed data are presented in Figure 3. As can be seen, the shape of the FIB distributions changed depending on the BDL/ADL replacement method. All pairwise comparisons between replacement methods differed significantly (all Wilcoxon signed rank test, p < 0.001). Means of log-transformed values were also significantly different (all t-test, p < 0.001) between the BDL/ADL replacement methods.

Descriptive Statistics
In Standard Methods, it is recommended to use the geometric mean for estimating central tendency, except in risk assessment, where the arithmetic mean may provide a greater safety factor [8]. It is also noted that the data will be skewed. In the systematic review, the most commonly reported descriptive statistic was the frequency of WHO risk category, and distribution information was rarely reported. While WHO categorization could effectively convey the risk, understanding the data properties is important for further statistical analysis. To demonstrate the effect of descriptive statistics selection, 11 different descriptive statistics were applied to the Bangladesh and Congo FIB datasets stratified by urban and rural areas (Table 2). As shown in Table 2, the selection of a descriptive statistic influences the results. In particular, geometric mean was consistently one WHO risk category [1] below the arithmetic mean. Reporting additional descriptive statistics can characterize FIB distributions and justify statistical test selection. For example, standard deviation and interquartile range can help understand data variability; as FIB data are generally skewed, reporting 25th, 50th (median), and 75th percentiles is useful in detecting outliers or extreme values which can result from multiplying with dilution factors; refined values for percentiles (e.g., 5th, 10th, 90th, and 95th) are informative to understand data spread; data range (minimum and maximum) presents detection limits of the FIB test; skewness (measure of distribution symmetry) and kurtosis (measure of distribution tail extensions) characterize the extent of deviation of the FIB data from a normal symmetrical distribution. For instance, none of the four groups had skewness and kurtosis values close to 0 and 3 (typical for a normal distribution), respectively.
This suggests that the Bangladesh and Congo data did not follow the normal distribution and, thus, data transformation or nonparametric tests are more suitable for analysis than traditional parametric tests. The presented results also suggested that samples from urban areas had higher FIB concentration than rural areas in Bangladesh and samples from rural areas had higher FIB concentration than urban areas in Congo.

Visualizations
As found in the systematic review, data visualizations were provided in slightly over half of the manuscripts. Appropriate FIB data visualization can aid reporting by emphasizing relevant characteristics (e.g., distribution, risk category proportions, spatial and temporal variation, and correlation). To demonstrate the impact of different visualizations on data interpretation, we visualized data from Bangladesh and Congo using four plot types frequently seen in the systematic review: bar chart using WHO risk categories, box plot presenting E. coli concentration distribution grouped by administrative units, scatter plot demonstrating correlation between two variables, and maps to communicate spatial variation of the E. coli concentration (Figures 4 and 5). Perception of information provided by different visualization tools could be severely affected when used without understanding plot limitations. For example, if the objective is to present the water quality using WHO risk category, a bar plot illustrating the composites of samples with very high, high, medium, and low concentration by location (Figures 4a and 5a) would be a useful approach. Similarly, a box plot will clearly illustrate the distribution of E. coli concentration in the water samples (Figures 4b and 5b), a scatter plot will be useful to present the relationship between two comparable variables (Figures 4c and 5c), and a map will visualize spatial variations of E. coli concentrations aggregated by mapping unit (Figures 4d and 5d).

Hypothesis/Comparison Testing
As found in the systematic review, one-quarter of manuscripts reported testing hypotheses via group comparisons. These comparisons can be completed using different data types, e.g., binary, categorical, or continuous data. In this analysis, we compared FIB concentration between two adjacent administrative divisions in Bangladesh (Rajshahi and Khulna) and two adjacent departments in Congo (Kouilou and Pointe-Noire), using two approaches (Table 3). First, we used the original continuous data, and then we converted the continuous values into a binary variable using two cutoffs of ≥1 and ≥10. In the Bangladesh data, the result from the Wilcoxon rank sum test (used because the data were not normally distributed) indicated that the E. coli levels in household samples were significantly different between the two divisions. However, the chi-squared test applied to the binary variable suggested that the E. coli levels were not statistically different between the two divisions for either cutoff value (≥1 and ≥10). In the Congo data, the Wilcoxon rank sum test and the chi-squared test for cutoff ≥10 suggested that the E. coli levels in the household samples were significantly different between the two departments. However, the chi-squared test with cutoff ≥1 suggested that the levels were not statistically different between the two departments.
The method for statistical comparisons should be selected with respect to the research question and statistical properties of the data. For example, if the research question is about detecting the difference between FIB concentration in household samples in different divisions, continuous data may offer a more reliable and consistent inference than data split into categories. However, if the research question is about the difference in the proportion of households with contaminated samples (e.g., FIB values above specific thresholds) between the two divisions, then the binary variable should be used for statistical testing. Additionally, conversion from continuous FIB data to binary using different cutoffs should be completed with caution as the result may change depending on the cutoff threshold, as seen with the Congo data.
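The sensitivity of the conclusion to the data type and cutoff can be illustrated with a minimal sketch. The counts below are invented for illustration and are not the Bangladesh or Congo data; the function names are hypothetical, the rank sum test uses a normal approximation with tie correction, and the chi-squared test omits the continuity correction. An established statistics package would normally be preferred for a real analysis.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def exceedance_p_value(x, y, cutoff):
    """Chi-squared p-value (1 df) comparing the share of samples at or above cutoff."""
    a = sum(v >= cutoff for v in x)
    b = len(x) - a
    c = sum(v >= cutoff for v in y)
    d = len(y) - c
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:  # one row or column of the 2x2 table is empty
        return 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denominator
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical E. coli counts (CFU/100 mL) for two areas; not data from the paper.
area_a = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
area_b = [0, 0, 12, 15, 20, 30, 40, 50, 60, 80, 120, 200]

p_continuous = rank_sum_p_value(area_a, area_b)
p_cutoff_1 = exceedance_p_value(area_a, area_b, cutoff=1)
p_cutoff_10 = exceedance_p_value(area_a, area_b, cutoff=10)

print(f"rank sum test on counts: p = {p_continuous:.4f}")
print(f"chi-squared, cutoff >=1:  p = {p_cutoff_1:.4f}")
print(f"chi-squared, cutoff >=10: p = {p_cutoff_10:.4f}")
```

In this invented example, both areas have the same share of samples with any detectable E. coli, so the ≥1 cutoff shows no difference, while the continuous comparison and the ≥10 cutoff do. This is the same qualitative pattern reported for the Congo data.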

Associations
As seen in the systematic review, associations were tested in 36% of the manuscripts, using Spearman and Pearson correlations. Pearson correlation coefficients are suitable for testing linear associations for variables with distributions that are preferably symmetric and close to normal, whereas Spearman correlation is a good alternative for a monotonic relationship and distributions that are slightly skewed. We demonstrated the effect of choosing different correlation techniques in assessing the associations between E. coli and other coliform bacteria concentrations in household water samples in the Bangladesh and Congo data (Figure 6) using multipanel plots [45].
It is commonly assumed that total coliform and E. coli are correlated [5]. As can be seen in Figure 6, the right-skewed histograms of both variables suggested that the data are non-normal. In the Bangladesh data, Pearson correlation between the two variables yielded a weak association (r = 0.199, p-value < 0.001), while Spearman correlation showed a moderate association (ρ = 0.365, p-value < 0.001) (Figure 6). In the Congo data, Pearson correlation suggested a moderate association (r = 0.382, p-value < 0.001), while Spearman correlation showed a stronger association (ρ = 0.559, p-value < 0.001) between the two variables. In this case, Pearson correlation is likely to underestimate the true associations picked up by the Spearman correlation coefficient, because Spearman correlation uses ranked values (i.e., relative position labels such as first, second, third, etc.), unlike the Pearson correlation coefficient, which uses the actual FIB values. Thus, the correct magnitude of association may not be observed if the method to detect association is applied without considering FIB data properties, especially as FIB data are generally not normally distributed.
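The difference between the two coefficients can be reproduced on a small, right-skewed example. The sketch below is illustrative only: the values are invented, the function names are hypothetical, and Spearman's ρ is computed as the Pearson coefficient of the average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient computed from the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired counts (CFU/100 mL); one very large total coliform value
# makes the distribution strongly right-skewed. Not data from the paper.
total_coliform = [1, 2, 3, 5, 8, 12, 20, 50, 150, 2000]
e_coli = [0, 1, 1, 2, 2, 3, 4, 5, 6, 7]

r = pearson(total_coliform, e_coli)
rho = spearman(total_coliform, e_coli)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the relationship here is monotonic but far from linear, the rank-based coefficient is much closer to 1 than the Pearson coefficient, which is dominated by the single extreme value.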
A different approach to study association when distribution is skewed is to apply

Discussion
To understand how FIB data are produced by membrane filtration in LMICs, we conducted a systematic review of the literature and analyzed publicly available datasets. FIB data are collected and analyzed using membrane filtration, and they are reported frequently by researchers in LMIC contexts. In the systematic review, it was found that sample collection and processing steps were under-reported, and different statistical methods were used to analyze data. Analyzing the publicly available datasets, we demonstrated that different statistical methods can significantly change results and/or interpretation; for example, (1) different data preparation methods can change the FIB data distribution, (2) data description parameters can change the FIB risk perception and communicate misleading information, (3) different hypothesis test methods can produce contrasting results, and (4) different statistical correlation methods can produce different levels of association. We describe each of these findings below and propose a checklist for FIB data reporting, analysis, and presentation.
While there are standard methods for membrane filtration sample collection and processing, reporting of adherence to these methods in the published literature was limited. We recommend developing a sample collection plan that adheres to standards and reporting it when publishing data. This will increase research reliability. Additionally, there are common adaptations to standard methods used in LMICs, including extending holding times before analysis and storing the sample at low temperature (Table 1). The impacts of these adaptations are not always known, although research has shown limited impact from extending holding time [46] or not having a consistent incubation temperature [47], and other research has found a significant impact on results from not using thiosulfate in sample collection [48]. Further research to determine the impact of commonly used adaptations of membrane filtration for use in LMICs is warranted.
In the systematic review, we found that a variety of statistical techniques were applied to FIB data. With improved data collection and reporting, novel applications of sophisticated analytical methods could advance the use of FIB for in-depth spatiotemporal analysis and modeling [49,50]. However, inadequate data descriptions and frequent use of these tests without proper justification were observed. While the use of well-grounded statistical methods strengthens research inference to report and interpret FIB data, erroneous applications of statistical procedures raise questions about findings and undermine the research validity. Of particular note in the review, (1) while a plethora of literature is available to handle censored environmental datasets [11,34,51-54], fewer than one-third of the studies reported steps completed to replace BDL and ADL values; (2) descriptive statistics are universally reported in manuscripts, and, while useful to understand FIB concentrations, the prevalence of reporting only one statistic or only the mean is misleading; (3) clear articulation of the research questions and use of appropriate statistical techniques for testing will ensure valid results; and (4) relationships between FIB data and other variables are frequently included in evaluations, and results can impact decision-making and policy.
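As a minimal sketch of points (1) and (2), the example below shows how the value substituted for results below (BDL) or above (ADL) the detection limit changes the descriptive statistics. The sample results, the upper limit of 200 CFU/100 mL, and the substitution values are assumptions made for illustration, and simple substitution is only one of the approaches discussed in the censored-data literature cited above.

```python
import statistics

UPPER_LIMIT = 200  # assumed upper counting limit (CFU/100 mL); hypothetical


def substitute_censored(results, bdl_value, adl_value):
    """Replace 'BDL' and 'ADL' labels with the chosen numeric values."""
    values = []
    for result in results:
        if result == "BDL":
            values.append(float(bdl_value))
        elif result == "ADL":
            values.append(float(adl_value))
        else:
            values.append(float(result))
    return values


def describe(values):
    """Return mean, median, and geometric mean (None if any value is not positive)."""
    all_positive = all(v > 0 for v in values)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "geometric_mean": statistics.geometric_mean(values) if all_positive else None,
    }


# Hypothetical laboratory results (CFU/100 mL); not data from the paper.
results = ["BDL", "BDL", "BDL", "BDL", "BDL", 2, 5, 40, "ADL"]
n_bdl = results.count("BDL")
n_adl = results.count("ADL")
print(f"{n_bdl} BDL and {n_adl} ADL results out of {len(results)} samples")

# Compare three common, but arbitrary, replacement values for BDL results.
summaries = {}
for bdl_value in (0, 0.5, 1):
    values = substitute_censored(results, bdl_value, UPPER_LIMIT)
    summaries[bdl_value] = describe(values)
    print(f"BDL replaced by {bdl_value}: {summaries[bdl_value]}")
```

In this invented sample, the mean is driven by the single ADL result, the median equals whatever value replaces BDL, and the geometric mean is undefined when BDL is replaced by zero. Reporting the number of censored results and the replacement rule therefore matters for interpreting any of these statistics.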
In some instances, statistical analyses that use FIB counts/concentrations rather than risk categories can be problematic. FIB results have inherent uncertainty because of the spatiotemporal variability of bacteria populations, the patchiness/clumping of bacteria in water, the lack of correlation between E. coli concentrations and pathogen concentrations [55], and dependence on physical and biological conditions [56,57]. In assessing temporal variability of E. coli concentrations, a statistical analysis that treats FIB concentrations as precise is reasonable [52]. However, the use of FIB counts/concentrations to indicate the risk of fecal contamination should be treated with great caution. Furthermore, interpreting statistical results as a precise indication of health risk could be misleading. In such circumstances, categorization (for example, with WHO's risk categories) replaces the FIB values and shifts the focus to reporting the risk category. Although risk categories can obfuscate the "intuitive differences" between FIB concentrations (for example, between 11 and 99 CFU/100 mL), one of the reasons to use them is to avoid overstating precision or confidence when it is uncertain what the difference between 11 and 99 actually indicates.
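A short sketch of such a categorization is given below. The cut points (<1, 1 to 10, 11 to 100, and >100 CFU/100 mL) and the category labels are assumptions based on commonly cited WHO risk bands and should be checked against the guideline edition in use; the function name and the sample counts are hypothetical.

```python
from collections import Counter


def risk_category(cfu_per_100ml):
    """Map an E. coli count (CFU/100 mL) to an assumed WHO-style risk category."""
    if cfu_per_100ml < 1:
        return "low"
    if cfu_per_100ml <= 10:
        return "medium"
    if cfu_per_100ml <= 100:
        return "high"
    return "very high"


# Hypothetical counts from one location; not data from the paper.
counts = [0, 0, 3, 11, 45, 99, 150, 0, 7, 230]
categories = [risk_category(c) for c in counts]

# Proportion of samples per category, as would feed a risk-category bar chart.
tally = Counter(categories)
proportions = {name: tally[name] / len(counts)
               for name in ("low", "medium", "high", "very high")}
print(proportions)

# Counts of 11 and 99 fall in the same category, so their difference is not reported.
print(risk_category(11), risk_category(99))
```

The trade-off described above is visible directly: the categorized data no longer distinguish 11 from 99 CFU/100 mL, but they also do not imply a precision that the underlying counts may not support.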
As seen with the data analysis examples presented in the results, inappropriate data processing and analysis methods can result in misinterpretations and erroneous conclusions. To avoid this, we recommend that, in LMIC settings with limited laboratory support, the appropriate selection and reporting of sample collection, processing, data preparation, and analysis methods be an integral part of FIB research. Additionally, sharing raw and/or processed data can improve the replicability of the results.
To that end, we present a checklist (Table 4) that can be used to develop a sample collection plan and report results.

Table 4. Checklist of recommended parameters to report in manuscripts including membrane filtration data from low- and middle-income countries (LMICs).
Section/Topic: Checklist Item
Sample collection
1. Report sample collection equipment and supplies (e.g., sterile bottle/bag/vial)
2. Report if sodium thiosulfate (or equivalent) was used (if chlorinated sample)
3. Report if aseptic procedure was maintained to prevent contamination
4. If not analyzed within one hour, report if <10 °C was maintained
5. If not analyzed immediately, report the time between collection and analysis
Membrane filtration
6. Report if positive and negative controls were checked
7. Report the volume of sample filtered
8. Report number, dilution, and/or volume of serial dilutions
9. Report the diluent used, if any
10. Report the selective growth media
11. Report the incubation time and temperature
Enumeration
12. Report the detection minimum/maximum range for enumeration
13. Report aggregation method for serial dilutions
14. Report the number of BDL and ADL samples
15. If BDL/ADL samples were included in analysis, report how values were replaced
16. Report if any data were dropped due to positive/negative controls
17. Report if a subset of enumerations was verified by a second person
Statistical analysis
18. Report if the data were transformed
19. Report if the data were analyzed as count, continuous, categorical, or binary
20. Describe dataset using parameters that justify any following statistical analysis
21. For data visualization, ensure proper tool was selected to aid information communication
22. Provide rationales for the choice of statistical method
23. Report if the data met the assumptions of the selected statistical test

While membrane filtration is considered the gold standard in certified laboratories, it is challenging to conduct high-quality membrane filtration testing in research field laboratories in LMIC settings. As we found in the review, quality control steps (e.g., duplicates, blanks) are often not reported. Depending on the availability of trained staff, resources, equipment, and research questions, alternative methods (such as MPN methods) may provide reliable results [4,58,59].
If membrane filtration is selected, we propose following and reporting the process using the checklist included in Table 4. By improving FIB data reporting, the quality of publications presenting FIB data will also improve, and they will have a better chance of reaching a broader audience [60].
Limitations of the systematic review were as follows: (1) only peer-reviewed articles published in English were included, and (2) only three electronic citation databases were initially searched. While the inclusion of other languages and more databases may have increased the number of articles, we do not believe these limitations affected the results. Limitations in the publicly available datasets were as follows: (1) exact information on field sampling procedure was not available, and (2) all samples were processed for 1 mL and 100 mL dilutions without consideration of water source; this could have produced more BDL and ADL results than when dilutions are carefully considered. However, as the datasets were used to present examples, we considered that using these datasets was a better option than alternatives such as using a simulated dataset. Additionally, the topics outlined in Table 4 should be viewed as indicative and not as an exhaustive list of parameters that will meet the reporting needs for all possible FIB data collection, processing, and analysis scenarios. We propose using the checklist as a preliminary tool to assess the inclusion of relevant information. Determining the impact of sample collection and processing and data preparation on more advanced statistical techniques (e.g., regression, time-series analysis) was outside the scope of this manuscript. Lastly, as future research, we recommend completing detailed investigations of the individual issues identified in this review to establish guidelines for FIB data analysis.

Conclusions
Membrane filtration methods are commonly used in LMICs to assess drinking water quality risk. Our review results show that, generally, sample collection and processing techniques and data preparation and analysis methods are inadequately reported and can be inappropriate, which, as seen herein, can lead to misleading results. We found limited reporting of adherence and adaptation in using membrane filtration methods. Additionally, using example datasets, we demonstrated the effect of statistical method selection on FIB data analysis results and/or interpretation. Our example analysis highlights the importance of adequate reporting of FIB data usage in LMICs. Lastly, to standardize FIB data collection, processing, and analysis reporting, we proposed a checklist. We hope the topics discussed in this manuscript will assist researchers in strengthening FIB results and assist reviewers and readers in interpreting FIB results in LMICs.

Data Availability Statement:
We confirm that all data sets, scripts, and images are available from the corresponding author upon request. The list of articles is available from the corresponding author upon request.