Accounting for Dilution of SARS-CoV-2 in Wastewater Samples Using Physico-Chemical Markers

Most sewer networks collect domestic wastewater and a variable proportion of extraneous water, such as rainwater, through surface runoff and industrial discharges. Accounting for wastewater dilution is essential to properly quantify wastewater particle loads, whether these are molecular fragments of SARS-CoV-2, or other substances of interest such as illicit drugs or microplastics. This paper presents a novel method for obtaining real-time estimates of wastewater dilution and total daily volume through wastewater treatment works, particularly when flow data is not available or unreliable. The approach considers the levels of several physico-chemical markers (ammonia, electrical conductivity, and orthophosphate) in the wastewater against their dry-weather levels. Using high-resolution data from the national Wastewater Surveillance Programme of Wales, we illustrate how the method is robust to spikes in markers and can recover peaks in wastewater flow measurements that may have been capped by hydraulic relief valves. We show the method proves effective in normalising SARS-CoV-2 viral loads in wastewater samples and discuss other applications for this method, looking at wastewater surveillance as a vital tool to monitor both human and environmental health.


Introduction
Monitoring the levels of human-derived chemicals or microbes in the wastewater of a community can offer cost-effective insight into its health and behaviour [1]. Wastewater-based epidemiology (WBE) has proven to be a vital tool in tackling public health concerns, including poliovirus [2] and antimicrobial resistance [3], as well as revealing trends in human behaviour such as illicit drug and pharmaceutical use [4]. Most recently, WBE has been adopted around the globe to monitor SARS-CoV-2, the virus that causes COVID-19.
The key advantage of WBE is its ability to provide a potentially unbiased estimate of the prevalence of a disease in the community. Clinical surveillance techniques are constrained by the breadth and impact of local testing infrastructure, as well as the behavioural characteristics of the population under surveillance. For instance, testing is often restricted to infected, symptomatic people with access to healthcare [5]. Likewise, economic, social and geopolitical pressures can limit the impact of clinical surveillance, particularly within a pandemic situation [6,7].
Wastewater samples for WBE are collected where practical in the sewage system, often at the inlet of a wastewater treatment plant. In the case of SARS-CoV-2 monitoring, the number of virus ribonucleic acid (RNA) copies in each wastewater sample is then measured using quantitative reverse transcription polymerase chain reaction (RT-qPCR). There is, however, no consensus on the best way to normalise the SARS-CoV-2 RNA data collected from these wastewater samples (see [8] for a recent review).
When comparing different wastewater catchments or regions, the chosen normalisation must account for the number of people the sample aims to represent as well as the dilution of virus RNA copies in the wastewater. However, wastewater networks are all different, transporting various proportions of domestic wastewater and servicing mobile populations. Measurements of dilution, population numbers, or proxies of these are challenging and not always reliable.
In almost all wastewater networks, extraneous water will find its way into the system. In separated sewer systems, infiltration occurs invariably [9], while stormwater is an expected component of the sewage in combined sewer networks. In either case, the excess water, sometimes referred to as 'parasitic' water [10,11], must be accounted for to properly interpret the concentration of a substance.
Measuring the volume of wastewater to pass through the treatment works over a period of time (typically per day) allows concentrations to be normalised for dilution. In practice, this is often done by aggregating wastewater flow measurements from the inlet of the site up to a daily total for the wastewater flowing through it; this gives the normalisation its name. However, daily flow is not identical to daily volume when the system is equipped with relief valves and holding tanks. Therefore, we seek an estimate for the total daily volume of wastewater that would pass through each site if unimpeded.
Using data collected from the national Wastewater Surveillance Programme of Wales, this paper presents a novel approach to flow normalisation by finding estimators for total daily volume based on stable, readily measured chemical markers in the wastewater: electrical conductivity, ammonia and orthophosphate. We test if the method is robust to sudden changes in a single marker and if it proves effective as a normalising factor for SARS-CoV-2 surveillance. We also discuss the applications of this method beyond WBE, looking at wastewater surveillance as a vital tool for monitoring both human and environmental health.

Case Study
Since September 2020, the Welsh Government has monitored SARS-CoV-2 RNA levels in wastewater networks across Wales (UK) as part of its response to the COVID-19 pandemic. The national programme collects composite samples at almost fifty wastewater treatment plants and monitors the wastewater of over two million people in Wales (roughly 75% of the population). This large-scale monitoring is the product of a collaborative effort between the public sector (Welsh Government and Public Health Wales), academic institutions (Bangor University and Cardiff University), and water utility companies (Dŵr Cymru Welsh Water (DCWW) and Hafren Dyfrdwy). Details of the laboratory processes used to quantify the amounts of SARS-CoV-2 RNA copies in these wastewater samples can be found in [12]. The electrical conductivity, ammonia, and orthophosphate data used here were also generated as part of the national monitoring programme. An electrical conductivity meter and probe were used to take electrical conductivity measurements. Ammonia and orthophosphate were measured using a SPECTROstar Nano Microplate Reader (BMG Labtech, Ortenberg, Germany), a UV/vis spectrometer, set at 667 and 820 nm, respectively. Here we focus on the process of normalising the SARS-CoV-2 data to obtain estimates of SARS-CoV-2 prevalence per person that can be compared across communities with different wastewater systems.
To test our novel normalisation approach, we used 21 weeks of data collected from six wastewater networks representing various populations and geographies under surveillance in Wales. A summary of the sites is given in Table 1. All six sites studied have long-term 15 min records of flow taken from certified flow meters (MCERTS, the UK Environment Agency Monitoring Certification Scheme). The wastewater network in Wales was mostly built in the Victorian era [13] and is predominantly a mixed wastewater network collecting domestic, industrial and surface runoff waters. To cope with extreme weather events or other infrastructural events such as blockages, the network is equipped with a number of hydraulic relief valves that cap flow. We chose sites whose flow meters are located variously near the inlet of the treatment plant (often linked to strong flow capping), at the plant's inlet, and at the exit (effluent) of the treatment plant. Figure 1 shows the time series for observed total daily flow at each study site from March 2021 to March 2022, and illustrates the effect of capping on flow measurements. The case study chosen thus allows us to assess whether our approach still performs well under various flow conditions, including when flow data are capped through the operation of relief valves or diversion channels.

Estimating Population Size
The most straightforward way to account for the number of people that a wastewater sample aims to represent is to estimate the population within the sewershed (the wastewater catchment that drains into a wastewater treatment plant) from recent population estimates. In the absence of the most recent national census data, we used a publicly available dataset from the Office for National Statistics (ONS), which provides estimates for the mid-2020 population of every Lower Layer Super Output Area (LSOA) in England and Wales. We allocated the population of each LSOA to the built area within it, assuming that people live in buildings. We then extracted, for each sewershed, all the built area and the corresponding population within it.
While this approach provides an accurate estimate for the static, constant population of a sewershed, it does not account for natural fluctuations in the population. Such fluctuations may come from sources such as commuting or tourism.
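As a rough illustration of the allocation step described above, the sketch below apportions each LSOA's population to a sewershed in proportion to the share of the LSOA's built area that falls inside it. The function name, the tuple layout and the assumption of a uniform population density across built area are ours, introduced for illustration only; the geospatial overlay that produces the areas is not shown.

```python
def sewershed_population(lsoas):
    """Estimate a static sewershed population.

    `lsoas` is an iterable of tuples (hypothetical layout):
        (lsoa_population, built_area_in_lsoa, built_area_inside_sewershed)
    Areas may be in any unit as long as both use the same one.
    Assumption: people are spread evenly over the built area of an LSOA.
    """
    total = 0.0
    for population, built_total, built_inside in lsoas:
        if built_total <= 0:
            # No built area recorded: nothing can be allocated from this LSOA.
            continue
        share = min(built_inside / built_total, 1.0)  # guard against overlay rounding
        total += population * share
    return total


# Made-up numbers: one LSOA fully inside, one a quarter inside, one with no built area.
example = [(1500, 2.0, 2.0), (2000, 4.0, 1.0), (1000, 0.0, 0.0)]
estimate = sewershed_population(example)
```

The result is a static figure, so it inherits the limitation noted above: commuting and tourism are not represented.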

Using Flow Data for Dilution
Accounting for the exact dilution of wastewater at any given time from flow data alone requires flow meters to be installed across each catchment wherever water may pass, that is, not only at the inlet to the site but also at any holding tanks, pumping stations, and overflow devices. From these measurements, and ignoring infiltration, we could calculate the volume of water in the system at any time. However, this is not practical. Instead, the most direct and achievable way to account for dilution is to measure the flow of the wastewater into the inlet of the sewage treatment plant.
In practice, the measured SARS-CoV-2 RNA gene copies per litre of sampled wastewater on one day are multiplied by the total volume of wastewater over the last 24 h at the sampling site to compute the daily viral load entering the wastewater treatment plant. This load is, in turn, normalised for the population of the sewershed drained by this plant to obtain a viral load per capita. The key benefit of this normalisation is that its values are comparable between treatment plants serving different-sized populations. There are several caveats to consider with this approach, however, relating to the flow data itself.
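The arithmetic of this normalisation can be written out in a few lines. The following is a minimal sketch with hypothetical function and argument names and made-up input values; it simply multiplies the concentration by the daily volume and divides by the population.

```python
def viral_load_per_capita(gc_per_litre, daily_volume_litres, population):
    """Daily viral load per person (gene copies per capita per day).

    gc_per_litre        -- measured SARS-CoV-2 gene copies per litre of sample
    daily_volume_litres -- total wastewater volume over the last 24 h, in litres
    population          -- population estimate for the sewershed
    """
    if population <= 0:
        raise ValueError("population must be positive")
    daily_load = gc_per_litre * daily_volume_litres  # gene copies per day
    return daily_load / population


# Illustrative values only: 2e4 gc/L, 5e7 L/day, 100,000 people.
load = viral_load_per_capita(2.0e4, 5.0e7, 100_000)
```

Because the output is expressed per person, values from plants serving different-sized populations can be placed on the same axis.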

Data Availability
Flow data needs to be measured and available in a timely way. Flow meters are not always installed on wastewater treatment plants, which is why, for example, the English wastewater surveillance programme does not adopt this method. Similarly, the Scottish programme only uses this when flow data is available.
In Wales, access to reliable and timely flow data for all the wastewater sites, even the smaller ones, is possible, which places the Wales programme at an advantage in this respect. The two water utilities operating in Wales provide flow records at 15-min intervals, most taken from certified flow meters (MCERTS). There are cases, however, in Wales or elsewhere, where flow data may not be available: for example, when flow meters malfunction, or during localised sampling within the network (at manholes, etc.), where flow meters cannot be installed. Alternative approaches are therefore required.

Data Reliability
When blockages occur in the network or during periods of sustained rainfall, wastewater spills out from the network through hydraulic relief valves such as combined sewer overflows (CSOs). The data collected at the flow meter of a treatment plant could then be capped, and the true dilution effect lost.
Sometimes, flow meters cannot be installed at the inlet of a treatment plant. Instead, flow is measured at the point at which effluent is discharged from the works. During typical weather conditions, where the flow through a treatment plant is under its maximum capacity, this is not a concern. That is, flow data collected at either end of the treatment works is comparable so long as it is aggregated to a daily level, given the homogenising effect of the treatment works. Further, flow data collected from the effluent of a site may be lagged compared with the wastewater collected in a sample. None of the sites under surveillance in Wales presents this issue, and the lagging effect is insignificant.

Using Proxies for Dilution
An alternative to using flow measurements to assess dilution is to use proxies for dilution. These proxies (markers) are characteristic of the wastewater, and they are usually chemical or biological. A few well-documented proxies are outlined below, all measured in the Welsh wastewater surveillance programme:

• Electrical conductivity (EC): an established method to account for dilution [14]. EC is easy to measure accurately and does not decay as it transits through the sewerage system. However, it is sensitive to changes in salinity, which increases during the winter months from road de-icing, as well as in works situated directly on the coastline.
• Ammonia: a recognised chemical indicator of human urine content in wastewater derived from the urea in urine [15]. This measure is less reliable in sewersheds that drain other sources of ammonia, such as industrial waste from the processing of meat [16]. There may also be differences in the rate at which urea is converted to ammonia across the sewer networks.
• Orthophosphate: a measure of inorganic phosphorus in wastewater. Orthophosphate commonly originates in the waste of humans and animals, agricultural runoff and household detergents [17]. In urban sewersheds or dry-weather conditions, orthophosphate levels are typically driven by domestic wastewater. However, this measure becomes less indicative of faecal dilution in sewersheds with substantial agricultural runoff.
• Cross-assembly phage (crAssphage): a bacteriophage abundant in the gut of humans and a known indicator of human faecal content in wastewater [18]. While crAssphage reflects the dilution of SARS-CoV-2 in the wastewater, like many biological markers, it is inherently difficult to quantify. As a virus, crAssphage must be quantified in a similar way to SARS-CoV-2, and it is subject to similar issues such as degradation, extraction efficiency, and PCR inhibition. Further, because crAssphage is a DNA virus and not an RNA virus like SARS-CoV-2, it is likely to differ in key parameters, such as degradation rate. Additionally, despite approximately 50% of the population carrying and shedding crAssphage, shedding rates can vary substantially between individuals, so the measure is less reliable in sewersheds with smaller populations.
RNA viruses have also been used to normalise SARS-CoV-2 in wastewater, including the Pepper mild mottle virus (PMMoV), another known indicator of human faecal content in wastewater [19][20][21]. Using an RNA virus means that degradation rates should be similar to those of SARS-CoV-2; however, such markers still suffer from issues such as extraction efficiency, PCR inhibition, and variable shedding rates. In addition, the cost and sample processing time involved in measuring physico-chemical markers such as electrical conductivity, ammonia, and orthophosphate are much lower than those of quantifying biological markers, such as crAssphage and PMMoV.

Other Marker-Based Approaches Used for WBE
To provide a minimal baseline for context, we outline the flow normalisation procedures used by two other government programmes in the United Kingdom. Full details of these approaches are available in [8]. These processes by no means encompass the entire state of flow normalisation procedures, but they are representative and distinct. They also offer examples of how certain flow normalisation procedures are inapplicable given the constraints of the wastewater network and the needs of stakeholders overseeing the surveillance project in Wales.
In the Scottish approach, the flow normalisation process provides an output in SARS-CoV-2 gene copies per day. To do this, any available flow data is used directly. Where flow data is missing (often, their sites provide measurements several days or weeks in arrears), a linear mixed model is used for total daily flow.
This model pools the data from all the sites under surveillance to relate the total daily flow at each site to site-specific random effects, a static estimate for each catchment population, and sample ammonia concentrations. If no ammonia data is available, then the time series for total daily flow is inferred using a spline function based on recent ammonia trends. This approach cannot be implemented in Wales: whereas in Scotland "it is not thought that capping is a major issue in Scottish wastewater networks" [8], daily flow data in Wales is capped at almost all sites across the country due to diversion channels, storm tanks and CSOs.
In the English programme, dilution effects are handled by correcting observed SARS-CoV-2 concentrations according to variability in flow rather than a direct measure of flow. This variability is itself a dimensionless quantity, and thus the dilution normalisation provides an output whose units are still gene copies per litre. The variability in flow is captured by observed changes in orthophosphate and ammonia.
The Bayesian flow variability model is fitted to each site individually, meaning sites with few data points are subject to greater uncertainty. As more data becomes available, the estimated variability for past data points is thus likely to be adjusted. The key benefit of this model is that it does not use flow data at all. However, there are substantial caveats to consider regarding consistency in reporting and the essential assumptions of the model. There is scope to utilise a similar approach that extends it to a hierarchical model (where data is, in a sense, shared between sites), but further investigation is required.
As can be seen, the approaches applied in Scotland and England exist at opposite ends of the flow data availability spectrum. In Wales, we sit somewhere between these poles in that we have access to data, but it is not always reliable. As such, neither of these approaches is suitable for our wastewater surveillance if we wish to make full use of the information available to us. The following section describes the approach used in Wales to produce a flow normalisation by estimating daily volume from the data collected at each site.

The Volume Estimation Procedure
When normalising SARS-CoV-2 concentrations measured in wastewater samples, we seek to make the best use of the flow data available for Welsh sewersheds. We have measurements of SARS-CoV-2, daily flow, and figures for a set of markers (electrical conductivity, ammonia, orthophosphate, and crAssphage). When the volume of wastewater produced by the sewershed is not affected by overflow (low-flow days), the daily flow gives a good measure of the volume produced. When there has been overflow, we can use the level of the markers relative to their historic levels on low-flow days to estimate the volume.

Calculating Volume Estimates from a Single Marker
For a specific catchment, let $V_t$ be the volume of wastewater on day $t$ and $F_t$ the flow measured at the sample point. On a low-flow day $V_t = F_t$, but when there has been overflow $V_t > F_t$. Let $P$ be the population of the catchment and $D$ the mean production of waste per person per day; then the dilution of waste on day $t$ is given by

$$d_t = \frac{P D}{V_t}. \tag{1}$$

Let $m_t$ be the level of some marker $m$ on day $t$; then $m_t \propto d_t$, assuming that the mean level of the marker in the source waste is fixed. Thus, if $s$ is any low-flow day, we have $m_t / m_s = V_s / V_t$, which we rearrange to get

$$V_t = \frac{m_s V_s}{m_t}. \tag{2}$$

Equation (2) holds for all low-flow days $s$, so we can combine measurements from all of our low-flow days when we need to estimate $V_t$. Let $s_1, \ldots, s_n$ be low-flow days; then, writing $V_t(m)$ to indicate that this is an estimate using marker $m$, we have

$$V_t(m) = \frac{\frac{1}{n} \sum_{i=1}^{n} m_{s_i} V_{s_i}}{m_t}, \tag{3}$$

where $V_{s_i} = F_{s_i}$ because each $s_i$ is a low-flow day. In practice, we restrict ourselves to low-flow days from the previous year, and our working definition of a low-flow day is one for which the flow is between the 10th and 40th percentiles of flow over the preceding year. To combine estimates $V_t(m)$ from different markers, we use a robust average which downweights estimates that are very different from $F_t$. This is detailed in the next section.
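The single-marker estimate can be sketched in code as follows. This is an illustration under stated assumptions rather than the programme's implementation: the function names are hypothetical, the history passed in is assumed to cover the preceding year with no missing values, and percentiles are computed with the inclusive (linear interpolation) convention.

```python
from statistics import fmean, quantiles


def low_flow_days(past_flows, lower_pct=10, upper_pct=40):
    """Indices of days whose flow lies between two percentiles of the history."""
    # quantiles(..., n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(past_flows, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [i for i, flow in enumerate(past_flows) if low <= flow <= high]


def marker_volume_estimate(marker_today, past_flows, past_marker):
    """Estimate today's volume from one marker and its low-flow history.

    On low-flow days the measured flow is taken as the true volume, so the
    baseline is the mean of (marker level x flow) over those days; dividing
    by today's marker level gives the volume estimate.
    """
    days = low_flow_days(past_flows)
    baseline = fmean([past_marker[i] * past_flows[i] for i in days])
    return baseline / marker_today


# Made-up history in which marker x flow is constant at 3000.
flows = [100.0 + 10.0 * i for i in range(11)]   # 100, 110, ..., 200
marker = [3000.0 / f for f in flows]
estimate = marker_volume_estimate(10.0, flows, marker)
```

In this toy history a marker level of 10 on the day of interest corresponds to a volume of about 300, higher than any flow in the record, which is how a diluted marker signals volume beyond a capped flow reading.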

Robust Volume Estimation
As mentioned above, we describe our procedure for a fixed catchment. We suppose that on day $t$ we have a flow measurement $F_t$, a set of markers $M$, and for each marker $m \in M$ an estimate $V_t(m)$ of the volume for day $t$. Our goal is then to estimate $V_t$, which we do in two steps:

1. Using a tolerance $\epsilon$, expressed as a proportion, decide which estimates $V_t(m)$ are substantially different from $F_t$ using the rule

$$\frac{|V_t(m) - F_t|}{F_t} > \epsilon. \tag{4}$$

Let $M_t$ be the set of markers that give substantially different estimates.

2. We estimate $V_t$ either using $F_t$ or the $V_t(m)$, depending on how many $V_t(m)$ are substantially different from $F_t$. That is, our estimate depends on the size $|M_t|$ of the set $M_t$. Write $\hat{V}_t$ for our estimate; then for a fixed parameter $k \geq 0$ our procedure is

$$\hat{V}_t = \begin{cases} F_t, & |M_t| < k, \\ \sum_{m \in M_t} w_t(m)\, V_t(m), & |M_t| \geq k, \end{cases} \tag{5}$$

where the weights $w_t(m)$ sum to one over $M_t$ and are larger for estimates closer to $F_t$.

That is, we only use the $V_t(m)$ if enough of them are substantially different to $F_t$ that we no longer have confidence in $F_t$ as a measure of $V_t$. Moreover, if we do decide to use the $V_t(m)$ to estimate $V_t$ then we only use the ones that are substantially different, but we weight them so that those closer to $F_t$ are given more weight. This weighting provides robustness from isolated extreme marker values as well as embedding confidence in the observed flow data.
In our case, we used the markers electrical conductivity, ammonia, and orthophosphate. We found our crAssphage measurements too variable to give good estimates of $V_t$. For our tuning parameters, we chose $\epsilon = 0.1$ and $k = 2$. A choice of $k = 0$ would mean the direct flow measurement $F_t$ is never used, while $k = |M|$ would mean all our approximations $V_t(m)$ need to be substantially different before we abandon $F_t$. We found that taking $k = 2$ mitigated the effect of extreme marker values while providing robust estimates for $V_t$.
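The two-step procedure can be sketched as below. Two details are assumptions made for this illustration and should not be read as the exact published formulas: a marker estimate is treated as substantially different when its relative deviation from the measured flow exceeds the tolerance, and the weights are taken as inversely proportional to the absolute deviation from the measured flow (one simple way of giving closer estimates more weight). The function and argument names are hypothetical.

```python
def robust_volume(flow, marker_estimates, eps=0.1, k=2):
    """Combine a measured flow with per-marker volume estimates.

    flow             -- measured total daily flow F_t
    marker_estimates -- dict mapping marker name to its volume estimate V_t(m)
    eps              -- tolerance, as a proportion of the measured flow
    k                -- number of deviating markers needed to abandon the flow
    """
    # Step 1: markers whose estimate deviates from the flow by more than eps
    # (assumed rule: relative absolute deviation).
    deviating = {
        name: value
        for name, value in marker_estimates.items()
        if abs(value - flow) > eps * flow
    }

    # Step 2: keep the measured flow unless at least k markers disagree with it.
    if len(deviating) < k or not deviating:
        return flow

    # Assumed weighting: inverse distance to the measured flow, normalised to 1.
    weights = {name: 1.0 / abs(value - flow) for name, value in deviating.items()}
    total_weight = sum(weights.values())
    return sum(weights[name] * value for name, value in deviating.items()) / total_weight


# Made-up day: two of three markers suggest far more volume than the capped flow.
capped_day = robust_volume(100.0, {"ec": 105.0, "ammonia": 150.0, "orthophosphate": 200.0})
# Made-up day: a single spiking marker is ignored because k = 2.
spike_day = robust_volume(100.0, {"ec": 105.0, "ammonia": 108.0, "orthophosphate": 300.0})
```

With these inputs the first call returns a value between the two deviating estimates and closer to the one nearer the measured flow, while the second call returns the measured flow unchanged, which mirrors the robustness to isolated spikes described above.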

Operational Considerations
In the above volume estimation procedure, several practical assumptions are made about data availability. Unfortunately, a sufficiently complete record of all the required data is often unavailable due to operational constraints and technological malfunctions. Alternative solutions include:

• If there is no observed flow data available for a site on a given day, then $\hat{V}_t$ is taken as the arithmetic mean of all the volume estimates.
• If there are no low-flow days with data for a marker, then all observed days are used in its volume estimate.

Further, if there is a small amount of low-flow data for a marker, then its low-flow estimate (the numerator of Equation (3)) is subject to change. Over time, these estimates will settle and become robust. However, this means that the volume estimates for that marker will be updated retroactively.
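The two fallbacks above can be expressed as small helper functions. This is a sketch with hypothetical names in which missing values are represented by `None`; it is not taken from the programme's code.

```python
from statistics import fmean


def volume_without_flow(marker_estimates):
    """Fallback when no flow was observed: arithmetic mean of the marker estimates."""
    available = [v for v in marker_estimates.values() if v is not None]
    if not available:
        return None  # nothing to estimate from on this day
    return fmean(available)


def baseline_days(low_flow_indices, marker_series):
    """Days used to build a marker's baseline.

    Prefer low-flow days on which the marker was measured; if there are none,
    fall back to every day on which the marker was observed.
    """
    observed = [i for i, level in enumerate(marker_series) if level is not None]
    usable = [i for i in low_flow_indices if marker_series[i] is not None]
    return usable or observed


no_flow_estimate = volume_without_flow({"ec": 120.0, "ammonia": 150.0, "orthophosphate": None})
days = baseline_days([0, 2], [None, 4.1, None, 3.9])
```

Because the baseline changes as more low-flow days accumulate, any stored volume estimates for that marker would need to be recomputed, which is the retroactive updating noted above.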

Methodology for Testing Our Procedure
To test the effectiveness and robustness of the proposed normalisation approach, we use our case study data to: (i) compare estimates for daily volume and measured daily flow across the six sites, (ii) assess the value of dilution proxies in inferring total daily volume when capping affects flow measurements, and (iii) illustrate how the time series of SARS-CoV-2 RNA levels per person vary consistently between estimated and direct flow normalisations. While we should not try to fit the estimated and direct flows, given that flow data are sometimes capped, we use the coefficient of variation of the root mean square error (CVRMSE) [22] to assess the quality of the fit and to give an indication of the consistency of results obtained with each.
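For reference, the CVRMSE is commonly computed as the root mean square error divided by the mean of the reference series. The sketch below follows that common definition and assumes the measured daily flow is the reference; the function name and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import fmean


def cvrmse(observed, estimated):
    """Coefficient of variation of the RMSE, normalised by the mean of `observed`."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(e - o) ** 2 for o, e in zip(observed, estimated)]
    rmse = sqrt(fmean(squared_errors))
    return rmse / fmean(observed)


# Made-up daily flows and volume estimates.
score = cvrmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Being dimensionless, the statistic can be compared across sites with very different flow magnitudes; lower values indicate closer agreement between the two series.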

Results
In this section, we use a case study of 21 weeks of data from six sites across Wales to assess the volume estimation procedure outlined in Section 2.6. Table 1 lists all the key characteristics of these sites, including the location of each flow meter in the treatment works. The sites with flow measured at the inlet, i.e., those whose flow data should be most reliable, are Bangor Treborth, Gowerton, and Ponthir. All wastewater catchments in Wales make use of diversion channels (CSOs, etc.), but from inspecting Figure 1, we see that Bangor Treborth, Cardiff Bay, Five Fords (Wrexham), and Llangefni are all subject to moderate to severe capping.
Therefore, we hope that the proposed method will produce close and consistent volume estimates at Gowerton and Ponthir across the spectrum of their observed flow data. For the remaining sites, however, we expect the caps in total daily flow to be recovered by our estimation procedure. Note that throughout this section, measures of estimated volume and measured daily flow are presented per capita using our population estimates to improve comparability between sites.
A comparison of estimates for total daily volume against the directly measured total daily flow at each site (Figure 2) shows a strong agreement between the two in most cases. At the sites with inlet flow, there is consistency between the estimates and direct flow across the spectrum of observed data. Where flow is measured near the inlet, after the main relief valves, the marker estimates suggest that direct-flow measurements consistently underestimate the total daily flow at high levels. This is the case at Cardiff Bay, where the network is equipped with numerous hydraulic relief valves to cope with a strong gradient which promotes flash water flows during rainfall events. The same is true at Llangefni, where this relationship is even clearer.
A comparison between observed daily flow and estimated daily volume per capita at each site (Figure 3) demonstrates how the proposed method may be used to overcome the issue of capping. It also highlights the effect of flow meter position and relief valves on measured flow. Sites with flow meters at the inlet of the site (Bangor Treborth, Gowerton and Ponthir) show a strong agreement between the observed daily flow and estimated volume, particularly at Gowerton and Ponthir. In Bangor Treborth, a coastal site where diversion interventions are necessary to stop saltwater ingress, peak estimates are slightly higher, and there are periods of sustained underestimation. In those three cases, the fit quality is good at each site, with CVRMSE values of 0.313, 0.399 and 0.402, respectively.
In contrast to the inlet sites, the remaining catchments have substantially poorer fits between observed daily flow and estimated daily volume; they produce CVRMSE values between 0.477 and 0.613. It is particularly clear at Cardiff Bay and Llangefni that the combined volume estimate can recover caps in the daily flow time series. Given the quality of the method at inlet flow sites, where flow should be more reliable, there is evidence to indicate that the estimates at the other sites are also representative of the true, undisturbed daily volume. Indeed, in some cases, disparities between the daily flow and estimated daily volume can be explained by precipitation events, causing an increase in surface runoff which will ultimately be capped in the daily flow measurements. For example, on 28 October 2021, the South West region of the UK, where the Cardiff Bay catchment is situated, received 26.06 mm of precipitation, compared to the already high 5.50 mm monthly average [23], explaining the high estimated daily volume at this time (Figure 3).
The estimated SARS-CoV-2 viral loads also seem comparable among the markers. Figure 4 illustrates the use of all three marker-volume estimates (derived from ammonia, orthophosphate and electrical conductivity), as well as the overall volume estimate, to calculate daily SARS-CoV-2 viral load (gene copies per capita). For each marker, there are tight, broadly consistent patterns in viral load across the sites, indicating that they can be used in an ensemble when flow data is no longer reliable or is unavailable. Also, the effect of isolated spikes is reduced when considering viral load. For instance, jumps in ammonia (e.g., early November in Five Fords (Wrexham) and Llangefni) and orthophosphate (late November in Gowerton) do not persist into the overall estimate, smoothing the variability and softening the trends in viral load. Furthermore, during dry weather periods, the viral loads derived from our volume estimates coincide with the viral load calculated using flow data directly. This provides evidence that presenting both these wastewater signals alongside one another will not lead to issues with consistency or interpretability.

Discussion
Our work proposes a new normalisation approach to estimate the dilution of SARS-CoV-2 signal in wastewater from very different sewage networks. This normalisation is achieved by estimating the daily volume of wastewater to pass through each sewage treatment works from a set of reliable physico-chemical markers. The volume on any given day is estimated from the relative levels of a marker compared to so-called 'low-flow' days, where the observed flow through the site is far below its capacity.
We showed how the method is robust in the face of sudden, isolated spikes in particular markers because it uses an ensemble approach based on three different markers. We also illustrate how using a volume estimate can help overcome the issue of capped flow measurements in combined sewer systems equipped with numerous hydraulic relief valves. For each of the markers, we showed there were tight, broadly consistent patterns in viral load across the six representative sewer networks, indicating they can be used in an ensemble when flow data is no longer reliable or is unavailable. Further, the described approach provides a normalisation process that is operationally flexible since it does not require a constant stream of direct flow data, and it may be implemented using only one of the three markers.
There are, of course, caveats to the approach proposed. For instance, as highlighted, each of the physico-chemical markers selected has a known sensitivity to variations in specific sewershed attributes. For example, electrical conductivity is sensitive to salt inputs from road runoff and marine intrusion, and ammonia to levels of industrial meat waste. For the Wales programme, we retained the markers that were best adapted to this national network. Even in doing so, there are still occasions where one marker can yield different estimated flow volumes. For example, at Llangefni in late February, low orthophosphate concentrations caused the orthophosphate-estimated SARS-CoV-2 viral load to increase, which was not seen to the same extent in other markers (Figure 4). This could be due to the input of non-domestic wastewater, as there are food processing industries, including meat processing, in Llangefni, which may increase ammonia and conductivity but not orthophosphate. Reasons such as these mean that applying our method to other countries requires careful consideration as to the markers that are likely, or are shown to be, the most robust to local natural conditions. However, our ensemble approach provides the flexibility of choosing the most suited set of markers for the sewershed, building, for example, on recent advances in the use of chemical markers for tracing wastewater contamination in freshwaters [24].
Another limitation to the SARS-CoV-2 normalisation used concerns the population estimates used to compare different treatment sites and sewersheds. The chosen estimates are static in that they do not account for natural fluctuations in population counts caused by phenomena like commuting or tourism. Creating a dynamic population estimate is an area of further study that will likely require additional data sources beyond physicochemistry to be operational [25]. For instance, real-time mobility data can prove useful in quantifying the movement of people and the pathogens they may carry [26,27]. Until such a time that this sort of data is readily available for research purposes, static estimates provide a useful and broadly accurate estimate for population counts, whether derived from census data or biochemical markers [28].
The flow normalisation method proposed here also has applications beyond SARS-CoV-2 surveillance. First, the markers used to estimate volume are cheap to monitor and could thus provide a competitive alternative to installing flow meters. Probes that measure EC or ammonia, for example, are increasingly accurate and cheap compared to the costs of installing flow meters. One of the main technical issues with wastewater assessments is that the sewage into which the measuring instrument is immersed is often heterogeneous and full of debris that can clog the instrument. Biofilm formation on the sensor may also lead to the need for frequent cleaning. Flow meters regularly fail for reasons such as these. The option of multiple probes that provide an ensemble-based proxy flow reduces this risk. Additionally, some markers, such as electrical conductivity, can be accurately measured with simple probes that are not very sensitive to sewage debris. In sewersheds with strong flash event effects, due to high gradients and high proportions of parasitic waters, the risk of sediment and debris is high, reducing the operability of flow meters. The option of proxy measurements like EC that are less sensitive to these conditions is an advantage. Finally, probes for chemical markers like EC are often sturdy, battery operated, allow for cloud-based data storage, and can be deployed as remote sensors in a wide array of wastewater systems, including in areas with limited or no infrastructure.
Understanding sewage dilution has wider value, and our approach may provide a simple and effective tool to that effect. An influx of water into the sewage networks can occur intentionally, for example, when the networks are designed to collect surface runoff, as is the case for combined sewers, or unintentionally, through groundwater infiltration or misconnections. This has an economic cost for wastewater utilities (and therefore for the customer) because wastewater treatment efficiency can be affected by significant variations in dilution and because of additional pumping and its associated energy costs [10,29]. The ability to monitor dilution cost-effectively with our approach could allow water utilities to quickly localise unintended water ingress or better prepare for expected dilution. The ingress of additional parasitic water also has an ecological cost, particularly in countries with older combined sewage systems where relief valves regularly discharge raw sewage into water bodies. A better understanding of the frequency, volume and concentration of CSO spills would help to quantify the ecological risks these spills pose to local ecosystems.
Looking beyond COVID-19 and viral pathogens, the approach described here can be readily used for a range of other human-derived chemicals of public or environmental interest (e.g., illicit chemicals, human stress hormones, antimicrobial resistance genes). In the case of microplastics, the approach may also help with source apportionment between domestic, industrial, and road traffic-related inputs. The methodology described here was tested at six large centralised urban wastewater treatment plants. Testing the approach in smaller sewersheds (<1000 individuals) with faster in-network transit times (i.e., more diurnal or stochastic flows) would also be useful.

Figure 1. Time series of observed total daily flow at each representative site between March 2021 and March 2022.

Figure 2. Scatter plots of estimated volume and measured total daily flow at six wastewater treatment sites in Wales over a 21-week period up to 7 March 2022. The overall estimate (bottom row) was calculated according to Equations (5) and (6). Each plot includes a dashed line corresponding to a one-to-one fit, and a solid line showing a simple linear regression between the two axes. Both axes are given on a log10 scale.

Figure 3. Time series for observed daily flow (in grey) and estimated daily volume (in blue) per capita at six wastewater treatment sites in Wales. The time series covers a 21-week period up to 7 March 2022 at each site. Each plot shows a 10-day, right-aligned rolling average for each measure. Note that the y-axis scales differ between sites.

Figure 4. Time series for SARS-CoV-2 viral load from each marker estimate and the combined estimate (far right column) for six wastewater treatment sites in Wales. The time series covers a 21-week period up to 7 March 2022 at each site. Each marker is presented in its own colour as a 10-day, right-aligned rolling average. Superimposed on each plot, in grey, is the viral load calculated from the observed daily flow data.

Table 1. Characteristics of the representative sites under surveillance, with data from the Office for National Statistics (ONS), which provides estimates for the mid-2020 population and age of every Lower Layer Super Output Area (LSOA) in England and Wales.