3.1. Analysis of Experimental Categories
A summary of effluent concentrations and removal rates for the smaller and larger WWTPs is presented in Table 1. Final effluent sCOD concentrations for the 12 smaller WWTPs ranged from 21 mg/L to 317.5 mg/L, with a mean value of 64.1 mg/L. Effluent tCOD concentrations in the smaller systems ranged from 22.0 to 727 mg/L and were strongly correlated with TSS; the mean effluent tCOD was 114.6 mg/L. By contrast, at the larger benchmark plants, maximum effluent tCOD levels were an order of magnitude lower than at the smaller plants, with mean effluent concentrations for sCOD and tCOD of 64.0 mg/L and 77.5 mg/L, respectively. Effluent NH4-N concentrations at the smaller WWTPs ranged from 1.75 mg/L to 49.2 mg/L, with a mean value of 16.5 mg/L, whereas NH4-N concentrations in the final effluent of the larger WWTPs averaged 2.2 mg/L (never exceeding 5.2 mg/L).
The effluent quality for the smaller WWTPs was much more variable than that of the larger plants for all parameters except pH and DO. The largest observed standard deviation (SD) among effluent parameters was for tCOD at the smaller WWTPs and for NO3-N at the larger plants. Discharge concentrations of tCOD are typically not regulated at the smaller WWTPs and are therefore not routinely controlled. This is evident in the highest measured effluent concentration of 727 mg/L, which was about six times the mean. The lowest SDs were observed for effluent pH and DO.
In terms of removal rates, the parameter with the highest mean rate of removal at smaller WWTPs was TSS (80.0%), whereas mean removal rates were highest for NH4-N at the larger WWTPs (92.9%). The SD of removal rates across larger plants was lowest for NH4-N, which is probably a result of explicit discharge regulations. The lowest SD amongst removal rates at smaller WWTPs was for sCOD, but this was still >20 mg/L and suggests a high level of variance in effluent quality. In fact, one small WWTP had effluent quality poorer than influent quality. The lowest SD at the larger WWTPs was for NH4-N (3.7 mg/L).
There was a significant difference between the mean effluent values of the design categories across all parameters except NO3-N at 95% confidence (ANOVA, 4 × 10⁻¹⁰ < p < 3.9 × 10⁻³; p = 0.06 for NO3-N). The similarity between NO3-N effluent values may be because most small WWTPs serve rural communities. This presumably means more farms, which might lead to an increased load of NO3-N entering the wastewater collection system, which would probably not be removed and would thus be present in effluent discharges. However, without being able to determine load fluxes or specific process mechanisms, it is not possible to confirm this speculation. Other than NO3-N, the parameter with the least confidence in significance was the pH of final effluent samples, which is not surprising considering the SD of values for both small and larger plants (Table 1). Removal rates also differed significantly among the different WWTP sizes and technologies, across all parameters (ANOVA, 2.5 × 10⁻⁹ < p < 2.5 × 10⁻⁴).
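The one-way ANOVA comparisons above can be illustrated with a minimal sketch. The group names and NH4-N values below are hypothetical placeholders, not data from Table 1, and the function only computes the F statistic; a p-value would additionally require the F distribution (for example via `scipy.stats.f_oneway`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical effluent NH4-N concentrations (mg/L) for three design categories
effluent_nh4 = {
    "small_trickling_filter": [18.0, 25.0, 31.0, 22.0],
    "small_rbc": [9.0, 12.0, 7.0, 10.0],
    "large_activated_sludge": [2.0, 3.0, 1.5, 2.5],
}

f_stat, df_b, df_w = one_way_anova_f(list(effluent_nh4.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom corresponds to the small p-values reported for most parameters.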
3.2. Covariance of Effluent Parameters
Covariance data on final effluent parameters from the 12 small WWTPs are summarized in Figure 2. The correlation between the mean effluent concentration and the SD was strongest for tCOD (r² = 0.93). This demonstrates a strong relationship between the treatment performance and operational stability across treatment systems. A similarly strong trend was seen for sCOD and TSS (r² = 0.75 for both), and also for NH4-N (r² = 0.84). The latter is surprising because none of the small WWTPs had a discharge limit for NH4-N at the time of the study: the smaller WWTPs are unlikely to have been designed or operated to achieve nitrification, and yet some small treatment systems consistently sustain some nitrification. This suggests that the observed covariance trends are probably a ‘natural’ phenomenon rather than a result of operational practices or engineered design. In other words, conditions promoting nitrification have occurred by ‘chance’ and have developed to be relatively stable over time.
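As a rough illustration of how such a performance-versus-stability relationship can be quantified, the sketch below computes each plant's mean and SD and then the squared Pearson correlation between the two. The plant labels and tCOD values are invented for the example and are not the study's measurements.

```python
from statistics import mean, stdev


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical effluent tCOD samples (mg/L) for four plants
tcod_by_plant = {
    "plant_A": [40.0, 45.0, 50.0, 42.0],
    "plant_B": [60.0, 90.0, 75.0, 110.0],
    "plant_C": [120.0, 300.0, 180.0, 90.0],
    "plant_D": [55.0, 65.0, 58.0, 70.0],
}

plant_means = [mean(v) for v in tcod_by_plant.values()]
plant_sds = [stdev(v) for v in tcod_by_plant.values()]  # sample SD per plant

print(f"r^2 between mean and SD: {r_squared(plant_means, plant_sds):.2f}")
```

An r² close to 1 would indicate that plants with higher mean effluent concentrations are also the less stable ones.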
In terms of TP, while there was a significant difference in removal rates between the large and small WWTPs (ANOVA, p < 0.05), covariance trends between performance and stability were relatively weak (r² = 0.45). None of the monitored WWTPs have phosphorus removal technologies. It is much less likely that TP removal, especially by enhanced biological removal, will occur by chance, compared with nitrification. The three larger treatment systems are clustered to the lower left-hand corner of the plot (i.e., higher quality effluent and greater stability) for all parameters except for TP. After this, the next most obvious observation on performance versus stability covariance trends is differences among technology types. The package plants tend to discharge higher quality effluent on average and do so more consistently. For example, the SD of NH4-N ranged between about 3 and 8 mg/L for RBC and HiPAF treatment types (Figure 2e). It was, however, not possible from this covariance analysis to exactly determine the role treatment type (or any other factor) played in the stability of effluent quality.
3.3. Reliability of Small Wastewater Treatment Plants
Design concentrations for tCOD for each small WWTP are summarized in Figure 3, grouped by the WWTP size and technology type. The lowest design concentration, i.e., the lowest mean effluent concentration required to maintain compliance with the UWWTD tCOD discharge standards at 99% confidence, was 63.7 mg/L. Given this criterion, it is not surprising that one of the 50–125 PE trickling filters had the highest mean tCOD effluent concentration, well beyond discharge standards (727 mg/L). The highest design concentration was 78.2 mg/L, which was calculated for the RBC with a PE of between 50 and 125.
Whilst the range of design concentrations was relatively small (14.5 mg/L), there was a clear inverse relationship between the measured and design concentrations (Figure 3). However, two WWTPs that had mean effluent concentrations of >125 mg/L had design concentrations higher than three of the treatment systems with mean concentrations <125 mg/L. This confirms that calculations driven by covariance and probability analysis are not simply the average of measured values or numerical distance from the mean (i.e., SD). Means and SDs are both useful at times, but are ultimately limited measures of performance because of the underlying assumptions upon which their implications depend. Specifically, they assume a Gaussian (additive normal) distribution [23], which may not summarize the characteristics of every parameter of interest. Therefore, other methods are needed to better understand performance trends, which may allow deeper insights into risks of WWTP compliance failure, ideally also aimed at ecological improvement in catchments. While we do not endorse neglecting sites that appear to provide stable performance naturally, increased awareness of a WWTP’s reliability means that operational practices and allocation of resources can be optimized, including more accurately allocating suitable levels of maintenance to achieve optimal performance.
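To illustrate why a design concentration is not just a function of the mean, the sketch below uses a lognormal coefficient-of-reliability (COR) formulation that is common in wastewater reliability studies. This is an assumption made for illustration: this section does not restate how the design concentrations in Figure 3 were calculated. Here the design concentration is COR × standard, with COR = sqrt(CV² + 1) · exp(−z · sqrt(ln(CV² + 1))), where CV is the coefficient of variation (SD/mean) and z is the standard normal quantile for the chosen reliability. The 125 mg/L standard and the CV values are illustrative inputs.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def coefficient_of_reliability(cv, reliability=0.99):
    """COR under an assumed lognormal effluent distribution (illustrative formulation)."""
    z = NormalDist().inv_cdf(reliability)  # standard normal quantile, about 2.326 at 0.99
    c = cv ** 2 + 1.0
    return sqrt(c) * exp(-z * sqrt(log(c)))


def design_concentration(standard, cv, reliability=0.99):
    """Mean effluent concentration needed to meet `standard` at the given reliability."""
    return coefficient_of_reliability(cv, reliability) * standard


# Hypothetical coefficients of variation; 125 mg/L is used as an assumed tCOD standard
for cv in (0.2, 0.3, 0.4):
    print(f"CV = {cv:.1f} -> design concentration = {design_concentration(125.0, cv):.1f} mg/L")
```

Under this formulation, a more variable plant (higher CV) must hold a lower mean concentration to achieve the same probability of compliance, which is why design concentration depends on the shape of the distribution rather than on the mean or SD alone.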
The experimental group with the most similar design concentrations, and therefore the most similar effluent quality (measured as tCOD concentration only), was the small AS WWTPs with a PE between 50 and 125 (50–125_SAS). Considering the position of these two systems in the covariance plots (Figure 2), it is apparent that this observation also holds for other treatment performance parameters.
3.4. Prediction of Small Wastewater Treatment Plant Reliability
Whilst it is useful to observe the evident similarity of effluent quality that was discharged from small AS plants, it is perhaps more important to understand what drives or influences such trends. The adage, “no two WWTPs are the same” may be true, but there also may be enough similarity between the performance of different systems to identify dominant predictors. Thus, we applied a simple machine learning algorithm to predict the reliability of the small WWTPs assessed in this study, which determined the likelihood of tCOD effluent concentrations exceeding site-specific design concentrations (Figure 3).
An optimized RF classification model was used to predict the exceedance of the effluent concentration over the design concentration, with an accuracy of 64.2% and, therefore, an error rate of 0.358 (1 − 0.642). This model was chosen after comparison with the performance of a gradient boosting machine and a generalized linear model (see Appendix A for further details on the performance of different models). The RF model correctly predicted the effluent tCOD concentration exceeding the design concentration for 71.4% of the samples. In contrast, the model correctly predicted the effluent tCOD concentration not exceeding the design concentration for 57.1% of the samples (Table 2). This suggests the model is conservative, in that it is better at flagging exceedances than at confirming compliance, which may appeal to risk managers responsible for prioritizing asset investment against regulatory compliance or environmental targets. Such an approach might be useful for forecasting the performance reliability of multiple small WWTPs simultaneously. The implication of the data is that there may be enough similarity between different sites to establish underlying trends and drivers of performance.
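The relationship between overall accuracy and the two class-wise rates can be sketched from a confusion matrix. The counts below are hypothetical, chosen only so that the resulting rates resemble those quoted above; they are not the counts in Table 2.

```python
def classification_rates(tp, fn, tn, fp):
    """Summary rates from confusion-matrix counts.

    tp: exceedances predicted as exceedances
    fn: exceedances predicted as non-exceedances
    tn: non-exceedances predicted as non-exceedances
    fp: non-exceedances predicted as exceedances
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true exceedances caught
        "specificity": tn / (tn + fp),  # share of true non-exceedances caught
        "error_rate": (fp + fn) / total,
    }


# Hypothetical counts for illustration only
rates = classification_rates(tp=5, fn=2, tn=4, fp=3)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

When the two classes are of equal size, accuracy is the average of sensitivity and specificity, and the error rate is simply one minus the accuracy.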
Considering the performance of the model for each of the six small WWTP categories, it is clear that the reliability of the package plants (especially RBCs) was harder to predict than that of the more traditional technologies (Table 3). For example, the model correctly predicted the likelihood of the effluent concentration exceeding the design concentration for all samples collected at trickling filter sites. This is likely because the stability of effluent quality discharged from the RBCs is generally higher than at other plants, which makes the difference between the measured effluent concentration and the design concentration small and, therefore, harder to predict.
The relative importance of the different model predictors is shown in Figure 4; influent wastewater characteristics and PE were the most important. Interestingly, the size of a treatment system appears to be more important to effluent quality than the treatment technology itself. This is supported, at least in part, by the variance observed between treatment plants within the same experimental category and differences among categories (Figure 2 and Figure 3). Furthermore, the smallest WWTPs (50–125 PE) appear to be consistently less stable (i.e., greater variability in effluent quality) than the sites with a PE between 125 and 250. It may not be appropriate to categorize all WWTPs according to these PE bands, but the model outputs combined with the analysis of the experimental categories suggest that these groupings may be sufficient and useful for assessing the influence of different parameters on treatment performance.
In contrast to system size and influent characteristics, most other predictors had relatively little importance in predicting effluent stability (relative importance <60; Figure 4). The significant difference (unpaired t-test, p < 0.05) between wastewater and ambient air temperatures implies a buffering effect against the latter. This may explain why seasonal changes were relatively unimportant as a predictor of resilience. However, while the temperature of the liquid influent was somewhat important, it did not appear to be a dominant predictor in this model. Interestingly, the DO concentration of the influent also had relatively little importance. This is likely because aeration capacity and hydraulic retention time, which were not considered here, both influence performance regardless of the influent DO concentration.
The final parameter of note relative to system performance is the frequency of visits to sites by operators. This parameter was included here as an indicator of the effect of operational practice. In the UK and elsewhere, the frequency at which small WWTPs are visited by operators can vary from several times per week to once every couple of months. The frequency of operator visits appears relatively unimportant and a poor predictor of WWTP stability (Figure 4). This might be because the actual activity during each site visit can vary, both between sites and through time. Activities might range from checking pumps and plumbing to assessing controls, cleaning lines, and other incidental activities. However, implicitly, this suggests the original design and sizing of the processes are more important to day-to-day treatment performance. This seems to be especially true of smaller WWTPs that do not appear to be improved by simply increasing operational maintenance (e.g., cleaning).
3.5. Model Simplification
In an attempt to simplify the predictive model, all input parameters with a relative importance below 75 (Figure 4) were removed. This meant the independent variables in the simplified model were pH of the influent, NH4-N concentration of the influent, and the PE. The presence of influent pH and NH4-N concentration in this list may be because they act as indicator metrics for the overall wastewater ‘strength’, rather than because the pH or NH4-N themselves control the reliability of tCOD effluent concentration. RF classification using the same input conditions and training dataset as previously described generated an accuracy of 66.1%, which is an increase of approximately 2 percentage points compared to modeling with all parameters. Whilst such a marginal improvement might be attributed to chance, it is encouraging that the prediction of small WWTP reliability can be condensed to just three parameters without any great loss of accuracy. This is important because it reduces the data requirements at small sites (making some monitoring more feasible), and still allows wastewater managers to predict whether or not these systems might become unreliable.
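The simplification step can be sketched as a threshold on scaled importance scores. The raw importance values below are invented placeholders, and the scaling (top predictor set to 100) is an assumption, since the scale used in Figure 4 is not described in this section.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the top predictor equals 100 (assumed scaling)."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


def select_predictors(raw, threshold=75.0):
    """Keep predictors whose scaled importance is at or above the threshold."""
    scaled = scale_importance(raw)
    kept = [name for name, score in scaled.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scaled[name])  # most important first


# Hypothetical raw importance scores; not values read from Figure 4
raw_importance = {
    "population_equivalent": 0.20,
    "influent_pH": 0.18,
    "influent_NH4N": 0.16,
    "influent_temperature": 0.11,
    "influent_DO": 0.08,
    "visit_frequency": 0.05,
}

print(select_predictors(raw_importance))
```

The retained predictors would then be used to refit the classifier on the same training data, so that accuracy with and without the dropped variables can be compared directly.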