Application of Neural Networks and Regression Modelling to Enable Environmental Regulatory Compliance and Energy Optimisation in a Sequencing Batch Reactor

Real-time control of wastewater treatment plants (WWTPs) can have significant environmental and cost advantages. However, its application to small and decentralised WWTPs, which typically have highly varying influent characteristics, remains limited to date due to cost, reliability and technical restrictions. In this study, a methodology was developed using numerical models that can improve sustainability, in real time, by enhancing wastewater treatment whilst also optimising operational and energy efficiency. The methodology leverages neural network and regression modelling to determine a suitable soft sensor for the prediction of ammonium-nitrogen trends. This study is based on a case-study decentralised WWTP employing sequencing batch reactor (SBR) treatment and uses pH and oxidation-reduction potential sensors as proxies for ammonium-nitrogen sensors. In the proposed method, data were pre-processed into 15 input variables and analysed using multi-layer neural network (MLNN) and regression models, creating 176 soft sensors. Each soft sensor was then analysed and ranked to determine the most suitable soft sensor for the WWTP. It was determined that the most suitable soft sensor for this WWTP would achieve a 67% cycle-time saving and 51% electricity saving for each treatment cycle while meeting the criteria set for ammonium discharges. This proposed soft sensor selection methodology can be applied, in full or in part, to existing or new WWTPs, potentially increasing the adoption of real-time control technologies, thus enhancing their overall effluent quality and energy performance.


Introduction
Advances in instrumentation, control and automation are aiding the development of intelligent real-time control (RTC) systems that can be used to predict, analyse and judge the real-time state of a system and self-adapt/organise based on input signals from sensors [1–5]. RTC systems can improve decision making and optimise system performance and are well suited to the control of complex and dynamic processes. However, sensors and detectors can produce large quantities of data that can be challenging to store, process and analyse. Thus, advances in analytic, decision-making and process optimisation tools are required to enable the development of RTC systems. This has driven research into the use of numerical modelling techniques in a variety of engineering applications, such as water fault detection, aquaculture and vaccine development [1,3,6–9].
An area where RTC can drive disruptive innovation and increase process efficiencies is wastewater treatment. Protection of water resources and water quality is a key sustainable development goal [10], and the effective and sustainable treatment of wastewater is essential

Sustainability 2022, 14, 4098

[16] Strategy examined using data collected from a pilot-scale SBR reactor.
[31] Investigation into the use of pH, ORP and DO sensors with an advanced control strategy to optimise nitrogen removal in a continuous system. Approach: fuzzy logic. Wastewater: urban wastewater with a small industrial input. Reactor: pilot-scale continuous flow plant.
[28] Reactor: laboratory-scale continuous flow SBR reactor.
[32] Examination of using NNs for predicting biological nitrogen and phosphorus removal using ORP and pH. Approach: NNs. Wastewater: synthetic wastewater. Reactor: laboratory-scale SBR reactor.
[33] Examination of the establishment of an online control system for nitrogen and phosphorus removal. Approach: a primary professional intelligent control system filtered noise by wave filtering and used NNs, a database and a deducing machine to identify each breakpoint. Wastewater: municipal wastewater. Reactor: laboratory-scale SBR reactor.
[34] Methodology development for process monitoring and process analysis for nitrogen and phosphorus removal. Approach: multi-way principal component analysis (MPCA) and clustering using historical process data. Wastewater: domestic-strength synthetic wastewater. Reactor: pilot-scale SBR reactor.
[13] Validation study to assess the ability of an algorithm using NNs to detect breakpoints using pH, ORP and DO sensors. Approach: NNs, with de-noising achieved using a regularisation algorithm. Wastewater: municipal wastewater. Reactor: pilot-scale SBR reactor.
[23] Examination of using a software sensor for real-time estimation of nutrient concentration using pH, ORP and DO sensors. Approach: fuzzy NN analysis. Wastewater: synthetic wastewater. Reactor: bench-scale SBR reactor.
[24] Examination of using a software sensor for real-time estimation of nutrient concentration using pH, ORP and DO sensors. Approach: genetic algorithm-based neural fuzzy system, using self-adapting fuzzy c-means clustering and genetic algorithms. Wastewater: synthetic wastewater. Reactor: laboratory-scale SBR reactor.
[35] Examination of an intelligent control system to achieve advanced nitrogen removal using DO, pH and ORP sensors. Approach: three-layer network technology with high-performance PLCs and fuzzy control for breakpoint identification. Wastewater: municipal wastewater. Reactor: pilot-scale SBR reactor.
[36] Review article on the general use of artificial NNs in modelling biological water and wastewater treatment processes. Approach: artificial NNs. Wastewater: several types. Reactor: several types.
[37] Examination of the use of a Gaussian-process (GP) model for the online optimisation of batch phases using pH, ORP and DO sensors. Approach: GP regression was used to smooth the signals and GP classification was used for pattern recognition. Wastewater: not specified. Reactor: laboratory-scale SBR reactor.
[38] Examination of the optimisation of a fuzzy-logic-controlled DO SBR system using pH and OUR trends for carbon and NH4-N removal. Approach: fuzzy control was used to switch the DO input on and off in order to smooth out the pH and OUR profiles; the breakpoint was identified using episode representation. Wastewater: urban wastewater. Reactor: pilot-scale SBR reactor.
[21] Approach: artificial NNs. Wastewater: synthetic wastewater. Reactor: laboratory-scale SBR reactor.
[39] Examination of a soft sensor for the optimisation of an SBR for biological nutrient removal. Approach: NNs. Wastewater: synthetic wastewater. Reactor: laboratory-scale SBR reactor.
[40] Development of a control strategy to enhance nitrogen and phosphorus removal in an SBR reactor using pH, ORP and OUR. Approach: a data acquisition system with curve fitting and characteristic point detection. Wastewater: municipal wastewater. Reactor: semi-industrial pilot SBR reactor.
[41] Development of a reliable RTC and supervision tool for DO control. Approach: fuzzy NNs. Wastewater: industrial wastewater. Reactor: aerated submerged biofilm wastewater treatment process.
[42] Development of a soft computing method to predict sludge volume index (SVI) values in a real WWTP. Approach: recurrent self-organising NN. Wastewater: municipal WWTP. Reactor: model based on an SBR WWTP.
[43] Examination applying a self-organising cascade neural network (SCNN) with random weights to a non-linear system. Approach: cascade NNs. Wastewater: municipal WWTP. Reactor: model based on a municipal WWTP.
[44] Proposal using a model-free learning control (MFLC) system to control advanced oxidation in the treatment of industrial wastewaters. Approach: reinforcement learning. Wastewater: phenol wastewater. Reactor: laboratory pilot plant.
[45] Development of a model for predicting TSS and chemical oxygen demand removal. Approach: fuzzy inference system with principal component analysis. Wastewater: papermill process wastewater. Reactor: papermill WWTP with an anaerobic digester and submerged biofilm biological reactor.
[46] Identification of a model to predict effluent nitrogen concentrations and assessment of controller efficiency in terms of economic and environmental performance. Approach: recurrent NNs for model identification, dynamic matrix control as the predictive control (PC) algorithm and Benchmark Simulation Model 1 to test these PC configurations. Wastewater: biological wastewater. Reactor: activated sludge process of a municipal WWTP.
[30] Development of a soft sensor to predict effluent concentrations such as COD, TSS and TN content. Approach: NN with principal component analysis. Wastewater: biological wastewater. Reactor: activated sludge process of a large-scale municipal WWTP.

RTC using surrogate sensors requires developing relationships between the primary variable(s) of interest and the surrogate variables being measured. For example, an operator may wish to employ the following rule for controlling a wastewater treatment plant: "when y < t, stop processing", where y is the concentration of the chemical of interest and t is a threshold for safe discharge. When surrogate sensors are used, y is not measured directly, so the task reduces to a non-linear modelling problem: a number of measured variables $x_1, x_2, \ldots, x_n$ are analysed to develop a function $y = f(x_1, x_2, \ldots, x_n)$. Several authors have taken this type of approach (Table 1), focusing particularly on fuzzy modelling and advanced neural network (NN) approaches, including recurrent networks [23], cascade networks [43], self-organising network structures [42,43] and fuzzy systems [24,41,45]. There has also been work on developing NN-based soft sensors that use principal component analysis (PCA) to select the optimal number of input vectors [30,47]. These PCA-based NNs were applied to a large-scale municipal wastewater plant, where they predicted concentrations of COD, TN and TSS (among others) from measurements of oxygen and nitrogen concentrations together with influent flow rate and alkalinity.
However, to the authors' knowledge, no work has been reported on using a standard feed-forward NN for regression in this context. Standard feed-forward NNs often perform well in non-linear system modelling, so this is an important research gap.
The current study proposes a range of soft sensors, which can be selected according to weights assigned to criteria that may vary with site-specific requirements. An abundance of labelled data collected under real-world conditions (reflecting how the methodology would be applied in practice) was available; hence, there is no need for a self-organising structure. Instead, the appropriate network structure can be investigated by comparing the performance of alternative structures directly.
Finally, this study takes a different approach to dealing with non-linear, time-varying system dynamics: rather than using a recurrent or other dynamic network, the data are pre-processed to produce a large selection of input variables that encode information about the time-varying aspects of the data. This approach makes the choice of input variables crucial. To address this, the study compares several variable sets (combinations of input variables), each of which is assessed using a set of criteria describing key, usable features for performance optimisation. In contrast to [45], this study employs regularisation for feature reduction where needed and uses manually investigated feature subsets, rather than PCA. This study presents a methodology capable of identifying the most suitable soft sensor, using surrogate probes and inferential estimation models, for RTC of small and decentralised WWTPs. The methodology can cater for the dynamic nature of small and decentralised WWTPs while allowing key onsite goals to be prioritised in soft sensor selection.

Numerical Modelling Methods
Regression is the task of modelling a real-valued dependent variable y as a function f of independent variables $x_1, \ldots, x_n$, minimising the errors between y and $f(x_1, \ldots, x_n)$. A training set (a dataset of known values of the $x_i$ and y) is required to develop the model, with the goal of accurate out-of-sample prediction, which is typically measured on a hold-out (test) set. A common regression technique is multiple linear regression (MLR), a linear least-squares approximation of the data. MLR links a number of input variables $x_i$ to a target variable y using Equation (1) [48]:
$$ y = w_0 + \sum_{i=1}^{n} w_i x_i \tag{1} $$
where $w_0$ is the intercept, $w_i$ is the coefficient (or slope) for $x_i$ and n is the number of input variables. Out-of-sample accuracy can be improved by regularisation methods, which add a penalty on the size of the model coefficients to the training error, shrinking the coefficients during learning [48]. A popular regularisation method is the least absolute shrinkage and selection operator (LASSO) [22,49], whose penalty on the sum of the absolute coefficient values can shrink some coefficients exactly to zero, effectively removing the corresponding input variables.
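To make the difference between the two regression models concrete, the sketch below fits Equation (1) by ordinary least squares and, separately, with an L1 (LASSO) penalty. It is a minimal illustration assuming NumPy and synthetic data; the cyclic coordinate-descent solver, the penalty scaling (1/(2m) times the squared error plus λ times the sum of absolute coefficients) and the names `fit_mlr` and `fit_lasso` are assumptions for illustration, not the study's MATLAB implementation.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares for y = w0 + sum_i w_i x_i (Equation (1))."""
    A = np.column_stack([np.ones(len(X)), X])  # column of ones carries the intercept w0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly zero."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def fit_lasso(X, y, lam, n_sweeps=200):
    """LASSO by cyclic coordinate descent on standardised inputs (illustrative solver).
    Minimises (1/(2m)) * ||y - w0 - Xw||^2 + lam * ||w||_1."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                 # each column: zero mean, unit variance
    y_mean = y.mean()
    r = y - y_mean                    # centred target; intercept handled separately
    m, n = Z.shape
    w = np.zeros(n)
    for _ in range(n_sweeps):
        for j in range(n):
            r_j = r - Z @ w + Z[:, j] * w[j]   # residual ignoring coefficient j
            rho = Z[:, j] @ r_j / m
            w[j] = soft_threshold(rho, lam)    # (1/m)||z_j||^2 = 1 after standardising
    w_orig = w / sd                   # map coefficients back to the original input scale
    return y_mean - mu @ w_orig, w_orig

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic data: the third input has no effect on y
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.05 * rng.normal(size=200)
w0_mlr, w_mlr = fit_mlr(X, y)
w0_las, w_las = fit_lasso(X, y, lam=0.1)
print(w0_mlr, w_mlr)
print(w0_las, w_las)
```

With λ = 0.1, the irrelevant third coefficient is driven exactly to zero, while the informative coefficients are shrunk slightly towards zero. This is the feature-reduction behaviour that makes LASSO useful when a variable set contains redundant inputs.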
In contrast, NNs are non-linear models with many more degrees of freedom, so they can be used to model more complex systems, and they do not require a priori knowledge of the system's structure. They are trained using various gradient-based optimisation algorithms [32,50]. A typical NN has one input layer, one or more hidden layers and one output layer, as illustrated in Figure 1 [39], and each layer has several nodes. The jth node in a layer receives the input variables $x_1, x_2, \ldots, x_n$ from the previous layer, each signal having an associated weight $w_{1j}, w_{2j}, \ldots, w_{nj}$ [51]. A second input to the node is the bias $b_j$, a constant that shifts the node's net input. The weights are multiplied by the corresponding inputs and summed with the bias to give the node's net input using Equation (2):
$$ z_j = \sum_{i=1}^{n} w_{ij} x_i + b_j \tag{2} $$
where i indexes the inputs and j indexes the nodes.
Figure 1. Typical NN structure with n inputs, j nodes in the hidden layer, a hyperbolic tangent sigmoid transfer function, and a single output layer with a linear transfer function.
The node then applies a transfer function to give its output. Several transfer functions are commonly used, including the logistic sigmoid, hyperbolic tangent sigmoid and linear functions.
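A single forward pass through the structure in Figure 1 (a tanh hidden layer feeding one linear output node) can be sketched as follows. This assumes NumPy; `forward` and its argument names are hypothetical. MATLAB's hyperbolic tangent sigmoid (tansig) is mathematically equal to tanh, so `np.tanh` stands in for it here.

```python
import numpy as np

def forward(x, W1, b1, w2, b2):
    """One-hidden-layer feed-forward NN (illustrative).
    x: (n,) inputs; W1: (h, n) with W1[j, i] = w_ij; b1: (h,) hidden biases b_j;
    w2: (h,) output weights; b2: scalar output bias."""
    z = W1 @ x + b1            # net input of each hidden node, Equation (2)
    a = np.tanh(z)             # hyperbolic tangent sigmoid transfer function
    return float(w2 @ a + b2)  # linear output node: the predicted y

# Small hand-picked example with two inputs and two hidden nodes
x = np.array([0.5, -1.0])
W1 = np.eye(2)
b1 = np.zeros(2)
w2 = np.array([2.0, 1.0])
b2 = 0.5
y_hat = forward(x, W1, b1, w2, b2)
print(y_hat)
```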
Beginning with the independent variables, values are fed forward through each successive layer, with the outputs of one layer becoming the inputs to the next. The output layer produces a single value, which is the predicted value of y for the current inputs. Training proceeds by adjusting the weights and biases to minimise the error at the output, using gradient-based algorithms such as Levenberg-Marquardt back-propagation [52–56] and Levenberg-Marquardt back-propagation with Bayesian regularisation [57–60].
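The core of Levenberg-Marquardt training is the damped update (JᵀJ + μI)·Δ = Jᵀe, where J is the Jacobian of the network outputs with respect to the weights and biases and e is the error vector; μ is decreased after a successful step (behaving more like Gauss-Newton) and increased after a failed one (behaving more like gradient descent). The sketch below applies this to a one-hidden-layer tanh network, assuming NumPy. It is a simplified illustration: the Jacobian is estimated by forward differences rather than back-propagation, all names are hypothetical, and the Bayesian regularisation variant (which additionally penalises the weights) is not shown. MATLAB provides these algorithms as the `trainlm` and `trainbr` training functions.

```python
import numpy as np

def unpack(theta, n, h):
    """Split the flat parameter vector into W1 (h x n), b1 (h), w2 (h) and b2 (scalar)."""
    W1 = theta[: h * n].reshape(h, n)
    b1 = theta[h * n : h * n + h]
    w2 = theta[h * n + h : h * n + 2 * h]
    return W1, b1, w2, theta[h * n + 2 * h]

def predict(theta, X, h):
    """Tanh hidden layer and linear output node; X has shape (samples, n)."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], h)
    return np.tanh(X @ W1.T + b1) @ w2 + b2

def jacobian(theta, X, h, eps=1e-6):
    """Forward-difference Jacobian of the outputs w.r.t. every weight and bias."""
    f0 = predict(theta, X, h)
    J = np.empty((X.shape[0], theta.size))
    for k in range(theta.size):
        t = theta.copy()
        t[k] += eps
        J[:, k] = (predict(t, X, h) - f0) / eps
    return J

def train_lm(X, y, h=4, epochs=100, mu=1e-2, seed=0):
    """Levenberg-Marquardt with an adaptive damping factor mu."""
    rng = np.random.default_rng(seed)
    theta = 0.5 * rng.normal(size=h * X.shape[1] + 2 * h + 1)
    sse = float(np.sum((y - predict(theta, X, h)) ** 2))
    for _ in range(epochs):
        e = y - predict(theta, X, h)
        J = jacobian(theta, X, h)
        A, g = J.T @ J, J.T @ e
        while True:
            step = np.linalg.solve(A + mu * np.eye(theta.size), g)
            new_sse = float(np.sum((y - predict(theta + step, X, h)) ** 2))
            if new_sse < sse:  # error fell: accept step, move towards Gauss-Newton
                theta, sse, mu = theta + step, new_sse, max(mu / 10, 1e-12)
                break
            mu *= 10           # error rose: reject step, move towards gradient descent
            if mu > 1e10:      # no further progress possible
                return theta, sse
    return theta, sse

# Usage: fit a smooth non-linear curve with four hidden nodes
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = np.sin(X[:, 0])
init = 0.5 * np.random.default_rng(0).normal(size=4 * 1 + 2 * 4 + 1)  # same start as train_lm
sse0 = float(np.sum((y - predict(init, X, 4)) ** 2))
theta, sse = train_lm(X, y)
print(sse0, sse)
```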
The specific goal of this study was to create a model that accurately predicts the current NH4-N concentration (output) from current and previous ORP and pH values (inputs). Two regression methods were investigated, (i) multiple linear regression (MLR) (R_lin) and (ii) MLR with LASSO regularisation (R_reg), together with two NN training algorithms, (i) Levenberg-Marquardt back-propagation (NN_lm) and (ii) Levenberg-Marquardt back-propagation with Bayesian regularisation (NN_br). Results were analysed in two ways: (i) prediction of the general NH4-N trend and (ii) performance when predicting a specific NH4-N concentration, for example a regulatory discharge limit (assessed in terms of prediction accuracy and the time and energy savings achieved in the treatment cycle). Furthermore, a weighting and ranking system was used to determine the overall best set-up for optimal operational, environmental and energy performance.
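The discharge rule "when y < t, stop processing" can be sketched with a soft-sensor prediction in place of the unmeasured y. This minimal sketch assumes a prediction sampled once per minute from the start of the 400 min aeration phase. The `hold` parameter (requiring several consecutive sub-threshold minutes as a guard against noise), the function names and the saving definition (the fraction of the aeration phase avoided) are hypothetical choices, not the study's own scoring method.

```python
def first_below(pred, limit, start=0, hold=1):
    """First minute index at which the predicted NH4-N has stayed below `limit`
    for `hold` consecutive 1 min samples; None if that never happens."""
    run = 0
    for t in range(start, len(pred)):
        run = run + 1 if pred[t] < limit else 0
        if run >= hold:
            return t
    return None

def time_saving(stop_minute, phase_length=400):
    """Fraction of a fixed-length phase avoided by stopping at `stop_minute`."""
    return 1.0 - stop_minute / phase_length

# Illustrative falling NH4-N prediction (mg/L), one value per minute of aeration
pred = [8.0 - t / 25 for t in range(400)]
stop = first_below(pred, limit=1.0, hold=5)
print(stop, time_saving(stop))
```

A larger `hold` trades a slightly later stop for robustness against a single noisy sub-threshold prediction.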

Materials and Methods
The case-study site comprised a sequencing batch reactor (SBR) receiving wastewater from a residential development. The influent wastewater to the SBR comprised domestic wastewater that had undergone primary clarification. The SBR comprised a two-chamber precast concrete tank (a primary settlement chamber and a reaction chamber), with working volumes of 2.42 m3 (hydraulic retention time (HRT) of 4 days) and 1.56 m3 (HRT of 2.6 days), respectively (Figure 2). Raw influent wastewater was fed into the primary tank by a pump, which was operated using a programme that mimicked the typical diurnal flow pattern of a domestic house.
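As a quick consistency check, assuming the quoted HRTs were computed as working volume V divided by the mean daily flow Q (the symbol Q is introduced here for illustration), both chambers imply the same average throughput of roughly 0.6 m3/d:

$$
Q = \frac{V}{\mathrm{HRT}}: \qquad \frac{2.42\ \mathrm{m^3}}{4\ \mathrm{d}} = 0.605\ \mathrm{m^3\,d^{-1}}, \qquad \frac{1.56\ \mathrm{m^3}}{2.6\ \mathrm{d}} = 0.60\ \mathrm{m^3\,d^{-1}}
$$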


Cycle Control
A Siemens LOGO! PLC controlled a 464 min cycle comprising a 2 min fill phase, a 400 min aeration phase, a 60 min settlement phase and a 2 min discharge phase (Figure 3). The aeration phase comprised 20 min blocks, each consisting of a 5 min period during which the aeration system was switched on, followed by a 15 min quiescent period. A feed pump installed in the reactor chamber (switched on for 5 s to create a siphon) moved liquid from the primary settlement chamber into the reaction chamber as required. Siphoning stopped when the liquid level in the primary chamber fell below (i) the inlet level of the feed pipe or (ii) the liquid level, or (iii) once the two chambers had equalised. As only the volume available above the feed pipe was transferred for treatment, this technique resulted in a dynamic feed volume. Table 3 details the operations in each phase.
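The cycle timing described above can be expressed as a simple minute-by-minute schedule, which is handy when relating aerator run time to electricity use. This is a minimal sketch assuming one-minute resolution and that each 20 min block starts with its 5 min aeration-on period, as described; the function names are hypothetical and do not represent the PLC programme.

```python
FILL, AERATION, SETTLE, DISCHARGE = 2, 400, 60, 2  # phase lengths in minutes
BLOCK, AERATOR_ON = 20, 5                          # 20 min blocks: 5 min on, 15 min off

def phase_at(minute):
    """Phase name for a minute 0..463 of the 464 min cycle."""
    t = minute
    for name, length in (("fill", FILL), ("aeration", AERATION),
                         ("settle", SETTLE), ("discharge", DISCHARGE)):
        if t < length:
            return name
        t -= length
    raise ValueError("minute outside the 464 min cycle")

def aerator_on(minute):
    """True while the aerator runs: the first 5 min of every 20 min aeration block."""
    if phase_at(minute) != "aeration":
        return False
    return (minute - FILL) % BLOCK < AERATOR_ON

cycle = FILL + AERATION + SETTLE + DISCHARGE
on_minutes = sum(aerator_on(m) for m in range(cycle))
print(cycle, on_minutes)  # 464 100
```

Under these assumptions the aerator runs for 100 of the 464 min in each cycle (20 blocks of 5 min).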

Table 3. Overview of the SBR treatment cycle.

(1) Fill. Pump A on. The pump was switched on for 5 s, creating a siphon that moved liquid from the primary chamber into the reaction chamber. Siphoning stopped when the liquid level in the primary chamber went below the inlet level of the feed pipe or the liquid level, or once the two chambers had equalised.
(2) Aerobic, repeated for 400 min. Aerator B (a) on and (b) off. The aeration period consisted of a repetitive sequence of (a) aeration on for 5 min and (b) aeration off for 15 min.
(3) Settle. A settle period allowed the activated sludge to settle prior to discharge, creating an upper layer of clarified treated wastewater.
(4) Discharge. Pump C on. The discharge pump (C) removed the clarified treated wastewater from the upper portion of the reactor tank.
Legend: A, transfer pump; B, mechanical aerator; C, discharge pump.


Monitoring
Influent and effluent wastewater samples were taken from the primary tank and from a collection vessel placed on the discharge line of the SBR, respectively. Filtered COD and TSS were tested in accordance with standard methods [62], with samples passed through 1.2 µm Whatman GF/C microfiber filters. Total nitrogen (TN) was measured using a Biotector TOC TN TP Analyser (BioTector Analytical Limited, Cork, Ireland). Filtered NH4-N and NO3-N were measured using a Thermo Clinical Labsystem, Konelab 20 Nutrient Analyser (Fisher Scientific, Waltham, MA, USA). Hach sc1000 multi-meters logged data from the pH, ORP and NH4-N sensors in the reactor chamber. pH and ORP were measured at 1 min intervals and NH4-N at 5 min intervals, on a 24 h basis; to match the pH and ORP data, the NH4-N data were linearly interpolated to give a data point every 1 min. All sensors were fitted approximately 500 mm below the lowest liquid level within the reaction chamber and above any sludge blanket that might form during settlement. All instruments were calibrated, maintained and operated in accordance with the manufacturers' instructions.
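The linear interpolation of the 5 min NH4-N readings onto the 1 min pH/ORP time base can be sketched with NumPy's `np.interp`. The timestamps and concentrations below are illustrative values only, not site data.

```python
import numpy as np

# Illustrative readings: NH4-N logged every 5 min, pH and ORP every 1 min
t_nh4 = np.arange(0, 31, 5)                              # minutes 0, 5, ..., 30
nh4 = np.array([12.0, 11.0, 9.5, 8.0, 7.0, 6.5, 6.0])  # mg/L, made-up values
t_1min = np.arange(0, 31)                                # 1 min grid shared with pH and ORP
nh4_1min = np.interp(t_1min, t_nh4, nh4)                # piecewise-linear interpolation
print(nh4_1min[:6])
```

Each 1 min NH4-N value then aligns with one pH and one ORP sample; values at the 5 min logging times are reproduced exactly.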

Overview of NH4-N, pH and ORP Profiles
In a typical NH4-N profile, concentrations increased as influent mixed with the treated wastewater remaining in the reactor from the previous cycle. NH4-N concentrations peaked soon after the fill phase; the timing and magnitude of this peak varied with influent hydraulic volume and with organic carbon and NH4-N concentrations. Following this peak, NH4-N concentrations decreased owing to organic carbon oxidation and subsequent nitrification. At approximately 225 min, the rate of decrease in NH4-N concentration slowed or levelled off, and it remained so for the remainder of the cycle.
During the aeration phase, both the pH (Figure 4a) and ORP (Figure 4c) profiles rose and fell cyclically as the aerator switched on and off, producing a peak (apex) and a trough (nadir) in each aeration period in both the pH (Figure 4b) and ORP (Figure 4d) profiles. The increase in pH during the aeration-on period was, in this case, likely due to CO2 stripping [28]. The decreases in the pH and ORP profiles during the 15 min quiescent period were likely due to a reduction in microbial activity over the course of the aerobic phase [63]. The pH decrease was greatest immediately after the apex and tailed off before the subsequent nadir was reached; a similar pattern was observed in the ORP profile. In general, pH decreases as alkalinity is consumed while nitrification progresses [25]. This decreasing pH trend was interrupted by rises during the aeration-on periods as a result of CO2 stripping (Figure 4b). ORP generally increased during aeration; on completion of nitrification, the rate of ORP change accelerated, and this acceleration was caused by an abundance of DO [64].

Application
The methodology consisted of four main steps, namely (i) data collection and pre-processing, (ii) experimental set-up, (iii) soft sensor analyses and (iv) weighting and ranking application (Figure 5).
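The weighting and ranking step is not specified in detail in this part of the text, so the following is only a minimal sketch of one common way to do it: a normalised weighted-sum score per soft sensor. The criteria names, scores and weights are hypothetical; a site that prioritises discharge compliance would simply raise the weight on accuracy.

```python
# Hypothetical normalised scores (0 = worst, 1 = best) for three candidate soft sensors
scores = {
    "sensor_A": {"accuracy": 0.90, "time_saving": 0.70, "energy_saving": 0.60},
    "sensor_B": {"accuracy": 0.80, "time_saving": 0.90, "energy_saving": 0.90},
    "sensor_C": {"accuracy": 0.95, "time_saving": 0.30, "energy_saving": 0.20},
}
# Hypothetical site-specific weights for each criterion
weights = {"accuracy": 0.50, "time_saving": 0.25, "energy_saving": 0.25}

def rank_soft_sensors(scores, weights):
    """Weighted-sum score per sensor, normalised by the total weight, best first."""
    total = sum(weights.values())
    overall = {
        name: sum(weights[c] * crit[c] for c in weights) / total
        for name, crit in scores.items()
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)

for name, value in rank_soft_sensors(scores, weights):
    print(f"{name}: {value:.3f}")
```

Changing only the weights, without retraining any model, can reorder the candidates. This is what allows the same set of soft sensors to be re-ranked for different site priorities.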


Assessed Input Variables
A number of unprocessed (pH and ORP) and processed input variables were constructed and added to the set of independent variables (Table 4). The processed input variables were constructed from the profile features identified in Section 2.3. For example, the change in pH apex values (pH∆apex) was observed to decrease with NH4-N reduction and was therefore considered useful for identifying the end of NH4-N removal. The set of independent variables was then analysed in 22 variable sets covering a broad range of combinations, each containing a unique collection of input variables (Table 5).
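A feature such as pH∆apex can be derived from the 1 min pH series by taking the maximum within each aeration block and differencing successive maxima. This minimal sketch assumes that the series starts at a block boundary and that pH∆apex is the difference between consecutive apexes. Both are assumptions, as is the shortened 4 min block used only to keep the example readable (the real blocks are 20 min).

```python
def block_apexes(ph, block=20):
    """Apex (maximum pH) of each complete `block`-minute aeration block in a 1 min series."""
    return [max(ph[i:i + block]) for i in range(0, len(ph) - block + 1, block)]

def delta_apex(apexes):
    """Assumed definition: pH_delta_apex[k] = apex[k] - apex[k - 1]."""
    return [later - earlier for earlier, later in zip(apexes, apexes[1:])]

# Illustrative series with 4 min blocks instead of the real 20 min blocks
ph = [7.00, 7.20, 7.10, 7.00,
      7.00, 7.15, 7.10, 7.00,
      7.00, 7.12, 7.10, 7.00]
apexes = block_apexes(ph, block=4)
print(apexes)              # [7.2, 7.15, 7.12]
print(delta_apex(apexes))
```

A run of small (near-zero) differences is the kind of pattern the text associates with the end of NH4-N removal.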
Within each 464 min cycle, data collected between 0 and 45 min and between 402 and 464 min were excluded to eliminate the effects of the filling and settlement periods, as these phases were not part of the biological reaction phases of the treatment cycle. Between 0 and 45 min, the effects of filling were still apparent, as raw influent mixed with the wastewater already in the system; the settlement and discharge phases ran from 402 to 464 min. Data from 41 treatment cycles (each 464 min in duration) were collected, of which 12 (approximately 30%) were randomly set aside as a test dataset and the remaining 29 were used as a training dataset.
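Trimming each cycle and holding out whole cycles can be sketched as follows. This assumes one sample per minute indexed from minute 0, and it keeps minutes 45 to 401 inclusive; the exact boundary handling is an assumption. Splitting by cycle, rather than by individual samples, keeps every minute of a test cycle out of training. The function names and seed are hypothetical.

```python
import random

CYCLE_MIN = 464

def trim_cycle(samples, start=45, stop=402):
    """Drop fill-affected (0-45 min) and settle/discharge (402-464 min) samples of one cycle."""
    if len(samples) != CYCLE_MIN:
        raise ValueError("expected one sample per minute of a 464 min cycle")
    return samples[start:stop]

def split_cycles(n_cycles=41, n_test=12, seed=1):
    """Randomly hold out whole cycles for testing; the rest form the training set."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_cycles), n_test))
    held_out = set(test)
    train = [c for c in range(n_cycles) if c not in held_out]
    return train, test

kept = trim_cycle(list(range(CYCLE_MIN)))
train_ids, test_ids = split_cycles()
print(len(kept), kept[0], kept[-1])   # 357 45 401
print(len(train_ids), len(test_ids))  # 29 12
```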


Models
Two types of inferential estimation models were examined, namely regression and NNs. Two regression models were assessed: MLR without regularisation (Rlin) and MLR with LASSO regularisation (Rreg). Levenberg-Marquardt back-propagation (NNlm) and Levenberg-Marquardt back-propagation with Bayesian regularisation (NNbr) were the two NN training models used. Within the NN training models, a hyperbolic tangent sigmoid hidden layer transfer function and a linear output layer transfer function were used. The feed-forward neural network architecture we have chosen is suitable for nonlinear system modelling. As the input data are structured, not spatial, we do not need weight-sharing schemes such as convolution. Since we aim to produce an instantaneous soft sensor (i.e., its output reflects the current state of the system), we do not need a stateful network such as a recurrent network. Our choices for (i) transfer function and regularisation, (ii) the number of hidden nodes tested as a hyperparameter and (iii) values chosen, relative to the number of input variables (≤15), are long-standing best practice [58,65]. The main advantages of our design are that it is simple, robust, easy to train, and not demanding to run even on low-power devices in the field. More sophisticated designs are possible and could have potential performance advantages but were considered out of scope.
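The prediction step of a one-hidden-layer feed-forward network with a hyperbolic tangent hidden layer and a linear output layer can be sketched in plain Python as below. The weights passed in are hypothetical placeholders; in the study they would come from Levenberg-Marquardt training (with or without Bayesian regularisation) in MATLAB, which is not reproduced here. MATLAB's tansig transfer function is mathematically equivalent to tanh. The parameter-count helper illustrates why such a network is cheap to run on low-power field devices.

```python
import math


def mlnn_predict(x, W1, b1, w2, b2):
    """Predict one output (e.g. NH4-N) from an input vector x.

    W1: one weight row per hidden node, b1: hidden biases,
    w2: output weights (one per hidden node), b2: output bias.
    """
    # Hidden layer: weighted sum plus bias, squashed by tanh.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output layer: linear combination of the hidden activations.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def n_parameters(n_inputs, n_hidden):
    """Weights and biases in a single-hidden-layer, single-output network."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)
```

For example, with all 15 input variables and 10 hidden nodes (an arbitrary illustrative size), n_parameters(15, 10) returns 171, and one prediction needs roughly 160 multiply-add operations plus ten tanh evaluations.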
In total, 176 soft sensors (i.e., a model applied to a variable set) were analysed. These soft sensors comprised eight models, each applied to all 22 identified variable sets drawn from the 15 input variables (8 × 22 = 176; Table 5, Figure 6). MATLAB was used as the computing environment to apply each of the models.
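The arithmetic behind the 176 soft sensors (eight models crossed with 22 variable sets) can be made explicit with a short enumeration. The labels are assumptions for illustration only. The variable sets are assumed to be lettered A to V (the text refers to sets M, T, U and V). The text names the hidden-layer configurations [X] and [2X] but not the third, so the third appears here as the hypothetical placeholder "H3".

```python
from itertools import product
from string import ascii_uppercase

# Hidden-layer configurations: "X" and "2X" appear in the text; "H3" is a
# hypothetical placeholder for the third, unnamed configuration.
HIDDEN = ["X", "2X", "H3"]
MODELS = (["Rlin", "Rreg"]
          + [f"NNlm[{h}]" for h in HIDDEN]
          + [f"NNbr[{h}]" for h in HIDDEN])
# Assumed variable-set labels A..V (22 sets).
VARIABLE_SETS = list(ascii_uppercase[:22])

# A soft sensor is one model applied to one variable set, e.g. "NNbr[2X]U".
SOFT_SENSORS = [f"{m}{v}" for m, v in product(MODELS, VARIABLE_SETS)]
```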

Analyses
The effectiveness of the models was assessed across six criteria, split between two categories. Category A assessed the accuracy of the general NH4-N trend prediction, and Category B the accuracy of the predicted trend at a selected NH4-N concentration, known as the "cut-off threshold value". This value was set at 2 mg NH4-N/l for the purposes of this study; site-specific values can vary due to local regulations. The assessment criteria are listed in Table 6.

Category A
Criterion 1A: Coefficient of determination (R²)
Referred to as the coefficient of determination, R² is an indicator of the strength of the relationship between variables: 0 indicates a poor relationship, while 1 indicates a very close relationship.
Measures the strength of the relationship between the predicted NH4-N trend and the actual NH4-N trend

Criterion 2A: RMSE
Root mean square error (RMSE) is a standard statistical metric for measuring model performance; it quantifies the difference between observed and predicted values and is a good measure of accuracy. The lower the RMSE value, the more accurate the prediction.
Measures the average accuracy of the predicted NH4-N trend against the actual NH4-N trend
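The two Category A criteria can be computed from paired actual and predicted NH4-N values, as sketched below. The text does not give the exact R² formula used; this sketch assumes the common definition 1 - SS_res/SS_tot (which can fall below 0 for a very poor model), whereas some tools instead report the squared Pearson correlation between predicted and actual values. The example values are made up.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error, in the units of the data (mg NH4-N/l)."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


# Illustrative (made-up) NH4-N values in mg/l.
actual = [10.0, 8.0, 6.0, 4.0]
predicted = [9.0, 8.0, 6.0, 5.0]
```

Here r_squared(actual, predicted) is 0.9 and rmse(actual, predicted) is about 0.71 mg/l. A prediction can track the trend well (high R²) while sitting consistently above or below the true concentrations (high RMSE), which is why the study considers both.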

Category B
Criterion 1B: Percentage of NH4-N removal (NH4rem (%))
This criterion compares the NH4-N removal achieved in a model-controlled cycle with that achieved on-site in a full (non-controlled) treatment cycle, both measured from the peak NH4-N concentration during the cycle (NH4peak). The higher the NH4rem value, the better the soft sensor.
$$\mathrm{NH}_{4,\mathrm{rem}} = \frac{\mathrm{NH}_{4,\mathrm{peak}} - \mathrm{NH}_{4,\mathrm{thres}}}{\mathrm{NH}_{4,\mathrm{peak}} - \mathrm{NH}_{4,\mathrm{final}}} \times 100\%$$
where NH4rem is the percentage of potential NH4-N removal achieved, NH4thres is the actual NH4-N concentration at which the cycle was terminated by the selected cut-off threshold (mg NH4-N/l), NH4final is the final NH4-N concentration at the end of a full cycle (mg NH4-N/l) and NH4peak is the highest NH4-N concentration during the cycle (mg NH4-N/l). NH4thres could be related to an ammonium discharge limit at a given site.
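A minimal sketch of Criterion 1B follows. It assumes NH4rem expresses the share of the full-cycle removal (from NH4peak down to NH4final) that has been achieved when the controlled cycle stops at NH4thres, which is the reading under which a higher value is better. The concentrations in the usage line are made-up illustrations, not study data.

```python
def nh4_removal_pct(nh4_peak, nh4_thres, nh4_final):
    """Percentage of the potential NH4-N removal achieved by a controlled cycle.

    nh4_peak:  highest NH4-N concentration in the cycle (mg/l)
    nh4_thres: actual NH4-N when the soft sensor ended the cycle (mg/l)
    nh4_final: NH4-N at the end of a full, non-controlled cycle (mg/l)
    """
    potential = nh4_peak - nh4_final   # removal a full cycle would deliver
    achieved = nh4_peak - nh4_thres    # removal before the cycle was ended
    return achieved / potential * 100.0


# Illustrative values: peak 30 mg/l, full-cycle final 1 mg/l, stopped at 2.5 mg/l.
example = nh4_removal_pct(30.0, 2.5, 1.0)   # about 94.8 %
```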

Criterion 2B: Percentage of time saved (Tsave)
This criterion returns the time saved by the soft sensor in question at the selected cut-off threshold value, expressed as a percentage of the full (non-controlled) treatment cycle. The higher the Tsave value, the better the soft sensor.
$$T_{\mathrm{save}} = \left(1 - \frac{T_{\mathrm{thres}}}{T_{\mathrm{fixed}}}\right) \times 100$$
where Tsave is the time saving (%), Tthres is the time (min) at which the cycle would be ended by the model in a controlled scenario and Tfixed is the fixed cycle length (min) set in an uncontrolled scenario.
Indicates the time saved with the selected cut-off threshold value. For example, the model might be asked to terminate the treatment cycle when NH4-N concentrations are predicted to reach a certain concentration (e.g., a discharge limit concentration). In general, the greater the time saved, the better, as in practice it increases system capacity.
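The link between the cut-off threshold and Tsave can be sketched as follows. The termination rule used here is an assumption about how the controlled cycle would be ended: the cycle stops at the first time step at which the predicted NH4-N is at or below the cut-off. The times and predictions in the usage lines are illustrative.

```python
def termination_time(times, predicted_nh4, cutoff=2.0):
    """First time (min) at which the soft sensor's prediction reaches the cut-off.

    Returns None if the prediction never reaches it (the cycle runs to full length).
    """
    for t, nh4 in zip(times, predicted_nh4):
        if nh4 <= cutoff:
            return t
    return None


def time_saving_pct(t_thres, t_fixed):
    """Tsave = (1 - Tthres / Tfixed) * 100."""
    return (1.0 - t_thres / t_fixed) * 100.0


# Illustrative: predicted NH4-N (mg/l) at four times, fixed cycle length of 464 min.
t_end = termination_time([58, 87, 116, 145], [12.0, 5.0, 1.9, 1.5])  # -> 116
saving = time_saving_pct(t_end, 464)  # -> 75.0
```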

Criterion 3B: Number of successful cycles (SC)
During the application of the soft sensors, it was noticed that some soft sensors could end a treatment cycle very early owing to the addition and subsequent mixing of influent at the start of the cycle. This can temporarily influence pH and ORP trends and cause cycles to be ended prematurely, often before the new influent has been completely mixed with the existing wastewater in the system. Where a cycle was ended before NH4peak occurred, the soft sensor was deemed unsuccessful for that cycle.
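Criterion 3B can be counted directly from two times per test cycle, as in the sketch below. It assumes that, for each cycle, both the termination time chosen by the soft sensor and the time of the measured NH4-N peak are available in minutes; the example pairs are illustrative.

```python
def successful_cycles(cycles):
    """Count cycles in which the soft sensor did not stop before the NH4-N peak.

    cycles: iterable of (t_thres, t_peak) pairs in minutes.
    A cycle ended exactly at the peak is counted as successful.
    """
    return sum(1 for t_thres, t_peak in cycles if t_thres >= t_peak)


# Illustrative test set of three cycles: the second is ended during influent mixing.
sc = successful_cycles([(200, 120), (60, 110), (150, 150)])  # -> 2
```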

Criterion 4B: Absolute error (Aberror)
This criterion assessed the accuracy of the soft sensor in meeting a specific threshold concentration for effluent NH 4 -N discharges.
Indicates the accuracy of each soft sensor at the cut-off threshold value
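The text does not spell out the Aberror formula. The sketch below assumes a plausible minimal reading: the absolute difference between the actual NH4-N concentration when the cycle was ended (NH4thres) and the 2 mg NH4-N/l cut-off threshold, averaged over the test cycles.

```python
def absolute_error(nh4_thres, cutoff=2.0):
    """|actual NH4-N at termination - cut-off threshold|, in mg NH4-N/l."""
    return abs(nh4_thres - cutoff)


def mean_absolute_error(nh4_thres_values, cutoff=2.0):
    """Average Aberror over a set of test cycles."""
    errors = [absolute_error(v, cutoff) for v in nh4_thres_values]
    return sum(errors) / len(errors)


# Illustrative: actual NH4-N of 2.5 and 1.5 mg/l at termination.
mae = mean_absolute_error([2.5, 1.5])  # -> 0.5
```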

Ranking System
A ranking and weighting system was developed to compare the overall impact of each soft sensor. This was necessary because soft sensors may differ in their impact on the overall performance and efficiency of the SBR. For example, a soft sensor may achieve good R² performance but return a poor RMSE result. Such a soft sensor would follow the actual NH4-N trend without necessarily predicting concentrations close to the actual values, so the overall result would not be acceptable. In consultation with WWTP operators, weights were applied to each of the criteria (Table 7). In general, the overriding concern in WWTPs is to meet environmental regulations, so Aberror would be considered vital. For indicative purposes, the weights outlined in Table 7 were applied in this study. It should be noted that weightings may vary depending on site-specific requirements and demands, and they can be adjusted to promote site-specific goals. For example, increasing the weight of Tsave would promote the selection of a soft sensor with good energy-saving characteristics, but this may result in poorer effluent quality.
Soft sensor results were ranked against each other for each criterion, with better results receiving a higher rank value (ranked values run from 1 to N, where N is the number of soft sensors in question). Each ranked value was then multiplied by the corresponding criterion weight to obtain a weighted value, and the weighted values were summed:
$$\text{Weighted value} = \sum_{n} \text{Rank}_{n} \times \text{Weight}_{n} \qquad (3)$$
where n denotes each criterion detailed in Table 7. The most appropriate soft sensor was then determined in two steps: Step 1, determine the best soft sensor (highest weighted value) for each model using Equation (3); Step 2, apply the same system to the Step 1 results to determine the overall best soft sensor.
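The two-step weighting and ranking procedure can be sketched as follows. The soft sensor names, weights and results are hypothetical (the Table 7 weights are not reproduced here), only three of the six criteria are used for brevity, and ties are broken by input order. Whether a higher or lower value is better is set per criterion: lower is better for RMSE and Aberror, higher for the others.

```python
def rank_values(values, higher_is_better):
    """Rank values 1..N, giving N to the best result (ties broken by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def weighted_scores(results, criteria):
    """Equation (3): sum over criteria of rank x weight, per soft sensor.

    results:  {sensor: {criterion: value}}
    criteria: {criterion: (weight, higher_is_better)}
    """
    names = list(results)
    scores = {name: 0.0 for name in names}
    for crit, (weight, higher) in criteria.items():
        ranks = rank_values([results[n][crit] for n in names], higher)
        for name, rank in zip(names, ranks):
            scores[name] += rank * weight
    return scores


def two_step_selection(results, model_of, criteria):
    """Step 1: best soft sensor per model. Step 2: best among those winners."""
    by_model = {}
    for name in results:
        by_model.setdefault(model_of[name], []).append(name)
    winners = {}
    for model, names in by_model.items():
        scores = weighted_scores({n: results[n] for n in names}, criteria)
        winners[model] = max(scores, key=scores.get)
    final = weighted_scores({n: results[n] for n in winners.values()}, criteria)
    return winners, max(final, key=final.get)


# Hypothetical weights: Aberror weighted most heavily, as the text suggests.
CRITERIA = {"R2": (1, True), "RMSE": (1, False), "Aberror": (3, False)}
RESULTS = {
    "A1": {"R2": 0.90, "RMSE": 1.0, "Aberror": 0.5},
    "A2": {"R2": 0.80, "RMSE": 1.2, "Aberror": 0.9},
    "B1": {"R2": 0.95, "RMSE": 0.8, "Aberror": 0.7},
    "B2": {"R2": 0.70, "RMSE": 2.0, "Aberror": 1.5},
}
MODEL_OF = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
winners, best = two_step_selection(RESULTS, MODEL_OF, CRITERIA)
```

In this toy example, B1 beats A1 on R² and RMSE in Step 2, but A1 is selected overall because the heavily weighted Aberror criterion favours it. This illustrates how the weights steer the final choice.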

Further Analyses
Although determining the best soft sensor was the main objective of this study, a number of other studies, using the same criteria and weights, were also executed including (i) whether MLR and NN regularisation improved results, (ii) a comparison between MLR and NN methods, (iii) how adjusting the number of neurons in the NN hidden layers affected results, and (iv) an examination of which variable sets, which variables and which models were best. It should be noted that the model, variable set, etc., identified for the best soft sensor may differ from that for the best identified model, variable set, etc. The aim of this study was not just to identify the best soft sensor (combination of model and variable set), but also the best overall model and variable set.

Results
The overall influent and effluent results for the SBR are summarised in Table 8.

Regression Results
Two regression models were assessed, Rlin and Rreg. Detailed results for each model are displayed in Tables A1 and A2, respectively. For NH4rem, results varied between 20% and 97% for Rlin (average value of 66%) and between 75% and 93% for Rreg (average value of 84%). Average Tsave and Aberror results were 51% and 0.98 mg NH4-N/l for Rlin and 51% and 0.73 mg NH4-N/l for Rreg. An overview of these results shows that Rreg outperformed Rlin: on average, it achieved better NH4rem and Aberror results while maintaining the same average Tsave, resulting in more accurate and reliable effluent concentration predictions.

Neural Network Results
NNs were assessed using two algorithms, namely NNlm and NNbr. Overall results for NNlm[X] are displayed in

Weighting and Ranking Results
To decide on the best soft sensor, a weighting and ranking system was applied. Table 9 summarises the overall results from this study (full details are available in Table A9). The first step determined the best variable set (i.e., combination of independent input variables) for each model and the second step determined the best soft sensor.
Overall, NNbr[2X]U was determined to be the most efficient soft sensor based on the weighting system. Variable set U used a combination of moving averages with nadir-apex values for both pH and ORP. This soft sensor achieved an average NH4rem result of 88% over the 12 test cycles, with corresponding Tsave and Aberror results of 67% and 0.57 mg NH4-N/l, respectively (Figure 7). This equated to a 51% reduction in electricity costs for the SBR system owing to the time savings during the treatment cycle (which in commercial settings may reduce aeration costs).
Table 9. Step 1 ranking results and Step 2 ranking.

Comparison between Methodologies Applied
Using the weighting and ranking method and comparing Rlin to Rreg for each variable set, it was observed that Rlin was marginally better than Rreg (in this comparison, Rlin performed better for 54.5% of the model/variable set combinations). A similar comparison was carried out comparing individual variable sets for the three sets of hidden layer neuron models for NNlm (NN … This showed that Rlin performed better in 54.5% of variable sets. Alternatively, a study of the final ranked results (Table 9) shows that three of the top four ranked soft sensors use the NNbr model; therefore, for future applications, it may be possible to use this model only. This result suggests that regularisation has indeed helped to avoid some of the over-fitting suffered by the unregularised NNlm models. Table 10 compares each variable set for each soft sensor. The aggregate of variable set ranks gives an indication of overall variable set performance (when compared to other models). A similar study comparing variable sets (Table A9) identified the top three variable sets as T (pHnadir-apex and ORPnadir-apex), V (pHma20 and pHnadir-apex) and M (ORPcum and ORPnadir-apex); each of these used only two input variables, suggesting that simpler variable sets can lead to better models. The nadir-apex input variable seems particularly useful, and more generally the processed input variables clearly provided added value to the numerical modelling.

Discussion
As detailed in the results, soft sensors selected using NNs and regression models, in this case the NNbr[2X]U soft sensor, have the potential to generate large operational savings such as reduced treatment cycle duration and reduced electricity usage, whilst also meeting discharge requirements. This study was conducted in a small-scale WWTP, using a suite of pH and ORP variables (i.e., variables identified from both pH and ORP profile characteristics in the SBR). Several studies have demonstrated that ORP and pH sensors can act as surrogates for NH4-N sensors [15,[25][26][27][28][29]31]; however, the implementation of these results at small-scale WWTPs is limited, and many of these studies did not look at pH and ORP sensors in a combined manner.
For the task at hand, the use of the NN training (optimisation) method was quite standard. The main advantage of the linear regression model was interpretability. The effect of each variable on the output of the model was easy to understand. Neural network models are often able to fit data better at the cost of interpretability. However, neural network models can be interrogated and visualised to give a good understanding of their effect.
The motivation for using Bayesian regularisation was to help avoid over-fitting, the scenario in which a model fits the training data well but fails to generalise to unseen data. Regularisation pushes the model towards a simpler form that may fit the training data slightly less closely but is more likely to generalise.
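For reference, the two regularisation schemes can be written as objective functions. The forms below are standard textbook ones, given as an illustration; the text does not state the exact formulation or hyperparameter settings used. LASSO (as in Rreg) adds an L1 penalty on the regression coefficients, which can shrink some of them to exactly zero:

$$\hat{\mathbf{b}} = \arg\min_{b_0,\,\mathbf{b}} \; \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - b_0 - \mathbf{x}_i^{\top}\mathbf{b}\right)^2 + \lambda\sum_{j=1}^{p}\lvert b_j\rvert$$

where N is the number of training samples, p the number of input variables in the variable set (at most 15 here) and λ ≥ 0 the penalty strength. Bayesian regularisation (as in NNbr), in the MacKay formulation implemented, for example, by MATLAB's trainbr training function, minimises a weighted sum of squared prediction errors and squared network weights:

$$F(\mathbf{w}) = \beta E_D + \alpha E_W, \qquad E_D = \sum_{i=1}^{N}\left(y_i - \hat{y}_i(\mathbf{w})\right)^2, \qquad E_W = \sum_{k} w_k^2$$

where the hyperparameters α and β are re-estimated during training. In both cases, a larger penalty (λ, or α relative to β) pushes the model towards smaller coefficients or weights, which corresponds to the simpler form described above.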
Wastewater pollutant concentration datasets are suitable for application in NNs as they have a large number of inputs, each of which can vary significantly. In addition, given the 24/7 nature of wastewater treatment, large datasets can be collected from wastewater sensors, which can improve NN suitability even further. However, as discussed in Section 3 of this paper, NNs must be carefully designed and trained to ensure that the outputs are suitable for use in real-time control applications. Given the black box nature of NNs, careful attention is required when assessing input variables, selecting models and assigning rankings.
The methodology proposed in this paper creates an opportunity for WWTPs using SBRs (and indeed any WWTP using other batch treatment processes) to select their own custom soft sensor to optimise on-site treatment processes. In addition, the methodology can be repeated over time to adapt to significant on-site changes, such as substantial changes in influent wastewater composition due to the connection of new wastewater sources. However, it can be labour intensive to apply the methodology at a new site, particularly if it is difficult to source the database of parameters required to train the model. To assist with this, further research on this topic would include the application of the best sensor across a larger number of site-based systems, and further adaptation to enable control of biological nitrogen and phosphorus removal where required. Recent work investigated the prediction of N and P removal from municipal wastewater by microalgae using response surface methodology, a multilayer perceptron artificial neural network and support vector regression [66]. However, despite this and other recent work, there is a need to focus on robust methods for system control.
RTC using soft sensors offers many benefits from a managerial perspective. Improved treatment efficacy (in terms of discharge compliance) can be achieved more consistently without manual intervention by WWTP operatives, whilst electrical energy savings can ease the financial burden and assist with meeting targets such as those of the EU Energy Efficiency Directive (EED). As the equipment required for this methodology is economical, readily available and easy to use, highly skilled operators are not required to apply the technology, and capital and operating costs are low, which enhances the sustainability of the technology in smaller WWTPs.
RTC may also be particularly advantageous in WWTPs that are subject to changing loads, for example due to seasonal tourism, which can cause seasonal, weekly or daily hydraulic and organic fluctuations that are difficult to manage. The technology could also be used to extend the duration of treatment cycles to ensure discharge compliance where a WWTP is overloaded in terms of pollutant load (depending on site-specific conditions, such as upstream wastewater storage provisions and other operational considerations that allow extended cycle times), or to reduce the treatment cycle duration to the minimum time required to meet discharge regulations, which can allow a WWTP to treat additional hydraulic load if required.

Conclusions and Outlook
This research presents a methodology for enabling real-time control of NH4-N removal in wastewater treatment systems. The methodology was developed using a case-study SBR system treating residential wastewater. MLR and NN techniques were used and compared to develop suitable soft sensors that could enable RTC of wastewater treatment systems. This study also presented a method for selecting the optimal soft sensor based on the specific outcomes required at any site.
The estimation models studied included linear regression (Rlin), regularised linear regression (Rreg) and NN models leveraging Levenberg-Marquardt back-propagation (NNlm) and Levenberg-Marquardt back-propagation with Bayesian regularisation (NNbr). The impact of the number of neurons in each NN model was also analysed. It was determined that, for a typical treatment cycle, the best-performing soft sensor under the site-specific criteria at this site (which heavily weighted accuracy in effluent NH4-N concentration prediction) used Bayesian regularisation and would achieve an average treatment time saving of 67%, resulting in an average saving of 51% in electricity costs. The controlled treatment cycle would achieve 88% of the NH4-N removal of the fixed time treatment cycle but, significantly, ensured that discharges remained within the set threshold discharge concentration. These results highlight how the methodology can provide a level of targeted control, which can significantly improve the sustainability of wastewater treatment by balancing the needs of safe discharge and efficient energy usage.
The methodology proposed to determine the most efficient soft sensor for any given site can allow a more targeted approach to enable a site to adapt as on-site considerations change. The models studied can be implemented on basic programmable logic controllers typically used for small-scale SBR systems, making the methodology suitable even in small WWTPs with limited resources. The methodology also has the potential to be applied to existing SBRs, making it a cost-effective option for process upgrade works in existing WWTPs.
One limitation of this research is that the methodology is focused specifically on SBRs. There is additional potential for the procedure to be modified to suit other technologies, in particular systems that treat wastewater in batches. Further research on this topic would include the application of the best sensor across a larger number of site-based systems and further adaptation to enable control of biological nitrogen and phosphorus removal where required.
(Tables A1-A9). Other raw data are available on request.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A