This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Total nitrogen (TN) and total phosphorus (TP) concentrations are important parameters to assess the quality of water bodies and are used as criteria to regulate the water quality of the effluent from a wastewater treatment plant (WWTP) in Korea. Therefore, continuous monitoring of TN and TP using
The Korean Ministry of Environment has recently imposed stricter permit requirements on the outflow of domestic wastewater treatment plants (WWTPs) to improve the water quality of receiving water bodies such as rivers and lakes. Therefore, the water quality monitoring program has become an important social issue.
At present, there are a total of 61
"Software sensor" is a common name for software that processes a given set of water quality data, obtainable by easy and reliable methods, to estimate other water quality variables using a model [
The basic concept of the software sensor is illustrated in
Concept of software sensor.
In fact, the software sensor concept has been applied in a few studies. Christensen
The software sensor concept has also been applied in WWTPs. Alastair
Total nitrogen and TP in streams or wastewater have been measured using software sensors by a few researchers. Jeong
In this study, software sensors (or regression models) were developed to estimate TN and TP of different waters (
Water samples for the current study were collected from the Daejeon area in the middle of South Korea (
Water samples were collected weekly from March 2011 to June 2012. In
Water sampling locations.
Namely, water quality data collected from March 2011 to August 2011 were used for model development, and the data from September 2011 to June 2012 were used for model validation.
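The chronological split described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `split_by_date`, the record layout, and the sample values are hypothetical, and only the cutoff (end of August 2011) comes from the text.

```python
from datetime import date

# Cutoff taken from the text: data through August 2011 are used for model development.
DEVELOPMENT_END = date(2011, 8, 31)

def split_by_date(records, development_end=DEVELOPMENT_END):
    """Split (sampling_date, measurements) records into development and validation sets."""
    development = [r for r in records if r[0] <= development_end]
    validation = [r for r in records if r[0] > development_end]
    return development, validation

# Hypothetical records: (sampling date, {parameter: value}); the values are made up.
records = [
    (date(2011, 3, 7), {"NH4_N": 1.2}),
    (date(2011, 8, 29), {"NH4_N": 0.9}),
    (date(2011, 9, 5), {"NH4_N": 1.5}),
    (date(2012, 6, 25), {"NH4_N": 0.7}),
]
development, validation = split_by_date(records)
print(len(development), len(validation))
```

Splitting by date rather than at random keeps the validation data strictly later than the development data, which mirrors how a software sensor would be used after deployment.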
Conditions of water quality analysis.
Water Type  Sampling points  Number of samples 

WWTP effluent  1  77 
CSOs  3  239 
Streams  15  228 
Lakes  3  1,183 
All the water quality parameters except TP and TN in
Water quality parameters monitored in this study.
Water quality Parameters  Unit  Measurement Method  

Variables measured by sensors  DO  mg·L^{−1}  Electrode Method (YSI6600EDS SONDE) 
pH    
EC  μS·cm^{−1}  
Turb  NTU  
Variables measured by chemical analysis  PO_{4}–P  mg·L^{−1}  IC (DIONEXICS1100) 
NO_{2}–N  mg·L^{−1}  
NO_{3}–N  mg·L^{−1}  
NH_{4}–N  mg·L^{−1}  
TP  mg·L^{−1}  Ascorbic Acid Method  
TN  mg·L^{−1}  Persulfate Method 
Initially, the correlation between different water quality parameters was analyzed. To better understand the relationships, a scatter diagram was first drawn for each pair formed by TN or TP and one of the other water quality parameters. A scatter diagram visually shows the relative strength of the relationship between each pair of variables; the direction (
As a next step, the dominant variables identified by the scatter diagram analysis were used to develop a software sensor that predicts TN and TP through multiple linear regression (MLR) analysis. MLR is an analytical method used to develop an equation relating a dependent variable
A general MLR equation (or the software sensor in this study) is provided below (Equation (1)):
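For reference, the standard textbook form of an MLR model is written out here. The symbol names (Y, X_{i}, β_{i}, ε) are assumptions and may differ from the notation used in the original equation:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon
```

In this form, Y is the dependent variable (TN or TP), X_{1}, ..., X_{n} are the explanatory water quality variables, β_{0} is the intercept, β_{1}, ..., β_{n} are the regression coefficients, and ε is the residual error.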
In this study, we applied stepwise regression based on forward selection. That is, we started with a model containing the single explanatory variable identified as the most significant, and then added variables one at a time until adding another variable no longer improved the model significantly [
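The forward selection procedure described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the entry criterion (a partial F statistic of at least 4.0), the function names, and the synthetic data are all hypothetical, since the text does not state which significance criterion was applied.

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit with an intercept; returns the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, names, f_to_enter=4.0):
    """Add one variable at a time while the partial F statistic is at least f_to_enter."""
    n = len(y)
    selected = []
    remaining = list(range(X.shape[1]))
    rss_current = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while remaining:
        # Try each remaining variable and keep the one giving the lowest RSS.
        trials = [(fit_rss(X[:, selected + [j]], y), j) for j in remaining]
        rss_new, best = min(trials)
        dof = n - (len(selected) + 1) - 1  # residual degrees of freedom of the larger model
        if dof <= 0:
            break
        # Partial F statistic for the added variable (guarded against a zero denominator).
        f_stat = (rss_current - rss_new) / max(rss_new / dof, 1e-12)
        if f_stat < f_to_enter:  # assumed stopping rule
            break
        selected.append(best)
        remaining.remove(best)
        rss_current = rss_new
    return [names[j] for j in selected]

# Synthetic data (made up): TN depends on NH4-N and NO3-N but not on pH.
rng = np.random.default_rng(0)
n = 200
nh4 = rng.uniform(0, 10, n)
no3 = rng.uniform(0, 10, n)
ph = rng.uniform(6, 8, n)
tn = 1.0 + 1.0 * nh4 + 1.0 * no3 + rng.normal(0, 0.5, n)
X = np.column_stack([nh4, no3, ph])
selected_names = forward_selection(X, tn, ["NH4_N", "NO3_N", "pH"])
print(selected_names)
```

In this synthetic example the two nitrogen species enter the model first because each explains a large share of the variance in TN; whether an uninformative variable such as pH also enters depends on the chosen threshold.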
Comparison of water TN and TP concentrations for different water types (circles and stars indicate outliers).
TN and TP of water samples from different locations.
Parameters  Type  Min  Max  Mean  Median  Standard deviation 

TN  WWTP  1.36  23.01  9.179  7.897  4.188 
CSOs  10.08  41.31  27.415  28.250  6.450  
Stream  0.32  17.30  4.112  3.297  2.747  
Lake  0.19  7.44  1.739  1.549  1.021  
TP  WWTP  0.052  1.646  0.445  0.374  0.334 
CSOs  0.274  9.700  3.051  2.855  1.495  
Stream  0.007  0.950  0.176  0.145  0.132  
Lake  0.005  0.350  0.097  0.088  0.062 
The box plots for both the TN and TP concentrations of the CSOs have long whiskers, indicating widely spread data. Another notable feature is that the water quality data for the streams and lakes have a few outliers exceeding 1.5× the interquartile range, compared with those for other water types [
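The 1.5× interquartile range rule mentioned above can be illustrated with a short sketch. The function name `iqr_outliers` and the sample concentrations are hypothetical, and the quartiles are computed with NumPy's default linear interpolation, which may differ slightly from the statistical package used for the box plots.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k x IQR below Q1 or above Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical TP concentrations in mg/L (made-up values, not data from the study).
tp = [0.05, 0.08, 0.09, 0.10, 0.11, 0.12, 0.35]
print(iqr_outliers(tp))  # only 0.35 lies outside the fences
```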
The scatter plots constructed for all data measured from March 2011 to August 2011 in this study are shown in
In fact, the relatively lower correlation between TN or TP and the other water quality parameters for stream and lake waters was expected. The water quality of lakes and streams is often affected by external pollutant sources, internal reactions, or weather conditions.
Typically, DO, pH, and EC data did not show significant correlation with the TN concentrations (r = −0.18 to 0.18 for DO, r = −0.37 to 0.02 for pH, and r = −0.42 to 0.48 for EC) or the TP concentrations (r = −0.52 to 0.01 for DO, r = −0.28 to 0.44 for pH, and r = −0.42 to 0.21 for EC) for all water types.
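As an illustration of how such correlation coefficients can be obtained, the following sketch computes r for one pair of variables. It assumes that r denotes the Pearson correlation coefficient, which the text does not state explicitly; the function name and the sample values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()
    yd = y - y.mean()
    return float((xd @ yd) / np.sqrt((xd @ xd) * (yd @ yd)))

# Hypothetical paired measurements (made-up values, not data from the study).
do_mg_l = [8.1, 7.4, 9.0, 6.8, 7.9]
tn_mg_l = [3.2, 4.1, 2.9, 3.8, 3.5]
print(round(pearson_r(do_mg_l, tn_mg_l), 2))
```

Values of r close to zero, as reported for DO, pH, and EC, indicate that a variable carries little linear information about TN or TP and is therefore unlikely to be selected by the stepwise procedure.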
Scatter plots of water quality parameters for four water types.
With the datasets for the WWTP effluent, the stepwise MLR analysis was conducted. The result of the regression analysis is summarized in
Model_{N}3 for estimating TN in
Variance analysis of models predicting TN and TP of WWTP effluent.
TN (Dependent variable)  TP (Dependent variable)  

Model  Mean square    p-value  Model  Mean square    p-value  
Model_{N}1 ^{a}  552.371  0.882  <0.01  Model_{P}1 ^{a}  4.582  0.936  <0.01 
Model_{N}2 ^{b}  305.321  0.975  <0.01  
Model_{N}3 ^{c}  204.081  0.978  <0.01  
Independent variables  Independent variables  
a NH_{4}–N  a PO_{4}–P  
b NH_{4}–N, NO_{3}–N  
c NH_{4}–N, NO_{3}–N, PO_{4}–P 
With the water quality parameters measured for CSOs waters, the stepwise MLR analysis was conducted. The result of the analysis is summarized in
Variance analysis of models predicting TN and TP of CSOs.
TN (Dependent variable)  TP (Dependent variable)  

Model  Mean square    p-value  Model  Mean square    p-value  
Model_{N}1 ^{a}  3518.589  0.858  <0.01  Model_{P}1 ^{a}  325.279  0.902  <0.01 
Model_{N}2 ^{b}  1781.741  0.869  <0.01  Model_{P}2 ^{b}  165.252  0.917  <0.01 
Independent variables  Independent variables  
a NH_{4}–N  a PO_{4}–P  
b NH_{4}–N, PO_{4}–P  b PO_{4}–P, NH_{4}–N 
From the scatter plots for the CSOs water, five variables (
Using the water quality data for stream waters, a stepwise MLR analysis was carried out. The summary of the analysis is provided in
Variance analysis of models predicting TN and TP of stream water.
TN (Dependent variable)  TP (Dependent variable)  

Model  Mean square    p-value  Model  Mean square    p-value  
Model_{N}1 ^{a}  3135.004  0.633  <0.01  Model_{P}1 ^{a}  8.892  0.675  <0.01 
Model_{N}2 ^{b}  2001.062  0.808  <0.01  Model_{P}2 ^{b}  4.759  0.723  <0.01 
Model_{N}3 ^{c}  1361.633  0.825  <0.01  Model_{P}3 ^{c}  3.244  0.739  <0.01 
Model_{N}4 ^{d}  1026.397  0.829  <0.01  Model_{P}4 ^{d}  2.44  0.741  <0.01 
Model_{N}5 ^{e}  827.979  0.836  <0.01  Model_{P}5 ^{e}  1.957  0.743  <0.01 
Model_{N}6 ^{f}  693.635  0.84  <0.01  Model_{P}6 ^{f}  1.636  0.746  <0.01 
Independent variables  Independent variables  
a NH_{4}–N  a PO_{4}–P  
b NH_{4}–N, NO_{3}–N  b PO_{4}–P, Turb  
c NH_{4}–N, NO_{3}–N, Turb  c PO_{4}–P, Turb, NH_{4}–N  
d NH_{4}–N, NO_{3}–N, Turb, EC  d PO_{4}–P, Turb, NH_{4}–N, NO_{2}–N  
e NH_{4}–N, NO_{3}–N, Turb, EC, NO_{2}–N  e PO_{4}–P, Turb, NH_{4}–N, NO_{2}–N, NO_{3}–N  
f NH_{4}–N, NO_{3}–N, Turb, EC, NO_{2}–N, pH  f PO_{4}–P, Turb, NH_{4}–N, NO_{2}–N, NO_{3}–N, pH 
Using the water quality data for water samples collected from the lake of interest, the stepwise MLR analysis was conducted. The summary of the results is provided in
The case for predicting the TP concentration was similar to that for TN. The model with PO_{4}–P, EC, and NO_{3}–N as independent variables (
Again, as was the case with the stream waters, the TP concentrations of the lake waters were very low; all values were below 0.5 mg·L^{−1}. Therefore, it was hypothesized that errors from manual measurements might affect the overall predictability of the models.
Variance analysis of models predicting TN and TP of lake water.
TN (Dependent variable)  TP (Dependent variable)  

Model  Mean square    p-value  Model  Mean square    p-value  
Model_{N}1 ^{a}  64.883  0.348  <0.01  Model_{P}1 ^{a}  0.305  0.572  <0.01 
Model_{N}2 ^{b}  38.921  0.417  <0.01  Model_{P}2 ^{b}  0.16  0.599  <0.01 
Model_{P}3 ^{c}  0.109  0.612  <0.01  
Independent variables  Independent variables  
a Turb  a PO_{4}–P  
b Turb, NO_{3}–N  b PO_{4}–P, EC  
c PO_{4}–P, EC, NO_{3}–N 
The best regression models for TN and TP derived from each MLR analysis for each water type are listed in
Software sensors obtained from MLR analysis.
Sites  Estimated parameters  Correlation equations 


WWTP effluent  TN  0.881 + 0.986 × NH_{4}–N + 1.092 × NO_{3}–N + 0.631 × PO_{4}–P  0.978 
TP  0.148 + 0.946 × PO_{4}–P  0.936  
CSOs  TN  5.918 + 0.857 × NH_{4}–N + 0.405 × PO_{4}–P  0.869 
TP  0.500 + 0.851 × PO_{4}–P + 0.04 × NH_{4}–N  0.917  
Stream water  TN  4.569 + 1.025 × NH_{4}–N + 0.838 × NO_{3}–N + 0.018 × Turb − 0.004 × EC + 5.432 × NO_{2}–N − 0.336 × pH  0.840 
TP  0.171 + 0.964 × PO_{4}–P + 0.002 × Turb + 0.008 × NH_{4}–N + 0.190 × NO_{2}–N − 0.01 × NO_{3}–N − 0.013 × pH  0.746  
Lake water  TN  0.361 + 0.158 × Turb + 0.693 × NO_{3}–N  0.417 
TP  0.158 + 0.962 × PO_{4}–P − 0.001 × EC − 0.017 × NO_{3}–N  0.612 
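To show how the equations in the table above could be applied in practice, the following sketch encodes them as a lookup of intercepts and coefficients. The coefficients are copied from the table; the dictionary keys, the function name `estimate`, and the sample measurements are hypothetical. Inputs are assumed to be in the units of the parameter table (mg·L^{−1} for concentrations, NTU for Turb, μS·cm^{−1} for EC).

```python
# (site, target) -> (intercept, {input variable: coefficient}); values from the table above.
SENSORS = {
    ("WWTP effluent", "TN"): (0.881, {"NH4_N": 0.986, "NO3_N": 1.092, "PO4_P": 0.631}),
    ("WWTP effluent", "TP"): (0.148, {"PO4_P": 0.946}),
    ("CSOs", "TN"): (5.918, {"NH4_N": 0.857, "PO4_P": 0.405}),
    ("CSOs", "TP"): (0.500, {"PO4_P": 0.851, "NH4_N": 0.04}),
    ("Stream water", "TN"): (4.569, {"NH4_N": 1.025, "NO3_N": 0.838, "Turb": 0.018,
                                     "EC": -0.004, "NO2_N": 5.432, "pH": -0.336}),
    ("Stream water", "TP"): (0.171, {"PO4_P": 0.964, "Turb": 0.002, "NH4_N": 0.008,
                                     "NO2_N": 0.190, "NO3_N": -0.01, "pH": -0.013}),
    ("Lake water", "TN"): (0.361, {"Turb": 0.158, "NO3_N": 0.693}),
    ("Lake water", "TP"): (0.158, {"PO4_P": 0.962, "EC": -0.001, "NO3_N": -0.017}),
}

def estimate(site, target, measurements):
    """Evaluate the regression equation for the given site and target (TN or TP)."""
    intercept, coefficients = SENSORS[(site, target)]
    return intercept + sum(c * measurements[name] for name, c in coefficients.items())

# Hypothetical WWTP effluent sample (made-up values, in mg/L).
sample = {"NH4_N": 2.0, "NO3_N": 5.0, "PO4_P": 0.3}
print(round(estimate("WWTP effluent", "TN", sample), 3))  # 8.502
print(round(estimate("WWTP effluent", "TP", sample), 3))  # 0.432
```

Keeping the coefficients in a table-like structure makes it straightforward to refit or replace a single equation without touching the evaluation logic.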
These regression equations can be used as a software sensor. As stated above, the equations for the WWTP effluent and CSOs water have higher
Comparison of PO_{4}–P and TP concentrations for each water type.
Comparisons between measured TN or TP concentrations and those predicted by the software sensors for each water type were made in
For the validation of the developed models, the regression models were applied to another set of measured water quality data for each water type collected from September 2011 to June 2012. As shown in
Comparison of measured and estimated TN concentrations for each water type.
Comparison of measured and estimated TP concentrations for each water type.
Validation of TN models for each water type.
Validation of TP models for each water type.
Time series of TN concentration predicted by software sensor.
Time series of TP concentration predicted by software sensor.
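The comparison of measured and estimated concentrations on the validation data can be summarized numerically, for example with the coefficient of determination and the root-mean-square error. The sketch below is an assumption about how such a comparison might be computed, not the metric reported in the study; the function name and the sample values are hypothetical.

```python
import numpy as np

def validation_metrics(measured, predicted):
    """Return (R^2, RMSE) for predictions compared against measured values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = measured - predicted
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(residuals @ residuals)
    ss_tot = float(((measured - measured.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse

# Hypothetical measured and estimated TN concentrations in mg/L (made-up values).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse = validation_metrics(measured, predicted)
print(round(r2, 3), round(rmse, 3))
```

Because R^2 is computed here against the validation data directly, it can be lower than the value obtained during model development, and it can even become negative if the model predicts worse than the mean of the measurements.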
In this study, software sensors (or linear regression models) based on MLR analysis were developed; they use other water quality parameters to predict the TN and TP concentrations of WWTP effluent, CSOs, stream water, and lake water. Initially, independent variables such as pH, DO, EC, Turb, NO_{2}–N, NO_{3}–N, NH_{4}–N, and PO_{4}–P concentrations were evaluated for their individual correlation with TN or TP; the variables with higher correlation with TN and TP were incorporated into the software sensors (or regression models) as independent variables.
In fact, the developed software sensors predicted the TN and TP concentrations of the WWTP effluent and CSOs waters reasonably well. For the stream and lake waters, the predictability of the software sensors was relatively low, probably because of the low concentration ranges of the nutrients (especially TP) and the variability of the ratio of PO_{4}–P to TP concentrations caused by external influences on the water bodies, such as nonpoint source pollution or weather changes.
From the result, nonetheless, it is expected that the proposed strategy (
This work was supported by the R&D program of MKE/KEIT (R&D program number: 10037331, Development of Core Water Treatment Technologies based on Intelligent BTNTIT Fusion Platform).