Principal Component Weighted Index for Wastewater Quality Monitoring

The quality of raw and treated wastewater was evaluated using the principal component weighted index (PCWI) which was defined as a sum of principal component scores weighted according to their eigenvalues. For this purpose, five principal components (PCs) explaining 88% and 83% of the total variability of raw and treated wastewater samples, respectively, were extracted from 11 original physico-chemical parameters by robust principal component analysis (PCA). The PCWIs of raw and treated wastewater were analyzed in terms of their statistical distributions, temporal changes, mutual correlations, correlations with original parameters, and common water quality indexes (WQI). The PCWI allowed us to monitor temporal wastewater quality by one parameter instead of several. Unlike other weighted indexes, the PCWI is composed of independent variables with minimal information noise and objectively determined weights.

The composite indexes are based on the principle of the simple additive weighting (SAW) method, which combines independent criteria whose importance is expressed by their statistical weights [31]. These weights can be determined by subjective and objective methods. Subjective methods estimate the weights based on expert opinions and judgments of decision makers [32,33] or recommended standards [34]. A typical representative of this approach is the analytical hierarchy process (AHP) developed by Saaty [35]. Other methods are described in the literature [36,37]. The determination of objective weights is based on the application of various statistical measures, such as variability [38,39], correlation [40,41], and information content [42].
The above requirements on the SAW model are in line with the basic properties of principal components created by principal component analysis (PCA): (i) the components are orthogonal and thus independent, and (ii) the components' weights correspond to their eigenvalues. Therefore, principal components have already been used to construct various composite indexes characterizing the socioeconomic situation [43], soil quality [11], environment assessment [12], and surface water quality [44].
The aim of this paper is to demonstrate the use of the principal component weighted index (PCWI) for monitoring raw and treated wastewater quality, which has not been described in the literature. Wastewater quality is of high importance nowadays because it is associated with the
Table 1. Summary statistics of raw wastewaters (n = 67).
The 67 raw and treated wastewater samples were characterized by 11 physico-chemical parameters: biochemical oxygen demand after 5 days (BOD), chemical oxygen demand by dichromate (COD), total phosphorus (TP), total nitrogen (TN), total suspended solids (TSS), total dissolved salts (TDS), pH, ammonium, nitrate, nitrite, and phosphate. Water analyses including sampling and preservation were performed according to ISO

Robust Principal Component Analysis
Principal component analysis looks for new latent variables of n samples, which are orthogonal (not correlated) to each other [48]. Each latent variable (principal component) is a linear combination of the original variables $x_i$ and describes a different source of total variation:

$$X = T W^{T} + E$$

where X (n × m) is the data matrix, T (n × p) and W (m × p) are the matrixes of principal component scores and loadings, respectively, and E (n × m) is the residual matrix. Classical PCA can be performed by the eigenvalue decomposition of a correlation matrix. Robust PCA was performed by the eigenvalue decomposition of a correlation matrix converted from an estimated covariance matrix with the lowest possible determinant, computed using the minimum covariance determinant (MCD) algorithm [49,50]. The covariance matrix was computed using a subroutine (mcdcov) in MATLAB (see below). The MCD estimator is considered to be a highly robust estimator of multivariate location and scatter.
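The robust PCA procedure described above can be sketched in Python. This is only an illustration under stated assumptions: it uses scikit-learn's `MinCovDet` as a stand-in for the MATLAB `mcdcov` subroutine (the two implementations may differ in defaults and corrections), the function name `robust_pca` is hypothetical, and the data are synthetic rather than the wastewater measurements.

```python
import numpy as np
from sklearn.covariance import MinCovDet


def robust_pca(X, random_state=0):
    """Eigen-decompose a correlation matrix derived from an MCD covariance estimate."""
    X = np.asarray(X, dtype=float)
    mcd = MinCovDet(random_state=random_state).fit(X)
    cov = mcd.covariance_                  # robust scatter estimate
    loc = mcd.location_                    # robust location estimate
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)          # convert covariance to correlation
    eigvals, W = np.linalg.eigh(corr)      # symmetric eigen-decomposition
    order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
    eigvals, W = eigvals[order], W[:, order]
    Z = (X - loc) / sd                     # robustly standardized data (an assumption)
    scores = Z @ W                         # component scores T = Z W
    return eigvals, W, scores


# Synthetic stand-in for a 67 x 11 data matrix (not the paper's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(67, 11))
eigvals, W, scores = robust_pca(X_demo)
```

Because the decomposed matrix is a correlation matrix, the eigenvalues sum to the number of variables, which makes the eigenvalue-based weights easy to interpret as shares of total variability.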

Principal Component Weighted Index
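The principal component weighted index was defined consistently with the SAW model as

$$\mathrm{PCWI} = \sum_{k=1}^{q} u_k \, \mathrm{PC}_k$$

where $u_k$ stands for the weight of the k-th PC computed as

$$u_k = \frac{\lambda_k}{\sum_{j=1}^{q} \lambda_j}$$

and where $\lambda_k$ is the eigenvalue of the k-th PC and q is the number of selected principal components.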
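As a small worked illustration of this definition, the sketch below computes eigenvalue-based weights and the resulting index for two made-up samples. The function name `pcwi`, the eigenvalues, and the scores are hypothetical values chosen for easy arithmetic; they are not taken from the study.

```python
import numpy as np


def pcwi(scores, eigvals, q):
    """Sum of the first q PC scores weighted by their normalized eigenvalues."""
    scores = np.asarray(scores, dtype=float)[:, :q]
    lam = np.asarray(eigvals, dtype=float)[:q]
    u = lam / lam.sum()                    # weights u_k sum to 1 over the q selected PCs
    return scores @ u, u


# Hypothetical eigenvalues (sorted, six PCs) and scores for two samples.
eigvals_demo = np.array([4.0, 2.0, 1.5, 1.5, 1.0, 0.5])
scores_demo = np.array([[1.0, 0.0, 2.0, -1.0, 0.5, 9.0],
                        [-2.0, 1.0, 0.0, 0.0, 1.0, 9.0]])
index, weights = pcwi(scores_demo, eigvals_demo, q=5)
# weights are [0.4, 0.2, 0.15, 0.15, 0.1]; the sixth PC is ignored
```

Here the first sample gets 0.4·1 + 0.15·2 − 0.15·1 + 0.1·0.5 = 0.6 and the second −0.8 + 0.2 + 0.1 = −0.5.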

Statistical Calculations
The original data matrixes of 67 wastewater samples were set up and processed in MS Excel. Statistical calculations were performed using the software packages QC.Expert (Trilobyte, Pardubice, Czech Republic) and XLSTAT 2018 (Addinsoft, Boston, MA, USA). The data smoothing was performed by the fast Fourier transform (FFT) algorithm in the program OriginPro 9.0.0. (Origin Corporation, Northampton, MA, USA). The data were also standardized to avoid misclassifications arising from different orders of magnitude of the variables. For this purpose, the data were mean (µ) centered and scaled by standard deviations (σ) as (x − µ)/σ. The statistical calculations were performed at the α = 0.05 significance level.
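The two preprocessing steps mentioned here, standardization and FFT smoothing, can be sketched as follows. The sketch makes two assumptions that the text does not specify: the sample standard deviation (ddof = 1) is used for scaling, and smoothing is done by a simple hard cut-off of high-frequency Fourier coefficients, which may differ from the filter implemented in OriginPro. The function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise (x - mean) / std, using the sample standard deviation (assumed)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def fft_lowpass(y, keep):
    """Smooth a series by zeroing all but the first `keep` Fourier coefficients."""
    y = np.asarray(y, dtype=float)
    spec = np.fft.rfft(y)
    spec[keep:] = 0.0                      # discard high-frequency components
    return np.fft.irfft(spec, n=len(y))


# Synthetic series: a slow cycle plus a fast oscillation acting as noise.
t = np.arange(64)
slow = np.sin(2 * np.pi * 2 * t / 64)
series = slow + 0.5 * np.sin(2 * np.pi * 20 * t / 64)
smoothed = fft_lowpass(series, keep=5)     # recovers the slow cycle
```

The `keep` parameter controls how much detail survives; a smaller value gives a smoother curve at the risk of hiding short-term changes.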

Principal Component Analysis
Robust PCA was performed due to the non-normal distributions of the physico-chemical parameters characterizing the wastewater composition (see Tables 1 and 2). The covariance matrixes with the lowest possible determinant were computed using the MCD algorithm and then subjected to eigenvalue decomposition. Based on the PCA results, the wastewater samples were characterized by the first few PCs and relationships between the original parameters were discussed.
There is no universal rule for estimating the number of PCs. The first five principal components for both raw and treated wastewater were selected according to the magnitudes of the corresponding eigenvalues, which should be equal to or higher than 1 [51], and according to their scree plots [52]. The eigenvalue scree and cumulative variability plots are shown in Figure S1. The selected PCs explained 88% and 83% of the total variability of raw and treated wastewater, respectively. This agrees with another common rule that the cumulative proportion of explained variance should be at least 80% [53].
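The two selection rules quoted above (eigenvalue of at least 1 and cumulative explained variance of at least 80%) are easy to check numerically. The following sketch uses hypothetical eigenvalues for 11 standardized variables; they are illustrative only and are not the eigenvalues obtained in this study. The function name `select_components` is likewise an assumption.

```python
import numpy as np


def select_components(eigvals, min_eig=1.0, min_cum=0.80):
    """Return the PC counts suggested by the eigenvalue rule and the cumulative-variance rule."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(lam) / lam.sum()                  # cumulative proportion of variance
    n_eig = int(np.sum(lam >= min_eig))               # eigenvalue >= 1 rule
    n_cum = int(np.searchsorted(cum, min_cum) + 1)    # first count reaching min_cum
    return n_eig, n_cum, cum


# Hypothetical eigenvalues of an 11 x 11 correlation matrix (they sum to 11).
lam_demo = [4.3, 2.0, 1.3, 1.1, 1.05, 0.5, 0.3, 0.2, 0.15, 0.06, 0.04]
n_eig, n_cum, cum = select_components(lam_demo)
```

With these made-up values both rules point to five components. In practice the two rules can disagree, and the scree plot is then used as a further guide.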

Interpretation of Selected Principal Components
PCA often includes the interpretation of PCs, which is necessary to understand the data structure. The component loadings summarized in Tables 3 and 4 can explain relationships among the original variables (parameters). In the case of the raw wastewater, the 1st principal component (PC1) was saturated mainly by ammonium, TN, BOD, COD, phosphate, and TP. All these parameters characterize organic and inorganic compounds occurring in municipal wastewater. The 2nd principal component (PC2) was affected by nitrate and nitrite resulting from nitrification processes in raw wastewater. The 3rd principal component (PC3) was affected mostly by TSS and TDS, the 4th principal component (PC4) by BOD and TDS, and the 5th component (PC5) by pH.
In the case of the treated wastewater, PC1 was mainly affected by nitrate, phosphate, and also by TN and TP, that is, by nutrients which went through the aerobic part of an activation tank. PC2 was affected by BOD and COD, characterizing the content of organic compounds which were persistent to the treatment process. PC3 was mainly saturated by ammonium and pH, indicating the presence of un-oxidized ammonium under acidic conditions during the nitrification process as follows:

$$\mathrm{NH_4^+ + 2\,O_2 \rightarrow NO_3^- + 2\,H^+ + H_2O}$$

PC4 and PC5 were saturated by TDS (PC4), nitrate, and TSS (PC5), which were of low concentrations and thus contributed to the less important PCs.

Principal Component Weighted Index
Because the scores of individual PCs have different variability depending on their eigenvalues, both PCWIs were composed of the five weighted PCs and plotted in Figure 1. Since the samples were taken approximately monthly, the vertical axis (Sample) also represents the time axis. The plots were also smoothed by the FFT procedure (red and blue curves) to show the temporal changes in wastewater quality clearly. Approximately six-month cycles were observed and also confirmed by the time series analysis performed using the seasonal autoregressive integrated moving average model (SARIMA). The PCWI values corresponding to the raw wastewater increased slowly over time. During the first 48 months they were lower than those of treated wastewater and then became higher. The reason is that new buildings were connected to the local sewage system and wastewater pollution increased. The PCWI of treated wastewater increased continually up to the 45th month and then changed, similar to the case of the raw wastewater. The similarities between both PCWI values are discussed below.
Water 2019, 11, x FOR PEER REVIEW 5 of 13


Validation of PCWI
The normal distributions of the raw and treated wastewater PCWIs were confirmed by several common statistical tests, such as the Kolmogorov-Smirnov test (p = 0.523 and p = 0.741, respectively), the Shapiro-Wilk test (p = 0.506 and p = 0.915), the Anderson-Darling test (p = 0.415 and p = 0.863), and the Jarque-Bera test (p = 0.730 and p = 0.811). Their normality was also documented by their skewness of 0.153 and 0.107 and kurtosis of 2.64 and 2.64 for the raw and treated wastewater, respectively. Their standard deviations of 2.992 and 1.437 for raw and treated wastewaters, respectively, were consistent with the summary statistics given in Tables 1 and 2.
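A check of this kind can be sketched with SciPy as shown below. Several points are assumptions: the Kolmogorov-Smirnov test is applied to values standardized with the sample mean and standard deviation (so its p-value is only approximate), kurtosis is reported in the non-excess (Pearson) form in which a normal distribution gives 3, and the input is synthetic. The software used in the study may apply different corrections, and the Anderson-Darling test is omitted here for brevity.

```python
import numpy as np
from scipy import stats


def normality_pvalues(x):
    """p-values of three common normality tests for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)     # standardize before the K-S test
    return {
        "shapiro_wilk": stats.shapiro(x)[1],
        "jarque_bera": stats.jarque_bera(x)[1],
        "kolmogorov_smirnov": stats.kstest(z, "norm")[1],
    }


def shape_statistics(x):
    """Skewness and non-excess (Pearson) kurtosis of a sample."""
    x = np.asarray(x, dtype=float)
    return stats.skew(x), stats.kurtosis(x, fisher=False)


# Synthetic index values standing in for a PCWI series of 67 samples.
rng = np.random.default_rng(0)
pvals = normality_pvalues(rng.normal(size=67))
```

A p-value above the chosen significance level (0.05 in this paper) means that normality is not rejected by that test.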
Although the normality was confirmed, the PCWIs were further analyzed by means of Gaussian mixture modelling using the iterative EM algorithm [54]. The number of mixtures (groups) was determined according to the Bayesian information criterion, the Akaike information criterion, the integrated complete likelihood, and the normalized entropy criterion.
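Choosing the number of mixture components by information criteria can be sketched with scikit-learn's `GaussianMixture`, which fits the model by EM. Only BIC and AIC are shown, because scikit-learn does not provide the integrated complete likelihood or the normalized entropy criterion directly. The function name `select_n_groups` and the synthetic two-group data are assumptions for illustration, not the study's PCWI values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_n_groups(values, max_groups=4, random_state=0):
    """Fit 1..max_groups Gaussian mixtures by EM and compare them by BIC and AIC."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    bic, aic = {}, {}
    for k in range(1, max_groups + 1):
        gm = GaussianMixture(n_components=k, n_init=5,
                             random_state=random_state).fit(x)
        bic[k], aic[k] = gm.bic(x), gm.aic(x)
    best_k = min(bic, key=bic.get)         # lowest BIC wins
    return best_k, bic, aic


# Synthetic index values drawn from two well-separated groups.
rng = np.random.default_rng(1)
demo = np.concatenate([rng.normal(-5, 1, 40), rng.normal(5, 1, 40)])
best_k, bic, aic = select_n_groups(demo)
```

Lower BIC or AIC indicates a better trade-off between fit and model complexity; when the criteria disagree, the choice remains a judgment call.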

Figure 2
The similarity between the PCWI values of raw and treated wastewater was also confirmed by Pearson's and Spearman's correlation coefficients of 0.761 and 0.743, respectively, indicating their significant correlation demonstrated in Figure 3. The monitored BWWTP treated municipal wastewater coming from households and non-industrial institutions, which is why no unexpected pollution was supposed. However, five samples (17, 22, 35, 51, and 62) lying outside the confidence ellipse indicated some deviations from the steady-state treatment process. For example, sample 35 was characterized by high PCWIs corresponding to both raw and treated wastewater. This demonstrates a situation when the raw wastewater was treated very effectively in terms of BOD and COD but not in terms of TN and TDS. The concentrations of nitrite and TDS were too high in the raw wastewater and the concentrations of ammonium, TN, BOD, and COD were too high in the treated wastewater, likely due to some problems in the treatment technology. The physico-chemical parameters of these outlying samples are summarized in Table S1. Individual parameters were assessed by means of the Box and Whisker plots.
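The correlation analysis and the confidence-ellipse check can be sketched as follows. The ellipse is assumed here to be the 95% ellipse of a bivariate normal distribution, so a point lies outside it when its squared Mahalanobis distance exceeds the corresponding chi-square quantile with two degrees of freedom. The confidence level actually used for Figure 3 is not stated, and the function names and data are illustrative.

```python
import numpy as np
from scipy import stats


def index_agreement(x, y):
    """Pearson and Spearman correlation coefficients between two index series."""
    r = stats.pearsonr(x, y)[0]
    rho = stats.spearmanr(x, y)[0]
    return r, rho


def outside_confidence_ellipse(x, y, level=0.95):
    """Indices of points whose squared Mahalanobis distance exceeds the chi-square limit."""
    pts = np.column_stack([x, y]).astype(float)
    diff = pts - pts.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pts, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.flatnonzero(d2 > stats.chi2.ppf(level, df=2))


# Synthetic example: 30 points on a line plus one clearly deviating sample.
x_demo = np.append(np.linspace(-1, 1, 30), 0.0)
y_demo = np.append(np.linspace(-1, 1, 30), 5.0)
flagged = outside_confidence_ellipse(x_demo, y_demo)
```

Because the mean and covariance are estimated from all points, strong outliers inflate the ellipse; a robust covariance estimate could be substituted if that is a concern.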
There is no "gold" standard composite index which could serve for the PCWI verification. Therefore, the PCWIs were verified based on their relationships with the individual physico-chemical parameters. One example concerning COD is shown in Figure 4. COD is the common parameter used for characterization of the total content of organic and inorganic compounds which can be oxidized by potassium dichromate. Since the COD values in raw and treated wastewaters are very different, their standardized values were plotted in one graph. The temporal changes of the standardized CODs were similar to those of the PCWI values. The six-month periods, as well as the increase in the raw wastewater values, were observed. The probable reason was already mentioned in the case of the PCWI.
The similarities between the PCWIs and the original parameters were also documented by their correlation coefficients summarized in Tables S2 and S3. In the case of the raw wastewater, the PCWI significantly correlated with the parameters except nitrite, pH, and TDS. Nitrite was of low concentration, the pH changed very little, and the TDS changed independently of all the parameters except BOD. In the case of the treated wastewater, the correlations were also insignificant for the parameters occurring in low concentrations, such as ammonium, BOD, nitrite, and TSS. The low concentrations of BOD corresponded to the high treatment efficiency of 95% declared by the BWWTP designer.

Comparison of PCWI with WQI
The validation of PCWI was also performed by comparing it with the commonly used water quality index (WQI), which is defined as follows

$$\mathrm{WQI} = \frac{\sum_{i} C_i P_i}{\sum_{i} P_i}$$

where $C_i$ and $P_i$ are the normalized values and relative weights assigned to each parameter i. The normalization was performed by dividing the values of each parameter by its maximum. The relative weights ranged from 1 to 4 according to their importance for an aquatic system, which means that they are subjective: $P_i$ = 4 for TSS; $P_i$ = 3 for NH₄⁺, BOD, and COD; $P_i$ = 2 for TDS, TN, NO₃⁻, and NO₂⁻; $P_i$ = 1 for TP, PO₄³⁻, and pH. They were adopted from several papers dealing with assessment of surface waters [23,27,30]. Figure 5 shows the correlation of PCWI and WQI, indicating a good agreement between both indexes. It should be emphasized that, unlike WQI, PCWI works with objective weights computed for the particular water composition. These relative weights were also used for computing the WQI of the treated wastewater, but the correlation was very weak. The weights were likely not appropriate for this type of water, but more suitable ones were not found in the literature.
The normal distribution of WQI was also confirmed by the Kolmogorov-Smirnov test (p = 0.607), Shapiro-Wilk test (p = 0.266), Anderson-Darling test (p = 0.372), and Jarque-Bera test (p = 0.694). The normality was also documented by a skewness of 0.212 and a kurtosis of 2.71. The sensitivity of PCWI and WQI in detecting outlying observations was compared using the Grubbs test. The Z-scores of PCWI and WQI were plotted for all samples, and those above +2 or below −2 were detected as outliers. Figure 6 shows that five samples (47, 51, 61, 62, and 64) were detected using the PCWI Z-scores, whereas only two samples (35 and 51) were identified using the WQI Z-scores. These results demonstrate that PCWI is more sensitive for the identification of anomalies in water composition.
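The WQI calculation and the Z-score screening described in this section can be sketched as below. The sketch assumes that the WQI is the weighted mean of the max-normalized values, Σ C_i P_i / Σ P_i, and that outliers are simply the samples with |Z| > 2, as the text describes; the function names and the tiny input arrays are hypothetical.

```python
import numpy as np


def wqi(values, weights):
    """Weighted mean of max-normalized parameter values (rows: samples, columns: parameters)."""
    values = np.asarray(values, dtype=float)
    P = np.asarray(weights, dtype=float)
    C = values / values.max(axis=0)        # normalize each parameter by its maximum
    return (C * P).sum(axis=1) / P.sum()


def zscore_outliers(index, threshold=2.0):
    """Indices of samples whose Z-score lies above +threshold or below -threshold."""
    index = np.asarray(index, dtype=float)
    z = (index - index.mean()) / index.std(ddof=1)
    return np.flatnonzero(np.abs(z) > threshold), z


# Two samples, two parameters with weights 3 and 1 (made-up numbers).
wqi_demo = wqi([[10.0, 4.0], [5.0, 8.0]], weights=[3, 1])

# Ten index values, one of which is clearly deviating.
outliers, z = zscore_outliers([0.0] * 9 + [10.0])
```

For the two made-up samples the normalized values are (1, 0.5) and (0.5, 1), giving WQI values of 0.875 and 0.625; in the second example only the last sample exceeds the |Z| > 2 limit.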

Examples of Possible PCWI Applications
As already mentioned, the PCWI has the potential to describe wastewater quality as a function of time. The first example of a possible application concerns the evaluation of seasonal raw wastewater quality in various years, as displayed in Figure 7. The results in Table 5 indicate that the wastewater quality in 1997 and 2001 was different (p = 0.031). This was confirmed by the median concentrations of ammonium, TN, and COD (see Table S4).
The second example concerns the evaluation of the PCWI values according to warning and control limits in analogy to the Shewhart control charts [55,56]. Upper (UWL) and lower (LWL) warning limits and upper (UCL) and lower (LCL) control limits were computed as µ ± 2σ and µ ± 3σ, respectively (see Table 6). A majority of applications of weighted indexes use linear scales such as 0-25 excellent, 26-50 good, 51-75 poor, 76-100 very poor, >100 unsuitable [23,24,57]. The suggested ranks were based on the limits specific to this study. The numbers N of samples in the individual ranks are listed in Table 6. The PCWIs between UCL and UWL (rank I) as well as between LWL and LCL (rank IV) signify significant deviations from the steady-state treatment process: five samples (7.5%) of raw wastewater and four samples (6.0%) of treated wastewater. Such PCWI ranking could be useful for operators to simply check raw and treated wastewater quality and to control working conditions at BWWTPs.
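The warning and control limits can be computed and applied in a few lines. In the sketch below the limits are µ ± 2σ and µ ± 3σ as stated in the text, σ is assumed to be the sample standard deviation, and the zones are labelled descriptively rather than by rank number because the full rank definitions are given only in Table 6. The function names and input values are hypothetical.

```python
import numpy as np


def control_limits(index):
    """Shewhart-style limits: warning at mean +/- 2 sd, control at mean +/- 3 sd."""
    index = np.asarray(index, dtype=float)
    mu, sigma = index.mean(), index.std(ddof=1)
    return {"LCL": mu - 3 * sigma, "LWL": mu - 2 * sigma,
            "UWL": mu + 2 * sigma, "UCL": mu + 3 * sigma}


def count_by_zone(index, limits):
    """Counts of samples in the five zones delimited by LCL, LWL, UWL, and UCL."""
    edges = [limits["LCL"], limits["LWL"], limits["UWL"], limits["UCL"]]
    zone = np.digitize(np.asarray(index, dtype=float), edges)
    # zone 0: below LCL, 1: LCL..LWL, 2: LWL..UWL, 3: UWL..UCL, 4: above UCL
    return np.bincount(zone, minlength=5)


# Ten made-up index values with one elevated sample.
values = [0.0] * 9 + [10.0]
limits = control_limits(values)
counts = count_by_zone(values, limits)
```

In this example nine samples fall between the warning limits and one falls between the upper warning and upper control limits, which would correspond to a deviation from steady-state operation worth checking.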
In general, composite indexes are not expected to be universally valid or able to describe reality in detail because one parameter cannot substitute for a variety of variables [17,21]. Despite this, the PCWI can be employed as a useful indicator, providing overall information about water quality depending on the type of wastewater and temporal (seasonal) effects.

Conclusions
The wastewater quality before and after treatment was characterized by the principal component weighted index constructed as the sum of weighted PC scores. The robust PCA of the 67 raw and treated wastewater samples extracted five PCs explaining 88% and 83% of the total data variability, respectively. Based on the PC loadings, the relationships among the original parameters were discussed. The PCWI plots were constructed to show the temporal water quality changes. Six-month PCWI cycles were identified. Using Gaussian mixture modelling, the PCWI values were separated into two groups of samples in agreement with the PCWI temporal plots. The PCWI scatter plot identified the samples that deviated from the steady-state treatment. PCWI and WQI computed for the raw wastewater were compared and found to be in good agreement.
The possible application of PCWI for raw wastewater quality monitoring was demonstrated by the evaluation of wastewater quality during 1997-2001 using a non-parametric Kruskal-Wallis test. The years 1997 and 2001 were found to be different, which was explained by comparing the median concentrations of ammonium, BOD, and COD. The PCWI application in analogy with the Shewhart warning and control limits was also demonstrated. The PCWI was found to be useful for the overall characterization of wastewater quality, especially from the temporal point of view.