# Information Entropy for Evaluation of Wastewater Composition

## Abstract


## 1. Introduction

L. Boltzmann (1844–1906) related the entropy $S$ of a system to the probabilities $p_i$ of its states and derived the relationship $S=-kn\sum _{i=1}^{n}{p}_{i}\,\mathrm{ln}\,{p}_{i}$, which inspired C. E. Shannon's (1916–2001) concept of information entropy [1,2].

## 2. Materials and Methods

#### 2.1. Sample Collection and Analysis

#### 2.2. Entropy Calculation

The entropy $H_j$ of each variable $x_j$ (the number of variables is $m$) describing $n$ observations can be defined by Shannon's relationship [1] as

$${H}_{j}=-\sum _{i=1}^{n}{p}_{i,j}\,\mathrm{ln}\,{p}_{i,j} \qquad (1)$$

where $p_{i,j}$ is the probability of occurrence of $x_j$; it holds $\sum _{i=1}^{n}{p}_{i,j}=1$. The maximal entropy is defined as ${H}_{j,\mathrm{max}}=\mathrm{ln}\,n$. The probabilities $p_{i,j}$ can be approximated with relative frequencies $f_{i,j}$ calculated using histograms with $N$ intervals as follows

$${H}_{j}=-\sum _{i=1}^{N}{f}_{i,j}\,\mathrm{ln}\,{f}_{i,j} \qquad (2)$$

The entropy weighted index (EWI) of sample $i$ was calculated as

$$\mathrm{EWI}_{i}=\sum _{j=1}^{m}{w}_{j}\,\frac{{x}_{i,j}}{{\mu }_{j}} \qquad (3)$$

where $\mu_j$ is the mean of parameter $x_j$ calculated from $n$ samples and $w_j$ is the entropy weight; it holds $\sum _{j=1}^{m}{w}_{j}=1$. The ratio $\frac{{x}_{i,j}}{{\mu }_{j}}$ compensates for the different scales and units of the parameters and can be considered a relative concentration. The entropy weights were calculated as

$${w}_{j}=\frac{1-{H}_{j}/{H}_{j,\mathrm{max}}}{\sum _{j=1}^{m}\left(1-{H}_{j}/{H}_{j,\mathrm{max}}\right)} \qquad (4)$$
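
The entropy, entropy-weight and EWI calculations described above can be sketched in Python. This is an illustrative implementation, not the author's original code; the function name and the histogram resolution `n_bins` are assumptions.

```python
import numpy as np

def entropy_weights_ewi(X, n_bins=10):
    """Histogram-based Shannon entropies, entropy weights and EWI.

    X      : (n, m) array, n observations of m wastewater parameters.
    n_bins : assumed histogram resolution (the N intervals in the text).
    """
    n, m = X.shape
    H = np.empty(m)
    for j in range(m):
        # relative frequencies f_ij approximate the probabilities p_ij
        counts, _ = np.histogram(X[:, j], bins=n_bins)
        f = counts[counts > 0] / n        # drop empty bins: 0*ln(0) -> 0
        H[j] = -np.sum(f * np.log(f))
    H_max = np.log(n)                     # maximal entropy ln(n)
    redundancy = 1.0 - H / H_max          # relative redundancy
    w = redundancy / redundancy.sum()     # entropy weights, sum to 1
    # EWI_i = sum_j w_j * x_ij / mu_j (weighted relative concentrations)
    ewi = (X / X.mean(axis=0)) @ w
    return H, w, ewi
```

Dividing each column by its mean before the weighted sum reproduces the scale compensation described in the text, so parameters with very different units contribute comparably.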

#### 2.3. Principal Component Analysis

Each principal component is a linear combination of the original variables and describes a different source of the total variation. The PCA model can be written as

$$\mathbf{X}=\mathbf{T}{\mathbf{W}}^{T}+\mathbf{E}$$

where **X** ($n \times m$) is the data matrix, **T** ($n \times p$) and **W** ($m \times p$) are the matrices of principal component scores and loadings, respectively, and **E** ($n \times m$) is the residual matrix representing noise. Classical PCA can be performed by eigenvalue decomposition of a correlation matrix or by singular value decomposition (SVD) of the original data matrix [35,36]. RPCA was performed by eigenvalue decomposition of an estimated correlation matrix with the lowest possible determinant, computed using the minimum covariance determinant (MCD) algorithm [37,38,39]. It was computed using the subroutine `mcdcov` in MATLAB (see below).
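
A minimal sketch of both decompositions, assuming scikit-learn's `MinCovDet` as a stand-in for the MATLAB `mcdcov` subroutine used in the paper; function names and the autoscaling step are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def classical_pca(X, p):
    # autoscale, then SVD: Z = U S Vt; scores T = U S, loadings W = V
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :p] * s[:p]          # scores (n x p)
    W = Vt[:p].T                  # loadings (m x p)
    E = Z - T @ W.T               # residual matrix (noise)
    return T, W, E

def robust_pca(X, p, random_state=0):
    # eigendecomposition of an MCD-estimated correlation matrix
    mcd = MinCovDet(random_state=random_state).fit(X)
    d = np.sqrt(np.diag(mcd.covariance_))
    R = mcd.covariance_ / np.outer(d, d)   # robust correlation matrix
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1]        # sort by decreasing eigenvalue
    return evals[order][:p], evecs[:, order][:, :p]
```

Converting the MCD covariance to a correlation matrix before the eigendecomposition mirrors the correlation-based classical variant, so outliers influence neither the scale nor the orientation of the components.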

#### 2.4. Mahalanobis Distance

The Mahalanobis distance of the variable $x_i$ can be computed as

$$M{D}_{i}=\sqrt{\left(\mathbf{x}-\boldsymbol{\mu }\right){\mathbf{C}}^{-1}{\left(\mathbf{x}-\boldsymbol{\mu }\right)}^{T}}$$

where **x** is the row vector of variable $x_i$, $\boldsymbol{\mu}$ is the row vector of means and **C** is the covariance matrix. The robust Mahalanobis distance of the variable $x_i$ can be computed as

$$RM{D}_{i}=\sqrt{\left(\mathbf{x}-{\boldsymbol{\mu }}_{M}\right){\boldsymbol{\Sigma }}^{-1}{\left(\mathbf{x}-{\boldsymbol{\mu }}_{M}\right)}^{T}}$$

where **x** is the row vector of variable $x_i$, ${\boldsymbol{\mu }}_{M}$ is the MCD estimate of location and $\boldsymbol{\Sigma}$ is the MCD-estimated covariance matrix. The MCD estimator is considered a highly robust estimator of multivariate location and scatter.
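
Both distances can be sketched with scikit-learn, whose covariance estimators expose squared Mahalanobis distances directly; this is an illustrative substitute for the MATLAB routines used in the paper, and the function name is an assumption.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

def mahalanobis_distances(X):
    """Classical (MD) and robust, MCD-based (RMD) distances per sample.

    .mahalanobis() returns squared distances, hence the square roots.
    """
    md = np.sqrt(EmpiricalCovariance().fit(X).mahalanobis(X))
    rmd = np.sqrt(MinCovDet(random_state=0).fit(X).mahalanobis(X))
    return md, rmd
```

Because the MCD location and scatter are fitted on the most concentrated subset of the data, outliers inflate RMD much more visibly than the classical MD, which they partially mask.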

#### 2.5. Statistic Calculations

## 3. Results and Discussion

#### 3.1. Entropy and Entropy Weights of Wastewater Parameters

The parameters $x_j$ were scaled and centred to obtain the transformed parameters $y_j$, which were then used to approximate their density functions $p_{i,j}$ by the relative frequencies $f(y_{i,j})$ in Equation (2). Two examples of histograms, with the highest (PO₄³⁻) and lowest (CN⁻) entropy, are shown in Figure 1.

The entropy values decreased in the order PO₄³⁻ > NH₄⁺ > TDS > TN > pH > BOD > COD > TSS > TP > phenol > CN⁻. Based on exploratory analysis, for example the P–P plot shown in Figure S1 (Supplementary Materials), the parameters were separated into two groups: the first group contained the parameters with higher entropy, such as PO₄³⁻, NH₄⁺, TDS, TN, pH, BOD and COD, and the second one consisted of TSS, TP, phenol and CN⁻ with lower entropy. It is evident that entropy decreased with increasing kurtosis and skewness. High values of kurtosis and skewness are typical of variables that vary within narrow intervals and occur at low magnitudes, so that their distributions are heavily tailed. This is the case for the parameters in the second group.
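
The reported link between tail weight and entropy can be illustrated with a small simulation; the distributions, sample size and bin count below are assumptions chosen only for illustration.

```python
import numpy as np

def hist_entropy(x, bins=10):
    # Shannon entropy of the relative bin frequencies of x
    counts, _ = np.histogram(x, bins=bins)
    f = counts[counts > 0] / len(x)
    return -np.sum(f * np.log(f))

def excess_kurtosis(x):
    # fourth standardized moment minus 3
    z = x - x.mean()
    return (z**4).mean() / (z**2).mean()**2 - 3.0

rng = np.random.default_rng(1)
flat = rng.uniform(size=1000)               # low kurtosis, spread-out histogram
tailed = rng.lognormal(sigma=2, size=1000)  # heavy-tailed, peaked histogram
```

The heavy-tailed sample concentrates almost all observations in one histogram bin, so its entropy is far below that of the flat sample, matching the behaviour of the second parameter group.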

#### 3.2. Entropy Weighted Index

The quantity $1-{H}_{j}/{H}_{j,\mathrm{max}}$ is called the relative redundancy and can be interpreted as a degree of diversification of the information provided [15,16,18,25,30,44]. In information theory, the entropy weights represent the useful information carried by the variables (parameters): the higher the entropy weight, the more useful information the parameter provides, and vice versa.

#### 3.3. Statistical Analysis of EWI Data

#### 3.4. Verification of EWI

where $v_k$ stands for the weight of the $k$-th principal component, calculated as

$${v}_{k}=\frac{{\lambda }_{k}}{\sum _{k=1}^{q}{\lambda }_{k}}$$

where $\lambda_k$ is the eigenvalue of the $k$-th PC and $q$ is the number of selected principal components. The objectivity of PCWI rests on two facts: (i) principal components are orthogonal and thus independent, which is consistent with SAW theory, and (ii) the weights of the principal components correspond to their eigenvalues, which express their importance. When all 11 principal components were used ($q$ = 11), their weights were equal to their variabilities. The scree and cumulative plots are shown in Figure 4.
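
A minimal sketch of eigenvalue-proportional weighting, assuming a SAW-style weighted sum of PC scores; the exact PCWI aggregation is given in ref. [32] and may differ, and the function name is an assumption.

```python
import numpy as np

def pcwi(X, q):
    """Principal-component-weighted index: eigenvalue-weighted PC scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale
    evals, evecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(evals)[::-1]                    # decreasing eigenvalues
    lam, V = evals[order][:q], evecs[:, order][:, :q]
    v = lam / lam.sum()        # weights v_k proportional to eigenvalues
    scores = Z @ V             # PC scores T
    return scores @ v          # SAW aggregation of the q score columns
```

With $q$ equal to the number of variables, the weights reduce to the explained-variance fractions, matching the statement that weights equal the component variabilities.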

## 4. Conclusions

The entropy values of the monitored wastewater parameters decreased in the order PO₄³⁻ > NH₄⁺ > TDS > TN > pH > BOD > COD > TSS > TP > phenol > CN⁻. According to the entropy values, the parameters were separated into two groups: (i) phosphate, ammonium, TDS, TN, pH, BOD and COD and (ii) TSS, TP, phenol and cyanide. The parameters from the first group should be monitored frequently because of their higher uncertainty, that is, their larger temporal changes.

## Supplementary Materials

## Funding

## Conflicts of Interest

## References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
2. Dang, T.K.L.; Meckbach, C.; Tacke, R.; Waack, S.; Gültas, M. A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen–Shannon Divergence. Entropy **2016**, 18, 13.
3. Maruyama, T.; Kawachi, T.; Singh, V.P. Entropy-based assessment and clustering of potential water resources availability. J. Hydrol. **2005**, 309, 104–113.
4. Tsallis, C. Approach of Complexity in Nature: Entropic Nonuniqueness. Axioms **2016**, 5, 20.
5. Chamberlin, R. The Big World of Nanothermodynamics. Entropy **2014**, 17, 52–73.
6. Barigye, S.J.; Marrero-Ponce, Y.; Perez-Gimenez, F.; Bonchev, D. Trends in information theory-based chemical structure codification. Mol. Divers. **2014**, 18, 673–686.
7. Eckschlager, K.; Štěpánek, V.; Danzer, K. A review of information theory in analytical chemometrics. J. Chemom. **1990**, 4, 195–216.
8. Farhadinia, B. Information measures for hesitant fuzzy sets and interval-valued hesitant fuzzy sets. Inf. Sci. **2013**, 240, 129–144.
9. Gültas, M.; Haubrock, M.; Tüysüz, N.; Waack, S. Coupled mutation finder: A new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinform. **2012**, 13, 225.
10. Martin, L.C.; Gloor, G.B.; Dunn, S.D.; Wahl, L.M. Using information theory to search for co-evolving residues in proteins. Bioinformatics **2005**, 21, 4116–4124.
11. Guido, R.C. A tutorial review on entropy-based handcrafted feature extraction for information fusion. Inf. Fusion **2018**, 41, 161–175.
12. Oguz, I.; Cates, J.; Datar, M.; Paniagua, B.; Fletcher, T.; Vachet, C.; Styner, M.; Whitaker, R. Entropy-based particle correspondence for shape populations. Int. J. Comput. Assist. Radiol. Surg. **2016**, 11, 1221–1232.
13. Ijadi Maghsoodi, A.; Abouhamzeh, G.; Khalilzadeh, M.; Zavadskas, E.K. Ranking and selecting the best performance appraisal method using the MULTIMOORA approach integrated Shannon's entropy. Front. Bus. Res. China **2018**, 12.
14. Tsallis, C. Economics and Finance: Q-Statistical Stylized Features Galore. Entropy **2017**, 19, 457.
15. Wang, Q.; Wu, C.; Sun, Y. Evaluating corporate social responsibility of airlines using entropy weight and grey relation analysis. J. Air Transp. Manag. **2015**, 42, 55–62.
16. Shemshadi, A.; Shirazi, H.; Toreihi, M.; Tarokh, M.J. A fuzzy VIKOR method for supplier selection based on entropy measure for objective weighting. Expert Syst. Appl. **2011**, 38, 12160–12167.
17. Zhang, Y.; Yang, Z.; Li, W. Analyses of urban ecosystem based on information entropy. Ecol. Model. **2006**, 197, 1–12.
18. Delgado, A.; Romero, I. Environmental conflict analysis using an integrated grey clustering and entropy-weight method: A case study of a mining project in Peru. Environ. Model. Softw. **2016**, 77, 108–121.
19. Kawachi, T.; Maruyama, T.; Singh, V.P. Rainfall entropy for delineation of water resources zones in Japan. J. Hydrol. **2001**, 246, 36–44.
20. Singh, V.P. Entropy Theory and its Application in Environmental and Water Engineering; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2013; p. 642.
21. Frank, B.; Pompe, B.; Schneider, U.; Hoyer, D. Permutation entropy improves fetal behavioural state classification based on heart rate analysis from biomagnetic recordings in near term fetuses. Med. Biol. Eng. Comput. **2006**, 44, 179–187.
22. Güçlü, B. Maximizing the entropy of histogram bar heights to explore neural activity: A simulation study on auditory and tactile fibers. Acta Neurobiol. Exp. **2005**, 65, 399–407.
23. Hasson, U. The neurobiology of uncertainty: Implications for statistical learning. Philos. Trans. R. Soc. Lond. B Biol. Sci. **2017**, 372.
24. Grupp, H. The concept of entropy in scientometrics and innovation research. Scientometrics **1990**, 18, 219–239.
25. Zhang, Y.; Qian, Y.; Huang, Y.; Guo, Y.; Zhang, G.; Lu, J. An entropy-based indicator system for measuring the potential of patents in technological innovation: Rejecting moderation. Scientometrics **2017**, 111, 1925–1946.
26. Islam, A.; Ahmed, N.; Bodrud-Doza, M.; Chu, R. Characterizing groundwater quality ranks for drinking purposes in Sylhet district, Bangladesh, using entropy method, spatial autocorrelation index, and geostatistics. Environ. Sci. Pollut. Res. Int. **2017**, 24, 26350–26374.
27. Gorgij, A.D.; Kisi, O.; Moghaddam, A.A.; Taghipour, A. Groundwater quality ranking for drinking purposes, using the entropy method and the spatial autocorrelation index. Environ. Earth Sci. **2017**, 76.
28. Shyu, G.S.; Cheng, B.Y.; Chiang, C.T.; Yao, P.H.; Chang, T.K. Applying factor analysis combined with kriging and information entropy theory for mapping and evaluating the stability of groundwater quality variation in Taiwan. Int. J. Environ. Res. Public Health **2011**, 8, 1084–1109.
29. An, Y.; Zou, Z.; Li, R. Water quality assessment in the Harbin reach of the Songhuajiang River (China) based on a fuzzy rough set and an attribute recognition theoretical model. Int. J. Environ. Res. Public Health **2014**, 11, 3507–3520.
30. Wu, J.; Li, P.; Qian, H.; Chen, J. On the sensitivity of entropy weight to sample statistics in assessing water quality: Statistical analysis based on large stochastic samples. Environ. Earth Sci. **2015**, 74, 2185–2195.
31. Wu, J.; Peiyue, L.; Hui, Q. Groundwater Quality in Jingyuan County, a Semi-Humid Area in Northwest China. E J. Chem. **2011**, 8.
32. Praus, P. Principal Component Weighted Index for Wastewater Quality Monitoring. Water **2019**, 11, 2376.
33. Hwang, C.L.; Yoon, K. Multiple Attribute Decision Making; Springer: Berlin/Heidelberg, Germany, 1981; p. 269.
34. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. **1987**, 2, 37–52.
35. Praus, P. Water quality assessment using SVD-based principal component analysis of hydrological data. Water SA **2005**, 31, 417–422.
36. Praus, P. SVD-based principal component analysis of geochemical data. Cent. Eur. J. Chem. **2005**, 3, 731–741.
37. Hubert, M.; Debruyne, M. Minimum covariance determinant. Comput. Stat. **2009**, 2, 8.
38. Rousseeuw, P.J.; Driessen, K.V. A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics **1999**, 41, 212–223.
39. Hubert, M.; Rousseeuw, P.J.; Vanden Branden, K. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics **2005**, 47, 64–79.
40. Verboven, S.; Hubert, M. LIBRA: A MATLAB library for robust analysis. Chemom. Intell. Lab. Syst. **2005**, 75, 127–136.
41. Yeoman, S.; Stephenson, T.; Lester, J.N.; Perry, R. The removal of phosphorus during wastewater treatment: A review. Environ. Pollut. **1988**, 49, 183–233.
42. Yu, P.; Liu, S.; Zhang, L.; Li, Q.; Zhou, D. Selecting the minimum data set and quantitative soil quality indexing of alkaline soils under different land uses in northeastern China. Sci. Total Environ. **2018**, 616, 564–571.
43. Li, P.; Zhang, T.; Wang, X.; Yu, D. Development of biological soil quality indicator system for subtropical China. Soil Tillage Res. **2013**, 126, 112–118.
44. Lotfi, F.H.; Fallahnejad, R. Imprecise Shannon's Entropy and Multi Attribute Decision Making. Entropy **2010**, 12, 53–62.

**Figure 2.** Entropy weighted index (EWI) plot of the wastewaters during the year. The labels identify the samples.

**Figure 3.** Chemical oxygen demand (COD) plot of the wastewaters during the year. The labels identify the samples.

| Param. | Mean | Median | St. Dev. | MAD | Min. | Max. | Skew. | Kurt. |
|---|---|---|---|---|---|---|---|---|
| NH₄⁺ | 35.5 | 36.1 | 10.7 | 8.90 | 5.56 | 68.9 | −0.033 | 3.42 |
| BOD | 194 | 191 | 70.6 | 54.9 | 25.7 | 625 | 1.25 | 8.79 |
| COD | 387 | 381 | 144 | 113 | 80.1 | 1350 | 2.14 | 14.2 |
| Phenol | 0.18 | 0.16 | 0.14 | 0.059 | 0.02 | 1.57 | 4.93 | 40.1 |
| PO₄³⁻ | 5.22 | 5.13 | 2.63 | 3.32 | 0.526 | 12.1 | 0.044 | 1.97 |
| CN⁻ | 0.174 | 0.146 | 0.165 | 0.0786 | 0.016 | 2.03 | 6.40 | 63.2 |
| TN | 40.0 | 40.8 | 9.30 | 7.41 | 13.2 | 90 | 0.050 | 5.52 |
| TSS | 280 | 256 | 141 | 94.9 | 44 | 1665 | 3.61 | 31.0 |
| TP | 6.26 | 6.25 | 2.54 | 1.39 | 1.40 | 34.7 | 5.06 | 52.4 |
| pH | 7.74 | 7.75 | 0.188 | 0.178 | 6.89 | 8.22 | −0.623 | 5.21 |
| TDS | 706 | 730 | 129 | 91.9 | 276 | 1088 | −0.856 | 4.32 |

| Parameter | Entropy | Entropy Weight |
|---|---|---|
| PO₄³⁻ | 4.196 | 0.0282 |
| NH₄⁺ | 3.999 | 0.0417 |
| TDS | 3.801 | 0.0554 |
| TN | 3.695 | 0.0628 |
| pH | 3.653 | 0.0656 |
| BOD | 3.573 | 0.0712 |
| COD | 3.456 | 0.0793 |
| TSS | 2.647 | 0.135 |
| TP | 2.497 | 0.1453 |
| Phenol | 2.384 | 0.1531 |
| CN⁻ | 2.250 | 0.1624 |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

**Cite as:** Praus, P. Information Entropy for Evaluation of Wastewater Composition. *Water* **2020**, *12*, 1095. https://doi.org/10.3390/w12041095