Molecular Classification of Pesticides Including Persistent Organic Pollutants, Phenylurea and Sulphonylurea Herbicides

Pesticide residues in wine were analyzed by liquid chromatography–tandem mass spectrometry, and their retentions are modelled by structure–property relationships. Bioplastic evolution is an evolutionary perspective that combines the effect of acquired characters with the principles of evolutionary indeterminacy, morphological determination and natural selection; its application to design a co-ordination index barely improves the correlations. Fractal dimensions and the partition coefficient differentiate pesticides. Pesticides allow a structural classification by nonplanarity and by the numbers of O, S, N and Cl atoms and of cycles; their behaviour differs depending on the number of cycles. The novelty of the approach is that the structural parameters are related to retentions. The classification algorithms are based on information entropy and its production. When the procedures are applied to moderate-sized sets, an excessive number of results appears, compatible with data suffering a combinatorial explosion. However, the equipartition conjecture provides a criterion for selecting among the hierarchical trees that result from the classification. Information entropy permits classifying compounds in agreement with principal component analyses. The periodic classification shows that pesticides in the same group present similar properties, and those also in the same period show maximum resemblance. The advantage of the classification is that it predicts retentions for molecules not included in the categorization. The classification is extended to phenylureas and sulphonylureas, and the application will be to predict their retentions.


Introduction
Twenty-six billion litres of wine were produced and 24 billion litres consumed worldwide in 2010, according to the International Organization of Vine and Wine. Wine, especially red wine, is rich in polyphenols (e.g., resveratrol, catechin, epicatechin), which are antioxidants that protect cells from oxidative damage caused by free radicals. Red-wine antioxidants inhibit cancer development, e.g., that of prostate cancer, and red-wine consumption presents heart-health benefits. Pesticides (e.g., fungicides, insecticides) are commonly applied to improve grape yields. However, pesticides permeate the plant tissues and remain in harvested grapes and processed products (e.g., grape juice, wine). Because pesticides are a source of toxicants harmful to human beings, it is important to test their levels in grapes, juice and wine. Although the EU has set maximum residue levels (MRLs) of 0.01-10 mg·kg⁻¹ for pesticides in wine grapes, it has not done so for wine. An EU wine study revealed that 34 out of 40 bottles contained at least one pesticide; the average number was >4 pesticides per bottle and the highest was 10. Pesticide analysis in red wine is challenging because of the complexity of the matrix, which contains alcohol, organic acids, sugars, phenols and pigments, e.g., anthocyanins. Traditional red-wine sample preparation methods include liquid-liquid extraction (LLE) with organic solvents [1,2] and solid-phase extraction (SPE) with reversed-phase C18/polymeric sorbents [3][4][5]. However, LLE is labour-intensive, consumes large amounts of organic solvents and forms emulsions that make it difficult to separate the organic and aqueous phases, while SPE requires more method development. Solid-phase microextraction (SPME) [6,7], hollow-fibre liquid-phase microextraction [8] and stir-bar sorptive extraction (SBSE) [9] are less reproducible.
Typical detection techniques include gas chromatography (GC), GC coupled to mass spectrometry (MS) (GC-MS) and liquid chromatography coupled to tandem MS (LC-MS-MS).
Quick, easy, cheap, effective, rugged and safe (QuEChERS) is a sample preparation method reported for pesticide-residue determination in vegetables and fruits [10]; it has been used for the analysis of pesticides and other compounds in various food, oil and beverage matrices [11][12][13]. QuEChERS involves extracting the pesticides from a sample with high water content into acetonitrile and adding salts to separate the phases and partition the pesticides into the organic layer, followed by dispersive SPE (dSPE), in which an aliquot of the sample extract is mixed with sorbents prepacked in a centrifuge tube to clean up matrix co-extractives. Pesticide determination in red wine was reported [14]. Eight pesticides belonging to the insecticide (methamidophos, diazinone, pyrazophos, chlorpyrifos), fungicide (carbendazim, thiabendazole, pyrimethanil, cyprodinil, pyrazophos) and parasiticide (thiabendazole) classes were selected. Their polarities differ, and some are planar (carbendazim, thiabendazole, pyrimethanil, cyprodinil). Cyprodinil was the pesticide most often detected on grapes, with chlorpyrifos, diazinone and methamidophos also frequent. Carbendazim was detected in three out of six red-wine samples. The occurrence of pesticides in Spanish, Mediterranean, Brazilian and other rivers, and their removal efficiency in sewage treatment plants, were reviewed [15,16] and reported [17,18]. Transport of organic persistent microcontaminants associated with suspended particulate material in

Results and Discussion
For the pesticides, LC-MS-MS retention times R_t were taken from Wang and Telepchak. Methamidophos was taken as the reference (R_t°) because it has the smallest R_t (cf. Table 1). The internal standard (IS) triphenyl phosphate (TPP) was included in the classification. The ratios (R_t − R_t°)/R_t° were calculated. Molecular fractal dimensions were computed with our program TOPO [37].
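As a small illustration of the ratio just defined, the sketch below computes (R_t − R_t°)/R_t° for a dictionary of retention times, taking the smallest time as R_t° in the way methamidophos is used in the text. The function name and the retention times in the example are hypothetical; they are not the Wang and Telepchak data of Table 1.

```python
def relative_retentions(retention_times):
    """Return {name: (Rt - Rt0) / Rt0}, with Rt0 the smallest retention time.

    Mirrors the choice of the least-retained compound as reference.
    `retention_times` is a hypothetical dict: compound name -> retention
    time (any consistent unit).
    """
    if not retention_times:
        raise ValueError("no retention times given")
    rt_ref = min(retention_times.values())
    if rt_ref <= 0:
        raise ValueError("reference retention time must be positive")
    return {name: (rt - rt_ref) / rt_ref for name, rt in retention_times.items()}


# Hypothetical times, for illustration only.
print(relative_retentions({"A": 2.0, "B": 5.0, "C": 3.0}))
# {'A': 0.0, 'B': 1.5, 'C': 0.5}
```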
Variations of (R_t − R_t°)/R_t° vs. the 1-octanol-water partition coefficient and the fractal dimension averaged for nonburied atoms minus the molecular fractal dimension, D' − D, show a fit. The regression is given by Equation (2), and AEV decays by 33%. When D' is included in the fit, the correlation improves and AEV drops by 51%. The best quadratic model vs. D' improves the fit further and AEV decreases by 56%. If IS TPP is excluded, the results improve again and AEV decays by 60%. Model (3) is linear and expected to perform better than Equations (4) and (5) for extrapolation; however, the latter are nonlinear and could perform better than Equation (3) for interpolation. Additional fitting parameters were tested: absolute/differential formation enthalpies, molecular dipole moment, organic solvent/water partition coefficients, free energies of solvation and water → organic solvent transfer, molecular volume, surface area, globularity, rugosity, hydrophobic, hydrophilic and total solvent accessible surfaces, and numbers of P and total atoms. However, they do not improve Equations (3)-(5). The Pearson correlation coefficient matrix R was calculated between pairs of vector properties <i1,i2,i3,i4,i5,i6> for the nine pesticides. Intercorrelations are illustrated in the partial correlation diagram, which contains high (r ≥ 0.75), medium (0.50 ≤ r < 0.75), low (0.25 ≤ r < 0.50) and zero (r < 0.25) partial correlations. Pairs of molecules with higher partial correlations show similar vector properties. However, the results should be taken with care, because Entry 9, with the constant vector <111111>, has null standard deviation, which gives the greatest partial correlation r = 1 with any compound; this is an artefact. With the equipartition conjecture, the upper triangle of R was obtained. Some correlations are high, e.g., R3,4 = R3,5 = R4,5 = 0.984. They are illustrated in the partial correlation diagram, which contains 21 high (cf. Figure 1, red lines), seven medium (orange), one low (yellow) and seven zero (black) partial correlations. Two out of the eight high partial correlations of Entry 9 are corrected: its correlation with Entry 2 becomes medium, and that with Entry 1 becomes zero. For instance, pesticide 2 (carbendazim) shows medium partial correlations with molecules 3-9 (0.50 ≤ r < 0.75, orange) and a low partial correlation with compound 1 (0.25 ≤ r < 0.50, yellow).
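The zero-variance artefact noted for Entry 9 is easy to see in code. The sketch below is a plain-Python Pearson coefficient between two 0/1 property vectors; it returns None instead of a number when either vector is constant, and bins r with the thresholds of the partial correlation diagram. The function names are illustrative, not part of any program named in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length property vectors.

    Returns None when either vector has zero variance (e.g. the constant
    vector <111111>), where r is undefined rather than equal to 1.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_level(r):
    """Bin r with the thresholds of the partial correlation diagram."""
    if r >= 0.75:
        return "high"
    if r >= 0.50:
        return "medium"
    if r >= 0.25:
        return "low"
    return "zero"


methamidophos = [0, 0, 1, 0, 1, 0]
carbendazim = [1, 0, 0, 0, 1, 0]
chlorpyrifos = [1, 1, 1, 1, 1, 1]
print(round(pearson(methamidophos, carbendazim), 6))  # 0.25
print(pearson(methamidophos, chlorpyrifos))           # None (constant vector)
```

By hand, the covariance sum is 1/3 and each variance sum is 4/3, so r = 0.25 exactly; since that sits on the low/zero boundary, floating-point rounding decides which bin it falls in.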
The grouping rule with equal weights a_k = 0.5 at level b1 = 0.93 gives the classes (1), (2), (3,4,5), (6) and (7,8,9). Five clusters are obtained, with associated entropy h = 10.70, corresponding to <i1,i2,i3,i4,i5,i6> and classification C(b1) [38][39][40]; the binary taxonomy (Table 1) separates classes 1, 2, 3, 4 and 5 with 1, 1, 3, 1 and 3 pesticides, respectively [41]. The planar molecules 3-5, with low retention, are grouped into the same class; the nonplanar thiophosphates 7-9, with the greatest retention, are aggregated into the same cluster. Substances belonging to the same grouping appear highly correlated in the partial correlation diagram (Figure 1). However, the C(b1) results should be taken with care, because classes (1), (2) and (6), with only one substance each, could be outliers. At level b2, with 0.74 ≤ b2 ≤ 0.76, the set of groupings turns out to be (1), (2) and (3,4,5,6,7,8,9). Three clusters result and the entropy decays to h = 3.71, corresponding to <i1,i2,i3,i4,i5,i6> and classification C(b2), which divides classes 1-3 with 1, 1 and 7 pesticides. Again, the nonplanar thiophosphates 7-9 with the greatest retention are aggregated into the same class. Compounds in the same cluster appear highly correlated in the partial correlation diagram (Figure 1). Nevertheless, the C(b2) results should be taken with caution, because groupings (1) and (2), with a single compound each, could be outliers. Table 2 shows a comparative analysis of the classes of set 1-9, in agreement with the partial correlation diagram (Figure 1).
The radial-tree representation of the classification above (cf. Figure 3) shows the different behaviour of the pesticides depending on the number of cycles. The same classes are recognized, in qualitative agreement with the partial correlation diagram and dendrogram (Figures 1 and 2). Once more, the planar molecules 3-5, with low retention, are grouped into the same class, and the nonplanar thiophosphates 6-9, with the greatest retention, are aggregated into the same cluster (6,7,8,9).

The program SplitsTree allows cluster analysis (CA) data to be examined [42]. Based on split decomposition, it takes a distance matrix as input and produces as output a graph that represents the relations between taxa. For ideal data the graph is a tree, whereas less ideal data give rise to a tree-like network, which is interpreted as possible evidence for conflicting data. As split decomposition does not attempt to force data onto a tree, it provides a good indication of how tree-like the given data are. In the splits graph for the nine pesticides (cf. Figure 4), points 4 and 5 are superimposed on 3, and 9 on 8. The graph reveals a conflicting relationship between class 1 and groupings 2 and 3 because of interdependences, which indicates a spurious relation resulting from base-composition effects. It shows the different behaviour of the pesticides depending on the number of cycles, in agreement with the partial correlation diagram and the binary and radial trees (Figures 1-3).
In QSPR, the data file typically contains fewer than 100 objects and thousands of X-variables. There are so many X-variables that no one can discover patterns, trends, clusters, etc. among the objects by inspection. Principal component analysis (PCA) is a useful technique to summarize all the information contained in the X-matrix and make it understandable [43][44][45][46][47][48]. PCA works by decomposing the X-matrix as the product of two smaller matrices, P and T. The loading matrix (P), with information about the variables, contains a few vectors, the principal components (PCs), which are obtained as linear combinations of the original X-variables. In the score matrix (T), with information about the objects, every object is described by its projections onto the PCs instead of the original variables: X = TP' + E, where ' denotes the transpose. The information not contained in these matrices remains as unexplained X-variance in the residual matrix (E). Every PC_i is a new co-ordinate expressed as a linear combination of the old features x_j: PC_i = Σ_j b_ij x_j. The new co-ordinates PC_i are named scores or factors, while the coefficients b_ij are called loadings. The scores are ordered according to their information content with regard to the total variance among all objects. Score-score plots show the positions of the compounds in the new co-ordinate system, while loading-loading plots indicate the locations of the features that represent the compounds in the new co-ordinates. The PCs present two interesting properties. (1) They are extracted in decreasing order of importance: the first PC, F1, always contains more information than the second, F2, F2 more than the third, F3, etc. (2) The PCs are mutually orthogonal, so there is no correlation between the information contained in different PCs. A PCA was performed for the vector properties. The importance of PCA factors F1-F6 for {i1,i2,i3,i4,i5,i6} is collected in Table 3.
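The decomposition X = TP' + E can be sketched with a singular value decomposition. This is a minimal NumPy sketch, not the software used for Tables 3 and 4: it assumes mean-centring only (no autoscaling), returns scores T, loadings P and the fraction of variance per PC, and uses three vectors quoted in the text plus one hypothetical row.

```python
import numpy as np


def pca(X):
    """PCA of X (objects x variables) by singular value decomposition.

    Variables are only mean-centred (an assumption; autoscaling is a
    common alternative). Returns scores T, loadings P and the fraction
    of total variance per component, with X_centred = T @ P.T when all
    components are kept (E = 0).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U * s      # scores: projections of the objects on the PCs
    P = Vt.T       # loadings: one column per PC
    var = s ** 2   # variance carried by each PC (up to a constant factor)
    return T, P, var / var.sum()


# Rows: methamidophos, carbendazim, chlorpyrifos (vectors from the text)
# and one hypothetical vector, for illustration only.
X = [[0, 0, 1, 0, 1, 0],
     [1, 0, 0, 0, 1, 0],
     [1, 1, 1, 1, 1, 1],
     [1, 0, 1, 0, 0, 0]]
T, P, ratio = pca(X)
print(np.round(np.cumsum(ratio), 3))      # cumulative variance explained
print(np.round(1 - np.cumsum(ratio), 3))  # "error" in the sense used above
```

Truncating T and P to their first few columns leaves the remaining variance in E; the quantity 1 − cumsum(ratio) corresponds to the error percentages quoted for F1, F1/2 and F1-3.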
The first factor F1 explains 39% of the variance (61% error), the combination of F1 and F2 accounts for 66% (34% error), factors F1-F3 account for 87% (13% error), etc. The PCA factor loadings are shown in Table 4. In the PCA F2-F1 scores plot (cf. Figure 5), points 4 and 5 appear superimposed on 3. The plot shows different behaviour depending on the number of cycles and distinguishes three clusters: class 1 (two molecules, F1 < F2, left), grouping 2 (three compounds, F1 << F2, top) and cluster 3 (four units, F1 >> F2, right). From the PCA factor loadings of the pesticides, the F2-F1 loadings plot (cf. Figure 6) depicts the six properties. Complementing the scores plot (Figure 5) with the loadings plot (Figure 6) confirms that pesticide 2, located on the left side, presents a contribution of cyc123, situated near the same side of the loadings plot. Correlations are high, e.g., R4,6 = 0.986. The properties dendrogram (cf. Figure 7) separates cyc123 and O0345 from N13 within class 1, and Cl3 from NP and S= within cluster 2, in agreement with the PCA loadings plot (Figure 6).
The radial tree for the vector properties (cf. Figure 8) separates the same two classes as the PCA loadings plot and dendrogram (Figures 6 and 7). The splits graph for the properties (cf. Figure 9) reveals conflicting relations between classes because of interdependences, in agreement with the PCA loadings plot and the binary and radial trees (Figures 6-8).
A PCA was also performed with the properties taken as objects. Factor F1 explains 50% of the variance (50% error), factors F1 and F2 account for 69% (31% error), factors F1-F3 for 82% (18% error), etc. In the PCA F2-F1 scores plot, the same two groupings of properties are distinguished: class 1 {cyc123, O0345, N13} (F1 >> F2, cf. Figure 10, right) and grouping 2 {NP, S=, Cl3} (F1 << F2, left), in qualitative agreement with the PCA loadings plot, the binary and radial trees and the splits graph (Figures 6-9). The recommended format of the pesticides periodic table (PT, cf. Table 6) classifies them first by i1, then by i2, i3, i4 and i5 and, finally, by i6. Vertical groups are defined by <i1,i2,i3,i4,i5> and horizontal periods by <i6>. Periods of eight units are assumed; e.g., group g00101 stands for <i1,i2,i3,i4,i5> = <00101>, which contains <001010> (cyc0, O2, NP, no S=, N1, Cl0), etc. Pesticides in the same column appear close in the partial correlation diagram, binary and radial trees, splits graph and PCA scores plot (Figures 1-5). Phenylurea herbicides were determined in tap water and soft drink samples by HPLC-UV [49]. Table 6 includes five phenylureas: metazachlor is similar to carbendazim. Can et al. determined the toxicities of sulphonylurea and phenylurea herbicides [50]. Table 6 includes 27 sulphonylurea/phenylurea herbicides: (1) the phenylureas are similar to metoxuron, monuron, diuron and linuron; (2) the sulphonylureas flazasulphuron, triasulphuron, azimsulphuron and chlorsulphuron are grouped with TPP. High-resolution and ultratrace analyses of pesticides were reported using silica (SiO2) monoliths [51]. Table 6 takes in six new pesticides: (1) metamitron and the phenylurea isoproturon match metoxuron; (2) metolachlor goes with carbendazim; (3) carbofuran agrees with thiabendazole. Qualitative LC-MS analysis of pesticides was reported using monolithic SiO2 capillaries [52]. Table 6 contains two novel pesticides: the phenylurea pencycuron matches metamitron.
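The periodic-table layout described above, with the group taken from the first five indices and the period from i6, amounts to splitting the vector string. A minimal sketch, assuming vectors are written as six-character 0/1 strings; the function names are hypothetical, while the `g`/`p` labels follow the text's group and period naming.

```python
from collections import defaultdict


def pt_cell(vector):
    """Return (group, period) for a vector <i1 i2 i3 i4 i5 i6> given as a string.

    Group label from <i1..i5> (e.g. 'g00101'), period from <i6> ('p0'/'p1').
    """
    if len(vector) != 6 or set(vector) - {"0", "1"}:
        raise ValueError("expected a six-character string of 0s and 1s")
    return "g" + vector[:5], "p" + vector[5]


def periodic_table(compounds):
    """Group compound names by PT cell; `compounds` maps name -> vector."""
    table = defaultdict(list)
    for name, vector in compounds.items():
        table[pt_cell(vector)].append(name)
    return dict(sorted(table.items()))


print(periodic_table({"methamidophos": "001010",
                      "carbendazim": "100010",
                      "chlorpyrifos": "111111"}))
```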
Analytical standards were provided for persistent organic pollutants (POPs) [53]. Table 6 embraces five POPs: lindane and pentachlorobenzene equal carbetamide. The variation of property P of vector <i1,i2,i3,i4,i5,i6> (cf. Figure 11), expressed in the decimal system as

$$P = 10^5 i_1 + 10^4 i_2 + 10^3 i_3 + 10^2 i_4 + 10\, i_5 + i_6$$

is plotted vs. the structural parameters {i1,i2,i3,i4,i5,i6} for the pesticides. Most points and lines for i3 and i5 collapse. For instance, for molecule 1 (methamidophos) <001010>, P = 10^5·0 + 10^4·0 + 10^3·1 + 10^2·0 + 10·1 + 0 = 1010; since the structural parameters are 0 and 1, the corresponding points are (i1 = i2 = i4 = i6 = 0, P = 1010) and (i3 = i5 = 1, P = 1010). The results show the parameter hierarchy i1 > i2 > i3 > i4 > i5 > i6, in agreement with the PT of properties (Table 6), with vertical groups defined by {i1,i2,i3,i4,i5} and horizontal periods by {i6}. The property was not used in the PT development and therefore validates it. The change of property P of vector <i1,i2,i3,i4,i5,i6> in base 10 (cf. Figure 12) is represented vs. the group number in the PT for the pesticides (Table 1, a subset of Table 6). It reveals minima for compounds with <i1,i2,i3,i4,i5> ca. <00101> (group g00101) and maxima for <i1,i2,i3,i4,i5> ca. <11111> (group g11111). For group 6, period 2 is superimposed on period 1. For instance, for group g00101 and period p0, molecule 1 (methamidophos) <001010> lies in the first group of the subset with P = 1010, and the point is (group = 1, P = 1010). Periods p0 and p1 represent rows 1 and 2, respectively, in Table 6. The function P(i1,i2,i3,i4,i5,i6) shows two periodic waves clearly limited by two maxima, which suggests a periodic behaviour recalling the form of a trigonometric function. For <i1,i2,i3,i4,i5,i6>, a maximum is shown.
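The decimal property P is simply the 0/1 vector read as a base-10 number. A minimal sketch with a hypothetical function name, assuming vectors are written as six-character strings; it reproduces the methamidophos value P = 1010 worked out above.

```python
def property_p(vector):
    """P = 10^5 i1 + 10^4 i2 + 10^3 i3 + 10^2 i4 + 10 i5 + i6.

    `vector` is a six-character 0/1 string such as '001010'.
    """
    bits = [int(c) for c in vector]
    if len(bits) != 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected six binary indices")
    # The index at position k (k = 0..5 here, i.e. i1..i6) is weighted by 10**(5 - k).
    return sum(b * 10 ** (5 - k) for k, b in enumerate(bits))


print(property_p("001010"))  # methamidophos: 1010
```

Because every index is 0 or 1, the result equals `int(vector)`; the explicit sum is kept only to mirror the formula.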
The distance in <i1,i2,i3,i4,i5,i6> units between each pair of consecutive maxima is six, which coincides with the pesticide sets in successive periods. The maxima occupy analogous positions and are in phase, and representative points in phase should correspond to elements in the same group of the PT. For both maxima, some coherence exists between the two representations of <i1,i2,i3,i4,i5,i6>; however, the consistency is not general. Comparison of the waves shows two differences: period 1 is somewhat step-like and period 2 is incomplete. The most characteristic points are the maxima, which lie about group g11111. The values of <i1,i2,i3,i4,i5,i6> are repeated, as the periodic law (PL) states. An empirical function P(p) reproduces the different <i1,i2,i3,i4,i5,i6> values. The minimum of P(p) has meaning only when compared with the former, P(p − 1), and later, P(p + 1), points, which need to fulfil:

$$P(p) < P(p-1), \qquad P(p) < P(p+1) \qquad (6)$$

The order relations (6) should repeat at determined intervals equal to the period size and are equivalent to:

$$P(p) - P(p-1) < 0, \qquad P(p+1) - P(p) > 0 \qquad (7)$$

As relations (7) are valid only for minima, more general ones are desired for all values of p. The differences D(p) are calculated, assigning every value to pesticide p:

$$D(p) = P(p+1) - P(p) \qquad (8)$$

Instead of D(p), the ratios R(p) can be taken, assigning them to pesticide p:

$$R(p) = \frac{P(p+1)}{P(p)} \qquad (9)$$

If the PL were general, elements of the same group in analogous positions of different periodic waves would satisfy, with Δ the period size:

$$\operatorname{sign} D(p) = \operatorname{sign} D(p+\Delta) \qquad (10)$$

and either R(p) > 1 and R(p + Δ) > 1, or R(p) < 1 and R(p + Δ) < 1. However, the results show that this is not the case, so the PL is not general and some anomalies exist; e.g., the variation of D(p) vs. group number (cf. Figure 13) presents a lack of coherence between the Cartesian and PT representations of <i1,i2,i3,i4,i5,i6>. For instance, for group g00101 and period p0, pesticide 1 (methamidophos) <001010> (group = 1, P = 1010) has, in the next PT position, molecule 2 (carbendazim) <100010> (g10001, group = 2, P = 100010); D = 100010 − 1010 = 99000, and the point is (group = 1, D = 99000).
If the consistency were rigorous, all points in each period would have the same sign. In general, points tend to give D(p) > 0, especially for the lower groups. The change of R(p) vs. group number (cf. Figure 14) confirms the lack of consistency between the Cartesian and PT charts. For instance, for group g00101 and period p0, pesticide 1 (methamidophos) <001010> (group = 1, P = 1010) has, in the next PT cell, molecule 2 (carbendazim) <100010> (g10001, group = 2, P = 100010); R = 100010/1010 = 99.0198, and the point is (group = 1, R = 99.0198). If the consistency were exact, all points in each period would show R(p) either less than or greater than one. Points tend to give R(p) > 1, especially for the lower groups.
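The differences D(p) = P(p + 1) − P(p) and ratios R(p) = P(p + 1)/P(p) used in the periodic-law test can be computed from a list of P values in PT order. The sketch below, with hypothetical helper names, reproduces the methamidophos → carbendazim example (D = 99000, R = 99.0198) and adds a check that sign D(p) equals sign D(p + Δ) for a period size Δ, the condition a general periodic law would require. R(p) is undefined (division by zero) when P(p) = 0, i.e. for vector <000000>.

```python
def differences_and_ratios(p_values):
    """Return D and R with D[p] = P(p+1) - P(p) and R[p] = P(p+1) / P(p).

    Each value is assigned to pesticide p, so both lists are one element
    shorter than `p_values` (assumed ordered as in the periodic table).
    """
    d = [nxt - cur for cur, nxt in zip(p_values, p_values[1:])]
    r = [nxt / cur for cur, nxt in zip(p_values, p_values[1:])]
    return d, r


def _sign(x):
    return (x > 0) - (x < 0)


def same_sign_across_periods(d, delta):
    """True if sign D(p) == sign D(p + delta) wherever both values exist."""
    return all(_sign(d[p]) == _sign(d[p + delta]) for p in range(len(d) - delta))


d, r = differences_and_ratios([1010, 100010])  # methamidophos, carbendazim
print(d, round(r[0], 4))  # [99000] 99.0198
```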

Experimental
The key problem in classification studies is to define similarity indices when several criteria of comparison are involved. The first step in quantifying the concept of similarity for pesticides is to list the most important chemical characteristics of the molecules. A vector of properties i = <i1,i2,…ik,…> is associated with every pesticide i, whose components correspond to different molecular features in a hierarchical order according to their expected importance in retention. If the m-th characteristic is chromatographically more significant for retention than the k-th, then m < k. Components ik are either 1 or 0, according to whether the characteristic of rank k is present or absent in pesticide i compared with a reference. The analysis includes six structural and constitutional characteristics: presence of cycles (cyc123), occurrence of either none or 3-5 O atoms (O0345), nonplanarity (NP), a double-bonded S atom (S=), incidence of either one or three N atoms (N13) and existence of three Cl atoms (Cl3, cf. Figure 15). The chemical characteristics are assumed to be ranked according to their contribution to retention in the following order of decreasing importance: cyc123 > O0345 > NP > S= > N13 > Cl3. Index i1 = 1 denotes cyc123 (i1 = 0 for cyc0), i2 = 1 means O0345 (i2 = 0 for O2), i3 = 1 signifies NP, i4 = 1 indicates S=, i5 = 1 stands for N13 (i5 = 0 for N0 or N2) and i6 = 1 represents Cl3 (i6 = 0 for Cl0). In chlorpyrifos the number of cycles is one, the number of O atoms is three, the molecule is NP and has S=, the number of N atoms is one and the number of Cl atoms is three; its associated vector is therefore <111111>. In this study chlorpyrifos was selected as reference because of its greatest retention. Table 1 contains the vectors associated with the nine pesticides. Vector <001010> is associated with methamidophos, since it shows cyc0, O2, NP, no S=, N1 and Cl0.
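The coding rules just listed can be written as a small function. This is a sketch with a hypothetical name (`encode`); it accepts only the categories the text defines and raises an error otherwise, since the text does not say how, e.g., one O atom or one Cl atom would be coded. The checks use methamidophos and chlorpyrifos as described, and carbendazim (two rings, two O, three N, planar, no S or Cl), whose vector <100010> is quoted in the Results.

```python
def encode(cycles, o_atoms, nonplanar, double_bonded_s, n_atoms, cl_atoms):
    """Build the vector <i1..i6> from the six characteristics in the text.

    Only the categories named in the text are accepted (cyc0/cyc123,
    O2/O0345, N0/N2 vs N13, Cl0/Cl3); other counts raise ValueError.
    """
    if cycles not in (0, 1, 2, 3):
        raise ValueError("cycles: expected 0-3")
    if o_atoms not in (0, 2, 3, 4, 5):
        raise ValueError("O atoms: expected 0 or 2-5")
    if n_atoms not in (0, 1, 2, 3):
        raise ValueError("N atoms: expected 0-3")
    if cl_atoms not in (0, 3):
        raise ValueError("Cl atoms: expected 0 or 3")
    i1 = int(cycles > 0)             # cyc123 (i1 = 0 for cyc0)
    i2 = int(o_atoms != 2)           # O0345 (i2 = 0 for O2)
    i3 = int(bool(nonplanar))        # NP
    i4 = int(bool(double_bonded_s))  # S=
    i5 = int(n_atoms in (1, 3))      # N13 (i5 = 0 for N0 or N2)
    i6 = int(cl_atoms == 3)          # Cl3 (i6 = 0 for Cl0)
    return f"{i1}{i2}{i3}{i4}{i5}{i6}"


print(encode(0, 2, True, False, 1, 0))  # methamidophos: 001010
print(encode(1, 3, True, True, 1, 3))   # chlorpyrifos: 111111
```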
Let r_ij (0 ≤ r_ij ≤ 1) denote the similarity index of two pesticides associated with vectors i and j, respectively. The similarity relation is characterized by the similarity matrix R = [r_ij]. The similarity index between two pesticides i = <i1,i2,…ik…> and j = <j1,j2,…jk…> is defined as:

$$r_{ij} = \sum_k t_k (a_k)^k, \qquad k = 1, 2, \ldots \qquad (11)$$

where 0 ≤ a_k ≤ 1 and t_k = 1 if i_k = j_k but t_k = 0 if i_k ≠ j_k. The definition assigns a weight (a_k)^k to any property involved in the description of molecule i or j.
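Read as r_ij = Σ_k t_k (a_k)^k with k counted from 1, the similarity index reproduces the value 0.984 reported in the Results for identical vectors at a_k = 0.5 (0.5 + 0.5² + … + 0.5⁶ = 0.984375). The sketch below implements that reading; the function name is illustrative.

```python
def similarity(u, v, weights):
    """r_ij = sum over k of t_k * a_k**k, with k counted from 1.

    t_k = 1 when u and v agree at rank k, else 0. `u` and `v` are 0/1
    strings; `weights` holds a_1..a_m with 0 <= a_k <= 1.
    """
    if not len(u) == len(v) == len(weights):
        raise ValueError("vectors and weights must have the same length")
    return sum(a ** k
               for k, (x, y, a) in enumerate(zip(u, v, weights), start=1)
               if x == y)


w = [0.5] * 6
print(similarity("001010", "001010", w))  # identical vectors: 0.984375
print(similarity("001010", "100010", w))  # methamidophos vs carbendazim: 0.359375
```

With equal weights the rank-k agreement contributes 0.5^k, so a mismatch at i1 costs far more similarity than one at i6, which is how the hierarchy of characteristics enters the index.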

Classification Algorithm
The grouping algorithm uses the stabilized similarity matrix obtained by applying the max-min composition rule ∘ [54][55][56][57]:

$$[R(n) \circ R]_{ij} = \max_k \left[ \min\left( r_{ik}(n), r_{kj} \right) \right] \qquad (12)$$

When the max-min composition rule is applied iteratively so that R(n + 1) = R(n) ∘ R, an integer n exists such that R(n) = R(n + 1) = … The matrix R(n) is called the stabilized similarity matrix. Its importance lies in the fact that, in classification, it generates a partition into disjoint classes. The stabilized matrix is designated by R(n) = [r_ij(n)]. The grouping rule follows: i and j are assigned to the same class if r_ij(n) ≥ b.
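The class of i, denoted î, is the set of species j that satisfies the rule r_ij(n) ≥ b. The matrix of classes is:

$$\hat{R} = \left[ \hat{r}_{\hat{i}\hat{j}} \right], \qquad \hat{r}_{\hat{i}\hat{j}} = \max_{s,t} \{ r_{st} \}, \quad s \in \hat{i},\ t \in \hat{j} \qquad (13)$$

where s stands for any index of a species belonging to class î (similarly for t and ĵ). Rule (13) means finding the largest similarity index between species of two different classes.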
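The max-min stabilization, the grouping rule and the matrix of classes can be sketched in plain Python. This is an illustration under stated assumptions, not MolClas: matrices are lists of lists, iteration stops when R(n) = R(n + 1) (with a guard on the number of steps), each species is assumed to satisfy r_ii(n) ≥ b, the class matrix is built from whichever matrix is passed in, and the 3 × 3 matrix in the example is hypothetical.

```python
def maxmin_compose(A, B):
    """(A o B)_ij = max_k min(a_ik, b_kj) for square matrices."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def stabilize(R, max_iter=1000):
    """Iterate R(n+1) = R(n) o R until R(n) = R(n+1); return R(n)."""
    current = R
    for _ in range(max_iter):
        nxt = maxmin_compose(current, R)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("max-min powers did not stabilize")


def partition(R_stab, b):
    """Grouping rule: i and j share a class if r_ij(n) >= b.

    Assumes r_ii(n) >= b, so that every species lies in its own class.
    """
    n = len(R_stab)
    seen, classes = set(), []
    for i in range(n):
        if i not in seen:
            cls = [j for j in range(n) if R_stab[i][j] >= b]
            seen.update(cls)
            classes.append(cls)
    return classes


def class_matrix(R, classes):
    """Largest r_st with s in class I and t in class J, for every pair I, J."""
    return [[max(R[s][t] for s in ci for t in cj) for cj in classes]
            for ci in classes]


R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.6],
     [0.3, 0.6, 1.0]]  # hypothetical similarity matrix
Rs = stabilize(R)
print(Rs)                  # r_13 rises from 0.3 to 0.6
print(partition(Rs, 0.7))  # [[0, 1], [2]]
print(partition(Rs, 0.5))  # [[0, 1, 2]]
print(class_matrix(R, partition(Rs, 0.7)))  # [[1.0, 0.6], [0.6, 1.0]]
```

Stabilization makes the relation max-min transitive, which is why thresholding it at any level b yields disjoint classes rather than overlapping neighbourhoods.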

Information Entropy
In information theory, the information entropy h measures the surprise that a source emitting sequences, e.g., cannon shots, can give [58,59]. Consider the use of a qualitative spot test to determine the presence of Fe in a water sample. Without any history of testing, the analyst must begin by assuming that the two outcomes 0/1 (Fe absent/present) are equiprobable, with probabilities 1/2. When up to two metals may be present in the sample, e.g., Fe or Ni, four possible outcomes exist, ranging from neither (0,0) to both present (1,1), with equal probabilities 1/4. Which of the four possibilities turns up can be determined via two tests, each having two observable states. Similarly, with three elements eight possibilities exist, each with probability 1/8 = 1/2^3, and three tests are needed. This pattern relates the uncertainty to the information needed to resolve it: the number of possibilities is expressed as a power of 2. The power to which 2 must be raised to give the number of possibilities N is defined as the logarithm to base 2 of that number. Information and uncertainty can thus be defined in terms of the logarithm to base 2 of the number of possible analytical outcomes: I = H = log2 N = log2(1/p) = −log2 p, where I is the information contained in the answer given that N possibilities existed, H is the initial uncertainty resulting from the need to consider N possibilities and p is the probability of each outcome if all N possibilities are equally likely. The expression is generalized to situations in which the outcomes have unequal probabilities. If past experience shows that some elements are more likely to be present than others, the equation is adjusted so that the logarithms of the individual probabilities, suitably weighted, are summed: H = −Σ p_i log2 p_i, where Σ p_i = 1. Consider the original example, except that past experience now shows that 90% of samples contained no Fe. The degree of uncertainty is: H = −(0.9 log2 0.9 + 0.1 log2 0.1) = 0.469 bits.
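The spot-test arithmetic can be checked with a few lines of Python. A minimal sketch of H = −Σ p_i log2 p_i; the function name is illustrative, and zero-probability terms are skipped under the usual convention 0 log 0 = 0.

```python
import math


def shannon_entropy(probs, base=2.0):
    """H = -sum p_i log p_i (bits for base 2); p_i = 0 terms add nothing."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be nonnegative and sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


print(round(shannon_entropy([0.5, 0.5]), 3))  # Fe only, no history: 1.0
print(round(shannon_entropy([0.25] * 4), 3))  # Fe and Ni: 2.0
print(round(shannon_entropy([0.9, 0.1]), 3))  # 90% of samples Fe-free: 0.469
```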
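For a single event occurring with probability p, the degree of surprise is proportional to −ln p. Generalizing this result to a random variable X, which can take N possible values x1, …, xN with probabilities p1, …, pN, the average surprise received on learning the value of X is −Σ p_i ln p_i. The information entropy associated with the similarity matrix R is:

$$h(R) = -\sum_{i,j} r_{ij} \ln r_{ij} - \sum_{i,j} (1 - r_{ij}) \ln (1 - r_{ij}) \qquad (14)$$

Denote by C_b the set of classes and by R̂_b the similarity matrix of classes at grouping level b. The information entropy satisfies:

$$h(R) \ge h(\hat{R}_{b_1}) \ge h(\hat{R}_{b_2}) \quad \text{when } b_1 > b_2 \qquad (15)$$

i.e., the entropy is a monotone function of the grouping level b.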
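For the entropy of a similarity matrix, the sketch below assumes the fuzzy-set form h(R) = −Σ_ij [r_ij ln r_ij + (1 − r_ij) ln(1 − r_ij)], which vanishes for a crisp 0/1 matrix and is largest when every r_ij = 0.5. It illustrates that assumed form in natural-log units (nats); it is not a transcription of MolClas, and the function names are hypothetical.

```python
import math


def _xlnx(x):
    # x ln x with the usual convention 0 ln 0 = 0
    return x * math.log(x) if x > 0 else 0.0


def matrix_entropy(R):
    """h(R) = -sum_ij [r_ij ln r_ij + (1 - r_ij) ln(1 - r_ij)], in nats."""
    total = 0.0
    for row in R:
        for r in row:
            if not 0.0 <= r <= 1.0:
                raise ValueError("similarity indices must lie in [0, 1]")
            total -= _xlnx(r) + _xlnx(1.0 - r)
    return total


print(round(matrix_entropy([[1.0, 0.5], [0.5, 1.0]]), 4))  # 2 ln 2 = 1.3863
```

Merging species into fewer classes shrinks the matrix of classes, so fewer uncertain entries remain; this is the intuition behind the decrease of h with the grouping level.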

Equipartition Conjecture of Entropy Production
In the classification algorithm, every hierarchical tree corresponds to a dependence of the entropy on the grouping level, and an h-b diagram is obtained. The equipartition conjecture of entropy production of Tondeur and Kvaalen is proposed as a selection criterion among hierarchical trees. According to the conjecture, for a given charge, the dendrogram (binary tree) with the best configuration is that in which entropy production is most uniformly distributed. One proceeds by analogy, using information entropy instead of the thermodynamic one. Equipartition implies a linear dependence, so that the equipartition line results:

$$h_{\mathrm{eqp}} = h_{\max}\, b \qquad (16)$$

Since the classification is discrete, a way of expressing equipartition would be a regular staircase function. The best variant is chosen to be that minimizing the sum of squares of the deviations:

$$SS = \sum_{b_i} \left( h - h_{\mathrm{eqp}} \right)^2 \qquad (17)$$
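The selection criterion can be sketched as follows, assuming the equipartition line h_eqp = h_max·b with h_max taken as the largest entropy of each tree and b in [0, 1]; each candidate tree is represented by its (b_i, h_i) staircase. The tree names, numbers and function names are hypothetical.

```python
def equipartition_ss(levels, entropies):
    """SS = sum_i (h(b_i) - h_max * b_i)^2 for one hierarchical tree.

    `levels` are grouping levels b_i in [0, 1] and `entropies` the matching
    h values; h_max is the largest entropy of this tree (an assumption
    about how the equipartition line is scaled).
    """
    if len(levels) != len(entropies) or not levels:
        raise ValueError("need matching, non-empty level/entropy lists")
    h_max = max(entropies)
    return sum((h - h_max * b) ** 2 for b, h in zip(levels, entropies))


def best_tree(candidates):
    """Pick the tree whose h-b staircase deviates least from equipartition.

    `candidates` maps a tree name to a (levels, entropies) pair.
    """
    return min(candidates, key=lambda name: equipartition_ss(*candidates[name]))


trees = {"tree A": ([0.5, 1.0], [5.0, 10.0]),  # hypothetical, on the line
         "tree B": ([0.5, 1.0], [2.0, 10.0])}  # hypothetical, SS = 9.0
print(best_tree(trees))  # tree A
```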

Learning Procedure
Learning procedures similar to those encountered in stochastic methods were implemented [60]. Consider a given partition into classes as good from practical observations; it corresponds to a reference similarity matrix S = [s_ij] obtained for equal weights a1 = a2 = … = a and for an arbitrary number of fictitious properties. Next, consider the same set of species as in the good classification, with actual properties. The degree of similarity r_ij is computed with Equation (11), giving matrix R. The numbers of properties for R and S differ. The learning procedure consists in finding classification results for R as close as possible to the good classification. The first weight a1 is kept constant, and only the following weights a2, a3, … are subjected to random variations. A new similarity matrix is obtained using Equation (11) and the new weights. The distance between the partitions into classes characterized by R and S is:

$$D = \sum_{i,j} \left[ r_{ij} \ln \frac{r_{ij}}{s_{ij}} + (1 - r_{ij}) \ln \frac{1 - r_{ij}}{1 - s_{ij}} \right]$$

The definition was suggested by that introduced in information theory to measure the distance between two probability distributions [61]. In the present case, it is the distance between matrices R and S. Since a classification corresponds to every matrix, two classifications are compared by this distance, a nonnegative quantity that approaches zero as the resemblance between R and S increases. The result of the algorithm is a set of weights allowing the classification. The procedure was applied to the synthesis of complex dendrograms using information entropy [62][63][64][65]. Our program MolClas is a simple, reliable, efficient and fast procedure for molecular classification based on the equipartition conjecture of entropy production according to Equations (11)-(17); it reads the number of properties and the molecular properties, allows optimization of the coefficients and optionally reads starting coefficients and the number of iteration cycles. The correlation matrix can be either calculated by the program or read from an input file.
MolClas calculates the property similarity matrix in symmetric storage mode; it applies the graphical correlation model for the partial correlation diagram; it computes classifications, calculates distances between clusters, computes the similarity matrices of groupings, works out the information entropy of classifications, optimizes coefficients, performs single/complete-linkage hierarchical cluster analyses and plots cluster diagrams. It was written not only to analyze the equipartition conjecture of entropy production but also to explore the world of molecular classification.
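The learning procedure can be sketched as a random search under stated assumptions: similarity indices follow r_ij = Σ_k t_k (a_k)^k, the distance between R and S is taken as the entrywise sum r ln(r/s) + (1 − r) ln[(1 − r)/(1 − s)] (a Kullback-Leibler-type form chosen here to match the information-theory analogy in the text), a_1 is held fixed while a_2, a_3, … are perturbed, and a trial is accepted only if it lowers the distance. All names are hypothetical and the reference matrix in the example is built from made-up weights; this is not MolClas.

```python
import math
import random


def similarity_matrix(vectors, weights):
    """r_ij = sum_k t_k * a_k**k (k from 1) for 0/1 strings."""
    return [[sum(a ** k for k, (x, y, a) in
                 enumerate(zip(u, v, weights), start=1) if x == y)
             for v in vectors] for u in vectors]


def matrix_distance(R, S, eps=1e-12):
    """Sum of r ln(r/s) + (1 - r) ln((1 - r)/(1 - s)) over all entries.

    s is clipped to [eps, 1 - eps] to avoid division by zero; 0 ln 0 = 0.
    """
    total = 0.0
    for row_r, row_s in zip(R, S):
        for r, s in zip(row_r, row_s):
            s = min(max(s, eps), 1.0 - eps)
            if r > 0:
                total += r * math.log(r / s)
            if r < 1:
                total += (1 - r) * math.log((1 - r) / (1 - s))
    return total


def learn_weights(vectors, S, a1=0.5, n_iter=2000, step=0.05, seed=0):
    """Keep a_1 fixed, perturb a_2, a_3, ... at random and accept a trial
    only if it lowers the distance to the reference matrix S.

    Trials giving any r_ij > 1 are rejected, since 0 <= r_ij <= 1.
    """
    rng = random.Random(seed)
    weights = [a1] * len(vectors[0])
    best = matrix_distance(similarity_matrix(vectors, weights), S)
    for _ in range(n_iter):
        trial = [weights[0]] + [min(1.0, max(0.0, a + rng.uniform(-step, step)))
                                for a in weights[1:]]
        R = similarity_matrix(vectors, trial)
        if any(r > 1.0 for row in R for r in row):
            continue
        d = matrix_distance(R, S)
        if d < best:
            weights, best = trial, d
    return weights, best


vecs = ["001010", "100010", "111111"]
S = similarity_matrix(vecs, [0.5, 0.4, 0.5, 0.5, 0.5, 0.5])  # hypothetical
w, d = learn_weights(vecs, S)
d0 = matrix_distance(similarity_matrix(vecs, [0.5] * 6), S)
print(round(d0, 4), round(d, 4))
```

Because a trial is kept only when it lowers the distance, the returned distance can never exceed the starting one; a fixed seed makes the run reproducible.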

Conclusions
From the present results and discussion, the following conclusions can be drawn. (1) The objective was to develop a structure-property relation for qualitative and quantitative prediction of the chromatographic retention times of pesticides. The results of the present work contribute to the prediction of pesticide residues in food and environmental samples. Program TOPO computes fractal dimensions, and SCAP computes solvation free energies and partition coefficients, which show that, for a given atom, energies and partitions are sensitive to the presence in the molecule of other atoms and functional groups. Fractal dimensions, partition coefficient, etc. differentiated the pesticides. The parameters needed for the co-ordination index are molar formation enthalpy, molecular weight and surface area. The morphological and co-ordination indices barely improved the equations. The correlation between molecular area and weight points not only to a homogeneous molecular structure of the pesticides but also to the ability to predict and tailor their properties; the latter is nontrivial in environmental toxicology. (2) Several criteria, referring to structural and constitutional characteristics related to nonplanarity and to the numbers of rings and of O, double-bonded S, N and Cl atoms, were selected to reduce the analysis to a manageable quantity of pesticides. The classification agreed with the principal component analyses. Program MolClas is a simple, reliable, efficient and fast procedure for molecular classification based on the equipartition conjecture of entropy production. It was written to analyze the equipartition conjecture of entropy production and to explore the world of molecular classification. (3) The periodic law does not reach the status of a physical law: (a) pesticide retentions are not repeated, perhaps because of their chemical character; (b) order relations are repeated with exceptions. The analysis forces the statement: the relations that any compound p has with its neighbour, p + 1, are approximately repeated for each period.
Periodicity is not general; however, if the natural order of substances is accepted, the law must be phenomenological. Retention is not used in generating the periodic table and serves to validate it. The analysis of other properties would give insight into the possible generality of the periodic table. The periodic classification was extended to phenylureas and sulphonylureas.