Quantum Chemical Investigation of Polychlorinated Dibenzodioxins, Dibenzofurans and Biphenyls

All possible polychlorinated aromatic compounds in the classes of dibenzodioxins (PCDDs), dibenzofurans (PCDFs) and biphenyls (PCBs) were studied by quantum chemical methods of HF/6-311++G(d,p), B3LYP/6-311++G(d,p) and MP2/cc-pVTZ. Calculated stabilities and structures of these compounds were compared with available data on their abundance and toxicity. Prediction models for trends in energy and planarity among congeners were proposed. Results discussed here can help contribute to the understanding of the role of dioxin-like compounds (DLCs) in the environment.


Introduction
The Stockholm Convention on Persistent Organic Pollutants (POPs) [1] initially signed in 2001 includes 12 classes of compounds, three of which can be regarded as dioxin-like compounds (DLCs). (See Figure 1.) A subset of DLCs was assigned toxic equivalency factors (TEFs) by the World Health Organization (WHO) [2] to indicate their potential health impacts on human and mammals. These compounds may accumulate and bioconcentrate through the food chain and, as a result, food products in some jurisdictions such as the European Union are strictly monitored for the presence of these compounds [3]. Quantum chemical methods, semi-empirical calculations and other related computational techniques were previously applied to polychlorinated dibenzodioxins (PCDDs), polychlorinated dibenzofurans (PCDFs) and polychlorinated biphenyls (PCBs) . The goals of these work were primarily to detect the compounds and understand their mechanisms of action or degradation in biotic systems. A summary can be found in Supplementary Table S1.
This Communication describes semi-automated codes for generating chemical structures to be used as inputs for quantum chemical calculations in a standard package and to extract information from outputs for further analysis. Three different representative quantum chemical methods were selected for comparison: Hartree-Fock (HF), Density Functional Theory (DFT) and second-order Møller-Plesset perturbation theory (MP2). The lists of 76 PCDDs, 136 PCDFs and 210 PCBs (inclusive of three non-chlorinated base structures) are shown in Supplementary Table S2, Table S3  and Table S4, respectively. The use of numberings herein will be in accordance with those in the lists [37][38][39]. As shown in Supplementary Table S5, congeners can be categorised into homologue groups by their empirical formulas or degrees of substitution.
To the best of our knowledge, in comparison to the existing literature, this is the most comprehensive theoretical study in terms of the number of compounds in the study and the level of theory employed for the investigation. Computational results are presented in terms of relative stability and planarity of compounds. These calculated results are compared to experimental results in the literature in terms of isomer distribution and toxicity. The better understanding of the nature of these compounds can lead us to a better management and more sustainable solutions to POPs in the future.

Relative stability
The association between substitution patterns of each compound and relative stability with respect to its isomers was assessed. Substitution pattern parameters include degree of chlorination (using number of H atoms, or number of Cl atoms subtracted from 10 for PCBs and 8 for PCDDs and PCDFs) and intra-ring and cross-ring interactions between two substituents. Intra-ring interactions are quantified by the numbers of pairs of Cl and non-hydrogen substituents (Cl, O and C) at o-, m-and p-positions on each phenyl ring. Cross-ring interactions are defined as the numbers of pairs of Cl substitution over an O-bridge (CR-O) and a C-C bond (CR-C). Possible parameters for intra-ring and cross-ring interactions are unique for each class of compounds as shown in Table 1. The total numbers of o-, m-and p-pairs were also calculated from the tally of substituent pairs mentioned earlier. Overall, as listed in the table, there are a total of 11, 15 and 11 potential parameters for PCDDs, PCDFs and PCBs, respectively. The following simple linear model was proposed for relative stability prediction based on HF, B3LYP G at 300K and MP2 electronic energy: s = x0 + 10 2 x1 + 10 4 x2 + 10 6 x3 + 10 8 x4 + ... where s is the energy score and x0, x1, x2,.. are possible parameters. Coefficients of 10 2n are used as some variables can take up to two-digit values (e.g. 0 to 12). The association between energy scores and calculated energy values was assessed using Spearman's correlation coefficient (). Inspired by the knapsack problem, parameters that give the highest  values were added to the model one by one until the increase in  value was less than .0001.   (7,8) and (8,9).
Weights used in this rank modelling are to provide a non-overlapping priority of consideration for each predictor variable. Qualitative interpretation then becomes obvious. Multiple linear regression for the prediction of energy values, following similar knapsack approach, was also explored and shown in Supplementary Table S6. In this case, the priorities of predictors overlap and the already high  values can be increased further. Our computational model is able to predict the relative stabilities of all of these classes of compounds with stronger correlation than linear models proposed earlier [16][17][18][19], while not increasing the numbers of parameters.

Isomer distribution
Energetic data in the previous subsection were used to predict the distribution of isomers within each homologue group. For this analysis, energy values were used to evaluate the distribution of isomers using the Boltzmann distribution at 300 K, 600 K and 900 K [40]. Results were compared to available data on abundance of these compounds in nature [37,[40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55][56]. A brief review of these data in the literature is shown in Supplementary Table S7. Our PCDDs and PCDFs results agree with a limited number of reports on experimental isomer distribution from incineration sources while no clear trends were observed for PCBs. (See Supplementary Table S8. Median  values calculated for each homologue group are 0.5833, 0.6555 and −0.1044 for PCDDs, PCDFs and PCBs respectively.) For all three classes, the distribution of compounds found in nature usually differ due to different accumulation and decomposition mechanisms in biotic and abiotic sources. For PCBs, it was argued that the distribution of isomers from the source is kinetically controlled rather than thermodynamically controlled [26].

Planarity
Planarity is widely claimed to be an important factor for these compounds to bind with the aryl hydrocarbon receptor (AhR) [57]. Therefore, geometric data were extracted and selected dihedral angles were calculated to represent the coplanarity of two phenyl rings. Designated carbon atoms for dihedral angle calculation are shown in Figure 1 and the list below.
The lists for PCDDs and PCDFs are separated into two groups representing two orthogonal axes of rotation. Absolute values of the acute angle representations of the angles were used for mean calculation of each group.
Our data show that most PCDDs are planar, and therefore data are insufficient for further prediction.
For PCDFs, with a limited number of compounds exhibiting non-planarity, there appears to be a moderate trend (R values ranging from .61 to .65) between substitution pattern and coplanarity. The significant parameters (p<.05) involved are the presence/absence of a substituent pair at positions 1 and 9 and at positions 2 and 8. Reasons for this finding may be unclear due to limited number of data points (n=13) for analysis and moderate  values. The prediction models were derived from multiple linear regression and are shown in Table S9.
For PCBs, dihedral angle prediction based on chlorine substitution position was also assessed using multiple linear regression. In this model, dihedral angle A of a compound is predicted by the equation: where C0 is a constant, C1, C2, C3 and C4 are coefficients to be determined and xi is 1 if position i is occupied by a Cl atom and 0 if otherwise. Table 3 shows the coefficients of significant parameters (p<.05) and corresponding correlation coefficients for the predictions. Results from all methodologies show that the most important parameter is Cl substitution at any one of the four positions ortho to the C-C bond connecting two rings as mentioned in earlier findings [57]. It is then followed by the same kind of substitution on both rings (product of x2+x6 and x2'+x6'). Meta substitutions according to all methods and para substitutions according to MP2 results are less important parameters as C3 and C4 are relatively small.

Toxicity
When comparing our geometric data to the toxic equivalency factor (TEF) values of these compounds [2] shown in Supplementary Table S10, the following are observed.
• The twelve-known dioxin-like-PCBs with TEF values are among the ones with relatively small dihedral angles or closer to being planar. • According to B3LYP and MP2 results, there are 3 non-coplanar PCDDs (PCDDs-46, 65 and 71) from both methods and 2 additional (PCDDs-42 and 45) from B3LYP only. All of these are not substituted at all four lateral positions (2,3,7,8), which are the substitution positions of PCDDs known to be toxic [2]. One of these, PCDD-46, is substituted at positions exactly opposite to that of 2,3,7,8-tetrachlorodibenzodioxin (TCDD), the most toxic dioxin-like compound [2].

Discussion, Conclusion and Future work
Overall, this study provides high quality quantum chemical calculations for energetic and geometric properties of PCDDs, PCDFs and PCBs. Comprehensive data provided for all of these three-related classes of compounds can be used for further investigations. From our computational results, we have proposed prediction models for important characteristics of these compounds. Relative stability prediction extends beyond earlier prediction models [16][17][18][19] with high accuracy and with allowance of qualitative interpretations. Prediction models have also been proposed for planarity, an important structural parameter for the toxic activities of these compounds. Furthermore, we have made associations between theoretical calculations (relative stability and planarity) and experimental data in the literature (abundance and toxicity).
Our work can be used for further analyses on the persistence and activities of the toxic compounds in nature. In terms of methodology, semi-automated codes in the SI can be extended further for construction and analysis of other related compounds. All three methods, HF, B3LYP and MP2, yielded similar trends for the classes of compounds in our study. Our results suggest that computationally expensive treatment for correlation energy may not be needed for this purpose and a low-level method like HF, in most cases, can sufficiently provide insightful findings on the molecular properties of the studied classes of compounds. Further investigation of different variations of DFT methods may also be useful. For example, range-separated hybrid functionals such as B97XD can be used for comparison with the results presented here.

Materials and Methods
We extended our exhaustive combinatorial investigation approach [58][59][60] to all members of the three classes of compounds. Becke three-parameter hybrid functional combined with Lee-Yang-Parr correlation functional (B3LYP), the most popular variation of DFT was selected for this work. In total, three methods, HF/6-311++G(d,p), B3LYP/6-311++G(d,p) and MP2/cc-pVTZ [61], were used on Q-Chem 5.1 (developer version) [62] for geometry optimization. The HF-and B3LYP-optimised geometries were confirmed to be minima by verifying the absence of imaginary frequencies. All raw output files and incomplete attempts made with MP2/6-311++G(d,p) are provided in supplementary materials. The process from structure enumeration to analysis of molecular properties, particularly relative energy and planarity were completed by semi-automated scripts written in Mathematica for this project. The Mathematica notebook can be found in the supplementary materials. These Mathematica codes can be used for construction and analysis of other compounds and for teaching purposes in classes relating to molecular modelling and cheminformatics. Our computational results were compared to relative abundance of the compounds in nature and their estimated toxicity in humans and animals. Analyses presented are derived from energetic and geometric data extracted from the computational results.
For energetic information, electronic energy of each compound was obtained from results of all methodologies and, additionally, enthalpy, entropy and Gibbs energy values were obtained from frequency jobs of HF and B3LYP. Due to free rotation about C-C bonds connecting the two phenyl rings in PCBs, 78 potential atropisomers [38] were explored and only the structures with the lower energy values were used for further calculations.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: Summary of computational data and analyses on PCDDs, PCDFs and PCBs in the literature, Table S2: List of all PCDDs by substitution positions and numbering, proposed by Ballschmiter et al., revised by Dorofeeva et al., Table S3: List of all PCDFs by substitution positions and numbering, proposed by Ballschmiter et al., Table 4S: List of all PCBs by substitution positions and numbering, first proposed by Ballschmiter et al., later revised by IUPAC, Table 5S: List of congener groups of PCDDs, PCDFs and PCBs by degree of substitution, Table 6S: Values for constant C0 and coefficients C1, C2, C3,…, parameters x1, x2, x3,… and correlation coefficients for the energy prediction model E = C0 + C1x1 + C2x2 + C3x3 + …, where E is the predicted HF, B3LYP G at 300K in Hartree and MP2 electronic energy in Hartree, Table S7: Summary of experimental data on abundance of PCDDs, PCDFs and PCBs in the literature, Table S8: Rank correlation coefficients ( values) for association between predicted distribution (using Boltzmann distribution) of PCDDs and PCDFs based on relative stability and observed distribution from MWI fly ash samples, Table S9: Values for constant C0, coefficients C1, C2 and correlation coefficients for the PCDF coplanarity prediction model, Table S10