#### 2.1. Relative Stability

The association between the substitution pattern of each compound and its stability relative to its isomers was assessed. The substitution pattern parameters include the degree of chlorination (expressed as the number of H atoms, i.e., the number of Cl atoms subtracted from 10 for PCBs and from 8 for PCDDs and PCDFs) and the intra-ring and cross-ring interactions between two substituents. Intra-ring interactions are quantified by the number of pairs of Cl and non-hydrogen substituents (Cl, O, and C) at *o*-, *m*-, and *p*- positions on each phenyl ring. Cross-ring interactions are defined as the number of pairs of Cl substituents across an O-bridge (CR-O) and across a C-C bond (CR-C). The possible parameters for intra-ring and cross-ring interactions are unique to each class of compounds, as shown in Table 1. The total numbers of *o*-, *m*-, and *p*- pairs were also calculated from the tally of substituent pairs mentioned earlier. Overall, as listed in the table, there are a total of 11, 15, and 11 potential parameters for PCDDs, PCDFs, and PCBs, respectively. The following simple linear model was proposed for relative stability prediction based on HF, B3LYP $\Delta G$ at 300 K, and MP2 electronic energy:

$$S = x_0 + 10^{2}x_1 + 10^{4}x_2 + 10^{6}x_3 + 10^{8}x_4 + \dots$$

where $S$ is the energy score and $x_0$, $x_1$, $x_2$, … are possible parameters. Coefficients of $10^{2n}$ are used because some variables can take two-digit values (0 to 12, for example). The association between the energy scores and the calculated energy values was assessed using Spearman's correlation coefficient (ρ). Inspired by the knapsack problem, the parameter that gives the highest ρ value was added to the model at each step, one by one, until the increase in the ρ value became less than 0.0001.
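The selection procedure described above can be sketched as a greedy forward selection. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy parameter names (`n_H`, `ortho`, `meta`), and the toy energies are hypothetical, and it is assumed that the first parameter selected receives the largest weight.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5


def energy_score(row: Dict[str, int], priority: Sequence[str]) -> int:
    """Score with weights 10^(2n); `priority` runs from highest to lowest.

    Assumes every parameter value is below 100 so that digits never overlap.
    """
    score = 0
    for name in priority:
        score = score * 100 + row[name]
    return score


def greedy_select(rows: Sequence[Dict[str, int]], energies: Sequence[float],
                  candidates: Sequence[str],
                  min_gain: float = 1e-4) -> Tuple[List[str], float]:
    """Add the parameter giving the highest rho until the gain is < min_gain."""
    chosen: List[str] = []
    best_rho = float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = []
        for name in remaining:
            scores = [energy_score(r, chosen + [name]) for r in rows]
            trials.append((spearman_rho(scores, energies), name))
        rho, name = max(trials)
        if rho - best_rho < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best_rho = rho
    return chosen, best_rho


# Toy data (hypothetical values, for illustration only).
toy_rows = [
    {"n_H": 4, "ortho": 2, "meta": 1},
    {"n_H": 4, "ortho": 1, "meta": 2},
    {"n_H": 5, "ortho": 0, "meta": 0},
    {"n_H": 5, "ortho": 1, "meta": 1},
]
toy_energies = [3.0, 1.0, 5.0, 7.0]
print(greedy_select(toy_rows, toy_energies, ["n_H", "ortho", "meta"]))
```

Because Spearman's ρ depends only on ranks, the absolute size of the weights does not matter as long as the digits of different parameters do not overlap.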

Table 2 shows the resulting parameters, along with the corresponding ρ values. The results from all methodologies gave ρ > 0.99 and were mostly in agreement with each other. The major predictors for all compounds, in addition to the degree of substitution, are the intra-ring interactions. The predictor with the highest priority is the number of substituent pairs at *o*- positions, regardless of substituent type, followed by substitutions at the *m*- and *p*- positions, respectively. This is consistent with the steric hindrance expected from the substituents (bulkiness: Cl > O ≈ C > H). Further, a cross-ring interaction over a C-C bond is an important parameter, especially for PCDFs, whereas a cross-ring interaction over an O-bridge has a lower priority for PCDDs. With these results, the stability ranks of these compounds, regardless of methodology, can be sufficiently explained by their substituent pair interactions. For example, the energy scores $S$ of PCDD-13 and PCDD-14, when fitted according to the B3LYP results, can be calculated as:

$$
\begin{aligned}
S_{\text{PCDD-13}} &= x_0 + 10^{2}x_1 + 10^{4}x_2 + 10^{6}x_3 + 10^{8}x_4 + 10^{10}x_5 \\
&= 1 + 10^{2}(0) + 10^{4}(1) + 10^{6}(2) + 10^{8}(3) + 10^{10}(5) \\
&= 50{,}302{,}010{,}001
\end{aligned}
$$

$$
\begin{aligned}
S_{\text{PCDD-14}} &= x_0 + 10^{2}x_1 + 10^{4}x_2 + 10^{6}x_3 + 10^{8}x_4 + 10^{10}x_5 \\
&= 2 + 10^{2}(1) + 10^{4}(1) + 10^{6}(1) + 10^{8}(3) + 10^{10}(5) \\
&= 50{,}301{,}010{,}102
\end{aligned}
$$

PCDD-13 ranks higher than PCDD-14 and is predicted to have a higher energy and less stability.
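The arithmetic of this worked example can be checked with a few lines of Python. The sketch below uses the parameter values quoted in the example; the function name `score_from_values` is hypothetical. It also illustrates that comparing two scores is equivalent to comparing the parameter lists lexicographically, starting from the highest-priority parameter.

```python
from typing import Sequence


def score_from_values(values_low_to_high: Sequence[int]) -> int:
    """Sum of x_n * 10^(2n), where values_low_to_high[n] is x_n."""
    return sum(x * 10 ** (2 * n) for n, x in enumerate(values_low_to_high))


# Parameter values x_0 ... x_5 taken from the worked example.
pcdd_13 = [1, 0, 1, 2, 3, 5]
pcdd_14 = [2, 1, 1, 1, 3, 5]

s13 = score_from_values(pcdd_13)
s14 = score_from_values(pcdd_14)
print(f"{s13:,}")  # 50,302,010,001
print(f"{s14:,}")  # 50,301,010,102

# The score ordering matches a lexicographic comparison that starts
# from the highest-priority parameter (x_5) and moves down to x_0.
print(s13 > s14, pcdd_13[::-1] > pcdd_14[::-1])  # True True
```

The two compounds tie on $x_5$ and $x_4$, so the ranking is decided by $x_3$ (2 versus 1), and the lower-priority parameters cannot reverse it.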

The weights used in this rank model give each predictor variable a non-overlapping priority of consideration, which makes the qualitative interpretation straightforward. Multiple linear regression for the prediction of energy values, following a similar knapsack approach, was also explored and is shown in Supplementary Table S6. In this case, the priorities of the predictors overlap and the already high ρ values can be increased further. Our computational model is able to predict the relative stabilities of all of these classes of compounds with a stronger correlation than the linear models proposed earlier [16,17,18,19], while not increasing the number of parameters.

#### 2.2. Isomer Distribution

The energetic data in the previous subsection were used to predict the distribution of isomers within each homologue group. For this analysis, the energy values were converted into isomer distributions using the Boltzmann distribution at 300, 600, and 900 K [40]. The results were compared with available data on the abundance of these compounds in nature [37,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]. A brief review of these data in the literature is given in Supplementary Table S7. Our PCDD and PCDF results agree with a limited number of reports on experimental isomer distributions from incineration sources, while no clear trends were observed for PCBs (see Supplementary Table S8; the median ρ values calculated for each homologue group are 0.5833, 0.6555, and −0.1044 for PCDDs, PCDFs, and PCBs, respectively). For all three classes, the distribution of compounds found in nature usually differs because of the different accumulation and decomposition mechanisms in biotic and abiotic sources. For PCBs, it was argued that the distribution of isomers from the source is kinetically controlled rather than thermodynamically controlled [26].
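As an illustration of the Boltzmann analysis, the sketch below converts relative isomer energies into fractional populations at the three temperatures. It is a minimal example under stated assumptions: energies are taken to be in kJ/mol, degeneracy and symmetry factors are ignored, and the function name and the example energies are hypothetical rather than values from this work.

```python
import math
from typing import List, Sequence

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def boltzmann_fractions(energies_kj_mol: Sequence[float],
                        temperature_k: float) -> List[float]:
    """Fractional population of each isomer within one homologue group.

    Energies are shifted by the minimum so that the exponentials stay
    well-scaled; the shift cancels in the normalisation.
    """
    e_min = min(energies_kj_mol)
    weights = [
        math.exp(-(e - e_min) / (R_KJ_PER_MOL_K * temperature_k))
        for e in energies_kj_mol
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical relative energies (kJ/mol) of three isomers.
example_energies = [0.0, 2.0, 5.0]
for temperature in (300.0, 600.0, 900.0):
    fractions = boltzmann_fractions(example_energies, temperature)
    print(temperature, [round(f, 3) for f in fractions])
```

Raising the temperature flattens the distribution, so the most stable isomer dominates less at 900 K than at 300 K.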

#### 2.3. Planarity

Planarity is widely claimed to be an important factor for these compounds to bind with the aryl hydrocarbon receptor (AhR) [57]. Therefore, geometric data were extracted and selected dihedral angles were calculated to represent the coplanarity of the two phenyl rings. The designated carbon atoms for the dihedral angle calculation are shown in Figure 1 and listed below.

| **Class** | **Group** | **Carbon atoms defining each dihedral angle** |
|---|---|---|
| PCDD | 1 | $C_a,C_c,C_d,C_h$; $C_b,C_d,C_c,C_g$; $C_a,C_e,C_f,C_h$; $C_b,C_f,C_e,C_g$ |
| PCDD | 2 | $C_a,C_d,C_f,C_g$; $C_b,C_c,C_e,C_h$ |
| PCDF | 1 | $C_i,C_k,C_l,C_p$; $C_j,C_l,C_k,C_o$; $C_i,C_m,C_n,C_p$; $C_i,C_n,C_m,C_o$ |
| PCDF | 2 | $C_i,C_l,C_n,C_o$; $C_j,C_k,C_m,C_p$ |
| PCB | – | $C_q,C_s,C_t,C_u$; $C_q,C_s,C_t,C_v$; $C_r,C_s,C_t,C_u$; $C_r,C_s,C_t,C_v$ |

The lists for PCDDs and PCDFs are separated into two groups representing two orthogonal axes of rotation. For each group, the mean was calculated from the absolute values of the dihedral angles expressed as acute angles.
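The dihedral angle calculation and the acute-angle averaging can be sketched as follows. This is an illustrative example, not the code used in this work: the function names, the atom labels, and the toy coordinates are hypothetical, and the standard atan2-based dihedral formula is assumed.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    length = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / length, b2[1] / length, b2[2] / length)
    m1 = _cross(n1, b2_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def acute_angle(angle_deg: float) -> float:
    """Fold any angle into the range 0..90 degrees."""
    a = abs(angle_deg) % 180.0
    return 180.0 - a if a > 90.0 else a


def mean_acute_dihedral(coords: Dict[str, Vec],
                        quadruples: Sequence[Sequence[str]]) -> float:
    """Mean of the acute dihedral angles over one group of atom quadruples."""
    angles = [
        acute_angle(dihedral_deg(*(coords[label] for label in quad)))
        for quad in quadruples
    ]
    return sum(angles) / len(angles)


# Toy coordinates (hypothetical): a planar arrangement and a twisted one.
toy = {
    "a": (0.0, 1.0, 0.0), "b": (0.0, 0.0, 0.0),
    "c": (1.0, 0.0, 0.0), "d": (1.0, -1.0, 0.0),  # trans, still planar
    "e": (1.0, 0.0, 1.0),                         # perpendicular
}
print(mean_acute_dihedral(toy, [("a", "b", "c", "d")]))
print(mean_acute_dihedral(toy, [("a", "b", "c", "e")]))
```

Folding to the acute angle treats 0° and 180° as equally planar, which is why the trans arrangement in the toy data counts as coplanar.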

Our data show that most PCDDs are planar, and therefore the data are insufficient for further prediction.

For PCDFs, for which only a limited number of compounds exhibit non-planarity, there appears to be a moderate correlation (*R* values ranging from 0.61 to 0.65) between the substitution pattern and coplanarity. The significant parameters (*p* < 0.05) are the presence or absence of a substituent pair at positions 1 and 9 and at positions 2 and 8. The reasons for this finding are unclear, owing to the limited number of data points (*n* = 13) available for analysis and the moderate correlation. The prediction models were derived from multiple linear regression and are shown in Supplementary Table S9.

For PCBs, a dihedral angle prediction based on the chlorine substitution positions was also assessed using multiple linear regression. In this model, the dihedral angle $A$ of a compound is predicted by a regression equation in which $C_1$, $C_2$, $C_3$, and $C_4$ are the coefficients to be determined, and $x_i$ is 1 if position $i$ is occupied by a Cl atom and 0 otherwise.

Table 3 shows the coefficients of the significant parameters (*p* < 0.05) and the corresponding correlation coefficients for the predictions. The results from all methodologies show that the most important parameter is Cl substitution at any one of the four positions ortho to the C-C bond connecting the two rings, as mentioned in earlier findings [57]. This is followed by the same kind of substitution on both rings (the product of $x_2 + x_6$ and $x_{2'} + x_{6'}$). Meta substitutions, according to all methods, and para substitutions, according to the MP2 results, are less important parameters, as $C_3$ and $C_4$ are relatively small.
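To make the predictors concrete, the sketch below builds the four substitution terms discussed above from a set of chlorinated positions. It rests on assumptions: the grouping of terms (ortho count, ortho product across rings, meta count, para count), the optional intercept, and all names are hypothetical, and the coefficients must be supplied by the reader (for example from Table 3); none are hard-coded here.

```python
from typing import List, Sequence, Set


def pcb_features(cl_positions: Set[str]) -> List[int]:
    """Return [ortho count, ortho product across rings, meta count, para count].

    Positions are labelled "2".."6" on one ring and "2'".."6'" on the other.
    x(i) is 1 if position i carries a Cl atom and 0 otherwise.
    """
    def x(position: str) -> int:
        return 1 if position in cl_positions else 0

    ortho_ring_a = x("2") + x("6")
    ortho_ring_b = x("2'") + x("6'")
    meta = x("3") + x("5") + x("3'") + x("5'")
    para = x("4") + x("4'")
    return [ortho_ring_a + ortho_ring_b, ortho_ring_a * ortho_ring_b, meta, para]


def predict_dihedral(features: Sequence[int], coefficients: Sequence[float],
                     intercept: float = 0.0) -> float:
    """Linear combination of the features (assumed form of the regression)."""
    if len(features) != len(coefficients):
        raise ValueError("one coefficient is needed per feature")
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Example: 2,2',4,6-tetrachlorobiphenyl substitution pattern.
print(pcb_features({"2", "6", "4", "2'"}))  # [3, 2, 0, 1]
```

The product term is non-zero only when both rings carry at least one ortho Cl, which is how a single regression term can capture ortho substitution on both rings.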