3.1. Analysis of the Shortest C–Hal...Hal–C Distances
The shortest C–Hal...Hal–C distances of various types in crystals of organic compounds are reported in Table 2. The plausibility of such distances is usually not assessed, although an almost twofold reduction relative to 2RHal apparently indicates errors in the structure determination. In this work, the following approach was used to assess the shortest distances. All distances of each type were sorted in ascending order, and if the gap between a contact and the next longer one exceeded 0.1 Å, the shorter contact was classified as unrealistically short. Unrealistically short Hal...Hal distances may reflect general errors in the structure determination; therefore, not only these short distances but also all other distances in the corresponding substances (records) were excluded from further analysis. Only the I...I subsets required no removals. In the remaining subsets, the entries PUCREL, FFMXZP, JODVAZ, VAWNUE, MIWHOP01, and XACXEE in the Org set and TUKNOC, KUSMOD, and CUCNUM in the Orgmet set were deleted. The number and range of the distances used in further analysis are shown in Table 3. The results obtained below indicate that the previously used value RF = 1.40 Å is underestimated; therefore, the sample of C–F...F–C distances was also extended, and this type of distance occupies two rows in Table 3.
It can be seen that, in the corrected arrays, the reduction in the shortest distances compared to 2RHal (proposed in [26,27,28]) does not exceed ~20%, which seems reasonable.
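The screening rule described in this section can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name, the record identifiers, and the distances are hypothetical, and the sketch assumes that gaps larger than 0.1 Å occur only in the short tail of the sorted list, so that the last such gap marks the cut.

```python
def screen_contacts(contacts, max_gap=0.1):
    """Screen (record_id, distance) pairs for one contact type.

    Distances are sorted in ascending order. Wherever the gap between a
    contact and the next longer one exceeds max_gap (in angstroms), that
    contact and all shorter ones are treated as unrealistically short.
    Every distance from a record with such a contact is then dropped.
    Assumption: gaps of this size appear only in the short tail.
    """
    ordered = sorted(contacts, key=lambda c: c[1])
    cut = 0  # number of leading contacts classified as too short
    for i in range(len(ordered) - 1):
        if ordered[i + 1][1] - ordered[i][1] > max_gap:
            cut = i + 1
    flagged = {record for record, _ in ordered[:cut]}
    retained = [c for c in ordered if c[0] not in flagged]
    return flagged, retained


# Hypothetical records and distances (angstroms), for illustration only.
contacts = [
    ("REC_A", 1.95), ("REC_A", 3.30),
    ("REC_B", 3.05),
    ("REC_C", 3.10), ("REC_C", 3.18),
    ("REC_D", 3.22),
]
flagged, retained = screen_contacts(contacts)
```

In this example the 1.95 Å contact is separated from the next one by more than 0.1 Å, so the whole record REC_A is excluded, including its otherwise unremarkable 3.30 Å distance.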
3.2. Analysis of Distance Distributions
Often, when van der Waals radii are determined, only the shortest distances are considered. However, van der Waals radii are most valuable when they allow estimation of the most probable non-valence distances rather than the shortest ones. In this paper, for each type of C–Hal...Hal–C distance in each of the two main sets, Org and Orgmet, two variants of distance arrays were analyzed. The first included all distances of the corresponding type within dmax in the selected substances, while the second included only the single shortest distance of the corresponding type from each selected substance; thus, the longest distances in the second array also did not exceed dmax. Histograms were constructed for these arrays; examples for the Org set are shown in Figure 1 and Figure 2.
The maxima in all histograms were described by Gaussian functions (red lines in Figure 1 and Figure 2). It should be noted that, when the histogram step changes, the appearance of the histogram changes to some extent (for example, Figure 1a,b shows the distributions of the F...F distances with steps of 0.2 and 0.1 Å), but the position of the maximum of the Gaussian function remains almost constant (see Table 4 and Table 5). In Figure 1a, the position and even the presence of the maximum are not obvious, which, as noted earlier, is most likely a consequence of the initially chosen Ragg value for the F...F distances being underestimated. Therefore, additional arrays with dmax = 3.50 Å were analyzed for this type of distance. Table 4 shows that, in this case, as for the other types of distances, the distribution parameters depend very little on the histogram step. Table 6 shows the differences between the positions of the maxima of the Gaussian functions for the different histogram options.
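To illustrate why the position of a Gaussian maximum is insensitive to the histogram step, the sketch below estimates the peak from the tallest bin and its two neighbours. This is a simple three-point log-parabolic estimate substituted here for brevity, not the fitting procedure used in this work (which is not specified in this section); the function names and the synthetic distribution (centre 3.83 Å, width 0.30 Å) are assumptions made for the example.

```python
import math


def gaussian_peak(centers, counts):
    """Estimate (position, sigma) of a Gaussian-shaped histogram maximum.

    The logarithm of a Gaussian is a parabola, so the tallest interior bin
    and its two neighbours (equally spaced, all with positive counts)
    determine the peak position and width.
    """
    i = max(range(1, len(counts) - 1), key=lambda k: counts[k])
    step = centers[i + 1] - centers[i]
    y_lo, y_mid, y_hi = (math.log(counts[k]) for k in (i - 1, i, i + 1))
    curvature = y_lo - 2.0 * y_mid + y_hi  # negative at a maximum
    position = centers[i] + step * (y_lo - y_hi) / (2.0 * curvature)
    sigma = math.sqrt(-step * step / curvature)
    return position, sigma


def sampled_gaussian(mu, sigma, start, step, nbins, height=100.0):
    """Synthetic bin centres and Gaussian-shaped counts (illustration only)."""
    centers = [start + step * (k + 0.5) for k in range(nbins)]
    counts = [height * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
              for x in centers]
    return centers, counts


# The same hypothetical distribution binned with steps of 0.2 and 0.1 angstroms.
coarse = gaussian_peak(*sampled_gaussian(3.83, 0.30, 3.0, 0.2, 8))
fine = gaussian_peak(*sampled_gaussian(3.83, 0.30, 3.0, 0.1, 16))
```

For ideal Gaussian-shaped counts, both step sizes return the same position and width. With real, noisy histograms a least-squares fit over all bins would be more robust, and small step-dependent shifts of the kind listed in Table 6 are to be expected.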
The results for I…I contacts in the Orgmet set do not quite match the trends for other types of contacts in several cases. Perhaps this outcome is due to the significantly smaller number of I…I contacts in this set, especially for the sample of the shortest contacts.
In the Org set, the differences in the positions of the maxima for histograms with steps of 0.2 and 0.1 Å do not exceed 0.011 Å. The same maximum difference appears in the Orgmet set if the results for contacts I…I are not considered. Thus, the differences in the positions of the maxima between Org and Orgmet can be considered significant if they exceed 0.01 Å.
It turns out that the positions of the maxima for the shortest contacts of all types in Orgmet correspond to significantly shorter distances than in Org: the contraction is in the range of 0.023–0.055 Å for contacts F…F, Cl…Cl, and Br…Br. For the same contacts from the All arrays, a different pattern is observed: for F…F, on average, shorter (by 0.050–0.058 Å) contacts exist in the Org set, Cl…Cl contacts are also shorter in Org but only by 0.013–0.014 Å, and the Br…Br contacts are on average longer (by 0.017–0.022 Å) in Org than in Orgmet.
As expected, for all types of distances in both sets (Org and Orgmet), the maxima for the First contacts lie at shorter distances than the maxima for All. Conceptually, the sum of the van der Waals radii should lie between the two: above the position of the maximum for First and below the position of the maximum for All.
The expression (xall + xfirst)/4 was used to estimate the van der Waals radii; that is, the midpoint between the two maxima was taken as the sum of two radii and then halved. The values obtained in this way (Table 7) are in excellent agreement with each other (within 0.01 Å), both for the different histogram steps and for the Org and Orgmet sets. An exception is the discrepancy in the RI estimates for Org and Orgmet, which, as noted above, may be due to an insufficient number of contacts in the C–I…I–C sample for Orgmet.
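The arithmetic behind this estimate can be sketched as follows. The function names and the numerical maxima are hypothetical and serve only to show how the expression is applied and how agreement within 0.01 Å might be checked; they are not values from Table 7.

```python
def vdw_radius(x_all, x_first):
    """Radius estimate (angstroms) from the All and First maxima."""
    contact_sum = (x_all + x_first) / 2.0  # midpoint taken as the sum of two radii
    return contact_sum / 2.0               # one radius is half of that sum


def agree(values, tolerance=0.01):
    """True if all estimates lie within `tolerance` angstroms of each other.

    A tiny slack absorbs floating-point rounding at the boundary.
    """
    return max(values) - min(values) <= tolerance + 1e-9


# Hypothetical maxima (angstroms) for one contact type and two histogram steps.
r_step_02 = vdw_radius(x_all=3.86, x_first=3.74)
r_step_01 = vdw_radius(x_all=3.87, x_first=3.75)
consistent = agree([r_step_02, r_step_01])
```

With these illustrative inputs the two estimates are about 1.90 and 1.905 Å, which would count as agreement under the 0.01 Å criterion.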
Table 8 compares the results of determining the van der Waals radii in this work with some data from the literature.
The values obtained in this work are in good agreement with the data of [30], which were obtained using a similar technique. At the same time, it is important to note the good agreement between the values for Cl, Br, and I and the results of [26,27,28], which were obtained by another method and were previously used to estimate the values of Ragg statistically. Thus, the previously obtained data on halogen aggregation involving these atoms remain relevant, while the data on F-aggregation can be revised considering the new value of RF.
Overall, the results indicate that the following van der Waals radii for halogen atoms bonded to a carbon atom can be recommended as unified values for crystals of organic and organometallic compounds under normal conditions: RF = 1.57, RCl = 1.90, RBr = 1.99, and RI = 2.15 Å. The value of RI in crystals of organometallic compounds should be refined when data for a larger number of structures become available.
3.3. The M–Hal…Hal–M Distances
Halogen bonds involving M–Hal groups have been the subject of many studies [3,4,34,35,36]. However, as a rule, contacts of halogen atoms from such groups have been considered either with other elements or with halogen atoms that do not form M–Hal bonds. M–Hal…Hal–M contacts have rarely been noted by researchers [37]. Therefore, one of the goals of this work was to analyze the M1–Hal1…Hal2–M2 distances statistically. As in the analysis of the C–Hal1…Hal2–C distances, only the distances between the same atoms (Hal1 = Hal2, M1 = M2) under normal conditions were considered. The rules for selecting substances for the MHal set are described in more detail in Section 4.
Data on the number of different metal elements (M) with M–Hal bonds, according to the CSD [38] search, are provided in Table 9. Detailed information about the number of symmetrically independent bonds of each M–Hal type is given in Table 10. It can be seen that, among the studied substances, some types of M–Hal bonds are very rare. Therefore, in addition to the number of metal elements that have at least one particular M–Hal bond in all crystals, Table 9 presents the numbers of M with more than 20 and more than 40 symmetrically independent M–Hal bonds in total.
Table 10 shows the statistical data characterizing all the M–Hal groups found in the CSD (for crystals studied under normal conditions): NM–Hal is the number of symmetrically independent bonds, and NHal…Hal is the number of M–Hal…Hal–M distances ≤ Ragg. The table also lists the Hal-aggregation coefficient for M–Hal bonds, kMHal-agg = NHal…Hal/NM–Hal, which is proposed in this work to estimate the propensity of M–Hal groups of a certain type to participate in aggregates formed by M–Hal…Hal–M contacts. The higher the kMHal-agg value is, the more likely it is that an M–Hal group with a specific M and Hal forms at least one M–Hal…Hal–M contact with length ≤ Ragg. In this part of the work, Ragg = 2RHal + 0.5 Å was calculated using the RHal values obtained in the previous stage from the analysis of the C–Hal…Hal–C distances (Section 3.2), namely RF = 1.57, RCl = 1.90, RBr = 1.99, and RI = 2.15 Å.
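The two quantities defined above, Ragg = 2RHal + 0.5 Å and kMHal-agg = NHal…Hal/NM–Hal, can be sketched in a few lines. The radii are those given in the text; the function names, the list of distances, and the bond count are hypothetical and chosen only to show the calculation.

```python
# Van der Waals radii (angstroms) as given in the text.
R_HAL = {"F": 1.57, "Cl": 1.90, "Br": 1.99, "I": 2.15}


def r_agg(hal, margin=0.5):
    """Aggregation cut-off for a Hal...Hal distance: 2 * R_Hal + 0.5 angstroms."""
    return 2.0 * R_HAL[hal] + margin


def count_contacts(distances, hal):
    """Number of Hal...Hal distances not exceeding the cut-off."""
    limit = r_agg(hal)
    return sum(1 for d in distances if d <= limit)


def aggregation_coefficient(n_contacts, n_bonds):
    """k = N(Hal...Hal) / N(M-Hal) for one type of M-Hal group."""
    if n_bonds <= 0:
        raise ValueError("at least one symmetrically independent bond is required")
    return n_contacts / n_bonds


# Hypothetical M-Cl...Cl-M distances (angstroms) and bond count.
cl_distances = [3.45, 3.80, 4.25, 4.40, 4.90]
n_cl_contacts = count_contacts(cl_distances, "Cl")  # cut-off is 4.30 angstroms
k_cl = aggregation_coefficient(n_cl_contacts, n_bonds=4)
```

With the radii above, the cut-offs are 3.64, 4.30, 4.48, and 4.80 Å for F, Cl, Br, and I, respectively. Because one bond can take part in several contacts, the coefficient can exceed 1.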
Obviously, the kMHal-agg coefficient has predictive value only if the number of symmetrically independent M–Hal bonds in the training set is sufficiently large. The smallest numbers of such bonds were found for the M–F groups; therefore, the NM–Hal thresholds used in the analysis of kMHal-agg were chosen with the available NM–F values in mind. The graphs illustrating how kMHal-agg changes with M and Hal show values for metals having more than 20 symmetrically independent M–Hal bonds in the set (Figure 3a) and more than 40 bonds (Figure 3b). For convenience, the abscissa scales in Figure 3a,b are identical, and the elements M are listed in alphabetical order.
In the set under consideration, when the condition NM–Hal > 20 is fulfilled (Figure 3a), both the maximum and minimum values of kMHal-agg correspond to bonds involving fluorine. The largest value of kMHal-agg (2.17) is found for the Sb–F group and the smallest (0) for the Ti–F group; i.e., each Sb–F group participates on average in more than two symmetrically independent Sb–F...F–Sb distances, while the Ti–F groups show no tendency to form Ti–F...F–Ti contacts. It should be noted that the number of symmetrically independent M–F groups in both cases is not large (35 each), and the values of kMHal-agg may change notably as this number increases.
In the subset with NM–Hal > 40 (Figure 3b), the largest value of kMHal-agg (1.17) is found for the Ge–Cl group, and the value for Hg–I is close to it (1.09). In general, for the Hg–Hal groups (Hal = Cl, Br, I), the kMHal-agg values are high (0.80–1.09), while the number of such groups in the set is 190 or more (Table 10); i.e., for these Hal, Hg–Hal...Hal–Hg distances with length ≤ Ragg can be expected in such substances with high probability.
To estimate the parameters of the distributions of the M–Hal...Hal–M distances, the same approach was used as for the C–Hal...Hal–C distances. It should be noted that the smallest number of C–Hal...Hal–C distances (140) is in the First subset of the Orgmet set for C–I...I–C. In the MHal set, the All subset for the M–F...F–M distances contains about the same number of values (124), and the First subset for distances of the same type contains several times fewer (33). Accordingly, the error in the parameters describing these distances can be high, especially in the First subset, as evidenced in particular by its low correlation coefficient (r² = 0.702). The distributions of the M–Hal1...Hal2–M distances (Hal1 = Hal2 = Cl, Br, I) were described by Gaussian functions using two histogram steps (0.1 and 0.2 Å). As in the analysis of the C–Hal...Hal–C distances, the positions of the maxima changed very little with the step; therefore, Table 11 lists only the distribution parameters obtained with a step of 0.2 Å.
Considering that the Hal atoms in the M–Hal groups can coexist with large ligands around M, which hinder the approach of the same groups from neighboring molecules, it is not surprising that the positions of the maxima exceed 2RHal, even in the First subsets. Somewhat surprisingly, for M–Cl...Cl–M, the xall value is less than xfirst, although only slightly (by 0.026 Å). Formally, the same effect is observed for the M–F...F–M distances and is much larger in magnitude; however, it could be due to the small number of these distances in the set and, accordingly, the inaccuracy of the obtained parameters. For the M–Br...Br–M and M–I...I–M distances, the values of xm in the All subset are, as expected, greater than in the First subset, but the difference xall − xfirst for M–Br...Br–M (0.053 Å) is notably smaller than it is on average for the C–Hal...Hal–C distances (Table 6).