A Machine Learning-Based Study of Li + and Na + Metal Complexation with Phosphoryl-Containing Ligands for the Selective Extraction of Li + from Brine

: The growth of technologies concerned with the high demand in lithium (Li) sources dictates the need for technological solutions garnering Li supplies to preserve the sustainability of the processes. The aim of this study was to use a machine learning-based search for phosphoryl-containing podandic ligands, potentially selective for lithium extraction from brine. Based on the experimental data available on the stability constant values of phosphoryl-containing organic ligands with Li + and Na + cations at 4:1 THF:CHCl 3 , candidate di-podandic ligands were proposed, for which the stability constant values (log K ) with Li + and Na + as well as the corresponding selectivity values were evaluated using machine learning methods (ML). The modelling showed a reasonable predictive performance with the following statistical parameters: the determination coefﬁcient R 2 = 0.75, 0.87 and 0.83 and root-mean-square error RMSE = 0.485, 0.449 and 0.32 were obtained for the prediction of the stability constant values with Li + and Na + cations and Li + /Na + selectivity values, respectively. This ML-based analysis was complemented by the preliminary estimation of the host–guest complementarity of metal–ligand 1:1 complexes using the HostDesigner software.


Introduction
Sustainable technologies aimed at resource handling have attracted increasing interest over the last decade. The high electrochemical reactivity and redox potential, low atomic mass and its coefficient of thermal expansion have led to a high demand for lithium (Li) in many technologies [1]. Lithium consumption is responsible for the production of batteries, 48%; ceramics and glass, 26%; lubricating greases, 7%; polymer production, 5%; continuous casting mould flux powders, 4%; air treatment, 2%; and other uses, 8% [2]. Its consumption increases every year with this growth becoming more dramatic in the foreseeable future. The main sources considered are lithium extracted from ore (granite pegmatite-type deposits), salt lake and seawater/geothermal brines or spent batteries recycling. The latter is not yet as economically attractive as its counterparts; however, it has greatest potential among the above-mentioned alternatives due to its sustainability. According to the UK and US geological surveys [2][3][4], Li supplies are mostly contained in ores (mainly spodumene and petalite) and brines, as represented in Figure 1, while it is hard to evaluate the value of lithium extracted from spent batteries [5]. In the latter case, lithium supplies can be roughly evaluated through analysis of the corresponding lithium consumption over the last decade. According to [1], Li consumption for rechargeable batteries increases by an average of 25% annually. Salt lake brines and the seawater/geothermal water resources are estimated to contain 70-80% of the total Li amount [6]. The extraction of Li from these sources is technologically well-established and economically attractive; therefore, it is considered as the most technologically justified method of Li extraction. Among the various methods of Li extraction from brines, as the most widely used Li source nowadays, comprising solvent extraction, ion-sieve adsorption, electrochemical approaches and membrane separation, solvent extraction is one of the most economically and technologically beneficial technologies [7].
One of the most effective classes of extractants of Li + are crown-ethers. The polydentate structure of crown-ethers with its controllable cavity size and additional chelating effects resulting from the ionizable pendant arms which guarantee a high selectivity for Li extraction and Li + isotope separation from Na + and Mg 2+ cations. However, the commercial use of crown-ethers for Li extraction at industrial scale is limited due to economical reasons: the high cost of crown-ethers compared to the cost of Li. The phosphoryl-containing ligands considered in this study are acyclic analogues of crown-ethers and are also characterized by the high propensity to form three-dimensional complexes with metal cations as well as by their selective extraction of metal cations due to ligand self-organization, in particular [8]. The selectivity order of crown-ethers for alkaline metal cations is known for different solvents. In Figure 2, the 14-crown-4 ligand is represented as a ligand selective fr Li + in line with dipodandic phosphoryl-containing compounds characterized by the largest K Li /K Na ratio for 1:1 complexes (in 4:1 THF:CHCl 3 ). The corresponding logarithm of the stability constant value for the 14-crown-4 ligand is known for acetonitrile (chosen as a solvent alternative to THF-CHCl 3 ), established as 5.94 or 7.04 depending on the experimental method used [9]. For the represented phosphoryl-containing ligands the corresponding values are given for a 4:1 THF:CHCl 3 mixture.
The present study attempted a computer-aided search for new phosphoryl-containing dipodandic ligands, characterized by high stability constant values for metal-ligand complexes with Li + in THF-CHCl 3 (4:1) with high Li + /Na + selectivity. These crownethers acyclic analogues have diverse applications as metallo-agents in biomedical applications [10] (e.g., radiopharmaceuticals and chemotherapeutic agents [11,12]), as extractants in separation processes [13][14][15], as compounds ion-selective membranes, and many other areas. The ligands considered here, involved in model development have been previously studied for complex formation with different metal cations in different solvents using various approaches [16][17][18][19][20][21]. Quantitative structure-property relationship (QSPR) methods are a well-known approach to the rational screening of new compounds with desired characteristics. These methods have been successfully applied for the stability constant modelling of diverse metal cations [22][23][24][25][26][27][28][29] and alkali metal cations [16,21,[30][31][32][33][34][35], in particular, with organic ligands in water. In this study, we performed a search for new candidate structures in the constrained family of the dipodandic ligands containing a phosphoryl oxygen group varying the central fragments that contribute to metal binding while the phosphoryl oxygen group remains preserved.
This study consists of two parts: first, the quantitative structure-property modelling aimed to provide a computer-aided rational screening of new selective metal chelators for Li or Na complexation within the studied class of compounds in THF:CHCl 3 (4:1); secondly, this was followed by an analysis of the potential complexes in their conformations with metal cations using the HostDesigner software [36][37][38]. This complemented this statistical insights by evaluating the complex geometry and cation-ligand complementarity.

ExperimentalData
The structures of the mono-, di-and tripodandic phosphoryl-containing ligands forming complexes with Li + and Na + in the 4:1 THF:CHCl 3 solvent are presented in Figure 3a. This main dataset of compounds were involved in the modelling.
Overall, the data include ligands with different central fragments with the same terminal groups. Thus, the length of the ethyleneglycol-containing fragments varied, while the central fragment contained either pyridine, benzene or naphtalene as well as the simple carbon-carbon bridge fragment. The pyridine-and naphtalene-containing ligands were used in the additional external dataset. The distribution of stability constant values for the 1:1 metal-ligand complexes with Li + and Na + is presented in Figure 3b. From the figure, one can see that the stability constants of the complexes are evenly represented with a slight edge of Na + complexes (the grey colour corresponds to the overlapped values). The information on the complexes with stability constants exceeding 6.25 is under-represented.
The final dataset involved in the modelling contained 95 organic ligands. The external dataset, including twelve candidate ligands, is presented in Figure 4. The additional data used as the external dataset, including 16 ligands with known stability constant values, were separated from the initial dataset used in modelling. The corresponding ligands are given in Table 1 with the associated experimental and predicted property values.  A number of dipodandic phosphoryl-containing candidate ligands with a cavity size close to the 14-crown-4 ligand were considered as candidate ligands with an expected Li/Na selectivity ( Figure 4).
It is worth noting that, in general, the structure of the phosphoryl-containing podandic ligands containing fragments showed their high efficiency in synergistic solvent extraction, where one ligand, e.g., β-diketone has a chelating role while the second, e.g., TOPO, is the solvation ligand containing a phosphoryl oxygen group. In this example of synergistic solvent extraction, the alkali metal cations are extracted due to the formation of [M(TTA)(TOPO) 2 ] complexes. The maximum separartion efficiency (maximum separation factor) among the different extraction synergistic systems was found to be equal 104 between the Li and Na (for the thenoyltrifluoroacetone(TTA)-2,9 dimethyl-1,10 phenanthroline (DMP) system) [1,39]. Therefore, this value can be used as the preliminary benchmark for the search for selective ligands.

Methodology
The data description used in this study is based on the molecular descriptors implemented in the Dragon software package [40].
XGBoost [41] software with implemented gradient boosting trees was used for modelling. Although this algorithm possesses state-of-the-art performances in a range of data mining and machine learning tasks, it is a relatively novel approach for chemical and materials science problems.
Suppose the dataset consists of n examples (compounds), and m number of parameters Here, q is the structure of each tree comprising individual leafs, and T is the number of leafs in the tree. The loss function can be represented as follows: where Ω = γT+ λ 1 2 w is a term responsible for the regularization of the model's complexity.
With the regularization parameter set to zero, the algorithm corresponds to original gradient boosting trees. The process of training introduces the parameter f t directly in (2) that minimizes the difference between the observed and target values of the function: The first-and second-order gradient approximations are used in the optimization process. For each considered tree q, the values of the optimal weights and the loss function are evaluated.
The following parameters were used for model development: max depth = 3, learning rate = 0.1, number of estimators = 100, and booster 'gbtree'.
Statistical parameters characterizing the predictive performance of the models include the determination coefficient R 2 and the root-mean-squared error (RMSE), defined as: A ten-fold cross-validation procedure was used: the initial dataset was split into ten pairs of non-overlapped subsets of training and test data in such a way to predict every compound of the initial dataset.
The Shapley explainability [42] analysis was performed to elucidate the contribution of the descriptors. The Shapley value [43] is a method adopted from cooperative game theory and assesses the distribution of gain earned by a team among individual members. In a regression, the general gain corresponds to the predicted property value, whereas the individual members are the descriptors. Thus, the assessed global Shapley value Φ f (i) can be considered as part of the model's accuracy attributable to the individual descriptors, represented as: where p(x,y) is the data distribution (we have used all the data available), f y (x) the model's predicted value, and N the number of features. The local Shapley values assess the model's prediction explainability for features for data point x:

Results and Discussion
The results obtained in this study are presented and discussed in the following order: first, we begin with the results of the QSPR modelling of the stability constant with the corresponding selectivity values for 1:1 metal-ligand complexes of Li + and Na + with phosphoryl-containing podandic-type ligands in a 4:1 THF:CHCl 3 solvent. The corresponding statistical parameters evaluate the model's applicability to the virtual screening of new ligands. Second, using the models obtained, a prediction was performed for the compounds of the external test set. Third, complementary to the statisical results of cheminformatics modelling, the candidate ligands were considered based on the geometric assessment of the metal-ligand complexes, consisting of the two distinct steps evaluating the degree of the pre-organization followed by the host-guest complementarity.

Results of QSPR Modelling and Stability Constant Value Prediction for Selective Li Podandic Ligand Complex Formation
As the initial step of data pre-processing, the feature selection procedure based on the Shapley explainability analysis was performed to select the descriptors with notable impact on the property value. The obtained pools of descriptors were used for the stability constant and selectivity modelling. QSPR modelling of the stability constants was performed for 1:1 metal-ligand complexes of Li + and Na + with phosphoryl-containing podandic-type ligands in a 4:1 THF:CHCl 3 solvent. The modelling stage comprised 100 steps of reshuffling compounds from the modelling dataset followed by splitting the data into non-overlapped pairs of the training and test sets and the model's predictions for each test compound. The statistical parameters of the obtained models are presented in Figure 5a. The determination coefficient R 2 and RMSE values were averaged over 100 models. The value of the determination coefficient  The analysis of descriptor contribution based on the Shapley value estimations are shown in Figure 6. Among the most relevant descriptors, the role of topological indices, such as the sum of topological distances between the P..P and O..O pairs, Moran and centred Broto-Moreau autocorrelation, eigenvalues from the augmented adjacency matrix, and acceptor-lipophilic at a fixed distance were highlighted as well as simple descriptors such as the number of ether fragments, mean atomic polarizabilty, and the fraction of conformational (rotational) bonds. The polarizability-related parameters were found to be more important for modelling the stability constants of organic ligands with Na and the selectivity values. The role of distance between the phosphoryl groups and the van der Waals volume was shown to be very important for Li complexation and Li/Na selectivity. The two-dimensional Petitjean shape index had the second highest contribution in the QSPR models of Li logK values. The latter may be an especially valuable factor when describing the mechanism of the complex formation. Table 1 contains the predicted values for the considered compounds from the external test set (Figures 4 and 7). One can see that for new candidate ligands, several ligands with potentially high Li/Na selectivity have predicted stability constant value with Li of 6, practically reasonable for the complete binding of metal cations. The selectivity for these compounds is shown to be close or exceed 1. One can assume a possible underestimation of the stability constants: similar "small" ligands 17, 20 and 26 in the external test set, for which the experimental values are known, were severely underestimated by the developed models. Ligands 1, 2 and 4 were considered as the most probable effective selective binders for Li. Ligand 4 has clear potential of selectivity improvement by the replacement of NH for an ether group. For ligands 23 and 28 an overestimation is seen. However, in general, almost all of the ligands in the external dataset were predicted with reasonable accuracy.

Evaluation of the Complementarity of the Host Structures of the Candidate Ligands with the Li Cation Guest
The binding properties of the ligands are related to several important conditions: (i) the presence of several binding sites in the host compounds, (ii) sufficient pre-organization of the ligand to form the metal-ligand complex, (iii) a high degree of complementarity between the ligand cavity and the size of the cation. The geometry of the complexes was evaluated via molecular mechanics using two software packages: HostDesigner [36] and Open Babel [44]. The HostDesigner software package has been successfully applied in a number of studies [36][37][38].
The degree of structural complementarity was evaluated as the difference in energy between the bound form of the host ligand and the corresponding binding conformer. The second measure concerns the structural relaxation after the cation removal, where small RMSD values correspond to a high degree of host-guest complementarity. For the considered ligands, the correlation between the changes in steric energy for the complex formation and the predicted stability constant values with Li was not found for ligand 3. The quantitative results of the geometric analysis of the metal-ligand complexes of phosphorylcontaining dipodandic ligands with Li cation are given in Table 2. The calculated energy values allow one to estimate the binding characteristics of the considered ligands [36].

The Novelty of the Results and the Perspectives
This study is the continuation of the series of papers that have used the QSPR-based methodology for predicting stability constant values of metal-ligand complexes for diverse metal cations in different solvents which is complemented by the analysis of the host-guest complementarity of the metal-ligand complexes. This is complemented by the analysis of the host-guest complementarity of the metal-ligand complexes. This methodology exploited the sub-structural molecular fragment (SMF) methodology, allowing a high predictive ability in quantitative structure-property modelling. This study involved an alternative to SMF data description using the Dragon software package, which includes one of the largest collections of molecular descriptors. One family of descriptors, electrotopological indices (E-state indices), have already been benchmarked with sub-structural molecular fragments [29]. Several families of descriptors (e.g., physicochemical) were involved in this study, allowing an alternative to sub-structural molecular fragment-based descriptions of the experimental data on metal-ligand complex formation. This may provide additional insights into the mechanisms of complex formation (e.g., the molecular shape, mean vdW volume). The obtained results show the comparable predictive performance of this technique against SMF-based models. The joint complementary application of statistical and geometry-based approaches can be recommended for further studies.
The limitations of the results obtained in this study include the solvent's effects (we suppose that the dielectric characteristics of the solvent may be used to assess the difference). The models take into account 1:1 metal-ligand complexes, assuming the denticity and structure of the complexes. The network-like mechanism of complex formation as well as the synergistic aggregation [45,46] have not been considered. Another limitation concern the new ligands, for which a logK value of 6 is related to the binding of all metal cations.

Conclusions
In this study, a search for dipodandic ligands containing tetraphenyl for Li extraction from brines was performed. The experimental data comprises stability constant values for 1:1 metal-ligand complexes of phosphoryl-containing podandic ligands with Li and Na cations in THF-CHCl 3 . The modelling procedure involved a gradient boosting trees approach implemented in the XGBoost package, showing a reasonable predictive performance with the following statistical parameters: the determination coefficient R 2 = 0.75, 0.87 and 0.83 and RMSE = 0.485, 0.449 and 0.32 for the predicted stability constants with Li + and Na + and the selectivity values, respectively. The analysis of the impact of the descriptors on the target property values was performed using the Shapley value estimator. Several ligands were considered as candidates for the selective binding of Li. Additionally, a statistical approach was complemented by geometric analysis of the metal-ligand complexes by assessing the complementarity of the host structures with Li cations in the process of complex formation.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.