QSPR Modelling of Potentiometric HCO₃⁻/Cl⁻ Selectivity for Polymeric Membrane Sensors †

Abstract: Since the development of new sensors is long and tedious, it would be very helpful to have a model that could predict sensor properties from the structure of the active compound, without the actual synthesis and characterization of the corresponding sensors. In this work, a model for the prediction of logK(HCO₃⁻/Cl⁻) was constructed based on 40 ligand structures suggested in the literature for carbonate sensing. Substructural molecular fragments (SMF) were used to describe the structure of the compounds, with fragments considered as sequences of bonds and atoms. The projection on latent structures (PLS) method was used to calculate the regression model.


Introduction
Polymeric membrane electrodes offer numerous advantages, and their properties can be tuned over a wide range by modifying the membrane composition. However, selecting an appropriate ligand to construct a sensor for a particular task is time-consuming and requires ligand synthesis, sensor preparation, and characterization. It would be very helpful for researchers to have a model that predicts the sensor properties of an electrode from the structure of the employed ionophore.
Computational chemistry makes it possible to predict various characteristics of chemical compounds. Quantitative relations between the physical or chemical properties of organic compounds and their chemical structures can be established with the QSPR (quantitative structure-property relationship) approach. This methodology is widely applied nowadays, e.g., in pharmaceutical research, specifically in the search for new drugs [1]. There are QSPR models for various materials, such as nanomaterials [2], catalysts [3], and ionic liquids [4]. Recently, an application of QSPR for predicting the sensor properties of membrane electrodes was suggested [5]: it was possible to relate the structure of the organic ligand to the Ca²⁺/Mg²⁺ selectivity constant of the corresponding membrane sensor.
This study aims to extend this approach to anion-selective sensors, where ligand selection is much more challenging than for metal cations because of the wide variability of the geometries of inorganic ligands and their hydrolysis in the case of weak acids.
Some anions play important biological and industrial roles; carbonate is one example. It is therefore a promising target for predicting selectivity to carbonate over chloride by means of QSPR.

Dataset
The dataset of 40 samples was compiled from literature sources and experimental data. A summary table with the compositions of all samples and the corresponding literature sources can be found in the Supplementary Materials. Although the number of anion (especially carbonate) ionophores is small for the reasons discussed above, a few articles on carbonate ionophores are available. A large part of the data was extracted from the IUPAC review "POTENTIOMETRIC SELECTIVITY COEFFICIENTS OF ION-SELECTIVE ELECTRODES", which includes a summary table of the anion data available in 2002 [6]. Given the significant shortage of carbonate ionophores, we also included ionophores reported with Cl⁻/HCO₃⁻ selectivity, converting the values to HCO₃⁻/Cl⁻ selectivity according to the Nikolsky-Eisenman equation. The structures of all ionophores are available in Table S1 in the Supplementary Materials. We also made sure that all of these data were obtained in a narrow pH range (7.0-8.6), so that it is clear which ionic form prevailed in the examined solutions.
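For reference, the conversion can be sketched as follows, assuming the commonly used form of the Nikolsky-Eisenman equation and the idealized case of two monovalent anions. The symbols (E, E⁰, aᵢ, zᵢ, K^pot) are standard notation chosen here for illustration and are not taken from the source, and the reciprocal relation in the second line is an assumption that holds only for ions of equal charge with ideal electrode response:

```latex
% Nikolsky-Eisenman equation for primary ion i and interfering ions j
E = E^{0} + \frac{RT}{z_{i}F}\,\ln\!\left(a_{i} + \sum_{j} K^{\mathrm{pot}}_{i,j}\, a_{j}^{\,z_{i}/z_{j}}\right)

% Assumption: for two ions of equal charge, swapping the primary and
% interfering roles inverts the selectivity coefficient
\log K^{\mathrm{pot}}_{\mathrm{HCO_3^-},\,\mathrm{Cl^-}} = -\log K^{\mathrm{pot}}_{\mathrm{Cl^-},\,\mathrm{HCO_3^-}}
```

Under this assumption, a reported logK(Cl⁻/HCO₃⁻) of 2.0, for example, would enter the dataset as logK(HCO₃⁻/Cl⁻) = −2.0.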
Due to the small number of carbonate ionophores examined, we had to compile the resulting database from membranes that differ not only in ionophore but also in plasticizer. We took the different plasticizers into account by adding their dielectric constant (relative permittivity) as a descriptor, which made the model more comprehensive.
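One way to picture this is a descriptor table in which each membrane contributes its structural descriptor counts (described in the Descriptors section) plus one extra column for the plasticizer permittivity. The following is a minimal sketch, not the authors' implementation; the function name `build_descriptor_matrix`, the column name `epsilon_r`, and all numbers in the usage note are hypothetical.

```python
def build_descriptor_matrix(fragment_counts, permittivities):
    """Assemble one row per membrane: fragment counts plus permittivity.

    fragment_counts: list of dicts mapping fragment string -> count
    permittivities:  list of plasticizer relative permittivities (same order)
    Returns (column_names, rows). All names here are illustrative.
    """
    if len(fragment_counts) != len(permittivities):
        raise ValueError("one permittivity value is needed per membrane")
    # The union of all fragments seen in any ligand defines the columns.
    columns = sorted(set().union(*fragment_counts))
    rows = [
        [counts.get(name, 0) for name in columns] + [eps]
        for counts, eps in zip(fragment_counts, permittivities)
    ]
    return columns + ["epsilon_r"], rows
```

For example, two membranes with fragment counts `{"C-C": 2, "C-F": 1}` and `{"C-C": 1, "C=C": 3}` and made-up permittivities 4.0 and 24.0 would give the columns `C-C`, `C-F`, `C=C`, `epsilon_r` and two rows of four values each.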
The selectivity coefficients of these ionophores varied from −5.8 to 6.2 on the logarithmic scale. The average selectivity logK(HCO₃⁻/Cl⁻) was −1.425, and the median value was −2.6.

Descriptors
We used substructural molecular fragments (SMF) to encode molecular structures in a matrix. With this method, a molecule is split into all possible fragments, and the number of occurrences of each fragment is written into the matrix. The SMFs were obtained using the "ISIDA QSPR" software [7]. ISIDA offers two approaches for obtaining a molecule's SMFs: sequences of atoms and/or bonds (topological paths), and selected ("augmented") atoms with their environment (atom-centered fragments), which can include atoms, bonds, or both (Figure 1). In this work, atom and bond sequences were applied. A molecule was represented as a graph, and its descriptors were, consequently, subgraphs.
Hence, ISIDA SMF descriptors are counts of fragments (subgraphs) in a molecule, with each element of the descriptor vector associated with one of the detected fragments. Only the shortest paths from one atom to another were used. The length of the sequences is limited: the minimal and maximal lengths are 2 and 15, respectively.
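The idea of counting shortest-path fragments can be illustrated with a small sketch. This is not the ISIDA implementation: the molecule representation (a list of element symbols plus a dictionary of bonds), the function names, the choice to measure length in atoms, and the rule of keeping the alphabetically smaller of the two reading directions are all assumptions made for illustration.

```python
from collections import Counter, deque


def shortest_paths(adjacency, source):
    """Breadth-first search returning every shortest path from source."""
    dist = {source: 0}
    paths = {source: [[source]]}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Extend every shortest path that reaches u by one step.
                paths[v].extend(p + [v] for p in paths[u])
    return paths


def smf_counts(atoms, bonds, min_len=2, max_len=15):
    """Count atom/bond sequences along shortest paths (length in atoms)."""
    adjacency = {i: [] for i in range(len(atoms))}
    symbol = {}
    for (i, j), bond in bonds.items():
        adjacency[i].append(j)
        adjacency[j].append(i)
        symbol[(i, j)] = symbol[(j, i)] = bond

    def as_text(path):
        text = atoms[path[0]]
        for a, b in zip(path, path[1:]):
            text += symbol[(a, b)] + atoms[b]
        return text

    counts = Counter()
    for source in range(len(atoms)):
        for target, paths in shortest_paths(adjacency, source).items():
            if target <= source:
                continue  # visit each atom pair once
            for path in paths:
                if min_len <= len(path) <= max_len:
                    # A path and its reverse describe the same fragment.
                    counts[min(as_text(path), as_text(path[::-1]))] += 1
    return counts
```

For the chain C=C-C-F, encoded as `atoms = ["C", "C", "C", "F"]` and `bonds = {(0, 1): "=", (1, 2): "-", (2, 3): "-"}`, this sketch yields six fragments, each counted once, including `C=C-C-F` and `C-F`.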

Projection on Latent Structures (PLS) Modelling
To relate the molecular descriptors of the ligands to the selectivity coefficients of the corresponding sensors, we employed the PLS regression algorithm. PLS regression searches for a set of components, known as latent vectors, that perform a simultaneous decomposition of X and Y, with the constraint that these components explain as much as possible of the covariance between X and Y. The data matrix size was 40 × 1855, where 40 is the number of samples (ligands) and 1855 is the number of descriptors.
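For a single response variable, the decomposition described above can be sketched with the NIPALS variant of PLS (often called PLS1). The paper does not state which PLS implementation or how many latent vectors were used, so the following pure-Python code is only an illustrative sketch; the function names `pls1_fit` and `pls1_predict` are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Work on mean-centred copies of X and y.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [value - y_mean for value in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [v / norm for v in w]
        # Scores (latent vector), X loadings and y loading.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one descriptor vector x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat
```

With far more descriptors than samples, as in a 40 × 1855 matrix, the number of latent vectors would typically be kept small and chosen by validation; that choice is not shown here.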

Results and Discussion
The molecular descriptors for the chosen ligands were calculated with the ISIDA QSPR software [7]. For the 40 ionophores, the substructural molecular fragments (SMF) and the permittivity of the membranes were used as descriptors. The PLS model relating the descriptors to selectivity was evaluated using the root mean square error (RMSE) and the coefficient of determination (R²). The results of QSPR modelling are shown in Figure 2. Each point in the graph corresponds to an item in the database, whereas the straight lines represent the resulting models. Blue and red colours correspond to training and test samples, respectively.
It can be seen that the derived model allows for a semi-quantitative estimation of the selectivity coefficients based on the ligand structure. Since each compound is an individual molecule consisting of a variety of molecular fragments, the regression coefficients make it possible to evaluate the contribution of each fragment (represented as an independent variable in the matrix) to the selectivity of the sensors. The largest coefficients indicate the variables (here SMFs, the encoded fragments of molecules) with the greatest impact. The fragments with the largest absolute contributions to the selectivity logK(HCO₃⁻/Cl⁻) of potentiometric membrane sensors are shown in Figure 3.
As follows from the graph, the fragment C=C-C=C-C-C-F makes the largest negative contribution, and it is part of a longer fragment, C-C=C-C=C-C-C-F, which has a smaller contribution. The shortest fragment with a negative contribution, C-C=C-C=C, is included in the remaining fragments with the largest negative contributions. The fragment with a positive contribution, C=C-C=C-Hg, contains mercury. These observations provide valuable information for the further design of ligands with the required selectivity.
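The two evaluation metrics and the ranking of fragments by regression coefficient can be sketched as below. This is an illustration only: the paper does not give its exact formulas, so R² is assumed here to be 1 − SS_res/SS_tot, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def top_fragments(names, coefficients, k):
    """Return the k descriptors with the largest absolute coefficients."""
    ranked = sorted(zip(names, coefficients), key=lambda item: abs(item[1]),
                    reverse=True)
    return ranked[:k]
```

Ranking by absolute value keeps both strongly negative and strongly positive fragments at the top of the list, while the sign of each coefficient still shows the direction of its contribution.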

Conclusions
Despite the difficulties with anion ionophores described in the Introduction, we were able to collect a database that allowed us to build a QSPR model with satisfactory RMSE and R² for a relatively small amount of data. We identified the fragments with the highest impact on the selectivity logK(HCO₃⁻/Cl⁻) of potentiometric membrane sensors, and we believe that this will help in the future search for new ionophores. It appears that semi-quantitative prediction of sensor selectivity based on the ligand structure is possible.