Modelling of Hydrophilic Interaction Liquid Chromatography Stationary Phases Using Chemometric Approaches

Metabolomics is a powerful and widely used approach that aims to screen endogenous small molecules (metabolites) of different families present in biological samples. The large variety of compounds to be determined and their wide diversity of physical and chemical properties have promoted the development of different types of hydrophilic interaction liquid chromatography (HILIC) stationary phases. However, the selection of the most suitable HILIC stationary phase is not straightforward. In this work, four different HILIC stationary phases have been compared to evaluate their potential application for the analysis of a complex mixture of metabolites, a situation similar to that found in non-targeted metabolomics studies. The obtained chromatographic data were analyzed by different chemometric methods to explore the behavior of the considered stationary phases. ANOVA-simultaneous component analysis (ASCA), principal component analysis (PCA) and partial least squares regression (PLS) were used to explore the experimental factors affecting the stationary phase performance, the main similarities and differences among chromatographic conditions used (stationary phase and pH) and the molecular descriptors most useful to understand the behavior of each stationary phase.


Introduction
Over the past decade, metabolomics has received considerable attention from the scientific community. Metabolomics aims to screen endogenous small molecules (metabolites) present in biological samples, providing a direct measure of the phenotypic state of an organism [1][2][3]. There are two main metabolomic strategies: targeted and non-targeted. The targeted approach is focused on the investigation of a specific metabolic pathway and, therefore, in the analysis of a reduced and known set of compounds. In contrast, non-targeted metabolomics aims to screen the entire metabolite content of biological samples containing compounds with different physical and chemical properties [4,5].
Due to its high-resolution power, sensitivity and accuracy of m/z detection, liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) has become the analytical platform most used in metabolomics studies. Reversed phased liquid chromatography (RPLC) is useful for the separation of the more hydrophobic compounds, such as lipids. However, RPLC is not recommended for the analysis of some of the most usual metabolite families characterized as being polar and hydrophilic compounds [2,[6][7][8][9]. In the analysis of polar compounds, and in particular in the metabolomics field, hydrophilic interaction liquid chromatography (HILIC) has become a valuable alternative to RPLC, due to its ability to separate these more hydrophilic compounds [10,11]. Different types of HILIC stationary phases, such as amide, amine, mixed-mode diol and zwitterionic, have  Figure 1 shows the obtained retention factors for the 54 metabolites in the 48 chromatographic runs. The visual inspection of the obtained retention factors already allowed the differentiation of the evaluated chromatographic conditions. For instance, the retention factors of all metabolites in BEH amide are shorter than in the rest of the stationary phases. However, direct evaluation of the HILIC stationary phases behavior is not straightforward. For this reason, matrix D was first evaluated using two chemometric exploratory methods, PCA and ASCA, to get a deeper insight into the effects of experimental factors on the retention behavior of metabolites. compounds). Figure 1 shows the obtained retention factors for the 54 metabolites in the 48 chromatographic runs. The visual inspection of the obtained retention factors already allowed the differentiation of the evaluated chromatographic conditions. For instance, the retention factors of all metabolites in BEH amide are shorter than in the rest of the stationary phases. However, direct evaluation of the HILIC stationary phases behavior is not straightforward. For this reason, matrix D was first evaluated using two chemometric exploratory methods, PCA and ASCA, to get a deeper insight into the effects of experimental factors on the retention behavior of metabolites.

Evaluation of HILIC Stationary Phases Behavior
PCA was applied to matrix D in order to explore the behavior of the chromatographic systems studied in this work (four different stationary phases and mobile phase at three different pH conditions and two different ionic strengths). PCA results indicated that stationary phase was the most critical factor to be considered. In Figure 2a, the PCA scores plot shows that samples analyzed with the four stationary phases were differentiated. Moreover, PC1 distinguished BEH amide stationary phase (XBridge TM Amide) samples (cyan triangles in Figure 2a), which appeared on the left side of PC1, from the rest of the samples. Additionally, PC2 distinguished mixed-mode diol stationary phase (Acclaim TM Mixed-Mode HILIC-1) samples (green squares in Figure 2a), with large positive PC2 scores values, from the rest of the samples. Amide (TSK-Gel Amide-80) and zwitterionic (ZIC-HILIC) stationary phases showed a similar behavior since their samples appeared close to each other in the scores plot. Furthermore, samples analyzed with the three different pH values were also clearly distinguished (see Figure S1 in Supplementary Information), whereas, samples analyzed at different ionic strengths (low and high) could not be differentiated by the PCA scores plot. These results were consistent with the results obtained in the previous authors' work about HILIC stationary phases [17].
Statistical significance of the three experimental factors considered in this work (stationary phase, pH and ionic strength) and their interactions were assessed by applying ASCA to matrix D. Results showed that both stationary phase and pH had statistically significant effects (p-value of 0.0001). On the contrary, the ionic strength effects were not significant (p-value of 0.2). Moreover, the interaction of three factors (stationary phase × pH, stationary phase × ionic strength and pH × ionic strength) was also found to be statically significant (p-value of 0.0001). Hence, from the combination of ASCA and PCA results, the two most relevant factors were defined as the stationary phase and the pH of the aqueous solvent. Figure 2b shows the ASCA principal component scores (at the mean level) for each HILIC stationary phase. In this scores plot, some trends could be observed. For example, PC1 distinguished the mixed-mode diol stationary phase with a large negative scores value, whereas the two amides and the zwitterionic stationary phases showed a similar positive PC1 scores value. PC2 also differentiated the BEH amide stationary phase with a negative scores value. In contrast, amide, zwitterionic and mixed-mode diol stationary phases had a similar positive PC2 scores value. Finally, it should be mentioned that, as observed in PCA results, amide and zwitterionic stationary phases had similar PC1 and PC2 scores values. Therefore, these two stationary phases showed a similar behavior.
ASCA loadings were useful to know which variables (metabolite retention factors) were the most important to distinguish the stationary phases. Figure 2c,d show the ASCA loadings plot for PC1 and PC2, respectively. For instance, six amino acids (L-(−)-proline, L-valine, L-methionine, L-tyrosine, L-homocysteine and L-anserine) showed a higher loadings value in PC1 ( Figure 2c). Consequently, these amino acids were useful to distinguish mixed-mode diol stationary phase from the other three (amide, BEH amide and zwitterionic). In the case of PC2 (Figure 2d), pimelic and citric organic acids showed the highest loadings values. Therefore, these two organic acids appeared as important to differentiate BEH amide from the rest of the stationary phases.
In general, PCA and ASCA results coincided showing that the most important factors were the HILIC stationary phase and the pH of the aqueous solvent. In addition, some facts related to the stationary phase behavior can be highlighted. For instance, the zwitterionic stationary phase showed an intermediate behavior between the two amide stationary phases.
The next step in this work was to find relationships between the observed chromatographic retention observed and the physicochemical properties of metabolites using their molecular descriptors. In addition, since PCA and ASCA results showed that the ionic strength of the mobile phase was not a significant factor in metabolomics studies, these new PLS models were only assessed considering chromatographic runs done at low ionic strength.

Exploratory Relationship between Physicochemical Properties and HILIC Chromatographic Retention
PLS models were independently built for the 12 chromatographic studied systems. They were obtained by the combination of the four stationary phases (BEH amide, amide, zwitterionic and mixed-mode diol) and of the three mobile phases at different pH values (acid, moderately acid and neutral). These models were generated using the experimental retention factors of the 54 metabolites contained in the mixture (a y vector for each condition) and their corresponding molecular descriptors (MDs) organized in an X matrix. A preliminary selection step was performed over the whole set of MDs available from PCLIENT to reduce the total number and finally consider 844 of them (see Materials and Methods section for more details).
PLS modelling of the retention factor obtained for each condition using metabolite molecular descriptors required a reduced number of latent variables (between 2 and 3) to explain most of the variance of each y vector (between 80% and 95%). These models did not show an accuracy enough to be used for the prediction of the retention factors of unknown compounds. However, as these PLS models explain a major part of the retention factor variance, the exploration of scores and loading plots can provide additional insight into the HILIC behavior. First, the analysis of scores plots could allow confirming the differentiation between groups of samples. More interestingly, the analysis of loadings plots could provide information regarding the molecular descriptors related to this differentiation, and could give additional information to known the main physicochemical properties involved in the HILIC retention mechanisms. As an example, Figure 3 shows the scores plots obtained for the amide stationary phase at the three studied pH values of the mobile phases. In these plots, some interesting trends can be observed. First, the differentiation between nucleosides and the rest of the metabolites present in the mixture. In the three cases, a clear group with all nucleosides is visible. Amino acids are the metabolite family with the most compounds in the mixture. These amino acids are spread along the first latent variable. However, in the three pH conditions, differentiation

Exploratory Relationship between Physicochemical Properties and HILIC Chromatographic Retention
PLS models were independently built for the 12 chromatographic studied systems. They were obtained by the combination of the four stationary phases (BEH amide, amide, zwitterionic and mixed-mode diol) and of the three mobile phases at different pH values (acid, moderately acid and neutral). These models were generated using the experimental retention factors of the 54 metabolites contained in the mixture (a y vector for each condition) and their corresponding molecular descriptors (MDs) organized in an X matrix. A preliminary selection step was performed over the whole set of MDs available from PCLIENT to reduce the total number and finally consider 844 of them (see Materials and Methods section for more details).
PLS modelling of the retention factor obtained for each condition using metabolite molecular descriptors required a reduced number of latent variables (between 2 and 3) to explain most of the variance of each y vector (between 80% and 95%). These models did not show an accuracy enough to be used for the prediction of the retention factors of unknown compounds. However, as these PLS models explain a major part of the retention factor variance, the exploration of scores and loading plots can provide additional insight into the HILIC behavior. First, the analysis of scores plots could allow confirming the differentiation between groups of samples. More interestingly, the analysis of loadings plots could provide information regarding the molecular descriptors related to this differentiation, and could give additional information to known the main physicochemical properties involved in the HILIC retention mechanisms. As an example, Figure 3 shows the scores plots obtained for the amide stationary phase at the three studied pH values of the mobile phases. In these plots, some interesting trends can be observed. First, the differentiation between nucleosides and the rest of the metabolites present in the mixture. In the three cases, a clear group with all nucleosides is visible. Amino acids are Metabolites 2017, 7, 54 6 of 16 the metabolite family with the most compounds in the mixture. These amino acids are spread along the first latent variable. However, in the three pH conditions, differentiation between two groups can be detected (Figure 3a). An inspection of metabolites forming these two groups allowed observing that the left group was composed of metabolites with a molecular weight lower than 130 Da, whereas the right group was composed of metabolites with a molecular weight larger than 130 Da. Regarding the other families of metabolites present in the mixture (i.e., sugars or organic acids), metabolites were grouped but they were overlapping with amino acids. Similar trends can be observed when considering the other chromatographic conditions.
Metabolites 2017, 7, 54 6 of 17 between two groups can be detected ( Figure 3a). An inspection of metabolites forming these two groups allowed observing that the left group was composed of metabolites with a molecular weight lower than 130 Da, whereas the right group was composed of metabolites with a molecular weight larger than 130 Da. Regarding the other families of metabolites present in the mixture (i.e., sugars or organic acids), metabolites were grouped but they were overlapping with amino acids. Similar trends can be observed when considering the other chromatographic conditions. The evaluation of which MDs allowed obtaining this retention factor modelling is more appealing. The identification of MDs can be performed using the variable importance on projection (VIP) scores obtained for each PLS model as a feature selection tool allowing to identify which variables (MDs) were the most descriptive of every chromatographic system studied in this work. Table 2 shows the twenty most relevant MDs for each PLS model and their VIP scores values. MDs that appeared to be important in more than one chromatographic system are shown in bold letters. The most repeated MDs values present in Table 2 were the 3D-MoRSE values like the molecular representation of structures based on electronic diffraction, Mor02p, Mor10p, Mor15p, Mor08u, Mor19u, Mor24u, Mor31u, Mor01m, Mor06m, Mor10m, Mor 10v, Mor15v, Mor08e, Mor24e and Mor31e. All of them are geometrical descriptors that codify the 3D molecular structure [31][32][33][34]. Some other geometrical descriptors (HTcxp, RTe and RTm) also seemed relevant. Geometrical descriptors are calculated from the coordinates of the molecule atoms, interatomic distances and distances from a specific origin. They are geometrical descriptors of the molecular size, shape, symmetry and atom distribution [33,34]. Moreover, these descriptors are weighted by ionization potential, electronegativity, polarizability and molecular mass [11]. Different topological descriptors (J, Lop, MWC03, MWC04, EEig10r, ESpm01d, ESpm07d, ESpm09d, ESpm13d, ESpm14d and ESpm14r) were also highlighted as relevant for the modelling of the studied chromatographic systems. These topological descriptors are numerical quantifiers of molecular topology that are sensitive to one or more structural properties, such as size, shape, symmetry or branching, and can also include chemical information about atom type and bond multiplicity [34][35][36][37]. Autocorrelation descriptors (GATS2e, GATS2p, GATS2m, GATS2v, MATS4e, MATS4p, MATS4m, MATS4v, MATS7v, RTm and ATS3m) also appeared to be significant in almost all the studied systems. These descriptors encode both molecular structure and physicochemical properties of a molecule (molecular mass, van der Waals volume, electronegativity or polarizability) [34,38,39]. Constitutional descriptors and molecular properties, like the number of double bonds (nDB) and the number of ratable bonds (RBN), the unsaturation index (Ui) and the octanol-water partition coefficient (ALOGP and MLOGP) also were relevant in the obtained PLS models. Finally, MDs related to connectivity had significant VIP values for the studied systems. These MDs are called BCUT (Burden eigenvalue descriptors) descriptors (BELe3) [34,40] and Randic connectivity indexes (VRv2 and VRp2) [34,41]. In this case, the selection of different descriptors related to the Sanderson electronegativity can be mentioned giving a preliminary insight into the interaction mechanism between metabolites and stationary phases. between two groups can be detected (Figure 3a). An inspection of metabolites forming these two groups allowed observing that the left group was composed of metabolites with a molecular weight lower than 130 Da, whereas the right group was composed of metabolites with a molecular weight larger than 130 Da. Regarding the other families of metabolites present in the mixture (i.e., sugars or organic acids), metabolites were grouped but they were overlapping with amino acids. Similar trends can be observed when considering the other chromatographic conditions. The evaluation of which MDs allowed obtaining this retention factor modelling is more appealing. The identification of MDs can be performed using the variable importance on projection (VIP) scores obtained for each PLS model as a feature selection tool allowing to identify which variables (MDs) were the most descriptive of every chromatographic system studied in this work. Table 2 shows the twenty most relevant MDs for each PLS model and their VIP scores values. MDs that appeared to be important in more than one chromatographic system are shown in bold letters. The most repeated MDs values present in Table 2 were the 3D-MoRSE values like the molecular representation of structures based on electronic diffraction, Mor02p, Mor10p, Mor15p, Mor08u, Mor19u, Mor24u, Mor31u, Mor01m, Mor06m, Mor10m, Mor 10v, Mor15v, Mor08e, Mor24e and Mor31e. All of them are geometrical descriptors that codify the 3D molecular structure [31][32][33][34]. Some other geometrical descriptors (HTcxp, RTe and RTm) also seemed relevant. Geometrical descriptors are calculated from the coordinates of the molecule atoms, interatomic distances and distances from a specific origin. They are geometrical descriptors of the molecular size, shape, symmetry and atom distribution [33,34]. Moreover, these descriptors are weighted by ionization potential, electronegativity, polarizability and molecular mass [11]. Different topological descriptors (J, Lop, MWC03, MWC04, EEig10r, ESpm01d, ESpm07d, ESpm09d, ESpm13d, ESpm14d and ESpm14r) were also highlighted as relevant for the modelling of the studied chromatographic systems. These topological descriptors are numerical quantifiers of molecular topology that are sensitive to one or more structural properties, such as size, shape, symmetry or branching, and can also include chemical information about atom type and bond multiplicity [34][35][36][37]. Autocorrelation descriptors (GATS2e, GATS2p, GATS2m, GATS2v, MATS4e, MATS4p, MATS4m, MATS4v, MATS7v, RTm and ATS3m) also appeared to be significant in almost all the studied systems. These descriptors encode both molecular structure and physicochemical properties of a molecule (molecular mass, van der Waals volume, electronegativity or polarizability) [34,38,39]. Constitutional descriptors and molecular properties, like the number of double bonds (nDB) and the number of ratable bonds (RBN), the unsaturation index (Ui) and the octanol-water partition coefficient (ALOGP and MLOGP) also were relevant in the obtained PLS models. Finally, MDs related to connectivity had significant VIP values for the studied systems. These MDs are called BCUT (Burden eigenvalue descriptors) descriptors (BELe3) [34,40] and Randic connectivity indexes (VRv2 and VRp2) [34,41]. In this case, the selection of different descriptors related to the Sanderson electronegativity can be mentioned giving a preliminary insight into the interaction mechanism between metabolites and stationary phases.  The evaluation of which MDs allow appealing. The identification of MDs can b (VIP) scores obtained for each PLS mode variables (MDs) were the most descriptiv Table 2 shows the twenty most relevant M that appeared to be important in more tha The most repeated MDs values present in representation of structures based on ele Mor19u, Mor24u, Mor31u, Mor01m, Mor Mor31e. All of them are geometrical descri other geometrical descriptors (HTcxp, RTe are calculated from the coordinates of the a specific origin. They are geometrical des distribution [33,34]. Moreover, these electronegativity, polarizability and molec MWC03, MWC04, EEig10r, ESpm01d, ESp also highlighted as relevant for the mod topological descriptors are numerical quan more structural properties, such as size, sha information about atom type and bond m GATS2p, GATS2m, GATS2v, MATS4e, M also appeared to be significant in almost molecular structure and physicochemical volume, electronegativity or polarizabilit properties, like the number of double bo unsaturation index (Ui) and the octanol-w relevant in the obtained PLS models. Final for the studied systems. These MDs are c (BELe3) [34,40] and Randic connectivity in of different descriptors related to the S preliminary insight into the interaction me tween two groups can be detected (Figure 3a). An inspection of metabolites forming these two ups allowed observing that the left group was composed of metabolites with a molecular weight er than 130 Da, whereas the right group was composed of metabolites with a molecular weight ger than 130 Da. Regarding the other families of metabolites present in the mixture (i.e., sugars or anic acids), metabolites were grouped but they were overlapping with amino acids. Similar trends be observed when considering the other chromatographic conditions. The evaluation of which MDs allowed obtaining this retention factor modelling is more pealing. The identification of MDs can be performed using the variable importance on projection IP) scores obtained for each PLS model as a feature selection tool allowing to identify which riables (MDs) were the most descriptive of every chromatographic system studied in this work. ble 2 shows the twenty most relevant MDs for each PLS model and their VIP scores values. MDs t appeared to be important in more than one chromatographic system are shown in bold letters. e most repeated MDs values present in Table 2 were the 3D-MoRSE values like the molecular resentation of structures based on electronic diffraction, Mor02p, Mor10p, Mor15p, Mor08u, r19u, Mor24u, Mor31u, Mor01m, Mor06m, Mor10m, Mor 10v, Mor15v, Mor08e, Mor24e and r31e. All of them are geometrical descriptors that codify the 3D molecular structure [31][32][33][34]. Some er geometrical descriptors (HTcxp, RTe and RTm) also seemed relevant. Geometrical descriptors calculated from the coordinates of the molecule atoms, interatomic distances and distances from pecific origin. They are geometrical descriptors of the molecular size, shape, symmetry and atom tribution [33,34]. Moreover, these descriptors are weighted by ionization potential, ctronegativity, polarizability and molecular mass [11]. Different topological descriptors (J, Lop, C03, MWC04, EEig10r, ESpm01d, ESpm07d, ESpm09d, ESpm13d, ESpm14d and ESpm14r) were o highlighted as relevant for the modelling of the studied chromatographic systems. These ological descriptors are numerical quantifiers of molecular topology that are sensitive to one or re structural properties, such as size, shape, symmetry or branching, and can also include chemical ormation about atom type and bond multiplicity [34][35][36][37]. Autocorrelation descriptors (GATS2e, TS2p, GATS2m, GATS2v, MATS4e, MATS4p, MATS4m, MATS4v, MATS7v, RTm and ATS3m) o appeared to be significant in almost all the studied systems. These descriptors encode both lecular structure and physicochemical properties of a molecule (molecular mass, van der Waals lume, electronegativity or polarizability) [34,38,39]. Constitutional descriptors and molecular operties, like the number of double bonds (nDB) and the number of ratable bonds (RBN), the saturation index (Ui) and the octanol-water partition coefficient (ALOGP and MLOGP) also were evant in the obtained PLS models. Finally, MDs related to connectivity had significant VIP values the studied systems. These MDs are called BCUT (Burden eigenvalue descriptors) descriptors Le3) [34,40] and Randic connectivity indexes (VRv2 and VRp2) [34,41]. In this case, the selection different descriptors related to the Sanderson electronegativity can be mentioned giving a eliminary insight into the interaction mechanism between metabolites and stationary phases. en two groups can be detected (Figure 3a). An inspection of metabolites forming these two s allowed observing that the left group was composed of metabolites with a molecular weight than 130 Da, whereas the right group was composed of metabolites with a molecular weight than 130 Da. Regarding the other families of metabolites present in the mixture (i.e., sugars or ic acids), metabolites were grouped but they were overlapping with amino acids. Similar trends observed when considering the other chromatographic conditions. he evaluation of which MDs allowed obtaining this retention factor modelling is more ling. The identification of MDs can be performed using the variable importance on projection scores obtained for each PLS model as a feature selection tool allowing to identify which les (MDs) were the most descriptive of every chromatographic system studied in this work. 2 shows the twenty most relevant MDs for each PLS model and their VIP scores values. MDs ppeared to be important in more than one chromatographic system are shown in bold letters. ost repeated MDs values present in Table 2 were the 3D-MoRSE values like the molecular entation of structures based on electronic diffraction, Mor02p, Mor10p, Mor15p, Mor08u, u, Mor24u, Mor31u, Mor01m, Mor06m, Mor10m, Mor 10v, Mor15v, Mor08e, Mor24e and e. All of them are geometrical descriptors that codify the 3D molecular structure [31][32][33][34]. Some geometrical descriptors (HTcxp, RTe and RTm) also seemed relevant. Geometrical descriptors lculated from the coordinates of the molecule atoms, interatomic distances and distances from ific origin. They are geometrical descriptors of the molecular size, shape, symmetry and atom ution [33,34]. Moreover, these descriptors are weighted by ionization potential, negativity, polarizability and molecular mass [11]. Different topological descriptors (J, Lop, 03, MWC04, EEig10r, ESpm01d, ESpm07d, ESpm09d, ESpm13d, ESpm14d and ESpm14r) were ighlighted as relevant for the modelling of the studied chromatographic systems. These gical descriptors are numerical quantifiers of molecular topology that are sensitive to one or structural properties, such as size, shape, symmetry or branching, and can also include chemical ation about atom type and bond multiplicity [34][35][36][37]. Autocorrelation descriptors (GATS2e, 2p, GATS2m, GATS2v, MATS4e, MATS4p, MATS4m, MATS4v, MATS7v, RTm and ATS3m) ppeared to be significant in almost all the studied systems. These descriptors encode both ular structure and physicochemical properties of a molecule (molecular mass, van der Waals e, electronegativity or polarizability) [34,38,39]. Constitutional descriptors and molecular rties, like the number of double bonds (nDB) and the number of ratable bonds (RBN), the uration index (Ui) and the octanol-water partition coefficient (ALOGP and MLOGP) also were nt in the obtained PLS models. Finally, MDs related to connectivity had significant VIP values e studied systems. These MDs are called BCUT (Burden eigenvalue descriptors) descriptors 3) [34,40] and Randic connectivity indexes (VRv2 and VRp2) [34,41]. In this case, the selection ferent descriptors related to the Sanderson electronegativity can be mentioned giving a inary insight into the interaction mechanism between metabolites and stationary phases. between two groups can be detected (Figure 3a). A groups allowed observing that the left group was co lower than 130 Da, whereas the right group was com larger than 130 Da. Regarding the other families of m organic acids), metabolites were grouped but they we can be observed when considering the other chroma The evaluation of which MDs allowed obtai appealing. The identification of MDs can be perform (VIP) scores obtained for each PLS model as a fea variables (MDs) were the most descriptive of every Table 2 shows the twenty most relevant MDs for eac that appeared to be important in more than one chr The most repeated MDs values present in Table 2 representation of structures based on electronic d Mor19u, Mor24u, Mor31u, Mor01m, Mor06m, Mor Mor31e. All of them are geometrical descriptors that other geometrical descriptors (HTcxp, RTe and RTm are calculated from the coordinates of the molecule a a specific origin. They are geometrical descriptors of distribution [33,34]. Moreover, these descripto electronegativity, polarizability and molecular mass MWC03, MWC04, EEig10r, ESpm01d, ESpm07d, ESp also highlighted as relevant for the modelling of topological descriptors are numerical quantifiers of more structural properties, such as size, shape, symm information about atom type and bond multiplicity GATS2p, GATS2m, GATS2v, MATS4e, MATS4p, M also appeared to be significant in almost all the st molecular structure and physicochemical properties volume, electronegativity or polarizability) [34,38, properties, like the number of double bonds (nDB unsaturation index (Ui) and the octanol-water partit relevant in the obtained PLS models. Finally, MDs re for the studied systems. These MDs are called BCU (BELe3) [34,40] and Randic connectivity indexes (VR of different descriptors related to the Sanderson preliminary insight into the interaction mechanism b ).
The evaluation of which MDs allowed obtaining this retention factor modelling is more appealing. The identification of MDs can be performed using the variable importance on projection (VIP) scores obtained for each PLS model as a feature selection tool allowing to identify which variables (MDs) were the most descriptive of every chromatographic system studied in this work. Table 2 shows the twenty most relevant MDs for each PLS model and their VIP scores values. MDs that appeared to be important in more than one chromatographic system are shown in bold letters. The most repeated MDs values present in Table 2 were the 3D-MoRSE values like the molecular representation of structures based on electronic diffraction, Mor02p, Mor10p, Mor15p, Mor08u, Mor19u, Mor24u, Mor31u, Mor01m, Mor06m, Mor10m, Mor 10v, Mor15v, Mor08e, Mor24e and Mor31e. All of them are geometrical descriptors that codify the 3D molecular structure [31][32][33][34]. Some other geometrical descriptors (HTcxp, RTe and RTm) also seemed relevant. Geometrical descriptors are calculated from the coordinates of the molecule atoms, interatomic distances and distances from a specific origin. They are geometrical descriptors of the molecular size, shape, symmetry and atom distribution [33,34]. Moreover, these descriptors are weighted by ionization potential, electronegativity, polarizability and molecular mass [11]. Different topological descriptors (J, Lop, MWC03, MWC04, EEig10r, ESpm01d, ESpm07d, ESpm09d, ESpm13d, ESpm14d and ESpm14r) were also highlighted as relevant for the modelling of the studied chromatographic systems. These topological descriptors are numerical quantifiers of molecular topology that are sensitive to one or more structural properties, such as size, shape, symmetry or branching, and can also include chemical information about atom type and bond multiplicity [34][35][36][37]. Autocorrelation descriptors (GATS2e, GATS2p, GATS2m, GATS2v, MATS4e, MATS4p, MATS4m, MATS4v, MATS7v, RTm and ATS3m) also appeared to be significant in almost all the studied systems. These descriptors encode both molecular structure and physicochemical properties of a molecule (molecular mass, van der Waals volume, electronegativity or polarizability) [34,38,39]. Constitutional descriptors and molecular properties, like the number of double bonds (nDB) and the number of ratable bonds (RBN), the unsaturation index (Ui) and the octanol-water partition coefficient (ALOGP and MLOGP) also were relevant in the obtained PLS models. Finally, MDs related to connectivity had significant VIP values for the studied systems. These MDs are called BCUT (Burden eigenvalue descriptors) descriptors (BELe3) [34,40] and Randic connectivity indexes (VRv2 and VRp2) [34,41]. In this case, the selection of different descriptors related to the Sanderson electronegativity can be mentioned giving a preliminary insight into the interaction mechanism between metabolites and stationary phases. Note: Bold format is used to highlight those MDs appearing in more than one chromatographic system.
A deep analysis of the identified molecular descriptors and their relationships with the significant experimental factors (stationary phase and pH value) allowed finding some interesting trends. Figure 4a shows a Venn diagram showing the MDs for each stationary phase considering all pH values. In this plot, the different behavior of the diol stationary phase can be observed. Most MDs (38) were unique, and only some of them appeared as relevant to other stationary phases. This difference in the behavior of the mixed-mode diol stationary phase can be explained by the mixed chemistry of the surface with a hydrophobic alkyl chain with a diol group. These dual properties allow the use of this stationary phase for both RP and HILIC separations but, from our results, modelling using a PLS model approach was more difficult. In addition, despite the ionic strength factor not being significant (using ASCA), the mixed-mode diol stationary phase seemed to be more affected than the other stationary phases, which could also be related to worse modelling. When considering amide, BEH amide and zwitterionic stationary phases, more similarities in the identified MDs were observed. In addition, from these results, and in accordance with PCA, the zwitterionic stationary phase seemed to have an intermediate behavior between the two amide stationary phases. Finally, evaluation of the MDs identified for the different stationary phases at different pH values showed that most of MDs were unique for a particular condition. However, more similarities can be observed between the moderately acid and neutral pH values (especially in the case of the diol stationary column, confirming its different behavior). (38) were unique, and only some of them appeared as relevant to other stationary phases. This difference in the behavior of the mixed-mode diol stationary phase can be explained by the mixed chemistry of the surface with a hydrophobic alkyl chain with a diol group. These dual properties allow the use of this stationary phase for both RP and HILIC separations but, from our results, modelling using a PLS model approach was more difficult. In addition, despite the ionic strength factor not being significant (using ASCA), the mixed-mode diol stationary phase seemed to be more affected than the other stationary phases, which could also be related to worse modelling. When considering amide, BEH amide and zwitterionic stationary phases, more similarities in the identified MDs were observed. In addition, from these results, and in accordance with PCA, the zwitterionic stationary phase seemed to have an intermediate behavior between the two amide stationary phases. Finally, evaluation of the MDs identified for the different stationary phases at different pH values showed that most of MDs were unique for a particular condition. However, more similarities can be observed between the moderately acid and neutral pH values (especially in the case of the diol stationary column, confirming its different behavior).
A mixture of 54 metabolites was used to evaluate the HILIC stationary phases behavior. Table 1 shows the metabolites contained in the analyzed mixture. All standards were purchased from Sigma-Aldrich (St. Louis, MO, USA). Nucleosides and amino acids were from two mix solutions provided by Sigma-Aldrich (St. Louis, MO, USA). A summary of the polarity of the analyzed metabolites is shown in Table S1 in the supplementary information.
A mixture of 54 metabolites was used to evaluate the HILIC stationary phases behavior. Table 1 shows the metabolites contained in the analyzed mixture. All standards were purchased from Sigma-Aldrich (St. Louis, MO, USA). Nucleosides and amino acids were from two mix solutions provided by Sigma-Aldrich (St. Louis, MO, USA). A summary of the polarity of the analyzed metabolites is shown in Table S1 in the supplementary information.
Standard stock solutions (1000 µg mL −1 ) of metabolite mixture were prepared by dissolving an appropriate amount of each metabolite in water and stored at −20 • C until their use. Working standard solutions (20 µg mL −1 ) were prepared by diluting the stock solution in ACN:H2O (1:1).

Instrumentation
The metabolite mixture was analyzed using an Acquity UHPLC system (Waters, Milford, MA, USA) for the chromatographic separation, equipped with a quaternary pump, an autosampler and a column oven. The mass spectrometer was a triple quadrupole detector (TQD, Waters, Milford, MA, USA) equipped with an electrospray (ESI) ionization source in negative and positive modes. The mass acquisition range was set to 90-1000 m/z. Four different HILIC stationary phases (BEH amide, amide, zwitterionic and mixed-mode diol) were evaluated (properties summarized in Table 3). The elution gradient was performed using solvent A (acetonitrile) and solvent B (ammonium acetate buffer solution). Chromatographic conditions used for each column are also detailed in Table 3. In order to reproduce the most used chromatographic conditions in metabolomics studies, the experiments were performed using solvent B at three different pH values: acidic (3.0 adjusted with formic acid), moderately acidic (5.5 adjusted with acetic acid) and neutral (7.0 adjusted with ammonia). The pH measurements were performed at 25 • C using an Orion Star A111 pH meter (Thermo Scientific, Waltham, MA, USA), before the additions of the organic solvent. Moreover, two ionic strengths in the aqueous phase were also compared: low (5.0 mM) and high (25 mM).
The metabolite mixture was analyzed with the four stationary phases working with solvent B at the three pH values and the two ionic strengths. Each condition was injected twice giving an experimental design with a total number of 48 chromatographic runs. Figure 5 shows a complete picture of the data analysis strategy from the raw MS data to the chemometric modelling. First, raw chromatographic data files (in .raw format) were converted to the standard CDF format by Databridge function of MassLynx TM v 4.1 software (Waters, Milford, MA, USA). Then, these data files were imported into the MATLAB environment (Release 2015b, The Mathworks Inc., Natick, MA, USA) by using mzcdfread and mzcdf2peak functions of the MATLAB Bioinformatics Toolbox (4.3.1.version). LC-MS data were then arranged and aligned according to their m/z in a data matrix, containing retention times in the rows and selected m/z values in the columns. Here, this data matrix was built up using the previously proposed regions of interest (ROI) strategy [42,43]. The ROI approach selects the most relevant mass traces, which are those m/z values whose intensity signals are higher than a fixed signal-to-noise ratio threshold and appear a number of times consecutively in the time dimension. These mass traces are searched among all the chromatographic and spectral data. The obtained vectors, containing the intensity of the found ROIs at each time point, are reorganized into a matrix grouping ROIs among all the retention times. The final m/z values of each ROI are calculated as the mean of all m/z values obtained for that particular ROI. In this work, the parameters for the implementation of this ROI approach are the signal-to-noise ratio threshold (set at 0.1% of the maximum MS signal intensity), the mass accuracy of the mass spectrometer (set at 0.5 Da/e for the TQD MS analyzer used in this work) and the minimum number of consecutive retention times to be considered as a chromatographic peak (set at 25). More details on how this strategy works are given in previous works [1]. Finally, an ROI matrix was obtained for positive and negative ionization modes for each of the 72 chromatographic runs of the present study.

Retention Factor Determination
Every ROI matrix corresponding to each chromatographic run was then evaluated to automatically find the m/z value of each metabolite and provide their retention time in each chromatogram. Lastly, retention factor (k) of each metabolite in each chromatographic run was calculated using their retention times (t R ) and the dead time (t 0 , theoretically obtained from the dead volume) as follows:  Figure 5. Scheme of data analysis strategy. Figure 5. Scheme of data analysis strategy.

Molecular Descriptors Determination
Canonical SMILES representations for the standard metabolites were retrieved from PubChem [44] and HMDB [45] databases. These SMILES were input into the PCLIENT software to calculate molecular descriptors (MDs). PCLIENT software can calculate more than 3000 MDs that are divided into 25 logical blocks. Here 1376 MDs were calculated including constitutional, topological, geometrical, electrostatic, physical, shape, and quantum chemical descriptors. Details of MDs calculation can be found in the Handbook of Molecular Descriptors [34]. PCLIENT software is available online at http://www.vcclab.org. When 3D atom coordinates were needed for parameter calculation, they were obtained using CORINA software (Molecular Networks GmbH, Nürnberg, Germany).
To reduce the number of MDs descriptors with a percentage of variation (calculated dividing the standard deviation of the MD values by their mean) lower than 20% were excluded. Also, those descriptors not available for all compounds were removed. After this reduction, 844 MDs were obtained for further analysis.

Evaluation of HILIC Chromatographic Performance
The retention factors of the 54 metabolites in each HILIC stationary phase at different chromatographic conditions were used to investigate the behavior of the chromatographic systems studied in this work by explorative chemometric methods.
The behaviors of the chromatographic systems studied in this work were evaluated using the retention factors of the 54 metabolites. The retention factor data matrix D (containing the retention factor of 54 metabolites at the 48 chromatographic runs) was evaluated using diverse chemometrics exploratory methods: principal component analysis (PCA) [18], ANOVA-simultaneous component analysis (ASCA) [19] and partial least squares regression (PLS).
PCA [18] compresses the information of the original variables into a smaller number of uncorrelated variables known as principal components [18]. In this work, PCA was applied to evaluate the relationships between the experimental conditions studied: stationary phase, pH and ionic strength. Therefore, matrix D was analyzed and information about the chromatographic runs and metabolite distribution were obtained in scores and loadings, respectively.
ASCA [19] is a multivariate analysis of variance method that combines the capacity of ANOVA to separate variance sources with the advantages of simultaneous component analysis (SCA, a generalization of PCA for the situation where the same variables have been measured in multiple conditions) [46]. In this work, ASCA was applied to statistically assess the significance of experimental factors in the experimental design: stationary phase, pH and ionic strength. ASCA was performed on a well-balanced experimental design, and 10,000 permutations were used for the permutation test [47]. More details about the ASCA method can be found in the work of Smilde [19] and Jansen [48]. Data were autoscaled prior to applying PCA and centered before applying ASCA.
Finally, PLS regression was used to explore the relationships between obtained retention factors for each chromatographic condition and molecular descriptors (MDs). PLS [49][50][51] is a multivariate linear regression model used to find correlation models between predictor variables (X data matrix) and response values to be predicted (y vector). In this work, PLS is used as a regression analysis method to build a model to link the determined retention factor of metabolites (arranged in a vector y) using their MDs (arranged in matrix X) and to investigate the most influential MDs in the regression. In this work, the optimum number of latent variables for each model was selected using leave-one-out cross-validation.
PLS also provides information about the most relevant variables for achieving the retention factors modelling. For instance, variables importance in projection (VIP) scores can be used for that purpose [51]. According to the common use, variables with a VIP score greater than 1 were important [52]. In this work, these VIP variables corresponded to those MDs that allowed a better description of the retention factor for each considered metabolite. PCA, ASCA, and PLS were performed using PLS Toolbox 8.0 (Eigenvector Research Inc., Wenatchee, WA, USA) working under MATLAB (The Mathworks, Natick, MA, USA).

Conclusions
Results obtained in the assessment of the behavior of HILIC chromatographic stationary phases by means of a variety of chemometric methods showed that the two most important factors to be considered in metabolic studies are the stationary phase and the pH of the aqueous solvent. Moreover, BEH amide and mixed-mode diol stationary phases behaved rather differently compared to amide and zwitterionic phases, which performed similarly. ASCA loadings were useful to know which metabolites were the most important to distinguish the stationary phases. Amino acids appeared to be useful to distinguish the mixed-mode diol stationary phase, while organic acids seemed to distinguished BEH amide from the rest of the stationary phases. In addition, exploratory PLS models allowed linking the retention of metabolites at different chromatographic conditions with molecular descriptors defining their physicochemical properties. Again, a similar behavior was observed for amide and zwitterionic stationary phases whereas the mixed-mode diol stationary phase showed a different performance.
Finally, the obtained PLS models could be considered as a starting point in a more comprehensive work for modelling and prediction of chromatographic retention factors of metabolites in different HILIC stationary phases. However, building up these models requires bigger metabolite datasets with a larger number of compounds for each of the metabolite families. Moreover, efforts should be made in performing a comprehensive external validation of the models. Future work should address these issues.

Supplementary Materials:
The following are available online at www.mdpi.com/2218-1989/7/4/54/s1, Figure S1: PCA scores plot of samples classified according to the pH of the mobile phase. Table S1: Metabolites' logP values obtained from PCLIENT software.