Machine Learning for the Identification of Hydration Mechanisms of Pharmaceutical-Grade Cellulose Polymers and Their Mixtures with Model Drugs

Abstract: Differently bound water molecules confined in hydrated hydroxypropyl cellulose (HPC) type MF and its mixtures (1:1 w/w) with poorly soluble salicylic acid and highly soluble sodium salicylate were investigated by differential scanning calorimetry (DSC). The obtained ice-melting DSC curves of the HPC/H2O samples were deconvoluted into multiple components using a specially developed curve decomposition tool. The ice-melting enthalpies of the individual deconvoluted components were used to estimate the amounts of water in three states in the HPC matrix: free water (FW), freezing bound water (FBW), and non-freezing water (NFW). A search for the optimal number of Gaussian functions was carried out over all available data samples and was based on the analysis of the minimum fitting error vs. the number of Gaussians. Finally, three Gaussians accounting for three fractions of water were chosen for further analysis. The results of the calculations are discussed in detail and compared to previously obtained experimental DSC data. AI/ML tools assisted in theory elaboration and indirect validation of the hypothetical mechanism of the interaction of water with the HPC polymer.


Introduction
Machine learning (ML) has taken over the artificial intelligence (AI) world, introducing computers' self-adaptation and autonomous learning capabilities to various areas of science and technology. These applications are no longer restricted to predictive modeling but are also exploited in knowledge processing and discovery. For the latter, data mining is a key example of an application of ML that is based on raw data and used to explain possible mechanisms governing observed phenomena. In the pharmaceutical industry, where PAT/QbD [1] approaches are now obligatory, this enhancement of data comprehension and explanation is becoming more and more important for rational drug development processes. PAT is a source of increasing amounts of data based on a constantly growing number of available analytical techniques standardized for pharmaceutical applications. One such technique is differential scanning calorimetry (DSC). Calorimetry is a fundamental technique for measuring the thermal properties of materials to establish the relationship between temperature and certain physical properties of a compound. Moreover, it is the only method for the direct study of changes in enthalpy of the examined physical processes. Phase transition analysis in modern drug studies has attracted significant attention from many research groups dealing with drug discovery and development [2][3][4][5]. The study of thermodynamic parameters such as melting (Tm), crystallization (Tc), or glass transition temperature (Tg), enthalpy (∆H), and heat capacity (Cp) is a source of valuable information that can be used in the development of new drugs and/or in the improvement of those already used in therapy [6][7][8]. Due to the relative ease of both qualitative and quantitative analysis [9], the DSC technique has a well-established position and priority over others.
AI/ML tools have also found application in DSC signal analysis, usually in a predictive approach. An example of such an application is the work of Wyttenbach et al. [10], where DSC signals were used directly as molecular descriptors for the prediction of the solubility of various compounds. Such a modeling approach falls into the category of quantitative structure-property relationship (QSPR) and uses DSC results as input variables for a model to predict a physicochemical property of interest, which in this example is solubility.
As DSC provides empirical data concerning the thermal behavior of various systems, we still need to understand the possible physical mechanisms governing that behavior. For this purpose, one can use an empirical approach, whereby, based on certain numerical assumptions, the DSC curve is decomposed into simpler elements, and these elements' behavior, namely, parameters of their structure, is a source of knowledge about the whole system. Such signal processing works under the principle of the expectation-maximization algorithm.
Hydroxypropyl cellulose (HPC) (Figure 1) is a water-soluble polymer with a semicrystalline structure, a high degree of amorphous content, and high molecular mobility and plasticity [11]. Like other structural derivatives of cellulose, HPC is biocompatible with human tissues [12]. Its physical and chemical properties can be easily modified during synthesis, because the secondary hydroxyl groups in the side chain are available for further etherification with propylene oxides; consequently, the side chains may have more than one propylene substituent. Thus, HPC can be synthesized in various grades, and each grade can correspond to up to six viscosity types. This makes the polymer frequently used in pharmaceutical sciences [13][14][15][16] as a drug carrier for oral drug delivery systems.
Appl. Sci. 2021, 11, 7751
The interaction of water with polysaccharides changes their internal structure and influences their physical, mechanical, and chemical properties. As for other hydrophilic derivatives (cellulose, chitosan, schizophyllan, hyaluronan, carboxymethyl cellulose) [17], the interactions between HPC and water are related to hydrogen bonding and the presence of "nanocavities" formed in the matrix [18].
Depending on water concentration and the grade of the polymer, new stable and reversible structures can form [19]. The existence of three distinct fractions of water is now widely assumed: free (or bulk) water (FW), freezing bound water (FBW), and non-freezing bound water (NFW) [20,21]. Free water does not significantly differ in its melting and crystallization temperature and enthalpy from normal (bulk) water. The second type, freezing bound water, is the water fraction that is less closely associated with the polymer's chains. It shows a crystallization phase transition and a melting point below 0 °C, which distinguishes it from free water. The FBW fraction is also characterized by supercooling and significantly smaller enthalpies than the FW fraction. The third type, NFW, is the small fraction of water that is strongly associated with the polymer chains and whose first-order phase transitions cannot be observed calorimetrically. There are reports showing that this type of water entrapped in a hydrophilic polymer does not crystallize even when cooled down to −100 °C [18]. Moreover, for a low concentration of water, all water found in the matrix is considered non-freezing [18]. The sum of the freezing bound and non-freezing bound water fractions is the bound water content.
Two drugs, having distinct properties with respect to solubility, were used as model drugs. Due to structural similarities and their low molecular weight, salicylic acid (SA) and sodium salicylate (NaSA) were selected. Both drugs are often used as model drugs [22][23][24][25]. The former is poorly soluble in water, while the latter is freely soluble in water. Aromatic anions, such as salicylate anion, are known to improve the solubility of various types of cellulose [26,27]. Nevertheless, these interesting ionic liquids have not been analyzed.
The aim of this work was to use AI/ML tools for the decomposition of DSC curves into elements representing different fractions of water bound to the HPC polymer. AI/ML tools were used for empirical data analysis, without any physical assumptions, and based on the behavior of the water fractions, some hypotheses elucidating possible mechanisms of interaction between water and cellulose polymers were postulated.

DSC Studies
Hydrated hydroxypropyl cellulose (HPC) type MF and its mixtures (1:1 w/w) with poorly soluble salicylic acid (SA) or highly soluble sodium salicylate (NaSA) were investigated by differential scanning calorimetry (DSC). The basic physicochemical properties of HPC type MF as well as of salicylic acid and sodium salicylate were previously presented [20]. The same work also presented sample preparation, DSC instrumentation, software, thermal protocol, and other experimental details. The water concentrations of the hydrated samples under study, expressed as Wc, ranged from 0.2 to 5.0 g/g. The melting sections of the DSC curves obtained in that study were the subject of deconvolution analysis. Based on the previously reported results [20], the main principles of the current research were set as follows:
• the water concentrations of the hydrated samples under study were expressed as Wc, where Wc is defined as the mass of water mH2O related to the dry mass mdry of raw HPC or an HPC mixture (Wc = mH2O/mdry)
• the decomposition was applied to the DSC curves in order from the lowest to the highest hydration level Wc
• only melting peaks were used for decomposition analysis
• the amount of NFW calculated for raw HPC was equal to 0.54 g/g, which is comparable to the amount calculated for HPC/NaSA, 0.48 g/g
• the NFW calculated for HPC/SA was equal to 0.18 g/g, which is almost 1/3 of the NFW amount measured for raw HPC and the HPC/NaSA mixture
• for water contents below NFW, all water found in the matrix is considered non-freezing (NFW)
• below the NFW value, the melting peaks for raw HPC and its mixtures are not visible in the DSC measurements; for that reason, the decomposition procedure was performed only for samples with a water concentration Wc > NFW, usually above 0.7-0.8 g/g
• the weight of freezing bound (FBW) and non-freezing (NFW) water per g of HPC or HPC mixture is constant
• the developed model should consider the contribution of specific nanostructures, the so-called nanocavities, to water holding
• the developed model should consider the strong dissociative effect of the Na+ cation

Curve Decomposition Tool
The basic concept behind the in-house decomposition tool for DSC curves was a simple definition of the target function as a sum of Gaussian functions, resulting in a signal that is compared with the measured one. The error of this comparison was used to optimize the parameters of the Gaussian functions, namely, center, spread, and height. In our work, non-symmetric or bi-Gaussian functions (NSG) were applied following Equation (1), where:
• NSG: non-symmetric Gaussian function
• a: peak constant
• b: center constant
• c1 and c2: spread constants
The tool was developed as a script for the R statistical environment and published on the SourceForge website [28] under the GPL v3 license, which means that it is freely available for both personal and commercial use.
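As an illustration of this target function, the following Python sketch assumes the common piecewise bi-Gaussian form NSG(x) = a·exp(−(x − b)²/(2c²)), with c = c1 for x < b and c = c2 otherwise. That exact form, the names `nsg` and `model_signal`, and the use of Python instead of R are assumptions made for illustration; this is not the published tool.

```python
import math


def nsg(x, a, b, c1, c2):
    """Bi-Gaussian value at x (assumed form, not taken from the source).

    a  - peak constant (height at the center)
    b  - center constant
    c1 - spread constant used left of the center (x < b)
    c2 - spread constant used at and right of the center (x >= b)
    """
    c = c1 if x < b else c2
    return a * math.exp(-((x - b) ** 2) / (2.0 * c ** 2))


def model_signal(x, components, slope=0.0, intercept=0.0):
    """Sum of bi-Gaussians plus a linear baseline at a single x.

    components is a list of (a, b, c1, c2) tuples.
    """
    return slope * x + intercept + sum(nsg(x, *p) for p in components)
```

An optimizer would then adjust the four constants of each component so that `model_signal` evaluated over the temperature axis matches the measured curve.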
The following external packages were used for building the R peak decomposer: nloptr [29], GenSA [30], rgenoud [31], and optimx [32]. These packages can be optionally selected for the optimization process.
• nloptr: a package for accessing the NLopt system from an R script. Out of the many optimization algorithms available in NLopt, the general optimization algorithm called Controlled Random Search (CRS) with local mutation [33] was chosen. It is the first-choice tool for the global optimization loop of the R peak decomposer.
• GenSA: stands for "generalized simulated annealing" [34] and introduces another global optimization algorithm based on the simulated annealing approach, with fully automated control over starting and operational parameters such as temperature and its decay rate. This is the second-choice tool for the global optimization part of the R peak decomposer.
• rgenoud: the R version of GENetic Optimization Using Derivatives. This tool is both a global and a local optimizer, using a genetic algorithm for global optimization and subsequently the BFGS method for local fine-tuning of the optimized parameters [35].
• optimx: a multi-optimizer that exposes multiple local optimization methods through a simple and elegant interface [36]. This package was used with a limited number of optimization methods: BFGS, nlm, Nelder-Mead, and nlminb, with the follow.on = FALSE setting. This setting imposes non-sequential use of the above-mentioned optimization methods within one optimx() run. The final result is the best solution achieved in the given execution loop, regardless of which algorithm produced it.
The default optimizer for the R peak decomposer is the R optim() function with the BFGS algorithm, which is run last whenever any optional algorithm is chosen. The optim() function also offers a global optimization method, SANN (simulated annealing), yet there is usually no need to invoke it because GenSA performs better in most cases.
The R peak decomposition tool works in a loop with a predetermined maximum number of iterations and a stopping criterion; in this work, the stopping criterion was NRMSE = 0.1%, and the maximum number of loops was set to 50.
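The loop with a stopping criterion can be sketched as follows. This is a minimal Python illustration, not the R script itself: `run_loop` and the `step` callable (one optimization pass returning refined parameters and their NRMSE in percent) are hypothetical names, and keeping the best solution seen so far is an assumption.

```python
def run_loop(step, params, max_loops=50, stop_nrmse=0.1):
    """Repeat optimization passes until NRMSE (in %) drops below stop_nrmse.

    step(params, loop_no) is a hypothetical callable returning
    (new_params, nrmse_percent) for one pass of the optimizer cascade.
    Returns the best parameters, their error, and the loops actually run.
    """
    best_params, best_err = params, float("inf")
    loops_done = 0
    for loop_no in range(1, max_loops + 1):
        candidate, err = step(best_params, loop_no)
        loops_done = loop_no
        if err < best_err:  # keep only improvements (assumption)
            best_params, best_err = candidate, err
        if best_err < stop_nrmse:  # stopping criterion reached
            break
    return best_params, best_err, loops_done
```

With a toy `step` that halves the error on every pass, starting from 1.0, the loop stops on the fourth pass, when the error first falls below 0.1.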
The optional optimizers are used in cascade mode, one after the other, gradually refining the set of Gaussian function parameters. The user may specify the additional optimization algorithms manually, yet there is also an automated mode using the following sequence: (1) nloptr, (2) rgenoud, (3) GenSA, and (4) optimx. In this mode, the algorithms are enabled gradually as long as the prediction error of the tool does not meet the stopping criterion. First, the default optimization algorithm BFGS attempts to solve the problem and works for 10% of the number of loops (here, 5 iterations). If the stopping criterion is not met, then at iteration no. 6 nloptr takes over and passes its solution to BFGS for a final refinement. If after the next 9 iterations (30% of the maximum number of iterations) the stopping criterion is still unmet, then rgenoud is enabled, and the system works in the sequence nloptr, rgenoud, and BFGS. Next, if no improvement is achieved, GenSA is added to the top of the sequence at iteration no. 25 (50% of the maximum number of iterations). Last, optimx is added after iteration no. 40 (80% of the maximum number of iterations), resulting in the following sequence of optimization algorithms: GenSA, nloptr, rgenoud, optimx, and BFGS.
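The automated schedule can be expressed compactly. The Python sketch below is illustrative only: `optimizer_sequence` is a hypothetical name, and the exact boundary iterations (nloptr from iteration 6, rgenoud from 15, GenSA from 25, optimx from 41 for 50 loops) are one reading of the percentages given in the text.

```python
def optimizer_sequence(iteration, max_iter=50):
    """Return the ordered optimizers used at a 1-based loop iteration.

    Thresholds follow the 10%, 30%, 50%, and 80% marks described in the
    text; whether each boundary is inclusive is an assumption. Integer
    arithmetic avoids floating-point edge cases at the boundaries.
    """
    sequence = []
    if 10 * iteration >= 5 * max_iter:  # GenSA joins at 50% (iteration 25)
        sequence.append("GenSA")
    if 10 * iteration > 1 * max_iter:   # nloptr takes over after 10% (iteration 6)
        sequence.append("nloptr")
    if 10 * iteration >= 3 * max_iter:  # rgenoud joins at 30% (iteration 15)
        sequence.append("rgenoud")
    if 10 * iteration > 8 * max_iter:   # optimx joins after 80% (iteration 41)
        sequence.append("optimx")
    sequence.append("BFGS")             # default optimizer always runs last
    return sequence
```

For example, `optimizer_sequence(6)` gives nloptr followed by BFGS, and any iteration above 40 gives the full sequence GenSA, nloptr, rgenoud, optimx, BFGS.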
The R peak decomposer reports its results in the form of a text file formatted for easy import into a spreadsheet. The report includes the raw data of each Gaussian function, the parameters of the Gaussians, and the area of each Gaussian function.
As the real DSC signal is never ideally positioned in parallel to the temperature axis, a baseline is computed to compensate for this signal distortion. The baseline is expressed in the form of a linear function, whose coefficients may be calculated with two methods: (1) linear regression using signal boundaries at the ends of the temperature axis, (2) non-linear optimization of both coefficients of the linear function carried out jointly with Gaussian functions' coefficients. The latter method was used in this work.
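The first baseline method can be illustrated with a short Python sketch. It fits a least-squares line to a few points at each end of the temperature axis; the function name `boundary_baseline` and the window size `k` are assumptions, since the text does not state how many boundary points the tool uses.

```python
def boundary_baseline(temps, signal, k=5):
    """Least-squares line through the first and last k points of the curve.

    Returns (slope, intercept) of the linear baseline. The choice of k
    boundary points per side is an illustrative assumption.
    """
    xs = list(temps[:k]) + list(temps[-k:])
    ys = list(signal[:k]) + list(signal[-k:])
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

In the second method, the two baseline coefficients would instead be appended to the vector of Gaussian parameters and optimized together with them.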
The areas above the curves were computed considering the calculated baselines and used as indicators of the amounts of the respective fractions of water.
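A minimal Python sketch of such an area calculation is given here, assuming trapezoidal integration of the distance between the linear baseline and the curve. The name `peak_area` and the use of the absolute difference (so that the sign convention of the heat-flow axis does not matter) are assumptions for illustration.

```python
def peak_area(temps, signal, slope, intercept):
    """Area enclosed between a linear baseline and a curve.

    Uses the trapezoidal rule on |signal - baseline|; temps must be sorted
    in increasing order.
    """
    gaps = [abs(s - (slope * t + intercept)) for t, s in zip(temps, signal)]
    area = 0.0
    for i in range(len(temps) - 1):
        area += 0.5 * (gaps[i] + gaps[i + 1]) * (temps[i + 1] - temps[i])
    return area
```

Applied to each deconvoluted component in turn, this yields one area per water fraction.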
In the fitting procedure, the following criteria were kept:
• following Occam's razor, the number of component curves should be minimal
• since the ice-melting curves were asymmetrical, they were deconvoluted using bi-Gaussian functions
• the theoretical curve given by the sum of the individual components was fitted as closely as possible to the experimental DSC curve
• the goodness-of-fit criterion was the normalized root-mean-squared error (NRMSE) calculated according to Equation (2); an empirical rule of NRMSE < 0.1% was employed as the algorithm stopping criterion
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{i\mathrm{MAX}} - y_{i\mathrm{MIN}}} \cdot 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_{\mathrm{PRED},i} - y_{\mathrm{OBS},i} \right)^{2}}{n}} \tag{2}$$
where:
• NRMSE: normalized root-mean-squared error
• RMSE: root-mean-squared error
• yiMAX: maximum value of the observed output variable
• yiMIN: minimum value of the observed output variable
• yPRED: output variable predicted by the model
• yOBS: observed output variable
• i: index of the output variable
• n: number of data points
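The error measure built from these symbols can be computed as in the following Python sketch, which assumes the usual range-normalized definition (RMSE divided by the difference between the maximum and minimum observed values, expressed in percent). The function name `nrmse_percent` is hypothetical.

```python
import math


def nrmse_percent(y_obs, y_pred):
    """Range-normalized RMSE in percent.

    y_obs  - observed output values (yOBS)
    y_pred - values predicted by the model (yPRED), same length as y_obs
    """
    n = len(y_obs)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / n)
    return 100.0 * rmse / (max(y_obs) - min(y_obs))
```

A fit would be accepted under the stopping rule once `nrmse_percent(y_obs, y_pred) < 0.1`.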

Search for the Optimum Number of Gaussian Components
The search for an optimal number of Gaussian functions was carried out over all available data samples and was based on the analysis of the minimum achievable NRMSE vs. the number of Gaussians (Figure 2). In the methodology of the R peak decomposition run, an arbitrary NRMSE threshold of 0.1% was applied as a stop criterion for the whole algorithm. In other words, it was decided that an NRMSE of 0.1% was the threshold of model overfitting. Each non-symmetric Gaussian (NSG) introduces four parameters, namely, center, peak, and two spread constants (Equation (1)). Figure 2 presents a common relationship between the prediction error (NRMSE) and the number of NSG: the more Gaussians, the more accurate the model. The minimum achievable NRMSE approximated 0.1% with three Gaussians and dropped below 0.1% for four Gaussians; thus, three NSG were chosen as the system with the best accuracy that still prevents overfitting. This optimized number of three NSG, accounting for three fractions of water, was chosen for further analysis. An exemplary decomposition of the DSC curve into three components is shown in Figure 3.
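The selection rule described here can be sketched in Python: keep adding components while the minimum achievable NRMSE stays at or above the 0.1% threshold, and stop before the first count that falls below it. The function name `choose_components` and the error values in the usage note are hypothetical; they are not the values behind Figure 2.

```python
def choose_components(min_nrmse_by_count, threshold=0.1):
    """Pick the largest component count not yet treated as overfitted.

    min_nrmse_by_count maps number of Gaussians -> minimum NRMSE (%).
    A count whose error falls below the threshold is regarded as
    overfitting, so the search stops just before it.
    """
    chosen = None
    for count in sorted(min_nrmse_by_count):
        if min_nrmse_by_count[count] < threshold:
            break
        chosen = count
    return chosen
```

With made-up errors such as `{1: 2.5, 2: 0.6, 3: 0.11, 4: 0.06}`, the function returns 3, mirroring the choice of three NSG reported in the text.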

Three-Component Analysis
As we explained in the introduction, one of the keys to the identification of the appropriate fraction of water is its melting temperature. For this purpose, we assigned each component a corresponding temperature. Figure 4 shows an example in which the components determined for raw HPC were grouped in order of temperature, from the highest to the lowest: the first series, with the highest temperature, is the free water fraction (FW); the second series is the fraction of freezing bound water (FBW); and the third series, last in order and with the lowest temperature, is the NFW fraction. Such an assignment allows observing how water fractions are distributed in the polymer matrix or its mixture with drugs.
One can see that the Tmax temperatures of the FW fraction (blue circles) were mostly positive, while the non-freezing water pool (red empty circles) was characterized by negative temperatures. This is because the NFW fraction is defined only by water molecules strongly bound by hydrogen bonds and/or embedded in nanocavities of polymer chains. What are nanocavities? FW, FBW, and NFW coexisting in the polymer system form complexes consisting of polymer chains, ice (frozen water), unfrozen water, and air, represented by empty spaces. Both ice and unfrozen water can be trapped in the cavities and/or pores of the matrix. When the dimensions of such a spatial formation do not exceed several Angstroms, the crystallization process is difficult or even impossible. Liu et al. [37] called such hollow spaces "nanocavities" and proved that they are able to form the NFW fraction of water. In this way, they concluded that hydrogen bonding is not the only factor influencing water crystallization and suggested that it is just one of the possible physical states of hydrated systems. As the NFW pool increases, the interactions between successive layers of water are still strong enough, but they weaken as the water concentration increases. Thus, the temperature Tmax of the water fraction is initially negative and then, when the distances between the binding sites decrease, it becomes positive. In this way, the next fraction of water appears in the matrix. The FBW fraction's temperatures are negative at low water concentrations (up to around 1.5 g/g) but positive at higher concentrations (>1.5 g/g).
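The temperature-based assignment of components to water fractions can be illustrated with a short Python sketch. The dictionary keys `center` and `area`, the function name `label_fractions`, and the numbers in the usage note are hypothetical; only the ordering rule (highest temperature is FW, then FBW, then NFW) comes from the text.

```python
def label_fractions(components):
    """Label three fitted components by descending center temperature.

    components is a list of three dicts, each with a "center" key holding
    the component's peak temperature. Returns a dict keyed by fraction.
    """
    ordered = sorted(components, key=lambda c: c["center"], reverse=True)
    names = ("FW", "FBW", "NFW")  # highest to lowest temperature
    return {name: comp for name, comp in zip(names, ordered)}
```

For instance, components centered at −8.0, 0.9, and −2.5 (made-up values) would be labeled NFW, FW, and FBW, respectively.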
In the next section of this paper, we show the dependence of the surface area of the water fractions, obtained after the deconvolution procedure (grouped using the maximum temperatures Tmax), on the water concentration Wc. Figure 5 provides a very interesting observation. The FW fraction (blue circles) lies below the other fractions, and for low concentrations, around Wc ≤ 1.1 g/g, the concentrations of all water fractions are almost the same. Initially this may seem strange, but such a phenomenon was explained by a Raman spectroscopy study [38] of the same samples of hydroxypropyl cellulose type MF. In the conclusions of that work, the authors pointed out that for low water contents equal to 0.69 g/g, all water found in the matrix was considered to be non-freezing. That is why the concentrations of all water fractions were almost the same. Next, it was mentioned that in the polymer samples with a water concentration of 1.41 g/g, a decrease in FBW content was found, while the concentrations of the other types of water remained at the same level. This phenomenon can be easily seen in Figure 5. Finally, the authors concluded that for a water content Wc ≥ 2.2 g/g, the increase in the concentrations of all types of water began after 16 days. These features seem to be associated with a transition from an anisotropic to an isotropic structure, caused by the clustering of hydrophobic groups. The above-mentioned process appears in the form of irregular changes in the concentration of each water fraction: an initial increase, then a decrease, and finally a new increase.
Figure 6 shows the same relationship, Area vs. Wc, for an HPC mixture with poorly soluble salicylic acid (SA). Unlike in mixtures with sodium salicylate, it is clearly visible how the FBW (green triangles) and NFW (red empty circles) fractions of water became saturated first, while the area associated with free water (blue circles) increased with increasing water concentration in the mixture. This was due to the fact that the content of polymer chains was reduced (HPC + SA, 1:1 w/w). This lowered the number of possible binding sites and reduced the number of nanocavities. Moreover, some nanocavities could be occupied by small SA molecules. For that reason, the FBW and NFW areas saturated very quickly, and the FW area rapidly increased [21].
Figure 7 shows a very interesting phenomenon. Similarly to HPC/SA, the number of water binding sites found in HPC/NaSA mixtures was associated with the polymer chain content and was also limited to one half (HPC + NaSA, 1:1 w/w). However, as was previously reported [20], the amount of NFW calculated for HPC/NaSA was equal to 0.48 g/g, comparable to the amount of 0.54 g/g calculated for raw HPC. This phenomenon, shown later in Figure 11C, is related to the presence of a strongly dissociating Na+ ion [20,37]. The mobility of water molecules in such a system increased so much that it became impossible to distinguish one fraction of water from the other.
The relationships between the percentage of the water fractions and W c in the HPC/SA (Figure 8) and HPC/NaSA (Figure 9) mixtures follow the previously observed relationships between surface area and W c . The most important observation is that the percentage of NFW in the HPC/SA samples was greater than that of the FBW fraction, and that the previously observed exchange between water types in HPC/NaSA mixtures was quick and easy. Similarly, it is very thought-provoking that the percentage of the FW fraction was the lowest in raw HPC (Figure 10) when compared to the NFW and FBW fractions.
Figure 10. Raw HPC: percentage of fractions of water in the mixture as a function of water concentration W c . The blue circles indicate the FW fraction, the green triangles the FBW fraction, the red empty circles the NFW fraction.
Finally, we compared the FW, FBW, and NFW fractions of water obtained for raw HPC, HPC/SA, and HPC/NaSA mixtures as functions of W c , as shown in Figure 11A-C, respectively.
Figure 11. Comparison of all fractions (components) of (A) free water (FW), (B) freezing bound water (FBW), (C) non-freezing water (NFW) as a function of W c . The red circles, green empty circles, and grey crosses indicate HPC/SA, HPC/NaSA, and raw HPC, respectively.
Such a comparison strengthens the observations made earlier in Figures 5-10. One can see (Figure 11A) that the content of free water FW in raw HPC was the lowest, while in the HPC/SA mixture it was the highest. Could this be surprising? Certainly not, considering what we wrote about it in the commentary to Figure 6. In a hydrated matrix, where there are few hydrogen binding sites as well as few places capable of trapping water, the FBW and NFW pools saturated faster, while the FW area sharply increased. This corresponded to the content of the FBW fraction, clearly shown in Figure 11B. The strongly dissociating effect of the Na + cation, shown in Figure 11C, was explained while discussing Figure 7.
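The two quantities plotted in Figures 5-11, the area of each deconvoluted component and its percentage share, can be illustrated with a minimal Python sketch. It assumes that each component is a Gaussian described by a height, a maximum temperature T max and a width sigma. The function names and the parameter values below are hypothetical, chosen only for illustration, and are not taken from the published tool or data. The assignment of the sorted components to FW, FBW, and NFW is deliberately left to the reader.

```python
import math


def gaussian(t, height, t_max, sigma):
    """Value of one Gaussian component at temperature t."""
    return height * math.exp(-0.5 * ((t - t_max) / sigma) ** 2)


def gaussian_area(height, sigma):
    """Closed-form area under a Gaussian: height * |sigma| * sqrt(2*pi)."""
    return height * abs(sigma) * math.sqrt(2.0 * math.pi)


def fraction_report(components):
    """Sort components by T_max and return (t_max, area, percent) tuples."""
    ordered = sorted(components, key=lambda c: c["t_max"])
    areas = [gaussian_area(c["height"], c["sigma"]) for c in ordered]
    total = sum(areas)
    return [(c["t_max"], a, 100.0 * a / total) for c, a in zip(ordered, areas)]


# Hypothetical fitted parameters (illustrative only, not data from the paper).
components = [
    {"height": 1.00, "t_max": -1.0, "sigma": 2.0},
    {"height": 0.50, "t_max": -8.0, "sigma": 3.0},
    {"height": 0.25, "t_max": -15.0, "sigma": 4.0},
]

report = fraction_report(components)
for t_max, area, percent in report:
    print(f"T_max = {t_max:6.1f}  area = {area:6.3f}  share = {percent:5.1f} %")
```

Repeating such a calculation for every sample and plotting area or share against W c would give curves of the kind discussed above. The areas are in arbitrary units that depend on how the DSC signal is scaled.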

Conclusions
The aim of this work was to use AI/ML tools for an empirical data analysis of the mutual interactions of water with hydroxypropyl cellulose (HPC). The potential of AI/ML to broaden knowledge was employed to unravel some hidden relationships concerning thermal and ionic phenomena. We were able to confirm the assisting role of AI/ML in the formulation of hypotheses and in their at least partial verification and/or falsification in relation to the physical phenomena observed with commonly applicable analytical methods, in our case a DSC assay.
The main conclusions of the paper are:
• assigning the maximum temperatures T max to the empirically determined components allowed confirming the identity of each water fraction; the same conclusions regarding the relationship T max vs. W c , but obtained from raw, not decomposed DSC curves, were the subject of another publication [20]
• the location of the components determined during the AI/ML analysis was shown to be justified by the Raman spectrometry method [36,37]
• the method used differentiated and showed very well the effects of drugs with different solubilities; the location of the components as well as the quantitative aspects were confirmed in the DSC experiments carried out earlier [20,21,38]
• a significant contribution of nanocavities to the formation of strongly bound non-freezing water (NFW) was indicated; the obtained results were confirmed in previous publications [18,20,21,29,37,39]
• the influence of strongly dissociating Na + ions on some quantitative relationships associated with strongly bound non-freezing water (NFW) was pointed out [20,40]
The presented three-component analysis met all our assumptions and expectations. The obtained theoretical model confirmed the physical properties of hydrated HPC, both raw and in a mixture with drugs of different solubility, previously obtained using differential scanning calorimetry (DSC). Such results provide a very promising and proven instrument for modelling hydrated polysaccharide systems, including those mixed with small-molecule drugs of varying solubility. The obtained results encourage performing a decomposition analysis with the same algorithm on polymers with a different structure and physicochemical properties, also in mixtures with macromolecular drugs or proteins.
This type of research may contribute to a better understanding of the release of both highly and poorly soluble drugs from tablet formulations and of their dissolution kinetics. Moreover, based on the well-known biopharmaceutical paradigm of drug fate in the body (LADME), the identification of the driving forces and mechanisms of drug release is crucial to the understanding and prediction of drug bioavailability. The latter is a primary endpoint for many types of clinical trials and plays a pivotal role in the assessment of drug efficacy and safety, an obligatory assessment required for every drug introduced to the market.
The major achievements and novelty of this work include:
• the development and publication of the AI/ML tool for the decomposition of DSC curves as open source software freely available for both personal and commercial use [28]
• the proof-of-concept use of the above-mentioned tool for DSC results for HPC/water mixtures with model drugs
• the confirmation of some AI/ML empirically developed hypotheses in the literature about the thermal behavior of HPC/water mixtures

Data Availability Statement: Software: R peak decomposer and Database (database.zip). Available online: https://sourceforge.net/projects/r-peak-decomposer (accessed on 1 July 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
PAT: process analytical technologies
QbD: quality by design
DSC: differential scanning calorimetry
HPC: hydroxypropyl cellulose
FW: free (or bulk) water
FBW: freezing bound water
NFW: non-freezing bound water
SA: salicylic acid
NaSA: sodium salicylate
HPC/SA: a binary physical mixture of hydroxypropyl cellulose and salicylic acid 1:1 w/w
HPC/NaSA: a binary physical mixture of hydroxypropyl cellulose and sodium salicylate 1:1 w/w
BFGS: Broyden-Fletcher-Goldfarb-Shanno algorithm, an iterative method for solving unconstrained nonlinear optimization problems
LADME: Liberation, Absorption, Distribution, Metabolism, Excretion; the major processes responsible for drug behavior in the human body