Estimating the Dissolution of Anticancer Drugs in Supercritical Carbon Dioxide with a Stacked Machine Learning Model

Synthesizing micro-/nano-sized pharmaceutical compounds with an appropriate size distribution is a method often followed to enhance drug delivery and reduce side effects. Supercritical CO2 (carbon dioxide) is a well-known solvent utilized in the pharmaceutical synthesis process. Reliable knowledge of a drug’s solubility in supercritical CO2 is necessary for feasible study, modeling, design, optimization, and control of such a process. Therefore, the current study constructs a stacked/ensemble model by combining three up-to-date machine learning tools (i.e., extra tree, gradient boosting, and random forest) to predict the solubility of twelve anticancer drugs in supercritical CO2. An experimental databank comprising 311 phase equilibrium samples was gathered from the literature and applied to design the proposed stacked model. This model estimates the solubility of anticancer drugs in supercritical CO2 as a function of solute and solvent properties and operating conditions. Several statistical indices, including average absolute relative deviation (AARD = 8.62%), mean absolute error (MAE = 2.86 × 10−6), relative absolute error (RAE = 2.42%), mean squared error (MSE = 1.26 × 10−10), and regression coefficient (R2 = 0.99809) were used to validate the performance of the constructed model. The statistical, sensitivity, and trend analyses confirmed that the suggested stacked model demonstrates excellent performance for correlating and predicting the solubility of anticancer drugs in supercritical CO2.


Introduction
The low solubility of solid pharmaceutical substances in the aqueous-based media of the human body is often resolved by utilizing a higher dosage of drugs [1,2]. This increase in the dosage usually increases the cost of pharmacological treatment [3], decreases the drug's therapeutic efficiency, and produces several side effects [2,4]. To overcome these critical limitations/drawbacks, synthesizing either micro-or nano-sized pharmaceutical substances with a uniform size distribution has been suggested by researchers [2,5]. Some researchers also used the ionically crosslinked complex [6] and self-indicating cellulosebased [7] gels to improver drug delivery. Therefore, a practical process must be established to synthesize pharmaceutical substances with these morphological characteristics.
Therefore, some researchers have utilized equations of state to simulate solid drug solubility in supercritical CO 2 [28,29]. This technique relies on the solid drugs' physio-chemical and critical properties to calculate their solubility in supercritical CO 2 [2]. Unfortunately, equations of state not only require complex mathematical operations, but the required drug's characteristics are also often unavailable [2].
Several empirical/semiempirical correlations have also been recommended to estimate the solubility of solid drugs in supercritical CO 2 [30][31][32][33]. Although these correlations are easy to use and only require temperature, pressure, and solvent density to calculate the drug solubility value, they are often applicable for a specific system under predefined operating conditions [13]. Therefore, these methods cannot be applied to monitor the solubility of drugs in supercritical CO 2 in a wide range of domains [13].
Recently, artificial intelligence models have been considered to estimate drug dissolution in supercritical CO 2 as a function of operating conditions and solvent property [16,27,34,35]. The quantitative structure-property relationships [27], artificial neural networks [36,37], adaptive neuro-fuzzy inference systems [13,34], and support vector machines [37,38] have been applied to predict both drug and drug-like substances in supercritical CO 2 .
The stacked/ensemble models that are often constructed by systematically combining several previously designed machine learning (ML) models have found great popularity in different fields of science and technology [39][40][41]. This up-to-date modeling scenario has not previously been utilized to estimate anticancer drug solubility in supercritical CO 2 . Therefore, the current research combines the extra tree (ET), gradient boosting (GB), and random forest (RF) models to construct a reliable stacked model for estimating anticancer drug solubility in supercritical CO 2 . The suggested stacked approach can monitor the solubility of twelve anticancer drugs in supercritical carbon dioxide in a broad range of operating conditions. We can claim that the proposed model in this study is straightforward, easy to use, generalized, and has no applicable range limitation. Moreover, such an approach is essential for designing pharmaceutical processes using supercritical CO 2 .

Anticancer Drugs' Solubility in Supercritical Carbon Dioxide
Laboratory investigations, empirical or semiempirical correlations, and machine learning models are often employed to measure or estimate the solubility of a specific solid drug in supercritical CO 2 versus equilibrium pressure/temperature and solvent density. Since this study aims to design a single model for simultaneously estimating the solubility of twelve anticancer drugs in supercritical CO 2 , it is also necessary to include the solute property in the model development phase. Therefore, the machine learning models have been applied to extract the relationship defined by Equation (1).
It can be said that the machine learning methods are responsible for deducing the inherent relationship between the solubility (y cal drug ) and drug molecular weight (M drug ), equilibrium temperature (T eq ) and pressure (P eq ), and solvent density (ρ CO 2 ). Table 1 introduces the experimental data gathered from the literature to develop the  machine learning models. Moreover, Table 2 reports the molecular weight of the considered anticancer drugs and their chemical structure.
equilibrium temperature ( eq T ) and pressure ( eq P ), and solvent density ( 2 CO ρ ). Table 1 introduces the experimental data gathered from the literature to develop the machine learning models. Moreover, Table 2 reports the molecular weight of the considered anticancer drugs and their chemical structure.
equilibrium temperature ( eq T ) and pressure ( eq P ), and solvent density ( 2 CO ρ ). Table 1 introduces the experimental data gathered from the literature to develop the machine learning models. Moreover, Table 2 reports the molecular weight of the considered anticancer drugs and their chemical structure.
equilibrium temperature ( eq T ) and pressure ( eq P ), and solvent density ( 2 CO ρ ). Table 1 introduces the experimental data gathered from the literature to develop the machine learning models. Moreover, Table 2 reports the molecular weight of the considered anticancer drugs and their chemical structure.  Thymidine 242 It should be mentioned that each row of Table 1 is actually a summary of multiple data instances, and that pressure, temperature, and CO2 density are input features. Moreover, an additional input feature is the molecular weight of drugs from Table 2, and the target feature to predict is anti-cancer drug solubility in supercritical CO2.

Methods
This study first develops three different ML regressors (i.e., extra tree, gradient boosting, and random forest) to monitor the equilibrium behavior of anticancer drug-supercritical CO2 systems. It then develops a stacked model using the three previously developed models as a base learner and linear regression as a meta learner.

Extra Tree
Geurts et al. [49] originally derived the extra tree regression (ETR) approach from the random forest (RF) algorithm [50]. According to the conventional top-down technique, the ETR develops a group of unpruned decisions (or regression trees) [49]. The ETR and RF models have two main differences. First, ETR utilizes whole cutting points and divides nodes by the random choice among these points. Second, it cultivates the trees utilizing the whole-learning samples to reduce bias as much as possible [49]. ETR controls the splitting process helping two parameters, namely, and . The former is the number of randomly chosen features in the node, and the latter represents the minimum sample size expected to separate nodes. In addition, and determine the strength of both the selection of attributes and the average output noise, respectively. These parameters have a key role in improving the ETR accuracy and decreasing the possibility of overfitting [50].

Gradient Boosting
The gradient boost model (GB) is an ensemble regressor used to enhance accuracy of function approximation, according to the boosting process [51]. This scenario gradually reduces observed error by sequentially combining several weak learners. This study employs the decision tree as a weak learner. Although the performance of GB-based models depends on the loss function, the logarithm of loss function is often applied to handle regression problems. Furthermore, adaptive components and weak learners are the key parameters of GB-based models. If a gradient boosting model has 300 n estimators, it means that 300 decision trees (weak learners) have been coupled under the boosting process, and each tree is limited to 300 max depth. Thymidine 242 It should be mentioned that each row of Table 1 is actually a summary of multiple data instances, and that pressure, temperature, and CO2 density are input features. Moreover, an additional input feature is the molecular weight of drugs from Table 2, and the target feature to predict is anti-cancer drug solubility in supercritical CO2.

Methods
This study first develops three different ML regressors (i.e., extra tree, gradient boosting, and random forest) to monitor the equilibrium behavior of anticancer drug-supercritical CO2 systems. It then develops a stacked model using the three previously developed models as a base learner and linear regression as a meta learner.

Extra Tree
Geurts et al. [49] originally derived the extra tree regression (ETR) approach from the random forest (RF) algorithm [50]. According to the conventional top-down technique, the ETR develops a group of unpruned decisions (or regression trees) [49]. The ETR and RF models have two main differences. First, ETR utilizes whole cutting points and divides nodes by the random choice among these points. Second, it cultivates the trees utilizing the whole-learning samples to reduce bias as much as possible [49]. ETR controls the splitting process helping two parameters, namely, and . The former is the number of randomly chosen features in the node, and the latter represents the minimum sample size expected to separate nodes. In addition, and determine the strength of both the selection of attributes and the average output noise, respectively. These parameters have a key role in improving the ETR accuracy and decreasing the possibility of overfitting [50].

Gradient Boosting
The gradient boost model (GB) is an ensemble regressor used to enhance accuracy of function approximation, according to the boosting process [51]. This scenario gradually reduces observed error by sequentially combining several weak learners. This study employs the decision tree as a weak learner. Although the performance of GB-based models depends on the loss function, the logarithm of loss function is often applied to handle regression problems. Furthermore, adaptive components and weak learners are the key parameters of GB-based models. If a gradient boosting model has 300 n estimators, it means that 300 decision trees (weak learners) have been coupled under the boosting process, and each tree is limited to 300 max depth.
It should be mentioned that each row of Table 1 is actually a summary of multiple data instances, and that pressure, temperature, and CO 2 density are input features. Moreover, an additional input feature is the molecular weight of drugs from Table 2, and the target feature to predict is anti-cancer drug solubility in supercritical CO 2 .

Methods
This study first develops three different ML regressors (i.e., extra tree, gradient boosting, and random forest) to monitor the equilibrium behavior of anticancer drug-supercritical CO 2 systems. It then develops a stacked model using the three previously developed models as a base learner and linear regression as a meta learner.

Extra Tree
Geurts et al. [49] originally derived the extra tree regression (ETR) approach from the random forest (RF) algorithm [50]. According to the conventional top-down technique, the ETR develops a group of unpruned decisions (or regression trees) [49]. The ETR and RF models have two main differences. First, ETR utilizes whole cutting points and divides nodes by the random choice among these points. Second, it cultivates the trees utilizing the whole-learning samples to reduce bias as much as possible [49]. ETR controls the splitting process helping two parameters, namely, k and n min . The former is the number of randomly chosen features in the node, and the latter represents the minimum sample size expected to separate nodes. In addition, k and n min determine the strength of both the selection of attributes and the average output noise, respectively. These parameters have a key role in improving the ETR accuracy and decreasing the possibility of overfitting [50].

Gradient Boosting
The gradient boost model (GB) is an ensemble regressor used to enhance accuracy of function approximation, according to the boosting process [51]. This scenario gradually reduces observed error by sequentially combining several weak learners. This study employs the decision tree as a weak learner. Although the performance of GB-based models depends on the loss function, the logarithm of loss function is often applied to handle regression problems. Furthermore, adaptive components and weak learners are the key parameters of GB-based models. If a gradient boosting model has 300 n estimators, it means that 300 decision trees (weak learners) have been coupled under the boosting process, and each tree is limited to 300 max depth.

Random Forest
To perform a regression problem by the random forest (RF) method, the bootstrapping and bagging stages should be followed. The first stage generates a group of decision trees by the growth of each distinct tree that uses a random training dataset sample. The second stage breaks down the decision tree nodes after achieving the ensemble, where several random subdivisions of training samples are chosen during the initial bagging process. The decision-making is performed by choosing the best subdivision and its value [52]. In summary, the RF model can be viewed as a group of decision trees, in which G(x, θ r ) is the Gth predicting tree and θ shows a uniform independent distribution vector assigned before the tree growth [53]. The Breiman equation (i.e., Equation (2)) is used to construct the forest (i.e., an ensemble of trees) by combining and averaging the whole trees [53].

Stacked Model
The study proposed a stacking-based approach and compared the performance with conventional ML regressors. This approach consists of a two-step learner such as baselearner and meta-learner. The three best-performing ML regression models were selected as base-learner models in the stacking model and linear regression was used as a metalearner model (M f ) in the second phase of the stacking model and eventually produced the final prediction. Figure 1 shows the architecture of the proposed stacking model, which combined p numbers of best-performing regression models M 1 , . . . . . . , M p using an input dataset A, with features (x i ) and corresponding label (y i ). In the first step, p numbers of base-level ML regression models produced the predictionsŷ 1 , . . . . . . ,ŷ m . The predictions of the base learners were fed into the meta learner model (M f ) for the final prediction.

Random Forest
To perform a regression problem by the random forest (RF) method, the bootstrapping and bagging stages should be followed. The first stage generates a group of decision trees by the growth of each distinct tree that uses a random training dataset sample. The second stage breaks down the decision tree nodes after achieving the ensemble, where several random subdivisions of training samples are chosen during the initial bagging process. The decision-making is performed by choosing the best subdivision and its value [52]. In summary, the RF model can be viewed as a group of decision trees, in which , is the Gth predicting tree and shows a uniform independent distribution vector assigned before the tree growth [53]. The Breiman equation (i.e., Equation (2)) is used to construct the forest (i.e., an ensemble of trees) by combining and averaging the whole trees [53].

Stacked Model
The study proposed a stacking-based approach and compared the performance with conventional ML regressors. This approach consists of a two-step learner such as baselearner and meta-learner. The three best-performing ML regression models were selected as base-learner models in the stacking model and linear regression was used as a metalearner model ( in the second phase of the stacking model and eventually produced the final prediction. Figure 1 shows the architecture of the proposed stacking model, which combined p numbers of best-performing regression models , … … , using an input dataset A, with features ( ) and corresponding label ( ). In the first step, p numbers of base-level ML regression models produced the predictions , … … , . The predictions of the base learners were fed into the meta learner model ( ) for the final prediction. Algorithm 1 provides a step-by-step procedure to explain the construction of the stacked model.
The above statistical criteria also require the average value of drug solubilities (y ave drug ) and the number of data (N). Equation (8) defines the average value of drug solubilities in supercritical CO 2 .
Moreover, several graphical techniques (cross-plot, histogram, kernel density estimation, and Bland-Altman) and trend analyses have been applied to check the performance of the most accurate machine learning approach (i.e., the stacked model).

Developing Base Machine Models
The anticancer drug-supercritical CO 2 phase equilibrium measurements (311 datasets) were randomly divided into internal and external groups (4:1 ratio). The five-fold crossvalidation utilized the earlier group (i.e., 248 data samples) for the training and validation phases of the base learner machines. On the other hand, the remaining 63 data samples were engaged in the testing phase of the trained base-learner machines.
In this study, we leverage the hyperparameter optimization framework Optuna [57] as follows. First, the corresponding parameter spaces for the Scikit-Learn implementations of the RF, ETR, and GB models were identified. Then, the objective function (OF) was defined as the MSE. Lastly, a pipeline was applied to minimize the OF over a predefined maximum iteration on multiple cores (i.e., 300). Some of the RF and ETR hyperparameters, including the number of trees in the forest (i.e., number of estimators) were adjusted. We checked 80-150 trees in the forest and the best accuracy was obtained with 110 trees for the RF and 100 trees for ETR. Furthermore, upon checking different accuracy criteria (i.e., squared error, absolute error, and Poisson), squared error shows the best performance for both ETR and RF algorithms. In addition, the GB model [51] was tuned by the learning rate, ranging from 0.0 to 1 with a step size of 0.1. The results show that the learning rate = 0.2 produced the best performance. The model was also tuned with diverse loss functions (i.e., squared error, absolute error, Huber, and quantile). It was found that absolute error provided the model with the best performance. All reported results in this study were obtained by using the optimized models. Table 3 introduces the uncertainty level observed in predictions of the base learner machines. The numerical values of five statistical indices are reported for the internal and external groups, as well as their combination. This table confirms that the deviation between the experimental solubility measurements and the associated predictions by the base-learner machines was relatively high. The extra tree model prediction accuracy was better than two other developed regression machines.

Designing the Stacked Model
As mentioned before, it is possible to build a stacked model by combining the previous three base learner machines utilizing the flowchart presented in Figure 1. Table 4 summarizes the prediction accuracy of the built stacked model in the cross-validation and testing phases and for all available datasets. It can be concluded that the stacked model provides acceptable prediction accuracy for calculating the phase equilibrium behavior of the anticancer drug-supercritical CO 2 binary system. The constructed stacked model estimated 311 data samples of anticancer drug solubilities in supercritical CO 2 with excellent accuracy, i.e., AARD = 8.62%, MAE = 2.86 × 10 −6 , RAE = 2.42%, MSE = 1.26 × 10 −10 , and R 2 = 0.99809. These values of the statistical indexes for predicting the ultra-low ranges of anticancer drug solubility in supercritical CO 2 (10 −7 to 10 −3 mole fraction based on Table 1) are sufficient for designing pharmaceutical processes.

Comparison with the Other Modeling Scenarios
The literature has estimated solubilities of Decitabine [36] and Busulfan [38] in supercritical CO 2 using adaptive neuro-fuzzy inference systems and support vector machines, respectively. These models have been developed to estimate the solubility of a single drug or two drugs in supercritical CO 2 , while our stacked model covers 12 different anti-cancer drugs. Table 5 shows that the accuracy of the stacked model is comparable or even better than the previously developed intelligent techniques. This stage suggests a simple correlation based on the partial least-squares regression (PLS-R) to linearly relate the anticancer drug solubility in supercritical CO 2 to the independent variables (Equation (9)).

Evaluating the Performance of the Stacked Model Using Graphical Analyses
This section utilizes several graphical analyses to visually inspect the stacked model's performance for predicting anticancer drug solubility in supercritical CO 2 .
The histogram of the observed relative errors illustrated in Figure 3 justifies that the major part of the solubility data (248 samples) was estimated with a relative error equal to zero. Moreover, the relative errors' average and standard deviation values were −8.2194 × 10 −7 and 1.12 × 10 −5 mole fractions, respectively.
The histogram of the observed relative errors illustrated in Figure 3 justifies that the major part of the solubility data (248 samples) was estimated with a relative error equal to zero. Moreover, the relative errors' average and standard deviation values were −8.2194 × 10 −7 and 1.12 × 10 −5 mole fractions, respectively.  The kernel density estimation (KDE) graphs of the experimental and calculated solubility values for the internal and external groups are exhibited in Figure 4. The two graphs in the figure show that only a little deviation exists between the experimental and calculated KDEs of the external groups. This deviation is observable in the 2 × 10 −9 < magnitude < 4 × 10 −9 of Figure 4b.  The kernel density estimation (KDE) graphs of the experimental and calculated solubility values for the internal and external groups are exhibited in Figure 4. The two graphs in the figure show that only a little deviation exists between the experimental and calculated KDEs of the external groups. This deviation is observable in the 2 × 10 −9 < magnitude < 4 × 10 −9 of Figure 4b. The kernel density estimation (KDE) graphs of the experimental and calculated solubility values for the internal and external groups are exhibited in Figure 4. The two graphs in the figure show that only a little deviation exists between the experimental and calculated KDEs of the external groups. This deviation is observable in the 2 × 10 −9 < magnitude < 4 × 10 −9 of Figure 4b.
The lower and upper LoAs in Figure 5a were −2.52 × 10 −10 and 2.28 × 10 −10 , whereas Figure 5b had lower and upper LoAs of −9.51 × 10 −11 and 1.07 × 10 −10 , respectively. Figure 5a,b demonstrated that only 9 out of 248 internal data points (3.63%) and 3 out of 63 external data points (4.76%) were located outside the feasible domains.   Figure 5a,b depict the Bland-Altman plots for the internal and external anticancer drug solubility data. These figures have two horizontal dashed lines associated with the upper and lower LoA (95% limit of agreement). Equations (13) and (14) define the upper and lower LoAs indices, respectively [58].

Trend Analyses
This section investigates the effect of equilibrium temperature/pressure and anticancer drug type on the system behavior from experimental and modeling perspectives.
The profile of Capecitabine solubility in supercritical CO2 versus equilibrium pressure for five temperature levels was plotted in Figure 6. It can be seen that increasing the equilibrium pressure continuously improved the Capecitabine solubility in the applied supercritical solvent. It can also be concluded that the pressure effect on the solid drug solubility was linear at low temperature and became nonlinear at high temperature.
An excellent agreement between the laboratory-measured equilibrium data and the stacked model predictions is easily deduced from this figure. Indeed, the stacked model was trained so well that it precisely anticipated the effect of pressure/temperature change, linear/nonlinear behavior of the system, and all individual data points.

Trend Analyses
This section investigates the effect of equilibrium temperature/pressure and anticancer drug type on the system behavior from experimental and modeling perspectives.
The profile of Capecitabine solubility in supercritical CO 2 versus equilibrium pressure for five temperature levels was plotted in Figure 6. It can be seen that increasing the equilibrium pressure continuously improved the Capecitabine solubility in the applied supercritical solvent. It can also be concluded that the pressure effect on the solid drug solubility was linear at low temperature and became nonlinear at high temperature. equilibrium pressure continuously improved the Capecitabine solubility in the applied supercritical solvent. It can also be concluded that the pressure effect on the solid drug solubility was linear at low temperature and became nonlinear at high temperature.
An excellent agreement between the laboratory-measured equilibrium data and the stacked model predictions is easily deduced from this figure. Indeed, the stacked model was trained so well that it precisely anticipated the effect of pressure/temperature change, linear/nonlinear behavior of the system, and all individual data points.  An excellent agreement between the laboratory-measured equilibrium data and the stacked model predictions is easily deduced from this figure. Indeed, the stacked model was trained so well that it precisely anticipated the effect of pressure/temperature change, linear/nonlinear behavior of the system, and all individual data points. Figure 7 exhibits the dependency of Decitabine solubility in supercritical CO 2 on the temperature at eight pressure levels. This figure shows that the temperature had two different impacts on the solid drug solubility at low and high equilibrium pressures. Indeed, increasing the temperature at low pressures decreased Decitabine solubility in the supercritical CO 2 . On the other hand, increasing the temperature at high equilibrium pressures gradually intensified the Decitabine solubility in supercritical CO 2 .
Pharmaceutics 2022, 14, x FOR PEER REVIEW 14 of 18 Figure 7 exhibits the dependency of Decitabine solubility in supercritical CO2 on the temperature at eight pressure levels. This figure shows that the temperature had two different impacts on the solid drug solubility at low and high equilibrium pressures. Indeed, increasing the temperature at low pressures decreased Decitabine solubility in the supercritical CO2. On the other hand, increasing the temperature at high equilibrium pressures gradually intensified the Decitabine solubility in supercritical CO2.
The complete agreement between experimental solubility data and their associated stacked model predictions can also be justified in this figure. The stacked model correctly predicts the solubility-temperature profiles and accurately estimates all single data points. Since the analyzed anticancer drugs have different compositions and chemical structures, their solubility is also influenced by drug type. The effect of anticancer type on the average solubility value is shown in Figure 8.  The complete agreement between experimental solubility data and their associated stacked model predictions can also be justified in this figure. The stacked model correctly predicts the solubility-temperature profiles and accurately estimates all single data points.
Since the analyzed anticancer drugs have different compositions and chemical structures, their solubility is also influenced by drug type. The effect of anticancer type on the average solubility value is shown in Figure 8. As expected, the anticancer drugs show different dissolution tendencies in supercritical CO 2 . Decitabine, Busulfan, and Tamoxifen have the highest dissolution ability in the applied supercritical solvent. In contrast, Paclitaxel, Sorafenib tosylate, Thymidine, and Tamsulosin show the lowest tendency for dissolving in the considered solvent.

Importance of Independent Variables
As the last analysis, the Pearson technique [58] was applied to monitor the relative importance of each individual independent anticancer on the drug's solubility in supercritical CO2. This technique presents a value ranging from −1 to +1 to clarify the direction and importance of the relationship between each pair of dependent and independent variables. Table 6 summarizes the results of this analysis.

Conclusions
The current research study applied the novel stacked model to precisely monitor phase equilibria of twelve anticancer drug-supercritical CO2 systems. The proposed stacked model was constructed by systematically combining extra tree, gradient boosting, and random forest machine learning models (known as base learners). Performance analyses of the based leaner models and the stacked model confirmed that the latter has the best accuracy in the cross-validation and testing phases. The designed stacked model showed excellent accuracy for predicting 311 experimentally-measured data samples (i.e., AARD = 8.62%, MAE = 2.86 × 10 −6 , RAE = 2.42%, MSE = 1.26 × 10 −10 , and R 2 = 0.99809). This stacked model performance is far better than those results obtained by the base-learner This figure also compares the average solubility values obtained by the experimental measurement and modeling analysis (i.e., stacked model). It is clear that the observed and calculated average solubility values are equal to up to four to five decimal places.

Importance of Independent Variables
As the last analysis, the Pearson technique [58] was applied to monitor the relative importance of each individual independent anticancer on the drug's solubility in supercritical CO 2 . This technique presents a value ranging from −1 to +1 to clarify the direction and importance of the relationship between each pair of dependent and independent variables. Table 6 summarizes the results of this analysis.

Conclusions
The current research study applied the novel stacked model to precisely monitor phase equilibria of twelve anticancer drug-supercritical CO 2 systems. The proposed stacked model was constructed by systematically combining extra tree, gradient boosting, and random forest machine learning models (known as base learners). Performance analyses of the based leaner models and the stacked model confirmed that the latter has the best accuracy in the cross-validation and testing phases. The designed stacked model showed excellent accuracy for predicting 311 experimentally-measured data samples (i.e., AARD = 8.62%, MAE = 2.86 × 10 −6 , RAE = 2.42%, MSE = 1.26 × 10 −10 , and R 2 = 0.99809). This stacked model performance is far better than those results obtained by the base-learner model (i.e., extra tree, gradient boosting, random forest), machine learning approaches suggested in the literature (support-vector machines and Adaptive neuro-fuzzy inference systems), and partial least-squares regression (PLS-R). Moreover, the graphical accuracy monitoring techniques (cross-plot, histogram, kernel density estimation, and Bland-Altman) and trend inspections (solubility-pressure and solubility-temperature profiles) confirmed the reliability of the stacked model predictions. Finally, the experimental data and modeling results revealed that Decitabine and Thymidine have the highest and lowest tendency for dissolving in supercritical CO 2 , respectively.