Development of GBRT Model as a Novel and Robust Mathematical Model to Predict and Optimize the Solubility of Decitabine as an Anti-Cancer Drug

The efficient production of solid-dosage oral formulations using eco-friendly supercritical solvents is known as a breakthrough technology towards developing cost-effective therapeutic drugs. Drug solubility is a significant parameter which must be measured before designing the process. Decitabine belongs to the antimetabolite class of chemotherapy agents applied for the treatment of patients with myelodysplastic syndrome (MDS). In recent years, the prediction of drug solubility by applying mathematical models through artificial intelligence (AI) has become an interesting topic due to the high cost of experimental investigations. The purpose of this study is to develop various machine-learning-based models to estimate the optimum solubility of the anti-cancer drug decitabine and to evaluate the effects of pressure and temperature on it. To build models on the small dataset available in this research, we used three ensemble methods: Random Forest (RFR), Extra Trees (ETR), and Gradient Boosted Regression Trees (GBRT). Different configurations were tested, and optimal hyper-parameters were found. The final models were then assessed using standard metrics. RFR, ETR, and GBRT had R² scores of 0.925, 0.999, and 0.999, respectively. Furthermore, their MAPE error rates were 1.423 × 10⁻¹, 7.573 × 10⁻², and 7.119 × 10⁻², respectively. On this basis, GBRT was selected as the primary model in this paper. Using this model, the optimal values were calculated as P = 380.88 bar, T = 333.01 K, and Y = 0.001073.


Introduction
Recent studies in the area of clinical pharmacology have necessitated the invention of novel, promising, and environmentally friendly tools to increase the performance of therapeutic drugs [1,2]. In order to achieve this purpose, numerous efforts have been made to develop disparate approaches to reduce the application of potentially detrimental/deleterious organic solvents.
Decitabine (currently sold under the brand name DACOGEN®) is an intravenously administered chemotherapeutic drug, which acts as a nucleoside metabolic inhibitor [3][4][5]. Despite the emergence of some adverse events such as neutropenia, thrombocytopenia, and embryo-fetal toxicity in patients, the drug's great efficiency in ameliorating MDS has encouraged researchers to use it extensively [6][7][8]. For a long time, one of the most important duties of the research and development (R&D) centers of pharmaceutical companies has been to concentrate on the development of supercritical fluid technology (SCFT) [9].
Noteworthy advantages such as short processing times, the absence of organic co-solvents, and a great capability for extracting bioactive molecules have encouraged researchers to use SCFT for drug discovery from natural sources [10][11][12]. In recent years, supercritical CO2 (CO2SCF) has received particular attention within SCFT as an efficient solvent, due to significant advantages such as chemical inertness, availability, cost-effectiveness, low critical temperature/pressure, and its approval as a food-grade solvent [13].
In recent years, artificial intelligence (AI) has found its place as a versatile tool, offering a high potential for applications in different industries such as separation, extraction, and nanotechnology, as well as drug development, including the identification, validation, and design of novel drugs [14,15]. The advantages of AI technology, such as robustness and time-effectiveness, provide an appropriate means to overcome the shortcomings and discrepancies which may arise in conventional drug optimization and development techniques [16,17]. Machine learning (ML) is a predictive mathematical approach based on AI, which has paved the way to estimate the solubility of drugs in CO2SCF. To increase the generalization and performance of a single model, an ensemble of models is used in ML. By blending diverse predictions, ensembles generate effective predictive algorithms with great generalization capability [18]. Some ML techniques, such as decision trees and linear regression, are inherently unstable, which means that changing the training dataset results in a significantly different estimator. Unstable estimators have a low bias and high variance. Ensemble approaches have been proposed to reduce generalization error, that is, to reduce variance, bias, or both. In these approaches, the training dataset is modified, and an ensemble of different base estimators is created. These estimators are then combined into a single estimator [19]. This section provides a quick overview of three main ensemble algorithms: bagging, gradient boosting, and Extra Trees.
In this study, a few basic models were first studied. The decision tree gave significantly better outcomes than the others, but its results were not general enough to serve as a robust final model, so it was decided to employ methods that reinforce it. Bagging and boosting are known as the most efficient advanced methods built on decision trees. Bagging (bootstrap aggregating), created by Breiman et al. [20], can be considered both a principal approach and a straightforward ensemble approach, showing brilliant efficiency insofar as it reduces variance and avoids overfitting. The bootstrap technique, which resamples the training dataset to create training subsets, contributes to the diversity of the bagging algorithm. Each subset is utilized to fit a different base estimator, and the final prediction is obtained by aggregating the estimators' outputs, via majority vote for classification or averaging for regression.
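The bootstrap resampling step described above can be sketched in a few lines. The toy arrays and subset count here are purely illustrative, not the study's dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set standing in for real data: 8 samples, 2 features.
X = rng.random((8, 2))
y = rng.random(8)

def bootstrap_subset(X, y, rng):
    """Draw one bootstrap replicate: n rows sampled with replacement."""
    n = len(X)
    idx = rng.integers(0, n, size=n)  # some rows repeat, some are left out
    return X[idx], y[idx]

# Three bootstrap subsets; each would train one base estimator, and the
# final prediction aggregates the estimators' outputs.
subsets = [bootstrap_subset(X, y, rng) for _ in range(3)]
```

Because sampling is done with replacement, each replicate has the same size as the original set but a different composition, which is what gives the bagged trees their diversity.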
The other ensemble technique, introduced in the work of Freund and Schapire, is boosting [21]. In contrast to bagging, boosting produces a variety of base learners by gradually reweighting the training data. Each sample weakly estimated by the previous estimator is given a higher weight in the next training step. As a result, training samples weakly estimated by predecessors are more likely to occur in the following sample, and bias can be reduced effectively. The final boosting model integrates all the underlying base estimators, weighted by their prediction performance.
A more recently developed model is the Extremely Randomized Tree (ExtraTree), an improved version of the traditional top-down decision tree (DT). It is an ensemble of DTs trained in a recursive manner, with the final model built from the combined trees. Each tree is grown using the entire dataset, and the cut point for each split is chosen based on the information gained from that split [22,23].
The three algorithms selected for this study are:
• Random Forest (bagging of regression trees);
• Extra Trees (an ensemble of randomized regression trees grown on the full dataset);
• Gradient Boosting (boosting of regression trees).
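The three selected algorithms correspond directly to regressors available in scikit-learn. The hyper-parameter values below are placeholders for illustration, not the tuned settings reported later in the paper:

```python
from sklearn.ensemble import (
    RandomForestRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
)

# Illustrative configurations; the study's tuned hyper-parameters differ.
models = {
    "RFR":  RandomForestRegressor(n_estimators=100, random_state=0),
    "ETR":  ExtraTreesRegressor(n_estimators=100, random_state=0),
    "GBRT": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                      random_state=0),
}
```

Each model exposes the same `fit`/`predict` interface, so the three can be trained and compared in a single loop over this dictionary.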

Dataset
Solubility models were created using a dataset with 32 input vectors, similar to [24]. The dataset is illustrated in Table 1, and the distribution of features and output is shown in Figure 1. The diagonal subplots of this figure (where the x-axis and y-axis are identical) show kernel density estimate (KDE) plots. Like histograms, KDE plots visualize the distribution of observations in a dataset, representing the data with probability density curves in one or more dimensions.
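The KDE curves on the diagonal of Figure 1 can be reproduced conceptually with a Gaussian kernel estimator. The array below is a synthetic stand-in for one column of the 32-point dataset; the real values are those listed in Table 1:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for one feature column (e.g., pressure in bar)
# of the 32-point dataset; real values are in Table 1.
rng = np.random.default_rng(0)
pressure = rng.uniform(120, 400, 32)

# A KDE smooths the 32 discrete observations into a continuous density
# curve, which is what the diagonal panels of Figure 1 display.
kde = gaussian_kde(pressure)
grid = np.linspace(pressure.min(), pressure.max(), 200)
density = kde(grid)
```

Plotting `density` against `grid` yields the smooth curve shown in place of a histogram on each diagonal panel.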

Random Forest Regression (RFR)
This regression method is an ML procedure which estimates the targeted output by combining the results of several DT learners [25,26]. When Random Forest receives an input data point x, containing the values of the various input features for a given training area, it creates K regression trees and averages their results. After such K trees {T(x)}₁ᴷ have been trained, the RF regression predictor is [27]:

f̂(x) = (1/K) ∑ₖ₌₁ᴷ Tₖ(x)

To bypass tree correlation, Random Forest grows the trees using different training subsets generated through a routine called bagging. This is a subset creation technique that resamples the original dataset randomly with replacement in order to generate each training subset {h(x, Θₖ), k = 1, 2, …, K}, where each Θₖ is an identically distributed independent random vector. Accordingly, some data points may be used many times during training, while others may never be used. When growing a tree, RF selects the best split point from a random subset of the input features [28][29][30].
Moreover, the data points not selected in the training step of the k-th tree during the bagging routine form its out-of-bag (oob) subset, and the k-th tree can use these oob items to evaluate its accuracy [29]. Increasing the number of trees results in a reduction in error, which illustrates the fact that Random Forest does not suffer from an overfitting issue. The relative importance of the features is likewise determined using RF. In order to select the best features in multi-source investigations, it is critical to find the relationship between each feature and the predicted quantity, and this capability can assist with that understanding [30,31].

Extra Tree Regression (ETR)
Geurts et al. [32] introduced the extremely randomized tree (ExtraTree), an improved version of traditional top-down decision tree (DT) models. ExtraTree is an ensemble of DTs trained in a recursive manner, with the final model built from the combined trees. Each tree is developed using the whole dataset, and the suitable cut point for each split is decided through the information it achieves [22,23].
This model is very close to Random Forest; the Extra Tree model's primary innovations are that (i) the nodes are split at randomly drawn cut points, and (ii) the whole training dataset is used for developing each decision tree, instead of the bootstrap subsets generated in the Random Forest method [23,33].
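The second distinction, whole-dataset training versus bootstrap subsets, is visible directly in the scikit-learn defaults of the two estimators:

```python
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

et = ExtraTreesRegressor()
rf = RandomForestRegressor()

# Extra Trees defaults to training every tree on the whole dataset,
# while Random Forest defaults to bootstrap subsets per tree.
print(et.bootstrap)   # False
print(rf.bootstrap)   # True
```

The first distinction, random rather than optimized cut points, is internal to the split search and is what makes Extra Trees both faster to train and typically lower-variance than a single optimized tree.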

Gradient Boosting Regression Trees (GBRT)
To improve prediction accuracy, boosting uses a series of base estimators rather than a single predictor and combines them. Base estimators/models (such as decision trees) are coordinated to remove bias in a stage-wise process: a new learner is introduced at each phase to reduce the loss function. Using the training data, the first learner reduces the loss function to its lowest amount [34][35][36]. Each following estimator is then fit to the previous estimators' residuals. Algorithm 1 demonstrates the gradient boosting procedure [35][36][37]:

Algorithm 1
Initialize F₀(x) = argmin_ρ ∑ᵢ₌₁ᴺ L(yᵢ, ρ)
For m ∈ {1, 2, …, M}:
1. Compute the negative gradient rᵢₘ = −[∂L(yᵢ, F(xᵢ))/∂F(xᵢ)], evaluated at F = Fₘ₋₁
2. Fit a base learner hₘ(x) to the targets {rᵢₘ}
3. Select a gradient descent step size ρₘ = argmin_ρ ∑ᵢ₌₁ᴺ L(yᵢ, Fₘ₋₁(xᵢ) + ρ hₘ(xᵢ))
4. Update Fₘ(x) = Fₘ₋₁(x) + ρₘ hₘ(x)

Here, x denotes the feature vector and y the corresponding target value. Given the training data {xᵢ, yᵢ}ᵢ₌₁ᴺ, the aim is to find F*(x) mapping x to y such that a specific loss function L(y, F(x)) is reduced to its lowest amount.
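Algorithm 1 can be sketched directly for the squared-error loss, where the negative gradient in step 1 reduces to the residual y − F(x) and a fixed learning rate stands in for the line search of step 3. The data here are synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Minimal stage-wise gradient boosting for squared-error loss.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1]

M, lr = 50, 0.1                    # number of stages, fixed step size
F = np.full_like(y, y.mean())      # F_0: the loss-minimizing constant
trees = []

for m in range(M):
    residual = y - F               # negative gradient of (1/2)(y - F)^2
    h = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    trees.append(h)
    F += lr * h.predict(X)         # F_m = F_{m-1} + lr * h_m

mse = np.mean((y - F) ** 2)
print(mse)
```

Each shallow tree corrects what its predecessors got wrong, so the training error shrinks monotonically as stages are added; `GradientBoostingRegressor` implements the same loop with the full line search and additional loss functions.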

Results
After tuning the hyper-parameters of the models by testing various combinations, we employed MAPE and R² [38] to verify the accuracy and generality of the models.
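One common way to test hyper-parameter combinations is an exhaustive cross-validated grid search; the paper does not state its exact tuning procedure, so the grid values and data below are purely illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the grid values are illustrative, not the
# combinations actually tested in the study.
rng = np.random.default_rng(0)
X = rng.random((64, 2))
y = X[:, 0] + 0.5 * X[:, 1]

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    cv=4,
    scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_)
```

`grid.best_estimator_` is then the refit model with the winning combination, ready for the final evaluation.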
The performance of the estimates is frequently measured using R-squared, which is without a doubt the most often utilized criterion. This metric shows how closely the predicted results match up with the observed data [39].
MAPE is also one of the most used evaluation metrics; it expresses the error as a fraction of the observed values [40]. Table 2 confirms the greater accuracy of the GBRT model, with the best generality.
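Both metrics are available in scikit-learn; the observed/predicted solubility values below are hypothetical, used only to show the computation:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_percentage_error

# Hypothetical observed vs. predicted solubilities (mole fraction).
observed  = np.array([2.1e-4, 4.5e-4, 7.0e-4, 9.8e-4])
predicted = np.array([2.0e-4, 4.6e-4, 7.2e-4, 9.5e-4])

r2   = r2_score(observed, predicted)
mape = mean_absolute_percentage_error(observed, predicted)
print(r2, mape)
```

R² approaches 1 as predictions track the observations; MAPE approaches 0, with each point's error weighted relative to its observed magnitude.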

R² = 1 − ∑ᵢ (oᵢ − eᵢ)² / ∑ᵢ (oᵢ − ō)²,   MAPE = (1/N) ∑ᵢ |oᵢ − eᵢ| / oᵢ

where oᵢ are the observed values, eᵢ the estimated values, and ō the mean of the observed values.
The reason why GBRT is superior to the other two models is that the dataset is small, so every data point can have a profound impact on the final model. In this setting, the boosting method, by which the points that are incorrectly predicted are corrected through reweighting, has been shown to improve performance more effectively than the conventional approach. Figure 5 illustrates the three-dimensional result based on the GBRT predictive model, simultaneously evaluating the effects of the two input parameters, pressure and temperature, on the solubility of the anti-cancer drug decitabine as the only output.
Additionally, Figures 6 and 7 schematically demonstrate the two-dimensional variations in pressure and temperature versus decitabine solubility. For all evaluated isotherms, an increase in pressure considerably improves the density of CO2SCF due to an increase in the compaction of molecules. The increment in density results in an enhancement in the efficiency of the solvent and, therefore, the solubility of decitabine in CO2SCF increases. While pressure and drug solubility are directly correlated, the influence of temperature is not straightforward, and a reverse alteration is seen after the 200-bar pressure point. It is worth pointing out that the solvent density and the sublimation pressure are two competing parameters which together determine the effect of temperature on the solubility of the drug. By increasing the temperature, the density of the solvent significantly reduces, since greater molecular energy results in the freer movement of solvent molecules. Moreover, an increase in the temperature of the system can raise the sublimation pressure, which has a positive influence on drug solubility in a supercritical system. Considering this description, the net effect of these competing factors determines the positive or negative role of temperature in drug solubility. Analysis of the graphs reveals a cross-over (threshold) pressure above which a temperature increment positively encourages drug solubility, owing to the dominant role of sublimation pressure in comparison to density. Conversely, at pressures below the cross-over pressure, an increase in temperature results in a substantial decrement in decitabine solubility because of the decline in solvent density. According to Table 3, 380.88 bar and 333.01 K can be cited as the optimum values yielding the highest decitabine solubility.

Conclusions
In this study, various predictive models were developed using AI approaches to estimate the optimum solubility value of the anti-cancer drug decitabine in supercritical carbon dioxide (CO2SCF). We used three ensemble methods to build models on a small dataset: Random Forest (RFR), Extra Trees (ETR), and Gradient Boosted Regression Trees (GBRT). Various configurations were tested, and optimal hyper-parameters were discovered. The final models were then evaluated using standard metrics. RFR, ETR, and GBRT had R² scores of 0.925, 0.999, and 0.999, respectively. Furthermore, their MAPE error rates were 1.423 × 10⁻¹, 7.573 × 10⁻², and 7.119 × 10⁻², respectively. GBRT was selected as the primary model for this study based on these figures and other visual considerations. The optimal values calculated using this model are P = 380.88 bar, T = 333.01 K, and Y = 0.001073.