Base Oil Process Modelling Using Machine Learning

The quality of feedstock used in base oil processing depends on the source of the crude oil. Moreover, the refinery is fed with various blends of crude oil to meet the demand for refining products. These circumstances cause changes in the quality of the feedstock for base oil production, and the feedstock properties often deviate from the original properties measured during the process design phase. Recalculating and remodelling using first-principles approaches incurs significant costs because of the detailed material characterizations and several pilot-plant runs required. Performing all material characterization and pilot-plant runs every time the refinery receives a different blend of crude oil would simply multiply these costs. For economic reasons, only selected lab characterizations are performed, and the base oil processing plant is operated reactively based on the feedback of the lab analysis of the base oil product. However, this reactive method leads to a loss of production for several hours because of the residence time as well as the time required to perform the lab analysis. Hence, in this paper, an alternative method is studied to minimize the production loss by reacting proactively using machine learning algorithms. Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR) and Extreme Gradient Boosting (XGBoost) models are developed and studied using historical data of the plant to predict the base oil product kinematic viscosity and viscosity index based on the feedstock qualities and the process operating conditions. The XGBoost model shows the most optimal and consistent performance during validation and a 6.5-month plant testing period. The benefit of the machine learning model is that it can predict the product properties from the given feedstock properties and operating conditions earlier than the residence time of the material moving through the operating units, providing the right operating conditions especially during transitions between production modes and thus allowing earlier product realization than the reactive method. Subsequent deployment at our plant facility and product recovery analysis have shown that the prediction model has helped reduce the production recovery period during product transition by 40%.


Introduction
Lubricants are important fluids, commonly used to suppress friction between two metallic surfaces and as a medium for heat transport. They are formulated from base oils obtained from various sources such as petroleum raffinates, bio-based oils, olefins and plastic wastes, together with additives that enhance their properties. A few examples of processes used to produce base oils are solvent extraction, severe hydrocracking, olefin polymerization and esterification.
This paper focuses on a severe hydrocracking process in our facility that involves three series of reactions, namely hydrotreating, hydroisomerisation and hydrodearomatisation. The feedstock originates from the waxy raffinate obtained from crude oil atmospheric distillation. This waxy raffinate is then distilled under vacuum to obtain three different grades with final product kinematic viscosities in the 4 cSt, 6 cSt and 10 cSt ranges. In the hydrotreating process, the feedstock is fed into the hydrotreating reactor to reduce the sulfur and nitrogen content to an acceptable level. The hydrotreating reactor also cracks the naphthene rings, which improves the viscosity index of the base oil product. The product from this reaction then undergoes the hydroisomerization reaction, where the base oil molecules are isomerized to lower the pour point. Next, the product undergoes the hydrodearomatization process to increase the oxidative stability of the base oil product. Finally, the product is distilled under vacuum to remove the lighter materials. Figure 1 shows a simplified flow diagram of the base oil processing plant. In the hydrotreating reactor, the sulfur and nitrogen contents are reduced to a very low level with the right combination of catalyst, temperature and appropriate hydrogen partial pressures. The ease of desulfurization depends on the type of carbon-sulfur bonds present in the feedstock [1,2]. Groups of alkyl sulfides and polysulfides possess weaker carbon-sulfur bonds, which makes sulfur removal easier.
On the other hand, sulfur that comes from thiophenes and aromatics is more difficult to remove and requires higher temperatures [2,3]. Meanwhile, the nitrogen content commonly comes in the form of pyrrole and pyridine derivatives [2,4]. The initial step of nitrogen removal is pressure dependent as it is close to equilibrium [2,4], yet very high temperatures can reduce the nitrogen removal rate [2,5]. Therefore, controlling the balance of temperature and pressure is highly important for the sulfur and nitrogen removal process.
In the hydroisomerization process, the n-paraffins in the feed react over zeolite catalysts and are selectively cracked and isomerized in a hydrogen-rich environment. Through this hydroisomerization process, the pour point of the base oil product is reduced. At the same time, the aromatic content of both the feedstock and the product is a critical parameter in base oil processing. Aromatics cause oxidative instability in the product and lower the viscosity index [2]. The lower the aromatic content, the more stable the final product is over time. Therefore, the hydrodearomatization reaction is needed to remove the remaining aromatics in the base oil product. The temperature and pressure conditions of the hydrodearomatization reactor are critical, as the limiting factor of hydrodearomatization activity over a sulfided metal oxide catalyst is the thermodynamic equilibrium [2].
Apart from operating conditions such as temperature and pressure, the feed properties are equally important in predicting the base oil product properties. The nature of the feed, primarily the heavy part of the crude oil, is made up of various chemical groups such as paraffins, aromatics, and sulfur- and nitrogen-containing compounds. The heavy hydrocarbons can have numerous possible molecular structures, to the point where obtaining the exact composition of the molecules is not a feasible task [6]. Thus, the process is very difficult to model for all combinations of reactions and feed properties using a first-principles approach. Furthermore, in practice only certain lab analyses are performed, at non-rapid frequencies, solely to indicate the bulk properties of the feedstock in general. The feedstock lab analyses performed are simulated distillation, sulfur content, nitrogen content, aromatics content, wax content, viscosity, dewaxed oil viscosity, dewaxed oil viscosity index and density. In addition, the necessary pilot-plant runs require days of operation to develop even the first-principles model. Further iterations of these detailed studies carry a cumulative monetary and time cost. With these costs in mind, the plant currently runs reactively based on lab analysis of the base oil product, using the reported product specification to guide further decision making. If the product is under or over the specification, the plant operator adjusts the operating conditions accordingly. This reactive mode of operation has its own weakness: in situations where product results are uncertain, the product must be diverted to a slop tank. While the proper results are awaited, which can take several hours, production is lost over this period.
This presents a motivation for the authors to explore other ways to improve production. Recently, the rise of several machine learning algorithms, which are readily available at zero cost, provides an interesting and promising alternative for solving prediction problems. These machine learning capabilities have grown substantially as computing capabilities become more advanced and accessible. Through utilization of the readily available plant historical data, machine learning has the potential to reduce the time consumed and costs expended for these laboratory tests [7], and deliver a prediction of its own either for use as a secondary reference or as an accurate black-box representation of the laboratory analysis. Recent studies on modeling viscosity prediction using low field nuclear magnetic resonance also highlighted the need for data driven approaches due to the complexity of the viscometry behavior of hydrocarbon mixtures, especially within the petroleum industry [8]. In this paper, Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR) and Extreme Gradient Boosting (XGBoost) machine learning models are developed using historical data of the plant to predict the base oil product kinematic viscosity and viscosity index based on the feedstock qualities and the process operating conditions. Subsequent plant deployment at our facility and product recovery analysis by using the final selected machine learning model are also presented to highlight the key benefits.
This paper is presented as follows: Section 2 provides some theoretical background of the machine learning models considered in this paper. Section 3 discusses the dataset used and the methodology adopted. Section 4 presents the modelling results and the corresponding plant testing and deployment at our facility. Finally, Section 5 concludes the study and findings.

Machine Learning Models
In this paper, four types of machine learning algorithms are studied, namely Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR) and Extreme Gradient Boosting (XGBoost). SVR was developed by Vladimir N. Vapnik within the statistical learning Vapnik-Chervonenkis (VC) theory and is designed to generalize well to yet-to-be-seen data [9]. SVR formulates regression as a convex optimization problem [10,11]. The goal is to find the optimum distance of the support vectors from the prediction function, known as the tube width [9,10]. The support vector algorithm defines hyperplanes H_i such that the prediction function F(w·x_i) is the median H_0 between the hyperplanes H_1 and H_2, as illustrated in Figure 2 (1). H_1 and H_2 bound the tube, which has a width of 2ε [10]. The points that lie on H_1 and H_2 are the support vectors. Hence, the best fit is found by defining the ε-insensitive loss function L as in (3) [10]: L_ε(y, F(x)) = 0 if |y − F(x)| ≤ ε, and |y − F(x)| − ε otherwise. The constraint can be defined as in (7), where y_i is defined in (8) and (9).
y_i = +1 when the observed sample is above the prediction (8), and y_i = −1 when it is below (9). The width of the tube, 2ε, is the dot product of the difference between the boundary vectors x and x* with the vector w, divided by the magnitude of w: 2ε = (x − x*)·w/‖w‖. Thus, maximizing the width by maximizing this expression is equivalent to minimizing half of the squared magnitude of w, (1/2)‖w‖², which is convenient for differentiation. The minimization of (1/2)‖w‖² subject to the constraints of (4) and (5) is represented in (10). Incorporating the constraints via the Lagrange multipliers α_i and α_i* gives the loss function L in dual form, in which α_i and α_i* satisfy (14): Σ_i (α_i − α_i*) = 0 with α_i, α_i* ∈ [0, C].
where Q is an n × n positive semidefinite matrix, which is equivalent to a kernel function K [10,12].
Therefore, the prediction can be written in the form of a kernel function as in (16), F(x) = Σ_i (α_i − α_i*) K(x_i, x) + b. Substituting the radial basis function kernel, K(x_i, x) = exp(−γ‖x_i − x‖²), gives (17).
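As an illustrative sketch (using synthetic data, not the plant dataset), an RBF-kernel SVR of this form can be fitted with scikit-learn; feature scaling is included because the RBF kernel is distance-based:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + 0.05 * rng.normal(size=200)

# Feature scaling matters for SVR: the RBF kernel is distance-based.
X_scaled = StandardScaler().fit_transform(X)

# epsilon is the half-width of the tube; points outside it become support vectors.
model = SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.01)
model.fit(X_scaled, y)
r2 = model.score(X_scaled, y)  # coefficient of determination on the training data
```

Here `gamma="scale"` is scikit-learn's default heuristic; the tuned values reported later in this paper (C = 10, γ = 0.01) were obtained on the actual plant data.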
For the DTR, the algorithm generates the regression tree by piece-wising the input variable into several ranges and predicts the value based on the average of the observed values. Using this piece-wise function, the trees are developed as a graph which begins with one node and branches into many nodes. The topmost node contains the entire dataset, and each subsequent node contains a subset of the dataset. Decision trees have been considered as alternatives to regressions and discriminant analysis [13]. Each range of the input variable in the piece-wise function has the average of the observed values as its prediction.
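The piece-wise averaging behavior can be sketched on a toy step function (synthetic data, for illustration only): the fitted tree returns the mean observed value within each range.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
# A step function: observed values average ~2.0 below x = 5 and ~8.0 above.
y = np.where(X.ravel() < 5, 2.0, 8.0) + 0.1 * rng.normal(size=300)

# min_samples_leaf keeps each range from collapsing onto single noisy points.
tree = DecisionTreeRegressor(min_samples_leaf=4, random_state=0)
tree.fit(X, y)

# Each prediction is the average of the observed values in the leaf's range.
pred_low = tree.predict(np.array([[2.0]]))[0]
pred_high = tree.predict(np.array([[8.0]]))[0]
```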
For illustration, Figure 3 shows the top part of the decision tree for predicting the base oil product kinematic viscosity at 100 °C. The RFR algorithm, on the other hand, is an ensemble learner built from multiple decision trees, resembling a forest. RFR was introduced by Breiman as a computing method to handle nonlinear data [14,15]. Furthermore, an advantage of the random forest is its ability to calculate the importance of each input variable, which enables it to rank the variables in conformity with their importance [15]. The two important criteria applied in random forest to rank the important variables are mean decrease impurity and mean decrease accuracy [12,15]. The mean decrease impurity is calculated from the sum of the reductions in node impurity obtained by splitting on the variable, averaged over all trees. The mean decrease accuracy is based on the following condition: if a variable is not important, permuting its values should not increase the prediction error [15]. Out-of-bag error estimation is used to determine the important variables [14,16]. The Random Forest algorithm creates a number of decision trees and averages the outcomes from all decision trees as its prediction, as illustrated in Figure 4.
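The mean-decrease-impurity ranking can be sketched as follows (synthetic data, illustrative only): only the first of three input variables drives the target, and the forest's normalized importances reflect that.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
# Only the first feature drives the target; the other two are pure noise.
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=400)

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=2,
                               bootstrap=True, random_state=0)
forest.fit(X, y)

# Mean-decrease-impurity importances, normalized to sum to 1.
importances = forest.feature_importances_
```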
XGBoost is a decision tree-based algorithm with optimized distributed gradient boosting that is very efficient and flexible [18]. The notion of XGBoost is to use an ensemble of classification or regression trees to fit the samples of training data by minimizing a regularized objective function. Each tree consists of internal nodes and a set of leaf nodes. Based on the boosting method, a series of classification and regression trees is built sequentially.
These estimators are associated with weights to be tuned in the training process so that a robust ensemble can be constructed [19,20]. According to Chen and Guestrin, at step t a function f_t that most improves the model is added, and the loss function simplifies to (18) [18]:

L̃^(t) = Σ_i [g_i f_t(x_i) + (1/2) h_i f_t²(x_i)] + Ω(f_t),

where g_i and h_i are the first- and second-order gradients of the loss function and Ω(f) = γT + (1/2)λ‖w‖² is the regularization term over the T leaf weights w. Define I_j = {i | q(x_i) = j} as the instance set of leaf j. The loss function can then be written by expanding Ω as follows:

L̃^(t) = Σ_{j=1..T} [(Σ_{i∈I_j} g_i) w_j + (1/2)(Σ_{i∈I_j} h_i + λ) w_j²] + γT.

For a fixed structure q(x), the optimal weight w*_j of leaf j is given by

w*_j = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ),

and the corresponding optimal value can be calculated using

L̃^(t)(q) = −(1/2) Σ_{j=1..T} (Σ_{i∈I_j} g_i)² / (Σ_{i∈I_j} h_i + λ) + γT.

Unlike the Random Forest regression, the XGBoost algorithm gives scores for every right and wrong prediction and builds models from individual learners in an iterative way [18,21].
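The optimal leaf weight formula can be checked numerically. A minimal sketch of a single boosting step with squared-error loss, where g_i = ŷ_i − y_i and h_i = 1 for every sample, using a hypothetical two-leaf tree structure:

```python
import numpy as np

# Squared-error loss l(y, yhat) = 0.5 * (y - yhat)**2,
# so g_i = yhat_i - y_i and h_i = 1 for every sample.
y = np.array([3.0, 3.5, 10.0, 11.0])
yhat = np.full_like(y, 6.0)   # current ensemble prediction
g = yhat - y                  # first-order gradients
h = np.ones_like(y)           # second-order gradients
lam = 1.0                     # regularization lambda

# Suppose the new tree q(x) puts samples 0,1 in leaf 0 and samples 2,3 in leaf 1.
leaves = {0: [0, 1], 1: [2, 3]}

# w*_j = -sum(g_i) / (sum(h_i) + lambda) over the instance set of leaf j.
w_star = {j: -g[idx].sum() / (h[idx].sum() + lam) for j, idx in leaves.items()}
```

Leaf 0 collects residuals below the current prediction, so its optimal weight is negative; leaf 1 the opposite. The λ term shrinks both weights toward zero.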

Base Oil Processing Plant Product Sampling and Laboratory Analysis
An actual base oil product sample is taken at every 8 h interval and used as the target output of the prediction. The properties to be predicted in this study are the kinematic viscosity measured at 100 °C and the viscosity index. The kinematic viscosity measures the ease of flow of the base oil in cSt units; the higher the viscosity, the lower its flowability. Meanwhile, the viscosity index measures how the viscosity changes with respect to temperature and is dimensionless. The kinematic viscosity and viscosity index of the base oil are measured using the ASTM D445 and ASTM D5293 standards, respectively [22,23].

Training, Validation and Plant Testing and Model Deployment Data
Historical data from the base oil processing plant are used in this study. The data include feedstock properties, operating conditions and the product properties. As mentioned in Section 1, the critical variables for each reactor unit as well as the product distillation units are included. These variables are reactor temperatures, reactor pressures, flow rates, product fractionation pressures, product fractionation draw temperatures, product fractionation draw rates and product fractionation bed temperatures. The feedstock properties used are simulated distillation boiling points, viscosity, wax content, dewaxed feed viscosity, dewaxed viscosity index, density, sulfur content and nitrogen content. These input variables accumulate to a total of 54 variables. The targeted output parameters are the base oil viscosity and viscosity index.
The historical data contain non-numeric entries, such as strings, whenever there is an error at the transmitter, and empty entries whenever data are unavailable, especially for low-frequency data such as lab analysis data. Therefore, data pre-processing is performed prior to the machine learning model development. Irrelevant data are removed, missing values are imputed, feature scaling (for SVR) is performed and outliers are removed. When imputing the missing data, the forward-fill method is used to maintain the continuation of values rather than substituting the mean value, under the assumption that the value holds for a particular period. This is because, in practice, feed tank sampling is performed once prior to feeding the base oil processing plant: samples are taken at the top, middle and bottom of the tank and the values are averaged. Subsequently, whenever the feed tank is switched to another tank, a new sampling is performed on the new feed tank.
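The forward-fill imputation can be sketched with pandas (column names are illustrative, not the plant's actual tags): the last available lab value is carried forward until a new sample arrives.

```python
import numpy as np
import pandas as pd

# Lab results arrive at low frequency; forward fill carries the last tank
# sample forward until a new feed tank is sampled.
lab = pd.DataFrame({
    "sulfur_ppm":   [120.0, np.nan, np.nan, 95.0, np.nan],
    "nitrogen_ppm": [30.0,  np.nan, 28.0,  np.nan, np.nan],
})
filled = lab.ffill()
```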
The dataset is cleaned of outliers using the z-score method, where z equals the difference between the observed value x and the sample mean µ, divided by the sample standard deviation σ, as in (22): z = (x − µ)/σ. Any row with an absolute z-score greater than 3 is considered an outlier and removed from the dataset [24].
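A minimal sketch of the z-score filter in (22), using synthetic readings with one gross transmitter error:

```python
import numpy as np

rng = np.random.default_rng(3)
# 50 well-behaved readings around 1.0 plus one gross transmitter error.
data = np.append(rng.normal(1.0, 0.05, size=50), 25.0).reshape(-1, 1)

mu = data.mean(axis=0)
sigma = data.std(axis=0)
z = (data - mu) / sigma                # z = (x - mu) / sigma, per column
mask = (np.abs(z) <= 3).all(axis=1)    # keep rows with all |z| <= 3
cleaned = data[mask]
```

Note that the threshold is applied per column; a row is dropped if any of its variables is an outlier.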
Two sets of data from the plant are used as follows: • Training and validation datasets (for model development). These are data collected from 1 January 2016 to 29 June 2020, consisting of the 54 input variables and 2 output variables (base oil viscosity and viscosity index). The data are split into a 70% training set and a 30% validation set.
• Plant testing and model deployment (for out-of-sample model testing). To assess the predictive model performance on real-time data, the models are deployed in a platform application connected directly to the plant information system. A testing period is performed from 29 June 2020 until 13 January 2021. The plant ran on one product grade continuously for several days and then changed to another grade, as scheduled by the production planner.
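The 70/30 split described above can be sketched with scikit-learn (stand-in arrays here, not the plant's 54-variable dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000, dtype=float).reshape(-1, 1)  # stand-in for the input variables
y = np.arange(1000, dtype=float)                 # stand-in for one output variable

# 70% training / 30% validation; random_state fixes the shuffle for repeatability.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=42)
```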
The input and output variables are presented in Table 1. The SVR model is trained using the SVR package from the scikit-learn library [12]. A radial basis function kernel is used to tackle the non-linearity of the base oil process [25]. The hyperparameters C and γ are tuned using grid search cross validation in scikit-learn [12,26,27]. The C parameter is the regularization parameter [12,26] and its values are set to 10^−2, 10^−1, 10^0, 10^1 and 10^2. γ is the kernel coefficient for the radial basis function [12,26] and its values are set to 10^−5, 10^−4, 10^−6, 10^−2, 10^−1, 10^0, 10^1 and 10^2. Two kernels are evaluated: radial basis function and linear. The SVR model is then trained using sets of hyperparameter combinations. All the hyperparameter combinations are tested with 5-fold cross validation and evaluated by the R² score. The hyperparameter combinations are then ranked based on the mean test score over the 5 splits. The time taken for each hyperparameter combination is also recorded. Table 2 shows the top 5 of the ranking from 40 different combinations of hyperparameters used in the training of the kinematic viscosity model using SVR. The average time taken for all the trainings is 0.769 s. The highest mean R² is 0.992984 and the lowest RMSE is 0.083759, where the hyperparameters C, γ and kernel are 10, 0.01 and RBF, respectively. Table 3 shows the top 5 of the ranking from 40 combinations of hyperparameters used in the training of the viscosity index model using SVR. The average time taken for all the trainings is 0.970 s. From the results in both tables, it can be observed that the RBF kernel works well in both the kinematic viscosity and viscosity index model development. This is because the base oil process is highly unlikely to be a linear process, as it involves reaction kinetics and equilibria. The RBF kernel is capable of handling the non-linear nature of the base oil processing plant behavior.
The C parameter is the regularization parameter, as highlighted in (11). Higher C values imply a wider width between the hyperplanes H_1 and H_2, which means a larger margin of error; however, it may increase the chance of misclassification. In this study, each model development has shown that the best C value is 10, which is higher than the default value of 1 given by scikit-learn [12]. As observed from Tables 2 and 3, there are two cases where C equal to 1 resulted in a top 5 ranking (4th and 3rd rank in the kinematic viscosity and viscosity index models, respectively). This is interesting because it shows the complexity of the base oil process: although the R² is high, some data points lie outside the hyperplanes H_1 and H_2, possibly due to noise.
From Tables 2 and 3, it can also be observed that the optimal γ value is 0.01 for both the kinematic viscosity and viscosity index model development. γ is the kernel coefficient, as shown in (17). The default value of γ is 0.059. The γ in this study is moderately lower than the default value, indicating that the model is still capable of capturing the behavior and complexity of the data, but at the expense of a higher value of C. In terms of time consumption, SVR is the second fastest compared to the others.
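The grid search procedure described above can be sketched as follows (synthetic data and a reduced grid, not the plant's 54-variable dataset); GridSearchCV ranks the hyperparameter combinations by the mean 5-fold R² score:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(150, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]

# Reduced illustrative grid; the paper sweeps wider ranges of C and gamma.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "gamma": [1e-3, 1e-2, 1e-1, 1.0],
    "kernel": ["rbf"],
}
search = GridSearchCV(SVR(), param_grid, cv=5, scoring="r2")
search.fit(X, y)
best = search.best_params_
```

The per-combination fit times and mean test scores used for the rankings in Tables 2 and 3 are available from `search.cv_results_`.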

Decision Tree Regression (DTR) Model Development
For the DTR development, the decision tree is developed using the scikit-learn decision tree package [12]. Hyperparameters such as the splitter, minimum samples split and minimum samples leaf are varied using grid search cross validation provided by scikit-learn. In the grid search, two strategies can be used for the splitter hyperparameter, as provided by the scikit-learn decision tree package: "random" split and "best" split. The minimum number of samples required to split an internal node is set to 2, 3, 4 and 0. The minimum number of samples required at a leaf node is set to 1, 2, 3, 4 and 0. For illustration purposes, Figure 3 shows the top part of the decision tree for predicting the base oil product kinematic viscosity at 100 °C. Tables 4 and 5 show the top 5 of the ranking from 32 different combinations of hyperparameters used in the training of the kinematic viscosity and viscosity index models using DTR, respectively. The average time taken for all the trainings is 0.073 s and 0.075 s, respectively. For the kinematic viscosity model, the highest mean R² is 0.994289, where the hyperparameters minimum samples leaf, minimum samples split and splitter are 4, 2 and random, respectively. On the other hand, the best DTR model for the viscosity index is obtained using minimum samples leaf, minimum samples split and splitter of 4, 4 and random, respectively. Both best models used the random splitter, as it has less tendency to overfit. In addition, overfitting is observed for lower numbers of minimum samples leaf (less than 4). Overall, the DTR models are the fastest to develop compared to the other machine learning models studied in this paper.
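A sketch of the DTR grid search (synthetic data; the grid mirrors the splitter and sample-count hyperparameters discussed above, with illustrative values):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(300, 1))
y = np.where(X.ravel() < 5, 2.0, 8.0) + 0.1 * rng.normal(size=300)

param_grid = {
    "splitter": ["best", "random"],
    "min_samples_split": [2, 3, 4],
    "min_samples_leaf": [1, 2, 3, 4],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X, y)
```

As in the SVR case, `search.cv_results_` holds the mean test score and fit time behind each ranked combination.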

Random Forest Regression (RFR) Model Development
The RFR model development is performed using the random forest regressor package provided by scikit-learn [12]. The number of estimators, minimum samples split, minimum samples leaf and bootstrap hyperparameters are varied using grid search cross validation by scikit-learn [12]. The number of estimators, which indicates the number of trees in the forest, is set to 100, 150 and 200. The minimum number of samples required to split an internal node is set to 2, 3, 4 and 0. The minimum number of samples required at a leaf node is set to 1, 2, 3, 4 and 0. In the grid search, both cases for bootstrap are used: enabled ("true") and disabled ("false"). Table 6 shows the top 5 of the ranking from 54 combinations of hyperparameters used in the training of the kinematic viscosity prediction using RFR. The average time taken for all the trainings is relatively high compared to the previous two models, at 26.084 s. The highest rank is obtained with bootstrap sampling and minimum samples leaf, minimum samples split and number of estimators of 2, 4 and 200, respectively. Table 7 shows the corresponding top 5 of the ranking from 54 combinations of hyperparameters used in the training of the viscosity index prediction using RFR. The average time taken is also high for all the trainings, at 26.823 s. The best ranking is given by bootstrap sampling and minimum samples leaf, minimum samples split and number of estimators of 1, 2 and 150, respectively.

Extreme Gradient Boosting (XGBoost) Model Development
The XGBoost model is trained using the XGBoost package provided on PyPI and maintained by Hyunsu Cho and team [28]. The hyperparameters (i.e., number of estimators, maximum depth, learning rate and regularization lambda) are tuned using the grid search cross validation by scikit-learn [12]. The number of gradient boosted trees, which is equivalent to the number of boosting rounds, is set to 1000, 1500 and 2000. The maximum depth for the base learners is set to 1, 2, 3 and 0. The boosting learning rate is set to 0.1, 0.2, 0.3, 0.4 and 0.5, and the regularization lambda is set to 0.1, 0.5 and 1.0. Table 8 shows the top 5 ranked of the 180 hyperparameter combinations used in training the kinematic viscosity model using XGBoost. The average time taken for all the trainings is 26.875 s. The highest mean R² is 0.996713 with an RMSE of 0.0706426, obtained with a learning rate, maximum tree depth, number of estimators and regularization lambda of 0.1, 0, 1000 and 0.1, respectively. Table 9 shows the top 5 ranked of the 180 hyperparameter combinations used in training the viscosity index model using XGBoost. The average time taken for all the trainings is 31.637 s, with the best rank obtained with a learning rate, maximum tree depth, number of estimators and regularization lambda of 0.1, 0, 2000 and 0.5, respectively.

Model Training and Validation Performance
The performance of the machine learning algorithms is compared based on the root mean squared error (RMSE), mean absolute error (MAE), mean squared logarithmic error (MSLE), mean absolute percentage error (MAPE), coefficient of determination (R²) and adjusted coefficient of determination (adjusted R²) as presented in Equations (23)-(28). Here n, y_predicted, y_observed, ȳ_observed and k are the number of observations, predicted value, observed value, mean observed value and number of features, respectively [8]. The RMSE is the square root of the average squared difference between the predicted and observed values, whilst the MAE is the mean absolute difference between them. The difference between RMSE and MAE is that RMSE is more sensitive to outliers because of the square function [8,29]. The MSLE can be interpreted as a measure of the ratio between the true and predicted values and avoids heavy penalization of the large differences between the viscosity grades and viscosity indexes [8,30]. The MAPE expresses the errors as relative percentages and therefore provides an intuitive interpretation of the prediction quality [8,31].
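The six metrics can be computed directly from the definitions above. This is a standard implementation, not the paper's code; the logarithmic form of MSLE here uses log(1 + x), which assumes non-negative values.

```python
# Standard implementations of the six comparison metrics in Equations (23)-(28).
import numpy as np

def regression_metrics(y_obs, y_pred, k):
    """k is the number of features, used for the adjusted R²."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    n = y_obs.size
    err = y_pred - y_obs
    rmse = np.sqrt(np.mean(err ** 2))                     # penalizes outliers
    mae = np.mean(np.abs(err))
    msle = np.mean((np.log1p(y_pred) - np.log1p(y_obs)) ** 2)  # ratio-like error
    mape = np.mean(np.abs(err / y_obs)) * 100.0           # relative %, intuitive
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)     # penalizes extra features
    return {"RMSE": rmse, "MAE": mae, "MSLE": msle, "MAPE": mape,
            "R2": r2, "adj_R2": adj_r2}

print(regression_metrics([4.0, 4.2, 4.4], [4.1, 4.2, 4.3], k=1))
```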
As mentioned in Section 3.2, the data are collected from 1 January 2016 to 29 June 2020 and consist of 54 input variables and 2 output variables (base oil kinematic viscosity and viscosity index). The data are split into a 70% training data set and a 30% validation data set. In general, XGBoost showed the best consistency in performance during the training and validation phases for both the kinematic viscosity and viscosity index prediction models. For the kinematic viscosity prediction, the adjusted R² values are very similar among the machine learning models (approximately 0.996 ± 0.003), but the differences are clearly observed in terms of RMSE and MSLE, as presented in Table 10. For the viscosity index, as presented in Table 11, the models developed through XGBoost and SVR show similar performance. Overall, the performance of the machine learning models can be ranked as XGBoost > RFR > DTR > SVR for the kinematic viscosity, and XGBoost > SVR > RFR > DTR for the viscosity index output variable.

Physicochemical Insights from the Machine Learning Activities
One of the advantages of DTR, RFR and XGBoost is the ability to evaluate and produce feature importances. This is very useful because it indicates the relative influence of each input variable on the output variable, enabling the operators to prioritize which operating conditions to change to meet the desired product target. Figures 5-7 show the top 10 most important features for the kinematic viscosity at 100 °C by DTR, RFR and XGBoost, respectively. The hydrotreating reactor temperature is one of the important input variables as it directly impacts the product viscosity: at a higher temperature, more molecular cracking takes place, resulting in lighter material and hence a lower product viscosity. The product fractionation parameters, such as the base oil product flowrate, fractionation bed temperature and pumparound flow rate, also have a major influence on the base oil viscosity as they determine the purity of the base oil. If the base oil contains too much lighter hydrocarbon, the viscosity will be lower, and vice versa. The feature importance analysis also shows that the feed properties are significant; for example, the dewaxed feed kinematic viscosity at 100 °C, density, and nitrogen content all impact the process.
The dewaxed feed kinematic viscosity and density are indicative of the feed oil quality in terms of the polyaromatics content of the feed. A feed higher in polyaromatics tends to be more viscous and requires a higher hydrotreating reactor temperature, hence more cracking takes place. Similarly, the higher the feed nitrogen content, the higher the required hydrotreating reactor temperature, and thus the lower the base oil viscosity.
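The ranked importances in these figures come from the tree models' built-in attribute. A minimal sketch of extracting the top 10, with synthetic stand-in data and hypothetical variable names:

```python
# Sketch of extracting the top-10 feature importances from a fitted tree ensemble.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
feature_names = [f"var_{i}" for i in range(12)]   # stand-ins for the 54 plant inputs
X = rng.normal(size=(300, len(feature_names)))
y = 3.0 * X[:, 0] + X[:, 3] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top10 = sorted(zip(feature_names, model.feature_importances_),
               key=lambda p: p[1], reverse=True)[:10]
for name, imp in top10:
    print(f"{name}: {imp:.3f}")
```

DTR and XGBoost expose the same `feature_importances_` attribute, which is how the three figures can be produced with identical code.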

Figures 8-10 show the top 10 most important features for the viscosity index by DTR, RFR and XGBoost, respectively. It is observed that the hydrotreating reactor plays a major role in determining the base oil viscosity index. This is because one of the functions of the hydrotreating reactor is to crack the naphthene molecules in the feed, which have a lower viscosity index. Therefore, higher hydrotreating reactor temperatures are required to elevate the final base oil viscosity index.

Plant Testing and Model Deployment
To assess the predictive model performance with real-time data, the models are deployed in a platform application connected directly to the plant information system. A testing period is conducted from 29 June 2020 until 13 January 2021. The plant produced one product grade continuously for several days and then changed to another grade, as scheduled by the production planner. Figures 11 and 12 show the performance of the machine learning models in predicting the base oil kinematic viscosity at 100 °C and viscosity index, respectively. In general, the XGBoost and RFR models result in lower prediction errors in comparison to DTR and SVR. Tables 12 and 13 show the RMSE, MAE, MSLE, MAPE, R² and adjusted R² values calculated from the predicted against actual values for each machine learning model for the kinematic viscosity and viscosity index output variables, respectively.

Figure 11. Performance of each machine learning model in predicting kinematic viscosity at 100 °C during model deployment.
During this plant deployment study, the SVR model provides the third best performance in predicting the viscosity index and the fourth best in predicting the kinematic viscosity. Compared to the RFR and XGBoost models, however, the SVR model suffers more when dealing with noise. It is observed that towards the end of the plant testing period, the SVR prediction of the viscosity index deviated further and further from the actual value, indicating that the SVR model requires frequent re-training to maintain its accuracy. Interestingly, the SVR model follows the trend of the viscosity index at the beginning of the testing period better than the DTR model, whose predictions are more conservative as it averages the training data in its piece-wise functions. The XGBoost and RFR models, on the other hand, exhibited both trend-following and conservative predictions, which led to better prediction accuracy.
The DTR model ranked third in predicting the base oil kinematic viscosity and fourth in predicting the viscosity index. Although DTR is a simple and very fast algorithm, it has a tendency to overfit, resulting in high bias and variance errors. The overfitting could be due to the presence of noise and a lack of representative instances. Bias error arises when too many restrictions are placed on the target functions, while variance error is introduced by changes in the training set. When a decision tree has a high amount of variance, small changes in the data can cause significant changes in the final results [32]. The base oil process dataset itself contains considerable noise, which comes from the interchanges between the 4 cSt, 6 cSt and 10 cSt production modes and the low frequency of the feed properties lab data. Therefore, tuning a single decision tree at a time to find the best DTR model is not as effective as the RFR and XGBoost algorithms.
The Random Forest (RFR) and XGBoost models performed very well in this study: the RFR model performed best in predicting the base oil kinematic viscosity at 100 °C, and XGBoost performed best in predicting the base oil viscosity index. The averaged prediction of the RFR model is expected to be closer to the true value than that of a single decision tree. Furthermore, the RFR algorithm reduces the variance of the decision trees by sampling differently during training, through the use of specified random feature subsets combined with shallow trees. These advantages, however, come at a cost: the higher the number of trees generated in the random forest, the slower the process becomes [33]. The XGBoost algorithm, on the other hand, does not assume a fixed architecture and instead finds the function that best approximates the data [34]. The gradient boosting algorithm finds the best parameters as well as the best function. Rather than searching over possible functions and their parameters to find the best one, which would take a very long time, it takes many simple functions and adds them together; in short, the algorithm trains an ensemble of simple models. Each subsequent model is trained on the gradient of the error with respect to the previous models' predictions, correcting the mistakes of the ensemble built so far. By utilizing many simple models that compensate for each other's weaknesses, the data behavior can be learned more effectively. The DTR is a simple decision-making graph, while the RFR model is built from a large number of trees combined using averages at the end of the process. XGBoost, on the other hand, combines the decision trees at the beginning of the process rather than at the end. Gradient boosting has larger complexity in its modelling steps due to the hyperparameter tuning required, compared to RFR [33].
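The residual-correcting idea described above can be shown in a few lines: each shallow tree is fitted to the error of the ensemble built so far, and its scaled prediction is added to the running total. This is a bare-bones illustration of gradient boosting with squared loss, not the XGBoost implementation.

```python
# Minimal gradient-boosting sketch: shallow trees fitted to residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0])                      # toy nonlinear target

learning_rate, trees = 0.3, []
pred = np.zeros_like(y)
for _ in range(50):                      # 50 boosting rounds
    stump = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)  # fit residuals
    trees.append(stump)
    pred += learning_rate * stump.predict(X)   # shrink and add to the ensemble

print("final training RMSE:", float(np.sqrt(np.mean((y - pred) ** 2))))
```

Each weak learner on its own fits the sine poorly, but the sum of 50 residual corrections approximates it closely, which is the mechanism behind both models' strong performance here.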

Product Recovery Using Prediction Model
The advantage of having a predictive model is the ability to predict the product properties in the near future, i.e., the properties the material will have after it has gone through the processing units (residence time). This is very useful, especially during the transitions between production modes.
In this base oil production process, the most significant change in product quality occurs during the transition between the 10 cSt product and the 4 cSt product. During this transition period, the product is diverted into a slopping tank. Previously, the decision to end the transition period was made based on the laboratory analysis results, which take several hours; normally the lab result is available at the 10th hour. The next challenge is to determine the right operating conditions to ensure the product meets the specification at the end of the transition period. Here the predictive model is very useful because, with the given feed properties, combinations of operating conditions can be entered to predict the product properties after the residence time. For this plant implementation, the XGBoost model has been utilized due to its optimal performance, as shown in the earlier sections.
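The screening of candidate operating conditions can be sketched as follows. Everything here is hypothetical: `predict_properties` is a toy surrogate standing in for the deployed XGBoost model, and the candidate ranges and coefficients are invented for illustration; only the specification limits (4.25 ± 0.25 cSt and 135.0 ± 2.5) come from the study.

```python
# Hedged sketch: screen candidate operating conditions against product specs.
import itertools

def predict_properties(feed_rate, ht_temp):
    # Toy surrogate for the trained model; coefficients are illustrative only.
    kv100 = 6.0 - 0.01 * (ht_temp - 350) - 0.02 * (feed_rate - 100)
    vi = 120 + 0.1 * (ht_temp - 350)
    return kv100, vi

KV_TARGET, KV_TOL = 4.25, 0.25    # spec from the study: 4.25 +/- 0.25 cSt
VI_TARGET, VI_TOL = 135.0, 2.5    # spec from the study: 135.0 +/- 2.5

feasible = []
for feed_rate, ht_temp in itertools.product(range(90, 131, 5), range(340, 521, 5)):
    kv, vi = predict_properties(feed_rate, ht_temp)
    if abs(kv - KV_TARGET) <= KV_TOL and abs(vi - VI_TARGET) <= VI_TOL:
        feasible.append((feed_rate, ht_temp))

print(feasible[:5])               # candidate operating points meeting both specs
```

In the plant implementation, the feasible combinations would be reviewed by operations before being applied during the transition.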
Therefore, performance test runs are conducted at the base oil processing plant with the aim of faster product recovery during this transition period. The 10 cSt production mode runs at a lower flowrate than the 4 cSt mode. Figure 13 shows the feed flow rate suggested by the model against the actual feed flowrate during the performance test run. Figures 14 and 15 show the model-suggested hydrotreatment and hydroisomerization temperatures versus the actual temperatures, respectively.
The estimated time at which slopping can be ended is the 6th hour after the transition period starts. However, sampling of the base oil product is only performed at the 4th, 5th and 6th hours, because significant changes in the product properties normally happen between the 4th and 6th hours, and to optimize plant manpower. The target kinematic viscosity is within the range of 4.25 ± 0.25 cSt and the viscosity index is 135.0 ± 2.5. Figures 16 and 17 show that the product met the specification at the 6th hour, and this was achieved through the correct estimation of the appropriate operating conditions as predicted by the XGBoost model during the transition period. The transition period has thus been reduced from 10 h to 6 h, a 40% improvement.

Limitations and Assumptions
Several limitations are encountered in this study, which are addressed through certain assumptions. Firstly, lab data are normally less frequent than data from equipment and transmitters. Therefore, it is assumed that the lab data, especially the feed properties, are constant until a new sample is taken. Secondly, in actual industry practice the amount of sampling and lab analysis is normally reduced for time and cost optimization; it is therefore assumed that the data used were sufficient to develop the model. Throughout the model development and implementation, the project team continuously communicated with the refinery team to obtain the best possible results. Finally, in actual operation some transmitters have intermittent failures, causing historical data loss. In such cases, the operating condition values are calculated by linear interpolation.
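The two gap-filling assumptions above can be sketched with pandas: the sparse lab values are held constant (forward-filled) until the next sample, and transmitter dropouts are linearly interpolated. The column names and values are illustrative only.

```python
# Sketch of the two gap-filling assumptions: forward-fill for sparse lab data,
# linear interpolation for transmitter dropouts. Data are illustrative.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feed_nitrogen_ppm": [12.0, np.nan, np.nan, 14.0, np.nan],   # lab data (sparse)
    "reactor_temp_c":    [355.0, np.nan, 357.0, np.nan, 359.0],  # transmitter data
})
df["feed_nitrogen_ppm"] = df["feed_nitrogen_ppm"].ffill()          # hold last lab value
df["reactor_temp_c"] = df["reactor_temp_c"].interpolate(method="linear")
print(df)
```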



Conclusions
The quality of the feedstock used in base oil processing varies significantly depending on the source of the crude oil, as well as the fact that the refinery is fed with various blends of crude oil to meet the demand for refining products. For cost optimization, only selected lab characterizations are performed, and the base oil processing plant is operated reactively based on feedback from the lab analysis of the base oil product. This leads to a loss in production for several hours because of the residence time as well as the time required to perform the lab analysis. In this paper, a proactive method aimed at minimizing this production loss by utilizing machine learning algorithms has been carried out. Four machine learning algorithms, namely Support Vector Regression (SVR), Decision Tree Regression (DTR), Random Forest Regression (RFR) and Extreme Gradient Boosting (XGBoost), have been studied and developed using plant historical data to predict the base oil product kinematic viscosity and viscosity index based on the feedstock qualities and the process operating conditions. The XGBoost model has shown the most optimal and consistent performance during validation and a 6.5-month plant testing period. The benefit of the machine learning model is that it can predict the product properties from the given feedstock properties and operating conditions earlier than the residence time of the material moving through the operating units. The prediction model has been shown to be capable of providing the right operating conditions, especially during the transitions between production modes, allowing earlier product realization compared to the reactive method. Subsequent deployment at our plant facility and product recovery analysis have shown that the prediction model has helped reduce the production recovery period during product transition by 40%.