Exploring of the Incompatibility of Marine Residual Fuel: A Case Study Using Machine Learning Methods

Abstract: Providing ships with quality fuel of reduced SOx content is a priority task. Marine residual fuels are one of the main sources of atmospheric pollution during the operation of ships and sea tankers. Hence, the International Maritime Organization (IMO) has established strict regulations for the sulfur content of marine fuels. One possible technological solution for meeting the sulfur content limits is the use of mixed fuels. However, this carries the risk of incompatibility between components. This article explores a new approach to the study of active sedimentation of residual and mixed fuels. The sedimentation process during mixing, storage, and transportation of marine fuels is assessed using three-dimensional estimation diagrams developed by the authors. To find the optimal solution, the influence of marine residual fuel composition on sediment formation was studied using machine learning algorithms. As a result, a model is proposed that can predict incompatibilities in fuel compositions as well as sedimentation processes. The model can be used to determine the sediment content of mixed marine residual fuels with the desired sulfur concentration.


Introduction
Emissions from ships, including greenhouse gases and sulfur oxides (SOx), have serious impacts on human health, the marine environment, and natural resources. Dozens of countries, including China and the EU, have already announced their readiness to achieve carbon neutrality on their territory by 2050-2060, i.e., to reduce to zero the difference between greenhouse gas emissions and their absorption, taking into account the capacity of the region's ecosystems. The EU was one of the first to propose a transition to a carbon neutral economy by 2050 within the framework of the European Green Deal [1]. To achieve the goal of decarbonization and sustainable development of the fuel and energy complex in Russia, unprecedented legislative measures have been taken [2][3][4][5][6], aimed at providing tax benefits for domestic majors, and this stimulates research projects aimed at reducing emissions of harmful gases [7][8][9][10]. The EU has already introduced CO2 emission quotas, and the EU Emissions Trading System has begun to operate. For the EU and the US, transboundary carbon regulation is becoming a way to protect their marine fuel markets and a part of protectionist policies. The Russian economy faces a difficult trade-off between the recognition of global principles of environmental, social, and governance responsibility and the realities of a carbon-intensive economy. Oil and various types of fuel will be the most exposed to climate impacts. Assuming the status quo is maintained, BP believes [11] that the peak of world oil consumption was already reached in 2019, although OPEC expects it by 2040 [12], and the International Energy Agency after 2030 [13].
In January 2020, the International Maritime Organization (IMO) introduced new requirements for the sulfur content of marine fuels [14]. As a result, the allowed sulfur content in marine fuels decreased sevenfold, from 3.5 to 0.5 wt. %. The control of COx and SOx emissions is therefore a major concern in the maritime industry [15,16]. For the period 2007-2012, the IMO estimated the average annual SOx emissions from shipping at 11.3 million tons, representing 13% of global SOx emissions [17]. In port cities, ship sulfur oxide emissions are often the main source of pollution [18][19][20]. Moreover, SOx emissions from ships spread through the atmosphere over several hundred kilometers and contribute to the degradation of air quality on land, even if they are released at sea [21]. Recently, the IMO has been actively tightening marine pollution rules and introducing emission control areas [22,23].
Shipowners must be able to ensure that the sulfur content of their fuel complies with the sulfur limits set by existing laws. Currently, there are three technologically feasible solutions for reducing SOx emissions from ships: the use of liquefied natural gas as a fuel, the installation of gas scrubbers, and the use of marine residual fuels with low sulfur content. This article discusses the third method: the use of marine fuel with a sulfur content of up to 0.5 wt. %.
The refineries in Russia are currently unable to fully provide the new type of fuel, as this requires considerable financial investment and in some cases is not economically profitable [24,25]. Therefore, in order to meet the demand for a new type of marine fuel, bunker companies are actively engaged in mixed fuels operations to obtain the required quality indicators. Today there is a sharp increase in the share of mixed fuels for ship installations. As a result, the risks of the manifestation of fuel incompatibility increase, causing active sedimentation [26].
In this regard, during storage, transportation, and production, this problem becomes extremely relevant [27][28][29][30][31]. The maximum permissible content of the total sediment potential (TSP) in marine fuels is 0.1 wt. % according to ISO 8217. It should be noted that there are risks of incompatibility even when mixing the same brand of fuels due to differences in composition [32]. The incompatibility of residual fuels is manifested due to the occurrence of strong intermolecular interactions, which are caused by a change in the group composition, as well as a change in the concentration ratio of high-molecular compounds of residual fuels. All these contribute to the formation of associations of molecules, volumetric colloidal particles of various shapes and structures [33,34].
We carried out studies to determine stability through the xylene equivalent indicator. It characterizes the resistance of marine fuel to stratification during storage, transportation and operation, but exact dependences of the influence of the composition have not been obtained, since the selected indicator of the xylene equivalent does not allow determining the degree of sedimentation [35].
Also, based on production and experimental data, we carried out studies of the effect of aromatic hydrocarbons on the precipitation of asphaltenes. We found that an increase in the proportion of aromatic hydrocarbons reduces sedimentation. However, the effect of paraffins on the precipitation of asphaltenes has not been considered; therefore, more detailed studies in this direction are required [36,37]. We obtained and substantiated the dependences of the effect of n-paraffins (from 55 to 70 wt. %) at specific values of asphaltenes (0.5, 1, 1.5, 2, 2.5, 3, 3.5 wt. %) on the basis of experimental studies [38].
In this article, the task is to expand the possibilities of applying the results obtained using widely tested methods of machine learning [39]. All these make it possible to determine the risks of incompatibility in the considered range (but not at specific values of asphaltenes and n-paraffins) with a high level of confidence.
A fairly large set of classification and regression algorithms is tested to choose the most efficient one for the problem under investigation. The tested methods can be divided into four families: decision trees, k-nearest neighbors (k-NN), multiple linear regression, and approximation by a function of two variables. Related applications are briefly described as follows. Braidotti et al. [40] applied machine learning methods (such as decision trees, k-NN, and support vector machines) in their study. Based on their research, the best choice was bagged decision trees, which showed very good accuracy in classifying and identifying damaged compartments. For other problems, they recommended weighted k-NN, which ensured better accuracy of the predicted scenario.
Codjo et al. [41] used various machine learning methods to identify certain features of the degradation of the investigated object. Various supervised learning methods were applied. A comparison of classifiers showed that logistic regression and decision tree approaches were robust binary classification tools, with accuracies of 97.917% and 99.884%, respectively, while the k-NN method could not provide accurate predictions. Processing results with modern machine learning methods thus makes it possible to extend the applicability of the results obtained.
Zhang et al. [42] performed an experiment to simulate the exhaust emissions from ships. Their sulfur content prediction model made it possible to effectively predict the sulfur content in marine fuel oil. Their paper discusses the use of a deep neural network to improve the accuracy of predicting the sulfur content in marine residual fuel.
The literature analysis showed that there is a global problem: approximately 15% of NOx emissions and 4% to 9% of SO2 emissions are caused by shipping [43]. About 70% of exhaust gases are discharged into the atmosphere at a distance of less than 400 km from land, causing serious air pollution in coastal areas, especially around ports with a high flow of cargo [44][45][46][47][48]. Sulfur contained in marine fuel oil leads to the formation of large amounts of SO2 as a result of chemical reactions occurring during engine operation, and the amount of SO2 emitted is directly related to the sulfur content of the marine fuel [49]. The tightening of requirements for the sulfur content has shown that the available knowledge is insufficient and that this direction of research is extremely urgent.
Based on the current state of the problem and possible solution methods, it can be concluded that the use of machine learning tools for predicting and modeling the composition of marine fuels is relevant. However, to date, there is no specific method that allows one to accurately determine the characteristics of incompatibility of fuels. In addition, the accuracy of using various machine learning methods is highly dependent on the input data.
The lack of exact dependences of sediment formation on the composition of residual fuel oils, arising from the manifestation of incompatibility, reveals a gap in existing studies. Many researchers limit themselves to general recommendations, and the problem has not been fully resolved. Understanding and correctly using machine learning methods will make it possible to develop an effective tool and significantly improve forecast accuracy.

Materials and Methods
Several methods of machine learning have been considered for developing a practical robust tool, which can be used to determine sediment formation based on the results of experimental studies of the influence of n-paraffins and asphaltenes on the compatibility of marine residual fuels.
The first method considered is multiple linear regression. This model is one of the most important and widely used regression techniques. One of its advantages is the ease of interpretation of the results.
The decision tree, a mathematical model based on graphs, is also considered. This model represents the decision-making process in such a way that every possible decision, the preceding and subsequent events or other decisions, and the consequences of each final decision are shown. The decision tree is a well-known non-parametric supervised algorithm based on binary decisions. Decision trees can be used for both classification and regression (providing a piecewise approximation of the response function). The decision-making process takes the form of a tree or graph, starting from the root vertex and then moving from one vertex to another according to the values of the predictors (the leaves are the predicted answers). A typical decision tree structure is shown in Figure 1. Starting from the root, a decision between two possible directions is made according to the value of one predictor xj. A similar process is performed at each node passed along the tree structure until a leaf corresponding to the answer class is reached.
Energies 2021, 14, x FOR PEER REVIEW 4 of 16
Decision trees are trained using a dataset that pairs predictors with responses, so that the tree models the relationship between them. In this paper, a single decision tree is adopted, with the Gini Diversity Index as the split criterion [50]. All predictors are tested at each node to select the one that maximizes the gain in the split criterion.
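The split selection described above can be sketched in a few lines. This is a minimal illustration only, assuming the usual definition of the Gini index (one minus the sum of squared class shares) and candidate thresholds at midpoints between observed values; the function names, the toy points, and their labels are hypothetical and are not the experimental data.

```python
from collections import Counter

def gini(labels):
    """Gini diversity index of a list of class labels: 1 - sum(p_c^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Test every predictor and threshold; return the largest Gini gain found."""
    parent = gini(labels)
    n = len(labels)
    best = (0.0, None, None)  # (gain, predictor index, threshold)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [c for r, c in zip(rows, labels) if r[j] <= t]
            right = [c for r, c in zip(rows, labels) if r[j] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if gain > best[0]:
                best = (gain, j, t)
    return best

# Hypothetical toy data: (n-paraffins wt. %, asphaltenes wt. %) with made-up labels.
rows = [(56, 0.5), (58, 2.0), (60, 3.5), (64, 0.5), (66, 2.0), (69, 3.5)]
labels = ["compatible", "compatible", "compatible",
          "incompatible", "incompatible", "incompatible"]
gain, predictor, threshold = best_split(rows, labels)
```

In this toy set the first predictor separates the two classes completely, so it is the one selected at the root; a full tree would repeat the same search in each child node.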
The k-NN method is also used to solve the classification problem. It assigns an object to the class to which most of its k nearest neighbors in a multidimensional feature space belong. The number k is the number of neighboring objects in the feature space that are compared with the object being classified. In other words, if k = 4, then each object is compared with four neighbors. The method is widely used in data mining technologies.
This algorithm can be divided into two simple phases: learning and classification. During training, the algorithm stores the feature vectors of the observations and their class labels (i.e., examples). The parameter k, which sets the number of "neighbors" used in the classification, is also fixed. During the classification phase, a new object with no class label is presented. Its k nearest previously classified observations are determined. The class to which most of these k nearest neighbors belong is then selected, and the object being classified is assigned to that class [51] (Figure 2).
The circle in Figure 2 represents the object that needs to be classified into one of the two classes, "triangles" and "squares". If we choose k = 3, then out of the three nearest objects, two will turn out to be "triangles" and one a "square", so the object is assigned to the "triangles" class.
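The majority vote in this example can be written out as a short sketch. The coordinates below are hypothetical and were chosen only so that the three nearest neighbors of the query are two triangles and one square; Euclidean distance is assumed, and the function name is illustrative.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Return the majority class among the k training points nearest to query."""
    # train is a list of ((x, y), label) pairs; distance is Euclidean.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical coordinates, not taken from Figure 2.
train = [
    ((1.0, 1.2), "triangle"),
    ((1.3, 0.8), "triangle"),
    ((0.7, 1.0), "square"),
    ((3.0, 3.0), "square"),
    ((3.2, 2.7), "square"),
    ((2.8, 3.4), "triangle"),
]
query = (1.0, 1.0)
predicted = knn_classify(train, query, k=3)
```

With these particular points, raising k to 5 pulls in two more distant squares and flips the vote, which is why the choice of k matters on small data sets.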
The Python programming language is used to implement the aforementioned methods. In addition, a function of two variables is found by approximation in MATLAB.

Results
In the present study, marine residual fuels of the following grades are used:
• Sample No. 1: Compound oils grade A type 1 (KMC).
The main physical and chemical indicators of the fuels were determined and are presented in Table 1.

Using SARA analysis, the group composition of the presented fuels was determined (Table 2), after which the fuel mixtures were prepared according to the experiment plan [38,44]. Seven series of laboratory tests were carried out, based on the method proposed by the authors [52], to determine the compatibility and stability of marine residual fuels. The influence of n-paraffins from 55 to 70 wt. % was determined for asphaltene contents ranging from 0.5 to 3.5 wt. % in steps of 0.5 wt. %; the resulting dependencies are shown in Figure 3.
A total of 112 different fuel mixtures with the required composition were prepared, and laboratory tests were performed to determine the TSP content. As an example, Figure 4 shows the filters for one series of tests with an asphaltene content of 0.5 wt. % and n-paraffins from 56 to 70 wt. % [38,52,53]. Even visually, it is possible to determine the incompatibility of fuel compositions with an n-paraffin content of 63 to 70 wt. %, where there is a large amount of total sediment in the upper filters. A three-dimensional visualization of the obtained experimental results is shown in Figure 5.
A three-dimensional visualization of the present results is presented in Figure 4. Table 3 shows the numerical interpretation of the three-dimensional model of the influence of asphaltenes and n-paraffins on sediment formation.
Based on the obtained experimental data, it is possible to determine the dependences of sedimentation on the composition of the fuel mixture. However, the presented data describe these dependences only at specific compositions, which makes them difficult to apply in practice. Therefore, it is necessary to develop a practical, robust tool to determine the sedimentation activity at different values of n-paraffin and asphaltene content in the ranges considered.
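The count of 112 mixtures is consistent with a full grid of seven asphaltene levels and sixteen n-paraffin levels. The short check below assumes that n-paraffins were varied in steps of 1 wt. %, which the text does not state explicitly; the variable names are illustrative.

```python
# Assumed design grid: n-paraffins in 1 wt. % steps, asphaltenes in 0.5 wt. % steps.
n_paraffins = list(range(55, 71))             # 55 ... 70 wt. %
asphaltenes = [0.5 * i for i in range(1, 8)]  # 0.5 ... 3.5 wt. %

# One mixture per (n-paraffin, asphaltene) pair; one test series per asphaltene level.
grid = [(p, a) for a in asphaltenes for p in n_paraffins]
n_series, n_mixtures = len(asphaltenes), len(grid)
```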
The following methods are considered to process the experimental data for the development of the numerical tool: linear regression, decision tree, k-NN, and finding a function of two variables by approximation.
Using a linear regression model, weights for the parameters x and y are found. Based on this, a linear regression formula is derived that combines the results of all experiments. The linear regression formula for the analyzed parameters is given as Equation (1). The coefficient of determination R² of this formula is 0.772.
Using the "decision tree" machine learning model, a higher coefficient of determination, R² = 0.910, can be achieved. The visualization of the decision tree model shows how branches and leaves are divided by parameters and values. This model can be used to predict other results depending on the parameters x and y.
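To make the coefficient of determination concrete, the following sketch fits a plane z = b0 + b1*x + b2*y by least squares and computes R² as one minus the ratio of residual to total sum of squares. It assumes NumPy is available; the two response surfaces are hypothetical and are not the measured TSP values, and the fitted coefficients are not those of Equation (1).

```python
import numpy as np

def fit_linear(x, y, z):
    """Least-squares fit of z = b0 + b1*x + b2*y; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    residual = z - design @ coeffs
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((z - z.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot

# Hypothetical grid and responses, NOT the measured TSP values.
x, y = np.meshgrid(np.arange(55.0, 71.0), np.arange(0.5, 4.0, 0.5))
x, y = x.ravel(), y.ravel()
z_linear = 0.01 * x + 0.05 * y - 0.5           # exactly planar surface
z_curved = z_linear + 0.002 * (x - 62.5) ** 2  # same plane plus curvature in x

_, r2_linear = fit_linear(x, y, z_linear)
_, r2_curved = fit_linear(x, y, z_curved)
```

A plane reproduces the first surface essentially exactly, while the curved surface leaves unexplained variance and a lower R². This is the kind of behavior that can motivate moving from linear regression to more flexible models.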
By finding the function of two variables by approximation, a calculation program has been developed from the obtained data; it returns the result in a convenient format under given boundary conditions. The pseudocode is shown below as Procedure 1.

Procedure 1. Approximation of two variables
Input: n, m, data
for i = 1 to n do
    for j = 1 to m do
        Polyfit(): the coefficients of the fifth-order polynomial
        Linspace()
        Polyval(): calculating the ordinates of the approximating polynomial
    end for
    for each k = I: polynomial coefficients by xi
end for
To define the unknown function of two variables Z = f(X,Y), graphs are first constructed for the known values of Y from the tabulated data. The most accurate trend line is then fitted to each graph (a 5th-degree polynomial with R² above 0.990). A 4th-degree polynomial trend line gives a less accurate approximation, while a 6th-degree polynomial gives the same accuracy as the 5th-degree one. To obtain a more accurate model, the data range with n-paraffin content from 57 to 70 wt. % is considered. The next step is to determine the coefficients of the polynomial and to create an equation Z = f(X) for each curve. The function Z = f(X,Y) and the polynomial coefficients for the function of two variables are then defined. Thus, entering any values of n-paraffins (X) between 55 and 70 and asphaltenes (Y) between 0.5 and 3.5 makes it possible to obtain TSP values. The results of the calculations are presented in Figure 6 and Table 4.
Based on the k-NN method, a program in Python is developed for calculating the TSP indicator of the fuel mixture from the experimental data obtained. The program is given as Procedure 2. To increase the numerical precision, normalization is performed and four neighbors are taken, i.e., k = 4. With an increase in k, the accuracy of the calculations decreases because the data set is not large enough.
The essence of the method is to assign the object being classified to the class to which most of its nearest neighbors in a multidimensional feature space belong. This method is used in data mining technologies. Mathematically, the method is based on measuring the minimum distance from the center of the group to each observation, for which the length of the vector is determined by Equation (2). By calculating the length of each vector, the value z for the entered values x and y is determined by interpolation from the known data of the "neighbors". This model has a high confidence level of approximation, R² = 0.985.
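The k-NN calculation described above (normalization, k = 4, distance to each known point, interpolation from the neighbors) can be sketched as follows. This is not the authors' Procedure 2: Euclidean distance, min-max normalization, and inverse-distance weighting are assumptions made for illustration, and the data points form a hypothetical placeholder surface rather than the measured TSP values.

```python
import math

def knn_predict(points, query, k=4):
    """Estimate z at query=(x, y) from the k nearest known points.

    points is a list of (x, y, z). Both predictors are min-max normalized
    so that neither range dominates the distance. Inverse-distance weighting
    is an assumed interpolation rule, not taken from the source.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_span = min(xs), max(xs) - min(xs)
    y_min, y_span = min(ys), max(ys) - min(ys)

    def norm(x, y):
        return ((x - x_min) / x_span, (y - y_min) / y_span)

    q = norm(*query)
    # Pair each known z with its normalized distance, then keep the k closest.
    ranked = sorted((math.dist(norm(x, y), q), z) for x, y, z in points)[:k]
    # An exact hit returns the stored value directly.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [1.0 / d for d, _ in ranked]
    return sum(w * z for w, (_, z) in zip(weights, ranked)) / sum(weights)

# Hypothetical placeholder surface z = 0.001 * x * y, NOT the measured TSP data.
points = [(x, a / 2, 0.001 * x * (a / 2))
          for x in range(55, 71) for a in range(1, 8)]
estimate = knn_predict(points, (60.5, 1.25))
exact = knn_predict(points, (60, 1.0))
```

Because every estimate is a weighted average of stored values, the prediction can never leave the range of its neighbors, which is one way to see why errors tend to grow at the edges of the data.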
For a visual comparison with the experimental data, Figure 7 shows the three-dimensional model, and Table 5 presents the results of the calculations performed in the developed program.

Discussion
This article considers four methods for processing experimental data and obtaining a robust tool applicable in practice: linear regression, decision tree, k-NN, and finding a function of two variables by approximation. Linear regression showed a low-confidence result (R² = 0.772), and the decision tree method (R² = 0.910) is not reliable enough to address this issue. Presumably, a larger data sample is required to construct a more accurate model based on this method.
Relatively accurate predictions are obtained by finding the function of two variables by approximation (R² = 0.90) and by k-NN (R² = 0.985). The TSP values calculated by the k-NN method have the greatest error at the boundary maximum values, since, in the area of maximum values, neighbors are available only on the inner side of the data border.
In contrast, in calculations based on a function of two variables, the largest error is observed at a low concentration of n-paraffins combined with the highest asphaltene values. The trend line is represented by a polynomial function of the 5th degree, so the beginning of the dependence has a wave form. The amplitude of the polynomial function increases with increasing TSP values, so the graph of the function may cross the x-axis. To avoid negative values, the absolute value of the computed result is taken.
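The approximation route discussed here (a 5th-degree polynomial in X for each asphaltene level, coefficients then expressed as functions of Y, and the absolute value of the result) can be sketched with NumPy, whose polyfit and polyval mirror the MATLAB functions named in Procedure 1. The sketch is not the authors' program: the degree used for the coefficient fits in Y, the centering of X, and all names are assumptions, and the surface is a hypothetical stand-in for the measured TSP data.

```python
import numpy as np

X_CENTER = 63.5  # assumed shift; centering x keeps the degree-5 fit well conditioned

def fit_surface(x_levels, y_levels, z_table, deg_x=5, deg_y=2):
    """Fit Z = f(X) per Y level, then fit each coefficient as a polynomial in Y.

    z_table[i][j] is the response at y_levels[i], x_levels[j].
    deg_y is an assumed choice; the source does not state it.
    """
    u = np.asarray(x_levels, dtype=float) - X_CENTER
    # One row of deg_x + 1 coefficients per Y level.
    per_curve = np.array([np.polyfit(u, row, deg_x) for row in z_table])
    # Column k holds coefficient k across all Y levels.
    return [np.polyfit(y_levels, per_curve[:, k], deg_y)
            for k in range(deg_x + 1)]

def predict(model, x, y):
    """Evaluate the fitted surface; the absolute value removes negative dips."""
    coeffs_at_y = [np.polyval(c, y) for c in model]
    return abs(float(np.polyval(coeffs_at_y, x - X_CENTER)))

# Hypothetical smooth surface, NOT the measured TSP data.
x_levels = np.arange(57.0, 71.0)     # 57 ... 70 wt. % n-paraffins
y_levels = np.arange(0.5, 4.0, 0.5)  # 0.5 ... 3.5 wt. % asphaltenes
z_table = [[0.002 * y * (x - 60.0) ** 2 for x in x_levels] for y in y_levels]

model = fit_surface(x_levels, y_levels, z_table)
value = predict(model, 63.3, 1.7)
```

Taking the absolute value guarantees non-negative output but simply mirrors any negative oscillation of the polynomial upward, so it hides the oscillation rather than removing it; clipping at zero would be an alternative design choice.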
The present results show that it is possible to calculate the incompatibility manifestation in the mixing of marine residual fuels in practice. Because of the obtained dependences of the influence of n-paraffins and asphaltenes on the manifestation of incompatibility, it is possible to reduce the risks in advance when constructing the logistic supply chain of fuels to marine fuel terminals and tank farms. The present results can be used to create a numerical model for predicting the sediment content in compounded marine fuels with the required sulfur content.
The change in the symmetry of the three-dimensional surfaces shown in Figures 4-6 indicates a change in the dependence between asphaltene and n-paraffin content and the corresponding sedimentation. Thus, the asymmetry of the three-dimensional diagrams reflects the degree of incompatibility of residual marine fuels, showing the influence of the presented factors on precipitation [54][55][56][57][58][59].

Conclusions
In this study, experiments are performed to determine the physical and chemical characteristics of the fuels. The influence of n-paraffins and asphaltenes on the total sediment content has been measured. From the analysis of the experimental results, the main correlations between the composition of the fuel mixture and sedimentation are obtained. Various machine learning methods are used in processing the data. In this way, a robust tool has been developed to determine the possibility of incompatibility when mixing various types of marine residual fuels.
The absence of requirements in the international standard ISO 8217 for the content of asphaltenes in marine residual fuels preserves the risks of incompatibility. In this regard, active sediment formation can cause not only environmental problems, but technical problems as well. During the operation of marine engines, a high sediment content can lead to a breakdown of the entire fuel system and clogging of filters, which is a substantial threat to normal operation and personnel safety, especially if a breakdown occurs during navigation far from berths and ports and/or in bad weather conditions. It is proposed to regulate this parameter. This will allow shipowners and buyers to assess in advance the risks of incompatibilities and thus maintain fuel quality.
In future studies, it is possible to carry out experiments to determine the sulfur content of the exhaust gases as well as of the sediment itself, depending on the calculated incompatibility parameters, in order to create a model for predicting the SO2 concentration in the exhaust gases of ships. In addition, the deep neural network will be enhanced in order to improve the accuracy of its predictions and calculations.

Conflicts of Interest:
The authors declare no conflict of interest.

GOST: Russian government standard
ISO: International Organization for Standardization
IMO: International Maritime Organization
k-NN: k-nearest neighbors
SARA: Saturate, aromatic, resin, and asphaltene
TSA: Total sediment accelerated
TSP: Total sediment potential