Predicting Sooting Propensity of Oxygenated Fuels Using Artificial Neural Networks

The self-learning capabilities of artificial neural networks (ANNs) from large datasets have led to their deployment in the prediction of various physical and chemical phenomena. In the present work, an ANN model was developed to predict the yield sooting index (YSI) of oxygenated fuels using the functional group approach. A total of 265 pure compounds comprising six chemical classes, namely paraffins (n and iso), olefins, naphthenes, aromatics, alcohols, and ethers, were disassembled into eight constituent functional groups, namely paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic -CH=CH2 groups, naphthenic CH-CH2 groups, aromatic C-CH groups, alcoholic OH groups, and ether O groups. These functional groups, in addition to molecular weight and branching index, were used as inputs to develop the ANN model. A neural network with two hidden layers was trained using the Levenberg–Marquardt (LM) algorithm. The developed model was tested on a randomly selected 15% of the data points that were not seen during training. A regression coefficient (R²) of 0.99 was obtained when the experimental values were compared with the predicted YSI values from the test set. An average error of 3.4% was obtained, which is less than the experimental uncertainty associated with most reported YSI measurements. The developed model can be used to predict, with a high degree of accuracy, the YSI of hydrocarbon fuels containing alcohol and ether-based oxygenates as additives.


Introduction
Carbonaceous particles known as soot, which are formed and emitted during the incomplete combustion of fossil fuels, have a negative impact on human health and the environment. Airborne soot particles (<2.5 µm) have been shown to be one of the largest cancer-causing air pollutants [1], and when inhaled can cause a number of health issues, such as bronchitis, asthma, and premature death. Soot also results in environmental issues, such as haze formation, reduced visibility, acidification of water bodies, and global warming. Soot is second only to carbon dioxide in its potential to cause global warming [2][3][4] because it heats the atmosphere by absorbing thermal radiation from the sun. Studies [5] have shown that limiting the emissions of particulate matter (PM) from fossil fuel combustion can be one of the most effective ways of slowing the rate of global warming, and that a net 20-45% reduction in global warming can be realized by eliminating PM emissions.
Gasoline spark-ignited (SI) engines tend to produce less soot than compression-ignited (CI) diesel engines [6]. However, due to the application of PM filters in the exhausts of diesel engines, PM emissions from diesel engines have been reduced significantly and are now lower than those from gasoline engines [7][8][9]. Soot emitted from internal combustion (IC) engines has a short lifetime of around a week, so the positive impact of net soot reduction can be realized quickly. Soot formation in engines is often an impediment to meeting the conflicting objectives of increased efficiency and lower emissions, because advanced combustion strategies such as gasoline direct injection (GDI), which aim to maximize fuel efficiency, may result in increased emissions. An effective and tenable approach to reducing PM emissions from engines is to optimize engine design to attain a balance between fuel efficiency and emissions. Another method is to blend the fuels with selected oxygenate additives, such as alcohols and ethers, that can reduce soot formation during combustion. A number of studies have shown that oxygenates, such as methanol [10][11][12], ethanol [10,13,14], n-butanol [15,16], n-octanol [17], dimethyl ether [18], polyoxymethylene dimethyl ether (PODE) [19][20][21][22], and diethyl ether [23][24][25], can reduce soot formation in IC engines. The propensity of oxygenated compounds to reduce soot depends not only on the oxygen content but also strongly on the molecular structure, as shown by a number of works [26][27][28][29][30][31].
A theoretical model to predict the propensity of a fuel to form soot based on fuel composition and combustion conditions remains elusive due to the lack of soot mechanisms coupled with the complexity arising from turbulence-chemistry interactions. This has prompted the development and use of empirical models for quantifying sooting tendencies. The threshold sooting index (TSI) [32] is an artificially defined number that represents the tendency of a pure compound or a fuel mixture to form soot. TSI is measured using a standardized smoke point (SP) lamp, where the SP corresponds to the maximum height (in mm) of a smoke-free laminar non-premixed flame. The TSI is calculated from the measured SP of the fuel using Equation (1).
$$\mathrm{TSI} = a\left(\frac{\mathrm{MW}}{\mathrm{SP}}\right) + b \qquad (1)$$
where MW is the molecular weight of the fuel, and a and b are constants of the experimental set-up chosen so that the resulting TSI is apparatus-independent. The TSI of a fuel is inversely proportional to the SP, and therefore a heavily sooting fuel has a lower SP than a fuel with a lower sooting propensity. One of the major drawbacks of using the TSI to quantify sooting is the uncertainty associated with the SP measurements, which are often prone to observational errors. Aromatic fuels, which have large TSIs, produce low smoke heights that are difficult to measure; conversely, paraffinic fuels, which have lower TSIs, generate larger smoke heights that may lie outside the measurement range of the standard lamps. The yield sooting index (YSI) is another measure of sooting tendency, introduced by McEnally and Pfefferle [33], in which the fuel whose YSI is to be measured is doped into a methane base fuel at ppm-level concentrations. A non-premixed flame is then generated and the maximum soot volume fraction (f_v,max) is measured at the centerline of the flame, from which the YSI is calculated as defined in Equation (2).
$$\mathrm{YSI} = c \cdot f_{v,\max} + d \qquad (2)$$
where c and d are experimental constants, established with a scale that assigns benzene a YSI of 30 and 1,2-dihydronaphthalene a YSI of 100. The advantage of using YSI is that soot volume fractions correlate closely with fuel sooting propensity and can be measured more accurately, using laser-induced incandescence, than the SP on which the TSI relies. Because only a small amount of the test fuel is used, there is no significant impact on the flame temperature, and the measured soot volume fraction effectively captures the effect of fuel molecular structure on sooting propensity. Experimental YSI measurements have been reported by McEnally and Pfefferle's group for aromatic hydrocarbons [33], non-volatile aromatic hydrocarbons [34], large hydrocarbons [35], oxygenated hydrocarbons [31], unsaturated esters [36], gasolines and their surrogates [37], and diesel and jet fuels and their surrogates [38]. YSI measurements have been made using two scales with different sets of reference compounds. The "high scale" [31,34] uses benzene and 1,2-dihydronaphthalene as index reference compounds with assigned YSI values of 30 and 100, respectively. The "low scale" [31,36] is indexed using n-hexane and benzene with YSI values of 0 and 100, respectively. Recently, a "unified scale" [27] was introduced to resolve the incompatibility of the different scales, and a new database was created by measuring the YSI of a number of hydrocarbon and oxygenated compounds by combining the two scales. n-Hexane and benzene, with YSI values of 30 and 100, respectively, were used as the reference compounds in the unified scale. The unified scale also benefitted from the application of color ratio pyrometry for soot volume measurements, which has a better range than earlier diagnostics.
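As a minimal sketch of how two reference compounds fix the experimental constants c and d, the snippet below solves a two-point linear calibration on the unified scale (n-hexane = 30, benzene = 100). It assumes the linear form YSI = c * f_v,max + d. The soot volume fractions are made-up placeholder numbers, not measurements, and the function names are hypothetical.

```python
def calibrate_ysi(fv_ref_low, fv_ref_high, ysi_low=30.0, ysi_high=100.0):
    """Solve YSI = c * fv + d for (c, d) from two reference compounds."""
    if fv_ref_high == fv_ref_low:
        raise ValueError("reference soot volume fractions must differ")
    c = (ysi_high - ysi_low) / (fv_ref_high - fv_ref_low)
    d = ysi_low - c * fv_ref_low
    return c, d


def ysi_from_fv(fv_max, c, d):
    """Convert a measured maximum soot volume fraction to a YSI value."""
    return c * fv_max + d


# Placeholder soot volume fractions (arbitrary units), for illustration only.
FV_N_HEXANE = 0.10
FV_BENZENE = 0.45

c, d = calibrate_ysi(FV_N_HEXANE, FV_BENZENE)
```

By construction, the two reference compounds map back to their assigned values, and any other test fuel measured on the same apparatus is placed on the same scale by `ysi_from_fv`.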
The objective of the present work was to develop an artificial neural network (ANN)-based model using the unified YSI database [27] that can predict the YSI of pure compounds and real fuels containing the following chemical classes: paraffins (n and iso), olefins, naphthenes, aromatics, alcohols, and ethers. Many of the alternative fuels and additives blended with transportation fuels, such as gasoline or diesel, are alcohols or ethers [39]; therefore, other oxygenated groups, such as aldehydes, ketones, and esters, were not included in the model. An ANN was used as the tool for developing the model due to its potential to mathematically capture the complex, non-linear behavior often observed in chemical phenomena. An ANN is a set of interconnected nodes that resemble the neurons in the brain. It works towards a single goal: fitting a mathematical function that maps the supplied inputs to one or more desired outputs. ANNs have the ability to recognize and learn the relationship between a set of input(s) and output(s) from a dataset, provided the ANN is trained correctly. ANNs can be trained and adapted in a number of different ways by optimizing hyper-parameters such as the number of hidden layers, the number of nodes in each layer, the number of training cycles over the full dataset (known as epochs), and the batch size. Detailed information on the theory and background of ANNs is reported in a number of works [40][41][42][43][44][45][46]; thus, the theory behind them is not discussed here. The input features of the model consist of the fuel functional groups, molecular weight (MW), and a structural parameter called the branching index (BI), which quantifies the branching in a molecule. The combination of these input parameters adequately represents the molecular structure of a pure compound, mixture, or real fuel, and these parameters have been successfully used to predict derived cetane number (DCN) [47,48] and octane number [44], and to formulate surrogates for a number of fuels using the minimalist functional group (MFG) approach [49][50][51].

Dataset Generation
The dataset used for training the ANN model contains 265 pure compounds, consisting of 30 paraffins, 32 olefins, 27 naphthenes, 101 aromatics, 40 alcohols, and 35 ethers. The pure compounds were disassembled into the following underlying structural moieties or functional groups: paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic -CH=CH2 groups, naphthenic CH-CH2 groups, aromatic C-CH groups, alcoholic OH groups, and ether O groups. The functional group distribution in weight percent (wt%) of n-heptane, 1-heptene, ethylcyclopentane, toluene, 2-propanol, and methyl butyl ether is presented in Figure 1. For example, methyl butyl ether (MW = 88 g/mol) has two paraffinic CH3 groups, one at each end of the molecule, three paraffinic CH2 groups, and an ether O group. The molecular weights of the paraffinic CH3 group, paraffinic CH2 group, and ether O group are 15, 14, and 16 g/mol, respectively. Therefore, the functional groups in methyl butyl ether are calculated as 34.1 wt% of paraffinic CH3 groups, 47.7 wt% of paraffinic CH2 groups, and 18.2 wt% of ether O groups. The other two input features, namely MW and the branching index (BI), are calculated from the molecular structure of the compound. BI represents the degree of branching in a compound, and both the size and position of the side chains are taken into account when computing its value. This parameter helps to distinguish isomers (for example, 2-methylheptane and 3-methylheptane) that have the same functional group distribution but slightly different combustion properties, such as derived cetane number (DCN), octane number, TSI, and YSI. The methodology used to calculate the BI of compounds has been previously explained in detail [47]. The entire dataset used in the present work, in addition to the input features and output of interest (YSI), is provided in the Supplementary Material.
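The methyl butyl ether example can be reproduced with a few lines of code. The following is a minimal sketch, assuming the integer group masses quoted in the text (15, 14, and 16 g/mol); the dictionary and function names are hypothetical, and only the three groups needed for this molecule are included.

```python
# Integer group masses (g/mol) as quoted in the text for these three groups.
GROUP_MASS = {"CH3": 15.0, "CH2": 14.0, "ether_O": 16.0}


def group_weight_percent(group_counts, group_mass=GROUP_MASS):
    """Return (molecular weight, wt% per functional group) from group counts."""
    masses = {g: n * group_mass[g] for g, n in group_counts.items()}
    mw = sum(masses.values())
    wt = {g: 100.0 * m / mw for g, m in masses.items()}
    return mw, wt


# Methyl butyl ether: two CH3 groups, three CH2 groups, one ether O group.
mbe_counts = {"CH3": 2, "CH2": 3, "ether_O": 1}
mw, wt = group_weight_percent(mbe_counts)

for group, value in wt.items():
    print(f"{group}: {value:.1f} wt%")
```

The same function applies to any compound once its group counts are known; extending `GROUP_MASS` to the remaining five groups would require the group masses used by the authors, which are not listed in this section.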

ANN Development
The model was developed using the neural network (NN) toolbox available in MATLAB 2020a. The ANN model was developed as a feedforward multilayer perceptron (MLP), which is suitable for the present supervised input-output learning problem. The topology of an MLP consists of nodes in an initial input layer, one or more hidden layers, and a final output layer. Ten inputs (the eight functional groups, MW, and BI) were specified, and YSI was specified as the output. The ANN was developed with two hidden layers. The topology of the ANN model is presented in Figure 2. The dataset was split into three sets: a training set consisting of 70% of the data points, a validation set containing 15% of the points, and a test set containing the remaining 15% of the data. The points in these sets were selected randomly by the tool with no manual interference. An inbuilt training technique, namely the Levenberg-Marquardt (LM) algorithm, was used to train the network due to its better performance. The mean squared error (MSE) was chosen as the performance function, and the architecture (number of neurons in each hidden layer) of the ANN was varied by trial and error until the required MSE was obtained. The validation set was used for assessing the performance of the models during the course of model optimization, and the test set was used only once at the end to check the performance of the final model. The parameters used for the development of the ANN model are presented in Table 1. The number of neurons in each of the two hidden layers was varied from 5 to 30 in steps of 5 until satisfactory results were obtained.
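The data partitioning and architecture search described above can be sketched in a few lines. This is an illustrative Python sketch, not the MATLAB toolbox code used in the study: it only builds a random 70/15/15 index split and enumerates candidate hidden-layer sizes, and it does not train a network. The function names are hypothetical, and it is an assumption here that the two hidden layers are varied independently.

```python
import itertools
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # the remainder forms the test set
    return train, val, test


def candidate_architectures(start=5, stop=30, step=5):
    """List (neurons in layer 1, neurons in layer 2) pairs to try."""
    sizes = range(start, stop + 1, step)
    return list(itertools.product(sizes, repeat=2))


train, val, test = split_indices(265)
architectures = candidate_architectures()
```

In a full workflow, each candidate architecture would be trained on `train`, compared using the MSE on `val`, and only the selected model would be evaluated once on `test`.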

Results and Discussion
Because the measurement conditions and the apparatus were the same for all measurements, differences in the YSI of individual molecules can be related to their constituent functional groups and molecular structure. The individual impact of each of the input parameters/functional groups on YSI is discussed in the following sections.

Paraffinic CH3 Groups
Amongst the hydrocarbon chemical classes, paraffins possess the lowest YSI compared to naphthenes, olefins, and aromatics of similar carbon numbers. This can be demonstrated using the example of n-heptane, which has a YSI of 36; this is the lowest value among its seven-carbon counterparts from the other chemical classes. 1-Heptene, cycloheptane, and toluene have higher YSIs of 48.4, 50, and 170.9, respectively.
The effect of paraffinic CH3 groups on the YSI of compounds in the dataset is shown in Figure 3. For straight chain paraffins, it can be clearly observed (in red) that when the mass contribution of the paraffinic CH3 group increases, the YSI is reduced. n-Pentane, which has a paraffinic CH3 content of 41.7 wt%, has a lower YSI of 24.6 compared to n-hexane (paraffinic CH3 groups, 34.9 wt%) and n-heptane (paraffinic CH3 groups, 30 wt%), which have higher YSIs of 30.4 and 36, respectively. Branched paraffins have higher YSIs than n-paraffins of the same carbon number. For example, n-octane has a lower YSI of 42.6 compared to 2-methylheptane (YSI, 49.4) and 2,2-dimethylhexane (YSI, 52.8). This trend is true for paraffins of all carbon numbers. No definite trends are observed in other compound classes because YSI is a net effect of the functional groups constituting the molecule.
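The paraffinic CH3 contents quoted for the n-paraffins follow from simple group arithmetic. As a worked check, assuming integer group masses of 15 g/mol for CH3 and 14 g/mol for CH2, an n-paraffin with n carbon atoms contains two CH3 groups and n - 2 CH2 groups, so that:

```latex
\begin{align*}
\mathrm{MW}(n) &= 2(15) + 14(n-2) = 14n + 2 \\
w_{\mathrm{CH_3}}(n) &= \frac{30}{14n + 2} \times 100 \\
w_{\mathrm{CH_3}}(5) &= \frac{30}{72} \times 100 \approx 41.7\ \mathrm{wt\%} \\
w_{\mathrm{CH_3}}(6) &= \frac{30}{86} \times 100 \approx 34.9\ \mathrm{wt\%} \\
w_{\mathrm{CH_3}}(7) &= \frac{30}{100} \times 100 = 30.0\ \mathrm{wt\%}
\end{align*}
```

The CH3 content therefore falls monotonically with chain length in this series, so the observed drop in YSI with rising CH3 content cannot be separated from the accompanying decrease in molecular size.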

Paraffinic CH2 Groups
The paraffinic CH2 groups can be thought to represent the linearity of a molecule, as indicated by previous studies [47,52], and a high content of this group indicates a high degree of linearity. Lengthening the side chains in aromatic molecules has been shown to increase their sooting propensities [34]. This is primarily due to the decomposition of the side chains to form hydrocarbon radicals that react with available benzylic species to form naphthalene, which eventually leads to the formation of polycyclic aromatic hydrocarbon (PAH) species.
The effect of paraffinic CH2 groups on YSI is presented in Figure 4, and it can be observed that these groups enhance the sooting propensity of n-paraffins. Straight chain paraffins with increasing methylene content, namely n-pentane, n-heptane, n-nonane, and n-undecane, exhibit increasing YSI values of 24.6, 36, 50.1, and 64.7, respectively. This trend is also observed for some alkyl aromatics, in which a higher methylene content in the side chains results in higher YSI values compared to aromatics with a lower degree of linearity in the alkyl side chains. As an example, n-propylbenzene has a higher YSI of 235.7 compared to iso-propylbenzene, whose YSI is 187.6. Similarly, ethylbenzene has a larger YSI value of 223.7 when compared with both 1,3-dimethylbenzene (YSI, 221.6) and 1,4-dimethylbenzene (YSI, 211.1).
Processes 2021, 9, 1070

Figure 4. Effect of paraffinic CH2 groups on YSI.
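To make the n-paraffin trend concrete, the short sketch below computes the average change in YSI per added CH2 group between successive compounds, using only the four YSI values quoted in the text. The variable and function names are hypothetical, and the finite-difference estimate is an illustration rather than part of the published analysis.

```python
# (carbon number, YSI) for the n-paraffins quoted in the text.
N_PARAFFIN_YSI = [(5, 24.6), (7, 36.0), (9, 50.1), (11, 64.7)]


def ysi_increment_per_ch2(data):
    """Finite-difference YSI change per added CH2 group between neighbours."""
    increments = []
    for (n0, y0), (n1, y1) in zip(data, data[1:]):
        increments.append((y1 - y0) / (n1 - n0))
    return increments


increments = ysi_increment_per_ch2(N_PARAFFIN_YSI)
```

For these four compounds the increments are all positive, at roughly 5.7, 7.05, and 7.3 YSI units per CH2 group, which is consistent with the statement that methylene groups enhance the sooting propensity of n-paraffins.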

Paraffinic CH Groups
Sooting propensity is largely influenced by the degree of branching, and the presence of paraffinic CH groups indicates branching in saturated molecules. Generally, branching, and hence the presence of paraffinic CH groups, tends to increase the sooting propensity, as indicated by a number of works [53]. Figure 5 shows the effect of these groups on the YSI of the compounds in the dataset. 2,3-Dimethylbutane, a six-carbon molecule with 30.2 wt% of paraffinic CH groups, has a higher YSI of 44 compared with both 2-methylpentane (YSI, 36.7) and 3-methylpentane (YSI, 38.2), which have lower paraffinic CH contents. No such trend is observed in non-paraffinic molecules, as can be seen in Figure 5.

Olefinic -CH=CH2 Groups
Olefins in general are not highly abundant in transportation fuels, such as gasoline and diesel. The unsaturation added by the double bond tends to increase the ability of the compound to form soot, and hence olefins generally have higher YSI values than their paraffinic counterparts. For example, 1-heptene has a YSI of 48.4, compared to 36 for n-heptane. Branched olefins have higher sooting propensities than straight chain olefins. Among the six-carbon olefins, 1-hexene has the lowest YSI at 42.4, compared to 2-methyl-1-pentene (YSI, 42.9), 2-ethyl-1-butene (YSI, 45.6), 3-methyl-1-pentene (YSI, 45.1), and 2,3-dimethyl-1-butene (YSI, 53.4), among others. Figure 6 illustrates the impact of olefinic -CH=CH2 groups on YSI. For olefins, an increasing mass contribution of the olefinic groups corresponds to a reduction in the YSI. This counterintuitive trend arises because most of the olefinic compounds in the dataset have a single double bond, so the olefinic content in wt% decreases as molecular size increases. This shows that other parameters, such as molecular size, have a more pronounced effect on YSI than the olefinic groups themselves.
Figure 6. Effect of olefinic -CH=CH2 groups on YSI.

Effect of Naphthenic CH-CH2 Groups
In general, naphthenes have higher sooting propensities than paraffins and olefins of the same carbon number. The addition of naphthenic molecules to fuels results in an increase in the number and size of the soot particles formed, due to the formation of benzene rings and small PAH structures. Naphthenes with a double bond (such as cyclopentene and cyclohexene) promote the formation of soot more than saturated cyclic molecules, as is evident from their higher YSI values. Cyclopentene and cyclohexene possess higher YSI values of 80.5 and 62.2, respectively, compared to cyclopentane (YSI, 39.4) and cyclohexane (YSI, 42.7). Figure 7 presents the effect of naphthenic CH-CH2 groups on YSI, and no apparent trend can be observed from the entries in the dataset. Naphthenes have significantly lower YSI values than aromatics with the same carbon number and number of rings.

Effect of Aromatic C-CH Groups
Aromatic hydrocarbons present in transportation fuels are primarily responsible for the majority of soot formed during combustion. It is now well established that aromatic molecules produce more soot than paraffins, because aromatic rings may stay intact and enable the growth of PAH precursors. Aromatic C-CH groups also have a significant impact on the ignition delay time, cetane number, and octane number, in addition to YSI. Aromatic fuels such as toluene have long ignition delay times (>100 ms), whereas paraffins have shorter ignition delays. As a result, aromatic species have larger octane numbers than paraffins and olefins.

Diesel is known to possess about 10-25% aromatic content by weight, whereas gasolines contain a higher proportion, of around 20-50% [54]. A number of studies [55] reported in the literature have investigated the impact of aromatics on soot emission and characterization. Aromatic flames result in the formation of soot particles and precursors that are structurally and morphologically more complex compared to soot produced from paraffinic fuels [56]. Aromatic molecules in the dataset have very large YSI values compared to other hydrocarbon and oxygenated compounds. 1,2-diphenylbenzene has the largest YSI, of 1338.9, in the present dataset, followed by pyrene at 1250.1. Aromatic PAH precursors are usually formed at the lower end of the flame height, and the size of the soot particles grows by the addition of methyl and methylene groups that are added towards the tip of the flame [56].
The effect of aromatic C-CH groups on YSI is shown in Figure 8. A general trend of increasing YSI with increasing aromatic content can be observed and this is more pronounced for aromatic compounds that have YSI values in the range of 800-1300. The aromatic compounds present in the dataset are highly diverse, ranging from a single ring benzene to a fused four ring pyrene. Similarly, aromatics with short alkyl chains (toluene) to long alkyl chains (n-tetradecylbenzene) are also present. These diverse molecules in the dataset increase the prediction range and the accuracy of the resulting model.

Figure 9. Effect of alcoholic OH groups on YSI.

Effect of Ether O Groups
For a given carbon number, ether molecules possess the lowest YSI compared to the alcohols and hydrocarbon compounds in the dataset. As a result, ether compounds, such as dimethyl ether, diethyl ether, and methyl tert-butyl ether (MTBE), have found application as fuel additives that can lower soot emissions and increase combustion efficiency. One of the causes of the low sooting propensity of ether compounds is the ether O atom, which interrupts the carbon chain, thus resulting in smaller reaction products that are unlikely to form soot precursors. Dimethoxymethane, a three-carbon ether compound, has the lowest YSI of the ether compounds in the dataset, at 10.9, whereas isoamyl ether, which has ten carbon atoms, has the largest, at 63.6. The effect of ether O groups on the YSI is shown in Figure 10, and it can be observed that, as the ether content increases, the YSI of the compounds decreases.

Effect of Molecular Weight
It is well established in the literature that the sooting propensity of a compound increases with molecular weight. Experiments performed on a diesel engine using blends of n-decane and n-hexadecane have shown that increasing the molecular weight of the fuel by increasing the blend ratio of n-hexadecane led to an increase in soot formation [57]. The molecular weight of a fuel influences its physical properties such as volatility, and Makwana et al. [58] showed that fuel volatility correlates positively with soot volume fractions produced in n-heptane/n-hexadecane mixtures employing both premixed and non-premixed flames. The effect of molecular weight on the YSI of the compounds is shown in Figure 11, and it can be noted that the sooting propensity of compounds of all chemical classes increases with an increase in the molecular size.
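As a rough illustration of how the blend ratio shifts the molecular weight input, the sketch below computes the mean molecular weight of an n-decane/n-hexadecane blend. Expressing the blend ratio as a mole fraction, and the helper names alkane_mw and blend_mw, are assumptions made for illustration only; the cited engine study may define its blend ratio differently.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def alkane_mw(n_carbon: int) -> float:
    """Molecular weight of the n-alkane CnH(2n+2) in g/mol."""
    if n_carbon < 1:
        raise ValueError("an alkane needs at least one carbon atom")
    return n_carbon * ATOMIC_MASS["C"] + (2 * n_carbon + 2) * ATOMIC_MASS["H"]


def blend_mw(x_heavy: float, mw_light: float, mw_heavy: float) -> float:
    """Mole-fraction-averaged molecular weight of a binary blend.

    x_heavy is the mole fraction of the heavier component
    (an assumed definition of the blend ratio).
    """
    if not 0.0 <= x_heavy <= 1.0:
        raise ValueError("mole fraction must lie in [0, 1]")
    return x_heavy * mw_heavy + (1.0 - x_heavy) * mw_light


mw_decane = alkane_mw(10)      # n-decane, C10H22
mw_hexadecane = alkane_mw(16)  # n-hexadecane, C16H34

# The mean molecular weight rises linearly with the n-hexadecane fraction.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    mean_mw = blend_mw(x, mw_decane, mw_hexadecane)
    print(f"x(n-hexadecane) = {x:.2f}  mean MW = {mean_mw:.1f} g/mol")
```

Under this assumption the molecular weight input grows monotonically with the n-hexadecane fraction, which is consistent with the direction of the trend reported in [57].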

Effect of Branching Index
The effect of molecular branching on combustion properties, such as ignition delay, octane number, and cetane number, has been studied in a number of works. Recently, Abdul Jameel et al. [47] defined the term "branching index" (BI), which quantifies the degree of branching of paraffins, olefins, naphthenes, and aromatics, by including the effect of the position of the alkyl branch in the molecule. For example, 2-methylpentane and 3-methylpentane have the same degree of branching and the same functional groups. However, their physical, chemical, and combustion properties are different. 2-methylpentane has a YSI of 36.7, whereas 3-methylpentane has a YSI of 38.2. The difference in the YSI values of these compounds can be explained by their different branching indexes. 2-methylpentane has a branching index of 0.2, whereas 3-methylpentane has a branching index of 0.3 due to the methyl branch in position 3, which is one position away from the outermost position on the chain. The definition of the BI includes both the size and the position of the alkyl branches, which helps in differentiating these structural isomers. Providing more information on the branching index is outside the scope of the present work and detailed information can be obtained elsewhere [47,59]. Branching tends to increase the sooting propensity of compounds of all chemical classes, as shown by a number of works [30,38,53]. Figure 12 shows the impact of branching index on YSI and, as expected, it can be discerned that the YSI increases with an increase in the branching index.
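The role of the branching index in separating structural isomers can be illustrated with a small sketch of the ten model inputs for 2-methylpentane and 3-methylpentane. The class name FuelFeatures, its field names, and the use of raw group counts (rather than, say, normalized fractions) are assumptions for illustration; the branching index values are those quoted above.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class FuelFeatures:
    """Ten inputs: eight functional-group counts, molecular weight, BI."""
    paraffinic_ch3: float
    paraffinic_ch2: float
    paraffinic_ch: float
    olefinic_ch_ch2: float
    naphthenic_ch_ch2: float
    aromatic_c_ch: float
    alcoholic_oh: float
    ether_o: float
    molecular_weight: float  # g/mol
    branching_index: float


# Both isomers are C6H14: three CH3, two CH2, and one CH group.
C6H14_MW = 6 * 12.011 + 14 * 1.008

two_methylpentane = FuelFeatures(3, 2, 1, 0, 0, 0, 0, 0, C6H14_MW, 0.2)
three_methylpentane = FuelFeatures(3, 2, 1, 0, 0, 0, 0, 0, C6H14_MW, 0.3)

# List the inputs in which the two isomers differ.
names = [f.name for f in fields(FuelFeatures)]
differing = [
    name
    for name, a, b in zip(names, astuple(two_methylpentane), astuple(three_methylpentane))
    if a != b
]
print(differing)
```

Without the branching index, the two input vectors would be identical and no model could reproduce the different measured YSI values (36.7 and 38.2).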

ANN Model
The ANN model for YSI prediction was developed as per the methodology described in Section 2.2. A number of parameters, such as batch size, epochs, number of hidden layers, and number of neurons in each hidden layer, were optimized to yield the final ANN model. Nearly 260 iterations were carried out and the best performance was observed in the 250th iteration. Figure 13 shows the comparison of the measured and predicted YSI for the entries in the training set, test set, and the entire dataset. It can be seen that the regression coefficient is close to unity for all three cases. It is especially encouraging to see the high level of correlation (R2 = 0.99) in the test set, which was unseen by the model until it was tested at the end. The developed YSI model has an average error of 3.4%, which is less than the experimental error observed in most cases. This shows the ability of ANN-based models to predict complex chemical phenomena, such as sooting propensity. This also supports the functional group-based approach employed here, which reiterates the ability of the functional groups to predict fuel properties, as previously demonstrated for cetane number [47,48] and octane number [44]. Another advantage of the functional group approach is the ability to predict the YSI of real fuels, which most other models reported in the literature are not designed to do. The functional groups present in hydrocarbon fuels, such as gasoline and diesel, can be identified and quantified using techniques such as nuclear magnetic resonance (NMR) spectroscopy [47,60,61]. The oxygenated functional groups, such as alcohol OH and ether O groups, can be calculated from the blending ratio of oxygenates added to the petroleum fuels.
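The two figures of merit quoted above can be sketched as follows. The exact definitions are not given in this section, so the code assumes that the regression coefficient is the coefficient of determination and that the average error is the mean absolute percentage error; the function names and the toy numbers are illustrative and are not data from this work.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally sized sequences with >= 2 entries")
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def mean_abs_pct_error(measured, predicted):
    """Mean absolute percentage error, in percent of the measured value."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two equally sized, non-empty sequences")
    total = sum(abs(y - p) / abs(y) for y, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Toy values for illustration only (NOT measurements or model outputs).
toy_measured = [10.0, 20.0, 30.0, 40.0]
toy_predicted = [11.0, 19.0, 31.0, 39.0]

print(f"R2 = {r_squared(toy_measured, toy_predicted):.3f}")
print(f"average error = {mean_abs_pct_error(toy_measured, toy_predicted):.2f}%")
```

Because the percentage error is normalized by the measured value, low-YSI compounds such as ethers weigh as heavily in the average as high-YSI aromatics, whereas R2 is dominated by the compounds with the largest YSI values.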

Conclusions
In the present work, an ANN model was developed to predict the YSI values of fuels encompassing the following six chemical classes: paraffins, olefins, naphthenes, aromatics, alcohols, and ethers. The compounds were segregated into their constituent functional groups or molecular moieties, and these were used as input criteria for training the ANN model, in addition to molecular weight and branching index. The ANN model was developed using MATLAB 2020a by training on a randomly selected 70% of the dataset. The ANN model was trained using the Levenberg-Marquardt (ML) algorithm, with 15% of the data used for validating the generated models, and the mean squared error (MSE) was specified as the loss function. The final model was tested with 15% of the data, which was initially extracted from the dataset and kept aside for model testing. Aromatic groups and branching index had the most significant correlation with YSI, whereas an increase in alcoholic OH and ether O groups led to a steady reduction in the sooting propensity. A regression coefficient (R2) of 0.99 was obtained between the measured and predicted YSI values, which indicates the ability of the ANN model to accurately predict the YSI of random compounds, whose YSI values varied within a large range from 10 to 1300. The average error obtained for the test set was 3.4%, which is lower than the experimental error associated with measurements. The developed model can successfully predict the YSI of pure compounds possessing the discussed functional groups. It can also be applied to YSI prediction of real fuels if the fuel functional groups and branching index can be estimated using advanced spectroscopic techniques. The other input feature, namely molecular weight, can be obtained using chromatographic methods or distillation curves. The present model can also be used for screening purposes when a large volume of fuels needs to be tested.
The model can also predict the sooting tendency of novel molecules that are not available in the dataset, based on their functional groups, as demonstrated by the results.
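For readers who wish to reproduce the data partition, the 70/15/15 random split summarized above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB workflow used in this work; the function name split_indices, the seed, and the rounding of the subset sizes are assumptions, so the exact subset sizes may differ from those produced by MATLAB.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, roughly 15%
    return train, val, test


# 265 pure compounds, as in the present dataset.
train_idx, val_idx, test_idx = split_indices(265)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices are set aside before any training, so that the reported test-set statistics reflect compounds the model has not seen.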