Using the Stochastic Gradient Descent Optimization Algorithm on Estimating of Reactivity Ratios

This paper describes an improved method of calculating reactivity ratios by applying the neuronal networks optimization algorithm, named gradient descent. The presented method is integral and has been compared to the following existing methods: Fineman–Ross, Tidwell–Mortimer, Kelen–Tüdös, extended Kelen–Tüdös and Error in Variable Methods. A comparison of the reactivity ratios that obtained different levels of conversions was made based on the Fisher criterion. The new calculation method for reactivity ratios shows better results than these other methods.


Introduction
Binary copolymerization is one of the essential techniques to obtain macromolecular compounds with controlled characteristics. To predict the copolymer composition and to control the properties, the calculation of reactivity ratios with higher accuracy becomes a critical goal. The most commonly used kinetic model in methods to calculate reactivity ratios is the terminal model, which is described by the following equations [1]: where P n -growing polymer chain, M * 1 , M * 2 -the active center on monomer 1 and on monomer 2, respectively, k 11 , k 12 , k 21 , k 22 -propagation rate constants.
The correlation of these four kinetic equations into an expression which can link the copolymer composition with the kinetic parameters can be carried out assuming that the monomer consumption in the initiation and the termination reaction can be neglected and rate of interchange of radicals is constant (1).
Considering the two lemmas, the mathematical solution proposed by Mayo and Lewis [2] for the instantaneous consumption of monomers is described by Equation (2), and the integral form of this equation is described in Equation (3):  (3) and the integral form of this equation, described by Equation (4), has been proposed by Mayo and Lewis [2]: Determining the reactivity ratios with relation (4) is difficult, but is easier to use if it has been transformed in the following relation: , (8) where: Over time, based on these mathematical models, several methods have been created to calculate reactivity ratios. The proposed methods for calculating reactivity ratios are classified as: a.
In differential methods using Equations (2) and (7) as models, where the conversions are usually <10% and the calculated composition of the copolymer is the instantaneous one; In integral methods using Equations (4), (5) and (8) as models, where the conversions are usually >10% and the calculated composition of the copolymer is the global composition.
Another classification type of the methods for calculating the reactivity ratios can be made according to the mathematical way by which the values of the variables r 1 and r 2 are obtained, as follows: linear methods and nonlinear methods.
One of the commonly used differential methods is the graphical method proposed by Fineman-Ross (FR for Equation (10) and r-FR for Equation (11)) [6] which is linearized in Equation (2) and is described by the following equations: where When using the Fineman-Ross method, the slope of Equation (11) and the intercept of Equation (12) give the value for r 1 and, respectively, the intercept of Equation (11) and the slope of Equation (12) give the value for r 2 . The distribution of the calculated points using Equations (11) and (12) is not uniform along the line. These calculated points are crowded in the area of origin of the coordinate system, and due to experimental errors, can easily lead to errors in estimating the values of the slope and the intercept and obviously the values of the reactivity ratios are affected.
One solution to eliminate this disadvantage of the Fineman-Ross method was proposed by Kelen-Tüdös (KT) [7,8]. This method is also based on linearization of Equation (2) and is described by the following mathematical equations: where: The KT method only gives one solution of reactivity ratios and the calculated points are homogenously distributed along the line.
Calculating the reactivity ratios using the linearization technique of the KT method was extended by the same authors [9,10] for experimental data obtained at high conversions. For this reason, the mathematical Equation (8) is rewritten as follows: where: where the 0 index refers to the initial concentration of monomer i, α has the same mathematical form as Equation (8), Pn weight percent conversion, µ-molecular weight of monomers. Under this mathematical form, the method is known as the extended Kelen-Tüdös method (eKT). For both methods of calculating the reactivity ratios proposed by Kelen and Tudos (KT, e-KT), the value of r 2 is obtained from the value of the intercept multiplied by the correction factor α and the value of r 1 is obtained from the value of the slope which decreases the value of the intercept.
The transformation of the Mayo-Lewis Equation (2) into a weighted (Equations (14) and (18)) or unweighted linear form (Equations (11) and (12)) admits the existence of a dependent variable y, the left hand of equations and an independent variable x, the term that multiplies the reactivity ratio to the right of the equations. The reactivity ratios calculated with linear methods represent the slope and intercept of the line, and to obtain these two parameters, linear regression using ordinary least squares methods (OLS) is carried out. To estimate the best values of reactivity ratios using OLS, the variables x and y must meet the Gauss-Markov assumptions, which are: a.
The dependence between y and x variables must be linear and to have random errors. Non-linearity gives a wrong estimation of reactivity ratios; b.
For independent variable x, the expected error term is zero, otherwise the intercept is biased; c.
The covariance of errors for all independent variables x is constant and represents the measurement of the model uncertainty. If the covariance of errors is not constant, the estimated reactivity ratios are less precise, which increases the likelihood of being further from the correct values; d.
The standard hypothesis in linear regression is that the independent variable x is not dependent on the dependent variable y or, put another way, each independent variable x is uncorrelated with the error terms. In the linear methods of reactivity ratio calculation, the variables x and y are correlated; for this reason, it is possible to obtain biased or inconsistent reactivity ratios; e.
All independent variables x must by collinear, otherwise the calculated reactivity ratios with OLS will have big errors.
Considering those assumptions presented above, the reactivity ratios obtained using linear methods must be viewed with caution in terms of their quality if supplementary information about measurement errors does not exist.
The Tidwell-Mortimer (TM) [9] method is a differential optimization method based on the modified Gauss-Newton nonlinear least-square algorithm. They developed a method of calculating reactivity ratios from Equation (4) and derived the following relationship: where: i is the number of the experimental run, j is number of the estimation set and r 0 1 , r 0 2 are the expectation values of r  Thus, if the difference between the molar fraction measured experimentally and the one calculated using the Wall equation using Equation (6): then estimates,β 1 ,β 2 of the smallest squares of β 1 and β 2 provide the necessary corrections so that the new values of r j 1 and r j 2 given by introduced in Equation (10) should lead to a decrease in value the ∑ (d i ) 2 . This method is known as the Gauss-Newton optimization method. Although integral methods for calculating reactivation ratios have existed for a long time, their use is limited and reduced to a few simpler methods based either on the linearization of integral equations or on the minimization of errors in one variable or both variables.
One of the EVM variant methods is the one proposed by Chee and Ng [20], which uses the integral form proposed by Mayo-Lewis (3) as a mathematical model and is based on the minimization of the residual weighted sum for r 2 , defined by the following relation: where: Cov(x, y) = 0 (34) Cov(y, P n ) = ∂P n ∂y Var(y) Cov(x, P n ) = ∂P n ∂x r e 2 -the value of r 2 estimated with Equation (3), Pn weight percent conversion, σ-standard deviation of M 10 , m 1 and P n , µ-molecular weight of monomers. The improved method proposed below is based on numerical integration step by step of a differential equation and uses the gradient descent optimization algorithm.

Material and Methods
A simplified algorithm used in the processes of optimizing neural networks [21] and machine learning [22] is that of the stochastic gradient descent (GD). This optimization algorithm was adapted for the process of the calculation of the reactivity ratio for a terminal model of copolymerization. In this procedure, it is assumed that the Wall -Skeist Equation (4) is valid and the experimental error is independent and has a common variance. The calculation principle of this algorithm consists of determining the parameters r 1 , r 2 of a function m 2i (Equation (9)) by minimizing the cost of the function.
The calculation procedure of the descending gradient algorithm consists of the following steps: a.
Initialize the value of reactivity ratios, r j 1 , r j 2 with values obtained with the Fineman-Ross method; b.
By introducing r j 1 , r j 2 values into Equation (8), calculate m 2i by numerical integration for each experimental point until the specific conversion has been reached using the numerical integration algorithm proposed by Kazemi [17,18]; c.
The cost of the coefficients r j 1 , r j 2 is evaluated with the following equation, Equation (15): is the experimentally determined copolymer composition, m jcalc 2i is the copolymer composition calculated with Equation (2), and j is the number of the calculation step. a.
The partial derivative of the cost function (δ) according to r 1 and r 2 is calculated to determine the direction of evolution of the parameters r j 1 , r j 2 . Equations (35) and (36): where n is the number of experimental sets. b.
The new values of the reactivity ratios r where α is the search step which has a small positive value α ∈ [0, 1], in this case, α = 0.1. c.
The error of the method is determined using Equation (42): d. If the error is higher than a preset value (err = 1 × 10 −15 in this case), then the calculation is resumed with new coefficients r j+1 1 , r j+1 2 , or else the calculation will stop. The evaluation of the quality of the reactivity ratios obtained is performed using the Fisher criterion (F), whose formula shown below.
where F c is the value of the Fisher criterion, n is the number of monomers used in copolymerization and p is the number of the experimental data set. Thus, m j(e) i is the molar fraction of monomer "i" from copolymer for "j" experimental data set, and m j(c) i is the molar fraction of monomer "i" calculated based on a mathematical model for the experiment "j".
As can be seen from Equation (43), the lower the value of the Fisher criterion, the closer the values of reactivity ratios are to the true value.
The analysis of the quality of the methods for calculating the reactivity ratios considered in this paper was carried out by imposing the conditions presented in Table 1. The choice of reactivity ratios used in the qualitative analysis was made based on the following criteria: (r 1 × r 2 ): For these imposed conditions, the initial compositions of the monomer mixture between 0-1 and conversions were normalized and randomly generated in the imposed intervals given in Table 1. With these data, for the given conversions the composition of the copolymer was calculated using numerical integration of Mayo-Lewis equation until the specific conversion for each point was reached. In Tables 2-4, the obtained data is shown, where LC is low conversion, MC is medium conversion and HC is high conversion and numbers refer to 1 for r 1 = 0.05, r 2 = 0.5., 2 for pairs r 1 = 0.8, r 2 = 1.8 and 3 for r 1 = 0.4, t 2 = 0.8. Further, the analysis of the quality of the reactivity ratios obtained with the described methods was also carried out for the published experimental data with respect to the following conditions: at low conversion (1.0-9.5%) for the copolymerization of n-butyl methacrylate with n-butyl acrylate [23] in bulk at 80 • C initiated by BPO, at medium conversion (14.0-16.5%) for the copolymerization of 2-isopropenyl-2-oxazoline with methyl methacrylate [24] in acetonitrile at 70 • C initiated by AIBN, and at high conversion (10.9-55.0%) for the copolymerization of N-(4-carboxyphenyl) maleimide (NCPM) with N-vinyl-2pyrrolidone [25] (NVP) in dimethylformamide at 90 • C. The GD method was written in the Python 3.7 programming language.

Discussion
However, in Tables 5-13 it can be easily seen that in all cases, the integral method GD has the lowest bias compared to the initial conditions imposed, as well as the lowest values of the Fisher criterion. The e-KT method gives good results, with one exception at high conversion for values of r 1 = 0.05, r 2 = 0.5. The reason why by the e-KT method for the reactivity ratios r 1 = 0.05, r 2 = 0.5 at high conversions gives wrong values is due to the limitations of the method. This method can be used successful up to 40% conversions as well as to the errors of the logarithm function of around of 0 values. More details have been described by Tudos et al. [10]. The EVM method proposed by Chee and Ng [19] is usable, with good results in a few cases. The limitation of the EVM variant proposed by Chee and Ng [19] is due to the mathematical model used, because when one of the reactivity ratios is very close to zero and the conversion is high, the factor of logarithm can take negative values; for this reason, the results are erroneous. In Figures 1-9, the evolution of reactivity ratios values is depicted, with searching steps and error value (Equation (39)) evolution for last 60 searching steps for the GD method for data given in Tables 5-13                        As can be seen from Figures 1-9, the best values of reactivity ratios obtained with the GD method are strongly dependent on the initial value of the reactivity ratios used in the optimization process. Additionally, the values of the reactivity ratios obtained by descent gradient optimization are global minimums. Because the convergence of the err function (Equation (39)) is small, there are situations in which the GD method requires a high number of search steps to reach the proposed threshold value (10 −15 ), which leads to a longer search time. Reducing the threshold value imposed will lead to a decrease in the calculation time but can also reduce the accuracy of the result.   As can be seen from Figures 1-9, the best values of reactivity ratios obtained with the GD method are strongly dependent on the initial value of the reactivity ratios used in the optimization process. Additionally, the values of the reactivity ratios obtained by descent gradient optimization are global minimums. Because the convergence of the err function (Equation (39)) is small, there are situations in which the GD method requires a high number of search steps to reach the proposed threshold value (10 −15 ), which leads to a longer search time. Reducing the threshold value imposed will lead to a decrease in the calculation time but can also reduce the accuracy of the result.   As can be seen from Figures 1-9, the best values of reactivity ratios obtained with the GD method are strongly dependent on the initial value of the reactivity ratios used in the optimization process. Additionally, the values of the reactivity ratios obtained by descent gradient optimization are global minimums. Because the convergence of the err function (Equation (39)) is small, there are situations in which the GD method requires a high number of search steps to reach the proposed threshold value (10 −15 ), which leads to a longer search time. Reducing the threshold value imposed will lead to a decrease in the calculation time but can also reduce the accuracy of the result. As can be seen from Figures 1-9, the best values of reactivity ratios obtained with the GD method are strongly dependent on the initial value of the reactivity ratios used in the optimization process. Additionally, the values of the reactivity ratios obtained by descent gradient optimization are global minimums. Because the convergence of the err function (Equation (39)) is small, there are situations in which the GD method requires a high number of search steps to reach the proposed threshold value (10 −15 ), which leads to a longer search time. Reducing the threshold value imposed will lead to a decrease in the calculation time but can also reduce the accuracy of the result.
It is also well known that from a statistical point of view, all combinations of pairs of reactivity ratios that can be obtained from within the 95% confidence domain satisfy the experimental data for the chosen mathematical model. For this reason, the reliable domains for the calculation methods taken in the analysis for the three initial conditions are drawn in Figures 10-22 It is also well known that from a statistical point of view, all combinations of pairs of reactivity ratios that can be obtained from within the 95% confidence domain satisfy the experimental data for the chosen mathematical model. For this reason, the reliable domains for the calculation methods taken in the analysis for the three initial conditions are drawn in Figures 10-22, and published data are drawn in Figures 23-25.   It is also well known that from a statistical point of view, all combinations of pairs of reactivity ratios that can be obtained from within the 95% confidence domain satisfy the experimental data for the chosen mathematical model. For this reason, the reliable domains for the calculation methods taken in the analysis for the three initial conditions are drawn in Figures 10-22, and published data are drawn in Figures 23-25.          Figure 13.      Figure 14. Details of JCR for GD given in Figure 13.   Figure 15. Figure 16. Details of JCR for GD, EVM and e-KT given in Figure 15.
The JCR domains that do not appear in these figures are so large that they make the other JCRs no longer observable.
Although in all cases regardless of the initial conversions, the bias between the calculated value of the reactivity ratios by the analyzed methods and the target value are small, the JCR domains show their quality difference.
In the case of conversions below 10% and r 1 = 0.05, r 2 = 0.5 the r-FR method, through its reliable JCR, can also admit the possibility of aberrant solutions, i.e., negative reactivity ratios that are not allowed kinetically by their definition. The GD and e-KT methods have the lowest reliable and practical identical JCR in the case of experimental data with conversions of less than 10%.      Figure 17. Figure 18. Details of JCR for GD given in Figure 17.           In the case of the conversion range between 10-25%, a good separation of the quality of the results between the differential and the integral methods can be made. This is observable both by the value of criterion F c and by the size of the JCR domain shown in Figures 13-18. It should be noted that among the integral calculation methods for r 1 = 0.8 and r 2 = 1.8, although they have a close value of criterion F c and the confidence domains are small, GD is the only method that admits the target value within its JCR domain. Additionally, it should be noted that the reactivity ratios obtained by the methods presented are outside the JCR of the GD method. It is obvious that at high conversions over 30%, differential methods give erroneous values and large areas of JCR. In this case, there is a good separation of the quality of the results obtained between the GD and e-KT method. Surprisingly, the EVM method analyzed here, although an integral method, does not give acceptable results. Figure 24. The JCR domain for copolymerization of 2-isopropenyl-2-oxazoline with methyl methacrylate [24]. The JCR domains that do not appear in these figures are so large that they make the other JCRs no longer observable.
Although in all cases regardless of the initial conversions, the bias between the calculated value of the reactivity ratios by the analyzed methods and the target value are small, the JCR domains show their quality difference.
In the case of conversions below 10% and r1 = 0.05, r2 = 0.5 the r-FR method, through its reliable JCR, can also admit the possibility of aberrant solutions, i.e., negative reactivity ratios that are not allowed kinetically by their definition. The GD and e-KT methods have the lowest reliable and practical identical JCR in the case of experimental data with conversions of less than 10%.
In the case of the conversion range between 10-25%, a good separation of the quality of the results between the differential and the integral methods can be made. This is observable both by the value of criterion Fc and by the size of the JCR domain shown in Figure 25. The JCR domain for copolymerization of N-(4-carboxyphenyl) maleimide with N-vinyl-2pyrrolidone [25].
Comparing the values of criterion F c and the size of the 95% confidence regions, it can be said that there is a good correlation between them; therefore, criterion F c can be used as an indicator of the quality of reactivity ratios.

Conclusions
It was observed that there is a good correlation between the value of criterion F c and the size of the JCR domain. For this reason, criterion F c could be used as an indicator of the quality of reactivity ratios. Regardless of the value of the conversions, the integral methods gave better results than the differential methods. From these analyzed integral methods, the GD method presented here can be successfully used to obtain reactivity ratios for experimental data that obtained up to 50-55% conversions.