Determination of Reactivity Ratios from Binary Copolymerization Using the k-Nearest Neighbor Non-Parametric Regression

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets, at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in a good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error in variable method. The experimental data sets used in this comparative analysis are copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrolidone for low conversion, copolymerization of isoprene with glycidyl methacrylate for medium conversion and copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. Also, the possibility to estimate experimental errors from a single experimental data set formed by n experimental data is shown.


Introduction
Technological development brings with it the need to create new polymers with predefined physico-chemical properties. It is well known that the physico-chemical properties of polymers are given by their microstructure, and the microstructure is determined by the reaction kinetics. By the nature of the monomers used in the copolymerization reaction and by a controlled kinetics, specific microstructures can be obtained such as: polymers with amorphous or crystalline areas, polymers with large molecular masses, branching polymers, crosslinked polymers or more other microstructure types. All these microstructure types have great influence on the mechanical and chemical behavior of the resulting polymers. The possibilities to obtain any kind of mechanical or chemical properties of copolymers are practically unlimited, but there exists only one limitation to our imagination. The mechanism of binary copolymerization in which it is considered that only the last structural unit attached to the polymer chain influences the growth mode of the polymer is described by the following kinetic relations [1]: Polymers 2021, 13 where P n -growing polymer chain, M 1 * , M 2 * -the active center on monomers, k 11 , k 12 , k 21 , k 22 -propagation rate constants.
The transformation of the above kinetic equations into a mathematical model that connects the kinetic evolution, and the microstructure of the formed copolymer is obtained using the mathematical equations of a first order kinetics, described by the following equations: where −dM 1  It is obvious that both the reaction mechanism and the kinetic Equations (1) and (2) are not entirely correct because they do not consider the initiation reaction, the termination reaction, and the transfer reaction of the active center. However, to be able to generate a mathematical model in which parameters that cannot be measured do not appear, it is mandatory to impose the stationary state condition described by relation (3): Considering the above, several authors [2][3][4][5] have proposed various mathematical solutions that describe the connection between the microstructure of the copolymer and the kinetics of the reaction. Thus, Alfrey Jr. and Goldfinger [2] propose the following relation (4) Mayo and Lewis [3] propose relation (5): and Wall [4] and Skeist [5] propose the following form: where, r 1 = k 12 k 11 and r 2 = k 21 k 22 (7) r 1 , r 2 -reactivity ratios of monomers. After all, it is easy to see that the Equations (4)- (6) are nested equations, and the most common form is that described by Equation (5). This mathematical model is a differential one that makes the connection between the reaction kinetics and the instantaneous composition of the copolymer and can be used for experimental data which have conversion below 10%.
For experimental data with conversions greater than 10% it is necessary to use the integral form of the differential equation

X-conversion.
As can be seen, Equations (8) and (9) also are nested equations. Into a r 1 , r 2 coordinate system, the Equations (3) and (9) proposed by Mayo and Lewis [3] describe a line for each experimental point of an experimental data set. Taking account of this Equation (5) can be rewritten as: and Equation (9) has the following form: where By intersecting two lines thus obtained, are obtained the reactivity ratios as a solution that satisfy the parameters of the two experimental points considered. If we have n experimental points, we obtain m solutions of the experimental data set. The number of solutions m of an experimental data set is obtained with the relation: Unfortunately, Mayo and Lewis [3] in their paper do not offer a solution for finding the best solution of reactivity ratios for the situation where n > 2. Since the publication of the intersection method [3] a few authors [7][8][9][10] have proposed various solutions to find the best value of the reactivity ratios from (2, m) matrix of solutions. An interesting solution for finding the best pair of values r 1 , r 2 from the matrix of solutions obtained by the intersection method is proposed by Abdollahi et al. [10] (ANA). These authors consider that the optimal values of reactivity ratios r 1 0 , r 2 0 are that which has the smallest distance from all calculated lines using the Equation (5) for all experimental points. To determine the optimal values r 1 0 , r 2 0 the authors rewrite Equation (5) in the following form: The sum of the squares of the distance from the optimal point r 1 0 , r 2 0 at each line is calculated with the relation: where i-denote the number of the experimental point from experimental data set.
To calculate the minimum distance from the optimal point r 1 0 , r 2 0 to each line, the partial derivatives to r 1 0 and r 2 0 of the function f (r 1 0 , r 2 0 ) respectively, are both of them set to zero. The partial derivatives equations are described by: Solving the Equations (17) and (18) can obtain the optimal values of reactivity ratios r 1 0 , r 2 0 . The algorithm described above is a k nearest neighbour (k-NN) regression algorithm where k = n, where the differential copolymerization equation is used. The algorithm called k nearest neighbour [11] (k-NN) is a non-parametric regression algorithm that is permitted to obtain an optimal point based on calculation of the Euclidian distance between k points located in neighbourhood, where k is an integer chosen value between 2 and total number of points of data set.
An approach in determining the reactivity ratios is either the linearization of the Mayo-Lewis differential Equation (5) or the linearization of the integral Equation (9). The method proposed by Fineman and Ross [12] is chronologically the first method that uses the linearization of the Mayo-Lewis differential Equation (5). The mathematical equations that describe the method proposed by Fineman and Ross are: where, Equation (19) is known as the Fineman-Ross method (FR) and Equation (20) as the reverse Fineman-Ross method (r-FR).
The disadvantage of the uneven distribution of points along the line passing between the calculated points, which is observed in the Fineman-Ross method, was removed by Kelen-Tudos [13,14] (KT) by using a correction factor α which is calculated with the relation: where Considering this aspect presented above, the Mayo-Lewis Equation (5) is rewritten in the form: G where In the coordinate system G/(α + F), F/(α + F) the points calculated by means of the Equation (25) have a uniform and collinear distribution.
The KT linear method has been extended to be used to determine reactivity ratios for experimental data obtained at high conversions [15] (e-KT). In this case Equation (25) is rewritten as follows: z(y − 1) where where the 0 index refer to the initial concentration of monomer i, α has the same mathematical form as presented above, P n weight percent conversion, µ-molecular weight of monomers. For all the linear methods presented above we can write a generalized equation of the following form: where ζ-dependent variable, η-independent variable, a-slope, b-intercept. The line parameters for the methods presented above are centralized in Table 1.
Determination of the slope (a) and the intercept (b) (31) for a line can be obtained using the ordinary least squares methods (OLS) described by following relations: Using OLS to obtain the best slope and intercept values, the parameters ζ and η must respect the Gauss-Markov assumptions, which are: Therefore, obtaining reactivity ratios by linearizing the Mayo-Lewis Equation (5) is limiting because it is difficult to fully respect Gauss-Markov's assumptions.
Considering the above, Tidwell and Mortimer [16] approached the solution of the Mayo-Lewis Equation (5) through a nonlinear view. Tidwell and Mortimer (TM) derived the Mayo-Lewis equation written in the form proposed by Wall [4] and Skeist [5] (6) obtaining the following relation: where: i is the number of the experimental run, j is number of the estimation set and r 0 1 , r 0 2 are the expectation values of r j 1 and r j 2 respectively. By making the difference (d) between the measured value of the composition of the copolymer (m j 2 i ) and the calculated composition of the copolymer (G j ), the following equation is obtained: then estimates,β 1 ,β 2 of the smallest squares of β 1 and β 2 provide the necessary corrections so that the new values of r j 1 and r j 2 given by: The method proposed by Tidwell and Mortimer uses the Gauss-Newton optimization algorithm by minimizing ∑(d i ) 2 for the search for the best pair of reactivity ratios.
It is well known that any experimental measurement contains errors, and for this reason a number of authors [17][18][19][20][21][22][23][24][25][26] have used the principle of minimizing these errors to obtain the true value of composition of the feed and the copolymer, and finally to obtain the best values of reactivity ratios.
This concept, called error in variable method (EVM), was originally developed by German [17] considering the error in only one variable. Later van der Meer et al. [18] extended the concept to analysis the errors in both variables, after which various approaches appeared in the calculation methodology [19][20][21][22][23][24][25][26]. For the comparative analysis of the methods for calculating the reactivity ratios, the EVM variant proposed by Chee and Ng [26] was chosen, because it uses the integral equation proposed by Mayo and Lewis (12) and does not require to know the experimental error.
The variant of EVM proposed by Chee and Ng (EVM-CN) minimizes the objective function given by the relationship: where r e 2 -the value of r 2 estimated with Equation (12), P n -weight percent conversion, σstandard deviation of M 10 , m 1 -molar fraction of monomer 1 in copolymer. Although the methods for calculating reactivity ratios using the EVM technique are integral methods, they do not include conversion measurement errors in their analysis.
The new integral method proposed below is an adaptation of the non-parametric k-NN regression algorithm to the calculation of reactivity ratios from terminal model of binary copolymerization.

Materials and Methods
In the work of Mayo and Lewis [3] the following expression draws attention, "The experimental error, measured by the size of the area bounded by the three lines, is halved by a change of only 0.10% in the carbon analysis (0.5% in the styrene content) of the copolymer". In the coordination system r 1 , r 2 through the intersection of three lines results a triangle whose vertices are described by the coordinates of the points P i (r 1 , r 2 ), P j (r 1 , r 2 ) and P q (r 1 , r 2 ). The determination of the values of the coordinates of the points P i (r 1 , r 2 ), P j (r 1 , r 2 ) and P q (r 1 , r 2 ) is undertaken by solving the following system of equations: where: i, j, q-indices referring to the number of the experimental point from data set. By solving the system of Equation (49) for "n" experimental points a number of "m" of triangles can be generated, according with the relation (51): The calculation of the experimental errors starting from the statement of Mayo and Lewis [3] is undertaken by solving the following system of Equation (52): where S i -the size of area of the triangle, i = 1 . . . m; ε i j , ε i q , ε i s -the errors of the experiments that leads to the formation of the triangle i.
The surface of the formed triangle, where are knowing the values of its peaks P i (r 1 , r 2 ), P j (r 1 , r 2 ) and P q (r 1 , r 2 ) is calculated with the following relation (53): The solutions of the system of Equations (52) are obtained by solving the matrix Equation (54): Based on these observations presented above, it was considered that the determination of reactivity ratios could be achieved by an error regression analysis using the k-NN algorithm where k = 3. The method of calculating the reactivity ratios using the k-NN regression algorithm has the following steps: 1.
Calculate all possible sets of P 3 t (r 1 , r 2 ) points that can be generated from the experimental data set.

2.
For each set of points, P 3 t (r 1 , r 2 ) will calculate the weight center, P cen (r j 1 ,r j 2 ), using the relations:r wherer j 1 ,r j 2 -the coordinates of the weight center of a set of points P 3 t (r 1 , r 2 ), r 1 i,j , r 2 i,j -the coordinates of the vertices of the triangle in the data set P 3 t (r 1 , r 2 ), i-index of the vertices point i = 1,2,3 and j-number of set of points P 3 t (r 1 , where n-number of experimental data sets. 3. For each P cen (r j 1 ,r j 2 ) point, calculate the composition of the substrate using the integration method [24,25] until the experimental conversion of each point from the experimental data set is touched. 4.
Using the experimental data of copolymer composition and calculated copolymer composition withr j 1 ,r j 2 , calculate the value of the objective function, the Fischer criterion (F c ) [38], using the relation (57) where F c cen j is the value of the Fisher criterion for the reactivity ratios from the center of each triangle, n is the number of monomers used in copolymerization and p is the number of the experimental data set. Thus, m i j(e) is the molar fraction of monomer "i" from copolymer for "j" experimental data set, m i j(c) is the molar fraction of monomer "i" calculated based on a mathematical model for the experiment "j". 5.
The P cen (r values. These selected points will generate a new set of points P 3 t (r 1 , r 2 ). This step is intended to eliminate the reactivity ratios which have great errors and to reduce computation time. 6.
The error of the optimization process is evaluated with the following relation: where F c s 1 -the best value of Fischer criterion at step s of the optimization process, F c s−1 1 -the best value of Fischer criterion at step s − 1 of the optimization process. If the error (err) is not less than 1 × 10 −4 , then with the last generated set of points P 3 t (r 1 , r 2 ) return to step 2, else the search process will be stopped.
The reactivity ratios which have the lowest value of the Fischer criterion from the last search step will become the final solution of the optimization process.
In order to verify the quality of the new method compared to the methods presented above, an analysis plan was drawn up on simulated data in which the chosen reactivity ratios must meet the conditions: Table 2 shows the data of a comparison of the quality analysis plan for a new method with the most used methods in reactivity ratios determination, presented above. The reactivity ratios were chosen randomly in such a way as to meet the conditions imposed above. The feed composition and the conversions were obtained by a normalized randomly software. The copolymer composition was obtained by numerical integration until the specific conversion of each point was reached. Moreover, the methods presented above were also verified on real experimental data for copolymerization of: The simulated input data, which were used in the comparative qualitative analysis of the methods for calculating the reactivity ratios presented above are shown in Tables 3-11. The estimated errors shown in the tables below are obtained by solving Equation (39) for given data.
The software used to determine the reactivity ratios with the methods described above was coded in Python 3.

Results
The reactivity ratios obtained in this analysis, as well as the Fisher criterion values, using the input from Tables 3-11, are presented in Tables 12-20. In these tables, the reactivity ratios obtained by the methods used in this analysis are ascending, ordered according to the value of the Fisher criterion (F c ), and the bias represents the value of the difference from the calculated value of the reactivity ratios and the imposed target value. To highlight the way in which the integral method k-NN looks for the best point, Figures 1-9 present the points P cen (r1, r2) obtained for each search step, where the best point represent the final solution of k-NN method.                    For a complete analysis of the quality of the k-NN method and the other methods used in this comparative analysis, the 95% confidence domains (JCR) were plotted for all nine imposed conditions. Relation (59) was used to trace these JCRs: where, Equation (59) was defined by Mathew and Duever as the "exact shape" of JCR [42]. In Figures 10-18, the JCRs that do not appear in the graph are so large that they would make the small ones no longer visible. In the following figures, the target value represent the chosen reactivity ratios for each simulated experiment. For a complete analysis of the quality of the k-NN method and the other methods used in this comparative analysis, the 95% confidence domains (JCR) were plotted for all nine imposed conditions. Relation (59) was used to trace these JCRs: where, Equation (59) was defined by Mathew and Duever as the "exact shape" of JCR [42]. In Figures 10-18, the JCRs that do not appear in the graph are so large that they would make the small ones no longer visible. In the following figures, the target value represent the chosen reactivity ratios for each simulated experiment. where, Equation (59) was defined by Mathew and Duever as the "exact shape" of JCR [42]. In Figures 10-18, the JCRs that do not appear in the graph are so large that they would make the small ones no longer visible. In the following figures, the target value represent the chosen reactivity ratios for each simulated experiment.                     The k-NN method for determining the reactivity ratios proposed in this paper as well as the other methods used in this comparative analysis were also tested on real experimental data. The results obtained are presented in Tables 21-23     The k-NN method for determining the reactivity ratios proposed in this paper as well as the other methods used in this comparative analysis were also tested on real experimental data. The results obtained are presented in Tables 21-23    The k-NN method for determining the reactivity ratios proposed in this paper as well as the other methods used in this comparative analysis were also tested on real experimental data. The results obtained are presented in Tables 21-23 and Figures 19-24.   Figure 18. The JCR of analyzed methods for reactivity ratios calculation for HC3 imposed condition.
The k-NN method for determining the reactivity ratios proposed in this paper as well as the other methods used in this comparative analysis were also tested on real experimental data. The results obtained are presented in Tables 21-23 and Figures 19-24.

Discussion
The visualization of the search steps of the k-NN method shows us that the elimination of the pairs of irrelevant reactivity ratios using as criterion of elimination the value of F c not only increases the calculation speed but also improves the quality of the result. The improvement in the quality of the result is given by the fact that the numerator of the function F c is in fact a residual variation due to errors (61).
The results from Tables 13-21 show that the integral k-NN method is in good agree-

Discussion
The visualization of the search steps of the k-NN method shows us that the elimination of the pairs of irrelevant reactivity ratios using as criterion of elimination the value of F c not only increases the calculation speed but also improves the quality of the result. The improvement in the quality of the result is given by the fact that the numerator of the function F c is in fact a residual variation due to errors (61).
The results from Tables 13-21 show that the integral k-NN method is in good agreement with the other integral methods for determining reactivity ratios and obviously better than the differential methods used in this comparative analysis.
On the other hand, if we corroborate the data from Tables 13-21 with the JCRs presented in Figures 10-18, it is observed that: - The e-KT method has the lowest values of F c in the case of the conditions imposed by LC1, LC2, LC3, MC1 and HC2 but at the same time the target value imposed for LC1, MC1 and MC3 is outside the JCR determined for this method. Taking into account that JCR represents the set of reactivity ratios that are solutions of the method with 95% confidence and the target value is not part of these solutions, the e-KT method cannot be considered the best method in the situations presented above. - The EVM-CN method is the best method for the MC3 and HC3 conditions. - The reactivity ratios obtained by the EVM-CN method for the imposed conditions LC2, and LC3 are outside the JCR of the best method for these cases. -Under the conditions imposed by HC1, the e-KT and EVM-CN methods did not give good results because the calculation method uses logarithms whose argument takes negative values for large conversions and appropriate reactivity ratios of 0.
The true value of the k-NN method is demonstrated by the results obtained on real experimental data which proves that it is a solid method and can be used successfully at any conversion of less than 55%.

Conclusions
The integral method for determining the reactivity ratios based on the k-NN regression algorithm proposed in this paper is a simple method based on the intersection method. The k-NN method provides results comparable to any other integral method. The k-NN method is stable for any combination of reactivity ratios and can be used successfully up to 55% conversions. The notable disadvantage of this method is that it requires a minimum of six experimental points to be effective. Also, in the search process, a way to estimate the experimental errors using a single data set was determined. We believe that future works could establish models with the three conversion parts.