Modeling the Optical Properties of a Polyvinyl Alcohol-Based Composite Using a Particle Swarm Optimized Support Vector Regression Algorithm

We developed particle swarm optimization-based support vector regression (PSVR) and ordinary linear regression (OLR) models for estimating the refractive index (n) and energy gap (E) of polyvinyl alcohol composites. The n-PSVR model, which estimates the refractive index of a polyvinyl alcohol composite using the energy gap as its descriptor, outperformed the n-OLR model in terms of the root mean square error (RMSE) and mean absolute error (MAE). The E-PSVR model, which predicts the energy gap of a polyvinyl alcohol composite using the refractive index as its descriptor, outperformed the E-OLR model, which uses the same descriptor, on several performance metrics. The E-PSVR model was used to investigate the influence of sodium-based dysprosium oxide and benzoxazinone derivatives on the energy gap of a polyvinyl alcohol polymer composite, and the results agreed well with the measured values. After validation with external data, the models had low mean absolute percentage errors. The precision demonstrated by these predictive models will facilitate tailoring the optical properties of polyvinyl alcohol composites for desired applications while reducing costs and experimental difficulties.


Introduction
Polyvinyl alcohol is an atactic, semi-crystalline polymer that possesses excellent biodegradability, biocompatibility, useful mechanical properties, excellent optical properties, and non-toxicity; hence its wide range of applications [1–3]. Other excellent properties of polyvinyl alcohol include thermal stability, water solubility, excellent optical transmission, and non-corrosiveness [4]. These features, especially optical properties such as the refractive index and energy gap, promote its industrial and technological use as an optoelectronic material, a coating material, and a component of solar cells, supercapacitors, and several kinds of sensors [5,6]. Hydrogen bonding between polyvinyl alcohol and other materials is facilitated by the hydroxyl groups on the carbon backbone of polyvinyl alcohol, and these bonds aid composite formation [6,7]. Polyvinyl alcohol is of significant interest because it is abundantly available, relatively cheap, contains many volatile functional groups, and is hydrophilic. It has excellent charge storage capacity and high dielectric strength, and it forms uniform, high-optical-quality films for nonlinear optical instruments and optical sensors. Temperature dependence and inter- or intramolecular connectivity enhance the flexibility of polyvinyl alcohol chains [8]. These properties strengthen the polyvinyl alcohol matrix, making it a viable composite that can be used for electronic devices, bioengineering, and … the acquisition of the implemented dataset. Section four explains the results of the models and compares them with the outcomes of the ordinary linear regression models. Section five concludes the manuscript.

Mathematical Descriptions of the Algorithms
The formulation of the support vector regression algorithm is mathematically described in this section. The evolutionary principle governing particle swarm optimization is also presented.

Support Vector Regression
Support vector regression can relate the energy gap of doped polyvinyl alcohol to the corresponding refractive index by mapping the data from its original low-dimensional space into a higher-dimensional feature space [25,26]. Consider a dataset of M polyvinyl alcohol composite samples consisting of input energy gaps E_k ∈ X = ℝ^m and measured refractive indices n_k ∈ Y = ℝ, where k = 1, 2, …, M. The algorithm addresses the problem through the regression function presented in Equation (1) [27,28].
where γ and b are the weight vector and bias, respectively, with γ ∈ ℝ^m and b ∈ ℝ. The dot product between the input E and the weight vector γ is denoted γ•E. Restricting the precision of the model to a threshold defined by ε requires that the Euclidean norm shown in Equation (2) be minimized subject to the constraints of Equation (3) [29,30].
where the measured and estimated refractive indices are denoted by n_meas(E) and n(E), respectively. Positive slack variables (χ and χ*) penalize the prediction function in situations where the precision threshold ε cannot be satisfied. With these additions, the optimization problem becomes Equation (4), subject to the new constraints of Equation (5), where C is the penalty coefficient that influences the precision and accuracy of the model. C penalizes samples lying outside the ε-tube and thereby sets the trade-off between model complexity and training error. A small value of C penalizes deviations of the predicted refractive index from the measured values less heavily, so such deviations are permitted at a lower cost. The ε-insensitive loss function is defined in Equation (6).
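For reference, the textbook ε-SVR formulation that this paragraph describes can be written as follows, using the symbols defined in the text (γ, b, ε, C, χ_k, χ_k*). This is the standard form from the SVR literature and is offered as a sketch; the notation and layout of the original Equations (1)–(5) may differ.

```latex
n(E) = \gamma \cdot E + b

\min_{\gamma,\, b,\, \chi,\, \chi^*} \; \frac{1}{2}\lVert \gamma \rVert^2 + C \sum_{k=1}^{M} \left( \chi_k + \chi_k^* \right)

\text{subject to} \quad
\begin{cases}
n_{\text{meas}}(E_k) - \gamma \cdot E_k - b \le \varepsilon + \chi_k \\
\gamma \cdot E_k + b - n_{\text{meas}}(E_k) \le \varepsilon + \chi_k^* \\
\chi_k,\ \chi_k^* \ge 0, \qquad k = 1, \dots, M
\end{cases}
```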
Note that training samples whose refractive indices fall within the ±ε zone incur no loss. The convex optimization problem in Equation (4) is solved by introducing the Lagrange multipliers (λ, λ*, δ, δ*), as presented in Equation (7).
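The ε-insensitive loss is usually written as max(0, |n_meas − n| − ε). The minimal Python sketch below shows why samples inside the ±ε zone contribute nothing; the refractive index values in the usage block are hypothetical.

```python
def epsilon_insensitive_loss(n_meas: float, n_pred: float, epsilon: float) -> float:
    """Return 0 inside the +/- epsilon tube and the excess deviation outside it."""
    return max(0.0, abs(n_meas - n_pred) - epsilon)


if __name__ == "__main__":
    eps = 0.1
    # Residual of about 0.05: inside the tube, so no loss is incurred.
    print(epsilon_insensitive_loss(1.50, 1.55, eps))
    # Residual of 0.25: exceeds the tube by about 0.15.
    print(epsilon_insensitive_loss(1.50, 1.75, eps))
```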
After introducing the Lagrange multipliers and transforming the problem into its dual form, the final regression function is obtained as presented in Equation (8) [31].
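In the standard derivation, the multipliers δ and δ* attached to the slack constraints are eliminated, and the dual problem and resulting regression function take the textbook form below. This is a sketch of the usual ε-SVR dual, not necessarily the exact typesetting of the original Equations (7) and (8).

```latex
\max_{\lambda,\, \lambda^*} \; -\frac{1}{2} \sum_{k=1}^{M} \sum_{l=1}^{M} (\lambda_k - \lambda_k^*)(\lambda_l - \lambda_l^*)\,(E_k \cdot E_l)
 - \varepsilon \sum_{k=1}^{M} (\lambda_k + \lambda_k^*)
 + \sum_{k=1}^{M} n_{\text{meas}}(E_k)\,(\lambda_k - \lambda_k^*)

\text{subject to} \quad \sum_{k=1}^{M} (\lambda_k - \lambda_k^*) = 0, \qquad 0 \le \lambda_k,\ \lambda_k^* \le C

n(E) = \sum_{k=1}^{M} (\lambda_k - \lambda_k^*)\,(E_k \cdot E) + b
```

Only samples with λ_k − λ_k* ≠ 0 contribute to the sum, which is why the trained model can be stored as its support vectors alone.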
The support vectors acquired during the training phase are the training samples for which λ_k − λ*_k ≠ 0. These support vectors are the data points lying on or outside the boundary of the ε-tube, and they determine the orientation and position of the regression hyperplane. Replacing the dot product in the regression function with a kernel function η(E_k, E) enables nonlinear mapping, and the resulting regression function is presented in Equation (9) [32]. In this work, the Gaussian kernel function presented in Equation (10) performed better than the other kernel functions considered. It is a robust radial basis function kernel that is relatively insensitive to noise in the data.
where ω is the kernel parameter.
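Equation (9) with a Gaussian kernel can be evaluated as in the sketch below. It assumes a scalar descriptor (m = 1) and the kernel scaling exp(−(E_k − E)²/(2ω²)), which is one common convention and may differ from the exact form of Equation (10). The function names and the support vectors, coefficients, and bias in the usage block are hypothetical, not fitted values from this study.

```python
import math
from typing import Sequence


def gaussian_kernel(e_k: float, e: float, omega: float) -> float:
    """Gaussian (RBF) kernel; the 2 * omega**2 scaling is an assumed convention."""
    return math.exp(-((e_k - e) ** 2) / (2.0 * omega ** 2))


def svr_predict(
    e: float,
    support_vectors: Sequence[float],
    dual_coefs: Sequence[float],
    bias: float,
    omega: float,
) -> float:
    """Kernel SVR prediction: sum_k (lambda_k - lambda_k*) * eta(E_k, E) + b.

    `dual_coefs` holds the differences (lambda_k - lambda_k*) for the support
    vectors only; samples with a zero difference would contribute nothing.
    """
    return bias + sum(
        coef * gaussian_kernel(e_k, e, omega)
        for e_k, coef in zip(support_vectors, dual_coefs)
    )


if __name__ == "__main__":
    # Hypothetical support vectors (energy gaps in eV) and dual coefficients.
    svs = [3.2, 4.1, 5.0]
    coefs = [0.4, -0.1, 0.2]
    print(svr_predict(4.5, svs, coefs, bias=1.5, omega=0.5))
```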

Particle Swarm Optimization (PSO)
Particle swarm optimization is a metaheuristic optimization method inspired by fish schooling and bird flocking. The algorithm addresses optimization problems by considering a flock of birds that interact socially while searching for food sources [33,34]. Each bird searching for food is considered a particle, and the flock is the swarm. Velocity and position are the two characteristic features that direct the swarm towards the food sources, and both are initialized randomly at the start of the search. The best position that a bird has found so far is its individual best, since it reflects that bird's own experience. The best position found by any bird in the swarm is the global best [31,35]. Using the individual experience of each bird and the experiences of the other birds in the swarm, the positions and velocities (together with the individual and global best positions) are updated accordingly. The position and velocity of each particle (bird) are modeled as shown in Equations (11) and (12), respectively, where R_j is the position of the jth particle (N-dimensional), V_j is the velocity of the jth particle (N-dimensional), ψ is the inertial weight, a_c is the first acceleration constant, τ is a random number between 0 and 1, P_j is the individual best position, â_c is the second acceleration constant, τ̂ is another random number between 0 and 1 (which may or may not equal τ), and P̂_j is the global best position. The inertial weight controls the convergence behavior of the algorithm and decreases as the number of iterations increases [36]. The relation by which the inertial weight governs the convergence of the algorithm is presented in Equation (13).
where i_max is the maximum number of iterations, defined by the user at the start of the algorithm (two hundred in this work), and i is the current iteration number.
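The standard inertia-weight PSO update rules, written with the symbols defined above, are shown below together with the common linearly decreasing inertial weight. These are the usual textbook forms; the exact expressions of the original Equations (11)–(13), including the bounds ψ_max and ψ_min used here as assumptions, may differ.

```latex
V_j^{(i+1)} = \psi\, V_j^{(i)} + a_c\, \tau \left( P_j - R_j^{(i)} \right) + \hat{a}_c\, \hat{\tau} \left( \hat{P}_j - R_j^{(i)} \right)

R_j^{(i+1)} = R_j^{(i)} + V_j^{(i+1)}

\psi(i) = \psi_{\max} - \left( \psi_{\max} - \psi_{\min} \right) \frac{i}{i_{\max}}
```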

Hybrid Particle Swarm-Based Support Vector Regression Model Development
The computational part of this work is presented in this section. Data acquisition and a description of the dataset are also presented.

A Description of the Dataset and Its Acquisition
The refractive indices and energy gaps of polyvinyl alcohol composites used to develop the PSVR and OLR models were extracted from the literature [6,7,12,37–46]. The dataset comprises the energy gaps and refractive indices of sixty-three polyvinyl alcohol composite samples. Increasing the filler concentration within a polymer matrix influences the refractive index of the composite through crosslink formation in the matrix: the refractive index of a polymer changes with the crosslinking density because of the tightness and closeness between the chains [6]. Similarly, impurities (fillers) incorporated into the polymer matrix form trap levels within the band gap, which consequently affects the energy gap of the composite [12]. The correlation cross-plot between the refractive indices and energy gaps of the polyvinyl alcohol composites is presented in Figure 1. The figure shows no clear linear relationship between the refractive index and energy gap of the investigated polymer composites.
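The randomized 8:2 partitioning described in the computational methodology can be sketched as follows for a dataset of sixty-three samples. The function name, the fixed seed, and the rounding rule are illustrative assumptions, not details taken from the original MATLAB code; the integers stand in for the (energy gap, refractive index) pairs.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(
    samples: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples, then split them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Placeholder indices standing in for the 63 composite samples.
    train, test = shuffle_split(list(range(63)))
    print(len(train), len(test))  # 0.8 * 63 = 50.4 rounds to 50, leaving 13 for testing
```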

Computational Methodology of the Particle Swarm Optimized Support Vector Regression
Development of the PSVR and OLR models was conducted within the MATLAB computing environment. The hyperparameters influencing the precision, robustness, and accuracy of support vector regression were optimized using a particle swarm optimization algorithm, in which each bird (particle) in a flock (swarm) encodes the hyperparameters in a specified order. The dataset was shuffled before partitioning to ensure a uniform distribution of the data points, and the shuffled data were split into training and testing sets in an 8:2 ratio. The training set was used to acquire the support vectors, and the testing set was used to assess the effectiveness of each model. The step-by-step procedure of the algorithm hybridization is summarized as follows:
Step 1: Particle swarm parameter and search space initialization: PSO parameters such as the population size (N_P), maximum number of iterations (i_max), inertial weight (ψ), and acceleration constants (a_c and â_c) were initialized. The search space for the E-PSVR model was defined as [1000 1; 0.9 0.1; 0.9 0.01], where the rows give the upper and lower bounds of the penalty factor, epsilon, and kernel option, respectively. The search space for the n-PSVR model was defined as [1000 1; 0.9 0.1; 0.9 0.1] in the same order. The limits of the search spaces were selected after a preliminary random check of the most probable locations of the solutions.
Step 2: Random generation and initialization of particle positions and velocities: The position and velocity of each particle in the swarm were generated randomly within the search space. Each particle's position represents a candidate set of hyperparameter values.
Step 3: Fitness function evaluation: The fitness of each particle is evaluated by developing an SVR-based model through the following major steps. (i) A function (such as sigmoid, polynomial, or Gaussian) is selected as the kernel function. (ii) The selected kernel function, the hyperparameters encoded by a particle, and the training data are supplied to the SVR algorithm to train a model. (iii) The trained model is evaluated using the training root mean square error (TR-RMSE). (iv) The testing dataset is fed into the model defined by the support vectors acquired during training, for validation. (v) The tested model is also evaluated using the testing root mean square error (TS-RMSE). Each particle therefore has a corresponding TS-RMSE, which serves as its fitness value; the lower the TS-RMSE, the fitter the particle. The best position a particle has visited defines its individual best (P_j), and the position with the lowest TS-RMSE found by any particle, over all iterations, defines the global best (P̂_j).
Step 4: Individual best update: If the TS-RMSE at a particle's current position is lower than that at its individual best position P_j, set P_j to the current position. Otherwise, proceed to the next step.
Step 5: Global best update: If the TS-RMSE at the current position is lower than that at the global best position P̂_j, set P̂_j to the current position. Otherwise, proceed to the next step.
Step 6: Iteration continuation: If every particle in the swarm has been evaluated (that is, the particle index exceeds N_P), proceed to the next step. Otherwise, return to Step 3 for the next particle.
Step 7: Fitness evaluation using global best position: Using the global best position, evaluate the fitness function of the particles.
Step 8: Velocity and position update: Update the velocity and position of each particle using Equations (11) and (12).
Step 9: Stopping condition: The algorithm stops iterating once the maximum number of iterations has been reached. Otherwise, it returns to Step 2.
The computational flow of the developed PSVR-based models is presented in Figure 2. The complete code is available in the Supplementary Materials.
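The PSO loop of Steps 1–9 can be sketched in Python as follows. This is not the authors' MATLAB code: the inertial-weight bounds (0.9 to 0.4), the acceleration constants (1.5), the clipping of particles to the search space, and the function names are assumptions for illustration. In the actual method the fitness would be the TS-RMSE of an SVR trained with the candidate [penalty factor, epsilon, kernel option]; here a hypothetical toy function stands in for it.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Each entry is (lower, upper) for one hyperparameter.
Bounds = Sequence[Tuple[float, float]]


def pso_minimize(
    fitness: Callable[[List[float]], float],
    bounds: Bounds,
    n_particles: int = 50,
    i_max: int = 200,
    psi_max: float = 0.9,   # assumed initial inertial weight
    psi_min: float = 0.4,   # assumed final inertial weight
    a_c: float = 1.5,       # assumed first acceleration constant
    a_c_hat: float = 1.5,   # assumed second acceleration constant
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Inertia-weight PSO that minimizes `fitness` inside a box-shaped search space."""
    rng = random.Random(seed)
    # Step 2: random positions and small random velocities inside the search space.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    # Step 3: initial fitness; each individual best starts at the initial position.
    p_best = [p[:] for p in pos]
    p_best_fit = [fitness(p) for p in pos]
    j_best = min(range(n_particles), key=lambda j: p_best_fit[j])
    g_best, g_best_fit = p_best[j_best][:], p_best_fit[j_best]

    for i in range(i_max):
        # Linearly decreasing inertial weight (one common form of Equation (13)).
        psi = psi_max - (psi_max - psi_min) * i / i_max
        for j in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                tau, tau_hat = rng.random(), rng.random()
                vel[j][d] = (psi * vel[j][d]
                             + a_c * tau * (p_best[j][d] - pos[j][d])
                             + a_c_hat * tau_hat * (g_best[d] - pos[j][d]))
                # Step 8: move the particle, then clip it back into the search space.
                pos[j][d] = min(max(pos[j][d] + vel[j][d], lo), hi)
            fit = fitness(pos[j])
            if fit < p_best_fit[j]:          # Step 4: individual best update
                p_best[j], p_best_fit[j] = pos[j][:], fit
                if fit < g_best_fit:         # Step 5: global best update
                    g_best, g_best_fit = pos[j][:], fit
    return g_best, g_best_fit                # Step 9: stop after i_max iterations


def toy_fitness(params: List[float]) -> float:
    """Hypothetical stand-in for TS-RMSE with its minimum at C=100, eps=0.3, omega=0.5."""
    c, eps, omega = params
    return ((c - 100.0) / 1000.0) ** 2 + (eps - 0.3) ** 2 + (omega - 0.5) ** 2


if __name__ == "__main__":
    # n-PSVR search space [1000 1; 0.9 0.1; 0.9 0.1], rewritten as (lower, upper) pairs.
    n_psvr_bounds = [(1.0, 1000.0), (0.1, 0.9), (0.1, 0.9)]
    best, best_fit = pso_minimize(toy_fitness, n_psvr_bounds)
    print(best, best_fit)
```

Nesting the global-best check inside the individual-best check is safe because the global best is never worse than any individual best.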

Results and Discussion
The outcomes of the developed n-PSVR, n-OLR, E-PSVR, and E-OLR models are presented in this section. The dependencies of the developed models on the number of particles in the swarm are presented. Results of the investigation of the influences of fillers on the optical properties of polyvinyl alcohol composite are also presented.

Convergence and Sensitivity of the Developed PSVR-Based Models
The influence of the number of swarm particles on the exploration and exploitation capacities of the developed n-PSVR and E-PSVR models is presented in Figure 3. The figure also shows the sensitivity of each model to its hyperparameters for various numbers of swarm particles. A balance should be maintained between the exploration and exploitation capacities of the PSO algorithm. When only a few particles explore the search space, the exploration ability of the algorithm may be hindered; conversely, populating the search space with many particles to enhance exploration may weaken the exploitation strength of the algorithm. Figure 3a presents the convergence of the n-PSVR model over the iterations. Premature convergence was observed with ten particles, whereas a swarm of fifty particles reached global convergence. The algorithm became trapped in local solutions as the number of particles increased from fifty to one hundred. This can be attributed to deterioration of the exploitation capacity of the algorithm, since the search space was already well explored with fifty particles. Figure 3b shows the variation of the penalty factor with the number of particles in the swarm; the estimated refractive index deviated less when fifty particles explored the search space. The sensitivity of the n-PSVR model to the error threshold epsilon is presented in Figure 3c. Although convergence began at different points for different numbers of swarm particles, the algorithm converged to the optimum error threshold with fifty particles, which signifies the robustness and precision of the model. The error convergence of the E-PSVR model is presented in Figure 3d. Irrespective of the number of particles exploring the search space, the algorithm converged to the same global solution.
This shows the robustness of the E-PSVR model and the balance between its exploitation and exploration capacities. Figure 3e presents the sensitivity of the E-PSVR model to the penalty factor for different numbers of swarm particles; the algorithm showed similar global convergence after sixty iterations. Figure 3f shows the sensitivity of the E-PSVR model to the error threshold epsilon; the model converged well irrespective of the number of swarm particles. The details of the swarm particles that gave the optimum performance, as measured by the lowest root mean square error (RMSE), are presented in Table 1. Several kernel functions were investigated, and the reported Gaussian kernel function outperformed the polynomial and sigmoid functions.

Performance Evaluations of the Developed Models
The performance of each of the four developed models was evaluated using error metrics and correlation coefficients. The empirical linear equations for the n-OLR and E-OLR models are presented in Equations (14) and (15), respectively.
The empirical equations were generated using a set of training data and later validated with test data. Evaluations of the performances of n-PSVR and n-OLR models are presented in Figure 4.
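A single-descriptor ordinary linear regression of the kind behind the n-OLR and E-OLR models can be fitted in closed form, as in the sketch below; swapping descriptor and target gives the other model. The function names are illustrative, the numbers in the usage block are hypothetical, and the fitted coefficients are not those of Equations (14) and (15).

```python
from typing import List, Sequence, Tuple


def fit_olr(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def predict_olr(x: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Apply the fitted linear equation to new descriptor values (e.g. test data)."""
    return [slope * xi + intercept for xi in x]


if __name__ == "__main__":
    # Hypothetical training pairs (energy gap in eV, refractive index), not data from this study.
    e_train = [3.2, 3.8, 4.4, 5.0]
    n_train = [2.1, 1.9, 1.8, 1.6]
    slope, intercept = fit_olr(e_train, n_train)
    print(slope, intercept)
    print(predict_olr([4.0], slope, intercept))
```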
The n-PSVR model outperformed the n-OLR model in both the training and testing stages of model development according to the root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (CC). In the training phase, presented in Figure 4a, the performance improvement was 70.83% in terms of CC. In the testing phase, the performance improvements were 83.90%, 9.39%, and 7.12% in CC, RMSE, and MAE, respectively, as shown in Figure 4b–d. Training-phase performance was compared using CC only, since the future performance of a model is better judged by its testing performance.
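The metrics used in these comparisons can be computed as in the sketch below. The text does not state how the percentage improvements were obtained, so the two improvement functions assume the relative change against the OLR baseline: (CC_PSVR − CC_OLR)/|CC_OLR| × 100 for CC, and (error_OLR − error_PSVR)/error_OLR × 100 for RMSE and MAE. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def cc(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def improvement_higher_is_better(new: float, baseline: float) -> float:
    """Assumed % gain for a metric where larger is better (CC); abs() guards a negative baseline."""
    return (new - baseline) / abs(baseline) * 100.0


def improvement_lower_is_better(new: float, baseline: float) -> float:
    """Assumed % reduction for a metric where smaller is better (RMSE, MAE)."""
    return (baseline - new) / baseline * 100.0
```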
The E-PSVR model, which estimates the energy gaps of polyvinyl alcohol composites, performed better than the E-OLR model. The performance enhancement was 80.34% in terms of CC on the training data, as depicted in Figure 5a. Performance improvements of 108.46%, 37.28%, and 32.77% in CC, RMSE, and MAE, respectively, were obtained at the testing stage of model development, as presented in Figure 5b–d. All error metrics of the performance evaluation at each stage of model development are presented in Table 2.

The Doping Effect of Sodium-Based Dysprosium Oxide on the Energy Gap of Polyvinyl Alcohol Using E-PSVR
The effect of incorporating sodium-based dysprosium oxide on the energy gap of polyvinyl alcohol, as calculated with the E-PSVR model, is presented in Figure 6. The E-PSVR estimates match the measured values well [44].
The gap separating the conduction band from the valence band was reduced by the incorporation of the filler (sodium-based dysprosium oxide). This can be attributed to the induction of localized electronic states that facilitate lower-energy electronic transitions [44]. The disorder in the doped samples increased with filler concentration, owing to structural changes in the polymer caused by the incorporation of the dopant. The E-PSVR model captured this experimentally observed energy gap reduction well, except for the sample with a 2% concentration of sodium-based dysprosium oxide, which showed the maximum deviation of 1.5% between the measured (3.62 eV) and estimated (3.6744 eV) energy gaps.
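Assuming the measured value is taken as the reference, the quoted 1.5% deviation follows from the relative error:

```latex
\frac{\lvert 3.6744 - 3.62 \rvert}{3.62} \times 100\% = \frac{0.0544}{3.62} \times 100\% \approx 1.50\%
```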

The Importance of Benzoxazinone for the Energy Gap of Polyvinyl Alcohol Using the E-PSVR Model
The energy gap lowering effect of incorporating benzoxazinone on polyvinyl alcohol, as obtained by the E-PSVR model, is presented in Figure 7. The figure also presents a comparison between the obtained outcomes of the model and the measured values [6].
The observed energy gap reduction can be attributed to the formation of chemical and structural bonds. Each benzoxazinone molecule forms a bond with polyvinyl alcohol, which enhances the formation of trap levels between the lowest unoccupied molecular orbital (LUMO) and the highest occupied molecular orbital (HOMO). Lower-energy transitions therefore become feasible, reducing the optical energy gap [6].

Further Validation of the E-PSVR and n-PSVR Models Using External Data
To further assess the performance of the hybrid E-PSVR and n-PSVR models, both were validated externally. In the validation process, the models were supplied only with the model inputs and used the support vectors acquired during the training phase to make their estimates. The external data used for validation were not included in the training and testing sets used for model development. Validation of the n-PSVR model employed thirty-five polyvinyl alcohol composites extracted from different sources, and twenty-eight polyvinyl alcohol composites were used to validate the E-PSVR model. Table 3 presents the outcomes of the external validation, including the percentage error for each composite. The mean absolute percentage errors (MAPE) of the n-PSVR and E-PSVR models on the validation data were 7.92% and 7.57%, respectively. The standard deviation (SD) of the errors and the standard error of the mean (SEM) are also presented in the table.
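The summary statistics reported for the external validation can be computed as in the sketch below. It assumes that the SD is the sample standard deviation (n − 1 denominator) of the per-sample absolute percentage errors and that SEM = SD/√n; the text does not spell out either choice. The function name and the values in the usage block are hypothetical, not the Table 3 data.

```python
import math
from typing import Sequence, Tuple


def validation_summary(
    measured: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAPE in %, sample SD of the absolute percentage errors, SEM)."""
    ape = [abs(m - p) / abs(m) * 100.0 for m, p in zip(measured, predicted)]
    n = len(ape)
    mape = sum(ape) / n
    # Sample standard deviation (n - 1 denominator) of the per-sample errors.
    sd = math.sqrt(sum((e - mape) ** 2 for e in ape) / (n - 1))
    sem = sd / math.sqrt(n)  # standard error of the mean
    return mape, sd, sem


if __name__ == "__main__":
    # Hypothetical measured and predicted refractive indices.
    measured = [1.52, 1.61, 1.48, 1.70]
    predicted = [1.49, 1.66, 1.50, 1.63]
    print(validation_summary(measured, predicted))
```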

Conclusions
The optical properties of polyvinyl alcohol composites were modeled in this work using hybrid support vector regression and particle swarm optimization. The results of the hybrid PSVR-based models were compared with the estimates of ordinary linear regression (OLR) models using RMSE, MAE, and CC. The E-PSVR model outperformed the E-OLR model, with a performance enhancement of 80.34% in CC on the training data. Performance improvements of 108.46%, 37.28%, and 32.77% in CC, RMSE, and MAE, respectively, were obtained at the testing stage of model development.
The n-PSVR model also outperformed the n-OLR model on all three metrics. The E-PSVR model was used to investigate the effect of sodium-based dysprosium oxide and benzoxazinone on the energy gap of a polyvinyl alcohol composite, and the results agree well with the measured values. The n-PSVR and E-PSVR models were externally validated using thirty-five and twenty-eight polyvinyl alcohol composites, respectively, and the obtained optical properties agree well with the measured values. The outstanding performance demonstrated by these models should strengthen and aid the design of polyvinyl alcohol-based composites for specific industrial and technological applications.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/polym13162697/s1. The source code implementing the proposed hybrid PSVR models is included as supplementary material, together with the data used (training, testing, and external validation) to ease reproduction of the developed models.

Data Availability Statement:
The data supporting the results can be found in the literature cited in Section 3.1, and the additional external validation dataset is available in Table 3 (Section 4.5) of the manuscript.