PreOBP_ML: Machine Learning Algorithms for Prediction of Optical Biosensor Parameters

To develop standard optical biosensors, the simulation procedure takes a great deal of time. Machine learning can reduce that time and effort substantially. Effective indices, core power, total power, and effective area are the most crucial parameters for evaluating optical sensors. In this study, several machine learning (ML) approaches have been applied to predict those parameters, with the core radius, cladding radius, pitch, analyte, and wavelength as the input vectors. We utilized least squares (LS), LASSO, Elastic-Net (ENet), and Bayesian ridge regression (BRR) in a comparative study on a balanced dataset obtained with the COMSOL Multiphysics simulation tool. Furthermore, a more extensive analysis of sensitivity, power fraction, and confinement loss is demonstrated using the predicted and simulated data. The suggested models were examined in terms of the R²-score, mean absolute error (MAE), and mean squared error (MSE); all models achieved an R²-score above 0.99, and the design error rate of the optical biosensors was shown to be less than 3%. This research may pave the way for machine learning-based optimization approaches to improve optical biosensors.


Introduction
Knight [1] first proposed the notion of photonic crystal fiber (PCF), which consists of a solid or hollow core with a microstructure arrangement extending throughout the fiber's length. PCFs are becoming more and more widespread because of their design freedom, low cost, resilience, rapid detection, compact size, high sensitivity, and versatility. As a result of these characteristics, they are utilized in optical sensors [2][3][4], optical lasers [5,6], spectroscopy [7,8], Raman scattering [9], and a variety of other applications. The finite element method (FEM) [10,11], block-iterative frequency domain methods [12,13], and plane wave expansion methods [14,15] are all numerical approaches for improving the properties of PCFs. Such procedures, however, take a long time to complete. Furthermore, determining a PCF's properties typically needs a considerable quantity of data, and all other parameters must be recalculated when one parameter is altered. As a result, parameter updates as well as data collection take a long time to execute. Machine learning (ML) is now the most effective solution for these challenges, and it has begun to address them.
Machine learning is a technique for developing a standard function for a hypothesis, or for learning specific indicators that distinguish different inputs as separate samples. A well-balanced dataset may be used to effectively train machine learning algorithms and help them make highly accurate predictions. Over the last several years, machine learning techniques have been employed in a range of sectors, including medical research, transportation, agricultural science, plasmonics, biosensing, traffic categorization, network security, and many more. In the realm of PCF, ML has been utilized for optimization and prediction tasks. The objectives of this study are:

1. To investigate the attributes of PCF sensors using various ML algorithms.
2. To make changes that improve the models' accuracy and minimize the error rate.
3. To estimate the output faster than direct numerical simulation strategies.

Machine Learning and Optimization
To make predictions or judgments without needing to be explicitly programmed to do so, ML algorithms build a model based on training data [23]. In many fields, such as health, computer vision, speech recognition, and email filtering, where constructing conventional algorithms to perform the necessary tasks is challenging or even impractical, ML methods are used [24,25]. The minimization of a loss function on a training set of instances is a common formulation for learning problems, which connects machine learning to optimization. Loss functions describe the discrepancy between the predictions of the model and the actual occurrences of the problem. These models are trained to properly predict the labels that have been assigned to a group of samples [26]. The four approaches used are briefly described below.

Least Squares Regression (LSR) Method
The LSR approach has traditionally been used to minimize the sum of squares of error terms with a homogeneous variance and normal distribution and, therefore, to improve the model [27]. In terms of optimization, it solves problems of the form [28]:

min_β ||Y − Xβ||₂²

where X stands for the independent input variables, Y stands for the model output, and β stands for the model parameters. The LSR approach relies on a number of assumptions in order to produce trustworthy results. The first assumption is that the regression model can be represented in a linear form, which implies that the average error predicted by the regression model should be zero. The residual variance must be constant, and the residuals must not be correlated with one another (no autocorrelation) [29]. Further assumptions must hold while using the LSR technique, such as the absence of a linear relationship between the independent variables. The model's reliability is contingent on the LSR assumptions being met; when all of these conditions are satisfied, the LSR approach produces the best results. Multicollinearity is defined as a linear relationship between the independent variables. Multicollinearity inflates the covariance and variance of the regression coefficients, so that none or only a few of the explanatory variables are statistically significant even when the model's R² value is high. As a result, multicollinearity leads to model misinterpretation [30].
The LSR method has a significant benefit over other estimation methods, such as traverse adjustments, in that it is theoretically and statistically defensible and hence a fully rigorous procedure. The LSR principle states that, by minimizing the sum of the squares of the errors, it is possible to determine the most likely values of a system of unknown variables on which measurements have been made.
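As a minimal sketch (not from the paper's dataset), the least squares problem described above can be solved directly with NumPy; the synthetic features and coefficient values here are hypothetical:

```python
import numpy as np

# Hypothetical illustration: fit y = X @ beta by minimizing the sum of
# squared errors, i.e. min_beta ||y - X beta||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta  # noiseless for clarity

# np.linalg.lstsq solves the least squares problem in one call.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 3))
```

With noiseless data and uncorrelated features, the estimate recovers the true coefficients exactly, which is the best-case scenario the LSR assumptions describe.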

LASSO Method
For linear modeling, LASSO stands for Least Absolute Shrinkage and Selection Operator. In linear models, the LASSO regression method is a regularization technique for estimating unknown parameters [31]. It modifies model parameters based on a loss function while taking model complexity into account. When there are significant correlations between predictors, the LASSO will randomly select one and ignore the others, and when all predictor variables are comparable, it will collapse [32]. The LASSO penalty forces the majority of coefficients to zero, with only a small percentage likely to be nonzero. It usually consists of a linear model with an additional regularization parameter. The objective function [33] that must be minimized is:

min_β ||Y − Xβ||₂² + λ||β||₁

where λ ≥ 0 signifies the regularization constant and ||β||₁ denotes the ℓ₁-norm penalty on the coefficient vector. For a correctly selected λ, the ℓ₁ penalty allows the LASSO to regularize the least squares fit while simultaneously shrinking certain components of β to zero. The cyclical coordinate descent method [32] is faster than the LSR method [28] at evaluating the full LASSO solution path over λ. The LASSO is a common feature selection approach due to these factors. Nonetheless, the LASSO has three major flaws: it lacks the oracle property, is ill-suited to high-dimensional data, and cannot select more features than the sample size without saturating [34]. On the other hand, because the coefficients have been shrunk and deleted, it can deliver accurate predictions without a large increase in bias.
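The feature-selection behavior described above can be sketched with scikit-learn's `Lasso`; the synthetic data below is hypothetical, with only the first two of five features driving the response:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hedged sketch: only features 0 and 1 matter; the rest are irrelevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]

# alpha plays the role of the regularization constant lambda in the text;
# the l1 penalty shrinks the irrelevant coefficients toward (or to) zero.
model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))
```

The relevant coefficients are slightly shrunk from their true values (the bias the text mentions), while the irrelevant ones collapse to zero.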

Elastic-Net (ENet) Method
The ENet approach is a LASSO extension that can handle features with strong correlations [32]. The ENet was suggested to analyze high-dimensional data in order to avoid the instability of the LASSO when predictors are significantly correlated [34]. The ENet approach combines the ℓ₁ (LASSO) and ℓ₂ (ridge regression) penalties, and it can be represented mathematically as [28]:

min_β ||Y − Xβ||₂² + λρ||β||₁ + (λ(1 − ρ)/2)||β||₂²

where ρ is the ℓ₁ ratio (the regularization mixing constant), and ||β||₁ and ||β||₂² are the ℓ₁- and ℓ₂-norm penalties on the regularization coefficients. The ENet is effective when numerous properties are correlated. The ENet's ℓ₁ part automatically selects variables, whereas the ℓ₂ part allows for grouped selection and regulates solution paths to improve prediction. The ENet can discover and choose groups of correlated variables even when the groupings are unknown, by exerting a grouping effect throughout variable selection such that a group of highly correlated features receives coefficients of the same scale. When p ≫ n, where n is the sample size and p is the number of predictors, the ENet can choose more than n variables, unlike the LASSO. The ENet, on the other hand, lacks the oracle property.
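The grouping effect described above can be sketched with scikit-learn's `ElasticNet`; the two nearly collinear predictors below are synthetic stand-ins, not the paper's inputs:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Hedged sketch: two nearly collinear predictors plus one independent one.
# The l2 part of the penalty spreads weight across the correlated pair
# instead of arbitrarily picking one, as plain LASSO tends to do.
rng = np.random.default_rng(2)
z = rng.normal(size=200)
X = np.column_stack([z, z + 0.01 * rng.normal(size=200), rng.normal(size=200)])
y = 2.0 * z + X[:, 2]

# l1_ratio corresponds to rho: 1.0 is pure LASSO, 0.0 is pure ridge.
model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
print(np.round(model.coef_, 2))
```

The correlated pair ends up with coefficients of the same scale that jointly account for the true weight of 2, illustrating grouped selection.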

Bayesian Ridge Regression (BRR) Method
BRR is a subset of Bayesian regression that belongs to the ridge regression category, and it possesses the properties of both Bayesian and ridge regression [35]. Bayesian regression is a probabilistic method that uses the Gaussian distribution [36]; to optimize the posterior prediction, the ℓ₂ regularization is used. The only difference between plain Bayesian regression and BRR is that the weight coefficient W is drawn from a Gaussian distribution; this Gaussian prior considerably reduces the magnitude of all effects in BRR. A spherical Gaussian provides the prior over the coefficients W:

p(W | λ) = N(W | 0, λ⁻¹ I_p)

where I_p represents the identity matrix. The priors over α and λ are gamma distributions, which are the conjugate priors for the Gaussian precision. By default, the values of λ₁, λ₂, α₁, and α₂ are 10⁻⁶. The resulting model, called BRR, is comparable to the classical ridge. Although BRR takes a long time to run, it excels at adapting to small-data settings, and it is simple to apply in regularization problems and to change hyper-parameters. Furthermore, BRR supports feature selection in order to limit the number of inputs and then rank them according to their predictive value to the estimation method. In practice, BRR produces results similar to the maximum a posteriori (MAP) technique.
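A hedged sketch of the above using scikit-learn's `BayesianRidge`, whose `alpha_1`, `alpha_2`, `lambda_1`, and `lambda_2` arguments are the gamma hyperprior parameters defaulting to 10⁻⁶ as the text notes; the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Hedged sketch with made-up data: y = 1.5*x0 - 0.5*x1 plus small noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=80)

model = BayesianRidge(alpha_1=1e-6, alpha_2=1e-6,
                      lambda_1=1e-6, lambda_2=1e-6).fit(X, y)

# Unlike plain ridge, the posterior also yields a predictive standard
# deviation, not just a point estimate.
mean, std = model.predict(X[:1], return_std=True)
print(np.round(model.coef_, 2), float(std[0]) > 0.0)
```

The fitted coefficients are close to the generating weights, and the predictive standard deviation reflects the learned noise and weight precisions.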

Optical Sensor
Due to their great precision and sensitivity, optical sensors are becoming increasingly popular. According to their design and measurement procedures, the most common optical sensors can be grouped into three categories: (1) spectroscopy-based sensors, (2) SPR-based sensors, and (3) negative curvature-based sensors. Fiber optical sensors have been developed over the last two decades in medicine, the air force, and industry for various applications such as environmental monitoring, salinity monitoring, temperature monitoring, magnetic fluid monitoring, agricultural monitoring, industrial monitoring, food quality monitoring, health monitoring, and military operations. Optical biosensors, among the many uses of optical sensors, usher in a new age in the field of biological monitoring and detection [3,37].
Optical biosensors are analytical smart devices that are used to monitor a variety of biological components in conjunction with the appearance of chemicals or analytes. Nowadays, PCF-based biosensors are utilized to detect physiological components such as RBCs, hemoglobin, plasma cells, WBCs, cancer cells, tuberculosis, pregnancy, urine, medications, viruses, SARS-CoV-2, and so on. The large variety of real-time applications of optical biosensors, as well as their design freedom, fast detection time, compact size, simplicity, resilience, and low cost, have piqued researchers' interest. As a result, in recent years, a number of researchers have looked at using plasmonic materials other than gold and silver to improve the quality of optical biosensors. ML techniques are also used to speed up the detection process and improve accuracy. This quick biochemical diagnosis will help to reduce the danger of contamination and exposure [38][39][40][41]. Figure 1 shows the cross-sectional illustration of an optical biosensor with specifications. Core radius, cladding radius, pitch, analyte, and wavelength are the five optical design factors for the proposed biosensor. The radii of the core and cladding in this study ranged from 2.8 µm to 3.00 µm, with a pitch of between 7.00 µm and 8.00 µm. Furthermore, over the operating wavelength range of 1.4 µm to 2.0 µm, the examined analyte range was 1.33 to 1.35. Some of the most important optical guiding properties are described as follows.

[Table: symbol specifications with values for the core radius, cladding radius, pitch, analyte, and wavelength]

Effective Refractive Index (ERI)
ERI is a number that compares the phase delay per unit length in a waveguide to the phase delay in a vacuum. ERI is symbolized as n_eff, with RIU as the unit. ERI can be expressed mathematically as:

n_eff = A + iB

where A stands for the real part of the ERI and B for its imaginary part.

Optical Power Profiles (OPP)
In the energy field, optical sensors are used to monitor installations that generate, supply, distribute, and transform electrical power. Core power, cladding power, and total power are the three categories of optical sensor energies. An optical sensor's performance is ensured by its high core power.

Optical Power Fraction (OPF)
The OPF is a unitless quantity that specifies how much light is confined within the optical sensor. Core and cladding optical power fractions are the two forms of optical power fraction. A high core power fraction (CPF) indicates the highest sensitivity and the lowest cladding power fraction (CLPF). Mathematically, the total, core, and cladding powers are computed as follows [3,37-39]:

P = (1/2) ∫∫ Re(E_x H_y* − E_y H_x*) dx dy

evaluated over the whole cross-section for the total power, over the core region for the core power, and over the cladding region for the cladding power, where E_x, E_y, H_x, and H_y denote the transverse electric and magnetic field components. Mathematically, the core and cladding power fractions are computed as follows [3,37-39]:

CPF (%) = (Core Power / Total Power) × 100

CLPF (%) = (Cladding Power / Total Power) × 100 (10)
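The power-fraction computation described above can be sketched numerically; this is a hedged illustration in which a made-up Gaussian profile stands in for the COMSOL-computed axial power density:

```python
import numpy as np

# Hedged sketch: the axial power density 0.5*Re(Ex*Hy - Ey*Hx) is replaced
# by a hypothetical Gaussian mode profile on a uniform grid.
x = y = np.linspace(-10e-6, 10e-6, 401)
X, Y = np.meshgrid(x, y)
Sz = np.exp(-(X**2 + Y**2) / (3e-6) ** 2)

core = X**2 + Y**2 <= (2.9e-6) ** 2   # core radius ~2.9 um, as in the study
dA = (x[1] - x[0]) * (y[1] - y[0])

total_power = Sz.sum() * dA           # integral over the whole cross-section
core_power = Sz[core].sum() * dA      # integral restricted to the core region
cpf = 100.0 * core_power / total_power
clpf = 100.0 - cpf                    # core and cladding fractions sum to 100%
print(round(cpf, 1))
```

Restricting the same integral to different regions (core vs. cladding) is all that distinguishes the three power quantities.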

Optical Effective Area (OEA)
The OEA measures the area that a waveguide or fiber mode effectively covers in the transverse dimensions. Fiber and other waveguide modes have smooth transverse profiles, making it difficult to define a mode area, especially for sophisticated mode shapes where a 1/e² intensity criterion, as used for Gaussian beams, is not appropriate. OEA is symbolized as A_eff, with m² as the unit. The OEA can be defined as follows [3,37-39]:

A_eff = ( ∫∫ |E|² dx dy )² / ∫∫ |E|⁴ dx dy

where E is the amplitude of the electric field.
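As a hedged numerical check of the effective-area definition, a Gaussian field E(r) = exp(−r²/w²) has the known analytic answer A_eff = πw²; the mode-field radius below is hypothetical:

```python
import numpy as np

# Sketch of A_eff = (iint |E|^2 dA)^2 / iint |E|^4 dA for a Gaussian field,
# whose analytic effective area is pi * w^2.
w = 3e-6                              # hypothetical mode-field radius (m)
x = y = np.linspace(-20e-6, 20e-6, 801)
X, Y = np.meshgrid(x, y)
E = np.exp(-(X**2 + Y**2) / w**2)

dA = (x[1] - x[0]) * (y[1] - y[0])
a_eff = (np.sum(np.abs(E) ** 2) * dA) ** 2 / (np.sum(np.abs(E) ** 4) * dA)
print(a_eff / (np.pi * w**2))         # close to 1 for a Gaussian mode
```

The numerical integral reproducing the analytic value is a useful sanity check before applying the same formula to simulated mode fields.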

Optical Loss Profiles (OLP)
To fulfill the boundary condition, a circular perfectly matched layer (PML) is utilized, which suppresses reflections at the border. The confinement or leakage loss can be calculated from the imaginary component of the ERI. OLP is symbolized as L_c, with dB/cm as the unit. The confinement or leakage loss is calculated as follows [3,37-39]:

L_c = 8.686 × k₀ × Im(n_eff)

where k₀ = 2π/λ is the wave number in free space. L_c is the result of light scattering from the core into the outer materials, and it is calculated using the imaginary part of n_eff at the specific wavelength λ.
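A hedged sketch of the commonly used leakage-loss relation L_c = 8.686 · k₀ · Im(n_eff) (in dB per meter, divided by 100 for dB/cm); the wavelength and Im(n_eff) values below are illustrative, not from the paper's dataset:

```python
import math

# Leakage loss from the imaginary part of the effective index.
def confinement_loss_db_per_cm(wavelength_m: float, im_n_eff: float) -> float:
    k0 = 2.0 * math.pi / wavelength_m      # free-space wave number (1/m)
    return 8.686 * k0 * im_n_eff / 100.0   # dB/m converted to dB/cm

# Illustrative numbers only:
loss = confinement_loss_db_per_cm(1.55e-6, 1e-9)
print(loss)
```

Because Im(n_eff) is typically many orders of magnitude smaller than the real part, even tiny changes in it move the loss noticeably, which is why the imaginary-part predictions in the results section carry the largest relative errors.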

Optical Sensing Profile (OSP)
The relative sensitivity coefficient measures the interaction between light and the analyte, which can be calculated as follows [3,37-39]:

r = (n_r / Re(n_eff)) × f

where n_r denotes the refractive index of the detected liquid within the core, n_eff denotes the modal effective index, and f is the core power fraction.
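The relative-sensitivity relation, in which the analyte index, the effective index, and the core power fraction combine, can be sketched as below; the numeric inputs are hypothetical and chosen only to sit inside the paper's stated ranges:

```python
# Hedged sketch of r = (n_r / Re(n_eff)) * f, with f the core power
# fraction in percent. Numbers are illustrative, not simulated values.
def relative_sensitivity(n_r: float, n_eff_real: float, f_pct: float) -> float:
    return (n_r / n_eff_real) * f_pct

r = relative_sensitivity(1.33, 1.34, 60.0)   # analyte RI 1.33, made-up mode
print(round(r, 2))
```

Because n_r and Re(n_eff) are close in magnitude, the sensitivity is dominated by the core power fraction, consistent with the emphasis on a high CPF earlier in the text.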

Limitations and Proposed Solutions
Any change in the optical sensor design parameters changes the sensor performance, and optimizing the design through repeated simulation is time-consuming. ML parameter-estimation models that are faster and similarly accurate can be used to solve this optical sensor design optimization challenge.

Methodology
For the proposed research, quantitative data is necessary. The data was gathered through simulation. After some preparation, such as handling noisy data and measuring the correlation between variables using Pearson's correlation testing method, a valid dataset was formed. In order to determine the best one, regression techniques are applied. Figure 2 illustrates the full procedure of the research.
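The Pearson-correlation check mentioned above can be sketched with NumPy's `np.corrcoef`; the two variables here are synthetic stand-ins for the dataset's columns, not the real simulation outputs:

```python
import numpy as np

# Hedged sketch: Pearson's r between a candidate input and output column.
wavelength = np.linspace(1.4, 2.0, 13)      # operating range from the text
n_eff = 1.44 - 0.02 * wavelength            # made-up, exactly linear relation
r = np.corrcoef(wavelength, n_eff)[0, 1]    # Pearson's correlation coefficient
print(round(r, 4))
```

A perfectly linear, decreasing relationship yields r = −1; real simulation data would give intermediate values that guide which inputs to keep.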


Design and Dataset Collection
Data collection is a methodical process for gathering and analyzing particular information in order to answer pertinent questions and evaluate results. It focuses on discovering all there is to know about a certain subject; data is obtained to test hypotheses and to comprehend a phenomenon. A rudimentary PCF model was created here, which is shown in Figure 1.

Dataset Distribution
A distribution of the data is a function or list that shows all potential values (or intervals) of the data and, importantly, how often each occurs. Figures make it simple to evaluate both the values and the frequencies at which they occur; a project's data is typically organized from the smallest to the largest. In this research, the dataset was split into training and testing portions of 70% and 30%, respectively.
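The 70%/30% split described above can be sketched with scikit-learn's `train_test_split`; the 50-row feature matrix here is synthetic:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hedged sketch of the study's 70/30 split on made-up data.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible, which matters when comparing several regression methods on the same partitions.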

Training, Testing and Evaluation
The dataset was fed into the ML algorithms for training, and the models were evaluated using new tuples. In this research, 690 of the 1402 data points were utilized to train the models, and 13 data points were used to visualize the model performance.

Result Analysis and Discussion
The trained models have been validated in this section. Unknown input parameters have been used to test the model. With wavelength changes ranging from 1.4 µm to 2.0 µm, the suggested model has been trained to predict effective indices, core power, and total power. After that, sensitivity, confinement loss, core power percentage, and effective mode area are calculated. This section includes various graphical visualizations and comparison tables in terms of the R 2 -score, MSE, and MAE.
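The three evaluation metrics used throughout this section can be computed with `sklearn.metrics`; the five simulated-vs-predicted values below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hedged sketch: scoring a predicted ERI curve against simulated values.
y_sim = np.array([1.440, 1.438, 1.436, 1.433, 1.430])
y_pred = np.array([1.4401, 1.4379, 1.4362, 1.4331, 1.4299])

r2 = r2_score(y_sim, y_pred)              # closer to 1 is better
mse = mean_squared_error(y_sim, y_pred)   # penalizes large errors quadratically
mae = mean_absolute_error(y_sim, y_pred)  # average absolute deviation
print(round(r2, 4), mse, mae)
```

Note that when the target varies over a narrow range, as the ERI does, MSE values of the order of 10⁻⁸ can still coexist with an R²-score just below 1, as in the tables that follow.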

ERI for X-Axis and Y-Axis
The predicted and simulated outcome values are represented by a straight line and different symbols in Figure 3a,c for the real values of the ERI along the X-axis and Y-axis, respectively. Methods are represented by dots of different colors, and the solid line represents the simulated values. The ERI values drop as the wavelength rises. The predicted values in Figure 3a,c are substantially equal to the simulated values. The greatest R² value for both the X-axis and the Y-axis is 0.9994, obtained with the LS and BRR methods. However, compared to the other applied techniques, the LS method has the lowest mean squared error (MSE) of 3.9729 × 10⁻⁸ and 3.9921 × 10⁻⁸ for the X-axis and Y-axis, respectively. Furthermore, all applied approaches share an MAE of 0.0001. Table 1 shows the similarities in greater detail.

All of the algorithms predict nearly indistinguishable results in the imaginary part of the ERI. At first, the predicted values are lower than the simulated values, but after 1.5 µm, the predicted values are higher than the simulated values in Figure 4a,c. Compared to all other strategies, the LS method has the highest R² of 0.9436 and 0.9435, the lowest MSE of 5.7626 × 10⁻⁹ and 5.7656 × 10⁻⁹, and the lowest MAE of 6.0969 × 10⁻⁵ and 6.1013 × 10⁻⁵ for the X-axis and Y-axis, respectively. Table 2 shows the detailed comparisons and also exhibits that the ENet method has the weakest performance among the applied strategies. Even so, the ENet approach had an absolute error of only 0.00015 at a wavelength of 1.85 µm, which is notable. As a result, we may state that practically all the applied approaches have a 1-5 percent error rate. Furthermore, the proposed sensor produces similar results along both axes, which is why the fundamental mode is chosen for further investigation.
Figure 5 depicts the effective mode area (EMA) as a function of wavelength over the working range of 1.4 to 2.0 µm, showing no noticeable difference between the estimated model results and the simulated outcomes. Among the four applied models, it is also evident that the LS approach performs the best and ENet the poorest. The absolute error for the ENet approach is 3.1 × 10⁻¹³ at a wavelength of 1.8 µm, as seen in Figure 5, demonstrating that the predicted and simulated results are nearly identical. Furthermore, the MSEs of the LS and LASSO methods are of the order of 10⁻²⁵, whereas those of the ENet and BRR methods are of the order of 10⁻²⁴, and the MAEs of all applied methods are of the order of 10⁻¹³.

Total Power and Core Power
Core power and total power fluctuate due to the light intensity in the core and structural changes in the fiber. The total power values of the PCF model grow as the wavelength increases in Figure 6a. As the wavelength gets longer, the total power values for the LS and BRR (maximum a posteriori) approaches get closer to the simulated ones, while the LASSO and ENet approaches show the reverse trend. For lower wavelength values, the predicted values are virtually identical. Figure 6b shows that the ENet and LASSO methods produce lower values than the simulated ones, whereas the LS and BRR methods produce higher values for wavelengths between 1.4 µm and 1.8 µm, beyond which the values approach the simulated ones.

Core Power Fraction (CPF)
A power fraction is needed to calculate the sensitivity. It refers to the ratio of the core power to the total power. The suggested PCF model has a lower power fraction at longer wavelengths. Figure 7a shows that all of the approaches produce lower values than the simulated ones. Furthermore, the LS, LASSO, and BRR techniques predict closer outcomes at lower wavelengths, with absolute errors of the order of 10⁻⁵ at the lower operating wavelength (1.45 µm), while ENet displays 10⁻². At the upper operating wavelength (1.90 µm), the LS, LASSO, and BRR approaches show an absolute error of the order of 10⁻³, while the ENet method displays 10⁻².

Optical Sensitivity Profile (OSP)
The power fraction, analyte, and ERI all influence the PCF sensitivity. The predicted values of the various ML models, as well as the simulated values, were used. The suggested PCF model has a higher sensitivity at shorter wavelengths. As demonstrated in Figure 9a, all of the predicted values closely match the simulated values. Almost all approaches assess the sensitivity profile of the proposed optical sensor well, with absolute errors of the order of 10⁻⁸. Table 3 demonstrates that the LS method outperforms the other strategies, with an R² of 0.9994, an MSE of 3.9 × 10⁻⁸, and an MAE of 1.5 × 10⁻⁴. The ENet approach, on the other hand, has the lowest performance, with an R² of 0.9990, an MSE of 6.2 × 10⁻⁸, and an MAE of 1.9 × 10⁻⁴.

OSP Evaluation for Different Volume of Datasets
The performance of the different methods for the proposed sensor with various numbers of input datasets is shown in Figure 10. It is evident from the illustration that the performance of the applied algorithms improves as the amount of training data grows. Compared to Dataset-1, Datasets 2 and 3 include 1.5 and 2 times as much data. The changes across the approaches are, however, relatively minor; the various performance indicator values of the employed methods are shown in Table 4. Finally, we can state that the applied ML algorithms exhibit remarkably consistent performance across a wide range of training dataset sizes.

OSP Evaluation for Different Volume of Outliers
The sensitivity performance of several approaches for various volumes of outliers is depicted in Figure 11. It is obvious from the preceding study that the applied approaches produce stable results for the input data when outliers are absent. To test the performance of the applied methods, 5% and 10% outliers are introduced to the input data, and the methods are trained repeatedly. In comparison to Dataset-3, Datasets 4 and 5 include 5% and 10% outliers, respectively. The provided approaches then demonstrated stable performance for the proposed sensor with outlier data as well. Table 5 depicts the changes in technique performance by displaying several performance indicator values. The performance of different applied procedures, however, varies slightly. Even so, the effectiveness of the LS method is consistent across diverse groups of outliers.
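The outlier-robustness test described above can be sketched as follows; the single-feature dataset and outlier magnitudes are hypothetical, and only the least squares method is shown:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hedged sketch: corrupt 10% of a synthetic target with large outliers and
# compare the least squares R2-score before and after.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 0.05 * rng.normal(size=200)

idx = rng.choice(200, size=20, replace=False)   # 10% of the rows
y_out = y.copy()
y_out[idx] += rng.normal(scale=5.0, size=20)    # injected outliers

r2_clean = LinearRegression().fit(X, y).score(X, y)
r2_noisy = LinearRegression().fit(X, y_out).score(X, y_out)
print(round(r2_clean, 3), round(r2_noisy, 3))
```

The squared-error objective gives outliers a disproportionate influence, which is consistent with the observed drop in the LS R²-score when outliers are merged into the training data.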

Overall Performance Evaluation
Overall, the LS method clearly outperforms the other three techniques when it comes to predicting outcomes. The LS offers the best results because the training data was properly distributed with no outliers. We therefore also tested the performance of the ML techniques by merging in outliers amounting to 5% to 10% of the total data, and observed that the R²-score of LS decreased from 99.94 percent to 96.52 percent, while the other techniques did not degrade as significantly. After training, the model predicts the PCF parameters in a few milliseconds, which is approximately 99 percent faster than standard numerical simulation approaches. Finally, for optimizing an optical biosensor design, the proposed LS technique is the most effective when high-quality training data is available.

Conclusions
We proposed four alternative machine learning algorithms in this study to predict optical sensor design-dependent characteristics as well as optical sensor attributes. By adjusting the core and cladding radii (2.8 µm to 3 µm), pitch (7 µm to 8 µm), analytes (1.33 to 1.35), and wavelengths (1.4 µm to 2.00 µm), the mode effective indices, effective area, core power, and total power confinement of the suggested optical biosensors are obtained. A total of 690 data points are utilized for training the proposed ML models, with another 13 data points used to visualize the trained model performance. Almost all the machine learning algorithms effectively predict the optical design-dependent parameters and optical sensor qualities (about 99.94%), closely matching the simulated values. The best-performing LS technique achieves the highest R²-score of 0.9994, the lowest MSE of 3.9 × 10⁻⁸, and the lowest MAE of 1.5 × 10⁻⁴ for sensitivity profile prediction. On the other hand, the worst-performing Elastic-Net approach has an R²-score of 0.9990, an MSE of 6.2 × 10⁻⁸, and an MAE of 1.9 × 10⁻⁴. As a result, it is evident that the least squares method outperforms the other three techniques in terms of prediction outcomes. We also tested the performance of the ML approaches by combining outliers ranging from 5% to 10% of the total data, and found that the R²-score of LS fell from 99.94 percent to 96.52 percent. Finally, the proposed LS approach is efficient for optimizing an optical biosensor design, assuming that the model parameters are consistent and that the training data conform to the true distribution without outliers.