1. Introduction
Different data analysis techniques, including machine learning and deep learning techniques, can predict output values from a set of features. Their application is not exclusive to any one discipline, and a wide number of models can be found in almost all areas of knowledge. In optical sensing, they have been used to improve several detection capabilities [1,2,3,4]. For example, Maryamsadat et al. reported on non-invasive glucose monitoring [1], in which five different prediction models, based on classification and regression methods such as decision trees and artificial neural networks, were applied. In that work, the features used in the mathematical models were the transmission intensities at four wavelengths, and the estimated variable was the glucose concentration. In addition, Karapanagotis et al. proposed applying linear regression to estimate humidity and temperature from the output data of an optical sensor. That algorithm was trained by using Brillouin frequency shifts and the line widths of the fiber’s multipeak Brillouin spectrum as features and allowed the authors to minimize cross-sensitivity effects.
Specifically, machine learning and deep learning techniques have been used to simultaneously estimate two variables with high precision within a wide measurement range by analyzing interferometric optical sensor signals [5,6,7], which is difficult to achieve with conventional methods, such as a sensitivity matrix [8,9]. With this last method, the measurement ranges of the output variables are limited by the cross-sensitivity between the independent variables (measurands). Furthermore, some machine learning methods have also been used to enlarge the measurement range of one output variable [6,10,11,12], which is typically limited by the 2π ambiguity of interferometric optical sensors; this ambiguity is usually related to the free spectral range (FSR) of the interferometer. In another interesting example, a multiple regression model was implemented for the simultaneous measurement of the refractive index and temperature and to widen the measurement range by breaking the free spectral range limit [7]. In that work, different link functions were tested, and the considered features were obtained from the spectral patterns of the interferometric arrangement. Another example is the work by Zizheng Yue [10], in which a standard long short-term memory network was used to establish the relationship between the spectral intensity distribution information, sampled as the output power data of an arrayed waveguide grating, and the target measurand (displacement). According to the authors, the comparison between the real data and the estimated data reached a coefficient of determination of 0.99 in a wide measurement range.
In this work, it is shown that applying Kernel Ridge Regression (KRR) makes it possible to extend the measurement range of a multilayer interferometric sensing system. This method is based on a kernel function whose inputs are two feature vectors extracted from the reflection spectrum of the interferometric system. Four kernel functions (Gaussian, exponential, Bessel, and inverse multi-quadratic) were used to estimate the values of the response variable (temperature) over a large measurement range. It is also shown that, from a reduced experimental dataset, a larger synthetic dataset could be built to train and validate the model. The synthetic dataset was divided into training and evaluation datasets, while the experimental dataset contained the original measured information. The performance of the model was evaluated with the root-mean-square error (RMSE) obtained for the three datasets, and the optimal parameters of the models were determined by considering these three RMSE values. Finally, it is shown that, by implementing the algorithm with a Gaussian kernel, the temperature could be estimated with an RMSE of 0.094 °C in the experimental dataset for a measurement range that covered eight FSR periods. This is important because, with traditional methods, the measurement range is usually limited to one FSR period.
2. Experimental Setup and Interferometric System Model
The physical model of the interferometric system used to study the viability of KRR to estimate the output variable is shown in Figure 1a. This system was based on an arrangement of three stacked layers (L1, L2, and L3) at the tip of a single-mode fiber (SMF) and an external expander (L4). Here, it is important to mention that the light was not in contact with layer L4. The fabrication of the interferometric system and its mathematical model have been explained in detail in [13]. In addition, with the setup presented in Figure 1b, a set of experimental reflected spectra was obtained at different temperatures. Here, the light from the broadband source was transmitted to the fiber-coupled interferometric system through an optical circulator model 6015-3 (Thorlabs Inc., Newton, NJ, USA). The output spectrum of the interferometric system was monitored by an optical spectrum analyzer (OSA) (Yokogawa Test & Measurement Corporation, Musashino, Japan). Finally, a thermoelectric cooler (TEC) model HLD001 (Thorlabs Inc., Newton, NJ, USA) was used to control the temperature.
The relative reflected intensity of the spectra can be described by a mathematical model that considers the main reflected rays between layers [13]. The spectrum generated by one layer is a pattern of periodic fringes with an FSR that is inversely proportional to the thickness ($d$) and refractive index ($n$) of the layer. For a multilayer filter, the resulting fringe pattern is formed by the superposition of the patterns generated by each one of the layers. In this sense, if these patterns have different FSRs, the overall spectrum will be a pattern of fringes with modulated amplitudes. For our filter, the values of the thicknesses were $d_1$ = 321.9 nm, $d_2$ = 31.499 μm, $d_3$ = 495.38 μm, and $d_4$ = 4000 μm, and the values of the refractive indexes were $n_1$ = 1.44, $n_2$ = 1.2, $n_3$ = 1.45, and
, where
A =
− 0.028 [
14], and
$\lambda$ is the wavelength. Here, $n_1$, $n_2$, and $n_3$ were considered constant within the wavelength range of 1500–1650 nm. A couple of experimentally recorded spectra of the filter are shown in Figure 1c. The narrowest separation between the fringes ($FSR_3$) corresponds to layer L3, and the separation between the peaks of the envelope ($FSR_2$) corresponds to layer L2 [6].
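The inverse dependence of the FSR on thickness and refractive index can be made concrete with the standard normal-incidence approximation FSR ≈ λ²/(2nd) for a single reflective layer. The sketch below assumes that this approximation applies to each layer, that the listed thicknesses and indices are ordered L1 to L4, and a hypothetical center wavelength of 1550 nm; the function name `fsr_nm` is illustrative only.

```python
def fsr_nm(wavelength_nm: float, n: float, thickness_um: float) -> float:
    """Approximate FSR (in nm) of one reflective layer: lambda^2 / (2 * n * d)."""
    thickness_nm = thickness_um * 1e3  # convert micrometers to nanometers
    return wavelength_nm ** 2 / (2.0 * n * thickness_nm)


# Assumed center wavelength (inside the 1500-1650 nm window used in the text).
CENTER_NM = 1550.0

# Layer values as listed in the text, assuming the order L1..L4.
fsr_l2 = fsr_nm(CENTER_NM, n=1.2, thickness_um=31.499)   # envelope period
fsr_l3 = fsr_nm(CENTER_NM, n=1.45, thickness_um=495.38)  # narrow fringes

print(f"L2 (envelope) FSR: {fsr_l2:.2f} nm")
print(f"L3 (fringes)  FSR: {fsr_l3:.3f} nm")
```

Under these assumptions, L3 gives closely spaced fringes (under 2 nm) and L2 a much wider envelope period (about 32 nm), in line with the assignment of the narrow fringes to L3 and the envelope to L2.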
As the materials of layers L2, L3, and L4 had thermal properties, the interference spectrum was shifted when the temperature was varied. The changes in the spectra were mainly governed by the values of the thermo-expansion (
) and thermo-optic (
) coefficients and the thicknesses of each layer. In this sense, the thickness and the refractive index as a function of temperature (
) of each layer were
and
where
and
are the thickness and the refractive index at the reference temperature (
$T_0$). An example of these changes is the behavior of the maximum amplitude (MA) and the wavelength position (WP) of the peaks of the interference spectrum as a function of temperature, shown in Figure 1c. In this figure, it is observed that the fringe amplitudes of the blue spectrum with $FSR_3$ (layer L3) were modulated by the fringes of the spectrum with $FSR_2$ (layer L2). When the temperature of the measurement system was changed, the spectrum was shifted, causing a shift in the wavelength and a change in the maximum amplitude of the fringe peaks of the spectrum with $FSR_3$. The behavior of the spectral response of this interferometer has been explained in detail in a previous work [
13]. Here, it is important to mention that the slope associated with
was different for temperatures greater than 30 °C. Therefore, the change in the thickness
was not constant as the temperature increased [
15].
An example of the experimental behavior of the MA and the WP (red circles) of some peaks of the interference spectrum as a function of the temperature of the sensing interferometric system is shown in Figure 2. For this system, a quasi-linear relationship could be established between temperature and the MA of one fringe, but only over a measurement range shorter than one FSR, which was approximately 6 °C. For example, for P4, a linear relationship between the MA and the temperature could be defined for a measurement range from 9.3 to 12.9 °C (Figure 2d). Similarly, the linear relationship between temperature and the WP was limited to a measurement range of less than one FSR. Here, it is shown that the KRR machine learning method can estimate the response variable by considering a set of nonlinear explanatory variables. Firstly, an experimental dataset was formed with features extracted from all recorded reflection spectra. These features were the changes in the wavelength and maximum amplitude of some interference fringe peaks. Secondly, all features of the experimental dataset were interpolated to generate a synthetic dataset. By visual inspection, the synthetic dataset was considered reliable, since it fit all the experimental data points very well, as can be seen in Figure 2, where the blue lines represent the interpolated data.
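As a rough illustration of how a denser synthetic dataset can be interpolated from a small experimental one, the sketch below applies linear interpolation with `numpy.interp` to two stand-in periodic features. The interpolation scheme, the stand-in feature shapes, the 6 °C period, and all variable names are assumptions made for illustration; the text does not state which interpolation method was used.

```python
import numpy as np

# Stand-in "experimental" grid: 97 temperatures between 4.5 and 50 degC.
t_exp = np.linspace(4.5, 50.0, 97)
period = 6.0  # assumed temperature period of the features, in degC

# Two hypothetical features (rows): an amplitude-like and a wavelength-like one.
features_exp = np.vstack([
    np.cos(2.0 * np.pi * t_exp / period),
    1550.0 + 0.5 * np.sin(2.0 * np.pi * t_exp / period),
])

# Denser temperature grid for the synthetic dataset (1000 cases).
t_syn = np.linspace(t_exp[0], t_exp[-1], 1000)

# Interpolate every feature row onto the dense grid.
features_syn = np.vstack([np.interp(t_syn, t_exp, row) for row in features_exp])

# Largest deviation of the interpolated amplitude feature from its true curve.
max_dev = np.max(np.abs(features_syn[0] - np.cos(2.0 * np.pi * t_syn / period)))
print(features_syn.shape, f"max deviation: {max_dev:.3f}")
```

A smoother scheme, such as a spline, could be swapped in when the features have few samples per period; the choice affects how closely the synthetic curves follow the measured points.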
3. Mathematical Model of KRR
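In KRR, a set of features, $\mathbf{X}$, and the outcome values associated with these features ($\mathbf{y}$) are used to estimate the value of a response variable $\hat{y}_n$ with the following expression:

$$\hat{y}_n = \mathbf{y}^{T}\left(\mathbf{K} + \gamma \mathbf{I}\right)^{-1}\mathbf{k}_n, \qquad (1)$$

where $\gamma$ is a real regularization parameter, $\mathbf{I}$ is an $N \times N$ identity matrix, $\mathbf{x}_n$ is a column vector that contains the features of the n-th case, $M$ is the number of features, $\mathbf{X}$ is an $M \times N$ matrix, where $N$ is the total number of cases, and $\mathbf{y}$ is a column vector. The matrix kernel, $\mathbf{K}$, is an $N \times N$ matrix, and it is expressed as follows:

$$\mathbf{K} = \begin{bmatrix} \kappa(\mathbf{x}_1, \mathbf{x}_1) & \cdots & \kappa(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ \kappa(\mathbf{x}_N, \mathbf{x}_1) & \cdots & \kappa(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}. \qquad (2)$$

Here, $\mathbf{k}_n$ is a column vector, which is described by the following:

$$\mathbf{k}_n = \left[\kappa(\mathbf{x}_1, \mathbf{x}_n), \ldots, \kappa(\mathbf{x}_N, \mathbf{x}_n)\right]^{T}, \qquad (3)$$

where $\kappa(\cdot,\cdot)$ is the kernel function. The four kernel functions that were applied for the data analysis of the sensing interferometric system are listed in Table 1. Moreover, for the evaluation of the goodness of estimation, the real (experimental) and estimated output values were compared with the root-mean-square error:

$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^{2}}. \qquad (4)$$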
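The KRR estimate and the RMSE described above can be sketched in a few lines of NumPy. The example assumes a Gaussian kernel of the form exp(−‖x − x′‖²/(2σ²)), which is one common definition and may differ from the exact form in Table 1; the toy features (two periodic signals with different periods), the parameter values, and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between the columns of A (M x Na) and B (M x Nb)."""
    sq_dist = (
        np.sum(A**2, axis=0)[:, None]
        + np.sum(B**2, axis=0)[None, :]
        - 2.0 * A.T @ B
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma, gamma):
    """Solve (K + gamma*I) c = y, so that y_hat = k^T c = y^T (K + gamma*I)^-1 k."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + gamma * np.eye(K.shape[0]), y)


def krr_predict(X_train, coef, X_new, sigma):
    """Estimate the response for every column (case) of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def toy_features(t):
    """Hypothetical periodic features (periods 3 and 7) of a scalar variable t."""
    return np.vstack([
        np.cos(2 * np.pi * t / 3), np.sin(2 * np.pi * t / 3),
        np.cos(2 * np.pi * t / 7), np.sin(2 * np.pi * t / 7),
    ])


t_train = np.linspace(0.0, 10.0, 200)
t_test = 0.5 * (t_train[:-1] + t_train[1:])  # midpoints, unseen in training
X_train, X_test = toy_features(t_train), toy_features(t_test)

coef = krr_fit(X_train, t_train, sigma=0.3, gamma=1e-6)
rmse_train = rmse(t_train, krr_predict(X_train, coef, X_train, sigma=0.3))
rmse_test = rmse(t_test, krr_predict(X_train, coef, X_test, sigma=0.3))
print(f"train RMSE: {rmse_train:.4f}, test RMSE: {rmse_test:.4f}")
```

The coefficient vector is obtained with a linear solve rather than an explicit matrix inverse, which is numerically safer when the kernel matrix is close to singular. Each toy feature is periodic, yet the pair of periods identifies t over a range longer than either period.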
Implemented Algorithm
The KRR model is based on a regularization parameter $\gamma$ and a kernel function that has a parameter $\sigma$. To estimate the values of the response variable with high accuracy, an algorithm was implemented to find the optimal values of these parameters. In this algorithm, a heuristic method was used to explore all combinations formed with the proposed value sets (Table 1) to obtain the optimal $\gamma$ and $\sigma$ parameters. The steps of this algorithm are as follows:
1. The data are divided into three sets: the training data, evaluation data, and experimental data.
2. The values of the features for all the cases of the training data, the evaluation data, and the experimental data are placed as the entries of the matrices $\mathbf{X}_{TD}$, $\mathbf{X}_{ED}$, and $\mathbf{X}_{XD}$, respectively. Their corresponding associated output values are the entries of the vectors $\mathbf{y}_{TD}$, $\mathbf{y}_{ED}$, and $\mathbf{y}_{XD}$, respectively.
3. A set of $\sigma$ values is proposed.
4. A value of $\sigma$ is chosen, and the matrix kernel $\mathbf{K}$ is evaluated by using Equation (2).
5. A set of $\gamma$ values is defined.
6. For one of the cases of the matrix $\mathbf{X}_{TD}$, its vector with features, $\mathbf{x}_n$, and the $\sigma$ value are used to obtain the vector $\mathbf{k}_n$ (Equation (3)). With the selected $\gamma$ value and the vector $\mathbf{k}_n$, the value of the response variable $\hat{y}_n$ is estimated (Equation (1)). This step is repeated for all the cases.
7. The $RMSE_{TD}$ is calculated (Equation (4)) with the values of $\mathbf{y}_{TD}$ and $\hat{\mathbf{y}}_{TD}$, where $\hat{\mathbf{y}}_{TD}$ contains as entries the values of $\hat{y}_n$. This step is repeated for all the values of $\gamma$.
8. The optimal value of $\gamma$ is considered as the one for which the minimum $RMSE_{TD}$ is obtained.
9. For all the cases of the matrix $\mathbf{X}_{ED}$, the response variable is estimated by using its features vector ($\mathbf{x}_n$), the $\sigma$ and the optimal $\gamma$ values, the matrix $\mathbf{X}_{TD}$, the vector $\mathbf{y}_{TD}$, and the vector $\mathbf{k}_n$.
10. The $RMSE_{ED}$ is calculated with the values of $\mathbf{y}_{ED}$ and $\hat{\mathbf{y}}_{ED}$.
11. For all the cases of the matrix $\mathbf{X}_{XD}$, the response variable is estimated by using its features vector ($\mathbf{x}_n$), the $\sigma$ and the optimal $\gamma$ values, the matrix $\mathbf{X}_{TD}$, the vector $\mathbf{y}_{TD}$, and the vector $\mathbf{k}_n$.
12. The $RMSE_{XD}$ is calculated with the values of $\mathbf{y}_{XD}$ and $\hat{\mathbf{y}}_{XD}$.
13. Steps 4–12 are repeated for the entire set of values of $\sigma$ proposed in step 3.
14. The values of $\sigma$ and $\gamma$ of the model are the values for which the values of $RMSE_{TD}$, $RMSE_{ED}$, and $RMSE_{XD}$ present small values within the smaller range between these values. In this sense, these error values are labeled as , , and .
15. From the values obtained in step 14, the value is considered as the RMSE value reached with the proposed algorithm, denoted as the .
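The parameter search described in the steps above can be sketched as a nested loop over the kernel parameter and the regularization parameter. This is a minimal sketch under several assumptions, not the authors' implementation: a Gaussian kernel of the form exp(−‖x − x′‖²/(2σ²)), toy periodic features, hypothetical parameter grids, a `select_on` argument naming the dataset whose RMSE drives the choice of the regularization value, and a final choice that minimizes the largest of the three RMSE values as one possible reading of step 14.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between the columns of A (M x Na) and B (M x Nb)."""
    sq = np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def grid_search(data, sigmas, gammas, select_on="TD"):
    """data maps 'TD', 'ED', 'XD' to (X, y); the columns of X are cases."""
    X_td, y_td = data["TD"]
    best = None
    for sigma in sigmas:  # outer loop over the kernel parameter
        K = gaussian_kernel(X_td, X_td, sigma)  # kernel matrix of the training data
        candidates = []
        for gamma in gammas:  # inner loop over the regularization parameter
            coef = np.linalg.solve(K + gamma * np.eye(K.shape[0]), y_td)
            # For brevity, all three RMSE values are computed for every gamma.
            errs = {name: rmse(y, gaussian_kernel(X, X_td, sigma) @ coef)
                    for name, (X, y) in data.items()}
            candidates.append((errs[select_on], gamma, errs))
        # Regularization value with the smallest RMSE on the selection dataset.
        _, gamma_opt, errs = min(candidates, key=lambda c: c[0])
        worst = max(errs.values())  # assumed criterion: smallest worst-case RMSE
        if best is None or worst < best["worst"]:
            best = {"sigma": sigma, "gamma": gamma_opt, "rmse": errs, "worst": worst}
    return best


def toy_features(t):
    """Hypothetical periodic features (periods 3 and 7) of a scalar variable t."""
    return np.vstack([np.cos(2 * np.pi * t / 3), np.sin(2 * np.pi * t / 3),
                      np.cos(2 * np.pi * t / 7), np.sin(2 * np.pi * t / 7)])


rng = np.random.default_rng(0)
t_syn = np.linspace(0.0, 10.0, 300)
perm = rng.permutation(t_syn.size)
t_td, t_ed = t_syn[perm[:240]], t_syn[perm[240:]]  # 80% / 20% split
t_xd = np.linspace(0.2, 9.8, 25)                   # stand-in "experimental" cases
data = {name: (toy_features(t), t) for name, t in
        [("TD", t_td), ("ED", t_ed), ("XD", t_xd)]}

SIGMAS = [0.1, 0.3, 1.0]
GAMMAS = [1e-8, 1e-6, 1e-4, 1e-2]
best = grid_search(data, SIGMAS, GAMMAS)
print(best["sigma"], best["gamma"], best["rmse"])
```

Keeping the selection dataset as a parameter makes it easy to compare choosing the regularization value on the training data with choosing it on held-out data, since the two can favor different values.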
4. Results
The experimental output data at different temperatures of the interferometric system were obtained by means of the implemented setup (Figure 1b). Here, the temperature was varied in the range from approximately 5 to 50 °C in 97 steps. Moreover, for each temperature step, four spectra were recorded, and the time elapsed between the first and the fourth measured spectra was approximately 8 min. In this way, there were 388 experimentally recorded spectra, and from these, the MA and the WP of the first 12 peaks (P1 to P12) of fringes occurring above 1540 nm were extracted, and these were taken as features (Figure 2). Later, these feature values were interpolated to enlarge the database. In this sense, a synthetic dataset of size 24 × 1000 was obtained. Afterward, the synthetic dataset was divided into the training dataset (TD) and the evaluation dataset (ED), for which 80% and 20% of the records were randomly selected, respectively. Hence, the features of the TD were allocated in the matrix $\mathbf{X}_{TD}$ of size 24 × 800, while the features of the ED were in the matrix $\mathbf{X}_{ED}$ of size 24 × 200. Moreover, the synthetic values of the output variable for the TD and ED were allocated in the $\mathbf{y}_{TD}$ and $\mathbf{y}_{ED}$ vectors of size 800 × 1 and 200 × 1, respectively. Furthermore, the features of the experimental dataset (XD) were contained in the matrix $\mathbf{X}_{XD}$ of size 24 × 97, while the corresponding experimental outputs were saved in the $\mathbf{y}_{XD}$ vector of size 97 × 1.
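The random 80%/20% division and the resulting array shapes can be sketched as follows. The feature values here are random placeholders rather than real spectral features, and the seed and variable names are hypothetical; only the sizes (24 features, 1000 synthetic cases) come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

n_features, n_cases = 24, 1000
X_syn = rng.normal(size=(n_features, n_cases))  # placeholder synthetic features
y_syn = np.linspace(4.5, 50.0, n_cases)         # placeholder temperatures (degC)

# Shuffle the case indices and take 80% for training, 20% for evaluation.
perm = rng.permutation(n_cases)
n_train = (n_cases * 8) // 10
idx_td, idx_ed = perm[:n_train], perm[n_train:]

# Columns are cases, so the split selects columns of the feature matrix.
X_td, y_td = X_syn[:, idx_td], y_syn[idx_td].reshape(-1, 1)
X_ed, y_ed = X_syn[:, idx_ed], y_syn[idx_ed].reshape(-1, 1)

print(X_td.shape, y_td.shape, X_ed.shape, y_ed.shape)
```

Splitting on a shuffled index vector keeps each feature column paired with its output value and guarantees that no case appears in both subsets.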
Later, the mathematical model for estimating the output of our interferometric system was implemented by considering different values of the kernel parameter $\sigma$ and the regularization parameter $\gamma$ and four kernel functions. Table 1 lists these functions and the $\sigma$ values used. Additionally, the set of $\gamma$
values was defined as
for
and
. The results obtained with the Gaussian (GK), exponential (EK), Bessel (BK), and inverse multi-quadratic (MK) kernels are shown in Figures 3, 4, 5, and 6, respectively. The RMSE obtained for some of the used values of the kernel parameter $\sigma$ as a function of the regularization parameter $\gamma$ is shown in Figures 3a, 4a, 5a, and 6a for the GK, EK, BK, and MK, respectively. Here, it can be observed that a curve was obtained for each $\sigma$ value, and the minimum RMSE of each curve needed to be located. For instance, in Figure 3a, the minimum RMSE values for three different cases are marked with asterisks. In this way, for each $\sigma$, the optimal $\gamma$ was the one for which the smallest RMSE was obtained, and we labeled it as $\gamma_{opt}$. The obtained $\gamma_{opt}$ values as a function of $\sigma$ are shown in Figures 3b, 4b, 5b, and 6b for each kernel.
Furthermore, the model was evaluated again, now considering the optimal regularization parameter $\gamma$ and different values of the kernel parameter $\sigma$, and the resulting RMSEs are shown in Figures 3c, 4c, 5c, and 6c for each kernel. In these figures, $RMSE_{TD}$, $RMSE_{ED}$, and $RMSE_{XD}$ correspond to the TD, ED, and XD datasets. Here, the $\sigma$ values were determined by considering the criteria mentioned in step 14 of the implemented algorithm. In this case, the smallest ranges and means between the $RMSE_{TD}$, $RMSE_{ED}$, and $RMSE_{XD}$ were 0.040–0.080 °C, 1.727–3.662 °C, 0.242–0.395 °C, and 0.173–0.225 °C for the GK, EK, BK, and MK, respectively. Now, with the selected $\sigma$ and $\gamma$ values, the outputs of the synthetic TD (blue circles) and ED (cyan points) datasets were estimated ($\hat{y}$) and are shown as a function of the original synthetic ($y$) values in Figures 3d, 4d, 5d, and 6d for each kernel. In these figures, it can be observed that the training dataset was fitted very well with all kernels; however, the GK provided the best fit for the evaluation (ED) dataset, while the poorest fit was obtained with the EK. Finally, by running the model with the selected $\sigma$ and $\gamma$ values, the estimated temperatures for the XD dataset were obtained, and these are plotted as a function of the experimental values in Figure 7. Here, it can be clearly observed that the best linear relationship between the estimated and experimental values was obtained when the GK was used.
The
,
,
,
, and
values obtained by using the four kernel functions for the sensing interferometric system are listed in Table 2. It can be seen that the best results were obtained with the Gaussian kernel for the measurement range used. Here, it should be pointed out that the KRR method allowed us to estimate the temperature over a wide measurement range, which cannot be achieved by tracking only one variable, such as the maximum amplitude or the wavelength position of one fringe, as in the conventional method, due to the periodic behavior of the features (Figure 2). In addition, it should be noted that the implemented KRR model was trained with spectral features, which have physical restrictions governed by the behavior of the interferometer spectra. In our case, the physical constraints of each feature are listed in Table 3. Finally, it is important to mention that, given the way the model was trained, it was validated only for predicting the output variable (temperature) within the range from 4.5 to 50 °C, covering the experimental range for which the spectra were recorded. In future work, the capability of extending the prediction range beyond the range for which the model was trained will be studied.