1. Introduction
In recent years, fiber optic sensors (FOS) have gained significant attention due to their exceptional sensitivity, reliability, and immunity to electromagnetic interference. These sensors have been extensively employed in various fields, such as telecommunications, aerospace, and structural health monitoring, where accurate measurement of physical parameters is critical [1,2,3,4]. Among different types of FOS, the fiber Bragg grating (FBG) sensor has emerged as a promising technology for precise and distributed sensing applications.
Tilted FBG (TFBG) sensors have been used to measure various physical and environmental parameters, such as temperature, strain, pressure, and refractive index (RI) [5,6,7,8]. The tilted grating structure in a TFBG sensor induces coupling between the fundamental core mode and the cladding modes of the fiber, leading to higher sensitivity than conventional FBG sensors. By measuring the shift in the Bragg wavelength caused by changes in the surrounding physical/environmental parameters, a TFBG sensor can accurately measure the desired parameter [9].
One key parameter that plays a crucial role in FBG-based sensing is the spectral width of the FBG spectrum, which determines the resolution and precision of the sensor system [10]. However, precise estimation of the spectral width remains challenging due to several factors, including sensor properties, environmental noise, and the limitations of conventional data processing techniques.
Recent advancements in artificial intelligence (AI) have significantly bolstered the accuracy and robustness of TFBG sensor measurements, with notable progress in machine learning (ML) and deep learning (DL) algorithms. ML employs algorithms and statistical models for task execution based on patterns and inference. DL, a subset of ML, employs multi-layered neural networks to discern intricate patterns in the given data. Notably, ML has been pivotal in advancing photonic and optical fiber-based devices, surmounting challenges like processing speed constraints, noise reduction over long fiber lengths, data management, nanophotonic meta-surface generation, and addressing cross-sensitivity issues [11,12,13].
In a recent study, unsupervised ML techniques, specifically k-means clustering and principal component analysis (PCA), were applied for signal analysis in plasmonic-based sensors coated with Au-Pd polymers [14]. Another study proposed the use of PCA as a data analysis technique for monitoring multiple parameters, identifying correlations, and reducing dimensionality in aquaculture data analysis [15]. Furthermore, a study introduced an FBG temperature sensor array and employed various ML algorithms for fluid level detection, achieving a root mean square error (RMSE) of 3.56 cm [16]. Avellar et al. presented a transmission–reflection analysis system that utilized dielectric nanoparticle (NP)-doped fibers and AI to achieve high spatial resolution in distributed sensing [17]. ML techniques enable the analysis and optimization of micro-structured film designs, including v-cut, lenticular shapes, and patterned holes, to transform light distribution, enhance efficiency, and maintain a low unified glare rating (UGR) in LEDs [18]. ML algorithms can also facilitate accurate data analysis for high-performance fuel gauging sensors [19]. In the realm of fiber-optic biosensors, ML techniques enhance accuracy and reliability by extracting information from spectra, enabling sensitive detection of cortisol levels [20]. Several studies have demonstrated the potential of generative ML algorithms in optimizing and developing novel grating structures [21,22].
For optimizing and enhancing the performance of FBG sensors, ML algorithms have been explored in several studies. These advancements aim to improve the estimation of measurands based on the optical characteristics of the signal. Various regression algorithms, such as decision tree, random forest, K-nearest neighbour (KNN), and Gaussian process regression (GPR) models, have been investigated to optimize the estimation of measurands [23]. Additionally, an optimization method based on the nondominated sorting genetic algorithm II (NSGA-II) has been proposed to determine the optimum grating parameters for FBG sensors based on application requirements [24]. A demodulation system for FBG sensors based on a long-period fiber grating (LPG) driven by AI techniques was developed, reporting high-precision wavelength interrogation [25].
Focusing on the FBG sensor, this research paper leverages the capability of DL models to enable precise spectral width (full width at half maximum, i.e., FWHM) measurement in FBG sensor spectra through advanced experimental data processing. Specifically, we explore the application of DL algorithms to improve the estimation of FWHM and investigate their performance compared to conventional techniques.
The main contributions of this research are as follows:
Development of a novel framework for estimating FWHM of TFBG-based glucose sensor data: We developed a novel DL-based framework for FWHM estimation in TFBG-based glucose sensor data.
Comprehensive evaluation of the proposed approach: Extensive experiments were conducted to rigorously evaluate the performance and applicability of the proposed DL model on real-world TFBG glucose sensor data.
Proposing a more robust data analysis method: The sensor data are reformulated using a comprehensive DL-assisted framework. This approach addresses the high variability in experimental data, leading to more accurate and reliable results.
Overall, this research presents a significant advancement in FOS by proposing a refined methodology for analyzing sensor data. By integrating DL techniques with innovative attention mechanisms and evaluation metrics, it establishes a new standard for enhancing the performance and data quality of the FOS.
This paper is organized as follows: Section 2.1 provides an overview of the collected TFBG sensor data, Section 2.2 introduces the proposed ML framework for precise FWHM estimation, and Section 2.3 provides details of the experimental setup and training. Section 3 then presents the results and discusses the performance of the DL-enabled approach. Finally, Section 4 concludes the work and outlines future research directions. This research combines the strengths of DL algorithms with the inherent advantages of FBG sensors and aims to improve experimental data handling. This paves the way for next-generation systems that demonstrate improved accuracy, reliability, and adaptability across diverse application scenarios.
2. Proposed Methodology
2.1. Data Exploration
The TFBG data collected are related to an experimental study on the Au-TFBG sensor, in which the Tn values were measured at varying glucose concentration (Cg) values, as presented in Figure 1a [26]. The experiments were conducted over three days, with Cg ranging from 0% to 50% (0%, 1%, 5%, 10%, 20%, 30%, 40%, and 50%), to capture the variability occurring in the transmittance, mostly due to coupling variations between the optical fiber and the light source. A representative subset of the dataset is shown in Table 1.
For each Cg value, at least three measurements were taken per day. The dataset contains wavelength (λ) values ranging from 1500 nm to 1620 nm, with an actual laser resolution of 0.049 nm and an input resolution of 0.06 nm. As a result, the array has dimensions of (3 × 2001 × 8 × 2), where the first axis represents the day dimension, the second axis represents the λ dimension, and the third axis represents the Cg dimension. The last dimension holds the two values (i.e., λ and Tn). In the upcoming sections, this dataset is referred to as the FWHM-data. A detailed methodology (SM-2) is provided in the Supplementary Materials file.
2.2. Proposed Models
It is noteworthy that variability in experimental data can arise from factors such as coupling fluctuations, environmental changes, and human error, which hinder the analysis and measurement of the FWHM. To address this issue, we propose a two-step approach to negate the variability effects and improve the FWHM estimations.
Firstly, we utilize a statistical model called seasonal decomposition using moving averages (SDMA) to extract the trend and discard redundant Tn values during FWHM estimation. SDMA deconstructs time-series data into their underlying components (trend, seasonal, and residual) by taking a moving average of a fixed window size over the time series. The trend component is derived using an additive model (original series (Y) = trend (T) + seasonality (S) + residuals (R)) and is further processed for extrapolation modeling [27,28], as represented in Figure 1b.
In the second stage, we use the trend data extracted by SDMA with the proposed sequential modeling approach to extrapolate the Tn spectra for improved FWHM estimation. This approach utilizes a recurrent neural network (RNN) framework, specifically the long short-term memory (LSTM) architecture, which learns complex hidden patterns by treating the data sequentially [29,30,31,32]. The LSTM models the regression function in a supervised fashion, using a sequence of λ values, together with the day and Cg values, to predict a sequence of Tn values (extracted from SDMA).
The proposed models for accomplishing the aforementioned task are as follows.
2.2.1. Spectral Extrapolator LSTM (SpecExLSTM)
The SpecExLSTM is a type of sequential deep belief network (DBN), designed to work in a sequential setting [33,34]. The primary goal is Seq-to-Seq translation: mapping an input sequence of λ values, conditioned on the Cg value, to the corresponding Tn sequence. To accomplish this, the LSTM serves as an efficient tool for capturing both long-term and short-term dependencies within the data. The LSTM excels at handling sequential information, treating λ sequentially and retaining and processing information over the spectral dimension, making it apt for sequence extrapolation.
SpecExLSTM is equipped with two LSTM layers that perform bidirectional computations (Bi-LSTM) and a single unit for transformation. The Bi-LSTM layers contain 64 and 32 units, respectively. In the proposed model, the Bi-LSTM output activation is set to the linear function to provide sparsity in the learnable weight values, while the single transformation unit uses a sigmoid activation. Multiple activation functions were experimented with, as discussed in the following sections. The time complexity of this model is O(ln²), where l is the sequence length and n is the number of units.
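To make the architecture concrete, the following is a minimal sketch, assuming a TensorFlow/Keras implementation; the layer sizes and activations follow the text, while the builder name and input dimensions (which match the data preparation in Section 2.3) are illustrative.

```python
# Minimal SpecExLSTM sketch (assumed TensorFlow/Keras implementation).
import tensorflow as tf

SEQ_LEN, N_FEATURES = 10, 5  # window length l and (λ, Cg, one-hot day) features

def build_spec_ex_lstm() -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(SEQ_LEN, N_FEATURES))
    # Two Bi-LSTM layers (64 and 32 units) with linear output activation,
    # as described above, to encourage sparsity in the learnable weights.
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, activation="linear", return_sequences=True))(inputs)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, activation="linear", return_sequences=True))(x)
    # Single-unit transformation with sigmoid activation predicts the scaled Tn.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="SpecExLSTM")
```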
2.2.2. Spectral Extrapolation with Attention (AttentiveSpecExLSTM)
The AttentiveSpecExLSTM incorporates an additional attention layer component. The proposed model employs a composition of two different types of attention mechanisms: dot attention (Luong attention) and additive attention (Bahdanau attention) [31,35]. The introduction of the attention architecture brings several enhancements compared to the SpecExLSTM, including improved context awareness, translation quality between input and output sequences, and interpretability. Additionally, in photonic applications, attention variants with LSTM have demonstrated notably enhanced performance relative to their counterparts without attention [36].
The base architecture of the model is the same as SpecExLSTM, with the additional complexity of attention layers. The dot attention layer is applied after the first Bi-LSTM layer, computing attention from that layer's outputs. The additive attention layer is applied after the second Bi-LSTM layer, likewise computing from the LSTM outputs before feeding them into a single-unit perceptron layer, as represented in Figure 2. This design implies that the attention mechanism attends to the LSTM output sequence solely based on its own features, without considering any external contextual information. This model has a time complexity of O(2l²n²), where l is the sequence length and n denotes the number of units.
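A minimal sketch of this composite, again assuming TensorFlow/Keras, where tf.keras.layers.Attention implements Luong-style dot-product attention and tf.keras.layers.AdditiveAttention implements Bahdanau-style attention; passing the same tensor as query and value yields the self-attention described above.

```python
# Minimal AttentiveSpecExLSTM sketch (assumed TensorFlow/Keras implementation).
import tensorflow as tf

def build_attentive_spec_ex_lstm(seq_len: int = 10, n_features: int = 5) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(seq_len, n_features))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, activation="linear", return_sequences=True))(inputs)
    # Dot (Luong-style) self-attention after the first Bi-LSTM layer:
    # query and value are the same tensor, so the layer attends to the
    # sequence using only its own features.
    x = tf.keras.layers.Attention()([x, x])
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, activation="linear", return_sequences=True))(x)
    # Additive (Bahdanau-style) self-attention after the second Bi-LSTM layer.
    x = tf.keras.layers.AdditiveAttention()([x, x])
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="AttentiveSpecExLSTM")
```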
2.3. Data Preparation
To ensure data standardization prior to modeling, min–max scaling was applied to the Tn values of each day, resulting in data sampled between the bounds of 0 and 1. Additionally, the spectral width can be estimated from the trend series of the FWHM-data. Hence, we performed trend extraction on the FWHM-data using the SDMA model, which employs a rolling-average technique to decompose the input series into its constituents: trend, seasonality, and residuals. The trend component was extracted using a window size of 50, and the resulting trend plots are displayed in Figure 1b.
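As a concrete illustration, the following sketch performs the scaling and trend extraction, assuming statsmodels' seasonal_decompose as the SDMA implementation (additive model, window of 50, per the text); the helper name extract_trend is ours.

```python
# SDMA trend extraction sketch (assumed statsmodels implementation).
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

def extract_trend(tn: np.ndarray, window: int = 50) -> np.ndarray:
    """Return the trend component of a Tn spectrum treated as a series over λ."""
    tn_scaled = (tn - tn.min()) / (tn.max() - tn.min())  # min–max scaling to [0, 1]
    result = seasonal_decompose(tn_scaled, model="additive", period=window)
    return result.trend  # NaN-padded at both ends by roughly half the window
```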
In this sequence, Figure 3 presents a comparison between TFBG data collected on the same day for a subset of Cg values after the application of SDMA. It is evident that while the Tn magnitudes shift from day to day, the corresponding λ values at the Tn minima remain consistent for the given glucose Cg values. This indicates the repeatability and consistency of the sensor data.
However, calculating the FWHM becomes complex due to variations in the damping of the curve and the absence of an intersection of the curve with the half-maximum level. Nonetheless, we have addressed these issues using ML/DL models, as discussed further below.
Before applying ML/DL models, the data were partitioned based on Cg values. The Cg values for each day were divided into training, validation, and test sets to ensure unbiased splits across the days. The partition ratio for each day’s Cg values in the train–test-validation sets was 7.0:1.5:1.5, resulting in 15, 6, and 3 Cg values, respectively.
Furthermore, λ and Cg values were standardized, and the days were encoded into a one-hot vector representation. The data were then augmented by breaking them into smaller overlapping sequences using a window size of l = 10 (sequence length) and a stride of 5, following an up-sampling of the training split. Consequently, the inputs were transformed into an array of dimensions (Batch size × l × 5), where the last axis contained values for λ, Cg, and the one-hot vector representation of the days (with the vector length being the total number of days).
Similarly, the outputs, namely, the Tn values, were transformed into an array of dimensions (Batch size × l × 1), with the batch size set to 512 to align with the model requirements.
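A minimal sketch of this windowing step, assuming NumPy; the helper name make_windows is ours, and the up-sampling of the training split is omitted for brevity.

```python
# Overlapping-window augmentation sketch: l = 10, stride = 5.
import numpy as np

def make_windows(features: np.ndarray, targets: np.ndarray,
                 length: int = 10, stride: int = 5):
    """features: (N, 5) array of (λ, Cg, one-hot day); targets: (N,) scaled Tn."""
    xs, ys = [], []
    for start in range(0, len(features) - length + 1, stride):
        xs.append(features[start:start + length])
        ys.append(targets[start:start + length, None])  # (l, 1) per window
    # Shapes: (num_windows, l, 5) and (num_windows, l, 1).
    return np.stack(xs), np.stack(ys)
```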
2.4. Model Training
The SpecExLSTM and AttentiveSpecExLSTM models were optimized with the mean-squared error (MSE) as the loss function and trained exclusively using the ADAM optimizer with a cyclic learning rate policy. The evaluation of these models utilized six commonly employed metrics: mean absolute error (MAE); root mean square error (RMSE); symmetric mean absolute percentage error (SMAPE); non-linear regression multiple correlation coefficient (R2); percentage of correct direction (PCD); and dynamic time warping similarity (DTW-sim.) [37]. Additionally, we introduced a novel metric, the minima difference (Min-dif.), to assess the accuracy of the models in tracking the minima and measuring the corresponding λ values at the Tn minima. The formulations of these metrics (SM-1) are provided in the Supplementary Materials file.
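A sketch of this training setup, assuming TensorFlow/Keras and reusing the builder from the AttentiveSpecExLSTM sketch above; the learning-rate bounds and cycle length implement one common triangular variant of a cyclic policy and are illustrative assumptions, not values reported here.

```python
# Training configuration sketch: ADAM + MSE with a triangular cyclic LR.
import tensorflow as tf

BASE_LR, MAX_LR, HALF_CYCLE = 1e-4, 1e-3, 5  # illustrative assumptions

def triangular_clr(epoch, lr=None):
    """Triangular cyclic learning rate as a function of the epoch index."""
    pos = epoch % (2 * HALF_CYCLE)
    frac = pos / HALF_CYCLE if pos < HALF_CYCLE else 2 - pos / HALF_CYCLE
    return BASE_LR + (MAX_LR - BASE_LR) * frac

model = build_attentive_spec_ex_lstm()
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError(),
                       tf.keras.metrics.MeanAbsoluteError()])
# model.fit(x_train, y_train, batch_size=512, epochs=100,
#           validation_data=(x_val, y_val),
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(triangular_clr)])
```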
The Min-dif. metric is pivotal for evaluating the effectiveness of the proposed models in implicitly tracking the spectral width. In essence, the proposed models acquire a shared representation of Tn sequences vis-à-vis λ sequences, Cg values, and day labels. Their collective goal during inference is to predict Tn sequences, accomplished by interpolating missing Cg values and extrapolating the Tn sequence over absent λ ranges for calculating the FWHM of the Tn curve.
For precise FWHM calculation, it is imperative to measure the λ value at the maxima of the Tn curve. Existing evaluation metrics focus on average error, forecast accuracy, goodness of fit, or temporal deviation, but they may not align with the specific requirements of the FWHM calculation. The introduced Min-dif. metric addresses this gap by measuring λ at the Tn maxima in both the predicted and true sequences. This adds a vital dimension to model evaluation, quantifying the model's efficacy in utilizing shared representations for FWHM calculation based on λ, Cg values, and day labels. The metric formulation is straightforward: we begin by identifying the index of the minimum Tn value in both the unprocessed predicted and true sequences (the minima are computed for the unprocessed Tn sequences; the FWHM measurements are performed on the processed sequences as described in Section 3.2; computing minima for the unprocessed Tn sequences yields the same results as computing maxima for the processed Tn sequences). From this index, we extract the corresponding λ values and calculate the absolute difference between the predicted and true λ values. The mathematical representation is as follows:
Let Tn and T̂n be the true and predicted Tn sequences over the wavelength sequence λ; then

Min-dif. = |λ[argmin(Tn)] − λ[argmin(T̂n)]|  (1)

In Equation (1), the “argmin” function returns the index of the minimum value of the given sequence.
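A direct NumPy sketch of Equation (1); the function name min_dif is ours.

```python
# Min-dif.: absolute difference between the λ positions of the Tn minima
# in the true and predicted (unprocessed) sequences, per Equation (1).
import numpy as np

def min_dif(wavelengths: np.ndarray, tn_true: np.ndarray, tn_pred: np.ndarray) -> float:
    """All inputs are 1-D arrays defined over the same λ grid."""
    return float(abs(wavelengths[np.argmin(tn_true)] - wavelengths[np.argmin(tn_pred)]))
```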
3. Results and Discussion
In this section, we present and analyze the results of the tests conducted on the models discussed in the previous sections. The models were evaluated using the test metrics discussed in Section 2.4, with the objective of comparing and selecting the model with the best possible performance. Our aim is to choose a model that consistently performs well across all metrics, displaying low RMSE, MAE, SMAPE, and Min-dif. values, as well as high R2, PCD, and DTW-sim. values.
Table 2 presents the average evaluation metric scores for the top-three-performing SpecExLSTM and AttentiveSpecExLSTM models. Based on these metrics, we observe that the AttentiveSpecExLSTM model generally outperformed the SpecExLSTM model. The AttentiveSpecExLSTM model exhibited slightly lower RMSE, MAE, and SMAPE values, indicating greater accuracy in predicting the target values. However, the Min-dif. for the AttentiveSpecExLSTM model was higher than that of the SpecExLSTM model, indicating a larger difference between the predicted and actual λ values corresponding to the Tn minima. On the other hand, the R2 value for the AttentiveSpecExLSTM model was slightly higher, indicating a better fit to the data. Additionally, the AttentiveSpecExLSTM model had a higher PCD value, suggesting a stronger degree of correlation. Both models had identical DTW-sim. values.
The AttentiveSpecExLSTM model demonstrates superior performance over SpecExLSTM across various evaluation metrics, showcasing enhanced predictive capabilities. However, specific challenges arise, notably in the Min-dif. metric: SpecExLSTM achieves an average score of 0.543 with a deviation of 0.12, whereas AttentiveSpecExLSTM yields an average score of 1.086 with a deviation of 0.469. While the attention mechanism in AttentiveSpecExLSTM contributes to lower RMSE and MAE, it also influences the Tn curve by shifting the minima, reflected in the worse (higher) Min-dif. scores. This shift is pivotal, as it directly impacts the FWHM calculation. To improve performance, incorporating Min-dif. directly into the loss function or enhancing the network's awareness of the underlying physics could be explored.
Additionally, upon analyzing the predicted Tn curves of both models, particularly Figure 4a,c,e compared with Figure 4g,i,k, within the highlighted region marked by a red circle, AttentiveSpecExLSTM exhibits significant improvement by reducing variability. However, prediction accuracy remains a challenge, as evident from the high deviation in the R2 and DTW-sim. metric scores. To further enhance the proposed methodology's performance, scaling the experimental setup by collecting more data and inducing generalizability could be explored. This would expose the model to a wider variety of Tn curve variations, contributing to improved accuracy.
Overall, the AttentiveSpecExLSTM model exhibited reasonably superior performance across most evaluation metrics, demonstrating its enhanced predictive capabilities compared to the SpecExLSTM model. This improved performance can be attributed to the inclusion of a composite attention mechanism in the AttentiveSpecExLSTM model. This mechanism enhanced the model’s ability to capture both short-range and long-range dependencies within the sequence.
In the next section, we conduct a comprehensive empirical analysis to explore the effects of the attention layer, specifically additive attention and dot product attention.
3.1. Empirical Analysis of Composite Attention Mechanism
This section provides an evaluation and analysis of the proposed composite attention mechanism in AttentiveSpecExLSTM. This model employs a hierarchical attention approach, combining dot product attention in the initial layer and additive attention in the deep layer.
To assess the contribution of each attention mechanism in the composite setup, ablation experiments were conducted with different attention layer configurations. Removing the deep attention layer from the AttentiveSpecExLSTM model resulted in two ablated models: L1-Dot Attn. only and L1-Add Attn. only. The former applies dot attention after the first Bi-LSTM layer, while the latter applies additive attention. Evaluation metrics were recorded for these models, and Table 3 presents the average results of the top three best-performing models.
From Table 3, it is evident that the L1-Dot Attn. only model consistently outperformed the other models in terms of the evaluation metrics, achieving the best MAE (1.28 ± 0.01 × 10−2), SMAPE (2.36 ± 0.01 × 10−2), Min-dif. (0.60 ± 0.16), R2 (98.87 ± 0.13 × 10−2), PCD (58.86 ± 0.4 × 10−2), and DTW-sim. (0.35 ± 0.02). Additionally, the L1-Dot Attn. only model achieved the minimum RMSE (1.87 ± 0.11 × 10−2).
Figure 5a,b depict the activation maps of the attention layer for the L1-Dot Attn. only and L1-Add Attn. only models, respectively. The activation map of the L1-Dot Attn. only model indicates a focus on short-range dependencies, with attention peaks concentrated around specific sequence positions. Conversely, the L1-Add Attn. only model exhibits a broader distribution of attention weights across the sequence, capturing long-range dependencies.
Similar patterns were observed in the second-layer ablated models (L2-Dot Attn. only and L2-Add Attn. only), as shown in Figure 5c,d. Notably, the L2-Add Attn. only model outperformed the L2-Dot Attn. only model in terms of RMSE, MAE, SMAPE, Min-dif., R2, and PCD, as noted in Table 3.
When both attention layers were employed with the same attention type (dot attention or additive attention) in the initial and deep layers, additive attention consistently yielded better results in the evaluation metrics compared to dot attention.
Table 3 illustrates that the Both Add-Attn. model achieved the best scores for several evaluation metrics, including RMSE (1.93 ± 0.11 × 10−2), SMAPE (2.49 ± 0.18 × 10−2), R2 (98.98 ± 0.128 × 10−2), and PCD (60.72 ± 0.68 × 10−2). The activation pattern of the Both Add-Attn. model (refer to Figure 5f) reveals a sparse distribution in the attention map, indicating its ability to model non-linear interactions and capture long-range dependencies.
To capture both short-range and long-range dependencies effectively, a hierarchical composite attention mechanism was experimented with, aiming to learn the low-level and global context of the sequence at different stages. Two hierarchical composite attention models were evaluated, and the L1-Dot Attn./L2-Add Attn. (i.e., AttentiveSpecExLSTM) model exhibited superior performance compared to the L1-Add Attn./L2-Dot Attn. model. The L1-Dot Attn./L2-Add Attn. model achieved an RMSE of 1.73 ± 0.05 × 10−2, an MAE of 1.20 ± 0.04 × 10−2, an SMAPE of 2.22 ± 0.05 × 10−2, a Min-dif. of 1.086 ± 0.469, an R2 of 99.08 ± 0.12 × 10−2, a PCD of 60.41 ± 2.68 × 10−2, and a DTW-sim. of 0.35 ± 0.053. Figure 5h visually represents this model, illustrating how attention is applied in both the initial and deep layers in a complementary fashion. This behavior is also observed in the Both Dot-Attn., Both Add-Attn., and L1-Add Attn./L2-Dot Attn. models.
Furthermore, the attention weights are broadly distributed in the damped region of the sequence, while they concentrate on a narrow position towards the extremes of the sequence. This consistent behavior was observed across all experiments.
In summary, the AttentiveSpecExLSTM model with the proposed hierarchical composite attention demonstrates its capability to capture fine-grained details across the entire sequence through initial dot product attention. The subsequent additive attention in the deep layer focuses on global dependencies and relevant context, utilizing the information extracted by the previous layers. This hierarchical attention contributes to the model’s superior performance by progressively refining its understanding and making more informed decisions.
The next section applies the AttentiveSpecExLSTM model with the proposed hierarchical composite attention for curve enhancement, aiming to improve the FWHM estimation.
3.2. FWHM Estimation
As discussed in the previous sections, the FWHM-data contains Cg values for three days across a λ range from 1500 nm to 1620 nm, with a spectral resolution of 0.06 nm. The calculation of the spectral width is challenging due to the high variability in the Tn values for the same Cg and λ values across different days. Specifically, the Tn curve does not intersect the half-maximum value within the current λ range for days 2 and 3, whereas it does for day 1, as shown in Figure 6.
To address this issue, we employed a two-fold approach: first, the high variability in the experimental data was removed with the SDMA model, and then the sequential ML/DL model was used for extrapolation. The proposed AttentiveSpecExLSTM model was selected to extrapolate the λ range beyond the bounds of the training dataset until the Tn curve intersects the half-maximum value, allowing the FWHM of the curve to be calculated and the performance of the Au-TFBG sensor to be quantified.
It was found that only the left λ bound of the dataset needed to be extended, since the Tn curve did not intersect the half-maximum value in this region. The Tn values for λ ranging from 1450 nm to 1620 nm were predicted while keeping the resolution the same as the training resolution of 0.06 nm. The predicted Tn values were concatenated with the true Tn values of the un-extrapolated data to calculate the FWHM, or spectral width.
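A sketch of this λ-grid extension, assuming NumPy; the prediction call is indicated only schematically, since it depends on the feature-building pipeline sketched earlier.

```python
# Extend the λ axis to the left at the training resolution and splice the
# predicted Tn values onto the measured spectrum before computing the FWHM.
import numpy as np

STEP = 0.06                                  # training resolution (nm)
lam_left = np.arange(1450.0, 1500.0, STEP)   # extrapolated left-bound region
# tn_left = predict_tn(lam_left, cg, day)    # hypothetical model call
# tn_full = np.concatenate([tn_left, tn_measured])   # spliced 1450-1620 nm curve
```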
To calculate the FWHM value, the predicted scaled Tn values were subtracted from 1 to obtain an inverted Tn plot. The spectral width calculation was then performed, yielding the FWHM values shown in Table 4.
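A minimal FWHM sketch, assuming NumPy: invert the scaled curve, locate the peak, and find the two half-maximum crossings by linear interpolation. It assumes the curve crosses the half maximum on both sides, which is exactly what the extrapolation step ensures; the function name fwhm is ours.

```python
# FWHM of the inverted (1 - Tn) curve via half-maximum crossing interpolation.
import numpy as np

def fwhm(wavelengths: np.ndarray, tn_scaled: np.ndarray) -> float:
    peak = 1.0 - tn_scaled              # inverted plot: peak at the Tn minimum
    i_max = int(np.argmax(peak))
    half = peak[i_max] / 2.0
    # Walk outwards from the peak to the first samples below the half maximum.
    left = i_max
    while left > 0 and peak[left] > half:
        left -= 1
    right = i_max
    while right < len(peak) - 1 and peak[right] > half:
        right += 1
    # Linear interpolation for sub-sample crossing positions.
    lam_l = np.interp(half, [peak[left], peak[left + 1]],
                      [wavelengths[left], wavelengths[left + 1]])
    lam_r = np.interp(half, [peak[right], peak[right - 1]],
                      [wavelengths[right], wavelengths[right - 1]])
    return float(lam_r - lam_l)
```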
Figure 7 shows the spectral width calculation of the experimental Au-TFBG glucose sensor data. The FWHM calculation for day 1 did not require extrapolation, whereas for days 2 and 3, FWHM was calculated after extrapolating the λ left-bound range.
3.3. Quantifying Sensor Performance with Figure of Merit (FOM) Estimation
In this section, we delve deeper into the computation of the FWHM values using the AttentiveSpecExLSTM model. The introduction of the figure of merit (FOM) offers a more nuanced and comprehensive metric for the assessment of the sensor’s performance. This metric serves as a guiding factor in the pursuit of optimizing sensor functionality for a diverse range of enhanced sensing applications. The FOM calculation involves these steps:
1. The refractive index (n) of the solution is influenced by the glucose concentration (Cg), which can be expressed as

n = n0 + k·Cg

where n0 is the refractive index of the pure solvent and k is a proportionality constant. As the glucose concentration increases, the refractive index also increases, causing a corresponding shift in the resonance wavelength of the TFBG sensor. Thus, the sensor's sensitivity to glucose concentration is intrinsically linked to its response to changes in the refractive index, and it can be expressed as

S = Δλres/ΔCg = k·(Δλres/Δn)

2. Estimation of the FWHM, discussed in Section 3.2, which is vital for the FOM calculation.

3. FOM computation by taking the ratio of the sensitivity to the FWHM, with the following mathematical representation:

FOM = S/FWHM
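Putting the steps together, a sketch of the FOM computation assuming NumPy: the sensitivity is taken as the slope of a linear fit of the resonance wavelength against concentration, and the FOM is the sensitivity divided by the FWHM; the helper name figure_of_merit is ours.

```python
# FOM sketch: sensitivity from a linear fit of λ_res vs. Cg, then S / FWHM.
import numpy as np

def figure_of_merit(cg: np.ndarray, lam_res: np.ndarray, fwhm_nm: float) -> float:
    """cg: concentrations (%); lam_res: resonance λ per concentration (nm)."""
    sensitivity = np.polyfit(cg, lam_res, 1)[0]  # slope S = Δλ_res/ΔCg (nm/%)
    return abs(sensitivity) / fwhm_nm            # FOM = S / FWHM
```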
As a point of reference, we performed FOM computations on the initial FWHM-dataset, which contains high variability in the data points. This involved empirical FWHM measurements, followed by sensitivity and FOM calculations. The calculated FOM values provide insights into the sensor's performance across different days. On day 1, the average FOM was 0.012, with a standard deviation of 0.0038; day 2 showed a slightly higher average of 0.0131, with a standard deviation of 0.0035; while day 3 had an average FOM of 0.0106 with a narrower standard deviation of 0.001. These variations in FOM values across days can be attributed to various factors; notably, the empirical measurement of the FWHM from the high-variability data leads to inaccurate FOM calculations, as shown in Figure 1a.
However, a significant shift in FOM values was observed when incorporating the AttentiveSpecExLSTM model to reduce the variability of the data and extrapolate for the spectral width measurements. The model helped to mitigate the impact of measurement variability by extracting meaningful patterns from the data. The FOM values on day 1 exhibit an average of 0.0006, with a standard deviation of 4.9 × 10−5; whereas day 2 and day 3 exhibit FOM values with averages of 0.0065 and 0.00652, respectively, and relatively low standard deviations.
This marked progress in FOM measurement subsequent to the model’s implementation underscores the AttentiveSpecExLSTM model’s proficiency in elevating data quality and reducing measurement uncertainties. It accentuates the model’s ability to focus on the relevant spectral features and discard the variability, contributing to more accurate FOM calculations. The decreasing standard deviations further signify an elevation in data consistency and repeatability, consequently bolstering the dependability of the sensor’s performance evaluation. Ultimately, this advancement in FOM assessment substantiates the sensor’s optimization and attests to the efficacy of the TFBG sensor, guided by the refined insights provided by the AttentiveSpecExLSTM model.
3.4. Comparison of the Proposed Scheme with the Existing Schemes
As mentioned earlier, we have treated the transmittance spectrum as a time series in the wavelength dimension, framing the problem as time-series forecasting (extrapolation). Several reports in the literature operate on similar data types, treating the data as either dependent (sequential) or independent. Dwivedi et al. treated optical sensor data via GPR to model the FOM of surface plasmon resonance (SPR) sensors [37]. This study considered the sensor's data as independent, achieving an RMSE of 185.52, an MAE of 78.32, and an R2 of 0.927. Further, in one of our previous studies, we approached the problem as a sequential one, employing RNN-based models to forecast a series of FOM values from the wavelength and the corresponding metal layer thickness values [38]. That study achieved superior results, with an RMSE of 2.21, an MAE of 0.54, and an R2 of 0.99 on the test dataset. Salmela et al. utilized an RNN-based model, particularly LSTM, to predict the temporal and spatial evolution of light waves from the initial conditions of light pulses using simulation data [39]. Despite achieving an RMSE of 0.161, their model struggled with extreme values in the wavelength spectrum and was limited to learning only from the provided simulation data. Liu et al. proposed optimization methods, including cuckoo search and orthogonal least squares, to optimize the architecture of an RNN model for a microwave heating system. Despite achieving an RMSE of 0.67 and an MAE of 0.536, their methodology was computationally expensive due to its complexity [40].
In contrast, our proposed methodology preserves the true nature of TFBG’s experimental data while effectively capturing trends and patterns within the transmittance spectrum. This enables us to extrapolate the transmittance spectrum for the enhanced estimation of FWHM.
Table 5 summarizes the comparison discussed above.
4. Conclusions and Future Work
This research effectively employs DL techniques to improve data analysis for quantifying the performance of Au-TFBG-based glucose sensors. We introduced a novel metric, the minima difference (Min-dif.), to evaluate model accuracy in tracking the minima and corresponding λ values for Tn minima. Additionally, we proposed two new sequential architectures and a hierarchical composite attention mechanism.
The baseline SpecExLSTM model demonstrated promising performance, achieving an RMSE of 1.75 ± 0.03 × 10−2 and a PCD of 59.12 ± 1.11 × 10−2. The integration of the hierarchical composite attention mechanism in the AttentiveSpecExLSTM model further enhanced prediction accuracy, resulting in an RMSE of 1.73 ± 0.05 × 10−2 and a PCD of 60.41 ± 2.68 × 10−2. This attention mechanism plays a crucial role in capturing both high-level and low-level dependencies, allowing the model to refine its understanding and improve accuracy.
Our methodology includes a two-step approach to addressing variability in experimental data: first, by applying SDMA to filter out noise; and second, by using DL/ML models to extrapolate transmittance values with respect to λ. This approach enabled accurate measurement of the spectral width of the Tn curve, which was previously unachievable with raw data. A notable improvement was achieved, resulting in FOM values of 0.0006 ± 0.00049 for day 1 and 0.0065 and 0.00652 for day 2 and day 3, respectively. This underscores the impact of the methodology in mitigating variability and rectifying inaccuracies while quantifying the performance of the Au-TFBG glucose sensor.
Our method is designed for high accuracy across various solutions, not just glucose. While specifically tailored to TFBG sensor data, its adaptability extends to datasets with similar attributes. Future work will focus on refining the proposed models by incorporating the physical information of the system into the neural network, along with introducing bias, to fine-tune the trade-off between generalization and the performance of the learning algorithm, as well as overall system performance.
Additionally, we aim to scale the experimental data to create a more generalized dataset, further boosting system performance. However, this scaling poses challenges, including increased computational time and cross-sensitivity (e.g., the DL model inadvertently learning sensor-specific behavior in Tn over time rather than the response to glucose concentration). When dealing with high-resolution concentration values, minor differences in concentrations may lead to overlapping latent representations in the model. This could be addressed by exploring physics-informed neural networks.
For computational efficiency, sequential learning models, such as transformers, offer promise due to their ability to parallelize processes. In regression tasks, transformers can treat decimal points as categorical values, which are then concatenated to form floating-point predictions. This approach could significantly reduce computational overhead and improve prediction precision.
In summary, this research significantly advances data quality and performance in fiber optic sensors through innovative DL and ML techniques, providing valuable insights for future photonic sensor applications.