Temperature Stability Investigations of Neural Network Models for Graphene-Based Gas Sensor Devices †

Eng. Proc. 2021, 10, 19

Abstract: Chemiresistive gas sensors are a crucial tool for monitoring gases on a large scale. To estimate gas concentrations from the signals provided by such sensors, pattern recognition tools, such as neural networks, are widely used after training them on data measured by sample sensors and reference devices. However, in the production process of low-cost sensor technologies, small variations in their physical properties can occur, which can alter the measuring conditions of the devices and make them less comparable to the sample sensors, so that the trained algorithms are less well adapted to them. In this work, we study the influence of such variations, focusing on changes in the operating and heating temperature of graphene-based gas sensors. To this end, we trained machine learning models on synthetic data provided by a sensor simulation model. By varying the operating temperatures between −15% and +15% from the original values, we observed a steady decline in algorithm performance when the temperature deviation exceeded 10%. Furthermore, we were able to substantiate the effectiveness of training the neural networks with several temperature parameters by conducting a second, comparative experiment. A well-balanced training set was shown to improve the prediction accuracy metrics significantly within the scope of our measurement setup. Overall, our results provide insights into the influence of different operating temperatures on the algorithm performance and into how the choice of training data can increase the robustness of the prediction algorithms.


Introduction
Chemiresistive gas sensors are widely used to track gases of interest in various areas since they are low-cost and easy to deploy in sensor networks. The working principle behind this technology is based on the adsorption of gases on the conducting sensor surface and the measurement of the resistivity or conductivity of the material, which is influenced by the adsorbed molecules [1].
Often arranged in sensor arrays, these measurements are performed simultaneously with slightly different materials in order to create specific fingerprints for different gas types and to use pattern recognition algorithms for gas detection and concentration estimation [2]. The pattern recognition techniques, such as neural networks, are therefore trained on sample devices and then distributed to the other devices. For these algorithms, a high prediction accuracy is required in order to establish a precise assessment of the air quality at the locations where the sensing devices are placed.
However, when producing such sensors, small variations in the signal response are common between different sensor devices [3,4]. These so-called sensor-to-sensor variations are caused by slight differences in the overall physical properties of the sensors themselves or in the measurement process and, hence, change the input for the pattern recognition algorithms. This means that these variations can have an impact on the overall sensor performance and, hence, on the effectiveness of the sensors in high-quality environmental monitoring.
Previous research addressed such sensor-to-sensor variations by applying a calibration transfer to devices that inherit the prediction algorithms from another device. The majority of these approaches used standardization algorithms to map the measurements from a new device to the signal space of a master device by using a limited number of transfer samples for the calibration process. Different calibration transfer techniques have been used, for instance direct standardization [5], robust regression [6], global affine transformation [7], windowed piecewise direct standardization [8], transfer sample-based coupled task learning [9], and cross-domain adaptation extreme learning machines [10]. Additionally, global calibration models have been used in recent investigations in order to reduce calibration costs, as evaluated by Miquel-Ibarz et al. [11], for instance.
Temperature-related shifts were studied, in particular, by Bruins et al. [12]. Specifically, they investigated the influence of the operating temperature on the reproducibility of their sensor measurements and found that the heterogeneity of the measured data strongly increases when the temperature is shifted, especially in comparison to inter-sensor responses. This topic was further addressed by Fernandez et al. [13] in an experimental setup involving sensor measurement samples from the same device type with different operating temperatures for different gases. They compared several calibration transfer techniques to account for these MOX-related temperature shifts, concluding that piece-wise direct standardization leads to the best prediction results.
In our work, we study the robustness of a state-of-the-art gas regression algorithm based on a recurrent neural network in the presence of variations in the heating temperature of a graphene-based gas sensor device. Compared to previous work, we investigate more thoroughly the limits and margins of temperature variability that new pattern recognition algorithms can compensate for without calibration. We also test to which degree these effects can be addressed intrinsically in the algorithm design, without the need for additional transfer calibration samples or extra calibration layers for each new sensor. To investigate these objectives, we use a system-level model of a graphene-based gas sensor, which allows the temperature parameters to be defined exactly in the simulation and, hence, ensures adequate reproducibility compared to experimental measurements obtained with different sensor devices.

Methodology
To describe the methodology of our work, a short introduction to the stochastic sensor model that was used to create synthetic data for the study is given first. After that, the experimental setup of the study and the different simulation cases are presented.

Sensor Modeling
A system-level gas sensor model by Schober et al. [14,15] was used to generate signals with different variations in their temperature profile. Here, the adsorption and desorption of the gases of interest, for instance ozone, are simulated stochastically by discrete Markov processes on a sample grid representing the sensor surface. The adsorption probability $p_a$ and the desorption probability $p_d$ can be expressed by the equations

$p_a[\mathrm{gas}] = k_a \cdot c[\mathrm{gas}]$ and $p_d = k_d \cdot e^{-E/(k_B \cdot T)}$,

with $k_a$ and $k_d$ describing the interaction rates for the processes, $c[\mathrm{gas}]$ denoting the gas concentration, $E$ denoting the adsorption energy, $k_B$ denoting the Boltzmann constant, and $T$ denoting the temperature of the sensor surface. Note that the surface temperature has a direct influence on the desorption process on the sensor and, therefore, plays a crucial role in the generation of the sensor signal and its dynamics.
The sensor output signal is subsequently determined by calculating the adsorption fraction on the sample grid, i.e., the ratio of adsorbed sensor sites to the total number of sensor sites. The results are then mapped to a relative resistance in a separate part of the model. By using different parameters for the simulation procedure, different functionalizations of the sensing material can be modeled, leading to three different output signals that react slightly differently to certain gases.
The inputs for the sensor simulation are the concentration profile, which contains the time evolution of the ozone concentration, and the temperatures on the sensor surface with respect to time. The model parameters were chosen to fit measurements of a graphene-based gas sensor used in gas exposure measurements with different sensing materials.
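The adsorption and desorption process described in this section can be illustrated with a minimal sketch. It is not the model of Schober et al. [14,15]: it assumes independent sensor sites, an Arrhenius-type desorption probability, and hypothetical parameter values (`k_a`, `k_d`, `energy_ev`, `n_sites`), and it omits the mapping from the adsorption fraction to a relative resistance.

```python
import math
import random

K_B = 8.617e-5  # Boltzmann constant in eV/K


def desorption_probability(k_d, energy_ev, temp_k):
    """Assumed Arrhenius-type form p_d = k_d * exp(-E / (k_B * T)), capped at 1."""
    return min(1.0, k_d * math.exp(-energy_ev / (K_B * temp_k)))


def simulate_coverage(conc_ppb, temps_k, n_sites=400, k_a=2e-3, k_d=50.0,
                      energy_ev=0.2, seed=0):
    """Return the adsorbed fraction of sensor sites after each time step.

    conc_ppb and temps_k are equally long sequences (one value per step).
    All default parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    occupied = [False] * n_sites
    coverage = []
    for conc, temp in zip(conc_ppb, temps_k):
        p_a = min(1.0, k_a * conc)  # adsorption probability p_a = k_a * c
        p_d = desorption_probability(k_d, energy_ev, temp)
        for i in range(n_sites):
            if occupied[i]:
                # An occupied site releases its molecule with probability p_d.
                if rng.random() < p_d:
                    occupied[i] = False
            elif rng.random() < p_a:
                # A free site adsorbs a molecule with probability p_a.
                occupied[i] = True
        coverage.append(sum(occupied) / n_sites)
    return coverage
```

Calling `simulate_coverage` with a concentration pulse, once at a base temperature and once at a scaled temperature, gives a qualitative picture of how a hotter surface desorbs faster and therefore reaches a lower adsorption fraction.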

Experimental Setup
The data used in our studies are based on three different ozone concentration profiles with different characteristics, which are shown in the first row of Figure 1a-c. All profiles contain concentration values between 0 and 100 ppb of O3 and show different peak-shaped concentration blocks followed by phases of different length without O3. The simulations of the sensor response, which are shown below the profiles, were performed with different heating parameters between −15% and +15% deviation from the standard temperature settings. The sensor response is defined as the relative resistance of the sensor in percent. The surface temperature was modulated between the lower sensing temperature and the higher heating temperature.
Based on the results of the simulation, it can be concluded that the different heating temperatures have a visible influence on the sensor response. This can be explained by the desorption probability, which is highly dependent on the operating temperature. In general, higher temperatures lead to a stronger desorption effect, which then leads to a slower downward drift of the sensor signal. Vice versa, a lower temperature profile leads to a stronger drift in the signal. Additionally, the sensitivity to small concentration changes in the profile can also be influenced by the operating temperature.
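The temperature dependence of the desorption probability can be made explicit with a short worked equation. This is a sketch under two assumptions that the text does not confirm: the desorption probability has an Arrhenius-type form, $p_d = k_d \exp(-E/(k_B T))$, and the relative deviation $\delta$ is applied to an absolute temperature $T$. The value $E/(k_B T) = 10$ is hypothetical and serves only as an illustration.

```latex
\frac{p_d\big(T(1+\delta)\big)}{p_d(T)}
  = \exp\!\left(-\frac{E}{k_B T (1+\delta)} + \frac{E}{k_B T}\right)
  = \exp\!\left(\frac{E}{k_B T}\cdot\frac{\delta}{1+\delta}\right)
% Illustrative value E/(k_B T) = 10 (hypothetical):
%   \delta = +0.15  ->  \exp(10 \cdot 0.15/1.15)  \approx 3.7
%   \delta = -0.15  ->  \exp(-10 \cdot 0.15/0.85) \approx 0.17
```

Under these assumptions, even a deviation of 15% changes the desorption probability by a factor of several, which is consistent with the visible differences between the simulated sensor responses.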
The machine learning model used for pattern recognition was a neural network with one hidden recurrent (RNN) layer containing 50 GRU cells and a dense output layer for the gas concentration estimation. The models were trained by using early stopping and by reducing the learning rate over time. The input features for the model were the relative resistances of the three different sensor materials, their derivatives, and an additional set of features called energy vectors, which evaluate the mutual energy between any combination of two relative resistance responses.
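The structure of the input features can be sketched as follows. The derivative is taken as a simple backward difference, and the pairwise term is a hypothetical stand-in for the energy vectors, whose exact definition is not given here; the function names and the sampling interval `dt` are assumptions as well.

```python
from itertools import combinations


def derivative(signal, dt=1.0):
    """Backward finite difference; the first sample is assigned 0.0."""
    return [0.0] + [(b - a) / dt for a, b in zip(signal, signal[1:])]


def build_features(resistances, dt=1.0):
    """Build one feature row per time step.

    resistances: list of equally long signals, one per sensing material.
    Row layout: [R_1..R_n, dR_1..dR_n, one term per pair of materials].
    """
    derivs = [derivative(r, dt) for r in resistances]
    pairs = list(combinations(range(len(resistances)), 2))
    rows = []
    for t in range(len(resistances[0])):
        row = [r[t] for r in resistances]
        row += [d[t] for d in derivs]
        # Hypothetical stand-in for the "energy vector" features: the product
        # of two responses at time t. The actual definition may differ.
        row += [resistances[i][t] * resistances[j][t] for i, j in pairs]
        rows.append(row)
    return rows
```

For three sensing materials, this yields nine features per time step: three resistances, three derivatives, and three pairwise terms. Such rows would then be fed as a sequence into the recurrent layer.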

Results and Discussion
In our work, we want to answer two questions. First, to which degree, and at which deviation from the standard parameters, is a deterioration in the sensor algorithm performance observed due to temperature-related sensor-to-sensor variations? Second, can this deterioration be minimized by using additional temperature parametrizations in the training set in order to increase the stability of the neural network? In the following two sections, we summarize the results of our investigations on these topics.

Influence of Temperature Parameters on Algorithm Performance
The first part of the study addresses the question of how much the algorithm performance in predicting the O3 concentration decreases with increasing temperature deviation. To this end, the model trained with data from the standard temperature configuration (∆T = 0%) was tested on datasets with temperature variations between −15% and +15%. Profile A was the training dataset and Profile B the testing dataset. The results are shown in Figure 2a for negative variations and in Figure 2b for positive ones.
By analyzing these results, several effects can be seen. First of all, a steady decline in the model performance occurs for increasing deviation from the standard temperature settings. This effect starts to have a measurable impact at a deviation of |∆T| = 10%, whereas the performance shift at lower deviations appears to be rather small. This observation suggests that the model might tolerate a temperature margin of 5% without losing performance.
Moreover, the performance loss seems to be larger when the temperature settings are above the standard parameters (∆T > 0) than when they are below them. This might be explained by the change in sensitivity that arises from stronger heating, which might be captured less well by the machine learning model. It has to be mentioned, however, that the choice of the concentration profile can also influence the magnitude of the observed effects.
Overall, the data substantiate the claim that a variation in heating temperature can lead to a serious loss in the ozone concentration prediction capability of the machine learning model once a certain variation threshold is exceeded. Therefore, strategies to compensate for such effects appear to be necessary if the temperature-induced sensor-to-sensor variations are in the range of such values.
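The four metrics used in this comparison (RMSE, MAE, standard deviation, and R2 score) can be computed as in the following sketch. It assumes that the standard deviation refers to the spread of the prediction errors, which is not stated explicitly; the function name and the dictionary keys are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, error standard deviation, and R2 score.

    y_true must not be constant, otherwise the R2 score is undefined.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_err = sum(errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: "standard deviation" is the spread of the prediction errors.
    std = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "std": std, "r2": r2}
```

Evaluating the same trained model once per test temperature setting and collecting these dictionaries gives the kind of per-deviation comparison shown in Figure 2.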

Algorithm Robustness Analysis with Mixed Training Settings
In a second study, training the neural network with one temperature setting (representing the use of one sample device) was compared to training it with a diverse dataset comprising several temperature settings (∆T ∈ {−10%; 0%; +10%}). To this end, the training set (Profile B) and the test set (Profile C) were simulated under these different temperature conditions in order to train two models and to test them on the differently configured test sets. The results are shown in Figure 3.
The experiments show that the model trained with different temperature settings outperforms the single-temperature model in each test set configuration, measured in both RMSE and MAE. This substantiates the hypothesis that the stability of the algorithm can benefit greatly from a well-balanced dataset comprising different temperature configurations, which avoids overfitting to the standard temperature settings.
Furthermore, it is noticeable that, even for the test set with the standard temperature settings, the prediction performance metrics seem to improve. This indicates that, even for these settings, the algorithm can benefit from the variations in the training set, leading to an overall more robust outcome and better generalization to other concentration profiles.
In addition, it has to be pointed out that the choice of the concentration profile can also have an impact on how visible these differences are. Therefore, a different train/test configuration was chosen in this investigation to show the general scope of the stability enhancement.
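The construction of a balanced mixed-temperature training set can be sketched as follows. The helper names, the base temperature, and the `toy_simulate` placeholder are hypothetical; in practice the sensor simulation model would take the place of the placeholder. Each temperature setting contributes one run of the same concentration profile, so all settings are weighted equally.

```python
def scaled_temperature(base_temp, deviation_pct):
    """Apply a relative deviation in percent to a base temperature."""
    return base_temp * (1.0 + deviation_pct / 100.0)


def build_training_runs(simulate, profile, base_temp,
                        deviations_pct=(-10, 0, 10)):
    """Simulate the same profile once per temperature deviation.

    simulate: callable(profile, temperature) -> list of sensor responses.
    Returns one run per deviation so that every setting is equally represented.
    """
    runs = []
    for dev in deviations_pct:
        temp = scaled_temperature(base_temp, dev)
        runs.append({
            "deviation_pct": dev,
            "inputs": simulate(profile, temp),
            "targets": list(profile),
        })
    return runs


def toy_simulate(profile, temp):
    """Placeholder response that shrinks as the temperature rises."""
    return [conc / temp for conc in profile]
```

A single-temperature baseline corresponds to calling `build_training_runs` with `deviations_pct=(0,)`, which makes the two training configurations directly comparable.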

Conclusions
In this work, we studied the influence of different temperature settings on the prediction performance of a neural network-driven graphene-based gas sensor system by using synthetic data from a stochastic sensor model. In our first experiment, we found that there is a threshold of around 10% deviation from the standard temperature parameters beyond which substantial performance loss can occur. Additionally, it was found that the performance metrics decreased more for positive temperature variations than for negative ones. A subsequent experiment suggests that enriching the training set with data from different temperature settings around the found thresholds (−10% and +10%) substantially improved the prediction outcome for the given data configuration. Such an effect was seen even for the standard temperature case. It should be noted that the train and test set configuration also has an effect on how pronounced the improvement is. Overall, our study substantiates the need for enriching the dataset in the training process of such sensors if such sensor-to-sensor variations are likely to occur. However, additional simulation studies and also experimental investigations might be needed to further consolidate these findings for more complex settings and profiles.
In future research, it would be important to also investigate other sensor-to-sensor variations that can occur, such as differences in the sensitivity response and different drift levels, in order to see whether such variations have similar effects on the model performance and whether they could also profit from a more diverse training set.

Figure 1. Concentration profiles and their simulated sensor outputs for different temperature settings. Profiles A, B, and C are shown in (a), (b), and (c), respectively.

Figure 2. Algorithm performance evaluated with four different metrics (RMSE, MAE, standard deviation, and R2 score) for O3 measurements. (a) shows the negative temperature variations, whereas (b) shows the positive temperature variations.

Figure 3. Comparison of the performance between a model trained on a single temperature setting (purple, ∆T = 0) and a model trained on three different temperature settings (pink, ∆T ∈ {−10%, 0%, +10%}) in terms of two error metrics (RMSE and MAE) in ppb. The models were evaluated on the same test set, simulated with the temperature setting shown on the x-axis.