Dual Oxygen and Temperature Luminescence Learning Sensor with Parallel Inference

A well-known approach to the optical measurement of oxygen is based on the quenching of luminescence by molecular oxygen. The main challenge for this measuring method is the determination of an accurate mathematical model for the sensor response. The reason is the dependence of the sensor signal on multiple parameters (such as oxygen concentration and temperature), which interfere with one another in a sensor-specific way. The common solution is to measure the different parameters separately, for example, with different sensors. Then, an approximate model is developed in which these effects are parametrized ad hoc. In this work, we describe a new approach for the development of a learning sensor with parallel inference that overcomes these difficulties. We show how to generate a very large dataset of measurements automatically and autonomously, and how to use it to train the proposed neural-network-based signal processing. Furthermore, we demonstrate how the sensor exploits the cross-sensitivity of multiple parameters to extract them from a single set of optical measurements, without any a priori mathematical model and with unprecedented accuracy. Finally, we propose a new metric to characterize the performance of neural-network-based sensors, the Error Limited Accuracy. The methods described here are not limited to oxygen and temperature sensing. They can similarly be applied to sensing with multiple luminophores whenever the underlying mathematical model is unknown or too complex.


Introduction
The simultaneous determination of multiple physical quantities can be very advantageous in many sensor applications, for example, when an in-situ or a remote acquisition is required. If the physical effect on which the measurement method is based presents cross-sensitivity between more than one quantity, their simultaneous determination becomes a necessity. Optical luminescence sensing is particularly attractive for multiple sensing. Since several parameters can be measured using the same principle, namely luminescence, it is possible to use the same illumination or detection channels, thus allowing a compact and simple sensor design.
The typical approaches to multiple sensing are based on either the use of a single luminescence indicator (luminophore), in which the luminescence is sensitive to more than one physical quantity, or the use of several luminophores, one for each quantity, embedded in a substrate and placed in close physical proximity [1][2][3][4][5][6][7][8][9]. To be able to determine each quantity separately, it may be necessary to determine more than one optical property (e.g., absorption spectrum, emission spectrum, luminescence intensity, decay time). Another possibility is to measure one single optical property using special detection schemes that take advantage of the emission properties of the luminophores used [4,6,10,11,12,13].
The problem of dual sensing is particularly relevant in applications that involve oxygen sensing. Since oxygen plays a major role for living organisms, the measurement of oxygen partial pressure is of great relevance in fields which range from medicine and biotechnology, to environmental monitoring [4,14]. One of the most used optical measuring approaches is based on dynamical luminescence quenching. When colliding with molecular oxygen, the energy of the excited luminophore is reduced due to radiationless deactivation. As a result, both the intensity and decay time of the luminescence are reduced (quenched) [15]. The dependence of the measured sensing quantity (e.g., luminescence intensity or decay time) on the relevant influencing factors needs to be described through mathematical models with a sufficiently complex parametrization. Among the cross-interfering quantities, temperature is the most relevant since both the luminescence and its quenching are strongly temperature-dependent phenomena. Therefore, in any optical oxygen sensor, the temperature must be continuously monitored, most frequently with a separate sensor, and used to correct the calculated oxygen concentration [16]. This task can be difficult in practical implementation and may become a significant source of error. Another disadvantage of this approach is that the parametrization of the sensor response is system-specific since it depends on how the sensing element was fabricated and on the sensor itself [17][18][19][20][21][22].
In this work, these difficulties are overcome through a new approach for sensor development based on neural networks for parallel inference. This enables accurate dual sensing using a single luminophore and measuring a single quantity. Instead of describing the response of the sensor as a function of the relevant parameters through an analytical model, a neural network was designed and trained to predict both oxygen concentration and temperature simultaneously. Multi-task learning (MTL) architectures were chosen for this new approach because they can learn correlated tasks [23][24][25][26][27][28]. In a previous, purely theoretical study that used only synthetic data, the authors showed that MTL architectures can be flexible enough to address multi-dimensional regression problems [29], as required by this new type of sensor. This work demonstrates for the first time that this is indeed true by building, training and characterizing a real physical optical sensor based on this principle. To train the MTL neural network and to test the performance of the sensor on unseen data, a very large amount of data is needed. Since the collection cannot be performed by hand, a fully automated data collection setup was developed and used both to vary the sensor environment conditions (gas concentration and temperature) and to collect the sensor response.
This work proposes a paradigm shift from the classical description of the response of a sensor through an approximate model to MTL-based sensor learning with neural networks. These learn the complex inter-parameter dependencies and sensor-specific response characteristics from a large amount of automatically collected data. This new method makes it possible to build sensors even if the response of the system to the physical quantities is too complex to be conveniently described by a mathematical model.

Measurement Principle
Luminescence-based oxygen sensors are usually based on a luminophore whose luminescence intensity and decay time decrease with increasing O2 concentration. This reduction is due to collisions of the excited luminophore with molecular oxygen, which provide a radiationless deactivation process. The dependence of the luminescence intensity and decay time of the luminophores used for oxygen sensing is best described by the Stern-Volmer (SV) equation [15]. Using a frequency-domain approach, the phase shift between the excitation and the emitted luminescence can be approximated using a two-site model [30,31] and written as [32]

$$\frac{\tan\theta_0(\omega, T)}{\tan\theta(\omega, T, [O_2])} = \left( \frac{f(\omega, T)}{1 + K_{SV1}(\omega, T)\,[O_2]} + \frac{1 - f(\omega, T)}{1 + K_{SV2}(\omega, T)\,[O_2]} \right)^{-1} \tag{1}$$

where θ0 and θ are the phase shifts without and with oxygen, respectively, f and 1 − f are the fractions of the total emission for the two components, K_SV1 and K_SV2 are the corresponding Stern-Volmer constants, and ω is the angular modulation frequency. It is to be noted that the quantities θ0, f, K_SV1, and K_SV2 are all temperature dependent [33][34][35]. Additionally, they also depend on the modulation frequency, which in the case of f, K_SV1, and K_SV2 is an artifact due to the approximate nature of the model. Finally, Equation (1) needs to be inverted to determine [O2] from the measured quantity θ.
From Equation (1) it is evident that the phase shift cannot easily be used to determine the oxygen concentration unless ω and T, as well as the parameters f, K_SV1 and K_SV2 (including their dependence on ω and T), are known. The proposed sensor not only overcomes the above-mentioned difficulties in finding an approximate mathematical model, but also allows the determination of multiple quantities simultaneously.
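To make the inversion step concrete, the following minimal sketch evaluates the two-site model at one fixed pair (ω, T) and recovers [O2] from a phase shift by bisection. All parameter values (θ0 = 60°, f = 0.8, K_SV1 = 0.05, K_SV2 = 0.005 per % air) are hypothetical and chosen only for illustration; they are not fitted values from this work, and the function names are likewise illustrative.

```python
import math


def sv_ratio(o2, f, ksv1, ksv2):
    """Two-site Stern-Volmer ratio tan(theta0)/tan(theta) at fixed (omega, T)."""
    return 1.0 / (f / (1.0 + ksv1 * o2) + (1.0 - f) / (1.0 + ksv2 * o2))


def phase_shift_deg(o2, theta0_deg, f, ksv1, ksv2):
    """Forward model: phase shift (degrees) for a given oxygen concentration."""
    tan_theta = math.tan(math.radians(theta0_deg)) / sv_ratio(o2, f, ksv1, ksv2)
    return math.degrees(math.atan(tan_theta))


def invert_o2(theta_deg, theta0_deg, f, ksv1, ksv2, o2_max=100.0, tol=1e-10):
    """Recover [O2] from a phase shift by bisection.

    Works because sv_ratio increases monotonically with o2 for
    non-negative Stern-Volmer constants.
    """
    target = math.tan(math.radians(theta0_deg)) / math.tan(math.radians(theta_deg))
    lo, hi = 0.0, o2_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sv_ratio(mid, f, ksv1, ksv2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters, valid only for one (omega, T) pair.
params = dict(theta0_deg=60.0, f=0.8, ksv1=0.05, ksv2=0.005)
theta = phase_shift_deg(20.0, **params)
recovered = invert_o2(theta, **params)
print(f"theta = {theta:.2f} deg, recovered [O2] = {recovered:.4f} % air")
```

The sketch only works because θ0, f, K_SV1 and K_SV2 are assumed known for the given ω and T, which is exactly the requirement that makes the model-based approach difficult in practice.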

Experimental Setup and Dataset
The luminophore used for oxygen detection is Pt-TFPP, commercially available as Oxygen Sensor Spot (PSt3, PreSens GmbH, Regensburg, Germany). The optical setup for the luminescence measurements is described in [36]. The large amount of data needed for the training and the test of the neural network was acquired using an automated acquisition program written using the software LabVIEW by National Instruments. The flow chart of the automated data acquisition program is shown in Figure 1.
First, the program fixed the temperature and the oxygen concentration of the gas in contact with the sensor. Then, the phase shift was measured for 50 modulation frequencies between 200 Hz and 15 kHz. This measurement was repeated 20 times. Next, keeping the temperature fixed, the program changed the oxygen concentration, and the entire frequency loop was repeated. The oxygen concentration was varied between 0% air and 100% air in 5% air steps. Finally, the temperature was changed, and the oxygen and frequency loops were repeated. The temperature was varied between 5 °C and 45 °C in 5 °C steps. The total number of measurements was thus 50 (frequencies) times 20 (loops) times 21 (oxygen concentrations) times 9 (temperatures), for a total of 189'000, which required a total acquisition time of approximately 65 h. This number of measurements was chosen as a compromise between maximizing the amount of data and avoiding photodegradation, which naturally occurs when the sample is subjected to illumination.
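The size of the acquisition grid can be checked with a short sketch. The variable names are illustrative and not taken from the LabVIEW program; the spacing of the 50 modulation frequencies is not stated here, so the frequencies are only counted, not generated.

```python
from itertools import product

N_FREQUENCIES = 50   # modulation frequencies per sweep (200 Hz to 15 kHz)
N_REPEATS = 20       # repetitions of the frequency sweep per condition

oxygen_levels = list(range(0, 101, 5))   # % air: 0, 5, ..., 100
temperatures = list(range(5, 46, 5))     # degrees C: 5, 10, ..., 45

# Every (temperature, oxygen) pair is one environmental condition.
conditions = list(product(temperatures, oxygen_levels))
n_sweeps = len(conditions) * N_REPEATS        # frequency sweeps in total
n_measurements = n_sweeps * N_FREQUENCIES     # single phase-shift readings

print(len(oxygen_levels), len(temperatures), n_sweeps, n_measurements)
# prints: 21 9 3780 189000
```

If each frequency sweep is treated as one observation for the neural network (an assumption consistent with the 50-value input vectors described later), the dataset contains 3780 observations.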

Signal Processing Algorithm
The software component of this new sensor type is based on a neural network model (NNM). An NNM is made of three components [37]: a neural network architecture (which includes how the neurons are connected, the activation functions and all the hyperparameters), a loss function (here indicated with L) and an optimizer algorithm. In this work we use a multi-task learning (MTL) network architecture [25]. This architecture has different branches, each able to learn to predict a separate quantity (in our case one for T and one for [O2]). The details and parameters of the neural network architecture, of the loss function and of the optimizer used in this work are studied and described in detail in [29] and will not be repeated here.
The network was trained with two types of input to test its effectiveness. In the first case, each observation consists of a vector of 50 values defined as

$$\boldsymbol{\theta}_s = \left( \frac{\theta(\omega_1)}{90}, \frac{\theta(\omega_2)}{90}, \ldots, \frac{\theta(\omega_{50})}{90} \right) \tag{2}$$

where ω_i are the 50 values of the angular modulation frequency of the excitation light (see Section 2.2). The measured phase shift, in degrees, was divided by 90 to normalize the inputs between 0 and 1. In the second case, each observation is

$$\boldsymbol{\theta}_n = \left( \frac{\theta(\omega_1)}{\theta_0(\omega_1)}, \frac{\theta(\omega_2)}{\theta_0(\omega_2)}, \ldots, \frac{\theta(\omega_{50})}{\theta_0(\omega_{50})} \right) \tag{3}$$

where θ0(ω_i) is the value of the measured phase shift without oxygen quenching at the angular modulation frequency ω_i.
The loss function was minimized using the Adaptive Moment Estimation (Adam) optimizer [37,38]. The implementation was performed using the TensorFlow™ library. The training was performed with a starting learning rate of $10^{-3}$. Two types of training were investigated to compare the training efficiency and performance of the network.
No-batch training: with this method, all the training data are used to evaluate the loss function and to perform a single update of the weights.
Mini-batch training: with this method, the weights are updated after the network has seen 32 observations (this number is called the mini-batch size [37]). For each update of the weights, 32 random observations are chosen from the training dataset without repetition until all the training data have been fed to the network. The size of the mini-batch was chosen as a compromise between good performance (measured through the value of the loss function) and the duration of the training.
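The two input types described in this section can be sketched as follows. The sketch assumes that phase shifts are given in degrees and that the normalized input is the element-wise ratio between the measured phase shift and the zero-oxygen phase shift at the same frequency; the function names and the three example values are hypothetical (a real observation contains 50 values).

```python
def theta_s(phase_deg):
    """Scaled input: each phase shift (in degrees) divided by 90."""
    return [p / 90.0 for p in phase_deg]


def theta_n(phase_deg, phase0_deg):
    """Normalized input: phase shift divided by the zero-oxygen phase shift
    measured at the same modulation frequency (assumed element-wise ratio)."""
    if len(phase_deg) != len(phase0_deg):
        raise ValueError("both sweeps must cover the same frequencies")
    return [p / p0 for p, p0 in zip(phase_deg, phase0_deg)]


# Hypothetical sweep over three frequencies instead of 50.
measured = [9.0, 27.0, 45.0]     # with oxygen
reference = [18.0, 45.0, 60.0]   # without oxygen, same temperature
print(theta_s(measured))
print(theta_n(measured, reference))
```

Note that the second input requires a zero-oxygen reference sweep, so it feeds additional information to the network compared with the first.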
No-batch training has the advantage of stability and requires less time for each epoch, since it performs one update of the weights using the entire training dataset. Mini-batch training is normally more effective in reaching small values of the loss function in fewer epochs, but it requires more time for each epoch [37]. In our experiments, training for 20'000 epochs took roughly five minutes with no-batch training and approximately 1 h with mini-batch training with a mini-batch size of 32, which is thus about 12 times slower. The training was performed on a 2.2 GHz 6-core Intel Core i7 with 32 GB of RAM. No GPU acceleration was used.
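The difference between the two training modes comes down to the number of weight updates per epoch. The sketch below shows one way to draw mini-batches without repetition and counts the updates; it is not the TensorFlow code used in this work. The training-set size of 3024 is an assumption (80% of 3780 observations, taking one frequency sweep as one observation).

```python
import math
import random


def iterate_minibatches(n_observations, batch_size=32, seed=None):
    """Yield lists of indices covering every observation exactly once per epoch."""
    rng = random.Random(seed)
    indices = list(range(n_observations))
    rng.shuffle(indices)
    for start in range(0, n_observations, batch_size):
        yield indices[start:start + batch_size]


N_TRAIN = 3024  # assumption: 80% of 3780 observations
batches = list(iterate_minibatches(N_TRAIN, batch_size=32, seed=0))

updates_no_batch = 1                              # one update per epoch
updates_mini_batch = math.ceil(N_TRAIN / 32)      # one update per mini-batch
slowdown = 60 / 5                                 # 1 h versus 5 min for 20'000 epochs

print(len(batches), updates_mini_batch, slowdown)
```

Under this assumption each mini-batch epoch performs 95 weight updates instead of one, which is consistent in spirit with the longer time per epoch reported above, although the measured slowdown (about 12 times) is much smaller than the ratio of update counts.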

Sensor Performance Evaluation
To evaluate the performance of the sensor, the dataset S of measured data was divided into two parts: one containing 80% of randomly chosen observations (indicated with S_train), and one containing the remaining 20% of the data (indicated with S_test). All the results presented here were obtained by evaluating the different metrics on the S_test dataset.
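A random 80/20 split of this kind can be sketched in a few lines. The seed, the function name and the dataset size of 3780 observations are assumptions made for the example only.

```python
import random


def train_test_split(observations, train_fraction=0.8, seed=42):
    """Randomly split observations into a training and a test part."""
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Observation indices stand in for the measured input vectors.
s_train, s_test = train_test_split(range(3780))
print(len(s_train), len(s_test))
```

Because the split is random over all observations, repeated sweeps taken under the same (temperature, oxygen) condition can end up in both parts.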
The metric used to compare predictions with expected values is the absolute error (AE), defined as the absolute value of the difference between the predicted and the expected value for a given observation. The mean of the AE over all observations of a given dataset is the mean absolute error (MAE), which is a further metric used to characterize the performance. In Section 3, the distribution of the AEs for both the oxygen and the temperature predictions is discussed in detail. To better illustrate this distribution, the kernel density estimate (KDE) of the AEs was also evaluated. Details on the calculation of the AE, MAE and KDE can be found in [29].
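For orientation, the three metrics can be sketched as below. The AE and MAE follow directly from the definitions in the text; the KDE uses a Gaussian kernel with a fixed, hand-picked bandwidth, which is an assumption and may differ from the procedure in [29]. The example values are hypothetical.

```python
import math


def absolute_errors(predicted, expected):
    """AE for every observation."""
    return [abs(p - e) for p, e in zip(predicted, expected)]


def mean_absolute_error(predicted, expected):
    """MAE: mean of the AE over all observations."""
    errors = absolute_errors(predicted, expected)
    return sum(errors) / len(errors)


def gaussian_kde(samples, x, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


# Hypothetical oxygen predictions in % air.
expected = [20.0, 40.0, 60.0, 80.0]
predicted = [20.25, 39.5, 60.0, 81.0]
errors = absolute_errors(predicted, expected)
print(errors, mean_absolute_error(predicted, expected))
print(gaussian_kde(errors, 0.5, bandwidth=0.2))
```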

Error Limited Accuracy
Generally, in a commercial sensor, the accuracy quantifies the performance of the sensor and helps to decide if the chosen device is appropriate for the application of interest. The above-defined metrics (AE, MAE and KDE) are useful to compare the performance of different NNMs but do not help to quantify which error the sensor reading will ultimately have in practice. For this reason, in this work we introduce a new metric, called Error Limited Accuracy (ELA) and indicated with η.
Definition 1. In a regression problem, given the metric AE and a chosen value of it, $\widehat{AE}$, the ELA η limited by the error $\widehat{AE}$ is defined as the number of predictions $\hat{y}$ of the NNM that lie in the range $|\hat{y} - y| \le \widehat{AE}$, with y the expected value, divided by the total number of observations. It will be indicated with $\eta(\widehat{AE})$. Given the set

$$E(\widehat{AE}) = \left\{ \hat{y}^{[i]},\ i = 1, \ldots, n \;:\; |\hat{y}^{[i]} - y^{[i]}| \le \widehat{AE} \right\}$$

the ELA can be written as

$$\eta(\widehat{AE}) = \frac{|E(\widehat{AE})|}{n}$$

where $|E(\widehat{AE})|$ is the cardinality of the set $E(\widehat{AE})$ or, in other words, the number of its elements, n is the total number of observations, and $y^{[i]}$ and $\hat{y}^{[i]}$ are respectively the expected and predicted value of the target variable for observation i.
This metric allows the regression problem to be interpreted as a classification one. $\eta(\widehat{AE})$ simply describes how many observations are predicted by the NNM within a given value of the absolute error. In other words, it represents the fraction of predictions that are within a certain error $\widehat{AE}$. Therefore, if $\widehat{AE}$ is made big enough, all the predictions will be classified correctly, so $\eta(\widehat{AE})$ is expected to approach 1 for increasing $\widehat{AE}$. On the other hand, the smaller $\widehat{AE}$ is, the lower the number of predictions correctly classified. We finally define $\overline{AE}$ as the minimum value of $\widehat{AE}$ for which $\eta(\widehat{AE}) = 1$, that is, the minimum value of the absolute error for which the network predicts all the observations correctly. This value ($\overline{AE}$) can be interpreted as the biggest error in the sensor predictions.
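The definition translates directly into code. In the minimal sketch below, the function names and the four example predictions are hypothetical; the smallest error limit for which the ELA reaches 1 is simply the largest absolute error in the dataset.

```python
def error_limited_accuracy(predicted, expected, ae_limit):
    """ELA: fraction of predictions with absolute error <= ae_limit."""
    within = sum(1 for p, e in zip(predicted, expected) if abs(p - e) <= ae_limit)
    return within / len(predicted)


def smallest_limit_for_full_accuracy(predicted, expected):
    """Smallest error limit with ELA = 1, i.e., the maximum absolute error."""
    return max(abs(p - e) for p, e in zip(predicted, expected))


# Hypothetical oxygen predictions in % air.
expected = [20.0, 40.0, 60.0, 80.0]
predicted = [20.25, 39.5, 60.0, 81.0]

print(error_limited_accuracy(predicted, expected, 0.5))   # 3 of 4 within 0.5
print(smallest_limit_for_full_accuracy(predicted, expected))
```

Evaluating the first function over a range of error limits gives the monotonically increasing curve that approaches 1 as the limit grows.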

Pt-TFPP Luminescence
As described in Section 2.1, the phase shift depends non-linearly on the oxygen concentration according to the Stern-Volmer equation. It also depends on the temperature, which influences the luminescence and the collision mechanisms, and on the modulation frequency of the excitation light. The experimental observations of the phase shift for variations of these three quantities are shown in Figures 2-4. Figure 2 shows the measured phase shifts as a function of the oxygen concentration at a constant modulation frequency of 6 kHz and for increasing temperatures. For clarity, only the results at selected temperatures are shown. The decrease of the phase shift due to collisional quenching is clearly visible in all curves. The phase shift is, as expected, also strongly temperature-dependent. For [O2] = 0, in the absence of oxygen, the reduction of the phase shift with increasing T is due to temperature quenching; the influence of temperature becomes stronger at higher oxygen concentrations, as a result of the increase of the diffusion rates of oxygen through the sample.
For a given oxygen concentration, the phase shift is strongly dependent on the modulation frequency, as can be seen in Figure 3, where the shape of the frequency response is determined by the distribution of decay times of the sample. The figure shows that the reduction of the phase shift with increasing temperature is not constant but depends on the modulation frequency.
For completeness, the effect of the oxygen concentration on the frequency response at a fixed temperature is shown in Figure 4. Compared to Figure 3, the frequency response of the sample is affected more strongly by the oxygen concentration than by temperature. In other words, the sample has a higher sensitivity to oxygen than to temperature.
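As a rough illustration of why the frequency response reflects the distribution of decay times, the sketch below uses the standard frequency-domain expression for a multi-exponential decay, in which the phase shift follows from sine and cosine transforms of the decay. This textbook relation is not taken from this work, and the amplitudes and decay times are hypothetical values, not measured properties of Pt-TFPP.

```python
import math


def phase_shift_deg(freq_hz, amplitudes, lifetimes_s):
    """Phase shift (degrees) of a multi-exponential decay at one modulation frequency.

    For a decay sum(a_i * exp(-t / tau_i)), tan(theta) = N / D with
    N = sum(a_i * omega * tau_i**2 / (1 + (omega * tau_i)**2)) and
    D = sum(a_i * tau_i / (1 + (omega * tau_i)**2)).
    """
    omega = 2.0 * math.pi * freq_hz
    pairs = list(zip(amplitudes, lifetimes_s))
    n = sum(a * omega * t * t / (1.0 + (omega * t) ** 2) for a, t in pairs)
    d = sum(a * t / (1.0 + (omega * t) ** 2) for a, t in pairs)
    return math.degrees(math.atan2(n, d))


# Hypothetical two-component decay: a long and a short decay time.
amplitudes = [0.7, 0.3]
lifetimes = [50e-6, 10e-6]   # seconds

for freq in (200.0, 6000.0, 15000.0):
    print(freq, round(phase_shift_deg(freq, amplitudes, lifetimes), 2))
```

For a single decay time the expression reduces to tan θ = ωτ, so shortening the decay time (by oxygen or temperature quenching) lowers the phase shift at every frequency, but by a frequency-dependent amount.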
The measurements of Figures 2-4 show how similar the curves of the phase shift are for different values of oxygen concentration, temperature and modulation frequency. This helps to understand why it is not possible to determine both the oxygen concentration and the temperature simultaneously using Equation (1), either from a single measurement of the phase shift or even from the phase shift at varying modulation frequencies. The temperature must be known in advance and used to compute the oxygen concentration. This is no longer the case for the proposed sensor, as will be shown in the next section.

Sensor Performance
First, the effect of the training on the sensor performance was investigated. As described previously, the neural network was trained both without batches (no-batch) and with mini-batches. For this comparison the network was trained for 20'000 epochs using the input observations θ_s as defined in Equation (2). The results for AE_[O2] and AE_T are shown in Figure 5A,B, respectively. The blue histogram shows the AE distribution obtained with no-batch training, the gray one the distribution obtained with mini-batches of size 32. The KDE profiles help to illustrate the features of the histograms. The effect of introducing mini-batches on the performance is significant. The prediction distributions become much narrower, and the mean absolute errors decrease from MAE_[O2] = 2.4% air and MAE_T = 3.6 °C to MAE_[O2] = 1.4% air and MAE_T = 1.6 °C. Although the performance is significantly improved, Figure 5A,B also clearly shows that errors as high as approximately 5% air for [O2] or 12 °C for T are still possible.
Figure 5C,D shows the effect of the training length. Here the comparison is between prediction distributions obtained with 20'000 and 100'000 epochs (always using a mini-batch size of 32), using the input observations θ_s as defined in Equation (2). The effect of longer training is a dramatic improvement in the performance. When the network was trained for 100'000 epochs, the mean absolute errors were reduced to only MAE_[O2] = 0.22% air and MAE_T = 0.27 °C. Additionally, all the absolute errors for [O2] lie below 0.94% air, and those for T lie below 2.1 °C. The results of Figure 5C,D demonstrate two new findings: (1) with the proposed approach, it is possible to predict both [O2] and T at the same time from the phase shift using a single luminophore and a single set of measurements; (2) the prediction has an expected error that is comparable to or below the typical accuracy of commercial sensors. The possibility of dual sensing paves the way for the development of a completely new generation of sensors.
The price to pay is that the training of a network for 100'000 epochs requires approximately 5 h on the hardware described earlier.
To investigate whether the training can be performed more efficiently, the normalized phase shift θ_n defined in Equation (3) was used as input to the network. The performance of the network in this case, with a mini-batch size of 32 and a training of 20'000 epochs, is shown in Figure 5E,F. With this input the performance is further improved: even though the number of epochs is only 20'000, the mean absolute errors are better than those obtained with θ_s and a training of 100'000 epochs, reaching MAE_[O2] = 0.13% air and MAE_T = 0.24 °C. The distributions are also narrower, particularly for the temperature. Additionally, all the AE_[O2] lie below 0.87% air, and all the AE_T below 1.7 °C. This type of training is clearly more efficient. The reason may lie in the additional information that is fed to the network when using the input θ_n and in the simplified functional behavior of θ_n compared to θ_s (see Equation (1)).
The performance of the different neural networks is summarized in Table 1. The response time of the sensor is due to the sum of two contributions: the actual measurement time of the phase shift and the time needed by the algorithm to calculate the oxygen concentration and temperature. The measurement time for 50 frequencies with our setup was below one minute but could be easily improved by reducing the time delays in the communication between the various instruments.

Error Limited Accuracy
The metrics discussed in the previous sections are useful to compare the network performance and to measure how good the predictions are. However, they do not offer an understanding of what a sensor built with such a model could achieve. For practical applications, the relevant question is rather what the maximum error of the sensor will be in predicting the oxygen concentration and the temperature. To answer this question, the ELA (η) defined in Section 2.4 can be used. Figure 6 displays the ELA η for the oxygen concentration (A) and for the temperature (B). In each panel, the results obtained with the best models described before are shown: the ELAs using the input θ_n and a training of 20'000 epochs are shown in black, and the ELAs obtained using the input θ_s and a training of 100'000 epochs in red. In both cases, the training was performed with mini-batches of size 32. The dashed lines indicate the values $\overline{AE}_{[O_2]}$ and $\overline{AE}_T$ of the absolute error for which the error limited accuracy η equals 1. In other words, all the predictions will have an error equal to or smaller than $\overline{AE}$.
From Figure 6A it can be seen that, for the network trained with θ_s as input, the model would predict all the oxygen concentrations within an error of 0.95% air. For the network trained with θ_n this value is further reduced to 0.87% air. $\overline{AE}_{[O_2]}$ can be interpreted as the accuracy that a sensor based on this NNM would have. Figure 6B shows the results of the same analysis for the temperature measurement. The interpretation is similar to the one given above for the oxygen concentration. For the network trained with θ_s as input, the model would predict all the temperature values within an error of $\overline{AE}_T$ = 2.1 °C. For the network trained with θ_n this value would be $\overline{AE}_T$ = 1.7 °C. The values of $\overline{AE}_{[O_2]}$ and $\overline{AE}_T$ are summarized in Table 2.
Figure 6. ELA η for the oxygen concentration (A) and for the temperature (B). The black lines are the results obtained with a network that was trained with θ_n as input for 20'000 epochs with mini-batches of size 32, the red ones with θ_s as input for 100'000 epochs with mini-batches of size 32. The dashed lines indicate the values of the absolute error for which the predictions would give η = 1.
Table 2. Summary of the values of $\overline{AE}$ for the cases shown in Figure 6A,B.

Conclusions
In this work, the realization of a new type of sensor based on luminescence sensing is presented. The proposed sensor allows parallel inference, that is, the simultaneous extraction of multiple physical quantities from a single set of measurements without any a priori mathematical model, even in the presence of cross-interferences. Classical approaches to this type of problem in physics can be challenging or impossible to apply if the mathematical models describing the functional dependencies are too complex or even unknown. This sensor, which uses a single luminophore and a single measuring channel, can measure both the oxygen concentration and the temperature of a medium simultaneously. This is achieved using a multi-task learning neural network model, which was trained on a very large dataset. The results in the prediction of the oxygen concentration and temperature show unprecedented accuracy for both parameters, demonstrating that this approach could open up the possibility of a new generation of dual- or even multiple-parameter sensors. Estimating the accuracy of a sensor based on a given NNM approach is intrinsically difficult. For this reason, the new metric Error Limited Accuracy (ELA) is proposed. The ELA makes it possible to estimate how many predicted values lie within a certain absolute error from the expected measurement. This new metric therefore allows the estimation of the maximum measurement error of any NNM-based sensor.
The ability to predict both [O2] and T at the same time, from a single set of data obtained with a single indicator, has profound implications for the development of luminescence sensors. Sensors will become easier and cheaper to build, since separate temperature measurements are no longer necessary. More generally, this work shows that the effect of interferences can be learned by the neural network and does not need to be corrected for in the data processing.
This work opens the road to completely new optical sensing approaches for future generations of sensors. Those sensors will be able to extract multiple physical quantities from a common set of data at the same time, achieving consistent results that are both accurate and stable. The described approach is relevant for many practical applications in sensor science and demonstrates that a model-free approach has the potential to revolutionize optical sensing.