Multivariate Analysis of Photoacoustic Spectra for the Detection of Short-Chained Hydrocarbon Isotopologues

We report, to our knowledge, the first optical detection scheme for short-chained hydrocarbon isotopologues. The sensor system is based on photoacoustic spectroscopy (PAS). Two continuous wave, thermoelectrically cooled, distributed feedback interband cascade lasers (DFB-ICLs) with emission wavelengths around 3.33 and 3.38 μm, respectively, served as light sources. The investigations comprised the main stable carbon isotopologues of methane (12CH4, 13CH4), ethane (12CH3-12CH3, 13CH3-12CH3, 13CH3-13CH3), and propane (12CH3-12CH2-12CH3, 13CH3-12CH2-12CH3). They were selected because of their importance for numerous applications from climate and planetary research to natural gas exploration. Multiple measurements of single components in nitrogen and synthetic mixtures were conducted at room temperature and atmospheric pressure. Depending on the investigated hydrocarbon isotopologue, detection limits ranging from 0.043 ppmv to 3.4 ppmv were achieved. For a selective concentration determination, multivariate analysis (MVA) was applied. Partial least-squares regression (PLSR) was used to calculate concentrations from the PA spectra. The implementation of MVA has shown that the PA setup in principle works reliably and that the selective concentration determination of short-chained hydrocarbon isotopologues is possible.


Introduction
In environmental policy discussions, people often only talk about the carbon dioxide (CO 2 ) emissions that need to be reduced and controlled in order to stop global warming. Due to simplified news coverage, many people are not aware that atmospheric methane (CH 4 ) plays a major role in the greenhouse effect as it has a considerably higher global warming potential compared to CO 2 , i.e., CH 4 emission will have 28 times the impact on temperature of a CO 2 emission of the same mass over the following 100 years [1]. In addition to its own climate impact, the presence of CH 4 in the atmosphere can also affect the concentration of other greenhouse gases, such as tropospheric ozone (O 3 ), water vapor, and CO 2 [2]. While CH 4 does not cause direct harm to human health or crop production, its role as precursor gas contributes significantly to the negative health and agricultural impacts of O 3 [3,4]. In contrast to CO 2 , the climate system reacts rapidly to changes in CH 4 emissions and reducing CH 4 emissions could provide the opportunity to immediately achieve major benefits for global and regional climate as well as for human health and agriculture [5]. A better understanding of the global methane budget is crucial to take meaningful measures. This can only be achieved by the biological and agricultural applications, industrial/manufacturing processes, and medical diagnostic (e.g., non-invasive breath analysis) [18][19][20].
In this work, we report the first PAS sensor for the detection of methane (CH 4 ), ethane (C 2 H 6 ), and propane (C 3 H 8 ) including their main stable 13 C-isotopologues ( 13 CH 4 , 13 CH 3 -12 CH 3 , 13 CH 3 -13 CH 3 and 13 CH 3 -12 CH 2 -12 CH 3 ) by using two DFB-ICLs operating in the wavelength range from 3327.7 nm to 3334.7 nm (ICL 1541, methane and ethane detection) and from 3373.3 nm to 3396.7 nm (ICL 1638, propane detection), respectively. In order to obtain the concentrations of individual substances of a measured spectrum containing multiple substances, multivariate analysis (MVA) [21,22] can be applied. This was also demonstrated in a recent work, where PA measurements of volatile organic compounds from strongly overlapping spectra in the mid IR region were evaluated [23]. Well-known multivariate regression models are the multiple linear regression, the principal component regression, and the partial least-square regression (PLSR), whereby the latter has become the standard method for spectroscopy [21,24]. If the data are noisy and strongly correlated, PLSR is considered to be superior to the other techniques [23,24]. Here, a PLSR model was trained and verified with experimental data of single and multiple components in synthetic gas mixtures. In addition, the sensor's limit of detection (LOD) was determined.

Photoacoustic Spectroscopy
The PAS technique uses the fact that absorbed electromagnetic radiation is partially converted into kinetic energy by inelastic molecular collisions. This corresponds to an increase in the temperature of the irradiated volume. If the radiation is modulated, this will lead to modulated heating and, thus, generate a pressure wave with an amplitude proportional to the concentration of the absorbing molecules, which can be recorded with a microphone. For a single absorbing compound and under the condition that the absorption is not saturated, the wavelength-dependent PA signal S (in V) as function of the wavelength λ, is given as [15,25] S(λ) = CP(λ)N tot cσ(λ) (1) where C (in V/(cm −1 W)) is known as the cell constant depending mainly on the cell geometry but also on the cell wall surface, beam profile, the modulation frequency, and the microphone sensitivity. P(λ) (in W) is the laser power, N tot is the total number density of molecules (in molecule/cm 3 ), and the coefficients c and σ(λ) are the concentration and absorption cross section (in cm 2 molecule −1 ) of the substance of interest, respectively. The PA signal is directly proportional to the laser power, concentration of the absorbing molecules, and the cell constant. If the laser modulation frequency corresponds to an acoustical resonance frequency of the cell, a significant signal amplification can be obtained.

Experimental Setup
A schematic diagram of the experimental setup is illustrated in Figure 1. Two cw DFB-ICLs operating at wavelengths around 3.33 µm (ICL 1541, Nanoplus, Gerbrunn, Germany) and 3.38 µm (ICL 1638, Nanoplus), respectively, were selected as radiation sources for the isotope-selective detection of methane/ethane and propane. The laser sources (ICLs) were employed one after another. Each of them was operated by the same temperature controller (TTC001, Thorlabs, Newton, NJ, United States) and current controller (TLD001, Thorlabs). The laser beam modulated by an optical chopper (model 300CD, Scitec Instruments, Trowbridge, United Kingdom) and passes the H-shaped PA cell [26] before hitting a power meter (thermal head 3A-FS-SH, Ophir Optronics, Jerusalem, Israel). The power meter records the transmission spectrum. The linear response of the PA signal as function of laser power according to Equation (1) is in the strict sense only valid for weak absorptions. Stronger ones weaken the PA spectrum's amplitudes and require a further correction.
The spectral regions that required correction were determined by examining the transmission spectra. A mirror and two irises are required for the laser alignment. The chopper frequency at 2750 Hz is set by a dual phase digital signal processing lock-in amplifier (LIA: model 7265, Signal Recovery, Oak Ridge, TN, United States). The LIA with a time constant set to 1 s is also responsible for the signal processing of the preamplified (PAS PMV 201, PAS-Tech, Hamburg, Germany) PA signal recorded by the analogue microphone (EM-158N, Primo, Tokyo, Japan). The system control and the data acquisition are performed with a computer. Further details of the ICLs can be found elsewhere [15].
An overview of the PAS gas flow system is illustrated in Figure 2.
Molecules 2020, 25, x FOR PEER REVIEW 4 of 18 Hz is set by a dual phase digital signal processing lock-in amplifier (LIA: model 7265, Signal Recovery, Oak Ridge, TN, United States). The LIA with a time constant set to 1 s is also responsible for the signal processing of the preamplified (PAS PMV 201, PAS-Tech, Hamburg, Germany) PA signal recorded by the analogue microphone (EM-158N, Primo, Tokyo, Japan). The system control and the data acquisition are performed with a computer. Further details of the ICLs can be found elsewhere [15]. An overview of the PAS gas flow system is illustrated in Figure 2.  Four gas lines, one for nitrogen (N2) and three for sample gases, lead to a connecting tube, which is attached to a pressure meter, the inlet of the mixing cell (for preprocessing) and to the PA cell. The  Hz is set by a dual phase digital signal processing lock-in amplifier (LIA: model 7265, Signal Recovery, Oak Ridge, TN, United States). The LIA with a time constant set to 1 s is also responsible for the signal processing of the preamplified (PAS PMV 201, PAS-Tech, Hamburg, Germany) PA signal recorded by the analogue microphone (EM-158N, Primo, Tokyo, Japan). The system control and the data acquisition are performed with a computer. Further details of the ICLs can be found elsewhere [15]. An overview of the PAS gas flow system is illustrated in Figure 2.  Four gas lines, one for nitrogen (N2) and three for sample gases, lead to a connecting tube, which is attached to a pressure meter, the inlet of the mixing cell (for preprocessing) and to the PA cell. The Four gas lines, one for nitrogen (N 2 ) and three for sample gases, lead to a connecting tube, which is attached to a pressure meter, the inlet of the mixing cell (for preprocessing) and to the PA cell. The gas lines consist partly of polytetrafluoroethylene and partly of stainless-steel tubes. The cell pressure was measured using a ceramic pressure transducer (PM 3) connected to the vacuum gauge (DVR 5, Vacuubrand, Wertheim, Germany). The transducer exhibits a resolution of 0.1 hPa in the range between 0.1 hPa and 10 hPa and a resolution of 1 hPa for pressures higher than 10 hPa. The outlets of the cells were connected to the vacuum pump, a chemistry-hybrid pump (RC 6, Vacuubrand), which enabled the evacuation of the gas lines including the cells down to 2.0 hPa. The sample mixtures were produced injecting a defined amount of one or more sample gases into the evacuated mixing cell and then adding N 2 up to atmospheric pressure. Thereafter, the evacuated PA cell was filled with a small amount of the gas mixture from the mixing cell and, if required, additional amounts of different sample gases. Finally, the PA cell was filled with N 2 up to atmospheric pressure to obtain the desired gas mixture. For some target concentrations, the usage of the mixing cell is not necessary. The concentrations, purities, and suppliers of the sample gases are listed in Table 1. The gases CH 4 , C 2 H 6 , and C 3 H 8 were each present at natural abundance (NA). Assuming that isotopes are randomly distributed throughout all isotopologues, the NA of the main methane isotopologues 12 CH 4 and 13 CH 4 are 98.8% and 1.11%, respectively [27]. The NA of the main ethane and propane isotopologues was estimated (based only on the 13 C/ 12 C ratio) to be 12 CH 3 -12 CH 3 : 97.8% 13 CH 3 -12 CH 3 : 2.2%, 13 CH 3 -13 CH 3 : 0.013%, 12 CH 3 -12 CH 2 -12 CH 3 : 96.7%, and 13 CH 3 -12 CH 2 -12 CH 3 : 2.2%, respectively.

Measurement and Data Processing
Measurements were performed on single and multiple component mixtures of sample gas with N 2 under ambient conditions. In total, 62 (43 with ICL 1541 and 19 with ICL 1638) different SC hydrocarbon measurements/spectra were recorded. It should be noted, that a considerably larger dataset is usually required to set up a reliable prediction model. However, the primary goal of this research project was to show the feasibility. The ICL 1541 was used for all methane and ethane measurements, and the ICL 1638 for all propane measurements. The concentrations of the hydrocarbon used vary between 50 ppmv and 1 vol%. The individual concentrations for each measurement are listed in Table A1 of the Appendix A. The measured spectra are displayed in Figures A1 and A2 of the Appendix A to give an impression of the difference.
Further details of the recording of the PA spectra with the ICLs and the processing of the spectra can be found elsewhere [15]. The processing involved averaging, power normalization, and wavelength correction.

Detection Limits
The signal-to-noise ratio (SNR) is commonly defined as the ratio of the signal power to the background noise power and is often expressed in dB. In this work, the SNR is calculated as the highest possible PA signal S max of a specific substance divided by the standard deviation of the PA signal during a nitrogen measurement σ N 2 : where SNR is of dimension 1 and σ N 2 can be considered the noise level. PAS is a background-free technique, and, in principle, no PA signal is generated in absence of an absorber. Nitrogen does not show any absorption in the mid IR and therefore an N 2 spectrum is perfectly suited to measure the background noise. The mean value of the PA signal during a N 2 measurement represents an offset, e.g., due to window absorptions, and does not influence the SNR. The LOD is specified as the lowest quantity/concentration of a substance that can be distinguished from the absence of that substance.
Here, the SNR shall be used for the calculation of the LOD of a substance: where c is the concentration of the substance during the SNR measurement. The obtained LOD is idealized, because it would be hard to distinguish a meaningful signal from the noise level, if both are of the same magnitude. Therefore, the value represents only a rough estimate of the experimentally achievable LOD. In order to measure the noise level of the setup, the sample cell was flushed with pure N 2 and measurements were conducted under ambient conditions with both ICLs. The standard deviations of the N 2 spectrum are 1.1 × 10 −5 V for the ICL 1541 and 7.9 × 10 −6 V for the ICL 1638, respectively.

Partial Least Squares Regression
Multicomponent measurements were processed with a PLSR model in the numerical software Matlab to evaluate the performance of the PA setup in detecting SC hydrocarbon isotopologue concentrations. The implementation of the PLSR method in Matlab is based on the SIMPLS algorithm developed by De Jong in 1993 [28]. The Matlab PLSR calculation function requires the gas spectra, the corresponding gas concentrations, and the number of selected latent variables (LVs) as input data.
As output values, loading and score matrix of X and Y, explained variance (in %) by the model by each LV, and the PLSR coefficients are computed. The PLSR coefficients are required to calculate target values (concentrations) out of initial data (spectra): where B PLSR is the PLSR coefficient matrix, X pre is the predictor matrix (spectra), Y res the response matrix (concentrations), and G the error matrix [23,24]. The predictive accuracy, i.e., the standard/mean error of a multivariate calibration model, is expressed as the root-mean-square error (RMSE). For that, the predicted residual error sum of squares (PRESS) is calculated following the equation [21]: where y i denotes the initial data (true concentrations),ŷ i the predicted values (calculated concentrations) by the model and n the number of probes. The RMSE is then determined as

Results
The linear behavior of the sensor was verified by evaluating the single-component gas measurements of the analytes. Representatively, the linearity of the single-component measurements of C 2 H 6 and C 3 H 8 are shown in Figure 3. The data demonstrate a good linearity with derived coefficients of determination of R 2 = 0.991 (C 2 H 6 ) and R 2 = 0.997 (C 3 H 8 ) and a small y-axis intercept of −0.162 V and −0.145 V, respectively.  The SNR and the LOD of the PAS sensor are calculated for each measured SC hydrocarbon isotopologue and are summarized in Table 2. In Figure 4, the respective PA spectra of the methane and the ethane isotopologues are shown. Although all these gases absorb across the entire wavelength range of the laser, each gas shows distinct absorption peaks, which makes them distinguishable from the others. In Figure 5, the PA spectra of the propane isotopologues are displayed. In contrast to the isotopologue spectra of methane and ethane, the propane isotopologue spectra show a broadband absorption structure with less sharp absorption peaks. In order to calculate the SNR, the maximum PA signal of each sample spectrum was determined from single component measurements with at low concentrations in N2. The SNR and LOD of the PAS sensor are different for each individual species. The PA signal depends equally on the absorption of the sample and on the laser power. This explains the noticeably higher LOD of CH4, because CH4 has a lower absorption strength (approximately one magnitude) in the spectral region of the laser source compared to the other compounds. It should be noted that due to the varying power performance of the ICLs at different wavelengths, the highest absorption peaks do not necessarily result in the maximum PA signal. The SNR and the LOD of the PAS sensor are calculated for each measured SC hydrocarbon isotopologue and are summarized in Table 2. In Figure 4, the respective PA spectra of the methane and the ethane isotopologues are shown. Although all these gases absorb across the entire wavelength range of the laser, each gas shows distinct absorption peaks, which makes them distinguishable from the others. In Figure 5, the PA spectra of the propane isotopologues are displayed. In contrast to the isotopologue spectra of methane and ethane, the propane isotopologue spectra show a broadband absorption structure with less sharp absorption peaks. In order to calculate the SNR, the maximum PA signal of each sample spectrum was determined from single component measurements with at low concentrations in N 2 . The SNR and LOD of the PAS sensor are different for each individual species. The PA signal depends equally on the absorption of the sample and on the laser power. This explains the noticeably higher LOD of CH 4 , because CH 4 has a lower absorption strength (approximately one magnitude) in the spectral region of the laser source compared to the other compounds. It should be noted that due to the varying power performance of the ICLs at different wavelengths, the highest absorption peaks do not necessarily result in the maximum PA signal.    Table 2.
The method of choice for the validation of the PLSR model was a full cross validation (CV). Usually, all measurements have to be divided into calibration measurements to train the PLSR model (as a reference) and test measurements to verify the validity of the model. The CV is one of the most frequently used and efficient validation methods because it uses all recorded data for calibration as well as for validation [21]. In the full CV, calibration models are created, where n stands for the number of probes (spectra). Each model uses − 1 probes for calibration and the left-out probe for validation. This is performed times so that each probe serves once as validation. As part of the calibration, the PLSR model calculated five and two LVs for the measurements with the ICL 1541 and the ICL 1638, respectively, each LV representing the contribution of one hydrocarbon isotopologue. The 43/19 models were able to explain on average (96.7 ± 0.4)% (ICL 1541) and (99.1 ± 0.1)% (ICL 1638) of the total variance in the dataset regarding the target variable (Y). Figure 6 shows two exemplary PA measurements of multi-component mixtures. Table 3 lists the five concentrations determined by the pressure readings and the PLRS models. The method of choice for the validation of the PLSR model was a full cross validation (CV). Usually, all measurements have to be divided into calibration measurements to train the PLSR model (as a reference) and test measurements to verify the validity of the model. The CV is one of the most frequently used and efficient validation methods because it uses all recorded data for calibration as well as for validation [21]. In the full CV, n calibration models are created, where n stands for the number of probes (spectra). Each model uses n − 1 probes for calibration and the left-out probe for validation. This is performed n times so that each probe serves once as validation. As part of the calibration, the PLSR model calculated five and two LVs for the measurements with the ICL 1541 and the ICL 1638, respectively, each LV representing the contribution of one hydrocarbon isotopologue. The 43/19 models were able to explain on average (96.7 ± 0.4)% (ICL 1541) and (99.1 ± 0.1)% (ICL 1638) of the total variance in the dataset regarding the target variable (Y). Figure 6 shows two exemplary PA measurements of multi-component mixtures. Table 3 lists the five concentrations determined by the pressure readings and the PLRS models.   Table 3. The standard approach to evaluate the prediction/quality of the model would be to specify the error (RMSE) according to Equations (5) and (6) of each individual calibration model and average it over all RMSE in the course of the CV (RMSECV). The RMSEs of Mixtures 29 and 40 equal 123 ppmv and 562 ppmv, respectively. If we leave CH4 due to its comparably high concentration out, we get an   Table 3. The standard approach to evaluate the prediction/quality of the model would be to specify the error (RMSE) according to Equations (5) and (6) of each individual calibration model and average it over all RMSE in the course of the CV (RMSECV). The RMSEs of Mixtures 29 and 40 equal 123 ppmv and 562 ppmv, respectively. If we leave CH4 due to its comparably high concentration out, we get an  Table 3. The standard approach to evaluate the prediction/quality of the model would be to specify the error (RMSE) according to Equations (5) and (6) of each individual calibration model and average it over all RMSE in the course of the CV (RMSECV). The RMSEs of Mixtures 29 and 40 equal 123 ppmv and 562 ppmv, respectively. If we leave CH 4 due to its comparably high concentration out, we get an RMSE of Mixture 29 of 13 ppmv and an RMSE of Mixture 40 of 98 ppmv. Hence, we consider a separation of the error into the respective components to be more informative, since, for example, larger concentrations were selected for CH 4 (with possible larger errors) in order to obtain PA signals of the same magnitude as the other components. However, it is also possible for the PLSR to predict negative concentrations. Real concentrations are logically greater or equal to zero. Here, the negative predicted concentrations were set to zero. That way, the error (RMSECV 0 ) could be slightly improved. In addition, in order to obtain a better understanding of the error in relation to the respective concentration, the CV mean relative residual (MRR) of the measurements of the individual components was determined. All the CV results are summarized in Table 4.
The concentrations retrieved by the PLRS models can be obtained from Table A2 of the Appendix A. The residuals and the relative residuals of all predicted concentrations in the respective calibration model can be seen in Figures A3-A6 of the Appendix A, respectively.

Discussion
In general, the quality of the MVA strongly depends on the applied calibration method as well as on the used spectra. The PLSR is applicable as long as the individual components contributing to the overall spectrum are linearly independent of each other and possess characteristic features with signals larger than the noise level. The recorded spectra depend on the experimental setup, the absorption strength of the investigated compounds, as well as their concentration. The smallest RMSECV (28 ppmv) and MRR (7.3%) were found for 13 C 12 C 2 H 8 measured with the ICL 1638. For the ICL 1541, the largest RMSECV (604 ppmv) and at the same time smallest MRR (9.7%) can be assigned to CH 4 . This proves that the RMSE is highly dependent on the concentration, which in the case of CH 4 is, on average, one magnitude larger than the other hydrocarbons. The largest MMRs (15.6-17.6%) were found for the ethane isotopologues measured with the ICL 1541. These discrepancies between the expected and the regressed concentrations are too high to be attributed only to the regression algorithm. Moreover, the estimated errors for some concentration values such as the CH 4 in Mixture 29 or the C 2 H 6 in Mixture 40 appear to be too small to justify the difference with the retrieved concentration values. Additionally, the variation for retrieved concentrations where a gas component was absent is relatively large. For CH 4 , it varies between −756.2 ppmv in Mixture 4 and +1241 ppmv in Mixture 40. All these results point out an underlying problem with the gas line management and/or the recorded mixed sample spectra. Especially with a limited dataset, each single measurement has a significant influence on the model. The uncertainty of the results originates (in the sense of error propagation) from the uncertainty in the sample preparation as well as from all other influences of the experimental setup. The sample preparation is associated with a certain degree of inaccuracy. The method of preparing the gas mixtures results in an uncertainty of the partial pressures and, therefore, the concentrations. After inserting sample gas into the cell and closing the valves, the gas line between the cross-adaptor and the cell was still filled with this component. When the cell was subsequently filled up to ambient pressure with N 2 , this remainder was pushed into the cell, thus falsifying the sample gas concentration. In addition, the uncertainty in the sample preparation of multiple components is considerably higher because the determination is solely calculated out of the pressure readings of the gas system. Furthermore, it can be assumed that the different components are not particularly well mixed in the gas system and mixing cell and that unwanted remaining substances in the system cannot be completely excluded. In fact, the different components are linearly dependent of each other because all gases at NA include small proportions of their 13 C isotopologues, which also increases the error. Considering the fact that in this work only a simple PA setup was realized without substantial efforts to maximize the SNR or LOD, these results are very promising, especially since many improvements are still possible, e.g., cell optimization, signal enhancement with optimized amplifier and LIA parameters, and the application of direct modulation of the ICL instead of using the mechanical chopper.

Conclusions
Overall, the PA setup has demonstrated to work dependable in distinguishing SC hydrocarbon isotopologues and, in combination with the evaluation algorithm, enables an isotope-selective concentration determination of SC hydrocarbons. The current way of sample preparation offers room for improvement to minimize the concentration uncertainty. This can be achieved, for instance, with a suitable gas mixer, a system based on the combination of pressure controller and flow controller, and/or with the acquisition of hydrocarbon gas samples with accurate premixed concentrations. Afterwards, a proper design of experiment should be used for hydrocarbon concentration mixture combinations. This will help to plan necessary future measurements with synthetic multicomponent gas mixtures for the calibration model (required for the sensor's concentration determination). In the future, the research will focus on several tasks concerning the optimization of the PA setup. The achievable LOD can be further enhanced by numerous experimental improvements. The used PA cell in this work has a relatively low Q-factor. An exchange of the current cell with an optimized PA cell with a higher Q-factor directly improves the PA signal and lowers the LOD. The feasibility of using a quartz tuning fork (QTF) as acoustic transducer will be further investigated, since QTFs can have Q values > 10,000 [29], which is unreachable with traditional PA cells. It should be noted that a reorientation to the quartz enhanced photoacoustic spectroscopy (QEPAS) technique (utilizing QTFs) is associated with several substantial changes of the PA setup. Commercially available QTFs work with a resonant frequency of ca. 32.8 kHz; the use of an optical chopper is therefore no longer possible. Therefore, the direct modulation of the ICL will be studied more intensively together with certain modulation methods like the wavelength modulation spectroscopy (WMS). Both changes offer the distinct advantage of efficient noise suppression [30,31]. In addition, the enhanced frequency stability of an applied direct modulation compared to an optical chopper together with the elimination of mechanical noise can result in better sensor sensitivity. The WMS approach was shown in a recent study using an ICL for the simultaneous detection of CH 4 , C 2 H 6 , and C 3 H 8 (each at NA) using QEPAS [32]. In addition, the development of QEPAS is also still ongoing; therefore, further improvements will be achieved by implementing new generations of QTFs, overtone modes operation, and new micro-resonator systems [32]. Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.