Quantitative Analysis of Gas Phase IR Spectra Based on Extreme Learning Machine Regression Model

Advanced chemometric analysis is required for rapid and reliable determination of physical and/or chemical components in complex gas mixtures. For infrared (IR) spectroscopic/sensing techniques, we propose an advanced regression model based on the extreme learning machine (ELM) algorithm for quantitative chemometric analysis. The proposed model makes two contributions to the field of advanced chemometrics. First, an ELM-based autoencoder (AE) was developed for reducing the dimensionality of spectral signals and learning important features for regression. Second, the fast regression ability of the ELM architecture was directly used for constructing the regression model. In this contribution, nitrogen oxide mixtures (i.e., N2O/NO2/NO) found in vehicle exhaust were selected as a relevant example of a real-world gas mixture. Both simulated data and experimental data acquired using Fourier transform infrared spectroscopy (FTIR) were analyzed by the proposed chemometrics model. By comparing the numerical results with those obtained using conventional principal component regression (PCR) and partial least squares regression (PLSR) models, the proposed model was verified to offer superior robustness and performance in quantitative IR spectral analysis.


Introduction
With the development of advanced technologies in medical, industrial, and environmental applications, gas sensing has come to play an essential role in many areas [1,2]. Currently, research on gas sensing can mainly be divided into two parts: qualitative analysis and quantitative analysis [3]. Whereas the former aims only at recognizing the components of a gas mixture, the latter obtains the concentrations of the gas components, which is relevant for industrial measurements, e.g., in the manufacturing industry, as well as in transportation, environmental monitoring, and food security.
Quantitative gas analysis can benefit from a variety of technologies [4,5], among which gas chromatography (GC) and spectroscopic sensing are two frequently applied methods [6]. Gas chromatography is time consuming and operates discontinuously, whereas spectroscopic methods stand out due to their rapid response, compactness, and accuracy [7]. Moreover, spectroscopic methods can identify gases according to their more or less pronounced spectral signatures across the entire electromagnetic spectrum, especially in near-infrared (NIR), mid-infrared (MIR), and Raman spectroscopy, which are commonly used for real-time and in-field gas sensing applications [8][9][10][11].

Background Knowledge
According to the above description, determining the concentration of individual gas components in mixtures mainly involves quantitative analysis of IR gas spectra. The methodology used herein relies on three main parts of background knowledge: (i) data pre-processing, (ii) predictive regression modelling, and (iii) performance evaluation. Some general work on these parts is presented below.

Data Pre-Processing
Data pre-processing of complex IR spectra not only aims at extracting useful information and discriminating it from interferences, e.g., via data de-noising, normalization, and feature selection, but also comprises specific processes such as baseline correction, optimizing the input spectral range, etc. [27]. In regression analysis, data pre-processing is expected to generate a reliable database for constructing a precise and robust relationship between input and output. Therefore, feature or variable selection is highly relevant. Generic IR gas spectra may be composed of thousands of emission/absorption lines (i.e., variables), especially if high-resolution data are recorded [28]. However, it is detrimental to use all available variables for modelling, as a large number of variables not only increases the complexity and computation time, but also introduces noise. Suitable variable selection therefore serves to filter out noise and to build models using only variables carrying essential analytical information. A commonly applied method is the selection of suitable wavelength regimes, avoiding spectral segments that do not provide molecularly relevant signatures and thereby also reducing computational expense. In addition, dimension reduction algorithms are usually applied, such as PCA, which can be realized via the Karhunen-Loeve transform (KLT) [29,30], generating orthogonal and independent feature vectors of the original data. By taking the eigenvectors corresponding to the largest eigenvalues to construct a transform matrix, the important features (the 'principal components') [19] are selected, reducing the dimensionality while explaining most of the variance of the original data matrix. Thus, PCA is widely used for dimension reduction and de-noising.
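As an illustration of the PCA/KLT step described above, the following sketch reduces a toy spectral matrix via eigendecomposition of its covariance matrix. The toy data and the chosen number of components are illustrative assumptions, not taken from the study.

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce X (samples x variables) to k principal components via the
    Karhunen-Loeve transform (eigendecomposition of the covariance matrix)."""
    Xc = X - X.mean(axis=0)                 # mean-center each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    F = eigvecs[:, order]                   # transform matrix (loadings)
    S = Xc @ F                              # scores: the reduced representation
    return S, F

# toy data: 20 "spectra" of 50 variables governed by two latent factors
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2)) @ rng.normal(size=(2, 50)) \
    + 0.01 * rng.normal(size=(20, 50))
S, F = pca_reduce(X, 2)                     # 50 variables reduced to 2 components
```

The scores `S` can then replace the original matrix in downstream regression, as the text describes.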

Regression Analysis
Based on the pre-processed data, one may then focus on establishing predictive models for qualitative analysis (i.e., classification) and/or quantitative analysis (i.e., regression) of unknown samples. Since the prediction of the component concentration (i.e., quantitative analysis) is the focus of the present study, only regression modelling is considered in the following. Given the wide variety of applicable regression models including SLR, MLR, PCR, PLSR, SVR, NN, etc. [18], PCR and PLSR are considered among the most useful ones for the analysis of IR gas phase spectra.

PCR
PCR is a linear regression model based on PCA [31]. Compared with SLR/MLR, which use the most important variables of the original feature matrix directly, PCR regresses the target on the principal components of the feature matrix, i.e., on transformed features, which may then be used to reproduce the original data. The expression of a PCR model is as follows:

X = S_X F_X^T + E_X (1)

Y = S_X C + E (2)

whereby Equation (1) is the expression of the PCA decomposition; X is the original spectral matrix; and F_X and S_X represent the loadings and scores matrix of X, respectively. Equation (2) is the final regression model, where Y represents the concentration matrix of the gas mixtures; C is the regression coefficient matrix; and E_X and E represent the residual errors of the two equations.
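A minimal PCR sketch following Equations (1) and (2); the noise-free synthetic "spectra" built as linear mixtures of random pure-component profiles are an illustrative stand-in for real IR data, not part of the study.

```python
import numpy as np

def pcr_fit(X, Y, k):
    """PCR (Eqs. 1-2): PCA-decompose X, then regress Y on the k scores."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    F_X = Vt[:k].T                                   # loadings of X (Eq. 1)
    S_X = Xc @ F_X                                   # scores of X
    C, *_ = np.linalg.lstsq(S_X, Yc, rcond=None)     # coefficients (Eq. 2)
    return F_X, C, x_mean, y_mean

def pcr_predict(X, F_X, C, x_mean, y_mean):
    return (X - x_mean) @ F_X @ C + y_mean

# noise-free toy mixtures: spectra = concentrations @ pure-component profiles
rng = np.random.default_rng(1)
pure = rng.normal(size=(3, 40))                      # illustrative "pure spectra"
conc = rng.uniform(10, 90, size=(30, 3))
X = conc @ pure
F_X, C, x_mean, y_mean = pcr_fit(X, conc, k=3)
pred = pcr_predict(X, F_X, C, x_mean, y_mean)        # recovers conc (noise-free case)
```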

PLSR
The PLSR model is another useful model for the quantitative analysis of complex spectra. Different from PCR, PLSR extracts latent variables of both the original spectrum X and the target Y, and then constructs the regression model between the latent variables. The PLS model is implemented as follows:
(1) Extracting the latent variables. The latent variables of X are still extracted via Equation (1), and those of Y are extracted via

Y = S_Y F_Y^T + E_Y (3)

whereby F_Y and S_Y represent the loadings and scores matrix of Y, respectively, and E_Y is the residual error matrix.
(2) Modelling the regression relationship. Assuming that the two latent variable matrices (S_X and S_Y) are correlated with each other, one may construct a regression model describing this relationship as:

S_Y = S_X C + E (4)

whereby C is the matrix reflecting the regression coefficients between S_X and S_Y, and E is the residual error. From the description above, it is evident that the aim of PLS modelling is to decompose both X and Y into loadings and scores matrices, and to build a regression model between the score matrices of X and Y with maximum covariance. Based on the PCR and PLSR models described in Equations (2)-(4), it is evident that these two models are linear in describing the relationship between concentration and spectral signals, and that by using the obtained regression coefficients one may predict the concentration of a gas component of interest within an unknown sample.
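The two PLS steps above can be sketched with a compact NIPALS implementation. The synthetic mixture data are again an illustrative assumption, and the coefficient-recovery identity B = W(P^T W)^(-1)Q^T is the standard NIPALS result rather than anything specific to this study.

```python
import numpy as np

def plsr_fit(X, Y, k):
    """PLSR via NIPALS: extract k latent-variable pairs with maximum covariance
    (Eqs. 1 and 3), regress Y-scores on X-scores (Eq. 4), and recover the
    regression coefficients on the original variables."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xd, Yd = X - x_mean, Y - y_mean
    m, py = X.shape[1], Y.shape[1]
    W, P, Q = np.zeros((m, k)), np.zeros((m, k)), np.zeros((py, k))
    for a in range(k):
        u = Yd[:, [0]]                              # initial Y-score estimate
        for _ in range(200):                        # power-iteration inner loop
            w = Xd.T @ u; w /= np.linalg.norm(w)    # X-weights
            t = Xd @ w                              # X-scores
            q = Yd.T @ t / (t.T @ t)                # Y-loadings
            u = Yd @ q / (q.T @ q)                  # Y-scores
        p_a = Xd.T @ t / (t.T @ t)                  # X-loadings
        Xd = Xd - t @ p_a.T                         # deflate X
        Yd = Yd - t @ q.T                           # deflate Y
        W[:, a], P[:, a], Q[:, a] = w[:, 0], p_a[:, 0], q[:, 0]
    B = W @ np.linalg.solve(P.T @ W, Q.T)           # coefficients on original X
    return B, x_mean, y_mean

def plsr_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean

# same noise-free toy mixtures as for PCR
rng = np.random.default_rng(1)
pure = rng.normal(size=(3, 40))
conc = rng.uniform(10, 90, size=(30, 3))
X = conc @ pure
B, x_mean, y_mean = plsr_fit(X, conc, k=3)
```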

Evaluation Metrics
Next to establishing a useful model, it is essential to evaluate the performance of regression models via appropriate evaluation metrics. A wide variety of metrics are defined for regression and prediction analysis [27], e.g., regression error metrics like the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), and correlation metrics like the coefficient of determination (R^2). As conventional error metrics are generally correlated, they are all expected to be close to 0 for well-performing regression models. R^2 describes how much of the variance of the dependent variable is explained by the independent variable(s); hence, the independent variables are regarded as significantly important when R^2 is close to 1. Therefore, in this study two typical metrics were selected for evaluating the performance of the quantitative analysis, namely RMSE and R^2:
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)^2 ) (5)

R^2 = 1 - Σ_{i=1}^{n} (y_i - ŷ_i)^2 / Σ_{i=1}^{n} (y_i - ȳ)^2 (6)

whereby y_i and ŷ_i represent the ith measured and predicted concentration of the given gas component, ȳ is the mean of the measured concentrations, and n is the number of gas samples.
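Equations (5) and (6) translate directly into code; the sample concentrations below are made-up illustrative values, not data from the study.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, Eq. (5)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    """Coefficient of determination, Eq. (6)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

y_true = [10.0, 30.0, 50.0, 70.0, 90.0]   # made-up measured concentrations
y_pred = [12.0, 29.0, 51.0, 68.0, 91.0]   # made-up predictions
print(rmse(y_true, y_pred))               # 1.4832... (= sqrt(11/5))
print(r_squared(y_true, y_pred))          # 0.99725
```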

ELM-AE-Based Regression Model (ELM-AE-R)
In this study, we propose an advanced regression model based on ELM and ELM-AE, as described in detail below.

ELM Architecture
ELM was developed by Huang et al. [32] based on the architecture of single hidden layer feed forward networks (SLFNs). This novel machine learning algorithm has been successfully employed in a wide variety of fields, e.g., feature learning, dimension reduction, classification, and regression. Compared with conventional neural networks, its success mainly results from the following three aspects.
(1) With randomly generated weights in the input layer, ELM shows excellent generalization performance and lends itself to real-world application scenarios.
(2) Whereas the parameters of conventional neural networks (e.g., learning rate and number of epochs) must be tuned iteratively and training may become trapped in local minima, ELM fixes the input weights and thereby obtains an extremely fast learning speed.
(3) ELM can be easily implemented to achieve both the smallest training error and the smallest norm of weights.
According to the topological structure of SLFNs, a generic ELM network can be constructed.
Assume x_i ∈ R^c is the input vector and t_i the corresponding target; then the ELM network with L hidden nodes can be modelled as follows:

o_i = Σ_{j=1}^{L} β_j g(w_j · x_i + b_j), i = 1, . . . , N (7)

where W = [w_1, w_2, . . . , w_L] is the weight matrix between the input layer and the hidden layer; b = [b_1, b_2, . . . , b_L] is the bias vector; g(·) is the activation function, which can be linear or nonlinear; β = [β_1, β_2, . . . , β_L]^T is the output weight matrix; and o_i is the ith ELM output. By transforming the above formula into matrix form, Equation (7) can be rewritten as below.
O = Hβ (8)

where O = [o_1, o_2, . . . , o_N]^T is the final output matrix and H is the hidden layer output matrix, expressed as

H = [g(w_1 · x_1 + b_1) · · · g(w_L · x_1 + b_L); . . . ; g(w_1 · x_N + b_1) · · · g(w_L · x_N + b_L)] (9)

To train the optimal ELM network, the objective is to minimize the error between the model outputs and the targets, expressed as

min_β ||Hβ - T||^2 (10)

where T = [t_1, t_2, . . . , t_N]^T is the target matrix. By plugging Equation (8) into the objective function (10) and adopting the least-squares method for the solution, the output weights β can be calculated as follows:

β = H†T (11)

where H† is the Moore-Penrose generalized inverse of the hidden layer output matrix H, which can be calculated as

H† = (H^T H)^{-1} H^T (12)
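A minimal sketch of ELM training per Equations (7)-(12), assuming a tanh activation and uniformly random input weights; these are common choices, not specifics confirmed by the study.

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Train a basic ELM (Eqs. 7-12): random fixed hidden layer, then a
    least-squares output layer beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], L))  # random input weights (fixed)
    b = rng.uniform(-1, 1, size=L)                # random biases (fixed)
    H = np.tanh(X @ W + b)                        # hidden layer output (Eq. 9)
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose solution (Eq. 11)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta              # O = H @ beta (Eq. 8)

# toy 1-D regression in the interpolation regime (more nodes than samples)
X = np.linspace(-1, 1, 30).reshape(-1, 1)
T = np.sin(3 * X)
W, b, beta = elm_train(X, T, L=60)
```

Note that only `beta` is learned; the random hidden layer is never updated, which is the source of ELM's speed advantage described above.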

ELM-AE
Based on the description of modelling ELM networks, if we set the target T = X, then ELM becomes a self-learning network acting as an auto-encoder (AE) [33], which is called the ELM-based auto-encoder (ELM-AE). Conventional AEs are formed by an encoder-decoder pair: the encoder learns new features, and the decoder reconstructs them, such that the new ELM-AE can be constructed as shown in Figure 1. It is evident from Figure 1 that there is a single hidden layer in ELM-AE, which has randomly generated weights and biases for encoding. Therefore, the hidden outputs (encoder outputs) of a given data point x can be expressed as

h(x) = g(Ax + b) (13)

To improve the generalization performance of ELM-AE, the randomly generated parameters A and b are usually chosen to be orthogonal, i.e., A^T A = I and b^T b = 1. Via these orthogonal random parameters, the Euclidean information of the input data is retained by ELM-AE, as described by the Johnson-Lindenstrauss lemma [34].
Then, following the description of the AE, one can re-represent the original feature space through the decoder of ELM-AE. As ELM is a universal approximator, the output layer (decoder) of ELM-AE can be utilized to approximate any given function. According to the description above, the objective of the ELM-AE decoder is to retain as much information of the input features as possible, i.e., to approximate the original input, namely T = X. Therefore, the objective function in (10) can be expressed as:

min_{β_AE} ||Hβ_AE - X||^2 (14)

where β_AE is the output weight matrix and H is the hidden layer matrix consisting of h(x) in ELM-AE. Under the assumption of zero bias (b_i = 0), the output weights β_AE can be calculated directly through (11). Then, the new architecture of ELM-AE is constructed from the randomly generated parameters (A, b) and the optimal output weights β_AE. Considering that the purpose of the AE is to learn features as described above, one may utilize the optimal output weights β_AE to construct a new network for feature representation, as shown in Figure 1. The final representation of the original data is then expressed as

X_new = Xβ_AE^T (15)

where X_new represents the newly learned features, which can replace the original data for further analysis.
On the other hand, by setting different values of L, we can see from (15) that ELM-AE can project the input data into a higher- (L > m), equal- (L = m), or lower-dimensional (L < m) space X_new. In particular, if L < m, ELM-AE can also be utilized for dimension reduction, analogous to PCA.
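The ELM-AE feature-learning step (Equations (13)-(15)) can be sketched as follows, assuming the linear activation and zero bias discussed above; the rank-2 toy data are an illustrative assumption.

```python
import numpy as np

def elm_ae_features(X, L, seed=0):
    """ELM auto-encoder sketch (Eqs. 13-15): orthogonal random encoding with
    linear activation and zero bias, least-squares decoder beta_AE, and the
    re-representation X_new = X @ beta_AE.T."""
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.normal(size=(X.shape[1], L)))  # A^T A = I
    H = X @ A                            # encoder output h(x) = Ax (linear, b = 0)
    beta_ae = np.linalg.pinv(H) @ X      # decoder: min ||H beta - X|| (Eq. 14)
    return X @ beta_ae.T                 # learned features X_new (Eq. 15)

# rank-2 toy data with 10 variables: L = 2 < m = 10 gives PCA-like reduction
rng = np.random.default_rng(2)
X = rng.uniform(size=(20, 2)) @ rng.normal(size=(2, 10))
X_new = elm_ae_features(X, L=2)
```

For this low-rank toy input, the two learned features retain enough information to reconstruct X by a linear map, mirroring the dimension-reduction role of PCA described in the text.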

ELM-AE-R for Quantitative Analysis of IR Spectra
According to the description above, by capitalizing on the advantages of ELM-AE, namely its pronounced feature learning and dimension reduction abilities, one can emulate the utility of PCA in spectroscopic gas sensing. As the ELM architecture offers fast computation for large data sets, as well as linear and nonlinear learning abilities, we propose herein a new ELM-based model for quantitative IR spectral analysis, as shown in Figure 2.

Figure 2 illustrates that the framework of the proposed quantitative analysis contains two parts: feature selection and regression. In the first part, we propose to utilize ELM-AE for feature selection, given the complexity of IR spectra across a broad wavelength regime, especially in high-resolution laser spectroscopies. This situation requires dimensionality reduction of the input data, facilitated by a feature selection process. Compared with conventional spectral analysis using PCA for dimension reduction, the proposed ELM-AE not only realizes dimension reduction when L < m, but simultaneously learns features within the original data matrix. Furthermore, to achieve high performance in calculating the concentration of gas components, some modifications of the generic ELM-AE were considered in this study. One modification targets the selection of the activation function in the hidden layer of ELM-AE. PCR and PLSR both perform well in a linear data space; therefore, it is worthwhile using linear activation functions g(·) in ELM-AE as well. The other modification concerns the input parameters A. Different from the generic ELM-AE using randomly generated parameters, in the present study the parameter matrix A was generated in a supervised way following Equation (16), where X and Y are the original spectral data and concentration matrix, X†Y reflects the correlation between input and output, and R_m is a randomly generated matrix.
By initializing the parameter matrix A via Equation (16), the finally learned features of ELM-AE combine the self-representation ability underlying PCR with the target-oriented learning underlying PLSR. Finally, as described in Figure 2, the second part realizes the regression analysis. Considering that ELM may equally well perform regression, the ELM architecture was directly adopted for the regression stage as well.
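A sketch of the full ELM-AE-R pipeline under stated assumptions: since the exact form of Equation (16) is not reproduced here, the supervised initialization below simply orthogonalizes X†Y padded with random columns; this is an assumption for illustration, not the authors' exact rule.

```python
import numpy as np

def elm_ae_r_fit(X, Y, L, seed=0):
    """ELM-AE-R pipeline sketch: supervised-initialized ELM-AE for feature
    learning, then an ELM-style least-squares regression on the features.
    NOTE: the exact rule of Eq. (16) is not reproduced; here A is built by
    orthogonalizing pinv(X) @ Y padded with random columns (an assumption)."""
    rng = np.random.default_rng(seed)
    sup = np.linalg.pinv(X) @ Y                         # X-dagger Y term
    Rm = rng.normal(size=(X.shape[1], max(L - Y.shape[1], 0)))
    A, _ = np.linalg.qr(np.hstack([sup, Rm])[:, :L])    # supervised orthogonal init
    H = X @ A                                           # linear activation, zero bias
    beta_ae = np.linalg.pinv(H) @ X                     # ELM-AE decoder weights
    X_new = X @ beta_ae.T                               # learned features
    C = np.linalg.pinv(X_new) @ Y                       # regression layer (ELM-style)
    return beta_ae, C

def elm_ae_r_predict(X, beta_ae, C):
    return (X @ beta_ae.T) @ C

# noise-free toy mixtures: three components, forty spectral variables
rng = np.random.default_rng(3)
pure = rng.normal(size=(3, 40))
conc = rng.uniform(10, 90, size=(30, 3))
X = conc @ pure
beta_ae, C = elm_ae_r_fit(X, conc, L=3)
```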

Generation of Simulated Data
To study the performance of the proposed approach in calculating gas concentrations, in a first step simulated spectral datasets were used. In this study, three gas components (N2O, NO2, and NO) were targeted for quantitative analysis. To obtain simulated datasets, pure gas spectra were calculated based on the HITRAN database [35]. Then, a simulated spectrum of a gas mixture was generated by adding the pure spectra of the constituents with different multiplication factors.

Considering that the standard spectra in HITRAN are calculated per mole, the concentrations of the gas components in the simulated datasets are likewise expressed by the number of molecules. Assuming the range 0-4000 cm−1 as the spectral range of interest at a spectral resolution of 1 cm−1, 60 simulated mixture sample spectra of N2O/NO2/NO were generated, serving as the training dataset. In order to make these training samples discriminative, the concentrations of the three components were set in the range from 10 mol to 90 mol in increments of 20 mol; all three components therefore had different concentrations in any given mixture sample (Table 1 and Figure 3).

Table 1 summarizes the concentrations of the gas components N2O/NO2/NO in the training mixture samples, while Figure 3 shows selected simulated spectra (i.e., six selected examples) from the training dataset.
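The mixture-simulation procedure can be sketched as below. Real pure-component spectra would be computed from HITRAN line data; here narrow Gaussian bands (centered near known N2O/NO2/NO fundamentals) serve as illustrative placeholders. Note that requiring all three concentrations to differ yields exactly the 60 training samples mentioned above.

```python
import numpy as np

wavenumber = np.arange(0, 4000, 1.0)           # 0-4000 cm^-1 at 1 cm^-1 resolution

def band(center, width, height=1.0):
    """Gaussian absorption band as an illustrative pure-spectrum placeholder."""
    return height * np.exp(-0.5 * ((wavenumber - center) / width) ** 2)

pure = np.vstack([
    band(2224, 15) + band(1285, 12),           # stand-in for N2O
    band(1617, 18),                            # stand-in for NO2
    band(1876, 14),                            # stand-in for NO
])

levels = np.array([10.0, 30.0, 50.0, 70.0, 90.0])   # 10-90 mol, steps of 20
train_conc = np.array([[a, b, c] for a in levels for b in levels for c in levels
                       if len({a, b, c}) == 3])      # all concentrations distinct
train_spectra = train_conc @ pure                    # additive (linear) mixing
print(len(train_conc))                               # 60 training mixtures
```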

Analysis on Simulated Data
To calculate the concentration of the gas components, one needs to build regression models. Here, three models (PCR, PLSR, and the proposed ELM-AE-R) were compared. First, the simulated datasets were considered for evaluating the feature selection process along with dimensionality reduction prior to the regression analysis.

Figure 4 shows the feature loadings of the three investigated regression models. For PCR and PLSR, the loadings were the principal components; for ELM-AE-R, the loadings were the learned feature vectors. Based on these feature loadings, latent variables of the spectral signals could be calculated. Then, three regression models were constructed according to the description in Section 2.3.

To discuss the performance of the constructed models, 40 samples of NO/NO2/N2O mixtures with random concentrations were separately generated based on the standard spectra in HITRAN. The results of the regression analysis on predicting the concentrations of the gas components in these quasi-unknown samples are shown in Figure 5. It is immediately evident that the data points fall on the red line indicating ideal prediction. These ideal results are expected, as simulated data are free from noise and interferences. Consequently, the performance of the three models was identical.

Figure 5. Results of the regression analysis for simulated quasi-unknown spectra.

Actual Spectra Collection and Processing
To collect real spectra, Fourier transform infrared (FTIR) spectroscopy was used in combination with substrate-integrated hollow waveguide (iHWG) technology, with the waveguide simultaneously serving as a highly efficient gas cell [36][37][38]. Compared with other sensor technologies such as electrochemical and semiconductor-based devices, IR spectroscopy/sensing enables monitoring multiple gas components even in complex mixtures. In essence, IR techniques operating in the 3-15 µm (i.e., mid-infrared) wavelength band are capable of distinguishing polyatomic and hetero-nuclear diatomic molecules, providing a unique "fingerprint" for each component within mixture IR spectra [39], as shown herein for the absorption spectra of N2O/NO2/NO mixtures. Using the IR sensing configuration shown in Figure 6, spectral data of 356 N2O/NO2/NO mixtures were collected across a wide variety of concentrations. Figure 7 shows selected exemplary spectra.

The collected wavelength range was 1000-4000 cm−1. It is evident from Figure 7 that the raw IR spectra are affected by several parameters including, e.g., baseline drifts, background signals, noise, and molecular interferents such as CO2. Therefore, data pre-processing, including baseline correction, is required to obtain useful input data for the regression analysis. In this study, asymmetric least squares (ALS) smoothing was applied [40]. ALS aims at obtaining a smooth baseline which follows the main baseline trend of the original spectrum.
The objective function of ALS is defined as:

min_{y_b} Σ_{i=1}^{n} α_i (y_i - y_b,i)^2 + λ Σ_i (Δ^2 y_b,i)^2 (17)

where y is the original spectral signal; y_b is the calculated baseline; n is the number of spectral elements; α_i is the weight of the ith point in the spectrum; Δ^2 denotes the second-order difference operator; and λ is a balance factor, whose value is generally set within 10^2 < λ < 10^9. By minimizing the objective function in Equation (17), one may extract a useful baseline for correction, as shown in Figure 8. Figure 8 shows IR spectra of an exemplary dataset for a mixture of N2O (30 ppm), NO2 (100 ppm), and NO (600 ppm) before and after baseline correction. All collected spectra were then processed by ALS prior to the regression analysis.
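A dense-matrix sketch of ALS baseline estimation under Equation (17), with the usual iterative asymmetric re-weighting; the smoothness parameter and weights below are illustrative choices, and a real implementation would use sparse matrices for full-length spectra.

```python
import numpy as np

def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """ALS baseline (Eq. 17): minimize the alpha-weighted fit error plus
    lam times the squared second differences of the baseline, re-weighting
    asymmetrically so the baseline hugs the lower envelope of the spectrum."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)       # second-order difference operator
    smooth = lam * D.T @ D                    # smoothness penalty matrix
    alpha = np.ones(n)                        # weights alpha_i
    for _ in range(n_iter):
        yb = np.linalg.solve(np.diag(alpha) + smooth, alpha * y)
        alpha = np.where(y > yb, p, 1 - p)    # small weight above the baseline
    return yb

# illustrative spectrum: linear baseline plus one sharp absorption peak
x = np.linspace(0, 1, 200)
baseline_true = 1 + 2 * x
y = baseline_true + 5 * np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
yb = als_baseline(y)
```

Subtracting `yb` from `y` yields the baseline-corrected spectrum used as regression input.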

Regression Analysis and Concentration Prediction of Measured Spectra
To construct a regression model analysing the concentration of gas components, the dataset was divided into a training dataset for modelling and a test dataset for evaluation, i.e., 189 and 167 samples, respectively. Feature selection and dimension reduction were implemented before modelling to reduce computational cost. Again, the performance of PCR, PLSR, and the proposed ELM-AE-R were compared. Figure 9 depicts the feature loadings of the three regression models. Three principal components were selected in PCR and PLSR, while the number of nodes in the hidden layer of ELM-AE was also set to three. Based on the feature loadings evident in Figure 9, latent variables for the training data were calculated, and three regression models were established. The performance in predicting the concentrations of the three gas components in the test dataset is shown in Figures 10 and 11.

Figure 11. Performance of the regression analysis on the testing data set.

Figures 10 and 11 illustrate the performance of predicting concentrations for the training and the test dataset, respectively. The diagonal red line represents ideal prediction; accordingly, points located close to the diagonal line indicate better performance. It is evident that all models perform better at predicting the concentrations of N2O and NO2 than that of NO.
In order to quantitatively discuss the performance of these models, RMSE and R² were calculated and summarized in Table 2. From the results in Table 2, the best performing model is not immediately evident. To analyse the relative performance of the proposed ELM-AE-R, the improvement coefficient [41] was calculated as a percentage, with PCR and PLSR serving as references, respectively. The improvement coefficient of RMSE is defined as:

I = (RMSE_reference − RMSE_ELM-AE-R) / RMSE_reference × 100%

where I represents the improvement coefficient and RMSE_reference is the RMSE of the reference model (PCR or PLSR). For I > 0, the ELM-AE-R outperforms the reference model; for I < 0, the ELM-AE-R is worse than PCR or PLSR. For R², the improvement coefficient could be determined by the difference vs. the reference model, which results in the degree of improvement of ELM-AE-R vs. PCR and PLSR as summarized in Table 3.
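These improvement coefficients amount to a few lines of code. In the sketch below, the RMSE variant follows the definition above; reading the R² improvement as a plain difference expressed in percentage points is our interpretation of the description, and the numeric values are made up for illustration:

```python
def improvement_rmse(rmse_ref, rmse_elm):
    """Relative RMSE improvement of ELM-AE-R over a reference model
    (PCR or PLSR), in %. Positive values mean ELM-AE-R is better."""
    return (rmse_ref - rmse_elm) / rmse_ref * 100.0

def improvement_r2(r2_ref, r2_elm):
    """R^2 improvement, taken as the plain difference to the reference,
    expressed in percentage points (an assumption, see lead-in)."""
    return (r2_elm - r2_ref) * 100.0

# hypothetical values, for illustration only
print(improvement_rmse(2.0, 1.5))   # prints 25.0 -> ELM-AE-R better
print(improvement_r2(0.95, 0.97))   # about 2 percentage points
```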
The results in Table 3 show that PLSR performs best on the training data; however, the proposed ELM-AE-R outperformed both PCR and PLSR on the test dataset, which corresponds to the real-world scenario of an unknown sample containing the three components. Moreover, the improvement coefficients of ELM-AE-R vs. PCR were larger than those of ELM-AE-R vs. PLSR, implying that ELM-AE-R performed best, while PLSR still performed better than PCR.

Improvement Analysis
The regression analysis discussed in Section 4.4 is not perfect, as using only three principal components (PCs) may lead to a loss of information. In order to improve the performance, more principal components were extracted in a next step, and the number of PCs used for modelling was optimized.
In Figure 12a, the contribution of an increasing number of PCs in PCA is shown. In Figure 12b, the average regression error of the three models with an increasing number of PCs is shown. Since the RMSEs for predicting the different gas components have different magnitudes, directly averaging these RMSEs would mask the influence of well-predicted components, e.g., that of N2O herein. Therefore, we propose using MAPE to calculate the average regression error, which follows the same trend as the RMSE. According to the results in Figure 12b, the predictive error of PCR decreased with the number of PCs, yet remained constant beyond eight PCs. PLSR showed the best nominal performance when 17 PCs were selected. The proposed ELM-AE-R achieved the smallest regression error using around 11 PCs. It is again evident that ELM-AE-R outperformed PCR and PLSR in most cases, and that PLSR outperformed PCR. Since too few PCs (insufficient features) and too many PCs (potentially introducing noise) are both unsuitable for modelling, using 11 PCs for modelling the target analytes appeared most suitable. The corresponding results of the regression analysis are shown in Figures 13 and 14.
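The motivation for MAPE over an averaged RMSE can be illustrated with a small sketch; the numbers below are made up and merely contrast the two ways of averaging the error across components of very different concentration scales:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in %: each component is weighted
    by its own scale, so small- and large-scale gases count equally."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Two components on very different concentration scales (made-up numbers):
rmse_a, rmse_b = 0.5, 50.0          # e.g. trace-level vs percent-level gas
print((rmse_a + rmse_b) / 2)        # prints 25.25 -- dominated by b alone

print(mape([100, 200], [110, 190])) # ~7.5 -- mean of 10% and 5% errors
```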
Sensors 2020, 20, x FOR PEER REVIEW

Figures 13 and 14 show the performance of the three models for predicting the concentration of the three gas components. Evidently, all models perform better on both the training and the test data than when using only three PCs (cf. Figures 10 and 11).
The performance values are summarized in Table 4. When using 11 PCs, PCR performed well when predicting the concentration of N2O. ELM-AE-R achieved excellent performance on the test dataset, while PLSR had advantages during training, yet remained less robust than ELM-AE-R when evaluating test data. To comprehensively analyse the performance of the proposed ELM-AE-R model, averages of the relative improvement coefficients were again calculated, taking PCR and PLSR as references, respectively. On the training dataset, the average improvement coefficients of ELM-AE-R compared to PCR and PLSR on RMSE were 15.10% and −10.88%, and on R², 0.39% and −0.02%. On the test dataset, ELM-AE-R outperformed PCR and PLSR by 21.16% and 17.45% on RMSE, respectively, and by 2.87% and 2.04% on R². These results illustrate that the proposed ELM-AE-R indeed achieves better overall performance than PCR and PLSR for quantitative IR spectral data analysis, and represents an excellent alternative to conventional multivariate data evaluation techniques in complex gas sensing scenarios.

Conclusions
In this study, an innovative ELM-based regression model is proposed for the quantitative analysis of infrared spectra obtained via sensing gas mixtures. An ELM-based autoencoder has been applied for feature selection. Compared with conventional feature selection methods based on PCA, ELM-AE provides both dimension reduction and simultaneous feature learning abilities. Then, using the reduced features from ELM-AE, an ELM-based regression model was established and tested using simulated IR spectra as well as experimentally obtained data for a mixture of three gases, i.e., N2O, NO2, and NO. The proposed ELM-AE-R has demonstrated good comprehensive performance, with the particular benefit of improved robustness when predicting the concentrations of the three target gas components. Compared with PCR, which uses PCA for dimension reduction, both PLSR and the proposed ELM-AE-R learn dimensionality-reduced features via supervised learning toward the target, and can therefore achieve better performance than PCR. On the other hand, ELM-AE-R has the ability to generate a large number of potential features, which PLSR does not; therefore, the proposed model is robust enough to reach the best regression accuracy among all models.
However, beyond the contributions achieved in this paper, some open issues merit further study, for example, how and where to apply the proposed model. Gas sensing technologies could greatly benefit industry and society, e.g., by applying the research presented herein to the measurement of vehicle exhaust. Moreover, from the perspective of algorithms, improving the model's stability is also important, since the random projections may act as hidden weaknesses degrading the prediction performance. Therefore, more work will be carried out in our following studies.