PM2.5 Time Series Imputation with Deep Learning and Interpolation

Commonly, time series imputation has been implemented directly through regression models based on statistical, machine learning, and deep learning techniques. In this work, a novel approach is proposed based on a classification model that determines the class of each NA value and, from this class, applies one of two types of interpolation: polynomial or flipped polynomial. An hourly pm2.5 time series from Ilo City in southern Peru was chosen as a study case. The results obtained show that, for gaps of one NA value, the proposal in most cases presents superior results to techniques such as ARIMA, LSTM, BiLSTM, GRU, and BiGRU; on average, in terms of R², the proposal exceeds the implemented benchmark models by between 2.4341% and 19.96%. Finally, supported by these results, it can be stated that the proposal constitutes a good alternative for short-gap imputation in pm2.5 time series.


Introduction
Time series forecasting is one of the most active research topics [1] in machine learning and deep learning; it is used in different domains such as biology, finance, meteorology, and social sciences [2], among others.
Missing values in time series are a common problem [2], and many machine learning and deep learning techniques do not work with missing data [3], so time series imputation becomes a very important task. It is also worth noting that the quality of the data strongly affects time series forecasting [4]; hence the importance of a good time series imputation process.
In this work, a novel approach for time series imputation is proposed, which combines a mathematical technique with a deep learning one. The imputation problem is approached as a classification problem: the proposal uses a classification model to determine the class of each NA value, and from the class, an interpolation technique is applied. The main motivation stems from the analysis of different short gaps of NA values in time series, where it was possible to identify two classes. The first, according to Figure 1a, comprises the NA values whose real values fit better to a polynomial interpolation. The second, according to Figure 1b, comprises the NA values whose real values fit better to a flipped polynomial interpolation. As can be seen in Figure 1, the NA value can be located above or below the line between p0 and n0.

The study case selected to validate the novel approach corresponds to an hourly pm2.5 time series obtained from an environmental monitoring station in Ilo City in southern Peru. The study case was chosen due to the importance of the analysis of pm2.5 time series since, as is known, long-term exposure to pm2.5 can cause diverse health issues, including heart disease [5], lung cancer [6], chronic obstructive pulmonary disease [7], lower respiratory infections (LRIs) [8], ischemic stroke [9], diabetes mellitus [10], and others. However, pm2.5 time series present NA values due to multiple factors, such as equipment failure [11] and power outages, as occurs with the data from the chosen monitoring station. Hence the importance of implementing imputation techniques in order to recover lost data and make them available for the implementation of forecasting models.
The main contributions of this work are cited next:
- A novel approach for pm2.5 time series imputation based on deep learning classification;
- An ensemble deep learning model for NA value classification;
- A comparative analysis between the proposed approach and benchmark models.
The main limitations of this work are as follows: it is focused only on the analysis of short gaps, i.e., NA sequences of a single value, in pm2.5 time series, and the interpolations are performed considering only four values: p1, p0, n0, and n1.
The rest of the paper is organized as follows: Section 2 briefly describes the work performed for pm2.5 time series imputation; Section 3 describes the complete process for the implementation of the proposal; and Section 4 describes and discusses the achieved results, considering the pros and cons of the proposal. The paper ends with Section 5.

Related Work
Some works for pm2.5 time series imputation using machine learning and deep learning techniques are listed below in chronological order. Xiao et al., 2018 [16] proposed an ensemble model for pm2.5 time series imputation. The proposal presented superior results to models such as random forest (RF), extreme gradient boosting (XGBoost), and the generalized additive model (GAM). Yuan et al., 2018 [17] proposed the long short-term memory (LSTM) recurrent neural network for the imputation process and compared it with two classic techniques: moving average and mean imputation. The results show the enormous superiority of LSTM over the other two techniques.
Belachsen et al., 2022 [18] proposed a multivariate KNN technique for half-hourly pm2.5 time series, reaching an NMAE between 0.21 and 0.26. Saif-ul-Allah et al., 2022 [19] proposed the recurrent neural network known as GRU, reaching an RMSE of 10.60 µg/m³ and surpassing other models such as SVM, LSTM, and BiLSTM. Alkabbani et al., 2022 [20] proposed a multivariate random forest model for pm2.5 time series imputation, and the results were compared only with linear interpolation, showing that random forest achieves the best RMSE (3.756 µg/m³). Yıldız et al., 2022 [21] proposed a transformer model for multivariate time series imputation; they experimented with air quality (pm2.5) and healthcare time series. For pm2.5, they worked with hourly data of 12 months, obtaining an MAE of 8.31 µg/m³ that exceeded those obtained by benchmark models such as BRITS, RDIS, and V-RIN. Lee et al., 2023 [22] proposed four machine learning models, namely GAIN, FCDNN, random forest, and KNN, and the best results were obtained with the GAIN model, with R² = 0.895.
Reviewing other related works for time series imputation in general, it can be noted that they address techniques ranging from moving averages (simple moving average (SMA), linear weighted moving average (LWMA), and exponential weighted moving average (EWMA)) and interpolation techniques (linear, spline, and Stineman), generally implemented in the imputeTS library [2,18] of R, to deep learning-based ones, including LSTM [23,24], GRU [25], and GAN [26], as well as variants such as BRITS [27], GRU-D [28], and B-CIPIT [29].
The main difference between the related works and the proposal is that, to estimate NA values, those works generally apply a regression model, in some cases associated with a time series transformation. In our model, NA features are first extracted and labeled; then, the features are normalized; next, a time series classification task with deep learning models is performed, followed by the implementation of an ensemble model; finally, the class of each NA value is estimated, and from this, the proper polynomial interpolation is performed to obtain the estimated NA value. This process is summarized in Figure 2.

Materials and Methods
The methodology used for the implementation of the proposed approach is summarized in Figure 3.


Data Preparation
The hourly data used for this work were downloaded from OEFA's server at http://fiscamb.oefa.gob.pe/vig_amb/ (accessed on 2 May 2023) and cover the period from 1 August 2020 to 30 April 2023, corresponding to the Pacocha environmental monitoring station located in Ilo Province and Pacocha District in southern Peru. The downloaded data present several missing values, so the days that presented missing values were discarded. After this process, the total number of available records was 56,424. As it is a regression problem, the dataset was simply split into two sets (training and test) [30]: 48,792 records (80%) were used for training, and 7632 records (20%) for testing. Figure 4 shows a sample of 1000 h of the selected time series. Data normalization was performed on the extracted features during the implementation of the classification models; it is described in Section 3.2.3.
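A chronological split (no shuffling, so the temporal order of the series is preserved) can be sketched as follows; this is an illustrative sketch, and the paper's exact splitting procedure may differ:

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """Split a time series into train/test sets without shuffling,
    keeping the first train_frac of the observations for training."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# Example with the total record count reported in the paper: 56,424 hourly values.
series = np.arange(56424, dtype=float)
train, test = chronological_split(series)
print(len(train), len(test))  # 45139 11285
```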


Implementation of Classification Models
For the classification models, the dataset was split into three sets: training, validation, and test. From the 48,792 records corresponding to the training data, 70% was used for training, 10% for validation, and the remaining 20% for testing the classification models.
At this stage, the first task performed corresponds to the feature extraction and labeling of NA values, which is described below.

Feature Extraction and Labelling
The features extracted from the time series are described in Table 1. The motivation to use these features for NA value classification comes from the way moving average techniques work: they use a parameter k that corresponds to the window size, which tells the technique how many values before and after the NA value should be used for the corresponding estimation. Thus, p2, p1, p0, n0, n1, and n2 were obtained considering a window size k = 3; for future work, higher values could be used. The rest of the features were derived or synthetic; that is, they were generated to provide more information for model training.

Given a time series t, the features shown in Table 1 were extracted using the algorithm shown in Figure 5. The algorithm receives as an argument a time series t, which is traversed from beginning to end. As can be seen, the feature extraction is quite simple: p2, p1, p0, n0, n1, and n2 are taken from the positions around the NA value; mid, mid1, and mid2 are averages of p1, p0, n0, and n1; and diff, slope1, and slope2 are differences between p1, p0, n0, and n1. However, the estimation of the class or label is somewhat more complex and requires that certain conditions be met; these conditions are detailed in Table 2.
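The feature extraction for a single NA position can be sketched as follows. The exact formulas for mid1, mid2, diff, and the slopes are not spelled out in the text, so plausible forms are assumed here for illustration:

```python
import numpy as np

def extract_features(t, i):
    """Extract the window features around position i of series t (k = 3).
    mid1, mid2, diff, slope1, and slope2 use assumed definitions."""
    p2, p1, p0 = t[i - 3], t[i - 2], t[i - 1]    # three values before the NA
    n0, n1, n2 = t[i + 1], t[i + 2], t[i + 3]    # three values after the NA
    mid = (p0 + n0) / 2                # midpoint of the line p0 -> n0
    mid1 = (p1 + n1) / 2               # assumed: midpoint of the outer pair
    mid2 = (p1 + p0 + n0 + n1) / 4     # assumed: mean of the four nearest values
    diff = n0 - p0                     # assumed differences
    slope1 = p0 - p1
    slope2 = n1 - n0
    return dict(p2=p2, p1=p1, p0=p0, n0=n0, n1=n1, n2=n2,
                mid=mid, mid1=mid1, mid2=mid2,
                diff=diff, slope1=slope1, slope2=slope2)

t = np.array([10.0, 12.0, 11.0, np.nan, 14.0, 13.0, 15.0, 16.0])
f = extract_features(t, 3)  # index 3 is the NA position
print(f["mid"])  # 12.5
```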

Feature Selection
For this task, a correlation matrix was implemented, which can be seen in Figure 6.
According to the last row (or last column) of the correlation matrix, the relationship between the input features and the target feature is not strong; thus, it would not be easy to obtain good results in terms of accuracy, precision, recall, and f1-score. In this work, it was decided to use all the input features for the classification models.
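The feature-selection step can be reproduced in miniature as follows; the feature matrix is synthetic, since only the resulting correlation matrix (Figure 6) is reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                            # example input features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(float)   # weakly related target

# Correlation matrix of the features plus the target; the last row/column
# gives each feature's correlation with the target, as read from Figure 6.
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
print(corr[-1, :-1])
```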

Class 0 (polynomial interpolation)
According to Figure 1a, there are two cases in which polynomial interpolation is applied:
- Case 1: na is below the line (p0 to n0), stated through the conditions na ≤ mid and p1 ≥ p0.
- Case 2: na is above the line (p0 to n0), stated through the conditions na > mid and p0 > p1.

Class 1 (flipped polynomial interpolation)
According to Figure 1b, there are two cases in which flipped polynomial interpolation is applied:
- Case 1: na is below the line (p0 to n0), stated through the conditions na < mid and p1 < p0.
- Case 2: na is above the line (p0 to n0), stated through the conditions na > mid and p0 ≤ p1.
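The labeling conditions above can be sketched as a small function. During labeling, the real value at the gap is known, so it plays the role of na; collapsing to an if/else assumes the two classes are complementary, which the stated conditions satisfy except at the boundary na = mid with p1 < p0 (class 1 is used as the fallback here):

```python
def na_class(na, p1, p0, n0):
    """Label an NA: 0 -> polynomial interpolation, 1 -> flipped polynomial
    interpolation, following the conditions of Table 2."""
    mid = (p0 + n0) / 2
    # Class 0 (polynomial interpolation), Figure 1a
    if (na <= mid and p1 >= p0) or (na > mid and p0 > p1):
        return 0
    # Class 1 (flipped polynomial interpolation), Figure 1b; also the
    # fallback for the boundary case not covered by the stated conditions
    return 1

print(na_class(9.0, 12.0, 11.0, 14.0))   # 0: na below mid and p1 >= p0
print(na_class(13.0, 12.0, 11.0, 14.0))  # 1: na above mid and p0 <= p1
```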

Normalization
Before the implementation of deep learning models, normalization is very important in order to ensure faster convergence. In this study, z-score normalization was used; it can be implemented through Equation (1):

x′ = (x − x̄) / σ,  (1)

where x′ is the normalized vector; x is the original feature vector; x̄ is the mean of the feature vector; and σ is the standard deviation of the feature vector.
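A minimal implementation of Equation (1):

```python
import numpy as np

def zscore(x):
    """Z-score normalization per Equation (1): x' = (x - mean) / std."""
    return (x - x.mean()) / x.std()

x = np.array([2.0, 4.0, 6.0, 8.0])
z = zscore(x)
print(z.mean(), z.std())  # ~0.0 and 1.0 after normalization
```

In practice, the mean and standard deviation are usually computed on the training set only and reused to normalize the validation and test features, so that no test information leaks into training.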

Deep Learning Classification Models
In this stage, four deep learning models were implemented: deep neural networks (DNN), convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU), whose architectures are described in Table 3. According to Table 3, all classification models use a learning rate of 0.0001 and dropout rates of 0.1 after all layers except the output layer. The DNN presents an architecture with a greater number of layers than the other models, all with ReLU as the activation function: 10 neurons in the input layer; 20, 40, and 10 in the hidden layers; and 1 neuron in the output layer with sigmoid as the activation function. Regarding the CNN model, it presents two initial Conv1D layers with ReLU as the activation function, followed by a MaxPooling1D layer with pool_size = 2, then a dense layer of 10 neurons, and, finally, a dense layer with 1 neuron. The LSTM and GRU RNNs present identical architectures: one input layer with 30 neurons, two hidden layers with 30 neurons each, and an output layer of 1 neuron.

Evaluation
The results of the classification models based on deep learning are shown in Table 4. These are described in terms of accuracy (2), precision (3), recall (4), and f1-score (5). Once the classification models were compiled, they were evaluated on the test data. The respective confusion matrices are shown in Figure 7.

From the confusion matrices and Equations (2) and (5)-(7), the metrics shown in Table 4 were estimated.
According to Table 4 and Figure 7, the results of the classification models are not good; the main difficulty presented by the implemented models is their low capacity to detect true positives (NAs of class 1), which is reflected in a recall below 43% and an f1-score below 48%.
In order to improve the results, an ensemble model based on averaging was implemented. For this, experiments were carried out assembling all four models as well as combinations of three models. Figure 8 shows the respective ROC curves; as can be seen, the ensemble of DNN, CNN, and LSTM presented the best area under the curve (AUC = 0.611), which is why this ensemble model was chosen for the imputation process. In terms of accuracy, recall, precision, and f1-score, the ensemble model reached 0.5859, 0.4045, 0.5457, and 0.4647, respectively.
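The average-based ensemble can be sketched as follows; the probability vectors are dummy values standing in for the real DNN/CNN/LSTM outputs:

```python
import numpy as np

# Class-1 probabilities predicted by each model on three NA values
# (dummy values for illustration).
p_dnn  = np.array([0.30, 0.70, 0.55])
p_cnn  = np.array([0.40, 0.60, 0.45])
p_lstm = np.array([0.20, 0.80, 0.65])

# Average-based ensemble: mean probability, then a 0.5 decision threshold.
p_ensemble = np.mean([p_dnn, p_cnn, p_lstm], axis=0)
y_pred = (p_ensemble >= 0.5).astype(int)
print(y_pred.tolist())  # [0, 1, 1]
```

Averaging probabilities before thresholding (soft voting) tends to be more robust than majority voting on hard labels, since near-ties in individual models are smoothed out.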

Imputation of NA Values
Once the classification model for NA class estimation was obtained, the pm2.5 time series imputation process using the proposal proceeded as described below. The NA value insertion mechanism was quite simple: to achieve 19.98% of NA values, one NA value was inserted into the test data every four items; to achieve 25.00%, NA values were inserted every three items; and to achieve 33.32%, every two items. In this way, for each set, it was possible to obtain different amounts and configurations of NA values. The partial result of this process can be visualized in Figure 9.
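The NA-set construction can be sketched as follows. Whether the authors replaced values or inserted extra positions is not fully specified; replacement of one out of every k+1 values is assumed here, since the imputations are later compared against ground truth, and k = 4, 3, 2 then yields roughly the 20%, 25%, and 33% sets used in the paper:

```python
import numpy as np

def make_na_set(series, k):
    """Replace every (k+1)-th value with NaN, i.e. one NA after every
    k observed values, so that about 1/(k+1) of the data is missing."""
    s = series.astype(float).copy()
    s[k::k + 1] = np.nan
    return s

s = np.arange(20.0)
na_20 = make_na_set(s, 4)          # ~20% missing
print(np.isnan(na_20).mean())      # 0.2
```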

Class Estimation for NA Values
To estimate the class of each NA value in every NA set, a Python function was created. The .h5 files correspond to the DNN, CNN, and LSTM models that are part of the selected ensemble model used to estimate the classes of NA values. The Python function getClass receives as its input parameter the features of all NA values to be estimated and returns the classes to which they correspond. Figure 10 shows this function.

Interpolation According to Class Estimation
For polynomial interpolation, the first step is to determine the coefficients of the polynomial function; for this, there are various techniques, including the Lagrange method, which is described below.
From Equation (6), the coefficients are obtained, and the polynomial function can be implemented; from it, any point can be estimated, in this case, the point corresponding to the NA value. The polynomial function is similar to what is shown in Equation (7).
The algorithm that implements the interpolations according to the estimated class is shown in Figure 11.

According to Figure 11, the getNA procedure for NA estimation receives as parameters the NA class and the vector y that contains the four values to be used by the polynomial interpolation: p1, p0, n0, and n1. The values p0 and n0 are interpolated, and mid is obtained; then, the coefficients of the polynomial function are obtained for the four values in y, and for this, the polynomial procedure is used (see Figure 12). With the polynomial obtained, the value at position 1.5 is estimated, which corresponds to the NA value according to Figure 1; for this, the interpolate procedure is used (see Figure 13), and for NA values of class 0, this is the final result. For NA values of class 1, the na value is flipped: the absolute distance d between the na value and mid is determined, and depending on the location of the na value with respect to mid, d is subtracted or added.

The polynomial procedure receives as parameters the number of points (points) and the vectors of values in x (Xs) and y (Ys). With these data, according to Equation (6), for each point, Equation (8) is estimated, and from this, Equation (9) must be estimated.
Between lines 9 and 19 of the algorithm, Equation (8) is estimated, and Equation (9) is estimated between lines 20 and 29. The last part of the algorithm generates the coefficients of the polynomial function.
The interpolate procedure receives as parameters the coefficients of the polynomial (poly) and the position of the value to be estimated (v = x = 1.5), which in this case corresponds to the position of the NA value, and from this, Equation (7) is implemented.
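The getNA, polynomial, and interpolate procedures described above can be sketched as follows. This is a sketch rather than the authors' exact algorithm: the Lagrange polynomial is evaluated directly at x = 1.5 instead of first generating its coefficients, and the class-1 flip follows the description of reflecting na across mid:

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x
    (direct evaluation, equivalent in result to building the coefficients
    with the polynomial procedure and evaluating with interpolate)."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def get_na(cls, y):
    """Estimate an NA value from its class and y = [p1, p0, n0, n1];
    the NA sits at position 1.5 on the x-axis 0, 1, 2, 3 (see Figure 1)."""
    p1, p0, n0, n1 = y
    mid = (p0 + n0) / 2                                # midpoint of p0 -> n0
    na = lagrange_interpolate([0, 1, 2, 3], y, 1.5)    # class-0 estimate
    if cls == 1:                                       # class 1: flip across mid
        d = abs(na - mid)
        na = mid + d if na < mid else mid - d
    return na

y = [10.0, 11.0, 14.0, 18.0]
print(get_na(0, y), get_na(1, y))  # 12.3125 12.6875
```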

Implementation of Benchmark Models
Seven benchmark models were implemented in order to compare the proposal's performance; these models include polynomial interpolation, LANN, ARIMA, long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), and bidirectional GRU (BiGRU).
Polynomial interpolation and ARIMA were implemented in the R language using the imputeTS library. For ARIMA, the auto.arima model was used.
Deep regression models were implemented in Python using TensorFlow 2.9.0, and the hyperparameters can be seen in Table 5. All deep regression models use Adam as the optimizer, mean squared error (MSE) as the loss function, and a learning rate of 0.001. The number of epochs used was 100, and the batch_size was also 100. Finally, all layers used ReLU as the activation function except the one-neuron output layer, which used sigmoid.

Evaluation
The proposal is evaluated in terms of root mean squared error (RMSE), mean absolute percentage error (MAPE), and R², which are implemented through Equations (10)-(12).
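Under the standard definitions of these metrics, which are assumed here to match Equations (10)-(12), they can be computed as:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    """Mean absolute percentage error (in %); assumes y has no zeros."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

y    = np.array([10.0, 12.0, 14.0, 16.0])   # ground truth
yhat = np.array([11.0, 12.0, 13.0, 17.0])   # imputed values
print(round(rmse(y, yhat), 4), round(mape(y, yhat), 4), round(r2(y, yhat), 4))
```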

Results and Discussion
This section describes the results achieved by the proposal, comparing them with other models in the literature.

Results
The results achieved by the proposal are described below; likewise, they are compared with other techniques and models from the literature and the state of the art.
According to Table 6 and Figure 14a, it can be seen that, in terms of RMSE, the lowest error was reached by the proposal in one of the three NA sets; it was the best in the third NA set (33.32%) with 6.8134 µg/m³. For the first NA set (19.98%), the best technique was ARIMA with RMSE = 6.7654 µg/m³, followed by LANN with RMSE = 6.8123 µg/m³, with the proposal in third place with RMSE = 6.9148 µg/m³. For the second NA set (25.00%), the best technique was LANN with RMSE = 6.7088 µg/m³, followed by the proposal with RMSE = 6.7137 µg/m³, with polynomial interpolation in third place with RMSE = 6.7912 µg/m³.

In terms of MAPE, according to Table 7 and Figure 14b, in all NA sets, the polynomial interpolation model presented the best results, surpassing all implemented models, including the proposal. For the first NA set (19.98%), LANN was in second place with MAPE = 21.9548%, and the proposal was in third place with MAPE = 22.0901%. For the second NA set (25.00%), the proposal was in second place with MAPE = 21.0242%, and LANN was in third place with MAPE = 21.0718%. For the third NA set (33.32%), LANN was in second place with MAPE = 21.1710%, followed by the proposal with MAPE = 21.1758%.

In terms of R², according to Table 8 and Figure 14c, the proposal outperformed all benchmark models in two of the three NA sets. In the first NA set (19.98%), LANN was the best with R² = 0.6911,
followed by the proposal and polynomial interpolation with 0.6879 and 0.6861, respectively. In the second NA set (25.00%), the proposal was the best with R² = 0.6941, followed by LANN and polynomial interpolation with 0.6916 and 0.6859, respectively. Finally, in the third NA set, the proposal was also the best with R² = 0.7072, followed by LANN and polynomial interpolation with 0.7062 and 0.7018, respectively. On average, according to the above, there is a notable difference between the proposal and the benchmark models based on deep learning, such as LSTM, BiLSTM, GRU, and BiGRU: in terms of RMSE, the difference is between 1.6298 µg/m³ and 2.4773 µg/m³; in terms of MAPE, between 4.1459% and 5.8669%; and in terms of R², between 16.3818% and 19.9618%.
However, comparing the proposal with benchmark models such as LANN, polynomial interpolation, and ARIMA, the difference is smaller. On average, in terms of RMSE, only LANN is better than the proposal, by a small 0.03527 µg/m³. In terms of MAPE, only polynomial interpolation and LANN are better than the proposal, by 0.1239% and 0.0309%, respectively. In terms of R², the proposal is better than polynomial interpolation, LANN, and ARIMA by 0.3898%, 0.0134%, and 2.4341%, respectively.
Graphically, Figure 15 shows the results of the best benchmark models and the proposal for the first 100 items of ground truth for the different sets of NAs.

Discussions
Through the analysis of the results, it can be seen that the models closest to the proposal are LANN and polynomial interpolation. Polynomial interpolation served as the basis for the proposal, as it was used to estimate the NA values of class 0. Thanks to the implementation of the flipped polynomial interpolation for NA class 1, the proposal outperformed the polynomial interpolation benchmark on average by 0.0712 µg/m³ and 0.3898% in terms of RMSE and R², respectively.
The good results obtained in the estimation of NA values, even though the classification models did not achieve good performance, prove that, as can be seen in Figure 1, both interpolations produce similar estimates, so class estimation errors do not affect the results greatly.
Despite what is mentioned in the preceding paragraph, the main weakness of the proposal is the complexity of the classification task used to estimate the NA class, since it was not possible to identify input features with a higher correlation with the target feature. The classification models present values below 60% in most of the evaluation metrics; this could be improved by using a larger amount of data for the training phase and by generating or obtaining better input features. A better classification rate would make it possible to identify the NA classes with greater accuracy and, from this, obtain better NA value estimations, reducing the RMSE and MAPE and increasing the R² coefficient.
Another aspect that should be analyzed in future work lies in the distance between the real NA value and the value estimated by polynomial interpolation. If the real NA value is farther from the polynomial curve, the estimation error will be larger. This constitutes a limitation of this type of technique for estimating more complex values: it works well for NA values that are close to the line between p0 and n0, which is also why LANN produces good results.
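LANN imputes a single missing value with the local average of its nearest observed neighbors, i.e., the point on the line between p0 and n0; a minimal sketch (the function name is illustrative):

```python
def lann(series, i):
    """LANN imputation of a single NA at index i: the midpoint of the
    previous observed value p0 = series[i-1] and the next observed
    value n0 = series[i+1], i.e., linear interpolation of the
    nearest neighbors."""
    return (series[i - 1] + series[i + 1]) / 2.0
```

This makes explicit why LANN is competitive here: for a single-value gap, any real value lying near the p0-n0 line is recovered almost exactly.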
Also, even though in most cases the proposal presented higher R² values than the benchmarks, the R² coefficient still has to be improved: in two of the three NA sets, the proposal presented a value below 0.7, and only in one set did it present an R² above 0.7. As mentioned in the previous case, improving the classification rate would also improve the R² of the proposal.
On the other hand, considering that related works used other datasets, and each dataset presents different characteristics, the comparison is carried out only for reference. According to Table 9, in terms of RMSE, the proposal is only below the work in [20], which obtained an RMSE of 3.756 µg/m³; in terms of R², the proposal, with an R² of 0.6946, is below the work in [22], which reported an R² of 0.895; and in terms of MAE, the proposal, with MAE = 3.4944 µg/m³, outperformed the work in [21], with MAE = 8.31 µg/m³.

Conclusions
Although the classification rate of the classes of NA values did not exceed 60% in most of the analyzed metrics, the NA value estimation results obtained in terms of RMSE, MAPE, and R² are very promising: when compared with the results of the benchmark models, the proposal managed to widely outperform state-of-the-art models such as LSTM, BiLSTM, GRU, BiGRU, and ARIMA, demonstrating that for short gaps the proposal is a good alternative.

Future Work
As previously mentioned, this work experimented only with short gaps (gaps of one NA value). For future work, it would be important to adapt the proposal to gaps of more than one value. Likewise, as mentioned above, the classification models only reached accuracies, recalls, precisions, and F1-scores below 60%, which indicates that there is still a wide margin for improvement; in this sense, other deep learning architectures could be implemented, with a greater number of layers and different configurations of neurons, among others. The dataset could also be enriched through data augmentation techniques for time series classification in order to generate a greater diversity of rows that could help to improve the performance of the models. Additionally, LANN could be exploited through the creation of a new class (2), containing the NA values that can be estimated with greater accuracy by linear interpolation instead of polynomial interpolation.
On the other hand, fractal theory could help to find other time series features that could improve the time series classification process. Likewise, the approach proposed in this work can also be adapted to time series from similar contexts, such as pm10, SO2, etc., and from different domains such as meteorology, biology, and finance.

Figure 2. Comparison between literature approaches and proposal approach.

Figure 3. Methodology for the implementation of the proposal approach.

Figure 5. Feature extraction algorithm of NA values.

3.3.1. Generation of NA Values in Test Data
NA values were generated considering gaps of a single NA value. Three different sets of NA values were implemented: 19.98% (1525 items), 25.00% (1907 items), and 33.32% (2543 items).
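One way to generate such sets of single-NA gaps is to mask isolated points at random until the target fraction is reached. The sketch below is an illustrative procedure (the function name, seeding, and margin choices are assumptions, not the paper's exact method):

```python
import numpy as np

def make_single_na_gaps(values, fraction, seed=0):
    """Return a copy of `values` with roughly `fraction` of the points set
    to NaN as isolated single-value gaps: no two adjacent NAs, and a
    two-point margin at each end so every gap has observed neighbors."""
    rng = np.random.default_rng(seed)
    out = np.asarray(values, dtype=float).copy()
    n = len(out)
    target = int(round(n * fraction))
    blocked = set()          # indices no longer eligible (NA or its neighbors)
    placed = 0
    for i in rng.permutation(np.arange(2, n - 2)):
        if placed == target:
            break
        if int(i) in blocked:
            continue
        out[i] = np.nan
        blocked.update((int(i) - 1, int(i), int(i) + 1))  # forbid adjacent NAs
        placed += 1
    return out
```

Applied with fractions 0.1998, 0.2500, and 0.3332, this reproduces the structure of the three NA sets described above: every masked point is surrounded by observed values.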

Figure 9. Sets of NA values in test data.

Figure 8. ROC curves of implemented and ensemble models.

Figure 10. Procedure for class estimation of NA values.

R² = 1 − [Σ_{i=1}^{n} (O_i − P_i)²] / [Σ_{i=1}^{n} (O_i − Ō)²]

where n: number of observed/predicted values; P_i: vector of predicted values; O_i: vector of observed values; Ō: mean of observed values.
summarizes this process.

Table 1. Features for NA values.
label: class of the NA value; 0 corresponds to polynomial interpolation, 1 corresponds to flipped polynomial interpolation.
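A plausible sketch of how this label could be assigned to a synthetically masked training point, assuming the class is whichever interpolation comes closer to the known real value (the paper's exact conditions are those of Table 2):

```python
def na_class_label(real_value, poly_est, flipped_est):
    """Return the assumed NA class label for a masked training point:
    0 if the plain polynomial estimate is at least as close to the real
    value as the flipped one, 1 otherwise (illustrative rule)."""
    if abs(real_value - poly_est) <= abs(real_value - flipped_est):
        return 0  # polynomial interpolation fits better
    return 1      # flipped polynomial interpolation fits better
```

At imputation time the roles are reversed: the classifier predicts this label for a genuine NA, and the corresponding interpolation is then applied.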

Table 2. Conditions for NA classes.

Table 3. Architectures of deep learning classification models.

Table 4. Results of deep learning classification models.

Table 5. Hyperparameters for deep regression models.

Table 7. MAPEs of implemented models.

Table 8. R² values of implemented models.

Table 9. Results of related work models.