Article

A Reproducible QA/QC, Imputation and Robust-Series Workflow for Air-Quality Monitoring Time Series

by Nuria Fernández Palomares 1, Laura Álvarez de Prado 1,*, Luis Alfonso Menéndez García 2, David Fernández López 3, Sandra Buján 1 and Antonio Bernardo Sánchez 1,*

1 Department of Mining Technology, Topography and Structures, University of Leon, 24071 Leon, Spain
2 Department of Mathematics, University of Oviedo, 33007 Oviedo, Spain
3 INREMIN S.L., 24008 Leon, Spain
* Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(7), 3396; https://doi.org/10.3390/app16073396
Submission received: 3 March 2026 / Revised: 26 March 2026 / Accepted: 28 March 2026 / Published: 31 March 2026
(This article belongs to the Section Environmental Sciences)

Abstract

This study develops a reproducible and auditable workflow to prepare regulatory air-quality monitoring time series for subsequent temporal analysis, including observational PRE/POST applications around coal-fired power plant closures in northwestern Spain. The dataset comprises daily concentrations from 28 monitoring stations (2006–2023) for PM10, PM2.5, NO, NO2, NOx, O3, SO2, and CO, affected by missingness, structural inconsistencies, and extreme values. The contribution of this study lies in integrating standardized data ingestion and QA/QC, chained-equation imputation with Bayesian Ridge regression, hold-out validation, physicochemical consistency checks, and robust extreme-value handling within a traceable processing workflow. Missing values are reconstructed per pollutant using plant-level multi-station pooling to improve stability. Performance is evaluated using a 5% masked hold-out and summarized with MAE, RMSE, R², and bias, complemented by an operational fit-quality label. Post-imputation controls enforce NO–NO2–NOx consistency and the physical constraint PM2.5 ≤ PM10, while extreme values are screened through a hierarchical robustness framework combining a Hampel filter, winsorization, and a Tukey IQR criterion. The workflow outputs documented diagnostics and robust daily series while preserving the traceability of observed values, flags, edits, and final decisions.

1. Introduction

Air-quality monitoring networks provide long-term time series that enable the assessment of environmental changes and support observational analyses linked to energy transitions and decarbonization processes under Spain’s National Energy and Climate Plan (PNIEC) [1,2]. In practice, however, regulatory series frequently contain missing values, structural inconsistencies (e.g., incomplete months or irregular calendars), and extreme observations associated with instrument downtime, maintenance, operational incidents, transmission failures, or genuine pollution episodes [3,4,5,6]. If these issues are not handled through transparent and consistent preprocessing, they can reduce temporal comparability, introduce bias, increase uncertainty, and weaken the robustness of downstream temporal analyses, including PRE/POST designs [7].
A broad body of the literature addresses missing-data imputation and outlier detection in environmental and air-quality time series, ranging from classical statistical approaches to more recent time-series and machine-learning methods [7,8,9,10]. However, published studies often address only part of the preprocessing problem—for example, focusing exclusively on imputation or exclusively on extreme values—or rely on procedures that are difficult to reproduce, audit, and maintain when scaled to heterogeneous regulatory networks [6,7,11,12]; recent work has also emphasized the importance of traceable and reproducible data streams in environmental science [13]. In observational settings, this makes three requirements particularly relevant: first, ensuring traceability and data governance through explicit QA/QC procedures and documented validation within the European monitoring framework [14,15,16]; second, quantitatively assessing imputation performance using masking schemes and widely used error/agreement metrics to avoid overinterpretation of fit [17,18,19]; and, third, preserving physicochemical and operational consistency among related variables, including internal consistency within the NO–NO2–NOx system and physical constraints such as PM2.5 ≤ PM10, to avoid implausible values after imputation or extreme-value treatment [20,21].
This article presents and validates a reproducible procedure to transform daily series from regulatory networks into datasets ready for temporal analysis. The contribution of the study lies not in proposing a new imputation algorithm or a new outlier detector, but in integrating the required preprocessing stages within a coherent, reproducible, and auditable workflow. The workflow integrates (i) quality assurance and quality control (QA/QC) checks and structural harmonization through systematic cleaning and verification processes, in line with European frameworks for the validation, comparability, and reporting of air quality data [15,16], complemented by applied evidence on incident detection and anomalous behavior in monitoring networks [6]; (ii) multivariate missing-data reconstruction using an iterative chained-equation imputation scheme with Bayesian Ridge regression, implemented within a MICE-based framework [22,23,24,25,26]; (iii) hold-out validation using MAE, RMSE, R2, and bias to quantify reconstruction performance and reduce biased interpretations of fit [7,19]; (iv) post-imputation controls based on physicochemical consistency among related variables, including the NO–NO2–NOx system and the physical constraint PM2.5 ≤ PM10 [20,21]; and (v) robust handling of extreme values and generation of a final daily series with traceable decisions for retention, adjustment, or removal, supported by criteria commonly used in environmental data processing [12,27,28].
The procedure is applied to a case study focused on the areas of influence for coal-fired power plants in northwest Spain, within a context of energy transition and the progressive cessation of coal-based electricity generation [1,2]. This study adopts a PRE/POST observational design, supported by the ability of long-term monitoring networks to detect environmental change provided that the underlying series are made comparable through reproducible QA/QC, imputation, and extreme-value treatment [29,30]. The analysis considers daily concentrations of the main criteria pollutants of regulatory interest—PM10, PM2.5, NO, NO2, NOx, O3, SO2, and CO—whose monitoring is prioritized in European air-quality assessment frameworks because of their environmental and regulatory relevance [31]. In Spain, these pollutants are covered by national air-quality legislation and by the official instruments used for atmospheric pollution assessment and control [32,33]. This article is organized as follows: Section 2 describes the data source and the proposed methodological workflow (spatial selection, QA/QC, hold-out validation, imputation, and robust cleaning); Section 3 presents validation results and derived products; Section 4 discusses methodological implications, limitations, and transferability; and Section 5 summarizes the main conclusions.

2. Materials and Methods

2.1. Station Selection and Spatial Assignment to Power Plants

Stations were selected using a reproducible spatial criterion based on proximity to each coal-fired thermal power plant. The selection was implemented in a GIS environment by creating 10 km buffers around each plant and performing a spatial intersection with the station layer using QGIS Desktop 3.40.12, thereby identifying monitoring stations located within that radius and generating an initial inventory of candidate stations per plant (MITECO Pollutant Release and Transfer Register, PRTR-Spain) [34]. A 10 km radius was adopted as an operational threshold intended to represent the local (near-field) environment of each point source while maintaining a minimum level of station support for the observational design. This distance was retained as a study-specific compromise between three practical requirements: proximity to the emitting facility, sufficient station availability, and control of excessive spatial overlap between neighboring plant buffers. Shorter distances led to very sparse station coverage, whereas larger distances increased overlap and reduced the spatial specificity of plant-level assignment, particularly in the Asturian sector. This operational choice is broadly compatible with near-source framing used in atmospheric modeling guidance and applied technical documents discussing local-scale source influence [35,36]. The resulting approach operationalizes the PRE/POST observational design with an explicit, reproducible, and verifiable spatial rule, while preserving the traceability of station inclusion and exclusion.
Starting from an initial universe of 12 coal-fired power plants within the study area, only those with at least one station within the 10 km buffer were included in the analysis; plants with no stations within the established radius (N stations within 10 km = 0) were excluded (Table 1). As a result, the analytical set comprised ten plants, while two plants (Narcea and Anllares) were excluded because they did not have stations within the established radius.
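Under illustrative assumptions (hypothetical plant and station coordinates, and a spherical-Earth great-circle distance instead of the projected GIS buffers actually used with QGIS), the 10 km inclusion rule can be sketched as follows; plants whose station list comes back empty correspond to the excluded cases:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points (spherical approximation)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def assign_stations(plants, stations, radius_km=10.0):
    """Return {plant: [station ids within radius_km]}.
    Plants with no station in the buffer are kept with an empty list,
    so the exclusion decision remains explicit and traceable."""
    out = {}
    for plant, (plat, plon) in plants.items():
        out[plant] = [sid for sid, (slat, slon) in stations.items()
                      if haversine_km(plat, plon, slat, slon) <= radius_km]
    return out

# Hypothetical coordinates, for illustration only.
plants = {"CT_A": (42.90, -6.40), "CT_B": (43.40, -5.80)}
stations = {"ST1": (42.93, -6.38), "ST2": (43.10, -6.00)}
print(assign_stations(plants, stations))  # CT_B has no station within 10 km
```

A station falling inside two overlapping buffers simply appears in both lists, which matches the multiple-membership policy described above.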
After applying the spatial criterion, the final set comprised 28 regulatory stations with daily time series over 2006–2023, whose characterization and spatial assignment are documented in Table 2 (ID_MAPA, station code, typology, distance to the assigned plant, and number of plants within the radius). Due to overlapping areas of influence, two stations were identified as being simultaneously included in more than one 10 km buffer (N_CT_10 km = 2): ID_MAPA 22 (COD_LOCAL 33031029) and ID_MAPA 24 (COD_LOCAL 33037012) (Table 2). In these cases, multiple membership was retained to preserve assignment traceability and to enable both plant-specific and aggregated area-level analyses. The number of supporting stations also varied across plants, including cases with only one station within the selected radius. Figure 1 shows the locations of the plants, the 10 km buffers, and the selected stations, allowing the verification of the spatial assignment used in the study.
All official monitoring stations located within the 10 km buffer were subsequently carried forward into the cleaning, analytical validation, imputation, and robustification workflow developed in this study. Throughout this process, all series were treated uniformly, without applying specific weighting according to station type, area setting, or distance to the thermal power plant. To preserve traceability and support later context-aware interpretation, the final processed dataset retained descriptors such as station type, area setting, and distance to the thermal power plant (Table 2), so that future air-quality analyses may distinguish, where relevant, among industrial, traffic, and background contexts.

2.2. Data Source and Database Structure

This study uses time series from official air-quality monitoring networks, collected under European quality assurance and quality control (QA/QC) frameworks and comparability and reporting requirements [15,16]. The data were obtained upon request from the Environmental Information Office of the Ministry for the Ecological Transition and the Demographic Challenge (MITECO) as official series aggregated at daily resolution (hourly → daily) for the 2006–2023 period [37].
The analytical dataset includes daily concentrations of PM10, PM2.5, NO, NO2, NOx, O3, SO2, and CO. These official series derive from the regulatory monitoring framework, in which gaseous and particulate pollutants are measured using reference methods or methods demonstrated as equivalent under the applicable European and Spanish legislation [15,16]. For the pollutants considered here, the reference measurement principles correspond to ultraviolet fluorescence for SO2, chemiluminescence for NO/NO2/NOx, ultraviolet photometry for O3, non-dispersive infrared spectroscopy for CO, and gravimetric determination for particulate matter. In operational network practice, official PM series may also derive from equivalent automated methods; in this study, technique codes are used explicitly only when assessing PM2.5–PM10 comparability for coherence checks (Section 2.7.4).
The processing unit is defined as the pair (station, pollutant), preserving the original reporting scale and its numerical precision.

2.3. Input Structure and Temporal Harmonization

Records were received in a wide tabular format (year/month with daily columns D01–D31) and were harmonized to ensure consistency with the real calendar and to avoid structural inconsistencies that could propagate to subsequent stages. In particular, we verified (i) the validity of days per month, (ii) the uniqueness of dates, and (iii) the absence of duplicates after transformation into continuous series.
Next, the data were transformed into a long daily format (one row per date), preserving the observed value and the PUNTO_MUESTREO identifier to maintain the traceability of the historical record [10]. Discontinuities associated with instrument failures, maintenance downtime, or transmission problems were encoded as missing values and recorded using control indicators (flags) without overwriting the original data.
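The wide-to-long harmonization and the three calendar checks can be sketched with pandas; the year/month field names (ANNO, MES) are hypothetical placeholders, the D01–D31 day columns follow the format described above, and the data are synthetic:

```python
import pandas as pd

# Hypothetical wide-format excerpt: one row per (year, month) with day columns D01-D31.
wide = pd.DataFrame({
    "ANNO": [2020], "MES": [2],
    **{f"D{d:02d}": [float(d)] for d in range(1, 32)},
})

# Reshape to long daily format: one row per date.
long = wide.melt(id_vars=["ANNO", "MES"], var_name="DIA", value_name="VALOR")
long["DIA"] = long["DIA"].str[1:].astype(int)

# Check (i): validity of days per month -- impossible combinations (e.g., 30 Feb)
# become NaT under errors="coerce" and are dropped.
long["FECHA"] = pd.to_datetime(
    dict(year=long["ANNO"], month=long["MES"], day=long["DIA"]), errors="coerce")
long = long.dropna(subset=["FECHA"]).sort_values("FECHA")

# Checks (ii) and (iii): unique dates, no duplicates after reshaping.
assert long["FECHA"].is_unique
```

February 2020 (a leap year) yields exactly 29 rows, with the invalid D30/D31 cells discarded by the calendar check rather than silently kept.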

2.4. Pre-Imputation QA/QC

The QA/QC approach implemented in this study should be distinguished from the primary QA/QC procedures of the official monitoring networks. Starting from already-compiled official daily series, our pre-imputation QA/QC was designed as a reproducible analytical preprocessing step to reduce the propagation of inconsistencies into subsequent reconstruction and robustification stages. Accordingly, it was applied after the official network validation process established within the regulatory air-quality monitoring framework [38]. Station control logs were therefore not incorporated as an additional analytical input in the present workflow.
Applied uniformly across all station–pollutant series, this phase included (i) integrity and consistency checks of metadata and operational identifiers (including PUNTO_MUESTREO) and (ii) basic numerical plausibility controls, with the flagging of negative or otherwise implausible values prior to imputation [6,15]. As a result, each daily observation was associated with a status flag that distinguishes, at minimum, between (i) valid observed value, (ii) missing value, (iii) value excluded due to structural inconsistency, and (iv) value flagged for plausibility. This coding constitutes the traceability backbone of the full workflow, allowing the reconstruction of the original dataset and auditing of any subsequent change.
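A minimal sketch of the four-state status coding follows; the flag labels are illustrative stand-ins, not the study's actual field values, and the observed value is never modified, only annotated:

```python
import math

# Hypothetical flag vocabulary mirroring the four minimum states described above.
VALID = "valid"
MISSING = "missing"
EXCLUDED_STRUCT = "excluded_structural"
FLAGGED_PLAUS = "flag_plausibility"

def qa_status(value, structurally_valid=True):
    """Assign a pre-imputation status flag to one daily record.
    The value itself is left untouched; only the flag is derived,
    preserving the traceability backbone of the workflow."""
    if not structurally_valid:
        return EXCLUDED_STRUCT      # e.g., impossible calendar day
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING              # instrument downtime, transmission gap
    if value < 0:
        return FLAGGED_PLAUS        # implausible negative concentration
    return VALID

assert qa_status(12.3) == VALID
assert qa_status(float("nan")) == MISSING
assert qa_status(-0.5) == FLAGGED_PLAUS
```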

2.5. Imputation Validation Design (Hold-Out)

To evaluate imputation performance while preventing information leakage, a validation scheme was implemented by masking (hold-out) a fixed fraction of originally valid observations. For each (plant, pollutant) block, 5% of originally observed entries were randomly selected, replaced with missing values, and reserved as the validation set. Imputation was then run on the masked series, and the imputed values at the reserved dates were compared with the original observations to estimate reconstruction error. This design enables performance quantification under controlled conditions that are comparable across series. Because hold-out dates are selected at random, the evaluation approximates a Missing Completely At Random (MCAR)-like setting and therefore mainly reflects reconstruction performance under relatively simple missing-data conditions [4]. In addition, months with severe missingness (≥50%) were excluded before imputation because, at that level of absence, the monthly segment no longer provides sufficient observational support for robust multivariate reconstruction and post-imputation coherence checks. Retaining such periods would have moved the reconstruction closer to weakly supported extrapolation and reduced the homogeneity of internal validation across analytical blocks. Accordingly, the adopted validation design should be interpreted as an internal and operational benchmark rather than as a complete representation of all missing-data scenarios encountered in practice.
Let $x_t$ be the original observed daily value on day $t$. Let $m_t$ be a binary masking indicator, with $m_t = 1$ if day $t$ belongs to the hold-out subset and $m_t = 0$ otherwise. The input series for imputation is defined as shown in Equation (1):

$$x_t^{\mathrm{mask}} = \begin{cases} x_t, & m_t = 0 \\ \mathrm{NA}, & m_t = 1 \end{cases} \tag{1}$$
After imputation, let $\hat{x}_t$ be the imputed value. Quantitative evaluation was summarized using standard metrics computed exclusively on the masked subset $\{t : m_t = 1\}$, with $n$ reserved observations. Specifically, MAE and RMSE are defined in Equations (2) and (3):

$$\mathrm{MAE} = \frac{1}{n} \sum_{t : m_t = 1} \left| x_t - \hat{x}_t \right| \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t : m_t = 1} \left( x_t - \hat{x}_t \right)^2} \tag{3}$$

Mean bias (Bias) was computed as the mean signed error, defined in Equation (4):

$$\mathrm{Bias} = \frac{1}{n} \sum_{t : m_t = 1} \left( \hat{x}_t - x_t \right) \tag{4}$$

The coefficient of determination was also computed on the masked subset, according to Equation (5):

$$R^2 = 1 - \frac{\sum_{t : m_t = 1} \left( x_t - \hat{x}_t \right)^2}{\sum_{t : m_t = 1} \left( x_t - \bar{x} \right)^2} \tag{5}$$

where

$$\bar{x} = \frac{1}{n} \sum_{t : m_t = 1} x_t$$
For operational interpretation and homogeneous comparison across pollutants, stations, and plants, these continuous metrics were further summarized into an ordinal fit-quality classification (1–4) based primarily on the $R^2$ criterion (Table 3). MAE, RMSE, $R^2$, and Bias are also reported as continuous results in Section 3.
In addition to the primary $R^2$-based class, complementary diagnostics were inspected to contextualize the results and identify potential issues that may warrant sensitivity analyses: the absolute bias $|\mathrm{Bias}|$, which reflects systematic offset (Equation (6)), and the heavy-tail indicator $H = \mathrm{RMSE}/\mathrm{MAE}$, which helps to identify the presence of occasional large reconstruction errors (Equation (7)). These diagnostics are reported alongside the continuous metrics but were not used to assign the ordinal class.

$$\left| \mathrm{Bias} \right| = \mathrm{abs}\!\left( \mathrm{Bias} \right) \tag{6}$$

$$H = \frac{\mathrm{RMSE}}{\mathrm{MAE}} \tag{7}$$
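The hold-out evaluation in Equations (2)–(7) can be sketched with NumPy on synthetic data; the "imputed" series below is a noisy stand-in for illustration, not the output of the MICE procedure:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, matching the reproducibility setup

def holdout_metrics(x_obs, x_hat, mask):
    """Compute MAE, RMSE, Bias, R2 and the heavy-tail ratio H = RMSE/MAE
    exclusively on the masked (held-out) subset."""
    x, xh = x_obs[mask], x_hat[mask]
    err = xh - x
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((x - x.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")  # undefined -> class Low (1)
    return {"MAE": mae, "RMSE": rmse, "Bias": np.mean(err), "R2": r2, "H": rmse / mae}

# Synthetic skewed, pollutant-like series; mask 5% of "observed" days at random.
x_obs = rng.gamma(shape=2.0, scale=10.0, size=1000)
mask = np.zeros(1000, dtype=bool)
mask[rng.choice(1000, size=50, replace=False)] = True   # 5% hold-out
x_hat = x_obs + rng.normal(0, 1.0, size=1000)           # stand-in for imputed values
m = holdout_metrics(x_obs, x_hat, mask)
```

Note that $H \ge 1$ by construction, since the RMSE never falls below the MAE; values of $H$ well above 1 indicate occasional large reconstruction errors.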
This ordinal classification is subsequently used in the plant–pollutant comparison, while MAE, RMSE, $R^2$, and Bias values are also reported as continuous results in Section 3.
Table 3. Criteria used to classify imputation performance (“fit quality”) based on the 5% hold-out validation scheme. The ordinal fit-quality score (1–4) is assigned using the primary R 2 criterion.
Fit-Quality Category    R² Criterion (Primary)
Excellent (4)           R² ≥ 0.65
Good (3)                0.55 ≤ R² < 0.65
Acceptable (2)          0.45 ≤ R² < 0.55
Low (1)                 R² < 0.45
$R^2$ is computed on the masked observations (5% hold-out subset; see Figure 2). If $R^2$ is negative or undefined (near-zero variance), the fit is classified as Low (1). Fit-quality classes are intended as operational indicators for network management and are not equivalent to precision metrics in predictive modeling; in particular, $R^2 \ge 0.65$ is labeled Excellent (4) in the context of regulatory data completeness, not model prediction accuracy.
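The Table 3 rule, including the fallback to Low (1) for negative or undefined $R^2$, can be expressed as a small function; a sketch:

```python
import math

def fit_quality_class(r2):
    """Map the hold-out R2 to the ordinal fit-quality label of Table 3.
    Negative or undefined R2 (near-zero variance) falls into Low (1)."""
    if r2 is None or (isinstance(r2, float) and math.isnan(r2)) or r2 < 0.45:
        return 1  # Low
    if r2 < 0.55:
        return 2  # Acceptable
    if r2 < 0.65:
        return 3  # Good
    return 4      # Excellent

assert fit_quality_class(0.70) == 4
assert fit_quality_class(0.50) == 2
assert fit_quality_class(float("nan")) == 1
```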
Figure 2. Phase A of the data-processing workflow: pre-imputation QA/QC, 5% hold-out validation, and chained-equation imputation. The pipeline produces traceable daily time series after hourly-to-daily aggregation, applying pre-imputation checks, masking-based validation, and iterative chained-equation gap filling, followed by basic plausibility constraints prior to Phase B (outlier processing). The main decision paths associated with exclusions, flagging, validation masking, and post-imputation traceability are shown explicitly.

2.6. Imputation of Missing Values Using a Chained-Equation Iterative Scheme (Bayesian Ridge)

Missing values were reconstructed using an iterative chained-equation imputation scheme with Bayesian Ridge regression, implemented in a MICE-based framework [10,23,24,25,26,39]. In the present study, the procedure was used to generate a single completed dataset for each analytical block, rather than a formal multiple-imputation design for uncertainty propagation across several completed datasets.
In a preliminary stage, extending a strictly within-station approach to the full dataset showed limited stability across a part of the network. To improve robustness and reduce between-station variance, we adopted a plant-level multi-station pooling strategy by aggregating observations from all stations located within the 10 km buffer of each power plant. This design was intended to capture the shared temporal signal within each plant-specific spatial context and to stabilize reconstruction in sparse series by borrowing strength from nearby stations. Pooling was implemented within each pollutant, without mixing chemical species, so that each station was treated as a variable (column) in a daily plant-level matrix for the corresponding pollutant. This configuration allows the imputation model to exploit shared temporal covariation among nearby stations while preserving pollutant-specific structure [3,9,40].
No meteorological covariates were included as predictors; reconstruction therefore relies on shared temporal covariation among nearby stations within each plant-specific block. The complete workflow for preprocessing, hold-out validation, and imputation is summarized in Figure 2.

2.6.1. Scope and Imputation Unit: (Plant, Pollutant)

The operational imputation unit was defined as the pair (plant, pollutant). For a plant $c$ and a pollutant $p$, a daily matrix was constructed in which each associated station acts as a variable (column). Denoting by $x_{t,s}^{c,p}$ the daily concentration on day $t$ at station $s$, the imputation block is defined as in Equation (8):

$$X^{c,p} = \left[ x_{t,s}^{c,p} \right]_{t = 1, \ldots, T;\; s = 1, \ldots, S_c} \tag{8}$$
In MICE, each column with missingness is modeled conditionally on the others. For each station $s$, the conditional model is expressed as in Equation (9), using the remaining stations $X_{t,-s}^{c,p}$ as predictors:

$$x_{t,s}^{c,p} = f_s\!\left( X_{t,-s}^{c,p} \right) + \varepsilon_{t,s} \tag{9}$$
In this study, imputation was applied by pollutant and no other pollutants were used as predictors [10,39]. The data were kept within the temporal coverage reported by MITECO for each station, with no extrapolation beyond official coverage. Non-existent calendar days were excluded by construction, and dates without measurements remained coded as missing values.

2.6.2. Preprocessing and Workflow Constraints

Before imputation, months with severe missingness (≥50%) were excluded as an operational constraint to avoid reconstructing extended missing blocks with very limited observational support, a situation that can increase uncertainty and degrade the interpretability of temporal aggregations [7]. This rule was adopted as a conservative preprocessing decision within the workflow. In addition, no auxiliary variables were incorporated as predictors, maintaining a conservative configuration focused on the shared signal across stations.
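The ≥50% monthly-missingness exclusion can be sketched with pandas on a synthetic daily series for one (station, pollutant) unit:

```python
import numpy as np
import pandas as pd

# Synthetic daily series: January complete, February mostly missing.
idx = pd.date_range("2021-01-01", "2021-02-28", freq="D")
s = pd.Series(np.random.default_rng(0).gamma(2.0, 10.0, len(idx)), index=idx)
s.loc["2021-02-01":"2021-02-20"] = np.nan   # February: 20/28 days missing (>50%)

# Fraction of missing days per calendar month.
miss_frac = s.isna().groupby([s.index.year, s.index.month]).mean()

# Months at or above the 50% threshold are excluded before imputation.
bad_months = miss_frac[miss_frac >= 0.5].index
keep = ~pd.MultiIndex.from_arrays([s.index.year, s.index.month]).isin(bad_months)
s_kept = s[keep]
```

In this example only February is dropped, leaving the 31 fully observed January days as usable support for multivariate reconstruction.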

2.6.3. Imputer Specification (Bayesian Ridge Regression)

Bayesian Ridge regression was used as the estimator within the chained-equation iterative imputation scheme due to its stability in the presence of collinearity and its robust behavior when the available information is partial [23,25]. For a day $t$ and a target station $s$, the linear model is expressed as in Equation (10):

$$x_{t,s} = \beta_0 + \boldsymbol{\beta}^{\top} X_{t,-s} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim \mathcal{N}\!\left(0, \sigma^2\right) \tag{10}$$
The chained-equation procedure alternates between fitting and imputing each variable with missingness over several iterations. The iterative update of the chained-equations scheme is represented generically as in Equation (11):

$$x_{t,s}^{(k)} \leftarrow g_s\!\left( X_{t,-s}^{(k)} \right), \quad \text{for } s = 1, \ldots, S_c \text{ and } k = 1, \ldots, K \tag{11}$$

Here, $k$ indicates the iteration within the procedure. In this study, $K = 10$ iterations per block and a fixed global seed (random_state) were set for reproducibility. Imputed values were stored in a derived field valor_mice, without overwriting the original observed value.
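A minimal sketch of this configuration with scikit-learn's IterativeImputer follows; the plant-level block is synthetic, while the estimator, iteration count, and fixed seed mirror the settings described above:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Synthetic plant-level block: days x stations for one pollutant,
# sharing a common temporal signal plus station-specific noise.
rng = np.random.default_rng(0)
signal = rng.gamma(2.0, 10.0, size=(365, 1))
X = signal + rng.normal(0, 2.0, size=(365, 4))      # 4 correlated stations
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan       # ~10% gaps

imputer = IterativeImputer(
    estimator=BayesianRidge(),   # stable under collinearity
    max_iter=10,                 # K = 10 iterations per block
    random_state=42,             # fixed seed for reproducibility
)
X_completed = imputer.fit_transform(X_missing)
assert not np.isnan(X_completed).any()
```

Observed entries pass through unchanged; only the missing cells are filled, which is what allows the imputed values to be stored in a derived field without overwriting the original record.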

2.6.4. Scale Transformations (Sensitivity Test)

Positive skewness and heteroscedasticity are frequently observed in environmental series, such that a logarithmic transformation can facilitate imputation by approximating multiplicative relationships as additive and stabilizing variability. This behavior has been documented for pollutant distributions and in applications that explicitly use log transformations to correct bias and improve the statistical behavior of variables [41]. In general terms, linear models commonly apply variable transformations to better approximate the assumptions of homoscedasticity and residual normality [42] and, in the context of multiple imputation, it has been noted that treating highly skewed variables as normally distributed can introduce bias and yield implausible values (e.g., negative values). A prior transformation followed by back-transformation has been recommended [43].
Therefore, within the proposed reproducible QA/QC and imputation workflow, an optional sensitivity step based on a logarithmic transformation prior to imputation is included. Here, log denotes the natural logarithm (ln). The transformation and its back-transformation are defined in Equations (12) and (13):
$$z = \log(1 + x) \tag{12}$$

$$x = \exp(z) - 1 \tag{13}$$
In this study, this transformation was not applied in the final configuration nor in the reported results, as imputation on the original scale provided satisfactory performance. In addition, back-transformation may introduce bias on the original scale and smooth extremes; therefore, its use is reserved for cases where it yields a clear improvement in fit (R2) without increasing bias (Bias) [42,43,44].
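The transformation pair in Equations (12) and (13) corresponds to NumPy's log1p/expm1, which are numerically stable for small concentrations. The exact round-trip shown below holds for untouched values; the back-transformation bias discussed above arises only for values imputed on the log scale, not for this identity check:

```python
import numpy as np

x = np.array([0.0, 3.0, 40.0, 250.0])   # skewed, non-negative concentrations
z = np.log1p(x)                          # Equation (12): z = log(1 + x)
x_back = np.expm1(z)                     # Equation (13): x = exp(z) - 1
assert np.allclose(x_back, x)            # exact round-trip on unmodified values
```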

2.6.5. Post-Imputation Controls and Traceability

After imputation (and, when applicable, after back-transformation), fixed postprocessing constraints were applied to prevent physically implausible results. In particular, negative values were discarded according to the rule given in Equation (14):
$$\hat{x}_t < 0 \;\Rightarrow\; \mathrm{DROP\_NAN} \tag{14}$$
Traceability was preserved by keeping the original observed value immutable and explicitly recording the imputed data in dedicated fields, together with indicators (flags) that allow observed, missing, and reconstructed values to be distinguished. This scheme enables the reconstruction of the original dataset and auditing of any modification throughout the processing workflow.

2.7. Detection and Conservative Treatment of Extreme Values (Outliers) and Construction of Robust Series

After completing the imputation (Section 2.6), a specific stage was applied to identify and treat extreme values in order to reduce the influence of spurious records (e.g., instrumental errors, transmission failures, or validation incidents) without removing real high-concentration episodes [45]. The detection of extreme values is particularly relevant because even a small proportion of atypical observations can distort estimates and affect statistical inference; therefore, their identification and documentation should be part of the preliminary screening of the data [46].
This phase was designed with a conservative and fully traceable approach: the original observed value is not overwritten, extreme-value candidates are flagged through control indicators, and derived series intended for robust analyses are generated [47]. This approach is consistent with common QA/QC practices in regulatory networks and with methodological proposals for quality control and anomaly detection in environmental time series [6,28,48].

2.7.1. Unit of Application and Operational Principle

Phase B was applied to daily series by a station–sampling point–pollutant combination. For each day t with observation x t (field VALOR), the procedure generates outlier and coherence diagnostics and assigns an audited decision (decision_final, razon_final) that leads to the analytical product VALOR_robusto, while always keeping the original value immutable.

2.7.2. Robust Global Detection of Candidates (IQR)

Primary detection was based on the interquartile range (IQR) criterion, originally proposed by Tukey [49], which allows outlier detection without assuming normality [50]. As a first screening layer, a conservative operational threshold with $k = 3$ was used to flag severe extremes. For each daily series (station–sampling point–pollutant unit), the first and third quartiles, $Q_1$ (25th percentile) and $Q_3$ (75th percentile), were computed, and the interquartile range was defined as shown in Equation (15):

$$\mathrm{IQR} = Q_3 - Q_1 \tag{15}$$

Based on the IQR, global robust bounds were established as shown in Equations (16) and (17):

$$L = Q_1 - k \cdot \mathrm{IQR} \tag{16}$$

$$U = Q_3 + k \cdot \mathrm{IQR} \tag{17}$$

where $k$ controls the severity of the criterion. In this work, $k = 3$ was set, and any day with observation $x_t$ satisfying Equation (18) was labeled as an IQR outlier candidate:

$$x_t < L \quad \lor \quad x_t > U \tag{18}$$
The choice of k = 3 in Equations (16) and (17) aims for a conservative approach that prioritizes the detection of clearly extreme deviations, compatible with both real pollution episodes and isolated measurement artefacts, while minimizing false positives in series with high intrinsic variability. This type of robust threshold has been used in quality control of comparable environmental series and in monitoring-network studies (e.g., for ozone), where the distribution may be skewed and heavy-tailed [30,51].
Records flagged by IQR were retained as contextual flags (is_outlier_IQR) for subsequent integration into the final decision logic (Figure 3), without modifying the original value at this stage.
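The Tukey fences of Equations (15)–(18) can be sketched as a flagging function that never modifies the values themselves, consistent with the contextual-flag policy above:

```python
import numpy as np

def iqr_flags(x, k=3.0):
    """Flag severe extremes with Tukey fences (IQR criterion).
    Returns a boolean is_outlier_IQR array; values are NOT modified."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper)

# Synthetic skewed series with one spurious spike.
rng = np.random.default_rng(1)
series = rng.gamma(2.0, 10.0, 1000)
series[100] = 1e4
flags = iqr_flags(series, k=3.0)
```

With $k = 3$ the fences are wide, so only clearly extreme departures are flagged, which is the conservative behavior intended for series with high intrinsic variability.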

2.7.3. Local Diagnosis and Conservative Treatment (Hampel + Winsorization)

To complement the global IQR screening (Section 2.7.2), a local and robust diagnosis based on the Hampel filter was applied [52]. This approach evaluates each daily observation against its nearby temporal context and is suitable for series with skewness, seasonality, and heavy tails, which are common in atmospheric pollutants. The diagnosis is constructed from a moving temporal reference and a robust scale.
Let $x_t$ be the daily concentration on day $t$. Let $\tilde{x}_t$ be the centered rolling median (31-day window; center = True), and let $\mathrm{MAD}_t$ be the median absolute deviation from $\tilde{x}_t$, defined in Equation (19):

$$\mathrm{MAD}_t = \operatorname{median}\!\left( \left| x_{t+i} - \tilde{x}_t \right| \right), \quad i \text{ in the local window} \tag{19}$$

The local robust scale was defined as in Equation (20):

$$s_t = 1.4826 \cdot \mathrm{MAD}_t \tag{20}$$

The factor 1.4826 is the usual consistency coefficient under normality and expresses $s_t$ on a scale comparable to a robust standard deviation. The Hampel statistic (local robust z-score) was defined as in Equation (21):

$$z_t = \frac{\left| x_t - \tilde{x}_t \right|}{s_t} \tag{21}$$

A value was considered a local extreme candidate when it met the threshold shown in Equation (22):

$$z_t > \lambda \tag{22}$$

In this work, $\lambda = 6$ was used to identify clearly extreme deviations under a conservative approach [52,53].
The rolling median $\tilde{x}_t$ and $\mathrm{MAD}_t$ were computed requiring a minimum of 15 valid observations within the window (min_periods = 15). If this minimum is not reached or if $s_t = 0$, the diagnosis is considered non-informative for that day and winsorization is not applied.
Instead of automatically removing Hampel-detected extremes, a conservative winsorization step was applied [47]. This step limits the influence of isolated peaks without altering the temporal structure of the series.
For each day $t$, a symmetric local acceptance interval around $\tilde{x}_t$ was defined, as indicated in Equations (23) and (24):
$L_{H,t} = \tilde{x}_t - 6 s_t$
$U_{H,t} = \tilde{x}_t + 6 s_t$
When the day meets Equation (22) (with $\lambda = 6$) and passes the applicable coherence barriers (Section 2.7.4), the winsorized value is computed by clipping according to Equation (25):
$x^{*}_{t} = \operatorname{clip}\left( x_t, L_{H,t}, U_{H,t} \right)$, with $\operatorname{clip}(a, L, U) = \min\left( \max(a, L), U \right)$
This clipping is symmetric and avoids directional bias, while limiting only the most extreme departures from the local reference [46,54].
The winsorized value is stored in the auxiliary field valor_winsor and does not overwrite VALOR. The construction of the final analytical product was defined as in Equation (26):
$\mathrm{VALOR\_robusto}_t = \begin{cases} x_t & \text{if decision\_final} = \mathrm{KEEP} \\ x^{*}_{t} & \text{if decision\_final} = \mathrm{KEEP\_EXTREMO} \\ \mathrm{NA} & \text{if decision\_final} = \mathrm{DROP\_NAN} \end{cases}$
This design maintains full traceability between the observed data, the diagnostic flags, and the final decision.
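The local diagnosis and conservative treatment above can be condensed into a short sketch of Equations (19)–(25) (a minimal illustration assuming pandas; the helper names are ours, and the actual implementation may differ in bookkeeping and flag handling):

```python
import numpy as np
import pandas as pd

def _local_mad(window: np.ndarray) -> float:
    """Eq. (19): median absolute deviation from the window's own median."""
    m = np.nanmedian(window)
    return np.nanmedian(np.abs(window - m))

def hampel_winsorize(x: pd.Series, window: int = 31, min_periods: int = 15,
                     lam: float = 6.0) -> pd.DataFrame:
    roll = x.rolling(window, center=True, min_periods=min_periods)
    med = roll.median()                               # centered rolling median
    s = 1.4826 * roll.apply(_local_mad, raw=True)     # Eq. (20): robust scale
    z = (x - med) / s                                 # Eq. (21): Hampel statistic
    extreme = (z.abs() > lam) & (s > 0)               # Eq. (22); s == 0 -> non-informative
    # Eqs. (23)-(25): clip only flagged days to the local acceptance interval
    winsor = x.clip(lower=med - lam * s, upper=med + lam * s).where(extreme, x)
    return pd.DataFrame({"z_hampel": z, "is_extremo_hampel": extreme,
                         "valor_winsor": winsor})
```

Unflagged days pass through unchanged, and days with an insufficient window (min_periods) or a zero robust scale are left untouched, mirroring the non-informative rule above.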

2.7.4. Physicochemical and Hierarchical Consistency Checks (NO–NO2–NOx/PM)

After flagging outlier candidates using robust criteria (IQR/Hampel) and, where applicable, conservative accommodation through winsorization, internal consistency checks between related species were applied to detect physicochemical and hierarchical inconsistencies. These checks were implemented as barriers within the final decision logic of the analytical product VALOR_robusto, using binary indicators (flags) and auditable reasons, without overwriting the original observed value.
1. Non-negativity (physical–metrological control)
Ambient concentrations cannot be negative; however, negative values may appear in instrumental networks due to noise, drift, or baseline corrections. Therefore, the condition in Equation (27) was verified:
$x_t \ge 0$
In the analyzed dataset, no negative values were detected; therefore, this control required no intervention (no truncation, recoding, or removal). It is retained as a standard QA/QC check.
2. Algebraic–operational consistency of the NO–NO2–NOx system
Operationally, NOx represents the set of nitrogen oxides associated with NO and NO2 (with possible traces arising from the measurement technique). At daily aggregation, NOx should be, at minimum, consistent with the NO and NO2 values. The inequality in Equation (28) was evaluated:
$\mathrm{NOx}_t \ge \max\left( \mathrm{NO}_t, \mathrm{NO}_{2,t} \right) - \varepsilon$
where ε is an instrumental tolerance that absorbs measurement uncertainty, temporal-averaging mismatches, and small rounding errors. In this study, ε = 5 µg·m−3 was adopted as a conservative threshold. Days that violate Equation (28) are flagged as incoherencia_NOx = True.
3. Hierarchical consistency of particulate-matter fractions PM2.5–PM10
To reduce false positives, a tolerant rule defined in Equation (29) was applied:
$\mathrm{PM}_{2.5,t} \le \mathrm{PM}_{10,t} + \max\left( 5,\ 0.10 \cdot \mathrm{PM}_{10,t} \right)$
This tolerance combines an absolute margin (5 µg·m−3), which is relevant at low concentrations, and a relative margin (10% of PM10), which is more appropriate at mid–high levels. Days that violate Equation (29) are flagged as incoherencia_PM = True.
4. Applicability of PM consistency and exceptions due to instrumental non-comparability
PM2.5–PM10 consistency is only interpretable when both magnitudes are instrumentally comparable. Therefore, the PM barrier was applied only when (a) PM2.5 and PM10 came from the same PUNTO_MUESTREO, or (b) both measurements belonged to the same instrumental subfamily PM_MASS (TEOM/BAM/GRAV; Table 4). In non-comparable cases, the data were preserved and the exception was recorded as flag_incoherencia_PM_excepcion = True, maintaining traceability and avoiding biases due to instrumental changes. This approach is consistent with conservative QA/QC in real networks, where isolated discrepancies may reflect instrumental changes, different techniques, or measurement uncertainty, and therefore it is preferable to label rather than impose automatic corrections [55,56,57,58,59].
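The three barriers of Equations (27)–(29) can be sketched as daily flags (an illustrative sketch assuming one row per day; the column names are stand-ins of ours, not the original field layout, and the instrumental-comparability exception is omitted for brevity):

```python
import numpy as np
import pandas as pd

def coherence_flags(df: pd.DataFrame, eps: float = 5.0) -> pd.DataFrame:
    """Daily coherence barriers of Eqs. (27)-(29) as boolean flags."""
    out = pd.DataFrame(index=df.index)
    species = ["NO", "NO2", "NOx", "PM25", "PM10"]
    # Eq. (27): non-negativity (kept as a standard QA/QC check)
    out["flag_negativo"] = (df[species] < 0).any(axis=1)
    # Eq. (28): NOx must reach max(NO, NO2) minus the instrumental tolerance
    out["incoherencia_NOx"] = df["NOx"] < df[["NO", "NO2"]].max(axis=1) - eps
    # Eq. (29): tolerant PM hierarchy (absolute margin at low levels,
    # relative 10% margin at mid-high levels)
    tol = np.maximum(5.0, 0.10 * df["PM10"])
    out["incoherencia_PM"] = df["PM25"] > df["PM10"] + tol
    return out
```

Note that, in this sketch, days with a missing species are simply not flagged (comparisons with NaN evaluate to False), which is consistent with labeling rather than correcting.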

2.7.5. External Plausibility and Regulatory Tagging (Contextual Flags)

In addition to the robust detectors (IQR/Hampel) and internal consistency checks, contextual flags were incorporated to (1) place high values within ranges observed in European monitoring networks and (2) label potential exceedances with respect to regulatory thresholds. These indicators do not modify the daily value and do not act as automatic exclusion rules; their purpose is to support interpretation and traceability.
1. External plausibility (EU reports/annual thresholds)
Annual “plausible high” thresholds were defined based on European air-quality reports (Table 5) [60,61]. Evaluation was performed by station–pollutant–year, provided that annual coverage was sufficient.
In particular, note the following:
  • Minimum coverage: ≥75% of valid days in the year. If not reached, the metric is coded as NA and the supporting information (N_valid_days and coverage) is retained.
  • PM10: the annual high regime was computed as p90.4 of daily means, implemented conservatively as the 36th-highest value (no interpolation). FLAG_PLAUS_PM10_P90_4_GT_75 = True is triggered if p90.4 > 75 µg·m−3.
  • NO2: the annual mean of daily values was computed. FLAG_PLAUS_NO2_MEAN_GT_100 = True is triggered if the annual mean > 100 µg·m−3.
  • PM2.5: the annual mean of daily values was computed. FLAG_PLAUS_PM25_MEAN_GT_30 = True is triggered if the annual mean > 30 µg·m−3.
  • O3: the usual reference threshold is defined on the daily maximum of 8 h running means; when only daily means are available, this control is considered non-operational and is documented as NA.
External plausibility results were stored as contextual flags (FLAG_PLAUS_*) together with supporting fields (N_DIAS_VALIDOS, COBERTURA_ANUAL, and annual metrics). In the analyzed dataset, the annual plausible-high tags were not triggered; therefore, they were retained only as contextual information and did not alter the outlier-handling logic.
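The PM10 rule above (p90.4 taken as the 36th-highest daily mean, subject to ≥75% annual coverage) can be sketched as follows (a minimal illustration; the function name and return layout are ours, and the real workflow also stores the supporting fields):

```python
import numpy as np
import pandas as pd

def pm10_annual_high(daily: pd.Series, min_coverage: float = 0.75):
    """Annual PM10 high regime: p90.4 implemented conservatively as the
    36th-highest daily mean (no interpolation); NA if coverage < 75%.
    `daily` is one calendar year of daily means with a DatetimeIndex."""
    valid = daily.dropna()
    n_days = 366 if daily.index[0].is_leap_year else 365
    coverage = len(valid) / n_days
    if coverage < min_coverage or len(valid) < 36:
        return np.nan, coverage, False
    p90_4 = float(valid.nlargest(36).iloc[-1])   # 36th-highest value
    return p90_4, coverage, p90_4 > 75.0         # FLAG_PLAUS_PM10_P90_4_GT_75
```

Taking the 36th-highest value directly (rather than interpolating the 90.4th percentile) matches the conservative, interpolation-free convention described above.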
2. Daily regulatory tagging (context): traceability and output products
Regulatory tagging fields were incorporated to place elevated daily values against legislated thresholds (Directive 2008/50/EC and its national transposition, Royal Decree 102/2011 and subsequent amendments) [31,33,38]. The reference values used and their interpretability with 24 h daily means are summarized in Table 6. Specifically, note the following:
  • exceso_normativo_diario: indicates that the daily value exceeds a reference threshold applied at daily scale (when interpretable).
  • normativa_no_evaluable_diario: indicates that evaluation is not applicable at daily scale (e.g., criteria defined on percentiles or annual averages).
These flags are stored together with decision_final and razon_final as contextual and traceability information, but they do not act as automatic rules in the construction of VALOR_robusto.

2.8. Coherence Checks and Final Decision Logic to Construct VALOR_robusto

After imputation (Section 2.6) and the conservative diagnosis/treatment of extremes (Section 2.7), a final coherence-and-decision stage was applied to construct the daily analytical series VALOR_robusto. This stage was designed to meet two objectives: physicochemical plausibility and full traceability. In no case is the original record overwritten: VALOR is kept immutable, and any diagnosis or modification is recorded in separate fields.
To ensure auditability and reproducibility, each daily record incorporates four information blocks: (i) outlier diagnosis, via indicators derived from robust methods (e.g., flag_IQR, z_Hampel, is_extremo_Hampel); (ii) internal coherences, to identify physicochemical or hierarchical incompatibilities (incoherencia_NOx, incoherencia_PM) and, when instrumental comparability is not guaranteed, the corresponding non-applicability tag (flag_incoherencia_PM_excepcion); (iii) contextual tagging, which labels conditions relevant to interpreting high values without acting as an automatic exclusion rule (exceso_normativo_diario, normativa_no_evaluable_diario, and FLAG_PLAUS_* when sufficient annual coverage exists); (iv) final decision and justification, explicitly recorded via decision_final and razon_final, together with the resulting value stored in VALOR_robusto. In all cases, the original value VALOR remains immutable and the analytical product is obtained exclusively through derived fields. Figure 3 summarizes the logic of this phase and its relationship to the output fields used in subsequent analyses.

2.8.1. Final Decision Rule and Operational Definition of VALOR_robusto

This section defines the final daily rule that transforms the observed value VALOR $= x_t$ into the analytical value VALOR_robusto by combining (i) robust outlier diagnostics (global IQR and local Hampel diagnosis), (ii) physicochemical coherence checks (NO–NO2–NOx and PM2.5–PM10), and (iii) an audited final decision recorded in the output fields. The semantics, minimum activation conditions, and the effect on VALOR, valor_winsor, and VALOR_robusto are summarized in Table 7, which serves as the single decision dictionary (KEEP, KEEP_EXTREMO, DROP_NAN).
The daily rule is applied following the priority order in Table 7. Preliminary checks come first: if the day is structurally non-evaluable (calendar/coverage constraints) or if VALOR is missing (NaN), the record is preserved as missing (KEEP as “missing”) and VALOR_robusto is set to NaN.
Next, physical barriers and internal coherences are applied. DROP_NAN is assigned when any of the following conditions are detected (and the corresponding rule is applicable): (a) violated non-negativity (Equation (27)); (b) inconsistency in the NO–NO2–NOx system, evaluated with tolerance ε (Equation (28)); and (c) hierarchical PM2.5–PM10 inconsistency (only if applicable), evaluated using the tolerant rule (Equation (29)). To avoid false positives, the PM2.5–PM10 coherence check is applied only when instrumental comparability is adequate, given method-dependent uncertainty and potential systematic offsets (e.g., TEOM/BAM/gravimetry).
The local Hampel diagnosis is then evaluated and, when applicable, conservative accommodation via winsorization. The local statistic is defined as in Equation (21), with robust scale $s_t$ defined in Equation (20) and the local-extreme threshold in Equation (22) with $\lambda = 6$. When a local extreme is triggered and no incoherences are present, the winsorized value is computed according to Equation (25) (equivalent to clipping between Equations (23) and (24)) and decision_final = KEEP_EXTREMO is assigned. In this case, VALOR_robusto takes the winsorized value according to Equation (26).
Finally, if the record is IQR-only (outside the global bounds, Equation (18), but with no local extreme signal, $\left| z_t \right| \le 6$, and no incoherences), the observed value is retained (KEEP) and VALOR_robusto = VALOR (Equation (26)). In the remaining non-extreme cases, the record is likewise kept (KEEP) and VALOR_robusto = VALOR.
Contextual flags (annual external plausibility and daily regulatory tagging) are used only as interpretative and traceability context: they do not trigger automatic exclusions nor modify daily thresholds. The overall process logic is summarized in Figure 3 and allows a reproducible reconstruction of which observations were kept, which were winsorized, and which were discarded, without overwriting the original data [28,47,51].
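In simplified form, the priority order of Table 7 can be sketched per daily record (an illustrative sketch only: the field names mirror the paper's flags, but the real rule also handles calendar non-evaluability, the PM applicability exception, and the stored razon_final justification):

```python
import numpy as np
import pandas as pd

def decide_day(row: pd.Series) -> tuple:
    """Simplified priority-ordered daily rule (sketch of Table 7 / Figure 3).
    Returns (decision_final, VALOR_robusto); VALOR itself is never modified."""
    if pd.isna(row["VALOR"]):
        return "KEEP", np.nan                       # missing stays missing
    if row["VALOR"] < 0 or row["incoherencia_NOx"] or row["incoherencia_PM"]:
        return "DROP_NAN", np.nan                   # physical/coherence barriers
    if row["is_extremo_hampel"]:
        return "KEEP_EXTREMO", row["valor_winsor"]  # conservative accommodation
    return "KEEP", row["VALOR"]                     # includes IQR-only candidates
```

Because coherence barriers are checked before the extreme branch, a day that is both incoherent and locally extreme is discarded rather than winsorized, matching the ordering described above.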

2.8.2. Output Products and QA/QC Control Plots

After applying the daily decision rule and constructing VALOR_robusto (Section 2.8.1), traceable output products are generated to support (i) subsequent statistical analyses, (ii) auditing of postprocessing, and (iii) visual change control (QA/QC). In all cases, the original value VALOR is kept immutable and results are stored in derived fields and/or output files. Operationally, the information is structured into four auditable blocks (Table 8): outlier diagnostics, internal coherences, contextual tagging, and final decision, which enables reconstruction of the reasoning applied to each observation without overwriting the original data.
In addition, as a visual quality-control step, VALOR vs. VALOR_robusto comparison plots are generated to verify that changes are concentrated in isolated episodes, that winsorization does not introduce artefacts, and that discards respond to physical incoherences. This approach is consistent with conservative QA/QC in real monitoring networks [55,56,57,58,59].

3. Results

The results presented below assess the proposed workflow as an integrated preprocessing framework, focusing on effective coverage after QA/QC, reconstruction performance under hold-out validation, and the extent to which the resulting daily series remain sufficiently traceable, coherent, and robust for subsequent long-term temporal analyses in regulatory air-quality contexts.

3.1. Effective Coverage and Preprocessing Outcome (QA/QC) Prior to Imputation

After converting hourly observations to daily resolution (2006–2023) and applying the QA/QC workflow prior to imputation (Phase A; Section 2.6), the final dataset retained high overall completeness. Months with severe missingness were excluded (threshold: ≥50% missing data; Section 2.6.2), so that subsequent imputation operates mainly as gap-filling rather than extensive signal reconstruction.
Pre-imputation completeness is summarized using the weighted percentage of missingness, defined in Equation (30).
$\%\mathrm{Missing}_w = 100 \cdot \dfrac{\sum \text{Missing days}}{\sum \text{Total days}}$
where the sums were computed from monthly missingness summaries at daily resolution. To avoid double counting, overall completeness by pollutant (Table 9) was calculated on unique station–measurement-configuration series (station × measurement configuration/parameterization). In addition, completeness by coal-fired power plant (Figure 4) was estimated through a controlled spatial expansion, in which stations located ≤10 km from more than one plant contribute to each relevant plant, consistent with the study’s spatial design.
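Equation (30) reduces to pooling the monthly counts before dividing (a trivial but illustrative sketch; the column names are ours):

```python
import pandas as pd

def weighted_missing_pct(monthly: pd.DataFrame) -> float:
    """Eq. (30): pooled weighted missingness over monthly summaries with
    (illustrative) columns 'missing_days' and 'total_days'."""
    return 100.0 * monthly["missing_days"].sum() / monthly["total_days"].sum()

monthly = pd.DataFrame({"missing_days": [3, 0, 6], "total_days": [31, 28, 31]})
print(weighted_missing_pct(monthly))  # → 10.0
```

Summing before dividing weights each month by its number of days, which is why the result differs from a plain average of monthly percentages.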
At the plant level, the weighted percentage of missingness was moderate and heterogeneous (Figure 4), with values typically in the single-digit range (approximately 3–9%). This pattern indicates that the final dataset is mostly observed and that imputation acts on localized discontinuities.
At the pollutant level, incompleteness was generally moderate for gases and PM10, whereas PM2.5 concentrated the largest relative missingness (Table 9). Specifically, PM2.5 showed the highest weighted percentage (18.43%) and the largest between-series variability (P75 = 32.24%, maximum = 70.55%), consistent with more limited historical availability and/or the later implementation of PM2.5 measurements across part of the regulatory network. PM10 showed an intermediate level (weighted percentage 6.35%, maximum 53.77%). In contrast, gaseous pollutants remained in a low and relatively narrow range (weighted percentage: CO 5.70%, O3 5.13%, NO2 4.81%, SO2 4.71%, NO 4.56%, NOx 4.21%), reinforcing that the dataset entering the imputation stage is predominantly complete.

3.2. Hold-Out Validation: Imputation Performance (Phase A)

At the pollutant level (pooled across all power plants), hold-out validation indicates overall stable imputation performance (Table 10). Using the primary R2-based criterion in Table 3, CO, O3, NO2 and NOx achieve an Excellent (4) rating, while PM10, PM2.5 and NO fall into Good (3). SO2 exhibits comparatively lower performance (Acceptable (2)), consistent with its intermittency and episodic behavior and the typically weaker inter-station coherence of locally driven primary pollutants. Across most pollutants, Bias remains small in absolute terms, indicating that the reconstruction step does not introduce strong systematic offsets into the completed series. This is relevant for downstream long-term temporal analyses, including observational PRE/POST comparisons, where minimizing artificial level shifts is as important as achieving acceptable pointwise fit.
This pollutant-level summary is spatially disaggregated in the plant–pollutant comparison (Figure 5), where score 3 predominates and quality degradations are bounded and readily localizable to specific plant–pollutant combinations. These localized drops provide a practical basis to identify candidate series for sensitivity analyses and to contextualize subsequent robustification steps.
The R2 pattern by pollutant in Table 10 is consistent with differences in the spatio-temporal structure and the primary/secondary character of each species. Ozone (O3) exhibits a regional component and relatively smooth, predictable variability, modulated by meteorology and seasonality [62]. An O3 lifetime in the free troposphere on the order of weeks has been described, enabling hemispheric-scale transport and favoring a spatially correlated signal [62]; in graph-based approaches, this spatio-temporal correlation is explicitly assumed to propagate information across neighboring nodes [18]. By contrast, short-lived primary pollutants dominated by local sources—particularly NO and SO2—tend to exhibit steep near-source gradients and brief peak episodes, reducing inter-station coherence and penalizing reconstruction when emission peaks are randomly masked. In the case of NO2, despite its generally local character, the pooled performance remains high, which is consistent with the fact that a substantial fraction of the daily variability is still structured (e.g., by weekday–weekend patterns and seasonality), while the most localized extremes remain harder to recover. Nevertheless, NO2 has been reported to vary over very short distances governed by traffic density, which accentuates spatial heterogeneity and can complicate the reconstruction of sharp episodes at individual stations [28]. Carbon monoxide (CO), with a comparatively long atmospheric lifetime (weeks to months), reflects more persistent, shared signals associated with transport processes, which favors higher explained variance at the daily scale [63]. Overall, these results align with the imputation literature: fit improves under stronger autocorrelation and inter-station covariation, and deteriorates in the presence of spikes, heavy tails, or nonlinear processes, where linear models tend to underestimate extreme values [10,18].
As a qualitative check complementing the aggregated summary (Table 10; Figure 5), a station-level example is included (Figure 6) to verify that imputed values remain within plausible ranges and preserve the dominant temporal structure. For CT_VELILLA, given that it is a single-station plant, the example is directly representative of the aggregated performance: despite the low rating for NO2 and PM10 (score = 1) (Figure 5), imputations remain within reasonable ranges and follow the general dynamics, suggesting that the degraded fit is driven mainly by isolated discrepancies (e.g., episodes) rather than by out-of-domain imputations. In this sense, the example illustrates that even under degraded fit, the workflow preserves the overall representativeness of the daily series and reduces the risk that isolated discrepancies disproportionately affect subsequent long-term temporal analyses.

3.3. Outlier Screening and Construction of the Robust Series (Phase B)

After imputation and its validation (Phase A; Table 10; Figure 5), Phase B was applied to (i) identify IQR candidates, (ii) characterize Hampel extremes, and (iii) construct the daily series VALOR_robusto with per-observation traceability (Figure 3).
In this first layer, based on a global robust threshold (IQR), 11,051 observations were flagged out of 748,367 observed values (1.48%). This low proportion indicates a targeted intervention concentrated in extreme episodes. The distribution of IQR candidates by plant and pollutant is summarized in Table 11. The CT_VELILLA example (Figure 7) illustrates contrasts across pollutants under the same global criterion, with residual detection in O3 and higher incidence in species with heavier tails.
The incidence of IQR candidates was clearly pollutant-dependent: it was concentrated in NO (3.46%), CO (2.57%), and SO2 (2.32%), whereas PM10 (0.70%), PM2.5 (0.54%), and NO2 (0.37%) showed lower proportions, and detection for O3 was negligible (Table 11).
As a second layer, the local Hampel diagnosis identified Hampel extremes based on the temporal context. Results by plant and pollutant are reported in Table 12, with an example for CT_VELILLA in Figure 8.
Because IQR (global) and Hampel (local) rely on different criteria, they are not necessarily nested and may yield pollutant-specific discrepancies even for the same number of observations. This pattern is evident for CT_VELILLA, where Hampel (Table 12) detects relatively more episodes in PM10 (sensitivity to isolated deviations from the local level), whereas IQR (Table 11) identifies more candidates in NO (more pronounced global tails); detection for O3 remains residual.
Next, we quantify how these candidates translated into final decisions (KEEP/KEEP_EXTREMO/DROP_NAN) and, in particular, how many observations were accommodated through winsorization versus discarded, following the priority decision rule in Table 7 and as summarized by pollutant in Table 13.

3.4. Final Decision and Generation of VALOR_robusto (Phase B)

After identifying extreme candidates using the global IQR criterion (Table 11; Figure 7) and characterizing local anomalies with the Hampel diagnosis (Table 12; Figure 8), a deterministic and traceable decision rule was applied to construct the final analytical series VALOR_robusto (Figure 3). This stage integrates the statistical evidence of extremes, conservative accommodation when applicable (winsorization), and physicochemical coherence checks across species, so that each observation is associated with auditable “decision_final/razon_final” fields, without overwriting the original record.
In aggregate terms, Phase B affected only a small fraction of the observation set: KEEP = 98.83%, KEEP_EXTREMO = 0.84% (winsorized values), and DROP_NAN = 0.32% (discarded records) of the total daily records analyzed. Percentages are computed over the full set of analyzed daily records across pollutants and may not sum to 100% due to rounding. This limited intervention rate indicates that Phase B acts as a targeted robustness layer rather than as a broad transformation of the reconstructed series. In practical terms, this helps to reduce the influence of high-leverage anomalies on downstream temporal summaries while preserving the overall structure of the daily series.
At the pollutant level, final outcomes are summarized in Table 13 (KEEP (%) can be obtained by complement). Three patterns are useful to interpret the effect of Phase B:
  • Winsorization dominates in species with a higher extreme burden: NO (KEEP_EXTREMO = 2.0386%) and SO2 (2.022%), with DROP_NAN ~0.33% in both cases (Table 13).
  • A closer balance between winsorization and discarding is observed for PM: PM10 (KEEP_EXTREMO = 0.4112% vs. DROP_NAN = 0.3096%) and PM2.5 (0.4128% vs. 0.2838%) (Table 13).
  • Discarding exceeds winsorization in species where winsorization is residual: O3 (KEEP_EXTREMO = 0.0263% vs. DROP_NAN = 0.4051%) and, to a lesser extent, NO2 (0.1943% vs. 0.3566%) (Table 13).
Internal coherence barriers were integrated as constraints in the final decision to avoid statistically admissible but physically inconsistent adjustments (Table 8; Section 2.8). This includes coherence within the NO/NO2/NOx family and PM2.5–PM10 coherence. For PM, exception handling is explicitly incorporated when instrumental comparability is not guaranteed, preventing spurious flags driven by method-related offsets rather than true physical inconsistency. Contextual flags (annual plausibility and daily regulatory tagging) are retained as interpretive layers and do not act as automatic exclusion rules (Section 2.8).

3.5. Station-Level Example: Traceability and Contextual Plausibility (34080004_CT_VELILLA)

All Phase B products and outputs were generated systematically for the 28 study stations; however, to maintain coherence and comparability with the rest of the article, only the illustrative example of 34080004_CT_VELILLA is presented here, following the same approach applied in the previous sections.
To assess the local effect of Phase B, station 34080004_CT_VELILLA is examined. The comparison between VALOR and VALOR_robusto (showing only days with changes) indicates that the intervention was sporadic and concentrated in isolated episodes, without altering the overall temporal structure (seasonality and background level) (Figure 9).
For the 2015–2023 period, 19,360 daily records (six pollutants) were evaluated. At this station, Phase B produced no discards (DROP_NAN = 0). All modifications were resolved exclusively through winsorization (KEEP_EXTREMO = 57; 0.29%), while the remaining records were kept as KEEP (19,303; 99.71%). By pollutant, changes were concentrated in PM10 (42/3318; 1.27%), followed by SO2 (6/3259; 0.18%) and NO (5/3287; 0.15%). For NOx, changes were residual (3/2922; 0.10%). For NO2, a single adjustment was observed (1/3287; 0.03%). For O3, no changes were recorded (0/3287; 0.00%). This summary is reported in Table 14.
In PM10, the changes correspond to high-magnitude peaks. The original daily maximum reached 404 µg·m−3, whereas the robust value was capped at 74.71 µg·m−3 in the most extreme episode. These adjustments are consistent with a conservative intervention that limits the influence of isolated peaks without modifying the background level (Figure 9).
As a contextual layer, flag_PM10_gt_75 was triggered on five days during the study period. In all cases, Phase B applied KEEP_EXTREMO. The corresponding dates and (VALOR, VALOR_robusto) pairs are reported in Table 15.
Regarding internal coherence, no violations of the NOx barrier were detected on days with simultaneous measurements (2922 days with NOx, NO, and NO2 available). PM2.5–PM10 coherence was not evaluable at this station due to the absence of PM2.5 during the analyzed period.
Overall, 34080004_CT_VELILLA confirms the operational goal of Phase B: to avoid data loss, concentrate intervention on a small number of days, and reduce the influence of extreme episodes on subsequent metrics, while maintaining traceability and a strict separation between VALOR and VALOR_robusto. In this sense, the CT_VELILLA example illustrates that the robustification stage is intended not to suppress genuine temporal variability, but to preserve series representativeness by reducing the disproportionate influence of isolated anomalies on subsequent long-term temporal interpretation.
While the CT_VELILLA example provides station-level qualitative evidence of traceability and conservative intervention, broader applicability is constrained by data resolution and modeling choices; these aspects are discussed in Section Limitations and Transferability.

4. Discussion

This study proposes a conservative, audit-oriented preprocessing layer for regulatory daily air-quality time series, designed to support PRE/POST observational analyses around coal power plant closures. Rather than optimizing for marginal gains in single-metric predictive accuracy, the workflow prioritizes data governance: preservation of the original record (VALOR) as immutable, explicit and reproducible rules for each intervention, and the storage of flags and justifications that allow every modification to be traced and audited. This design choice is particularly relevant for long-term environmental monitoring networks, where comparability across stations, periods, and pollutants is as critical as pointwise reconstruction accuracy.
At the pollutant level (Table 10), the 5% hold-out validation shows generally stable performance under daily aggregation, with differences across species consistent with their spatio-temporal structure and primary/secondary character. Figure 5 further disaggregates this summary by plant–pollutant combinations, where degradations remain bounded and readily localizable, providing a practical basis to identify candidate series for targeted sensitivity analyses.
Across several pollutants, MICE shows a modest negative Bias (Table 10 and Table S1), consistent with a conservative smoothing of episodic peaks under pooled reconstruction; this should be considered when interpreting peak-driven PRE/POST indicators.
Method selection should therefore be interpreted in relation to the overall workflow architecture, and not only to marginal differences in single-metric performance under the adopted hold-out benchmark. As an additional sanity check, Supplementary Table S1 and Figure S1 benchmark the proposed imputation step against a univariate linear-interpolation baseline on the same 5% random hold-out mask. As expected under MCAR-like short-gap masking, linear interpolation can match or outperform multivariate models for several pollutants, particularly under short and relatively simple missing-data conditions, as also reported in previous comparative studies [10]. MICE is retained primarily for its role within the broader audit-ready workflow (QA/QC, physicochemical consistency checks, and traceable robustification) and because the adopted benchmark does not fully represent the longer and more complex missingness patterns that may occur in regulatory monitoring networks; method choice must also satisfy the traceability and reproducibility requirements of environmental data workflows [13].
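The mechanics of such a hold-out comparison can be sketched as follows (an illustrative approximation of the supplementary baseline only; the paper's actual masks, pooling, and MICE configuration are described in Section 2.6, and the names here are ours):

```python
import numpy as np
import pandas as pd

def holdout_linear_baseline(x: pd.Series, frac: float = 0.05, seed: int = 0) -> dict:
    """Mask a random fraction of observed days, refill them by univariate
    linear interpolation, and score MAE / RMSE / R2 / Bias on the mask."""
    rng = np.random.default_rng(seed)
    observed = x.dropna().index
    masked = rng.choice(observed, size=max(1, int(frac * len(observed))),
                        replace=False)
    holed = x.copy()
    holed.loc[masked] = np.nan                        # simulate the hold-out gaps
    filled = holed.interpolate(method="linear", limit_direction="both")
    y, yhat = x.loc[masked], filled.loc[masked]
    err = yhat - y
    return {"MAE": float(err.abs().mean()),
            "RMSE": float(np.sqrt((err ** 2).mean())),
            "R2": float(1 - (err ** 2).sum() / ((y - y.mean()) ** 2).sum()),
            "Bias": float(err.mean())}
```

On a smooth, strongly autocorrelated series this baseline scores highly, which illustrates why random short-gap masking tends to flatter interpolation relative to multivariate reconstruction.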

Limitations and Transferability

Several limitations should be noted. First, the analysis is conducted at daily resolution, which is appropriate for long-term PRE/POST designs but does not allow a direct evaluation of regulatory indicators that depend on hourly data or moving averages (e.g., 8 h metrics). However, daily aggregation provides a temporally homogeneous basis for long-term comparisons, which is consistent with the intended downstream use of the processed series in extended observational analyses. Second, the imputation models do not include meteorological covariates; while this choice improves portability and reduces external data dependencies, it may limit reconstruction during meteorology-driven episodes (e.g., secondary formation or stagnation events). Although meteorological covariates were not included, the plant-level pooling strategy is intended to partially mitigate this limitation by capturing shared regional covariation across stations within each buffer. Third, the 5% random hold-out validation approximates a Missing Completely At Random (MCAR) masking mechanism, which does not explicitly model the Missing Not At Random (MNAR) missingness that may occur in real networks (e.g., during extreme events or instrument downtime). Consequently, the benchmark should be interpreted primarily as a controlled internal validation for short random gaps, rather than as a full characterization of performance under all missingness mechanisms.
Despite these limitations, the workflow is highly transferable because interventions are encoded through explicit rules, flags, and stored justifications, enabling reproducible application to other monitoring networks. Key parameters (e.g., Hampel threshold λ, IQR factor k, plausibility/coherence rules, and decision logic) can be tuned to local network characteristics without changing the overall structure of the pipeline. This supports method reuse across regions while preserving auditability and comparability.

5. Conclusions

We present a reproducible, audit-ready workflow to preprocess regulatory daily air-quality time series for subsequent long-term temporal analyses, including observational PRE/POST applications around coal power plant closures. The core contribution lies not in the individual preprocessing algorithms, but in a traceability-first processing architecture that preserves the original record as immutable (VALOR) while generating fully documented derived products—including a robust daily series (VALOR_robusto)—so that each edit is associated with an explicit decision label and stored justification.
Methodologically, the framework integrates multivariate imputation (MICE with Bayesian Ridge, implemented per pollutant with plant-level multi-station pooling) with physicochemical consistency checks (e.g., NO–NO2–NOx and PM2.5–PM10, when applicable) and a traceable robustification stage (Phase B). Empirically, robustification operates as a targeted intervention rather than a systematic transformation of the signal: across the full analyzed daily dataset (all plants and pollutants), 98.83% of daily records remain unchanged, 0.84% are conservatively accommodated via winsorization (KEEP_EXTREMO), and 0.32% are discarded (DROP_NAN) due to physical implausibility and/or coherence violations. A station-level illustration (CT_VELILLA) further supports this behavior, showing that modifications concentrate on isolated episodes while preserving background levels and seasonal structure. Overall, the proposed approach provides a transferable preprocessing layer for regulatory networks affected by missingness, structural inconsistencies, and extreme values, reducing the risk that data gaps, inconsistencies, or high-leverage anomalies condition the interpretation of the processed series in later temporal applications.
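For concreteness, the chained-equation step can be sketched with a numpy-only stand-in, using a closed-form ridge regression in place of the Bayesian Ridge estimator; `mice_like` and `ridge_fit_predict` are illustrative names, and the actual pipeline additionally pools stations per plant before imputing:

```python
import numpy as np

def ridge_fit_predict(X, y, Xq, alpha=1.0):
    """Closed-form ridge regression (a simple stand-in for Bayesian Ridge)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    Xqb = np.column_stack([np.ones(len(Xq)), Xq])
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)
    return Xqb @ w

def mice_like(X, n_iter=5, alpha=1.0):
    """Chained-equation sketch: cycle through columns, regressing each
    incomplete column on all the others; gaps are mean-initialized."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):              # mean-initialize the gaps
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):                  # chained-equation sweeps
        for j in range(X.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            others = np.delete(X, j, axis=1)
            X[m, j] = ridge_fit_predict(others[~m], X[~m, j], others[m], alpha)
    return X
```

With strongly covarying columns (the multi-station pooling case), even this minimal sketch reconstructs a masked value close to its true level, which is the behavior the hold-out benchmark quantifies.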
Finally, this audit-ready preprocessing layer provides the traceability, coherence, and robustness required for subsequent long-term air-quality analyses in the context of energy transition and decarbonization. Future work will extend the framework toward (i) higher temporal resolution (hourly and 8 h metrics), (ii) the inclusion of meteorological covariates and alternative missingness scenarios beyond MCAR masking (e.g., prolonged outages), and (iii) sensitivity analyses of key thresholds and baseline comparisons to further quantify robustness across diverse regulatory networks.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16073396/s1, Table S1: Minimal benchmark of the imputation step (Phase A) on the 5% hold-out subset (pooled across all power plants): MICE (Bayesian Ridge) versus a univariate linear-interpolation baseline. Figure S1: RMSE and R2 by pollutant for the same benchmark.

Author Contributions

Conceptualization, N.F.P., L.Á.d.P. and A.B.S.; methodology, N.F.P., L.Á.d.P., L.A.M.G. and A.B.S.; software, N.F.P., L.Á.d.P., S.B. and A.B.S.; validation, N.F.P., L.Á.d.P. and A.B.S.; formal analysis, L.A.M.G., D.F.L. and S.B.; investigation, N.F.P., L.Á.d.P., S.B. and A.B.S.; resources, N.F.P., L.Á.d.P. and A.B.S.; data curation, N.F.P., L.Á.d.P., D.F.L. and A.B.S.; writing—original draft preparation, N.F.P.; writing—review and editing, L.Á.d.P. and A.B.S.; visualization, L.A.M.G., D.F.L. and S.B.; supervision, L.Á.d.P., L.A.M.G. and A.B.S.; project administration, L.Á.d.P., D.F.L. and A.B.S.; funding acquisition, A.B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study were obtained from the Spanish Ministry for the Ecological Transition and the Demographic Challenge (MITECO) upon request. Derived data supporting the findings of this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

Author David Fernández López is employed by the company INREMIN S.L. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Gobierno de España. Plan Nacional Integrado de Energía y Clima (PNIEC) 2021–2030. Available online: https://www.miteco.gob.es/content/dam/miteco/images/es/pnieccompleto_tcm30-508410.pdf (accessed on 27 January 2026).
  2. Gobierno de España. Plan Nacional Integrado de Energía y Clima (PNIEC): Actualización 2023–2030. Available online: https://www.miteco.gob.es/content/dam/miteco/es/energia/files-1/pniec-2023-2030/PNIEC_2024_240924.pdf (accessed on 27 January 2026).
  3. Gómez-Carracedo, M.P.; Andrade, J.M.; López-Mahía, P.; Muniategui, S.; Prada, D. A Practical Comparison of Single and Multiple Imputation Methods to Handle Complex Missing Data in Air Quality Datasets. Chemom. Intell. Lab. Syst. 2014, 134, 23–33. [Google Scholar] [CrossRef]
  4. Junger, W.L.; Ponce de Leon, A. Imputation of Missing Data in Time Series for Air Pollutants. Atmos. Environ. 2015, 102, 96–104. [Google Scholar] [CrossRef]
  5. Rodríguez, S.; López-Darias, J. Extreme Saharan Dust Events Expand Northward over the Atlantic and Europe, Prompting Record-Breaking PM10 and PM2.5 Episodes. Atmos. Chem. Phys. 2024, 24, 12031–12053. [Google Scholar] [CrossRef]
  6. Wu, H.; Tang, X.; Wang, Z.; Wu, L.; Lu, M.; Wei, L.; Zhu, J. Probabilistic Automatic Outlier Detection for Surface Air Quality Measurements from the China National Environmental Monitoring Network. Adv. Atmos. Sci. 2018, 35, 1522–1532. [Google Scholar] [CrossRef]
  7. Hadeed, S.J.; O’Rourke, M.K.; Burgess, J.L.; Harris, R.B.; Canales, R.A. Imputation Methods for Addressing Missing Data in Short-Term Monitoring of Air Pollutants. Sci. Total Environ. 2020, 730, 139140. [Google Scholar] [CrossRef] [PubMed]
  8. Chen, M.; Zhu, H.; Chen, Y.; Wang, Y. A Novel Missing Data Imputation Approach for Time Series Air Quality Data Based on Logistic Regression. Atmosphere 2022, 13, 1044. [Google Scholar] [CrossRef]
  9. Hua, V.; Nguyen, T.; Dao, M.-S.; Nguyen, H.D.; Nguyen, B.T. The Impact of Data Imputation on Air Quality Prediction Problem. PLoS ONE 2024, 19, e0306303. [Google Scholar] [CrossRef]
  10. Junninen, H.; Niska, H.; Tuppurainen, K.; Ruuskanen, J.; Kolehmainen, M. Methods for Imputation of Missing Values in Air Quality Data Sets. Atmos. Environ. 2004, 38, 2895–2907. [Google Scholar] [CrossRef]
  11. Menéndez García, L.A.; Menéndez Fernández, M.; Sokoła-Szewioła, V.; Álvarez de Prado, L.; Ortiz Marqués, A.; Fernández López, D.; Bernardo Sánchez, A. A Method of Pruning and Random Replacing of Known Values for Comparing Missing Data Imputation Models for Incomplete Air Quality Time Series. Appl. Sci. 2022, 12, 6465. [Google Scholar] [CrossRef]
  12. Zimek, A.; Filzmoser, P. There and Back Again: Outlier Detection between Statistical Reasoning and Data Mining Algorithms. WIREs Data Min. Knowl. Discov. 2018, 8, e1280. [Google Scholar] [CrossRef]
  13. Schmidt, L.; Schäfer, D.; Geller, J.; Lünenschloss, P.; Palm, B.; Rinke, K.; Rebmann, C.; Rode, M.; Bumberger, J. System for Automated Quality Control (SaQC) to Enable Traceable and Reproducible Data Streams in Environmental Science. Environ. Model. Softw. 2023, 169, 105809. [Google Scholar] [CrossRef]
  14. European Environment Agency. Air Quality E-Reporting Submission Procedures for Reporting to Eionet CDR. Available online: https://www.eionet.europa.eu/aqportal/doc/AQ_IPR_submission_procedure_2018.pdf (accessed on 27 January 2026).
  15. European Commission. 2011/850/EU: Commission Implementing Decision of 12 December 2011 laying down rules for Directives 2004/107/EC and 2008/50/EC of the European Parliament and of the Council as regards the reciprocal exchange of information and reporting on ambient air quality (notified under document C(2011) 9068). Off. J. Eur. Union 2011, L 335, 86–106. [Google Scholar]
  16. European Commission. Directive (EU) 2015/1480 of 28 August 2015 amending several annexes to Directives 2004/107/EC and 2008/50/EC of the European Parliament and of the Council laying down the rules concerning reference methods, data validation and location of sampling points for the assessment of ambient air quality. Off. J. Eur. Union 2015, L 226, 4–11. [Google Scholar]
  17. Liu, X.; Wang, X.; Zou, L.; Xia, J.; Pang, W. Spatial Imputation for Air Pollutants Data Sets via Low Rank Matrix Completion Algorithm. Environ. Int. 2020, 139, 105713. [Google Scholar] [CrossRef]
  18. Betancourt, C.; Li, C.W.Y.; Kleinert, F.; Schultz, M.G. Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data. Environ. Sci. Technol. 2023, 57, 18246–18258. [Google Scholar] [CrossRef]
  19. Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  20. Reche, C.; Querol, X.; Alastuey, A.; Viana, M.; Pey, J.; Moreno, T.; Rodríguez, S.; González, Y.; Fernández-Camacho, R.; de la Rosa, J.; et al. New Considerations for PM, Black Carbon and Particle Number Concentration for Air Quality Monitoring across Different European Cities. Atmos. Chem. Phys. 2011, 11, 6207–6227. [Google Scholar] [CrossRef]
  21. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide: Executive Summary; World Health Organization: Geneva, Switzerland, 2021; ISBN 978-92-4-003443-3. [Google Scholar]
  22. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley: Hoboken, NJ, USA, 2019; ISBN 978-1-118-59569-5. [Google Scholar] [CrossRef]
  23. van Buuren, S. Flexible Imputation of Missing Data, 2nd ed.; Chapman and Hall/CRC: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
  24. Azur, M.J.; Stuart, E.A.; Frangakis, C.; Leaf, P.J. Multiple Imputation by Chained Equations: What Is It and How Does It Work? Int. J. Methods Psychiatr. Res. 2011, 20, 40–49. [Google Scholar] [CrossRef]
  25. Raghunathan, T.E.; Lepkowski, J.M.; van Hoewyk, J.; Solenberger, P. A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models. Surv. Methodol. 2001, 27, 85–95. [Google Scholar]
  26. White, I.R.; Royston, P.; Wood, A.M. Multiple Imputation Using Chained Equations: Issues and Guidance for Practice. Stat. Med. 2011, 30, 377–399. [Google Scholar] [CrossRef]
  27. Dai, X.; Jin, L.; Shi, A.; Shi, L. Outlier Detection and Accommodation in General Spatial Models. Stat. Methods Appl. 2016, 25, 453–475. [Google Scholar] [CrossRef]
  28. van Zoest, V.M.; Stein, A.; Hoek, G. Outlier Detection in Urban Air Quality Sensor Networks. Water Air Soil Pollut. 2018, 229, 111. [Google Scholar] [CrossRef]
  29. European Environment Agency. Air Quality Data Validation: Guidance for Monitoring Networks; EEA: Copenhagen, Denmark, 2020. [Google Scholar]
  30. O’Leary, B.; Reiners, J.J.; Xu, X.; Lemke, L.D. Identification and Influence of Spatio-Temporal Outliers in Urban Air Quality Measurements. Sci. Total Environ. 2016, 573, 55–65. [Google Scholar] [CrossRef] [PubMed]
  31. The European Parliament and the Council of the European Union. Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe. Off. J. Eur. Union 2008, L 152, 1–44. [Google Scholar]
  32. Gobierno de España. Plan Nacional de Calidad del Aire 2017–2019 (Plan Aire II); Ministerio de Agricultura, Pesca, Alimentación y Medioambiente: Madrid, España, 2017. [Google Scholar]
  33. Gobierno de España. Real Decreto 102/2011, de 28 de enero, relativo a la mejora de la calidad del aire. Boletín Of. Estado 2011, 25, 9574–9626. [Google Scholar]
  34. Ministerio para la Transición Ecológica y el Reto Demográfico. Inventario de Instalaciones—Inventario Completo | PRTR España. Available online: https://prtr-es.miteco.gob.es/Informes/InventarioInstalacionesIPPC.aspx (accessed on 26 January 2026).
  35. Golder Associates. Dispersion Modelling Guidance: Determining the Need for Industrial PM10 Offsets Under the National Environmental Standards for Air Quality. Available online: https://www.envirolink.govt.nz/assets/Envirolink/1285-HBRC184-Practical-Guidance-on-Dispersion-Modelling-Determining-the-need-for-PM10-offsets-under-the-NES.pdf (accessed on 1 February 2026).
  36. Environmental Protection Agency. Air Dispersion Modelling from Industrial Installations Guidance Note (AG4). Available online: https://www.epa.ie/publications/compliance--enforcement/air/air-guidance-notes/EPA-Air-Dispersion-Modelling-Guidance-Note-(AG4)-2020.pdf (accessed on 27 January 2026).
  37. Datos Horarios de Calidad del Aire—Datos Abiertos MITECO. Available online: https://catalogo.datosabiertos.miteco.gob.es/catalogo/dataset/19458583-9953-4fe7-a494-e2cc26e89e58 (accessed on 26 January 2026).
  38. Gobierno de España. Real Decreto 34/2023, de 24 de Enero, por el que se modifican el Real Decreto 102/2011, de 28 de Enero, relativo a la mejora de la calidad del aire; el Reglamento de emisiones industriales y de desarrollo de la Ley 16/2002, de 1 de Julio, de prevención y control integrados de la contaminación, aprobado mediante el Real Decreto 815/2013, de 18 de Octubre; y el Real Decreto 208/2022, de 22 de Marzo, sobre las garantías financieras en materia de residuos. Boletín Of. Estado 2023, 21, 10326–10348. [Google Scholar]
  39. Quinteros, M.E.; Lu, S.; Blazquez, C.; Cárdenas-R, J.P.; Ossa, X.; Delgado-Saborit, J.-M.; Harrison, R.M.; Ruiz-Rudolph, P. Use of Data Imputation Tools to Reconstruct Incomplete Air Quality Datasets: A Case-Study in Temuco, Chile. Atmos. Environ. 2019, 200, 40–49. [Google Scholar] [CrossRef]
  40. Lu, P.; Deng, S.; Li, G.; Tuheti, A.; Liu, J. Regional Transport of PM2.5 from Coal-Fired Power Plants in the Fenwei Plain, China. Int. J. Environ. Res. Public Health 2023, 20, 2170. [Google Scholar] [CrossRef]
  41. Alsaber, A.R.; Pan, J.; Al-Hurban, A. Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018). Int. J. Environ. Res. Public Health 2021, 18, 1333. [Google Scholar] [CrossRef]
  42. Rodríguez-Barranco, M.; Tobías, A.; Redondo, D.; Molina-Portillo, E.; Sánchez, M.J. Standardizing Effect Size from Linear Regression Models with Log-Transformed Variables for Meta-Analysis. BMC Med. Res. Methodol. 2017, 17, 44. [Google Scholar] [CrossRef]
  43. Sterne, J.A.C.; White, I.R.; Carlin, J.B.; Spratt, M.; Royston, P.; Kenward, M.G.; Wood, A.M.; Carpenter, J.R. Multiple Imputation for Missing Data in Epidemiological and Clinical Research: Potential and Pitfalls. BMJ 2009, 338, b2393. [Google Scholar] [CrossRef] [PubMed]
  44. Duan, N. Smearing Estimate: A Nonparametric Retransformation Method. J. Am. Stat. Assoc. 1983, 78, 605–610. [Google Scholar] [CrossRef]
  45. Pearson, R.K. Outliers in Process Modeling and Identification. IEEE Trans. Control Syst. Technol. 2002, 10, 55–63. [Google Scholar] [CrossRef]
  46. Osborne, J.W.; Overbay, A. The power of outliers (and why researchers should ALWAYS check for them). Pract. Assess. Res. Eval. 2004, 9, 6. [Google Scholar] [CrossRef]
  47. Agathokleous, E.; Xu, T.; Yu, L. Outlier Management in Data Analysis: A Checklist for Authors and Reviewers. J. For. Res. 2025, 37, 28. [Google Scholar] [CrossRef]
  48. Čampulová, M.; Čampula, R.; Holešovský, J. An R Package for Identification of Outliers in Environmental Time Series Data. Environ. Model. Softw. 2022, 155, 105435. [Google Scholar] [CrossRef]
  49. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, USA, 1977. [Google Scholar]
  50. Sancho Val, J.; Hernando, C.C.; de Baños, L.M. Functional Data Analysis of Air Quality Time Series in Madrid Using FPCA and Splines. Atmos. Environ. 2026, 367, 121741. [Google Scholar] [CrossRef]
  51. Zuur, A.F.; Ieno, E.N.; Elphick, C.S. A Protocol for Data Exploration to Avoid Common Statistical Problems. Methods Ecol. Evol. 2010, 1, 3–14. [Google Scholar] [CrossRef]
  52. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; Wiley: Hoboken, NJ, USA, 1986. [Google Scholar] [CrossRef]
  53. Roos-Hoefgeest Toribio, M.; Garnung Menéndez, A.; Roos-Hoefgeest Toribio, S.; Álvarez García, I. A Novel Approach to Speed Up Hampel Filter for Outlier Detection. Sensors 2025, 25, 3319. [Google Scholar] [CrossRef]
  54. Arat, M.M. Detection of Anomalous Nitrogen Dioxide (NO2) Concentration of A District in Ankara: A Reconstruction-Based Approach. J. Polytech. 2025, 28, 101. [Google Scholar] [CrossRef]
  55. Lagler, F.; Belis, C.; Borowiak, A. A Quality Assurance and Control Program for PM2.5 and PM10 Measurements in European Air Quality Monitoring Networks; Publications Office of the European Union: Luxembourg, 2011; JRC65176, EUR 24851 EN. [Google Scholar] [CrossRef]
  56. Alastuey, A.; Minguillón, M.C.; Pérez, N.; Querol, X.; Viana, M.; Leeuw, F. PM10 Measurement Methods and Correction Factors: 2009 Status Report; European Topic Centre on Air Pollution and Climate Change Mitigation (ETC/ACM): Bilthoven, The Netherlands, 2011. [Google Scholar]
  57. Aggarwal, S.G.; Kumar, S.; Mandal, P.; Sarangi, B.; Singh, K.; Pokhariyal, J.; Mishra, S.K.; Agarwal, S.; Sinha, D.; Singh, S.; et al. Traceability Issue in PM2.5 and PM10 Measurements. MAPAN 2013, 28, 153–166. [Google Scholar] [CrossRef]
  58. Kuhlbusch, T.A.J.; Quincey, P.; Fuller, G.W.; Kelly, F.; Mudway, I.; Viana, M.; Querol, X.; Alastuey, A.; Katsouyanni, K.; Weijers, E.; et al. New Directions: The Future of European Urban Air Quality Monitoring. Atmos. Environ. 2014, 87, 258–260. [Google Scholar] [CrossRef]
  59. Benschop, N.D.; Zewotir, T.; Naidoo, R.N.; North, D. A New Data-Standardization Procedure for Comprehensive Outlier Detection in Correlated Meteorological Sensor Data. Adv. Stat. Climatol. Meteorol. Oceanogr. 2025, 11, 133–158. [Google Scholar] [CrossRef]
  60. European Environment Agency. Air Quality in Europe: 2020 Report. Available online: https://data.europa.eu/doi/10.2800/786656 (accessed on 6 January 2026).
  61. European Environment Agency. Air Quality in Europe—2019 Report; EEA Report No. 10/2019; Publications Office of the European Union: Luxembourg, 2019; ISBN 978-92-9480-088-6. [Google Scholar] [CrossRef]
  62. Monks, P.S.; Archibald, A.T.; Colette, A.; Cooper, O.; Coyle, M.; Derwent, R.; Fowler, D.; Granier, C.; Law, K.S.; Mills, G.E.; et al. Tropospheric Ozone and Its Precursors from the Urban to the Global Scale from Air Quality to Short-Lived Climate Forcer. Atmos. Chem. Phys. 2015, 15, 8889–8973. [Google Scholar] [CrossRef]
  63. Chen, Y.; Ma, Q.; Lin, W.; Xu, X.; Yao, J.; Gao, W. Measurement Report: Long-Term Variations in Carbon Monoxide at a Background Station in China’s Yangtze River Delta Region. Atmos. Chem. Phys. 2020, 20, 15969–15982. [Google Scholar] [CrossRef]
Figure 1. Spatial selection of the study network in NW Spain: 10 coal-fired power plants (CT) with 10 km buffers and 28 regulatory monitoring stations (daily data, 2006–2023). Stations are labeled by ID_MAPA and cross-referenced in Table 2; overlapping buffers imply N_CT_10 km > 1 for some stations. Map produced in QGIS Desktop 3.40.12. Basemap: Bing Maps.
Figure 3. Phase B of the data-processing workflow: robust outlier screening, consistency checks, contextual regulatory tagging, and final decision logic for the construction of VALOR_robusto. The workflow integrates global IQR-based screening, local Hampel diagnosis, conservative winsorization, NO–NO2–NOx and PM2.5–PM10 consistency checks, external plausibility indicators, and contextual regulatory flags, while preserving full traceability of observed values, flags, exceptions, and final decisions.
Figure 4. Pre-imputation data completeness by power plant. Bars show weighted missingness (%Missing_w; Equation (30)).
Figure 5. Overall imputation quality score by power plant and pollutant (5% hold-out validation). Cells show the four-level score (1 = Low, 4 = Excellent) assigned using the primary R2 criterion in Table 3.
Figure 6. Station-level visual plausibility checks of MICE imputations (example). Colored lines represent the observed daily values for the pollutant shown in each panel, whereas red crosses denote the corresponding MICE-imputed values. The figure shows the CT_VELILLA single-station example, including low-scoring pollutants (NO2 and PM10), over the available station record (2015–2023). Plant-level validation is reported in Table 10 and summarized in Figure 5.
Figure 7. IQR screening example for CT_VELILLA shown over the available station record (daily, 2015–2023). Panels show O3, PM10 and NO; red crosses indicate IQR outliers, and horizontal lines show Q1/Q3 and the IQR bounds.
Figure 8. Hampel screening example for CT_VELILLA (daily, 2015–2023). Red crosses indicate Hampel extremes (|z| > 6); the black line is the 31-day rolling median and dashed lines show ±6·1.4826·MAD thresholds.
Figure 9. CT_VELILLA (34080004)—Original vs. robust daily series (changes only) for PM10, NO, and O3 (2015–2023). Colored lines show the original daily series; black dots mark original values on days modified in Phase B, and red crosses show the corresponding robust values after winsorization.
Table 1. Coal-related facilities considered in the study and number of monitoring stations within 10 km.
CT_ID | Region (CCAA) | N Stations ≤ 10 km | Included
CT_AS_PONTES | Galicia | 2 | Yes
CT_SABON | Galicia | 7 | Yes
CT_MEIRAMA | Galicia | 2 | Yes
CT_COMPOSTILLA | Castilla y León | 1 | Yes
CT_LA_ROBLA | Castilla y León | 2 | Yes
CT_VELILLA | Castilla y León | 1 | Yes
CT_SOTO_RIBERA | Asturias | 4 | Yes
CT_LA_PEREDA | Asturias | 1 | Yes
CT_LADA | Asturias | 4 | Yes
CT_ABONO | Asturias | 4 | Yes
CT_ANLLARES | Castilla y León | 0 * | No
CT_NARCEA | Asturias | 0 * | No
* Facilities with N stations ≤ 10 km = 0 were excluded from the analytical dataset.
Table 2. Regulatory monitoring stations selected within 10 km of the included coal-related facilities (daily data, 2006–2023).
ID_MAPA | COD_LOCAL | Station Type | Area Type | CT_ID | DIST_CT_km | N_CT_10 km *
1 | 15005011 | Industrial | Rural | CT_SABON | 2.31 | 1
2 | 15005012 | Industrial | Suburban | CT_SABON | 3.34 | 1
3 | 15041001 | Industrial | Rural | CT_SABON | 9.20 | 1
4 | 15030021 | Industrial | Urban | CT_SABON | 6.59 | 1
5 | 15030027 | Background | Suburban | CT_SABON | 9.30 | 1
6 | 15030028 | Industrial | Suburban | CT_SABON | 7.20 | 1
7 | 15030001 | Traffic | Urban | CT_SABON | 7.57 | 1
8 | 15059004 | Industrial | Rural | CT_MEIRAMA | 7.57 | 1
9 | 15024001 | Industrial | Suburban | CT_MEIRAMA | 5.34 | 1
10 | 15070010 | Industrial | Rural | CT_AS_PONTES | 4.54 | 1
11 | 15070002 | Industrial | Suburban | CT_AS_PONTES | 1.90 | 1
12 | 24115015 | Industrial | Suburban | CT_COMPOSTILLA | 7.75 | 1
13 | 24134007 | Industrial | Rural | CT_LA_ROBLA | 1.58 | 1
14 | 24134006 | Industrial | Suburban | CT_LA_ROBLA | 1.47 | 1
15 | 34080004 | Industrial | Urban | CT_VELILLA | 2.76 | 1
16 | 33044033 | Industrial | Suburban | CT_SOTO_RIBERA | 8.56 | 1
17 | 33044029 | Traffic | Urban | CT_SOTO_RIBERA | 5.13 | 1
18 | 33044030 | Traffic | Urban | CT_SOTO_RIBERA | 6.88 | 1
19 | 33044032 | Background | Urban | CT_SOTO_RIBERA | 6.69 | 1
20 | 33031032 | Background | Urban | CT_LADA | 2.20 | 1
21 | 33031030 | Industrial | Urban | CT_LADA | 0.71 | 1
22 | 33031029 | Industrial | Suburban | CT_LADA | 0.65 | 2 *
23 | 33060003 | Background | Suburban | CT_LADA | 9.05 | 1
24 | 33037012 | Traffic | Urban | CT_LA_PEREDA | 3.29 | 2 *
25 | 33024032 | Background | Suburban | CT_ABONO | 4.31 | 1
26 | 33024031 | Background | Urban | CT_ABONO | 5.86 | 1
27 | 33024027 | Traffic | Urban | CT_ABONO | 6.45 | 1
28 | 33024025 | Traffic | Urban | CT_ABONO | 4.76 | 1
* N_CT_10 km indicates the number of facilities whose 10 km buffer includes the station; N_CT_10 km = 2 reflects buffer overlap.
Table 4. Measurement technique codes used to assess PM2.5–PM10 comparability, grouped into PM_MASS and SCATTERING families for coherence checks.
ID | Technique | Proposed Family | Use
46 | Differential Optical/Optical Scattering | SCATTERING | PM (surrogate, not mass)
47 | Oscillating Microbalance (TEOM) | TEOM → PM_MASS | PM mass
49 | Beta Attenuation Monitor (BAM) | BAM → PM_MASS | PM mass
50 | Gravimetry (filter) | GRAV → PM_MASS | PM mass
54 | Nephelometry | SCATTERING | PM (surrogate, not mass)
M | Manual (gravim.) | GRAV → PM_MASS | PM mass
Table 5. External plausibility reference thresholds (EU reports) used as contextual flags (not as exclusion criteria).
Pollutant | Averaging Period/Statistic | Plausible High Reference | Unit
PM10 | Annual p90.4 of daily mean (36th highest) | >75 | µg·m−3
NO2 | Annual mean | >100 | µg·m−3
PM2.5 | Annual mean | >30 | µg·m−3
O3 | p93.2 of daily maximum 8 h mean | >160 | µg·m−3
SO2 | Alert threshold (3 consecutive hours) | 500 | µg·m−3
CO | Daily maximum 8 h running mean | >15 | mg·m−3
Values from EEA Air Quality in Europe reports [60,61].
Table 6. Regulatory reference values used for daily contextual flags (not formal compliance).
Pollutant | Regulatory Reference (Statistic) | Threshold | Unit | Evaluable | Contextual Flag
PM10 | Daily limit value (24 h mean) | 50 | µg·m−3 | Yes | exceso_normativo_diario
SO2 | Daily limit value (24 h mean) | 125 | µg·m−3 | Yes | exceso_normativo_diario
NO2 | Annual limit value (annual mean) | 40 | µg·m−3 | No | normativa_no_evaluable_diario
NO2 | Hourly limit value (1 h) | 200 | µg·m−3 | No | normativa_no_evaluable_diario
O3 | Target value (daily maximum of 8 h running mean) | 120 | µg·m−3 | No * | normativa_no_evaluable_diario
O3 | Information threshold (1 h) | 180 | µg·m−3 | No | normativa_no_evaluable_diario
O3 | Alert threshold (1 h) | 240 | µg·m−3 | No | normativa_no_evaluable_diario
CO | Limit value (daily maximum of 8 h running mean) | 10 | mg·m−3 | No | normativa_no_evaluable_diario
SO2 | Alert threshold (3 h) | 500 | µg·m−3 | No * | normativa_no_evaluable_diario
NO2 | Alert threshold (3 h) | 400 | µg·m−3 | No | normativa_no_evaluable_diario
* Not evaluable when only 24 h daily means are available (the legal criterion requires hourly data and/or 8 h running means). These fields are used as contextual daily tags and do not constitute a formal compliance assessment.
Table 7. Priority decision rules to derive VALOR_robusto from daily records (KEEP/KEEP_EXTREMO/DROP_NAN).
Priority | Minimum Trigger (Condition) | DECISION | Final Value | Robust Value
1 | Negative or physically impossible value (x_t < 0) | DROP_NAN | NaN | NaN
2 | NOx inconsistency (Equation (28)) | DROP_NAN | NaN | NaN
3 | PM inconsistency (if applicable *) (Equation (29)) | DROP_NAN | NaN | NaN
4 | Hampel extreme with |z_t| > 6 and any inconsistency (NOx or PM) | DROP_NAN | NaN | NaN
5 | Hampel with |z_t| > 6 and no inconsistencies (NOx or PM) | KEEP_EXTREMO | valor_winsor | valor_winsor
6 | IQR-only outlier: flag_IQR = True and |z_t| ≤ 6 and no applicable inconsistencies (NOx or PM) | KEEP | VALOR | VALOR
7 | Missing observation: VALOR is NaN | KEEP (absence preserved) | NaN | NaN
8 | All other cases (non-extreme, no inconsistencies) | KEEP | VALOR | VALOR
* PM applicability: the PM2.5–PM10 coherence check is evaluated only when PM10 and PM2.5 share the same PUNTO_MUESTREO or belong to the same PM_MASS subfamily (TEOM/BAM/GRAV). Otherwise, the PM rule is not applied, and an exception flag is stored (e.g., flag_incoherencia_PM_excepcion/excepcion_PM_por_cambio_punto).
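The priority logic of Table 7 can be expressed compactly as a single function. The sketch below uses illustrative names (`decide`, `lam`) and collapses rules 2–4, which share the DROP_NAN outcome; it is not the pipeline's actual code:

```python
import math

def decide(valor, z_hampel=0.0, flag_iqr=False,
           incoherencia_nox=False, incoherencia_pm=False,
           valor_winsor=float("nan"), lam=6.0):
    """Priority rules of Table 7: returns (decision, VALOR_robusto)."""
    if valor is None or math.isnan(valor):
        return "KEEP", float("nan")          # rule 7: absence preserved
    if valor < 0:
        return "DROP_NAN", float("nan")      # rule 1: physically impossible
    if incoherencia_nox or incoherencia_pm:
        return "DROP_NAN", float("nan")      # rules 2-4: inconsistency dominates
    if abs(z_hampel) > lam:
        return "KEEP_EXTREMO", valor_winsor  # rule 5: winsorize the extreme
    # rules 6 and 8: an IQR-only flag (flag_iqr) is informative but never
    # changes the value, so both cases keep the observation as-is.
    return "KEEP", valor
```

Because the rules are strictly ordered, the first matching condition determines both the stored decision label and the robust value.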
Table 8. Auditable field blocks recorded per daily observation: diagnostics, coherence checks, contextual tags, and final decision.
Block | Fields (Examples) | Operational Purpose
Outlier diagnostics | flag_IQR, z_Hampel, is_extremo_Hampel | Identify outlier candidates/extremes using robust criteria (global and local).
Internal coherence checks | incoherencia_NOx, incoherencia_PM, flag_incoherencia_PM_excepcion | Detect physico-chemical/hierarchical inconsistencies and document non-applicability due to instrumental comparability constraints.
Contextual tagging | exceso_normativo_diario, normativa_no_evaluable_diario, FLAG_PLAUS_ * | Tag regulatory context and external plausibility; not used as an automatic exclusion rule.
Final decision and output | decision_final, razon_final, valor_winsor, VALOR_robusto | Record the audited decision and the resulting value used in the analysis.
* FLAG_PLAUS_ denotes pollutant-specific annual plausibility flags.
Table 9. Global pre-imputation missingness by pollutant (daily data, 2006–2023). Percentiles (P25, P75) are computed across station–parameter series.
Pollutant | N_Series | Total_Days | Missing_Days | %_Median_Missing | %_P25_Missing | %_P75_Missing | %_Max_Missing | %_Weighted_Missing
PM2.5 | 17 | 51,137 | 9423 | 5.45 | 4.23 | 32.24 | 70.55 | 18.43
PM10 | 30 | 96,091 | 6105 | 4.72 | 3.68 | 8.46 | 53.77 | 6.35
CO | 15 | 57,489 | 3275 | 4.27 | 3.32 | 5.71 | 17.79 | 5.70
O3 | 21 | 84,590 | 4342 | 3.93 | 3.11 | 4.54 | 17.44 | 5.13
NO2 | 28 | 103,843 | 4990 | 3.51 | 3.16 | 4.20 | 17.90 | 4.81
SO2 | 27 | 101,318 | 4772 | 3.52 | 3.17 | 4.29 | 17.53 | 4.71
NO | 28 | 100,803 | 4601 | 3.60 | 3.28 | 4.35 | 13.69 | 4.56
NOx | 30 | 115,563 | 4870 | 3.52 | 2.95 | 4.19 | 14.20 | 4.21
Table 10. Hold-out validation performance by pollutant… MAE, RMSE, R2, and Bias are computed on masked observations; fit-quality classes follow the primary R2-based criterion in Table 3.
Pollutant | N Validation Pairs | MAE | RMSE | R2 | Bias | Overall Fit Quality
NOx | 5357 | 7.16 | 13.8 | 0.691 | −1.79 | Excellent (4)
SO2 | 5445 | 1.82 | 4.09 | 0.533 | −0.48 | Acceptable (2)
NO | 5410 | 2.62 | 6.38 | 0.614 | −0.59 | Good (3)
NO2 | 5074 | 3.68 | 5.56 | 0.691 | −0.69 | Excellent (4)
CO | 2806 | 0.05 | 0.09 | 0.756 | 0.00 | Excellent (4)
O3 | 4417 | 8.00 | 10.58 | 0.656 | −0.67 | Excellent (4)
PM10 | 4904 | 4.25 | 6.46 | 0.601 | −0.76 | Good (3)
PM2.5 | 1760 | 2.36 | 3.48 | 0.605 | −0.37 | Good (3)
Table 11. IQR-flagged outlier rates by pollutant and coal power plant (10 km buffer). For each pollutant and plant, the outlier rate is reported as the percentage of observed daily records (%). The table also reports N_out (number of IQR-flagged outliers) and N_obs (number of observed daily records); percentages are computed as 100·N_out/N_obs. Plant codes: ABN = CT_ABONO; ASP = CT_AS_PONTES; COM = CT_COMPOSTILLA; LAD = CT_LADA; MEI = CT_MEIRAMA; PER = CT_LA_PEREDA; ROB = CT_LA_ROBLA; SAB = CT_SABON; SOR = CT_SOTO_RIBERA; VEL = CT_VELILLA. “—” indicates that the pollutant was not monitored for that plant.
Table 11. IQR-flagged outlier rates by pollutant and coal power plant (10 km buffer). For each pollutant and plant, the outlier rate is reported as the percentage of observed daily records (%). The table also reports N o u t (number of IQR-flagged outliers) and N o b s (number of observed daily records); percentages are computed as 100 N o u t / N o b s . Plant codes: ABN = CT_ABONO; ASP = CT_AS_PONTES; COM = CT_COMPOSTILLA; LAD = CT_LADA; MEI = CT_MEIRAMA; PER = CT_LA_PEREDA; ROB = CT_LA_ROBLA; SAB = CT_SABON; SOR = CT_SOTO_RIBERA; VEL = CT_VELILLA. “—” indicates that the pollutant was not monitored for that plant.
PollutantABNASPCOMLADMEIPERROBSABSORVEL
CO0.660.020.300.188.570.17
N_out521332143930
N_obs7942602810,9571096167,9917,864
NO3.881.205.613.912.343.791.494.243.141.40
N_out49212316464213724949147856146
N_obs12,69010,225292216,43558446574328734,84617,8953287
NO20.061.101.100.260.290.030.550.390.130.37
N_out61163242172221452412
N_obs949510,5912922164,3558446574401737,28317,8953287
NOx1.681.311.861.430.601.261.191.271.651.13
N_out2131335223535833943829633
N_obs12,69010,169280216,43558446574328734,54017,8952922
O30.000.020.000.000.000.000.000.000.000.00
N_out0100000000
N_obs97696574292216,40532876544328723,28117,8953287
PM100.552.571.400.501.330.341.520.730.270.78
N_out709341763422632414926
N_obs12,6903622292215,28025576574413932,84617,8953318
PM2.50.121.540.540.500.28
N_out1191294717
N_obs94955905541993436150
SO20.825.513.630.252.760.370.273.412.070.40
N_out80584106411612411127637113
N_obs976910,591292216,43558446574401737,46317,8953259
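The outlier rates in Table 11 follow the Tukey IQR criterion applied per plant–pollutant series. A minimal sketch, assuming the conventional fence multiplier k = 1.5 (the study's exact multiplier is defined in its methods, not here):

```python
import numpy as np

def iqr_outlier_rate(x, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]
    and return (N_out, N_obs, rate_pct) over observed (non-NaN) records."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]              # rates are relative to observed days only
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    flags = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    n_out, n_obs = int(flags.sum()), int(x.size)
    return n_out, n_obs, 100.0 * n_out / n_obs
```

Because the fences adapt to each series' own quartiles, rates remain comparable across plants with very different concentration levels.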
Table 12. Hampel extreme-value rates by pollutant and coal power plant (10 km buffer). Hampel extremes are defined as days with |z_t| > 6 (Section 2.7.3) and are reported as a percentage of observed daily records (%). For each plant–pollutant combination, the table also reports N_out (number of Hampel extremes) and N_obs (number of observed daily records); percentages are computed as 100 × N_out/N_obs. Plant codes (see Table 11 for full names): ABN, ASP, COM, LAD, MEI, PER, ROB, SAB, SOR, VEL. “—” indicates that the pollutant was not monitored for that plant.

| Pollutant | Metric | ABN | ASP | COM | LAD | MEI | PER | ROB | SAB | SOR | VEL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CO † | % | 0.31 | 0.50 | 0.52 | 0.09 | 0.99 | 0.37 | — | — | — | — |
| CO † | N_out | 27 | 30 | 67 | 1 | 185 | 68 | — | — | — | — |
| CO † | N_obs | 7942 | 6028 | 10,957 | 1096 | 16,799 | 17,864 | — | — | — | — |
| NO | % | 1.82 | 1.43 | 1.03 | 1.05 | 1.14 | 0.94 | 1.73 | 3.75 | 1.17 | 0.24 |
| NO | N_out | 233 | 143 | 30 | 206 | 68 | 62 | 57 | 1328 | 224 | 8 |
| NO | N_obs | 12,690 | 10,225 | 2922 | 19,599 | 5844 | 6574 | 3287 | 34,846 | 17,895 | 3287 |
| NO2 | % | 0.09 | 1.12 | 0.10 | 0.18 | 0.15 | 0.03 | 0.24 | 0.20 | 0.09 | 0.03 |
| NO2 | N_out | 9 | 45 | 3 | 34 | 9 | 2 | 9 | 82 | 14 | 1 |
| NO2 | N_obs | 9495 | 4017 | 2922 | 19,599 | 5844 | 6574 | 4017 | 37,283 | 17,895 | 3287 |
| NOx | % | 0.21 | 0.47 | 0.14 | 0.39 | 0.13 | 0.30 | 0.12 | 0.54 | 0.26 | 0.07 |
| NOx | N_out | 27 | 42 | 4 | 75 | 8 | 20 | 4 | 211 | 49 | 2 |
| NOx | N_obs | 12,690 | 10,169 | 2802 | 19,568 | 5844 | 6574 | 3287 | 34,540 | 17,895 | 2922 |
| O3 | % | 0.00 | 0.00 | 0.07 | 0.05 | 0.03 | 0.00 | 0.03 | 0.04 | 0.05 | 0.03 |
| O3 | N_out | 0 | 0 | 2 | 10 | 1 | 0 | 1 | 10 | 7 | 1 |
| O3 | N_obs | 9769 | 6574 | 2922 | 19,569 | 3287 | 6544 | 3287 | 23,281 | 17,895 | 3287 |
| PM10 | % | 0.39 | 0.72 | 1.06 | 0.46 | 0.74 | 0.59 | 0.44 | 0.47 | 0.28 | 1.39 |
| PM10 | N_out | 49 | 26 | 31 | 87 | 19 | 39 | 26 | 152 | 46 | 46 |
| PM10 | N_obs | 12,690 | 3622 | 2922 | 18,291 | 2557 | 6574 | 4139 | 32,846 | 17,895 | 3318 |
| PM2.5 † | % | 0.18 | 1.35 | 0.59 | 0.78 | 0.16 | — | — | — | — | — |
| PM2.5 † | N_out | 17 | 80 | 51 | 49 | 11 | — | — | — | — | — |
| PM2.5 † | N_obs | 9495 | 5905 | 8430 | 9343 | 6150 | — | — | — | — | — |
| SO2 | % | 1.01 | 2.11 | 1.20 | 0.83 | 3.28 | 0.70 | 0.87 | 4.13 | 0.82 | 0.15 |
| SO2 | N_out | 99 | 243 | 35 | 162 | 186 | 46 | 40 | 1530 | 159 | 5 |
| SO2 | N_obs | 9769 | 10,591 | 2922 | 19,599 | 5844 | 6574 | 4017 | 37,463 | 17,895 | 3259 |

† CO and PM2.5 are monitored at only six and five of the ten plants, respectively; their values appear in source order, so the column placement of the “—” cells for these two rows may differ from the original layout.
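The Hampel criterion in Table 12 compares each day with a robust local estimate: the window median and the MAD-based robust sigma. A minimal sketch using a 7-day centered window (the window length is our assumption; the study's definition is in Section 2.7.3, with the |z_t| > 6 threshold):

```python
import numpy as np

def hampel_flags(x, half_window=3, z_max=6.0):
    """Flag day t as a Hampel extreme when
    |x_t - median(window)| / (1.4826 * MAD(window)) > z_max."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    flags = np.zeros(n, dtype=bool)
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        w = x[lo:hi]
        w = w[~np.isnan(w)]
        if w.size == 0 or np.isnan(x[t]):
            continue
        med = np.median(w)
        sigma = 1.4826 * np.median(np.abs(w - med))  # MAD -> robust sigma
        if sigma > 0 and abs(x[t] - med) / sigma > z_max:
            flags[t] = True
    return flags
```

Unlike the global IQR fences of Table 11, the rolling Hampel test reacts to values that are extreme relative to their local context, so the two screens flag partly different sets of days.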
Table 13. Phase B final decision outcomes by pollutant (daily records). Counts and percentages of KEEP, KEEP_EXTREMO (winsorized), and DROP_NAN after applying the priority decision rule.

| Pollutant | N (Daily Records) | KEEP_EXTREMO (%) | DROP_NAN (%) |
|---|---|---|---|
| CO | 23,955 | 0.6178 | 0.0918 |
| NO | 46,258 | 2.0386 | 0.3394 |
| NO2 | 43,742 | 0.1943 | 0.3566 |
| NOx | 45,878 | 0.3945 | 0.3422 |
| O3 | 38,016 | 0.0263 | 0.4051 |
| PM10 | 41,343 | 0.4112 | 0.3096 |
| PM2.5 | 15,504 | 0.4128 | 0.2838 |
| SO2 | 46,588 | 2.022 | 0.3349 |

Note: KEEP (%) = 100 − KEEP_EXTREMO (%) − DROP_NAN (%).
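Once each daily record carries a final decision label, the summaries of Table 13 reduce to counting. A minimal sketch (the function name is ours; the priority rule that assigns the labels is the one described in the methods, not reimplemented here):

```python
from collections import Counter

DECISIONS = ("KEEP", "KEEP_EXTREMO", "DROP_NAN")

def decision_summary(decisions):
    """Counts and percentages of KEEP / KEEP_EXTREMO / DROP_NAN
    over a pollutant's daily records, as reported in Table 13."""
    n = len(decisions)
    counts = Counter(decisions)
    pct = {k: 100.0 * counts.get(k, 0) / n for k in DECISIONS}
    return n, counts, pct
```

By construction the three percentages sum to 100, which is the identity stated in the note to Table 13.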
Table 14. Phase B decision summary for 34080004_CT_VELILLA (daily records, 2015–2023).

| Pollutant | N | KEEP_EXTREMO_N | KEEP_EXTREMO_pct | DROP_NAN_N | DROP_NAN_pct | KEEP_N | KEEP_pct |
|---|---|---|---|---|---|---|---|
| PM10 | 3318 | 42 | 1.27 | 0 | 0.00 | 3276 | 98.73 |
| NOx | 2922 | 3 | 0.10 | 0 | 0.00 | 2919 | 99.90 |
| NO2 | 3287 | 1 | 0.03 | 0 | 0.00 | 3286 | 99.97 |
| NO | 3287 | 5 | 0.15 | 0 | 0.00 | 3282 | 99.85 |
| O3 | 3287 | 0 | 0.00 | 0 | 0.00 | 3287 | 100.00 |
| SO2 | 3259 | 6 | 0.18 | 0 | 0.00 | 3253 | 99.82 |
Table 15. Positive context flag for 34080004_CT_VELILLA: flag_PM10_gt_75 (days with PM10 > 75 only).

| PUNTO_MUESTREO | Date | VALOR | VALOR_robusto | flag_PM10_gt_75 | Decision |
|---|---|---|---|---|---|
| 34080004_10_49 | 23 February 2017 | 121 | 44.6868 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 27 February 2020 | 85 | 28.7912 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 15 March 2022 | 404 | 74.711 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 16 March 2022 | 156 | 58.478 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 5 October 2022 | 81 | 50.5824 | TRUE | KEEP_EXTREMO |
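Table 15 illustrates the KEEP_EXTREMO outcome: the observed VALOR is preserved for traceability while VALOR_robusto stores a winsorized replacement used in the robust series. A minimal sketch of percentile winsorization; the 1st/99th-percentile bounds here are illustrative assumptions, not the study's exact limits:

```python
import numpy as np

def winsorize_to_bounds(x, lower_pct=1, upper_pct=99):
    """Clip extreme values to percentile bounds (winsorization).
    Records clipped this way correspond to KEEP_EXTREMO: the raw
    value is kept in the audit trail, the clipped one in the robust series."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanpercentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)
```

Winsorization retains the day in the series (unlike DROP_NAN), which is why the flagged high-PM10 days above keep a finite, bounded VALOR_robusto.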