A Reproducible QA/QC, Imputation and Robust-Series Workflow for Air-Quality Monitoring Time Series
Abstract
1. Introduction
2. Materials and Methods
2.1. Station Selection and Spatial Assignment to Power Plants
2.2. Data Source and Database Structure
2.3. Input Structure and Temporal Harmonization
2.4. Pre-Imputation QA/QC
2.5. Imputation Validation Design (Hold-Out)
| Fit-Quality Category | R2 Criterion (Primary) |
|---|---|
| Excellent (4) | R2 ≥ 0.65 |
| Good (3) | 0.55 ≤ R2 < 0.65 |
| Acceptable (2) | 0.45 ≤ R2 < 0.55 |
| Low (1) | R2 < 0.45 |
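
The thresholds in the table above can be applied mechanically. The following is a minimal sketch, assuming that only the primary R2 criterion is used; the function name `fit_quality` is illustrative and does not come from the article.

```python
def fit_quality(r2: float) -> str:
    """Map a hold-out R2 value to the fit-quality category of the table above.

    Only the primary (R2) criterion is implemented; any secondary
    criteria the authors may apply are not covered by this sketch.
    """
    if r2 >= 0.65:
        return "Excellent (4)"
    if r2 >= 0.55:
        return "Good (3)"
    if r2 >= 0.45:
        return "Acceptable (2)"
    return "Low (1)"
```

For example, `fit_quality(0.533)` falls in the 0.45 to 0.55 band and returns `"Acceptable (2)"`.
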

2.6. Imputation of Missing Values Using a Chained-Equation Iterative Scheme (Bayesian Ridge)
2.6.1. Scope and Imputation Unit: (Plant, Pollutant)
2.6.2. Preprocessing and Workflow Constraints
2.6.3. Imputer Specification (Bayesian Ridge Regression)
2.6.4. Scale Transformations (Sensitivity Test)
2.6.5. Post-Imputation Controls and Traceability
2.7. Detection and Conservative Treatment of Extreme Values (Outliers) and Construction of Robust Series
2.7.1. Unit of Application and Operational Principle
2.7.2. Robust Global Detection of Candidates (IQR)
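
As an illustration of a global robust screen of the kind that the `flag_IQR` field suggests, the sketch below flags values outside Tukey-style fences. The multiplier `k = 1.5`, the quartile method and the function name are assumptions, not settings taken from the article.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values (None) are never flagged.

    k = 1.5 is the conventional Tukey multiplier (assumed here).
    """
    valid = [v for v in values if v is not None]
    q1, _, q3 = quantiles(valid, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v is not None and (v < lower or v > upper) for v in values]
```

The fences are computed once over the whole series, which is what makes the screen global; the local diagnosis is a separate step.
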
2.7.3. Local Diagnosis and Conservative Treatment (Hampel + Winsorization)
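
A minimal sketch of a Hampel-type local score combined with winsorization follows. The threshold `k = 6` matches the |zt| > 6 criterion used in the decision table of this article; the window half-width, the handling of windows with zero dispersion and the clipping bounds (local median ± k·σ) are assumptions made for illustration only.

```python
from statistics import median

MAD_SCALE = 1.4826  # scales the MAD to the standard deviation under normality


def hampel_winsorize(values, half_window=7, k=6.0):
    """Return a list of (z, winsorized_value) pairs, one per input value.

    z is the local robust score (x - median) / (1.4826 * MAD) over a
    centred window; it is None where it cannot be computed.
    Values with |z| > k are clipped to median +/- k * sigma (assumed bounds).
    """
    out = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [v for v in values[lo:hi] if v is not None]
        if x is None or len(window) < 3:
            out.append((None, x))
            continue
        med = median(window)
        sigma = MAD_SCALE * median([abs(v - med) for v in window])
        if sigma == 0:
            out.append((None, x))  # score undefined for a flat window
            continue
        z = (x - med) / sigma
        clipped = min(max(x, med - k * sigma), med + k * sigma)
        out.append((z, clipped))
    return out
```

Clipping rather than deleting keeps the day in the series while limiting the influence of the spike, which is the conservative intent described by the section title.
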
2.7.4. Physicochemical and Hierarchical Consistency Checks (NO–NO2–NOx/PM)
1. Non-negativity (physical–metrological control)
2. Algebraic–operational consistency of the NO–NO2–NOx system
3. Hierarchical consistency of particulate-matter fractions PM2.5–PM10
4. Applicability of PM consistency and exceptions due to instrumental non-comparability
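
The four checks listed above can be sketched as a single function. The exact forms of the article's Equations (28) and (29) are not reproduced in this section, so the formulas below are assumptions: NOx is taken to be expressed as NO2 equivalents (NO2 + 1.533·NO in µg·m−3), the PM check is PM2.5 ≤ PM10, and the tolerances `rel_tol` and `abs_tol` are hypothetical.

```python
NO_TO_NO2 = 46.0055 / 30.0061  # molar-mass ratio NO2/NO, about 1.533


def consistency_flags(no, no2, nox, pm25, pm10,
                      pm_comparable=True, rel_tol=0.10, abs_tol=2.0):
    """Return the three coherence flags for one day (None = missing value).

    Assumed forms, not the article's equations:
      NOx check: |NOx - (NO2 + 1.533 * NO)| > max(abs_tol, rel_tol * expected)
      PM check:  PM2.5 > PM10 + abs_tol, only if the instruments are comparable
    """
    negativo = any(v is not None and v < 0 for v in (no, no2, nox, pm25, pm10))

    incoherencia_nox = False
    if no is not None and no2 is not None and nox is not None:
        expected = no2 + NO_TO_NO2 * no
        incoherencia_nox = abs(nox - expected) > max(abs_tol, rel_tol * expected)

    incoherencia_pm = False
    if pm_comparable and pm25 is not None and pm10 is not None:
        incoherencia_pm = pm25 > pm10 + abs_tol

    return {
        "negativo": negativo,
        "incoherencia_NOx": incoherencia_nox,
        "incoherencia_PM": incoherencia_pm,
    }
```

Setting `pm_comparable=False` corresponds to the fourth item: the PM hierarchy is not tested when the two fractions are measured by non-comparable techniques.
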
2.7.5. External Plausibility and Regulatory Tagging (Contextual Flags)
1. External plausibility (EU reports/annual thresholds)
   - Minimum coverage: ≥ 75% of valid days in the year. If this is not reached, the metric is coded as NA, and N_valid_days and coverage are retained for traceability.
   - PM10: the annual high regime was computed as p90.4 of daily means, implemented conservatively as the 36th-highest value (no interpolation). FLAG_PLAUS_PM10_P90_4_GT_75 = True is triggered if p90.4 > 75 µg·m−3.
   - NO2: the annual mean of daily values was computed. FLAG_PLAUS_NO2_MEAN_GT_100 = True is triggered if the annual mean > 100 µg·m−3.
   - PM2.5: the annual mean of daily values was computed. FLAG_PLAUS_PM25_MEAN_GT_30 = True is triggered if the annual mean > 30 µg·m−3.
   - O3: the usual reference threshold is defined on the daily maximum of 8 h running means; when only daily means are available, this control is considered non-operational and is documented as NA.
2. Daily regulatory tagging (context): traceability and output products
   - exceso_normativo_diario: indicates that the daily value exceeds a reference threshold applied at daily scale (when interpretable).
   - normativa_no_evaluable_diario: indicates that evaluation is not applicable at daily scale (e.g., criteria defined on percentiles or annual averages).
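
The annual plausibility rules above can be sketched as follows. The 75% coverage rule, the 36th-highest implementation of p90.4 and the thresholds come from the text; the function names, the `days_in_year` parameter and the dictionary layout are illustrative assumptions.

```python
from statistics import mean

MIN_COVERAGE = 0.75  # minimum share of valid days in the year


def pm10_p90_4(daily_means, days_in_year=365):
    """p90.4 of daily PM10 means as the 36th-highest value (no interpolation)."""
    valid = [v for v in daily_means if v is not None]
    coverage = len(valid) / days_in_year
    result = {"N_valid_days": len(valid), "coverage": coverage,
              "p90_4": None, "FLAG_PLAUS_PM10_P90_4_GT_75": None}
    if coverage >= MIN_COVERAGE and len(valid) >= 36:
        p90_4 = sorted(valid, reverse=True)[35]  # index 35 = 36th-highest
        result["p90_4"] = p90_4
        result["FLAG_PLAUS_PM10_P90_4_GT_75"] = p90_4 > 75.0
    return result


def annual_mean_flag(daily_means, threshold, days_in_year=365):
    """Annual mean of daily values with the coverage rule (NO2: 100, PM2.5: 30)."""
    valid = [v for v in daily_means if v is not None]
    coverage = len(valid) / days_in_year
    result = {"N_valid_days": len(valid), "coverage": coverage,
              "annual_mean": None, "flag": None}
    if coverage >= MIN_COVERAGE and valid:
        result["annual_mean"] = mean(valid)
        result["flag"] = result["annual_mean"] > threshold
    return result
```

When coverage is insufficient, the metric and the flag stay `None` (NA) while `N_valid_days` and `coverage` are still returned, mirroring the traceability requirement.
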
2.8. Coherence Checks and Final Decision Logic to Construct VALOR_robusto
2.8.1. Final Decision Rule and Operational Definition of VALOR_robusto
2.8.2. Output Products and QA/QC Control Plots
3. Results
3.1. Effective Coverage and Preprocessing Outcome (QA/QC) Prior to Imputation
3.2. Hold-Out Validation: Imputation Performance (Phase A)
3.3. Outlier Screening and Construction of the Robust Series (Phase B)
3.4. Final Decision and Generation of VALOR_robusto (Phase B)
- Winsorization dominates in species with a higher extreme burden: NO (KEEP_EXTREMO = 2.0386%) and SO2 (KEEP_EXTREMO = 2.022%), with DROP_NAN of 0.3394% and 0.3349%, respectively (Table 13).
- A closer balance between winsorization and discarding (KEEP_EXTREMO vs. DROP_NAN) is observed for PM: PM10 (0.4112% vs. 0.3096%) and PM2.5 (0.4128% vs. 0.2838%) (Table 13).
- Discarding exceeds winsorization where winsorization is only residual (KEEP_EXTREMO vs. DROP_NAN): O3 (0.0263% vs. 0.4051%) and, to a lesser extent, NO2 (0.1943% vs. 0.3566%) (Table 13).
3.5. Station-Level Example: Traceability and Contextual Plausibility (34080004_CT_VELILLA)
4. Discussion
Limitations and Transferability
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| CT_ID | Region (CCAA) | N Station ≤ 10 km | Included |
|---|---|---|---|
| CT_AS_PONTES | Galicia | 2 | Yes |
| CT_SABON | Galicia | 7 | Yes |
| CT_MEIRAMA | Galicia | 2 | Yes |
| CT_COMPOSTILLA | Castilla y León | 1 | Yes |
| CT_LA_ROBLA | Castilla y León | 2 | Yes |
| CT_VELILLA | Castilla y León | 1 | Yes |
| CT_SOTO_RIBERA | Asturias | 4 | Yes |
| CT_LA_PEREDA | Asturias | 1 | Yes |
| CT_LADA | Asturias | 4 | Yes |
| CT_ABONO | Asturias | 4 | Yes |
| CT_ANLLARES | Castilla y León | 0 * | No |
| CT_NARCEA | Asturias | 0 * | No |

| ID_MAPA | COD_LOCAL | Station Type | Area Type | CT_ID | DIST_CT_km | N_CT_10 km * |
|---|---|---|---|---|---|---|
| 1 | 15005011 | Industrial | Rural | CT_SABON | 2.31 | 1 |
| 2 | 15005012 | Industrial | Suburban | CT_SABON | 3.34 | 1 |
| 3 | 15041001 | Industrial | Rural | CT_SABON | 9.20 | 1 |
| 4 | 15030021 | Industrial | Urban | CT_SABON | 6.59 | 1 |
| 5 | 15030027 | Background | Suburban | CT_SABON | 9.30 | 1 |
| 6 | 15030028 | Industrial | Suburban | CT_SABON | 7.20 | 1 |
| 7 | 15030001 | Traffic | Urban | CT_SABON | 7.57 | 1 |
| 8 | 15059004 | Industrial | Rural | CT_MEIRAMA | 7.57 | 1 |
| 9 | 15024001 | Industrial | Suburban | CT_MEIRAMA | 5.34 | 1 |
| 10 | 15070010 | Industrial | Rural | CT_AS_PONTES | 4.54 | 1 |
| 11 | 15070002 | Industrial | Suburban | CT_AS_PONTES | 1.90 | 1 |
| 12 | 24115015 | Industrial | Suburban | CT_COMPOSTILLA | 7.75 | 1 |
| 13 | 24134007 | Industrial | Rural | CT_LA_ROBLA | 1.58 | 1 |
| 14 | 24134006 | Industrial | Suburban | CT_LA_ROBLA | 1.47 | 1 |
| 15 | 34080004 | Industrial | Urban | CT_VELILLA | 2.76 | 1 |
| 16 | 33044033 | Industrial | Suburban | CT_SOTO_RIBERA | 8.56 | 1 |
| 17 | 33044029 | Traffic | Urban | CT_SOTO_RIBERA | 5.13 | 1 |
| 18 | 33044030 | Traffic | Urban | CT_SOTO_RIBERA | 6.88 | 1 |
| 19 | 33044032 | Background | Urban | CT_SOTO_RIBERA | 6.69 | 1 |
| 20 | 33031032 | Background | Urban | CT_LADA | 2.20 | 1 |
| 21 | 33031030 | Industrial | Urban | CT_LADA | 0.71 | 1 |
| 22 | 33031029 | Industrial | Suburban | CT_LADA | 0.65 | 2 * |
| 23 | 33060003 | Background | Suburban | CT_LADA | 9.05 | 1 |
| 24 | 33037012 | Traffic | Urban | CT_LA_PEREDA | 3.29 | 2 * |
| 25 | 33024032 | Background | Suburban | CT_ABONO | 4.31 | 1 |
| 26 | 33024031 | Background | Urban | CT_ABONO | 5.86 | 1 |
| 27 | 33024027 | Traffic | Urban | CT_ABONO | 6.45 | 1 |
| 28 | 33024025 | Traffic | Urban | CT_ABONO | 4.76 | 1 |

| ID | Technique | Proposed Family | Use |
|---|---|---|---|
| 46 | Differential Optical/Optical Scattering | SCATTERING | PM (surrogate, not mass) |
| 47 | Oscillating Microbalance (TEOM) | TEOM → PM_MASS | PM mass |
| 49 | Beta Attenuation Monitor (BAM) | BAM → PM_MASS | PM mass |
| 50 | Gravimetry (filter) | GRAV → PM_MASS | PM mass |
| 54 | Nephelometry | SCATTERING | PM (surrogate, not mass) |
| M | Manual (gravim.) | GRAV → PM_MASS | PM mass |
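
The technique table above can be encoded as a lookup that decides whether the PM2.5–PM10 hierarchy check applies. This is a sketch under two assumptions that the section does not confirm: that the last token of PUNTO_MUESTREO (for example `34080004_10_49`) is the technique ID, and that the check applies only when both fractions belong to the PM_MASS family. The function names and the PM2.5 identifier in the usage note are hypothetical.

```python
# Technique ID -> family, transcribed from the table above
TECHNIQUE_FAMILY = {
    "46": "SCATTERING",  # optical scattering (surrogate, not mass)
    "47": "PM_MASS",     # TEOM
    "49": "PM_MASS",     # BAM
    "50": "PM_MASS",     # gravimetry (filter)
    "54": "SCATTERING",  # nephelometry (surrogate, not mass)
    "M": "PM_MASS",      # manual gravimetry
}


def technique_family(punto_muestreo):
    """Return the family for a sampling-point ID, or None if the ID is unknown.

    Assumes the technique ID is the last '_'-separated token.
    """
    technique_id = punto_muestreo.rsplit("_", 1)[-1]
    return TECHNIQUE_FAMILY.get(technique_id)


def pm_check_applicable(pm25_punto, pm10_punto):
    """Assumed rule: compare PM2.5 and PM10 only if both are mass measurements."""
    return (technique_family(pm25_punto) == "PM_MASS"
            and technique_family(pm10_punto) == "PM_MASS")
```

For instance, `pm_check_applicable("00000000_9_46", "34080004_10_49")` is `False`, because the (hypothetical) PM2.5 point uses a scattering technique.
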

| Pollutant | Averaging Period/Statistic | Plausible High Reference | Unit |
|---|---|---|---|
| PM10 | Annual p90.4 of daily mean (36th highest) | >75 | µg·m−3 |
| NO2 | Annual mean | >100 | µg·m−3 |
| PM2.5 | Annual mean | >30 | µg·m−3 |
| O3 | p93.2 of daily maximum 8 h mean | >160 | µg·m−3 |
| SO2 | Alert threshold (3 consecutive hours) | 500 | µg·m−3 |
| CO | Daily maximum 8 h running mean | >15 | mg·m−3 |

| Pollutant | Regulatory Reference (Statistic) | Threshold | Unit | Evaluable | Contextual Flag |
|---|---|---|---|---|---|
| PM10 | Daily limit value (24 h mean) | 50 | µg·m−3 | Yes | exceso_normativo_diario |
| SO2 | Daily limit value (24 h mean) | 125 | µg·m−3 | Yes | exceso_normativo_diario |
| NO2 | Annual limit value (annual mean) | 40 | µg·m−3 | No | normativa_no_evaluable_diario |
| NO2 | Hourly limit value (1 h) | 200 | µg·m−3 | No | normativa_no_evaluable_diario |
| O3 | Target value (daily maximum of 8 h running mean) | 120 | µg·m−3 | No * | normativa_no_evaluable_diario |
| O3 | Information threshold (1 h) | 180 | µg·m−3 | No | normativa_no_evaluable_diario |
| O3 | Alert threshold (1 h) | 240 | µg·m−3 | No | normativa_no_evaluable_diario |
| CO | Limit value (daily maximum of 8 h running mean) | 10 | mg·m−3 | No | normativa_no_evaluable_diario |
| SO2 | Alert threshold (3 h) | 500 | µg·m−3 | No * | normativa_no_evaluable_diario |
| NO2 | Alert threshold (3 h) | 400 | µg·m−3 | No | normativa_no_evaluable_diario |
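
The daily tagging in the table above reduces to a small lookup: only the PM10 and SO2 daily limit values are marked as evaluable on daily means. The sketch below encodes that reading; the function name and return layout are illustrative, and the tags are contextual only, not exclusion rules.

```python
# Daily limit values (24 h mean, in µg·m−3) marked "Evaluable = Yes" in the table above
DAILY_LIMITS = {"PM10": 50.0, "SO2": 125.0}


def daily_regulatory_tags(pollutant, daily_mean):
    """Return the two contextual flags for one daily value (None = missing)."""
    evaluable = pollutant in DAILY_LIMITS
    exceeds = (evaluable and daily_mean is not None
               and daily_mean > DAILY_LIMITS[pollutant])
    return {
        "exceso_normativo_diario": bool(exceeds),
        "normativa_no_evaluable_diario": not evaluable,
    }
```

A PM10 daily mean of 121 µg·m−3 is tagged as an exceedance, whereas any NO2 value is tagged as not evaluable at daily scale.
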

| Priority | Minimum Trigger (Condition) | DECISION | Final Value | Robust Value |
|---|---|---|---|---|
| 1 | Negative or physically impossible value (xt < 0) | DROP_NAN | NaN | NaN |
| 2 | NOx inconsistency (Equation (28)) | DROP_NAN | NaN | NaN |
| 3 | PM inconsistency (if applicable *): (Equation (29)) | DROP_NAN | NaN | NaN |
| 4 | Hampel extreme with ∣zt∣ > 6 and any inconsistency (NOx or PM) | DROP_NAN | NaN | NaN |
| 5 | Hampel extreme with ∣zt∣ > 6 and no inconsistencies (NOx or PM) | KEEP_EXTREMO | valor_winsor | valor_winsor |
| 6 | IQR-only outlier: flag_IQR = True and ∣zt∣ ≤ 6 and no applicable inconsistencies (NOx or PM) | KEEP | VALOR | VALOR |
| 7 | Missing observation: VALOR is NaN (absence preserved) | KEEP (absence preserved) | NaN | NaN |
| 8 | All other cases (non-extreme, no inconsistencies) | KEEP | VALOR | VALOR |
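
The priority table above can be read as a short cascade in which the first matching rule wins. The sketch below follows that order; the function signature is illustrative, `flag_iqr` is kept only for traceability because priorities 6 and 8 lead to the same outcome, and `valor_winsor` is assumed to have been computed beforehand.

```python
import math


def decide(valor, z_hampel, flag_iqr, incoherencia_nox, incoherencia_pm,
           valor_winsor, k=6.0):
    """Return (decision, VALOR_robusto) following the priority table above."""
    nan = float("nan")
    missing = valor is None or math.isnan(valor)

    if not missing and valor < 0:          # priority 1
        return "DROP_NAN", nan
    if incoherencia_nox:                   # priority 2
        return "DROP_NAN", nan
    if incoherencia_pm:                    # priority 3 (only if applicable)
        return "DROP_NAN", nan
    # Priority 4 (extreme plus inconsistency) is already covered by 2 and 3.
    if z_hampel is not None and abs(z_hampel) > k:
        return "KEEP_EXTREMO", valor_winsor  # priority 5
    if missing:                            # priority 7: absence preserved
        return "KEEP", nan
    return "KEEP", valor                   # priorities 6 and 8
```

Because inconsistencies are tested before the Hampel criterion, an extreme value that is also incoherent is discarded rather than winsorized.
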

| Block | Fields (Examples) | Operational Purpose |
|---|---|---|
| Outlier diagnostics | flag_IQR, z_Hampel, is_extremo_Hampel | Identify outlier candidates/extremes using robust criteria (global and local). |
| Internal coherence checks | incoherencia_NOx, incoherencia_PM, flag_incoherencia_PM_excepcion | Detect physico-chemical/hierarchical inconsistencies and document non-applicability due to instrumental comparability constraints. |
| Contextual tagging | exceso_normativo_diario, normativa_no_evaluable_diario, FLAG_PLAUS_ * | Tag regulatory context and external plausibility; not used as an automatic exclusion rule. |
| Final decision and output | decision_final, razon_final, valor_winsor, VALOR_robusto | Record the audited decision and the resulting value used in the analysis. |

| Pollutant | N_Series | Total_Days | Missing_Days | %_Median_Missing | %_P25_Missing | %_P75_Missing | %_Max_Missing | %_Weighted_Missing |
|---|---|---|---|---|---|---|---|---|
| PM2.5 | 17 | 51,137 | 9423 | 5.45 | 4.23 | 32.24 | 70.55 | 18.43 |
| PM10 | 30 | 96,091 | 6105 | 4.72 | 3.68 | 8.46 | 53.77 | 6.35 |
| CO | 15 | 57,489 | 3275 | 4.27 | 3.32 | 5.71 | 17.79 | 5.70 |
| O3 | 21 | 84,590 | 4342 | 3.93 | 3.11 | 4.54 | 17.44 | 5.13 |
| NO2 | 28 | 103,843 | 4990 | 3.51 | 3.16 | 4.20 | 17.90 | 4.81 |
| SO2 | 27 | 101,318 | 4772 | 3.52 | 3.17 | 4.29 | 17.53 | 4.71 |
| NO | 28 | 100,803 | 4601 | 3.60 | 3.28 | 4.35 | 13.69 | 4.56 |
| NOx | 30 | 115,563 | 4870 | 3.52 | 2.95 | 4.19 | 14.20 | 4.21 |
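
The weighted missing percentage in the table above is reproduced by pooling days across series, that is, 100 × missing days / total days. This reading is inferred from the tabulated numbers rather than stated in this section; the sketch below checks it for every row.

```python
def weighted_missing_pct(missing_days, total_days):
    """Pooled share of missing days, in percent."""
    return 100.0 * missing_days / total_days


# (missing days, total days) per pollutant, transcribed from the table above
COVERAGE = {
    "PM2.5": (9423, 51137),
    "PM10": (6105, 96091),
    "CO": (3275, 57489),
    "O3": (4342, 84590),
    "NO2": (4990, 103843),
    "SO2": (4772, 101318),
    "NO": (4601, 100803),
    "NOx": (4870, 115563),
}

pooled = {p: round(weighted_missing_pct(m, t), 2) for p, (m, t) in COVERAGE.items()}
```

The pooled value weights long series more heavily than the per-series median, which is why PM2.5 shows 18.43% pooled against a 5.45% median: a few long series carry most of the gaps.
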

| Pollutant | N Validation Pairs | MAE | RMSE | R2 | Bias | Overall Fit Quality |
|---|---|---|---|---|---|---|
| NOx | 5357 | 7.16 | 13.8 | 0.691 | −1.79 | Excellent (4) |
| SO2 | 5445 | 1.82 | 4.09 | 0.533 | −0.48 | Acceptable (2) |
| NO | 5410 | 2.62 | 6.38 | 0.614 | −0.59 | Good (3) |
| NO2 | 5074 | 3.68 | 5.56 | 0.691 | −0.69 | Excellent (4) |
| CO | 2806 | 0.05 | 0.09 | 0.756 | 0.00 | Excellent (4) |
| O3 | 4417 | 8.00 | 10.58 | 0.656 | −0.67 | Excellent (4) |
| PM10 | 4904 | 4.25 | 6.46 | 0.601 | −0.76 | Good (3) |
| PM2.5 | 1760 | 2.36 | 3.48 | 0.605 | −0.37 | Good (3) |
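
The metrics reported in the table above can be computed from the hold-out pairs as sketched below. Two conventions are assumed because this section does not state them: bias is taken as the mean of (imputed − observed), and R2 is the coefficient of determination, 1 − SS_res/SS_tot, rather than a squared correlation.

```python
import math


def holdout_metrics(observed, imputed):
    """Return MAE, RMSE, R2 and bias for paired hold-out values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, imputed)]  # imputed - observed
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
        "Bias": sum(errors) / n,
    }
```

Under the assumed sign convention, the negative bias values in the table would indicate that imputed values are, on average, slightly lower than the withheld observations.
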

| Pollutant | ABN | ASP | COM | LAD | MEI | PER | ROB | SAB | SOR | VEL |
|---|---|---|---|---|---|---|---|---|---|---|
| CO | 0.66 | 0.02 | — | 0.30 | — | 0.18 | — | 8.57 | 0.17 | — |
| N_out | 52 | 1 | — | 33 | — | 2 | — | 1439 | 30 | — |
| N_obs | 7942 | 6028 | — | 10,957 | — | 1096 | — | 16,799 | 17,864 | — |
| NO | 3.88 | 1.20 | 5.61 | 3.91 | 2.34 | 3.79 | 1.49 | 4.24 | 3.14 | 1.40 |
| N_out | 492 | 123 | 164 | 642 | 137 | 249 | 49 | 1478 | 561 | 46 |
| N_obs | 12,690 | 10,225 | 2922 | 16,435 | 5844 | 6574 | 3287 | 34,846 | 17,895 | 3287 |
| NO2 | 0.06 | 1.10 | 1.10 | 0.26 | 0.29 | 0.03 | 0.55 | 0.39 | 0.13 | 0.37 |
| N_out | 6 | 116 | 32 | 42 | 17 | 2 | 22 | 145 | 24 | 12 |
| N_obs | 9495 | 10,591 | 2922 | 16,435 | 5844 | 6574 | 4017 | 37,283 | 17,895 | 3287 |
| NOx | 1.68 | 1.31 | 1.86 | 1.43 | 0.60 | 1.26 | 1.19 | 1.27 | 1.65 | 1.13 |
| N_out | 213 | 133 | 52 | 235 | 35 | 83 | 39 | 438 | 296 | 33 |
| N_obs | 12,690 | 10,169 | 2802 | 16,435 | 5844 | 6574 | 3287 | 34,540 | 17,895 | 2922 |
| O3 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| N_out | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N_obs | 9769 | 6574 | 2922 | 16,405 | 3287 | 6544 | 3287 | 23,281 | 17,895 | 3287 |
| PM10 | 0.55 | 2.57 | 1.40 | 0.50 | 1.33 | 0.34 | 1.52 | 0.73 | 0.27 | 0.78 |
| N_out | 70 | 93 | 41 | 76 | 34 | 22 | 63 | 241 | 49 | 26 |
| N_obs | 12,690 | 3622 | 2922 | 15,280 | 2557 | 6574 | 4139 | 32,846 | 17,895 | 3318 |
| PM2.5 | 0.12 | 1.54 | — | 0.54 | — | — | — | 0.50 | 0.28 | — |
| N_out | 11 | 91 | — | 29 | — | — | — | 47 | 17 | — |
| N_obs | 9495 | 5905 | — | 5419 | — | — | — | 9343 | 6150 | — |
| SO2 | 0.82 | 5.51 | 3.63 | 0.25 | 2.76 | 0.37 | 0.27 | 3.41 | 2.07 | 0.40 |
| N_out | 80 | 584 | 106 | 41 | 161 | 24 | 11 | 1276 | 371 | 13 |
| N_obs | 9769 | 10,591 | 2922 | 16,435 | 5844 | 6574 | 4017 | 37,463 | 17,895 | 3259 |

| Pollutant | ABN | ASP | COM | LAD | MEI | PER | ROB | SAB | SOR | VEL |
|---|---|---|---|---|---|---|---|---|---|---|
| CO | 0.31 | 0.50 | — | 0.52 | — | 0.09 | — | 0.99 | 0.37 | — |
| N_out | 27 | 30 | — | 67 | — | 1 | — | 185 | 68 | — |
| N_obs | 7942 | 6028 | — | 10,957 | — | 1096 | — | 16,799 | 17,864 | — |
| NO | 1.82 | 1.43 | 1.03 | 1.05 | 1.14 | 0.94 | 1.73 | 3.75 | 1.17 | 0.24 |
| N_out | 233 | 143 | 30 | 206 | 68 | 62 | 57 | 1328 | 224 | 8 |
| N_obs | 12,690 | 10,225 | 2922 | 19,599 | 5844 | 6574 | 3287 | 34,846 | 17,895 | 3287 |
| NO2 | 0.09 | 1.12 | 0.10 | 0.18 | 0.15 | 0.03 | 0.24 | 0.20 | 0.09 | 0.03 |
| N_out | 9 | 45 | 3 | 34 | 9 | 2 | 9 | 82 | 14 | 1 |
| N_obs | 9495 | 4017 | 2922 | 19,599 | 5844 | 6574 | 4017 | 37,283 | 17,895 | 3287 |
| NOx | 0.21 | 0.47 | 0.14 | 0.39 | 0.13 | 0.30 | 0.12 | 0.54 | 0.26 | 0.07 |
| N_out | 27 | 42 | 4 | 75 | 8 | 20 | 4 | 211 | 49 | 2 |
| N_obs | 12,690 | 10,169 | 2802 | 19,568 | 5844 | 6574 | 3287 | 34,540 | 17,895 | 2922 |
| O3 | 0.00 | 0.00 | 0.07 | 0.05 | 0.03 | 0.00 | 0.03 | 0.04 | 0.05 | 0.03 |
| N_out | 0 | 0 | 2 | 10 | 1 | 0 | 1 | 10 | 7 | 1 |
| N_obs | 9769 | 6574 | 2922 | 19,569 | 3287 | 6544 | 3287 | 23,281 | 17,895 | 3287 |
| PM10 | 0.39 | 0.72 | 1.06 | 0.46 | 0.74 | 0.59 | 0.44 | 0.47 | 0.28 | 1.39 |
| N_out | 49 | 26 | 31 | 87 | 19 | 39 | 26 | 152 | 46 | 46 |
| N_obs | 12,690 | 3622 | 2922 | 18,291 | 2557 | 6574 | 4139 | 32,846 | 17,895 | 3318 |
| PM2.5 | 0.18 | 1.35 | — | 0.59 | — | — | — | 0.78 | 0.16 | — |
| N_out | 17 | 80 | — | 51 | — | — | — | 49 | 11 | — |
| N_obs | 9495 | 5905 | — | 8430 | — | — | — | 9343 | 6150 | — |
| SO2 | 1.01 | 2.11 | 1.20 | 0.83 | 3.28 | 0.70 | 0.87 | 4.13 | 0.82 | 0.15 |
| N_out | 99 | 243 | 35 | 162 | 186 | 46 | 40 | 1530 | 159 | 5 |
| N_obs | 9769 | 10,591 | 2922 | 19,599 | 5844 | 6574 | 4017 | 37,463 | 17,895 | 3259 |

| Pollutant | N (Daily Records) | KEEP_EXTREMO (%) | DROP_NAN (%) |
|---|---|---|---|
| CO | 23,955 | 0.6178 | 0.0918 |
| NO | 46,258 | 2.0386 | 0.3394 |
| NO2 | 43,742 | 0.1943 | 0.3566 |
| NOx | 45,878 | 0.3945 | 0.3422 |
| O3 | 38,016 | 0.0263 | 0.4051 |
| PM10 | 41,343 | 0.4112 | 0.3096 |
| PM2.5 | 15,504 | 0.4128 | 0.2838 |
| SO2 | 46,588 | 2.022 | 0.3349 |

| Pollutant | N | KEEP_EXTREMO_N | KEEP_EXTREMO_pct | DROP_NAN_N | DROP_NAN_pct | KEEP_N | KEEP_pct |
|---|---|---|---|---|---|---|---|
| PM10 | 3318 | 42 | 1.27 | 0 | 0 | 3276 | 98.73 |
| NOx | 2922 | 3 | 0.10 | 0 | 0 | 2919 | 99.90 |
| NO2 | 3287 | 1 | 0.03 | 0 | 0 | 3286 | 99.97 |
| NO | 3287 | 5 | 0.15 | 0 | 0 | 3282 | 99.85 |
| O3 | 3287 | 0 | 0.00 | 0 | 0 | 3287 | 100.00 |
| SO2 | 3259 | 6 | 0.18 | 0 | 0 | 3253 | 99.82 |

| PUNTO_MUESTREO | Date | VALOR | VALOR_robusto | flag_PM10_gt_75 | Decision |
|---|---|---|---|---|---|
| 34080004_10_49 | 23 February 2017 | 121 | 44.6868 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 27 February 2020 | 85 | 28.7912 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 15 March 2022 | 404 | 74.711 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 16 March 2022 | 156 | 58.478 | TRUE | KEEP_EXTREMO |
| 34080004_10_49 | 5 October 2022 | 81 | 50.5824 | TRUE | KEEP_EXTREMO |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Fernández Palomares, N.; Álvarez de Prado, L.; Menéndez García, L.A.; Fernández López, D.; Buján, S.; Bernardo Sánchez, A. A Reproducible QA/QC, Imputation and Robust-Series Workflow for Air-Quality Monitoring Time Series. Appl. Sci. 2026, 16, 3396. https://doi.org/10.3390/app16073396

