1. Introduction
Air quality management, the implementation of renewable energy, and digital innovation are becoming key areas of focus for political leaders, scientists, and the industrial sector worldwide. Air pollution is one of the most pressing threats to global public health. According to the World Health Organization, atmospheric pollution is responsible for over 7 million deaths each year across the globe. Harmful substances released into the environment place a burden on healthcare systems while also affecting quality of life and economic productivity [1, 2, 3, 4].
A particular threat is posed by fine particulate matter (PM2.5 and PM10), which can penetrate the lungs and bloodstream, causing inflammation, as well as diseases of the lungs, heart, and nervous system. Other pollutants, such as nitrogen dioxide (NO2), sulfur dioxide (SO2), and carbon monoxide (CO), also have harmful effects on human health, contributing to the development of chronic illnesses, exacerbating asthma symptoms, and even leading to premature deaths. According to a report, in Poland, over 50% of PM2.5 and PM10 emissions originate from the combustion of coal and wood in households, with pollutants being released close to ground level [5].
According to the 2021 report by the European Environment Agency (EEA) [6], in 27 European countries, the estimated number of premature deaths due to PM2.5 concentrations was approximately 307,000 cases in 2019, and 40,400 cases were attributed to NO2 concentrations. The total number of potential years of life lost (PYLL) in 2019 due to PM2.5 exposure in these countries was 4,068,000 years, with 512,800 years due to NO2 exposure.
From an ecological perspective, polluted air also leads to soil acidification, the eutrophication of water bodies, and the deterioration of living conditions for many plant and animal species, as has been discussed in numerous scientific studies [7].
In light of current reports from the literature, the issue of air pollution is becoming increasingly severe in urban areas and river valleys, where unfavorable meteorological conditions, such as temperature inversions, promote the accumulation of pollutants near ground level. Additionally, the emission of greenhouse gases and particulate matter directly contributes to climate change through the greenhouse effect and by impacting the atmospheric radiation balance.
Around the world, interdisciplinary efforts are being undertaken to model and reduce air pollution through the use of new technologies and advanced analytical methods. Examples of these initiatives are described below.
China is at the forefront of integrated environmental planning. Yuan et al. [8] developed simulation-based assessment frameworks to evaluate the combined effects of energy transition and air quality policies on pollutant emissions. Their analysis demonstrated that optimizing the energy structure by increasing the share of renewable energy sources significantly reduces CO2 and PM2.5 emissions.
Another country facing intense urban and industrial pollution is India. In order to assess current air quality and forecast pollution levels, researchers used machine learning techniques, such as random forest and support vector machines (SVMs), to predict concentrations of PM2.5, NO2, and SO2 across urban–rural gradients [9]. The results can assist the scientific community and policymakers in understanding the distribution of air pollution and developing strategies for pollution reduction and air quality improvement in the studied region.
In Germany and the Netherlands, the CLINSH (Clean Inland Shipping) project focused on reducing air pollution emissions from inland shipping and improving the air quality in port and urban areas [10]. Co-funded by the EU LIFE program, the project tested emission reduction technologies, such as selective catalytic reduction (SCR) systems, particulate filters, hybrid drives, and alternative fuels (e.g., LNG and GTL), on 30 selected vessels operating on the Rhine, the Meuse, and port canals. The project included long-term monitoring of nitrogen oxide (NOx) and particulate matter (PM) emissions, both during voyages and while docked in ports (e.g., in Duisburg, Rotterdam, and Antwerp). The results showed that implementing low-emission technologies led to an average NOx reduction of 25% and PM reductions of 69%. CLINSH also developed analytical tools and decision-making models to support local authorities and operators in evaluating the effectiveness of environmental investments and planning actions for decarbonizing waterborne transport.
As the literature and projects such as CLINSH show, there is a growing need for IoT systems in urban air quality monitoring, especially for environmental risk prediction and management. In the era of digitization and sustainable development policies, the Internet of Things (IoT) plays an increasingly important role in modern environmental monitoring. In air quality research in particular, the IoT enables the creation of dense, decentralized sensor networks that collect data with high temporal resolution and transmit them to the cloud in real time, so that computations always reflect the current state of air pollution. This approach makes it possible to create predictive models and early warning systems for residents and for local or municipal authorities. For instance, the authors of [11] presented mobile smartphone-based systems for real-time pollution measurements. The paper [12] reviewed low-cost sensor applications in the context of exposure assessments and environmental policy, highlighting their usefulness as a complement to reference stations. In contrast, the study [13] provided an end-user perspective, pointing to the need for better communication and data visualization.
In the context of air quality prediction, the authors of [14, 15] applied LSTM models and hybrid IoT–AI systems to predict PM2.5 and NO2 levels effectively. The results of these studies indicate the potential of IoT systems as a pillar of future adaptive air quality management systems, provided that they are correctly calibrated, integrated with artificial intelligence models, and validated under varying field conditions.
As some studies in East Asian cities show [16], low-cost IoT sensors deployed on urban infrastructure can detect short-term smog episodes and spatial variation in pollutant emissions. The authors of [17] emphasize that “the paradigm of air pollution monitoring is shifting”, with the IoT enabling a move from point-based fixed measurements to a distributed, adaptive measurement system. In addition, IoT sensor data are being used as inputs in artificial intelligence models: research conducted in India and China, where random forest, SVM, and LSTM algorithms have been used to predict PM2.5, NO2, and SO2 levels, allows for predicting emergencies and supporting environmental policy [18].
Despite the abundance of the literature in this area, there are still gaps in the implementation of the IoT at the urban scale, some of which are listed below:
Insufficient validation of data from low-cost sensors against reference networks;
A small number of studies conducted in mid-sized cities in Central and Eastern Europe;
Lack of integration between sensor data and advanced statistical analysis (non-parametric tests and clustering) in the local context;
Limited documentation of the long-term stability of autonomous (solar-powered) systems.
Therefore, the authors decided to address the above needs and conduct a study that includes the following:
Annual monitoring of air quality using a calibrated autonomous IoT system;
The use of non-parametric and exploratory methods (Kruskal–Wallis test and k-means cluster analysis);
A study site in the city of Ruse (Bulgaria), which represents a typical urbanized space in the Southeast European region;
The evaluation of seasonal and daily pollution profiles in the context of meteorological conditions.
The authors formulated the following research questions:
Do the levels of air pollutants (PM, NO2, CO, and SO2) show statistically significant variations with regard to the time of day, day of the week, and season?
Does the IoT system provide data that are comparable in consistency and sensitivity to data from classical monitoring networks?
Is it possible to identify reproducible pollution profiles using unsupervised clustering?
Based on this, we made the following hypotheses:
The temporal variability of pollution levels at the study site is statistically significant;
Data from a single IoT sensor, with proper placement and calibration, are sufficient to identify typical air quality scenarios;
Meteorological factors (temperature, humidity, and pressure) significantly correlate with PM and gas concentrations, allowing the data to be grouped into characteristic clusters.
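The temporal factors named in the research questions (time of day, day of the week, and season) can be derived directly from measurement timestamps. The following is a minimal Python sketch, assuming a pandas DataFrame indexed by timestamp; the part-of-day boundaries, the season mapping, the function name `add_time_factors`, and the two demo values are illustrative assumptions, not definitions or data taken from the study.

```python
import numpy as np
import pandas as pd

# Hypothetical part-of-day boundaries in hours (left-inclusive bins);
# the study's own definition is not given in this section.
PART_OF_DAY_BINS = [0, 6, 12, 18, 24]
PART_OF_DAY_LABELS = ["night", "morning", "afternoon", "evening"]

# Meteorological seasons (Northern Hemisphere), keyed by month number.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def add_time_factors(df):
    """Return a copy of df (with a DatetimeIndex) plus three grouping columns."""
    out = df.copy()
    hours = np.asarray(out.index.hour)
    out["part_of_day"] = pd.cut(
        hours, bins=PART_OF_DAY_BINS, labels=PART_OF_DAY_LABELS, right=False
    )
    out["day_of_week"] = out.index.day_name()
    out["season"] = out.index.month.map(SEASON_BY_MONTH)
    return out


# Two made-up records, only to show the resulting labels.
idx = pd.DatetimeIndex(["2024-03-04 07:30", "2025-01-15 22:00"])
demo = add_time_factors(pd.DataFrame({"pm10": [20.0, 55.0]}, index=idx))
print(demo)
```

Columns produced this way can serve as the grouping factors for the non-parametric comparisons described later in the article.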
Our own research was carried out within a national initiative implemented in Bulgaria through the University of Ruse, “Angel Kanchev”. As part of the Bulgarian National Recovery and Resilience Plan (project BG-RRP-2.013-0001), co-financed by the European Union under the NextGenerationEU mechanism, the university deployed a high-resolution, autonomous air quality monitoring platform. The system, based on Libelium’s portable sensing technology, was installed near Rodina Boulevard in Ruse and operated continuously from 1 March 2024 to 30 March 2025. It recorded measurements of CO, CO2, NO2, SO2, PM1, PM2.5, PM10, temperature, humidity, and pressure at 15 min intervals. The platform is powered by a solar panel, supported by a rechargeable battery, and transmits data via GSM to the Libelium Cloud for visualization and analysis.
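To make the sampling scheme concrete, the sketch below counts the 15 min timestamps expected for the stated monitoring period and shows how data coverage and daily medians could be checked with pandas. The pollutant values are synthetic and the one-day outage is invented purely so that the coverage check has something to report; neither reflects the recorded data.

```python
import numpy as np
import pandas as pd

# Expected 15 min timestamps for the stated period (both end days inclusive).
expected = pd.date_range("2024-03-01 00:00", "2025-03-30 23:45", freq="15min")

# Synthetic stand-in for one pollutant channel (NOT measured data).
rng = np.random.default_rng(0)
series = pd.Series(
    rng.lognormal(mean=3.0, sigma=0.5, size=len(expected)),
    index=expected,
    name="pm10",
)

# Hypothetical one-day outage, removed to illustrate a gap in the record.
series = series.drop(series.loc["2024-07-10"].index)

# Share of expected records that are present.
coverage = len(series) / len(expected)

# Daily medians; a day without any records shows up as NaN.
daily_median = series.resample("D").median()

print(f"expected records: {len(expected)}")
print(f"coverage: {coverage:.4f}")
print(f"days without data: {int(daily_median.isna().sum())}")
```

The median is used for the daily aggregate because it is less sensitive to short pollution spikes than the mean, which matches the non-parametric approach taken in the rest of the analysis.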
The measurement location in the city of Ruse was selected for three reasons: its position in the Danube Valley, which is conducive to temperature inversions and the accumulation of pollutants; its proximity to major traffic arteries and residential areas, which reflects the residents’ exposure; and the existing facilities of the technical university, which allowed for the stable installation and maintenance of the platform for a year. Alternative locations, such as industrial zones or rural areas, do not reflect the typical urban exposure experienced by most metropolitan residents. In turn, selecting multiple measurement points would significantly increase operational costs and require additional personnel and complex logistics.
The measurement platform was built using modern IoT sensors, which allowed us to reduce the cost of its implementation significantly. To better illustrate the differences between our platform and traditional reference stations, Table 1 summarizes the main features of both measurement systems in terms of cost, accuracy, mobility, measurement frequency, and data system integration capabilities.
The above comparison shows that classical reference stations provide the highest measurement accuracy, but their use is limited due to high costs, low mobility, and a limited number of locations. IoT sensors, on the other hand, despite their lower precision, offer exceptional flexibility, low deployment costs, and the ability to build a dense, distributed measurement network.
In practice, the best results are achieved through the complementary use of the two systems, where reference stations act as calibration and validation points, while IoT sensors enable the tracking of pollution variability at the micro-environmental scale. This approach supports the construction of modern, adaptive air quality management systems based on real data and predictive analysis.
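As an illustration of how a reference station could act as a calibration point, the sketch below fits a linear correction to low-cost sensor readings from a co-location period. It is a hypothetical example on synthetic data, not the procedure used in this study; the function names, the assumed sensor bias, and the noise level are all assumptions.

```python
import numpy as np


def fit_linear_calibration(sensor, reference):
    """Least-squares fit of reference ~ slope * sensor + intercept."""
    slope, intercept = np.polyfit(sensor, reference, deg=1)
    return slope, intercept


def apply_calibration(sensor, slope, intercept):
    """Map raw sensor readings onto the reference scale."""
    return slope * np.asarray(sensor, dtype=float) + intercept


def rmse(a, b):
    """Root-mean-square error between two equally long arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic co-location data (NOT measured): the hypothetical sensor
# under-reads the reference and adds random noise.
rng = np.random.default_rng(1)
reference = rng.uniform(5.0, 80.0, size=500)
sensor = (reference - 2.0) / 1.3 + rng.normal(0.0, 1.5, size=500)

slope, intercept = fit_linear_calibration(sensor, reference)
corrected = apply_calibration(sensor, slope, intercept)

rmse_raw = rmse(sensor, reference)
rmse_cal = rmse(corrected, reference)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"RMSE before={rmse_raw:.2f}, after={rmse_cal:.2f}")
```

A linear correction is the simplest possible choice; in practice, low-cost sensors may also need corrections for humidity, temperature, or cross-sensitivities, which this sketch does not attempt.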
Including an Internet of Things (IoT) component in air quality research is an essential step toward a more individualized and spatially sensitive approach to monitoring the urban environment. As shown in Section 1, these solutions enable more accurate mapping of temporal and spatial phenomena and offer the potential for integration with modern data analysis and artificial intelligence tools.
In this article, starting from a review of the existing literature and identified research gaps, the authors focus on implementing an autonomous IoT platform in a central European medium-sized city. Particular emphasis is placed on evaluating the effectiveness of a single measurement point, analyzing temporal variability, and using non-parametric and exploratory methods (Kruskal–Wallis test and k-means clustering) to identify pollution patterns.
The following sections present the detailed measurement methodology, the analytical tools used, and the interpretation of the results in relation to the hypotheses and current decision-making needs in environmental and urban policy.
4. Discussion
The results obtained from one-year high-frequency air quality monitoring in the city of Ruse provide valuable insights into the temporal dynamics of pollutant concentrations in an urban environment. The autonomous measurement system implemented under the National Recovery and Resilience Plan of Bulgaria (project BG-RRP-2.013-0001), co-financed by the European Union through the NextGenerationEU initiative, demonstrates the feasibility and utility of solar-powered, IoT-enabled platforms for environmental data acquisition and analysis.
The descriptive statistics and correlation analyses confirm expected patterns of pollutant variability and interdependence. Very strong positive correlations were observed among particulate matter fractions (e.g., PM2.5 and PM10), aligning with findings from previous studies conducted in urban areas in Central and Eastern Europe [1, 3]. These results suggest a commonality in emission sources, such as residential solid fuel combustion and traffic-related particulate emissions, consistent with Polish national inventories [5]. The moderate to strong correlations between gaseous pollutants (e.g., CO–NO2 and CO–SO2) and the inverse correlation of temperature with PM concentrations support the hypothesis that meteorological factors modulate pollutant dispersion and accumulation, especially during cold seasons when inversion layers often trap pollutants near the ground [6, 7, 22].
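The kind of rank correlation discussed above can be computed with `scipy.stats.spearmanr`. The sketch below uses synthetic series in which PM concentrations are constructed to fall with rising temperature; the variable names, coefficients, and distributions are assumptions chosen only to reproduce the qualitative pattern, not values from the study.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n = 1000

# Synthetic series (NOT measured data): PM2.5 decreases with temperature,
# and PM10 is PM2.5 scaled up plus a coarse-fraction term.
temperature = rng.normal(12.0, 9.0, size=n)
pm25 = np.exp(3.0 - 0.04 * temperature + rng.normal(0.0, 0.3, size=n))
pm10 = 1.4 * pm25 + rng.gamma(2.0, 2.0, size=n)

# Spearman's rho works on ranks, so it does not assume normality or linearity.
rho_pm, p_pm = spearmanr(pm25, pm10)
rho_temp, p_temp = spearmanr(temperature, pm10)

print(f"PM2.5 vs PM10:       rho={rho_pm:.2f}, p={p_pm:.3g}")
print(f"temperature vs PM10: rho={rho_temp:.2f}, p={p_temp:.3g}")
```

Because the coefficient is rank-based, it stays meaningful for the skewed, heavy-tailed distributions typical of pollutant concentrations.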
The use of non-parametric statistical methods, such as the Kruskal–Wallis test and Dunn–Sidak post hoc comparisons, allowed for a robust assessment of temporal heterogeneity in pollutant levels. Similar methodologies have been applied successfully in studies from Italy [23], India [24], and Romania [25], validating their relevance in the context of non-normally distributed environmental data. The identification of statistically significant differences in PM10 and PM2.5 concentrations across days of the week and parts of the day reflects both human activity rhythms and atmospheric boundary layer dynamics. The highest concentrations were typically recorded in the early morning and evening hours, consistent with periods of increased vehicular traffic and lower atmospheric mixing [26, 27].
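The testing procedure described above can be sketched as follows: an omnibus Kruskal–Wallis test, followed by pairwise Dunn comparisons with a Šidák adjustment. The omnibus test uses `scipy.stats.kruskal`; the post hoc step is written out by hand, and the helper name `dunn_sidak`, the omission of the tie correction, and the synthetic part-of-day groups are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kruskal, norm, rankdata


def dunn_sidak(groups):
    """Pairwise Dunn z-tests on pooled ranks with Sidak-adjusted p-values.

    `groups` maps a label to a 1-D array of observations.
    Simplification: the tie correction of Dunn's test is omitted.
    """
    labels = list(groups)
    arrays = [np.asarray(groups[k], dtype=float) for k in labels]
    sizes = [len(a) for a in arrays]
    ranks = rankdata(np.concatenate(arrays))
    n_total = len(ranks)

    # Mean pooled rank of each group.
    bounds = np.cumsum([0] + sizes)
    mean_rank = [ranks[bounds[i]:bounds[i + 1]].mean() for i in range(len(labels))]

    pairs = list(combinations(range(len(labels)), 2))
    m = len(pairs)  # number of comparisons
    adjusted = {}
    for i, j in pairs:
        se = np.sqrt(n_total * (n_total + 1) / 12.0 * (1.0 / sizes[i] + 1.0 / sizes[j]))
        z = (mean_rank[i] - mean_rank[j]) / se
        p_raw = 2.0 * norm.sf(abs(z))
        # Sidak adjustment for m comparisons.
        adjusted[(labels[i], labels[j])] = 1.0 - (1.0 - p_raw) ** m
    return adjusted


# Synthetic PM10 values by part of day (NOT measured data); morning and
# evening are given higher levels to mimic traffic peaks.
rng = np.random.default_rng(3)
groups = {
    "night": rng.lognormal(3.0, 0.4, size=300),
    "morning": rng.lognormal(3.4, 0.4, size=300),
    "afternoon": rng.lognormal(3.0, 0.4, size=300),
    "evening": rng.lognormal(3.4, 0.4, size=300),
}

h_stat, p_value = kruskal(*groups.values())
adjusted = dunn_sidak(groups)

print(f"Kruskal-Wallis H={h_stat:.1f}, p={p_value:.3g}")
for pair, p_adj in adjusted.items():
    print(pair, f"adjusted p={p_adj:.3g}")
```

The post hoc step is only interpreted when the omnibus test rejects equality of the groups; the Šidák adjustment then keeps the family-wise error rate near the nominal level across all pairs.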
The k-means clustering analysis yielded four distinct pollution profiles, which we interpret as follows. Cluster 1 represents a typical winter scenario with low temperatures and elevated PM concentrations due to residential heating; Cluster 2 corresponds to summer conditions with higher temperatures and improved air dispersion; Cluster 3 reflects episodic NO2 spikes, likely due to traffic congestion or local combustion events; and Cluster 4 captures extreme pollution episodes, possibly linked to stagnation events or industrial releases. These findings are consistent with previous cluster-based approaches used in cities such as Delhi [28] and Beijing [29], highlighting the potential of unsupervised learning methods in classifying urban air quality patterns.
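To show how such profiles can emerge from unsupervised clustering, the sketch below runs k-means (Lloyd's algorithm with k-means++ seeding and several restarts) on standardized synthetic data with four regimes. It is a plain NumPy illustration rather than the implementation used in the study; the feature set, regime means, and spreads are invented, and in practice a library implementation would normally be used.

```python
import numpy as np


def standardize(x):
    """Scale each column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def assign(x, centers):
    """Return the nearest-center label per row and the total inertia."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), float(d2.min(axis=1).sum())


def kmeans_pp_init(x, k, rng):
    """k-means++ seeding: pick new centers with probability ~ squared distance."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d2 = ((x[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(x, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = kmeans_pp_init(x, k, rng)
        for _step in range(max_iter):
            labels, _ = assign(x, centers)
            updated = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        labels, inertia = assign(x, centers)
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best


# Synthetic regimes (NOT measured data). Columns: PM10, PM2.5, CO, temperature.
rng = np.random.default_rng(5)
regime_means = np.array([
    [60.0, 45.0, 0.8, 0.0],     # cold, elevated PM
    [15.0, 8.0, 0.3, 25.0],     # warm, clean air
    [25.0, 15.0, 2.0, 12.0],    # mild, gas peak
    [150.0, 110.0, 1.5, -3.0],  # extreme episode
])
spread = np.array([4.0, 3.0, 0.08, 2.0])
true_regime = np.repeat(np.arange(4), 150)
data = regime_means[true_regime] + rng.normal(size=(600, 4)) * spread

labels, centers, inertia = kmeans(standardize(data), k=4)
print("cluster sizes:", np.bincount(labels, minlength=4))
```

Standardizing first matters because k-means relies on Euclidean distance: without it, the variable with the largest numeric range (here PM10) would dominate the grouping.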
From a methodological standpoint, the integration of a continuously operating sensor network with cloud-based storage and GSM transmission proved to be highly effective. This approach enables real-time access to high-resolution environmental data, which is essential for developing early warning systems, informing public health interventions, and evaluating policy outcomes. Similar sensor networks have been tested in the CLINSH project in Western Europe [10] and in smart city pilot programs in East Asia [30], underscoring the growing importance of decentralized, scalable monitoring infrastructure.
Moreover, the reliability of the Libelium-based measurement system was maintained throughout the entire observation period, with a stable energy supply from the solar panel and efficient data transfer. This supports prior evaluations of autonomous monitoring platforms in low-maintenance deployments [17].
This study contributes to the growing body of literature advocating for adaptive, locally grounded environmental monitoring frameworks in support of regional air quality management. While the current analysis focuses on temporal and meteorological drivers of pollutant variability, future work will integrate emissions inventory data and machine learning techniques to enhance predictive modeling and scenario simulations.
Descriptive statistics revealed episodic pollution peaks for CO, NO2, SO2, and particulate matter, while Spearman correlations highlighted strong interdependencies among pollutant species and inverse relationships with temperature and humidity. Non-parametric tests confirmed that median pollutant levels vary significantly by the day of the week and time of day, reflecting socio-economic rhythms (e.g., traffic and industrial operations). Clustering key pollutants (PM10, PM2.5, and CO) with meteorological variables yielded four distinct profiles, including a “winter-like” cluster, a “summer-like” clean air cluster, an NO2 spike cluster, and a high-pollution inversion cluster. These findings underscore the need for adaptive monitoring and targeted emission reduction measures tied to specific temporal and meteorological scenarios.
5. Conclusions
This study demonstrated the effectiveness of a fully autonomous, solar-powered air quality monitoring system implemented by the University of Ruse, “Angel Kanchev”, as part of the Bulgarian National Recovery and Resilience Plan. The use of Libelium’s modular platform enabled the continuous measurement of key air pollutants (CO, CO2, NO2, SO2, PM1, PM2.5, and PM10) and meteorological variables (temperature, humidity, and pressure) at high temporal resolution over a 12-month period. The system’s integration of GSM data transmission and cloud-based storage proved to be robust, reliable, and suited for long-term urban deployment.
From a data analysis perspective, the following conclusions can be drawn:
Pollutant distributions exhibit high asymmetry and heavy-tailed behavior, particularly for PM10, where extreme values point to episodic pollution events associated with unfavorable atmospheric conditions;
Spearman rank correlations revealed strong interdependencies among particulate fractions and significant negative associations with temperature, confirming the role of meteorological variables in pollutant dynamics;
Kruskal–Wallis and Dunn–Sidak tests identified statistically significant differences in pollutant levels by the day of the week and time of day, reflecting human activity cycles and boundary layer effects;
Unsupervised clustering (k-means) yielded four distinct pollution profiles: cold season accumulation, clean air summer patterns, localized NO2 peaks, and high-pollution inversion episodes;
These distinct clusters, together with the system’s uninterrupted year-long operation, demonstrate that measurements from a single, properly placed, and calibrated IoT sensor are sufficient to identify and monitor typical urban air quality scenarios.
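The asymmetry and heavy tails noted in the first conclusion can be quantified with sample skewness and excess kurtosis. The sketch below is a minimal example on a synthetic log-normal series, which is a common but here purely assumed stand-in for pollutant concentrations; none of the values come from the study.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Synthetic right-skewed series (NOT measured data).
rng = np.random.default_rng(4)
pm10 = rng.lognormal(mean=3.0, sigma=0.7, size=5000)

pm10_skew = float(skew(pm10))                 # 0 for a symmetric distribution
pm10_excess_kurtosis = float(kurtosis(pm10))  # Fisher definition: 0 for a normal

pm10_mean = float(np.mean(pm10))
pm10_median = float(np.median(pm10))

print(f"skewness={pm10_skew:.2f}, excess kurtosis={pm10_excess_kurtosis:.2f}")
print(f"mean={pm10_mean:.1f}, median={pm10_median:.1f}")
```

A mean well above the median, together with positive skewness and excess kurtosis, is the signature of rare but large peaks, which is one reason rank-based statistics were preferred over methods that assume normality.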
These findings not only confirm the reliability and precision of the deployed monitoring infrastructure but also illustrate the added value of integrating environmental sensing with non-parametric statistics and machine learning approaches. The system’s portability, autonomy, and real-time data access make it an attractive model for other urban regions seeking to expand air quality surveillance capabilities.
Future research will focus on the predictive modeling of air quality based on meteorological inputs and human activity proxies. Time-series methods (e.g., STL decomposition and ARIMA models), multivariate regression, and deep learning methods such as LSTM will be explored to forecast pollutant concentrations and identify leading indicators of acute air quality deterioration. Integration with traffic flow and energy usage datasets may further enhance the interpretability of pollution patterns and guide targeted mitigation strategies in real time.
Limitations and Generalizability
Despite the strengths of our year-long, high-frequency monitoring campaign, several limitations should be noted. First, all measurements derive from a single sensor node located on the university campus; micro-scale factors (local traffic density, building-induced turbulence, and street canyon effects) may therefore cause biases regarding absolute pollutant levels and temporal patterns. Second, although the Libelium platform was factory-calibrated and proved stable over 12 months, the lack of concurrent reference station collocation means that potential sensor drift or cross-sensitivities cannot be fully excluded. Third, the dataset spans only one annual cycle. Inter-annual variability (e.g., unusually cold winters, extended heatwaves, and episodic industrial emissions) may yield slightly different cluster structures or temporal contrasts. Fourth, our meteorological dataset omitted wind speed, wind direction, and solar radiation, all of which can strongly influence pollutant dispersion and secondary aerosol formation; inclusion of these parameters could refine cluster definitions and improve predictive models.
With regard to generalizability, the core methodology (deployment of a single, solar-powered IoT node coupled with non-parametric tests and k-means clustering) can readily be adopted in other mid-sized European cities exhibiting similar continental climates and diurnal traffic cycles. However, local topography, emission source profiles, and urban morphology will affect the number and character of identifiable pollution clusters. Any replication should therefore include initial site-specific validation (ideally via short-term co-location with reference instruments), as well as the adaptation of clustering features to local emission sources and meteorological regimes.
In sum, while our results convincingly demonstrate the potential of a lone IoT sensor for capturing dominant urban air quality scenarios, extension to broader spatial networks and longer observation periods will be essential to fully characterize city-wide and year-to-year variability.